MySQL's utf8
charset is not actually UTF-8, it's a subset of UTF-8 only supporting the basic plane (characters up to U+FFFF). Most emoji use code points higher than U+FFFF. MySQL's utf8mb4
is actual UTF-8 which can encode all those code points. Outside of MySQL there's no such thing as "utf8mb4", there's just UTF-8. So:
Does POST allow utf8mb4, or should I convert the data in the client to plain utf8?
Again, no such thing as "utf8mb4". HTTP POST requests support any raw bytes, if your client sends UTF-8 encoded data you're fine.
If my DB has collation and character set utf8mb4, does it mean I should be able to store 'raw' emojis?
Yes.
Should I try to work in the DB with utf8mb4 or is it safer/better/more supported to work in utf8 and encode symbols?
God no, use raw UTF-8 (utf8mb4
) for all that is holy.
When I retrieve this symbols in PHP I first need to execute SET CHARACTER SET utf8
Well, there's your problem; channeling your data through MySQL's utf8
charset will discard any characters above U+FFFF. Use utf8mb4
all the way through MySQL.
if I get them in utf8mb4 the json_decode function doesn't work
You'll have to specify what that means exactly. PHP's JSON functions should be able to handle any Unicode code point just fine, as long as it's valid UTF-8:
echo json_encode('??');
"ud83dude00"
echo json_decode('"ud83dude00"');
??
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…