Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
222 views
in Technique[技术] by (71.8m points)

utf 8 - PHP: Convert any string to UTF-8 without knowing the original character set, or at least try

I have an application that deals with clients from all over the world, and, naturally, I want everything going into my databases to be UTF-8 encoded.

The main problem for me is that I don't know what encoding the source of any string is going to be - it could be from a text box (using <form accept-charset="utf-8"> is only useful if the user is actually submitted the form), or it could be from an uploaded text file, so I really have no control over the input.

What I need is a function or class that makes sure the stuff going into my database is, as far as is possible, UTF-8 encoded. I've tried iconv(mb_detect_encoding($text), "UTF-8", $text); but that has problems (if the input is 'fiancée' it returns 'fianc'). I've tried a lot of things =/

For file uploads, I like the idea of asking the end user to specify the encoding they use, and show them previews of what the output will look like, but this doesn't help against nasty hackers (in fact, it could make their life a little easier).

I've read the other SO questions on the subject, but they seem to all have subtle differences like "I need to parse RSS feeds" or "I scrape data from websites" (or, indeed, "You can't").

But there must be something that at least has a good try!

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

What you're asking for is extremely hard. If possible, getting the user to specify the encoding is the best. Preventing an attack shouldn't be much easier or harder that way.

However, you could try doing this:

iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);

Setting it to strict might help you get a better result.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...