Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
564 views
in Technique[技术] by (71.8m points)

php - Replace unicode character

I am trying to replace a certain character in a string with another. They are quite obscure latin characters. I want to replace character (hex) 259 with 4d9, so I tried this:

str_replace("x02x59","x04xd9",$string);

This didn't work. How do I go about this?

**EDIT: Additional information.

Thanks bobince, that has done the trick. Although, I want to replace the uppercase schwa also and it is not working for some reason. I calculated U+018F (?) as UTF-8 0xC68F and this is to be replaced with U+04D8 (0xD398):

$string = str_replace("xC9x99", "xD3x99", $_POST['string_with_schwa']); //lc 259->4d9
$string = str_replace( "xC68F", "xD3x98" , $string); //uc 18f->4d8

I am copying the '?' into a textbox and posting it. The first str_replace works fine on the lowercase, but does not detect the uppercase in the second str_replace, strange. It remains as U+018F. Guess I could run the string through strtolower but this should work though.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

U+0259 Latin Small Letter Schwa is only encoded as the byte sequence 0x02,0x59 in the UTF-16BE encoding. It is very unlikely you will be working with byte strings in the UTF-16BE encoding as it's not an ASCII-compatible encoding and almost no-one uses it.

The encoding you want to be working with (the only ASCII-superset encoding to support both Latin Schwa and Cyrillic Schwa, as it supports all Unicode characters) is UTF-8. Ensure your input is in UTF-8 format (if it is coming from form data, serve the page containing the form as UTF-8). Then, in UTF-8, the character U+0259 is represented using the byte sequence 0xC9,0x99.

str_replace("xC9x99", "xD3x99", $string);

If you make sure to save your .php file as UTF-8-no-BOM in the text editor, you can skip the escaping and just directly say:

str_replace('?', '?', $string);

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...