Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


php - How can I preg_replace special characters like 'Prêt-à-porter'?

There are heaps of questions about this on this forum and on the web in general, but I just don't get it.

Here is my code:

function updateGuideKeywords($dal)
{
    $pattern = "/[^a-zA-Z-êàé]/";
    $keywords = preg_replace($pattern, '', $_POST['keywords']);
    echo json_encode($keywords);
}

Now, the input is Prêt-à-porter, and the output is "Pr\u00eat-\u00e0-porter".

Why do I get the '\u00e'?

And how can I alter my pattern to include the characters ê, à and é ?

EDIT
Hmm... since it looks like a Unicode / character-encoding issue, I might go with the solution I found on this page.

Here they suggest doing something like this:

$chain="prêt-à-porter";

$pattern = array("'é'", "'è'", "'?'", "'ê'", "'é'", "'è'", "'?'", "'ê'", "'á'", "'à'", "'?'", "'a'", "'?'", "'á'", "'à'", "'?'", "'?'", "'?'", "'ó'", "'ò'", "'?'", "'?'", "'ó'", "'ò'", "'?'", "'?'", "'í'", "'ì'", "'?'", "'?'", "'í'", "'ì'", "'?'", "'?'", "'ú'", "'ù'", "'ü'", "'?'", "'ú'", "'ù'", "'ü'", "'?'", "'y'", "'?'", "'Y'", "'?'", "'?'", "'?'", "'?'", "'?'", "'?'", "'?'");

$replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C'); 

$chain = preg_replace($pattern, $replace, $chain);

EDIT 2
This is my solution so far:

function updateGuideKeywords()
{
    //First we replace characters with accents
    $pattern = array("'é'", "'è'", "'?'", "'ê'", "'é'", "'è'", "'?'", "'ê'", "'á'", "'à'", "'?'", "'a'", "'?'", "'á'", "'à'", "'?'", "'?'", "'?'", "'ó'", "'ò'", "'?'", "'?'", "'ó'", "'ò'", "'?'", "'?'", "'í'", "'ì'", "'?'", "'?'", "'í'", "'ì'", "'?'", "'?'", "'ú'", "'ù'", "'ü'", "'?'", "'ú'", "'ù'", "'ü'", "'?'", "'y'", "'?'", "'Y'", "'?'", "'?'", "'?'", "'?'", "'?'", "'?'", "'?'");
    $replace = array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E', 'a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A', 'A', 'o', 'o', 'o', 'o', 'O', 'O', 'O', 'O', 'i', 'i', 'i', 'I', 'I', 'I', 'I', 'I', 'u', 'u', 'u', 'u', 'U', 'U', 'U', 'U', 'y', 'y', 'Y', 'o', 'O', 'a', 'A', 'A', 'c', 'C');
    $shguideID = $_POST['shguideID'];
    $keywords = preg_replace($pattern, $replace, $_POST['keywords']);
    //Then we remove unwanted characters by only allowing a-z, A-Z, comma, 'minus' and white space
    $keywords = preg_replace("/[^a-zA-Z,\s-]/", "", $keywords);

    echo json_encode($keywords);
}

1 Reply


"Pr\u00eat-\u00e0-porter" is a correct JavaScript string literal representation of Prêt-à-porter. I assume you're doing a json_encode at some point along the line?
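
To check that nothing is lost by the escaping, a minimal sketch along these lines may help. It assumes the source file is saved as UTF-8 and PHP 5.4 or later (for the JSON_UNESCAPED_UNICODE flag); the variable names are only illustrative.

```php
<?php
$original = 'Prêt-à-porter';

// By default json_encode writes non-ASCII characters as \uXXXX escapes.
$escaped = json_encode($original);
echo $escaped, "\n"; // "Pr\u00eat-\u00e0-porter"

// Decoding the JSON gives back exactly the original string.
var_dump(json_decode($escaped) === $original); // bool(true)

// JSON_UNESCAPED_UNICODE keeps the characters as literal UTF-8 instead.
echo json_encode($original, JSON_UNESCAPED_UNICODE), "\n"; // "Prêt-à-porter"
```

Either form is valid JSON, so any JSON parser on the receiving side should produce the same string from both.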

Note also that PHP's regular expressions are not Unicode-aware by default: unless the pattern carries the u modifier, it is matched byte by byte. So if you are using UTF-8 (which you generally want to be), ê is not a single character to the regex engine but byte C3 followed by byte AA. That's fine for simple literal matches, but inside a character class the two bytes are matched independently rather than as a sequence, which can easily break your expression.
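
As a rough illustration of the byte problem and one possible way around it, the sketch below compares a byte-wise pattern with the same pattern using PCRE's u modifier, which makes preg_* treat pattern and subject as UTF-8. It assumes a UTF-8 source file and PHP 7 or later; cleanKeywords is a hypothetical helper name, not something from the question.

```php
<?php
// In UTF-8, ê is stored as two bytes.
echo bin2hex('ê'), "\n"; // c3aa

// Without "u" the class holds the single bytes C3, AA, A0 and A9.
// For ö (C3 B6) the C3 byte is kept and B6 is stripped: invalid UTF-8.
$bytewise = preg_replace('/[^a-zA-Zêàé-]/', '', 'Björk');
echo bin2hex($bytewise), "\n"; // 426ac3726b

// With "u" each accented letter is one character, so ö is removed cleanly.
$unicode = preg_replace('/[^a-zA-Zêàé-]/u', '', 'Björk');
echo $unicode, "\n"; // Bjrk

// Hypothetical helper: keep letters, ê à é, comma, whitespace and hyphen.
function cleanKeywords(string $input): string
{
    // The hyphen is placed last so it is read as a literal, not a range.
    $cleaned = preg_replace('/[^a-zA-Zêàé,\s-]/u', '', $input);
    // preg_replace returns null on error, e.g. for input that is not valid UTF-8.
    return $cleaned ?? '';
}

echo cleanKeywords('Prêt-à-porter!'), "\n"; // Prêt-à-porter
```

With the u modifier the accented letters can stay in the keywords, so the accent-stripping lookup tables may not be needed at all.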

