mysql - Check the language of string based on glyphs in PHP

I have a MySQL database with book titles in both English and Arabic and I'm using a PHP class that can automatically transliterate Arabic text into Latin script.

I'd like my output HTML to look something like this:

<h3>A book</h3>
<h3>كتاب <em>(kitaab)</em></h3>
<h3>Another book</h3>

Is there a way for PHP to determine the language of a string based on the Unicode characters and glyphs used in it? I'm trying to get something like this:

$Ar = new Arabic('EnTransliteration');
while ($item = mysql_fetch_array($results)) {
    ...
    if (some test to see if $item['item_title'] has Arabic glyphs in it) {
      echo "<h3>$item[item_title] <em>(" . $Ar->ar2en($item['item_title']) . ")</em></h3>";
    } else {
      echo "<h3>$item[item_title]</h3>";
    }
    ...
}

Fortunately the class doesn't choke when fed Latin characters, so in theory I could send every result through the transformation, but that seems like a waste of processing.

Thanks!

Edit: I still haven't found a way to check for glyphs or characters. I suppose I could put all the Arabic characters in an array and check if anything in the array matches a part of the string...

I did, however, figure out an interim solution that might work fine in the end. It puts every title through the transformation regardless of language, but only outputs the parenthetical transliteration if the string was changed:

while ($item = mysql_fetch_array($mysql_results)) {
    $transliterate = trim(strtolower($Ar->ar2en($item['item_title'])));
    $item_title = (strtolower($item['item_title']) == $transliterate) ? $item['item_title'] : $item['item_title'] . " <em>($transliterate)</em>";

    echo "<h3>$item_title</h3>";
}

1 Reply

This should do it:

preg_match("/\p{Arabic}/u", $item['item_title'])

You could make that regular expression a bit more sophisticated if you want to, but I don't think you really need to.
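
As one possible refinement, here is a minimal sketch that measures how much of a title is Arabic instead of stopping at the first Arabic character. The function name arabic_ratio and the idea of comparing against a threshold are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: fraction of the letters in $text that belong
// to the Arabic script (0.0 when the text contains no letters).
function arabic_ratio(string $text): float
{
    // \p{L} matches any letter; the lookahead keeps only Arabic-script ones.
    $letters = preg_match_all('/\p{L}/u', $text);
    $arabic  = preg_match_all('/(?=\p{Arabic})\p{L}/u', $text);

    // preg_match_all returns false on error, e.g. for invalid UTF-8.
    if ($letters === false || $arabic === false || $letters === 0) {
        return 0.0;
    }
    return $arabic / $letters;
}
```

A caller could then transliterate only when, say, arabic_ratio($item['item_title']) > 0.5, so that an English title quoting a single Arabic word is left alone. The 0.5 cutoff is arbitrary.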

The \p escape sequence lets you select characters based on their Unicode properties (when the u pattern modifier is used).

The PHP manual mentions: "Extended properties such as "Greek" or "InMusicalSymbols" are not supported by PCRE." But that's not entirely true anymore. PCRE release 6.5 added support for script names.
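
Putting this together with the loop from the question, a rough sketch might look like the following. The names has_arabic and render_title are hypothetical, and the transliterator is passed in as a callable so the example does not depend on the Arabic class; the HTML escaping is an addition that the question's code does not have.

```php
<?php
// Hypothetical wrapper around the script-name check.
function has_arabic(string $text): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error
    // (for example when $text is not valid UTF-8).
    return preg_match('/\p{Arabic}/u', $text) === 1;
}

// Hypothetical helper: builds the <h3> line for one title and calls
// $transliterate only when the title contains Arabic characters.
function render_title(string $title, callable $transliterate): string
{
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    if (!has_arabic($title)) {
        return "<h3>$safe</h3>";
    }
    $latin = htmlspecialchars($transliterate($title), ENT_QUOTES, 'UTF-8');
    return "<h3>$safe <em>($latin)</em></h3>";
}
```

Inside the question's loop this would be called as echo render_title($item['item_title'], [$Ar, 'ar2en']); assuming $Ar is the transliteration object created there.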

