Just for the record - my first question here but hopefully not my last input in the community.
But that's not why I'm here.
I'm currently developing a simple system that has to generate an image with text on it. Everything went well until I realised that GD cannot handle UTF-8 characters like
ā, ?, ?, ?, ?, é
and so on.
To be clear, I'm using imagettftext().
Trying to solve this, I searched Google and found several suggested solutions, but sadly none of them solved my problem completely.
Currently I'm using this function, which I found in the thread "PHP function imagettftext() and unicode":
private function properText($text) {
    // Convert UTF-8 string to HTML entities
    $text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");
    // Convert HTML entities into ISO-8859-1
    $text = html_entity_decode($text, ENT_NOQUOTES, "ISO-8859-1");
    // Convert bytes > 127 into decimal numeric entities (&#NNN;)
    $out = "";
    for ($i = 0; $i < strlen($text); $i++) {
        $letter = $text[$i];
        $num = ord($letter);
        if ($num > 127) {
            $out .= "&#$num;";
        } else {
            $out .= $letter;
        }
    }
    return $out;
}
It works fine for some characters but not all of them; for example, a with umlaut isn't converted correctly.
At this point I'm no longer sure where to look or what to look for, as I cannot predict the user input. To be more precise, the system pulls artist names from an XML feed and uses that data for the image generation (I'm not planning to support hieroglyphs).
I've made sure that the data gathered from the feed is indeed UTF-8 by using PHP's mb_detect_encoding(), and I've made sure that all the characters that currently aren't displayed correctly are indeed in the font file I'm feeding to the imagettftext() function by checking it with the Windows charmap tool.
Hopefully I can find my answer here and thank you for your help in advance!
edit
To clarify: the characters are not displayed correctly or, to be more precise, are replaced by malformed characters. In the screenshot, the text should read "José González".
edit No2
Using the bin2hex() function on the data retrieved from the XML feed returns this:
José González -> 4a6f73c3a920476f6e7ac3a16c657a
// input -> bin2hex(input)
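As a sanity check on that dump, the hex string can be decoded back to bytes and validated. In it, c3a9 and c3a1 are the two-byte UTF-8 sequences for é and á. This is only a minimal sketch, assuming the mbstring extension is loaded; the variable names are illustrative.

```php
<?php
// Hex dump copied from the bin2hex() output above.
$hex = '4a6f73c3a920476f6e7ac3a16c657a';
$bytes = hex2bin($hex);

// true when the byte sequence is well-formed UTF-8.
$isUtf8 = mb_check_encoding($bytes, 'UTF-8');

// Byte count versus character count: the two differ because
// é and á each occupy two bytes in UTF-8.
$byteLength = strlen($bytes);
$charLength = mb_strlen($bytes, 'UTF-8');

var_dump($isUtf8, $byteLength, $charLength);
```

If the check passes, the feed data itself is fine and the problem lies in how the string is handed to GD.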
edit - fixed
As I continued my research I found an answer to my problem. This piece of code did it:
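// Convert every non-ASCII character in the UTF-8 string to an HTML entity
$text = mb_convert_encoding($text, "HTML-ENTITIES", "UTF-8");
// Note: htmlentities('${1}') is just '${1}', so this puts back what it matched;
// the conversion above does the actual work
$text = preg_replace('~^(&([a-zA-Z0-9]);)~', htmlentities('${1}'), $text);
return $text;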
Now all the characters that troubled me are displayed correctly!
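For comparison, here is a minimal sketch of the same idea using mb_encode_numericentity(), which turns every non-ASCII character into a decimal numeric character reference, a form the PHP manual lists as accepted by imagettftext(). This is not the code from the fix above: the helper name toNumericEntities() is hypothetical, and it assumes UTF-8 input and the mbstring extension. It may be worth a look on newer PHP versions, since PHP 8.2 deprecated the "HTML-ENTITIES" encoding name in mbstring.

```php
<?php
// Hypothetical helper: encode every character above ASCII as a decimal
// numeric entity (for example, é becomes &#233;), leaving ASCII untouched.
function toNumericEntities(string $text): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

// "José González" written as explicit UTF-8 bytes, so the example does
// not depend on the encoding of this source file.
$encoded = toNumericEntities("Jos\xC3\xA9 Gonz\xC3\xA1lez");

// $encoded could then be passed as the text argument of imagettftext().
echo $encoded, PHP_EOL;
```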