The problem here is that SimpleXML doesn't look at the HTTP header to determine the character encoding used in the document and simply assumes it's UTF-8 even though Google's server does advertise it as
Content-Type: text/xml; charset=GB2312
You can write a function that will take a look at that header using the super-secret magic variable $http_response_header
and transform the response accordingly. Something like that:
function sxe($url)
{
$xml = file_get_contents($url);
foreach ($http_response_header as $header)
{
if (preg_match('#^Content-Type: text/xml; charset=(.*)#i', $header, $m))
{
switch (strtolower($m[1]))
{
case 'utf-8':
// do nothing
break;
case 'iso-8859-1':
$xml = utf8_encode($xml);
break;
default:
$xml = iconv($m[1], 'utf-8', $xml);
}
break;
}
}
return simplexml_load_string($xml);
}
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…