I am using DOMDocument to modify HTML before it is output to the page. The input is only an HTML fragment, not a complete page. My initial problem was that all French characters got garbled, which I was able to correct after some trial and error. Now only one problem seems to remain: the ’ character (a typographic apostrophe) gets transformed into ?.
The code :
<?php
$dom = new DOMDocument('1.0','utf-8');
$dom->loadHTML(utf8_decode($row->text));
// Some pretty basic modifications here, not even related to the text.
// Reinsert the HTML, making sure to remove the DOCTYPE, <html> and <body> that get added automatically.
$html = str_replace(array('<html>', '</html>', '<body>', '</body>'), '', $dom->saveHTML());
$row->text = utf8_encode(preg_replace('/^<!DOCTYPE.+?>/', '', $html));
?>
I know it's getting messy with the utf8 decode/encode, but this is the only way I could make it work so far. Here is a sample string :
Input :
Sans doute parce qu’il vient d’atteindre une date déterminante dans son spectaculaire cheminement
Output :
Sans doute parce qu?il vient d?atteindre une date déterminante dans son spectaculaire cheminement
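The substitution can probably be reproduced without DOMDocument at all. The sketch below assumes the ? is introduced by the UTF-8 to ISO-8859-1 step, since ISO-8859-1 has no code point for the typographic apostrophe (U+2019) while é does exist there. It uses mb_convert_encoding() as a stand-in for utf8_decode() / utf8_encode(), so it assumes the mbstring extension is loaded with its default substitute character; the variable names are only illustrative.

```php
<?php
// Shortened sample input, written with escapes so the file encoding does not matter.
// \u{2019} is the typographic apostrophe, \u{E9} is "é".
$input = "Sans doute parce qu\u{2019}il vient d\u{2019}atteindre une date d\u{E9}terminante";

// Stand-in for utf8_decode(): UTF-8 -> ISO-8859-1.
// Characters with no ISO-8859-1 equivalent are replaced by the substitute character.
$latin1 = mb_convert_encoding($input, 'ISO-8859-1', 'UTF-8');

// Stand-in for utf8_encode(): ISO-8859-1 -> UTF-8.
// The apostrophe is already lost at this point; only "é" survives the round trip.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo $roundTrip, PHP_EOL;
```

If this assumption holds, the round trip through ISO-8859-1 is lossy for any character outside that charset, not just for the apostrophe.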
If I find any more details, I'll add them. Thank you for your time and support!