I've been using HTMLPurifier for sanitizing the output of a rich text editor, and ended up with:
```php
<?php
include_once 'htmlpurifier/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
if (defined('PURIFIER_CACHE')) {
    $config->set('Cache.SerializerPath', PURIFIER_CACHE);
} else {
    # Disable the cache entirely
    $config->set('Cache.DefinitionImpl', null);
}

# $input holds the raw HTML submitted by the rich text editor.
# Help out the Purifier a bit, until it develops this functionality:
# strip <em>/<strong> pairs that contain only whitespace, repeating
# until nested empty pairs are gone too.
while (($cleaner = preg_replace('!<(em|strong)>(\s*)</\1>!', '$2', $input)) !== $input) {
    $input = $cleaner;
}

$filter = new HTMLPurifier($config);
$output = $filter->purify($input);
```
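The pre-cleaning loop is plain PCRE, so it can be tried without the library. Below is a sketch that wraps it in a hypothetical `strip_empty_inline()` function (the name is illustrative, not part of HTMLPurifier). It shows why the replacement runs in a loop: removing an inner empty pair can leave the outer pair empty, and the `\1` backreference ensures only matching open/close tags are removed.

```php
<?php
// Hypothetical helper wrapping the pre-cleaning loop from the snippet above.
// Removes <em>/<strong> pairs that contain only whitespace, keeping that whitespace.
function strip_empty_inline(string $input): string
{
    // \1 requires the closing tag to match the opening one (em/em or strong/strong).
    $pattern = '!<(em|strong)>(\s*)</\1>!';
    // Repeat until nothing changes: e.g. <em><strong></strong></em>
    // first becomes <em></em>, which the next pass removes.
    while (($cleaner = preg_replace($pattern, '$2', $input)) !== $input) {
        if ($cleaner === null) {
            break; // preg_replace() signals an error with null; keep the last good value
        }
        $input = $cleaner;
    }
    return $input;
}

echo strip_empty_inline('<p>Hi <em><strong> </strong></em>there</p>'), "\n";
```

Pairs with real content, such as `<strong>kept</strong>`, pass through unchanged.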
The main points of interest:
- Include the autoloader.
- Create an instance of `HTMLPurifier_Config` as `$config`.
- Set configuration settings as needed with `$config->set()`.
- Create an instance of `HTMLPurifier`, passing `$config` to it.
- Use `$filter->purify()` on your input.
However, HTMLPurifier is entirely overkill when the output doesn't need to allow any HTML at all.
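When no markup should survive, PHP's built-in `htmlspecialchars()` is usually enough: it escapes every character with special meaning in HTML, so the input is displayed as literal text. A minimal sketch, assuming UTF-8 input like the config above; `plain_text_output()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: render untrusted input as plain text inside an HTML page.
function plain_text_output(string $input): string
{
    // ENT_QUOTES escapes both " and ', so the result is also safe in quoted attributes.
    // ENT_SUBSTITUTE replaces invalid UTF-8 with U+FFFD instead of returning ''.
    return htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo plain_text_output('<script>alert("x")</script> & <em>hi</em>'), "\n";
```

Unlike the purifier, this shows any tags the user typed as visible text rather than removing them.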