In both ISO-8859-1 and ISO-8859-15, character number 146 (0x92) is the control character PU2
(Private Use Two) from the C1 range.
SGML refers to ISO 8859-1 (mind the space between ISO and 8859-1; it is not a hyphen as in the charset names you use). It allows only three control characters (here: SGML as used in HTML):
In the HTML document character set only three control characters are allowed: Horizontal
Tab, Carriage Return, and Line Feed (code positions 9, 13, and 10).
You therefore passed an illegal character. No SGML/HTML entity exists that you could replace it with.
I suggest you validate the input that comes into your application so that it does not contain control characters. If you believe those characters originally represented something useful, like a letter that can actually be read (i.e. not a control character), it's likely that the encoding gets broken at some point while you process the data.
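A minimal sketch of such a check, assuming the input is a single-byte ISO-8859-x string. The function name `findDisallowedControlBytes` is hypothetical and only for illustration. It reports the byte offsets of every control character other than Tab, Line Feed, and Carriage Return:

```php
<?php
/**
 * Return the byte offsets of control characters in a single-byte
 * (ISO-8859-x) string, ignoring Tab (9), Line Feed (10), and
 * Carriage Return (13). Hypothetical helper, for illustration only.
 */
function findDisallowedControlBytes($input)
{
    $offsets = array();
    // C0 controls except \x09, \x0A, \x0D, plus DEL and the C1 range \x80-\x9F.
    $pattern = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/';
    if (preg_match_all($pattern, $input, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as $match) {
            $offsets[] = $match[1]; // offset of the offending byte
        }
    }
    return $offsets;
}
```

For a string such as `"It\x92s\tfine\n"` this should return a single offset, 2, which points at the 0x92 byte. Whether you then reject the input or try to repair it depends on where the byte came from.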
From the information given in your question it's hard to say where, because you only specify the input encoding and the encoding of the database field. Those two already don't match, which should not produce the issue you're asking about but can produce other issues. Besides those two places, there is also the database client connection charset, the output encoding, and the response content encoding, all of which are unspecified in your question.
It might make sense that you change your overall encoding to UTF-8 to support a wider range of characters, but that's really a might.
Edit: The part above takes a rather strict view. It occurred to me that the input you receive may not actually be ISO-8859-1(5) but something else, such as a Windows code page. My best guess is Windows-1252 (cp1252; see Wikipedia). Where ISO-8859-1 has the C1 control range (128-159), Windows-1252 has several printable characters.
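To illustrate the difference, here is a small sketch that decodes the same byte under both assumptions. It assumes the iconv extension is available and that your iconv build knows the name `CP1252`:

```php
<?php
$byte = "\x92"; // character number 146

// Interpreted as ISO-8859-1, the byte is the C1 control U+0092.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

// Interpreted as Windows-1252, it is U+2019, the right single quotation mark.
$asCp1252 = iconv('CP1252', 'UTF-8', $byte);

echo bin2hex($asLatin1), "\n"; // c292   (UTF-8 for U+0092)
echo bin2hex($asCp1252), "\n"; // e28099 (UTF-8 for U+2019)
```

If the second interpretation gives you the apostrophe you expected to see, the input is most likely Windows-1252 and not ISO-8859-1.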
The Wikipedia page also notes that most browsers treat ISO-8859-1 as Windows-1252/CP1252/CP-1252. The PHP htmlentities() function cannot deal with these characters: its translation table for HTML entities does not cover these code points (PHP 5.3, not tested against 5.4). You need to create your own translation table and use it with strtr to replace the Windows-1252 characters that are not available in ISO 8859-15:
/*
* mappings of Windows-1252 (cp1252) 128 (0x80) - 159 (0x9F) characters:
* @link http://en.wikipedia.org/wiki/Windows-1252
* @link http://www.w3.org/TR/html4/sgml/entities.html
*/
$cp1252HTML401Entities = array(
    "\x80" => '&euro;',   # 128 -> euro sign, U+20AC NEW
    "\x82" => '&sbquo;',  # 130 -> single low-9 quotation mark, U+201A NEW
    "\x83" => '&fnof;',   # 131 -> latin small f with hook = function = florin, U+0192 ISOtech
    "\x84" => '&bdquo;',  # 132 -> double low-9 quotation mark, U+201E NEW
    "\x85" => '&hellip;', # 133 -> horizontal ellipsis = three dot leader, U+2026 ISOpub
    "\x86" => '&dagger;', # 134 -> dagger, U+2020 ISOpub
    "\x87" => '&Dagger;', # 135 -> double dagger, U+2021 ISOpub
    "\x88" => '&circ;',   # 136 -> modifier letter circumflex accent, U+02C6 ISOpub
    "\x89" => '&permil;', # 137 -> per mille sign, U+2030 ISOtech
    "\x8A" => '&Scaron;', # 138 -> latin capital letter S with caron, U+0160 ISOlat2
    "\x8B" => '&lsaquo;', # 139 -> single left-pointing angle quotation mark, U+2039 ISO proposed
    "\x8C" => '&OElig;',  # 140 -> latin capital ligature OE, U+0152 ISOlat2
    "\x8E" => '&#381;',   # 142 -> U+017D (no named entity in HTML 4.01)
    "\x91" => '&lsquo;',  # 145 -> left single quotation mark, U+2018 ISOnum
    "\x92" => '&rsquo;',  # 146 -> right single quotation mark, U+2019 ISOnum
    "\x93" => '&ldquo;',  # 147 -> left double quotation mark, U+201C ISOnum
    "\x94" => '&rdquo;',  # 148 -> right double quotation mark, U+201D ISOnum
    "\x95" => '&bull;',   # 149 -> bullet = black small circle, U+2022 ISOpub
    "\x96" => '&ndash;',  # 150 -> en dash, U+2013 ISOpub
    "\x97" => '&mdash;',  # 151 -> em dash, U+2014 ISOpub
    "\x98" => '&tilde;',  # 152 -> small tilde, U+02DC ISOdia
    "\x99" => '&trade;',  # 153 -> trade mark sign, U+2122 ISOnum
    "\x9A" => '&scaron;', # 154 -> latin small letter s with caron, U+0161 ISOlat2
    "\x9B" => '&rsaquo;', # 155 -> single right-pointing angle quotation mark, U+203A ISO proposed
    "\x9C" => '&oelig;',  # 156 -> latin small ligature oe, U+0153 ISOlat2
    "\x9E" => '&#382;',   # 158 -> U+017E (no named entity in HTML 4.01)
    "\x9F" => '&Yuml;',   # 159 -> latin capital letter Y with diaeresis, U+0178 ISOlat2
);
$outputWithEntities = strtr($output, $cp1252HTML401Entities);
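A short usage sketch follows. The two-entry table `$cp1252Excerpt` and the sample string are made up for illustration, and the use of htmlspecialchars() is my assumption about how the rest of your escaping might look. The point it shows is the order: escape the markup characters first, then map the Windows-1252 bytes, so that the `&` of the inserted entities is not escaped a second time.

```php
<?php
// Hypothetical two-entry excerpt of the full table, for brevity.
$cp1252Excerpt = array(
    "\x92" => '&rsquo;', // right single quotation mark
    "\x96" => '&ndash;', // en dash
);

// Made-up sample input containing Windows-1252 bytes and markup characters.
$output = "Tom\x92s <b> \x96 R&D";

// Step 1: escape &, <, >, and quotes. With a single-byte charset the
// bytes 0x80-0x9F pass through unchanged.
$escaped = htmlspecialchars($output, ENT_QUOTES, 'ISO-8859-1');

// Step 2: replace the Windows-1252 bytes with entities.
$outputWithEntities = strtr($escaped, $cp1252Excerpt);

echo $outputWithEntities, "\n"; // Tom&rsquo;s &lt;b&gt; &ndash; R&amp;D
```

Running strtr() before the escaping step would turn `&rsquo;` into `&amp;rsquo;`, which the browser would then display literally.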
If you want to be even safer, you can skip the named entities and use only numeric ones, which should work in very old browsers as well:
$cp1252HTMLNumericEntities = array(
    "\x80" => '&#8364;', # 128 -> euro sign, U+20AC NEW
    "\x82" => '&#8218;', # 130 -> single low-9 quotation mark, U+201A NEW
    "\x83" => '&#402;',  # 131 -> latin small f with hook = function = florin, U+0192 ISOtech
    "\x84" => '&#8222;', # 132 -> double low-9 quotation mark, U+201E NEW
    "\x85" => '&#8230;', # 133 -> horizontal ellipsis = three dot leader, U+2026 ISOpub
    "\x86" => '&#8224;', # 134 -> dagger, U+2020 ISOpub
    "\x87" => '&#8225;', # 135 -> double dagger, U+2021 ISOpub
    "\x88" => '&#710;',  # 136 -> modifier letter circumflex accent, U+02C6 ISOpub
    "\x89" => '&#8240;', # 137 -> per mille sign, U+2030 ISOtech
    "\x8A" => '&#352;',  # 138 -> latin capital letter S with caron, U+0160 ISOlat2
    "\x8B" => '&#8249;', # 139 -> single left-pointing angle quotation mark, U+2039 ISO proposed
    "\x8C" => '&#338;',  # 140 -> latin capital ligature OE, U+0152 ISOlat2
    "\x8E" => '&#381;',  # 142 -> U+017D
    "\x91" => '&#8216;', # 145 -> left single quotation mark, U+2018 ISOnum
    "\x92" => '&#8217;', # 146 -> right single quotation mark, U+2019 ISOnum
    "\x93" => '&#8220;', # 147 -> left double quotation mark, U+201C ISOnum
    "\x94" => '&#8221;', # 148 -> right double quotation mark, U+201D ISOnum
    "\x95" => '&#8226;', # 149 -> bullet = black small circle, U+2022 ISOpub
    "\x96" => '&#8211;', # 150 -> en dash, U+2013 ISOpub
    "\x97" => '&#8212;', # 151 -> em dash, U+2014 ISOpub
    "\x98" => '&#732;',  # 152 -> small tilde, U+02DC ISOdia
    "\x99" => '&#8482;', # 153 -> trade mark sign, U+2122 ISOnum
    "\x9A" => '&#353;',  # 154 -> latin small letter s with caron, U+0161 ISOlat2
    "\x9B" => '&#8250;', # 155 -> single right-pointing angle quotation mark, U+203A ISO proposed
    "\x9C" => '&#339;',  # 156 -> latin small ligature oe, U+0153 ISOlat2
    "\x9E" => '&#382;',  # 158 -> U+017E
    "\x9F" => '&#376;',  # 159 -> latin capital letter Y with diaeresis, U+0178 ISOlat2
);
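If you would rather not maintain the numeric table by hand, it can probably be generated. The following is only a sketch: the function name `buildCp1252NumericEntities` is hypothetical, and it assumes an iconv build that supports both `CP1252` and `UCS-4BE`. How unassigned bytes such as 0x81 are handled differs between iconv implementations, so the sketch skips anything that fails to convert or that maps into the control range.

```php
<?php
/**
 * Build a byte => numeric entity table for the Windows-1252 range
 * 0x80-0x9F. Hypothetical helper, for illustration only.
 */
function buildCp1252NumericEntities()
{
    $table = array();
    for ($byte = 0x80; $byte <= 0x9F; $byte++) {
        // @ hides the notice iconv raises for bytes it cannot convert.
        $ucs4 = @iconv('CP1252', 'UCS-4BE', chr($byte));
        if ($ucs4 === false || strlen($ucs4) !== 4) {
            continue; // unassigned in this iconv build
        }
        $parts = unpack('N', $ucs4); // 32-bit big-endian code point
        $codePoint = $parts[1];
        if ($codePoint < 0xA0) {
            continue; // some builds map unassigned bytes to C1 controls
        }
        $table[chr($byte)] = '&#' . $codePoint . ';';
    }
    return $table;
}
```

The result can be passed to strtr() in the same way as the hand-written table, for example `strtr($output, buildCp1252NumericEntities())`.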
Hope this is more helpful now. See also the Wikipedia page linked above for some characters that exist in both Windows-1252 and ISO 8859-15 but at different code points. You should probably consider using UTF-8 on your website.