Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
434 views
in Technique[技术] by (71.8m points)

xhtml - When should one use HTML entities?

This has been confusing me for some time. With the advent of UTF-8 as the de-facto standard in web development I'm not sure in which situations I'm supposed to use the HTML entities and for which ones should I just use the UTF-8 character. For example,

  • em dash (–, &emdash;)
  • ampersand (&, &)
  • 3/4 fraction (?, ¾)

Please do shed light on this issue. It will be appreciated.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Based on the comments I have received, I looked into this a little further. It seems that currently the best practice is to forgo using HTML entities and use the actual UTF-8 character instead. The reasons listed are as follows:

  1. UTF-8 encodings are easier to read and edit for those who understand what the character means and know how to type it.
  2. UTF-8 encodings are just as unintelligible as HTML entity encodings for those who don't understand them, but they have the advantage of rendering as special characters rather than hard to understand decimal or hex encodings.

As long as your page's encoding is properly set to UTF-8, you should use the actual character instead of an HTML entity. I read several documents about this topic, but the most helpful were:

From the UTF-8: The Secret of Character Encoding article:

Wikipedia is a great case study for an application that originally used ISO-8859-1 but switched to UTF-8 when it became far too cumbersome to support foreign languages. Bots will now actually go through articles and convert character entities to their corresponding real characters for the sake of user-friendliness and searchability.

That article also gives a nice example involving Chinese encoding. Here is the abbreviated example for the sake of laziness:

UTF-8:

這兩個字是甚麼意思

HTML Entities:

這兩個字是甚麼意思

The UTF-8 and HTML entity encodings are both meaningless to me, but at least the UTF-8 encoding is recognizable as a foreign language, and it will render properly in an edit box. The article goes on to say the following about the HTML entity-encoded version:

Extremely inconvenient for those of us who actually know what character entities are, totally unintelligible to poor users who don't! Even the slightly more user-friendly, "intelligible" character entities like θ will leave users who are uninterested in learning HTML scratching their heads. On the other hand, if they see θ in an edit box, they'll know that it's a special character, and treat it accordingly, even if they don't know how to write that character themselves.

As others have noted, you still have to use HTML entities for reserved XML characters (ampersand, less-than, greater-than).


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...