Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
470 views
in Technique[技术] by (71.8m points)

utf 8 - Dynamically generating Ge'ez unicodes

enter image description here

Hi. If you look at the image above, you will see a set of very weird-looking characters displayed along with some Latin characters. The weird ones are Eritrean characters. They are the characters we use in my country. So, to go strait to the point, I am hoping to create even the simplest possible bit of software or maybe even a batch file (if possible) to help me make these characters applicable on the web and make PCs understand and display them when being typed. Just like Arabic, Hindu, Chinese... characters are used. I think, since the question of 'creating a language' is often rare or because I may not know the correct term to use, when I searched the internet to find any tutorial or even a freelancer or anything, all I got was... nothing. So, I am hoping, if anyone can give me a step-by-step guide, or even just a clue about how to create this, would be very helpful.

Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Your question asks "how to create a language", so I will describe all the pieces that need to be in place for a new language (or more accurately, writing system). You ask specifically about the Eritrean alphabet, so I will provide specific examples of how that is supported on modern systems, and try to provide you pointers for the pieces you are missing. The answer is long, and provides lots of links, to support the two explanations.

To work with a script like Ge'ez (also known as Ethiopic, the script used to write Amharic in Ethiopia and Tigrinya in Eritrea) you need a few things. The first is a way to encode the characters; a set of numbers representing each character, that the computer can use to represent the text. Luckily, Unicode has become widespread, and Unicode is designed to be a universal character set that includes all of the world's languages. Unicode 3.0 introduced Ethiopic in the range U+1200-U+137F, and later versions added supplements of more obscure characters in the ranges U+1380-U+1394, U+2D80-U+2DDF and U+AB00-U+AB2F. If you wanted to support a language that Unicode didn't yet support, you would either need to use the private use area and define your own mapping of characters to code points, or submit a proposal to have your script added to Unicode; for example, see the proposal for Ethiopic.

Now, Unicode is just a character set; an abstract mapping between characters and numbers. To actually transmit these characters as a sequence of bytes, you use a character encoding. There are many encodings; some of them, like ASCII and ISO-8859-1 only cover a subset of the full Unicode character set, while others, like UTF-8 and UTF-16, cover the full range. For documents on the web, UTF-8 is the recommended character encoding; you should never use anything else if you can help it. In UTF-8, you can write Ge'ez directly in the document, for example: ????. One thing to watch out for is that some programs (especially on Windows) will offer you "Unicode" as an encoding, when they mean UTF-16; you want to make sure to choose UTF-8, as it's more efficient and more compatible with a wider variety of software.

If you are using encodings that don't cover the full range of Unicode, or you don't have a good way to type those characters, and you are writing HTML or XML, you can use numeric character references instead. To do this, you write the Unicode code point of the character you want to refer between &# and ;. You can write the number in decimal, or in hexadecimal prefixed with an x. For example, ? can be written ሀ or ሀ (the semicolon at the end is important; it wasn't working for you in the comments because you were missing it).

Now that you have a character set, and a way of encoding it, you need a way to display it. Some scripts are easier to display in others. For all scripts, you need a font; a file defining how each character looks. A font contains a collection of glyphs, or drawings of each character. Some scripts, like the Latin alphabet (the alphabet used for English and most European languages) are relatively simple; each character is a separate glyph, and how they are drawn doesn't depend on what characters come before or after (though diacritics and ligatures can make it a little more complicated). Others, like Arabic and Indic scripts are written in cursive, where letters join to each other so how they are drawn can depend on the characters near them. These languages require special rendering support like Uniscribe or DirectWrite on Windows, Pango on Linux, or advanced font technology like Apple Advanced Typography or Graphite.

Luckily, Ge'ez is a fairly simple writing system, that doesn't require any specialized rending support or advanced font systems. Each of the characters is a separate glyph, and it doesn't require any reordering. So a normal OpenType font, displayed with the rendering systems already available on most computers, will do the job. But you still need the font in order to be able to display the characters. To create you own font, you can use FontForge (a free/open source tool), Fontographer, FontLab Studio, or other similar software.

For Ethiopic, you don't need to create your own. There are numerous fonts available that include the Ethiopic characters, but one that I would recommend is Abyssinica SIL from SIL (the Summer Institute of Linguistics), which does a lot of great work for minority languages and writing systems. Their fonts are available under a free license, that allows you to use the font, redistribute the font, and modify the font, so their fonts are quite flexible and can be used in a wide variety of situations. Windows ships with Nyala, which includes Ethiopic characters, since Windows Vista, and Ebrima, which added support for Ethiopic characters in Windows 8; so people on Windows Vista or later should be able to view Ethiopic characters already. Mac OS X ships with Kefa as of 10.6.

Once you have the font, you will be able to view Ethiopic characters. But other people reading your documents might not have those fonts (if they are using an older version of Windows or Mac OS X, if they didn't install all of the fonts that came with Windows, or the like), in which case the characters will probably show up as boxes or question marks on their machine. You could give those people a redistributable font like Abyssinica SIL, or they could buy a font that includes Ethiopic characters, but that can be inconvenient. For working with word processor documents or plain text, that's probably the best you can do; they will need the font installed on their computer to be able to display the text. If you create a PDF on your computer, it should embed the fonts that it needs to display the text, so creating a PDF can be a convenient way to include uncommon fonts with your document.

On a web page, you can use web fonts to link to a font from your stylesheet, allowing the users web browser to load that font for that web page. Web fonts are supported all the way back to IE 6, and in recent versions of most other web browsers, so they are actually quite widely supported. Different web browsers support different font file formats (EOT, TTF, OpenType, SVG, and WOFF), and slightly different syntaxes for the CSS (older versions of IE are based on an older draft), so it can be a bit tricky to make a page that is compatible with all browsers. Luckily, people have automated that process. Some web fonts are available online from Google Web Fonts or FontSquirrel, but sadly, I couldn't find any Ethiopic fonts already hosted. However, you can upload a font to FontSquirrel, and it will convert it into all of the major formats, and provide example CSS that will work on all modern browsers. Note that you should only do this with fonts that allow web embedding; not all fonts do. Since Abyssinica SIL is available under the Open Font License, you can use it, and I've run it through FontSquirrel for you; you can see how it works (check out the Glyphs & Languages tab), or download the kit. To use it, just put the font files (.ttf, .eot, .svg, and .woff) on your server in the same directory as your CSS, and include the following in your CSS:

@font-face {
    font-family: 'abyssinica_silregular';
    src: url('abyssinicasil-r.eot');
    src: url('abyssinicasil-r.eot?#iefix') format('embedded

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...