First things first:
- Read the file into a buffer.
- Use libiconv or a similar library to convert the UTF-8 bytes to wchar_t, then use the wide-character handling functions such as wprintf().
- Use the wide-character functions in C. Most file and output handling functions have a wide-character variant (for example, fgetwc() for fgetc() and wprintf() for printf()).
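As a rough illustration of the conversion step, here is a minimal sketch using the iconv() interface. The helper name utf8_to_wide is made up for this example, and the "WCHAR_T" encoding name is something GNU libiconv and glibc accept; other iconv implementations may not recognize it, so treat that as an assumption.

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: convert a UTF-8 byte buffer into wide characters.
   Returns the number of wchar_t values written, or (size_t)-1 on failure. */
static size_t utf8_to_wide(const char *src, size_t src_len,
                           wchar_t *dst, size_t dst_cap)
{
    /* Assumption: the iconv implementation knows the name "WCHAR_T"
       (GNU libiconv and glibc do). */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;               /* iconv() takes non-const pointers */
    char *out = (char *)dst;
    size_t in_left = src_len;
    size_t out_left = dst_cap * sizeof(wchar_t);

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;                /* bad input or dst too small */

    /* out_left counts unused bytes, so convert back to wchar_t units. */
    return dst_cap - out_left / sizeof(wchar_t);
}
```

Once the buffer is converted, the result can be passed to the wide-character functions. On systems where iconv lives in a separate library you may need to link with -liconv.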
Ensure that your terminal can handle UTF-8 output. Having the correct locale set up, and adjusting the locale settings from your program, can automate a lot of the file opening and conversion for you, depending on what you are doing.
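To make the locale part a bit more concrete, here is a small sketch that selects a locale and checks whether its encoding is UTF-8. The helper name select_locale_is_utf8 is invented for this example, nl_langinfo(CODESET) is POSIX rather than ISO C, and the exact codeset string ("UTF-8") is what glibc reports; other systems may spell it differently.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: select the named locale ("" means "take it from the
   environment") and report whether its character encoding is UTF-8.
   Returns 1 for UTF-8, 0 otherwise, -1 if the locale could not be set. */
static int select_locale_is_utf8(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return -1;

    /* Assumption: the platform reports the codeset as exactly "UTF-8". */
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

Calling select_locale_is_utf8("") at the start of main(), before any wide-character I/O, is the usual place for this kind of check. Without a setlocale() call the program stays in the default "C" locale.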
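Remember that the width of a code point in UTF-8 is variable (one to four bytes). This means you can't just seek to an arbitrary byte and begin reading as you can with ASCII, because you might land in the middle of a code point. UTF-8 is self-synchronizing, though: continuation bytes always have the bit pattern 10xxxxxx, so a good library can recover the next code point boundary in some cases.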
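A minimal sketch of that boundary handling, in case you want to do it by hand. The helper names are made up for this example, and it does no validation beyond looking at the bit patterns (no checks for overlong or truncated sequences).

```c
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
static int utf8_is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Hypothetical helper: from an arbitrary offset, move forward to the first
   byte that starts a code point (or to len if there is none). */
static size_t utf8_next_boundary(const unsigned char *buf, size_t len,
                                 size_t pos)
{
    while (pos < len && utf8_is_continuation(buf[pos]))
        pos++;
    return pos;
}

/* Hypothetical helper: number of bytes in the sequence introduced by a lead
   byte, or 0 if the byte cannot start a sequence. */
static size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* continuation or invalid byte */
}
```

After seeking to some byte offset, call utf8_next_boundary() first and only then start decoding; utf8_sequence_length() tells you how many bytes the next code point occupies.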
Here is some code (not mine) that demonstrates UTF-8 file reading and wide-character handling in C.
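
    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        /* "ccs=UTF-8" is a non-standard fopen() mode extension (documented
           for the Microsoft C runtime); elsewhere, select a UTF-8 locale
           with setlocale() before reading wide characters. */
        FILE *f = fopen("data.txt", "r, ccs=UTF-8");
        if (!f)
            return 1;

        /* Read one wide character at a time and print its code point in hex. */
        for (wint_t c; (c = fgetwc(f)) != WEOF;)
            printf("%04X\n", (unsigned)c);

        fclose(f);
        return 0;
    }
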
Links
- libiconv
- Locale data in C/GNU libc
- Some handy info
- Another good Unicode/UTF-8 in C resource