Provided that all the characters you need to support lie in the Basic Multilingual Plane (it's unlikely that you'll need anything beyond it), a simple UTF-16 encoding should suffice.
From Wikipedia:
"All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point's current or future character assignment or use."
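To make the BMP distinction concrete, here is a small sketch (the names Utf16Demo and CodeUnitCount are made up for illustration) showing that a BMP character occupies a single UTF-16 code unit, while a character above U+FFFF is stored as a surrogate pair of two code units. It only relies on char.ConvertFromUtf32 and Encoding.Unicode from the .NET base class library.

```csharp
using System;
using System.Text;

public static class Utf16Demo
{
    // Number of UTF-16 code units (C# chars) needed for one code point:
    // 1 inside the Basic Multilingual Plane, 2 (a surrogate pair) above U+FFFF.
    public static int CodeUnitCount(int codePoint) => char.ConvertFromUtf32(codePoint).Length;

    public static void Main()
    {
        // ë (U+00EB) and 中 (U+4E2D) are in the BMP; 😀 (U+1F600) is not.
        foreach (int cp in new[] { 0x00EB, 0x4E2D, 0x1F600 })
        {
            string s = char.ConvertFromUtf32(cp);
            Console.WriteLine($"U+{cp:X4}: {CodeUnitCount(cp)} code unit(s), {Encoding.Unicode.GetBytes(s).Length} bytes");
        }
        // Prints:
        // U+00EB: 1 code unit(s), 2 bytes
        // U+4E2D: 1 code unit(s), 2 bytes
        // U+1F600: 2 code unit(s), 4 bytes
    }
}
```

Note that char.ConvertFromUtf32 throws for the surrogate range U+D800–U+DFFF, which matches the quote: those values are not characters.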
The following sample program illustrates doing something along the lines of what you want:
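using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // ë (U+00EB), decoded from its little-endian UTF-16 bytes
        char[] ca = Encoding.Unicode.GetChars(new byte[] { 0xeb, 0x00 });
        using (var sw = new StreamWriter(@"c:/helloworld.rtf"))
        {
            // Writes "Hëllo, World!" in 30pt (\fs60 = 60 half-points) Times New Roman
            sw.WriteLine(@"{\rtf1
{\fonttbl {\f0 Times New Roman;}}
\f0\fs60 H" + GetRtfUnicodeEscapedString(new String(ca)) + @"llo, World!
}");
        }
    }

    static string GetRtfUnicodeEscapedString(string s)
    {
        var sb = new StringBuilder();
        foreach (var c in s)
        {
            if (c == '\\' || c == '{' || c == '}')
                sb.Append('\\').Append(c);   // RTF special characters need a leading backslash
            else if (c <= 0x7f)
                sb.Append(c);                // plain 7-bit ASCII is written as-is
            else
                // \uN takes a signed 16-bit decimal; '?' is the fallback shown by readers without Unicode support
                sb.Append(@"\u" + unchecked((short)Convert.ToUInt32(c)) + "?");
        }
        return sb.ToString();
    }
}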
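The important bit is Convert.ToUInt32(c), which returns the numeric value of the UTF-16 code unit; for a BMP character that is simply its code point (a character outside the BMP is stored as two surrogate code units, and each one gets its own escape). The RTF \uN escape requires this value in decimal, and because RTF control-word arguments are signed 16-bit numbers, values above 32767 must be written as negative numbers. The System.Text.Encoding.Unicode encoding corresponds to UTF-16 (little-endian), as per the MSDN documentation.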
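A minimal sketch of the signed 16-bit rule, assuming a hypothetical helper RtfUnicodeArgument (not part of any library): for each UTF-16 code unit it prints the unsigned value next to the argument RTF expects after \u. The non-BMP character is iterated as its two surrogate code units, each escaped separately.

```csharp
using System;

public static class RtfUnicodeValueDemo
{
    // Hypothetical helper: the decimal argument RTF expects after \u for one
    // UTF-16 code unit. RTF control-word arguments are signed 16-bit, so code
    // units above 32767 wrap around to negative values.
    public static short RtfUnicodeArgument(char c) => unchecked((short)c);

    public static void Main()
    {
        // ë (U+00EB), 가 (U+AC00), and 😀 (U+1F600, a surrogate pair in UTF-16)
        foreach (char c in "\u00EB\uAC00\U0001F600")
        {
            Console.WriteLine($"U+{(int)c:X4}  unsigned={Convert.ToUInt32(c)}  rtf=\\u{RtfUnicodeArgument(c)}?");
        }
        // Prints:
        // U+00EB  unsigned=235  rtf=\u235?
        // U+AC00  unsigned=44032  rtf=\u-21504?
        // U+D83D  unsigned=55357  rtf=\u-10179?
        // U+DE00  unsigned=56832  rtf=\u-8704?
    }
}
```

Casting the char straight to short keeps the bit pattern and reinterprets it as signed, which is exactly the wrap-around RTF asks for.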