In my C# code, I am extracting text from a PDF document. The result is a string in UTF-8 or Unicode encoding (I'm not sure which). When I use Encoding.UTF8.GetBytes(src);
to convert it into a byte array, I notice that the whitespace is actually encoded as two bytes with values 194 and 160.
For example, the string "CLE action" looks like
[67, 76, 69, 194, 160, 65, 99, 116, 105, 111, 110]
as a byte array, where the whitespace is the pair 194, 160. Because of this, src.IndexOf("CLE action");
returns -1 when I need it to return 1.
How can I fix the encoding of the string?
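For reference, the byte pair 194, 160 (0xC2 0xA0) is the UTF-8 encoding of U+00A0 NO-BREAK SPACE, which is a different character from the ordinary space U+0020. Below is a minimal sketch that reproduces the byte dump and shows one possible workaround. The sample string is an assumption standing in for the PDF text (it uses "CLE Action" with a capital A, since byte 65 in the dump is 'A'), and NormalizeSpaces is a hypothetical helper name, not part of any library.

```csharp
using System;
using System.Text;

public static class Program
{
    // Hypothetical helper: replaces every no-break space (U+00A0)
    // with an ordinary space (U+0020).
    public static string NormalizeSpaces(string text)
    {
        return text.Replace('\u00A0', ' ');
    }

    public static void Main()
    {
        // Assumed stand-in for the text extracted from the PDF:
        // "CLE" + U+00A0 + "Action", matching the byte dump in the question.
        string src = "CLE\u00A0Action";

        byte[] bytes = Encoding.UTF8.GetBytes(src);
        Console.WriteLine(string.Join(", ", bytes));
        // 67, 76, 69, 194, 160, 65, 99, 116, 105, 111, 110

        // An ordinal search using an ordinary space finds nothing.
        Console.WriteLine(src.IndexOf("CLE Action", StringComparison.Ordinal)); // -1

        // After replacing U+00A0 with U+0020 the search succeeds.
        string normalized = NormalizeSpaces(src);
        Console.WriteLine(normalized.IndexOf("CLE Action", StringComparison.Ordinal)); // 0
    }
}
```

In this sketch the match is at index 0 because the sample string starts with "CLE"; in the real text the index depends on what precedes it. Searching the original string for "CLE\u00A0Action" directly would also work if the no-break space should be preserved.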