Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C#: WebClient - Can't recognize Cyrillic characters

I am trying to parse a site: link

Code to download content:

WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.ASCII; // OR UTF8
string reply = client.DownloadString(url);

Response:

<!DOCTYPE HTML>
<html prefix="og: http://ogp.me/ns#">
<head><meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
    <link rel="icon" type="image/vnd.microsoft.icon" href="https://spravnik.com/favicon.ico"/>
    <link rel="SHORTCUT ICON" href="https://spravnik.com/favicon.ico"/>
    <link href="/src/main.css?v=1.25" rel="stylesheet" type="text/css" />
    <script src="https://cdn.contentsitesrv.com/js/push/subscribe.js?v=1.3.0"></script>
<title>??????????? 12 ??????? ??. - ?????????? ?????????? ??????</title>
<meta name="keywords" content="?????????? ?????????? ????????????, ???? 09 ????????????, ?????????? ????? ????????????"/>
<meta name="description" content="? ??????????? &#9742; ????????? ?? ??????? ????? ??? ???? ???????????? ?? 12 ??????? ??. ????? ???????? ?? ?????? ????????, ?????? ???????? ????? ???????? ? ????? ?? ?????? ????????."/>
<meta property="og:title" content="?????????? ??????????. ??????????? ? ?? ??????...!"/>

All Cyrillic characters are converted to "???" or to other unreadable characters.



1 Reply


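It looks like this site returns its data in windows-1251 regardless of the Encoding you set on the client (the page's own meta tag declares charset=windows-1251), so decoding the bytes as ASCII or UTF-8 turns the Cyrillic text into "?" characters. I prefer using RestClient and checking the response's ContentType. But if you are absolutely sure this site always uses windows-1251, the code below works correctly.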

using System.Net;
using System.Text;

// Download the raw bytes so that no text decoding happens yet.
WebClient client = new WebClient();
byte[] reply = client.DownloadData(url);

// On .NET Core / .NET 5+, code-page encodings such as windows-1251 must be registered first.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding encoding1251 = Encoding.GetEncoding("windows-1251");

// Decode the bytes directly with the encoding the site actually uses.
string result = encoding1251.GetString(reply);
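
If you would rather not hard-code windows-1251, the "check the response's ContentType" idea can be sketched roughly as below. This is only an illustration under assumptions: the class name CharsetDecoder and the method Decode are hypothetical helpers, not part of WebClient or RestClient, and the sketch assumes the server actually sends a charset in its Content-Type header (if it does not, the fallback encoding is used).

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

public static class CharsetDecoder
{
    // Hypothetical helper (not from the original answer): decodes a response body
    // using the charset named in the Content-Type header value, or a fallback
    // encoding when the header is missing or names an unknown charset.
    public static string Decode(byte[] body, string contentType, Encoding fallback)
    {
        // Needed on .NET Core / .NET 5+ so that code pages like windows-1251 resolve.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding encoding = fallback;
        if (MediaTypeHeaderValue.TryParse(contentType, out var parsed)
            && !string.IsNullOrEmpty(parsed.CharSet))
        {
            try
            {
                // Some servers quote the charset value, so strip quotes first.
                encoding = Encoding.GetEncoding(parsed.CharSet.Trim('"'));
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the fallback encoding.
            }
        }
        return encoding.GetString(body);
    }
}
```

With WebClient, the header value should be available as client.ResponseHeaders["Content-Type"] once DownloadData has returned, so a call could look like CharsetDecoder.Decode(reply, client.ResponseHeaders["Content-Type"], Encoding.UTF8).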
