Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How to decode HTML Entities in C?

I'm interested in unescaping text in C; for example, &#92; should map to \. Does anyone know of a good library?

For reference, see the Wikipedia list of XML and HTML character entity references.


He who fights dragons for too long becomes a dragon himself; gaze into the abyss for too long, and the abyss gazes back…

1 Reply

I had some free time today and wrote a decoder from scratch: entities.c, entities.h.

The only function with external linkage is

size_t decode_html_entities_utf8(char *dest, const char *src);

If src is a null pointer, the string is taken from dest, i.e. the entities are decoded in place. Otherwise, the decoded string is written to dest, which should point to a buffer big enough to hold strlen(src) + 1 characters, and src is left unchanged.

The function will return the length of the decoded string.
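
To make the calling convention concrete, here is a minimal sketch. It is not the entities.c from this answer: sketch_decode_entities and its helpers are hypothetical names, and the sketch assumes only numeric references (&#92;, &#x5C;) and the five XML named entities need handling. It follows the same dest/src contract described above; anything it does not recognize is copied through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write code point cp as UTF-8 into out. Returns the number of bytes
 * written, or 0 (writing nothing) for NUL, surrogates and out-of-range values. */
static size_t put_utf8(char *out, unsigned long cp)
{
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Parse "&#123;" or "&#x7B;" starting at s (which points at '&').
 * Returns the length of the reference including ';', or 0 if malformed. */
static size_t parse_numeric(const char *s, unsigned long *cp)
{
    const char *p = s + 2;              /* skip "&#" */
    unsigned long base = 10, v = 0;
    const char *digits;

    if (*p == 'x' || *p == 'X') {
        base = 16;
        p++;
    }
    digits = p;
    for (;; p++) {
        unsigned long d;
        if (*p >= '0' && *p <= '9') d = (unsigned long)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f') d = (unsigned long)(*p - 'a') + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F') d = (unsigned long)(*p - 'A') + 10;
        else break;
        v = v * base + d;
        if (v > 0x10FFFF)
            return 0;                   /* beyond Unicode; also prevents overflow */
    }
    if (p == digits || *p != ';')
        return 0;
    *cp = v;
    return (size_t)(p - s) + 1;
}

/* Hypothetical stand-in with the same contract as the function above:
 * src == NULL decodes dest in place; returns the decoded length. */
size_t sketch_decode_entities(char *dest, const char *src)
{
    static const struct { const char *name; char ch; } named[] = {
        { "&amp;", '&' }, { "&lt;", '<' }, { "&gt;", '>' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };
    char *out = dest;

    assert(dest != NULL);
    if (src == NULL)
        src = dest;                     /* in-place mode */

    while (*src != '\0') {
        size_t used = 0;                /* input bytes consumed by an entity */
        if (src[0] == '&' && src[1] == '#') {
            unsigned long cp;
            size_t len = parse_numeric(src, &cp);
            size_t made = len ? put_utf8(out, cp) : 0;
            if (made > 0) {
                out += made;
                used = len;
            }
        } else if (src[0] == '&') {
            size_t i;
            for (i = 0; i < sizeof named / sizeof named[0]; i++) {
                size_t len = strlen(named[i].name);
                if (strncmp(src, named[i].name, len) == 0) {
                    *out++ = named[i].ch;
                    used = len;
                    break;
                }
            }
        }
        if (used > 0)
            src += used;
        else
            *out++ = *src++;            /* ordinary byte or unrecognized entity */
    }
    *out = '\0';
    return (size_t)(out - dest);
}
```

Calling sketch_decode_entities(buf, text) decodes text into buf, while sketch_decode_entities(buf, NULL) rewrites buf in place. In this sketch, in-place decoding is safe because every decoded character is shorter than the reference it replaces, so the write position never overtakes the read position. The same property is why a buffer of strlen(src) + 1 bytes is enough here.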

Please note that I haven't done any extensive testing, so there's a high probability of bugs...

