Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.4k views
in Technique[技术] by (71.8m points)

c# - Regex to get src value from an img tag

I am using the following regex to get the src value of the first img tag in an HTML document.

string match = "src=(?:"|')?(?<imgSrc>[^>]*[^/].(?:jpg|png))(?:"|')?"

Now it captures total src attribute that I dont need. I just need the url inside the src attribute. How to do it?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Parse your HTML with something else. HTML is not regular and thus regular expressions aren't at all suited to parsing it.

Use an HTML parser, or an XML parser if the HTML is strict. It's a lot easier to get the src attribute's value using XPath:

//img/@src

XML parsing is built into the System.Xml namespace. It's incredibly powerful. HTML parsing is a bit more difficult if the HTML isn't strict, but there are lots of libraries around that will do it for you.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

Just Browsing Browsing

[5] html - How to create even cell spacing within a

1.4m articles

1.4m replys

5 comments

57.0k users

...