
regex - Javascript match URL pattern with special characters allowed

My text is:

<A HREF="http://ad.doubleclick.net/get/N97638.2534621.INTERSTITIAL/B7532631099.4;sz=1x1;ord=[timestamp]?">

I am using the following regex to match URL:

var uri_pattern = /((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig

This works fine, except that it doesn't catch trailing characters like ], [ and ?. I tried modifying the regex to allow these special characters, but it didn't seem to work.

For example:

var text = '<A HREF="http://ad.doubleclick.net/get/N97638.2534621.INTERSTITIAL/B7532631099.4;sz=1x1;ord=[timemacro]?">';
console.log(text.match(uri_pattern));

//OUTPUT
"http://ad.doubleclick.net/get/N97638.2534621.INTERSTITIAL/B7532631099.4;sz=1x1;ord=[timemacro"

Whereas I want:

"http://ad.doubleclick.net/get/N97638.2534621.INTERSTITIAL/B7532631099.4;sz=1x1;ord=[timemacro]?"

1 Reply


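The last character class in your pattern excludes ] and ? (along with other punctuation), so the engine backtracks until the URL ends on an allowed character, which here is the "o" of "timemacro". Remove \] from that class and accept ] or ? explicitly as the final character: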

var uri_pattern = /((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[{};:'".,<>?«»“”‘’]|\]|\?))/ig

var text = '<A HREF="http://ad.doubleclick.net/get/N97638.2534621.INTERSTITIAL/B7532631099.4;sz=1x1;ord=[timemacro]?">';

console.log(text.match(uri_pattern));
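To compare the two patterns side by side, here is a minimal sketch, assuming Node.js or a browser console, that runs both against the sample text from the question. The variable names originalPattern, fixedPattern, originalMatch and fixedMatch are only illustrative; the patterns are the question's and the answer's regexes with every escape written out.

```javascript
// Sample anchor tag from the question.
const text =
  '<A HREF="http://ad.doubleclick.net/get/N97638.2534621.INTERSTITIAL/B7532631099.4;sz=1x1;ord=[timemacro]?">';

// Question's pattern: the final character class excludes "]" and "?",
// so the engine gives characters back until the URL ends on an allowed one.
const originalPattern = /((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

// Answer's pattern: "]" is dropped from the excluded set, and "]" or "?"
// is accepted explicitly as the last character of the URL.
const fixedPattern = /((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[{};:'".,<>?«»“”‘’]|\]|\?))/ig;

// With the g flag, match() returns an array of every full match.
const originalMatch = text.match(originalPattern);
const fixedMatch = text.match(fixedPattern);

console.log(originalMatch[0]); // ends with "[timemacro"
console.log(fixedMatch[0]);    // ends with "[timemacro]?"
```

The body [^\s()<>]+ first runs all the way up to the closing > of the tag, including the quote. The engine then backtracks one character at a time until the final alternative accepts one. The quote is still rejected by both patterns, so the fixed pattern stops right after the ?, while the original keeps backtracking past ? and ] before it finds an allowed character.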
