Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
218 views
in Technique[技术] by (71.8m points)

php - Regex ignore URL already in HTML tags

I'm having a little problem with my Regex

I've made a custom BBcode for my website, however I also want URLs to be parsed too.

I'm using preg_replace and this is the pattern used to identify URLS:

/([w]+://[w-?&;#~=./@]+[w/])/is

Which works great, however if a URL is within a [img][/img] block, the above pattern also picks it up and produces a result like this:

//[img]http://url.com/toimg.jeg[/img] will produce this result:
<img src="<a href="http://url.com/toimg.jeg" target="_blank">/>
//When it should produce:
<img src="http://url.com/toimg.jeg"/>

I tried using this:

/([^"][w]+://[w-?&;#~=./@]+[w/][^"])/is

With no luck.

Any help will be appreciated.

Edit: For solution See the 2nd comment on stema's answer.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Try this

(?<!href=")([w]+://[w-?&;#~=./@]+[w/])

See it here on Regexr

To make it more general you can simplify your lookbehind to check only for "=""

(?<!=")([w]+://[w-?&;#~=./@]+[w/])

See it on Regexr

(?<!href=") is a negative lookbehind assertion, it ensures that there is no "href="" before your pattern.

is a word boundary that anchors the start of your link to a change from a non word to a word character. without this the lookbehind would be useless and it would match from the "ttp://..." on.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...