Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - How can I write a JavaScript regular expression to replace hyperlinks in this format [*](*) with HTML hyperlinks?

I need to parse text with links in the following formats:

[html title](http://www.htmlpage.com)
http://www.htmlpage.com
http://i.imgur.com/OgQ9Uaf.jpg

The output for those three strings would be:

<a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>
<a href='http://i.imgur.com/OgQ9Uaf.jpg'>http://i.imgur.com/OgQ9Uaf.jpg</a>

The string could include an arbitrary number of these links, i.e.:

[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)
[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com

output:

<a href='http://www.htmlpage.com'>html title</a><a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a>    <a href='http://www.htmlpage.com'>html title</a>
<a href='http://www.htmlpage.com'>html title</a> wejwelfj <a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>

I have an extremely long function that does an alright job by passing over the string 3 times, but I can't successfully parse this string:

[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something.

For brevity, I'll post the regular expressions I've tried rather than the entire find/replace function:

var matchArray2 = inString.match(/\[.*\]\(.*\)/g);

This is meant to match [*](*), but it doesn't work: the greedy .* makes a string like []()[]() match as a single link.
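The difference between a greedy `.*` and a lazy `.+?` is easy to see with `String.prototype.match` and the `g` flag. A minimal sketch, using a made-up input with two links back to back:

```javascript
// Two Markdown-style links with no separator between them.
const input = "[a](http://a.com)[b](http://b.com)";

// Greedy: .* runs as far as it can, so the first "[" pairs with the LAST "]("
// and the last ")", and the whole input comes back as one match.
const greedy = input.match(/\[.*\]\(.*\)/g);

// Lazy: .+? stops at the first "](" and the first ")" that let the rest match,
// so each link is matched on its own.
const lazy = input.match(/\[.+?\]\(.+?\)/g);

console.log(greedy); // one element: the whole input string
console.log(lazy);   // two elements: "[a](http://a.com)" and "[b](http://b.com)"
```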

Really, that's it. Once I have that match, I search it for () and [] to pull out the link and the link text and build the <a> tag. I then delete the matches from a temporary string so they aren't matched again in my second pass, which looks for plain hyperlinks:

var plainLinkArray = tempString2.match(/http\S*:\/\/\S*/g);

I'm not parsing any html with regex. I'm parsing a string and attempting to output html.

Edit: I added the requirement that it also parse the third link, http://i.imgur.com/OgQ9Uaf.jpg, after the fact.

My final solution (based on @Cerbrus's answer):

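function parseAndHandleHyperlinks(inString)
{
    // Pass 1: [link text](http(s)://url) -> <a href="url">link text</a>
    var result = inString.replace(/\[(.+?)\]\((https?:\/\/.+?)\)/g, '<a href="$2">$1</a>');
    // Pass 2: bare http(s) URLs at the start of the string or after a space.
    // The href URLs produced by pass 1 follow a ", not a space, so they are left alone.
    return result.replace(/(?: |^)(https?:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');
}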
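The two `replace` calls can also be folded into a single pass: put both link forms into one regex with alternation and decide what to emit in a replacement callback. This is a sketch of a different technique, not @Cerbrus's answer; the name `linkify` is hypothetical, and the URL character classes `[^\s)]` and `[^\s<]` are assumptions about what your URLs contain (for example, a bare URL at the end of a sentence keeps its trailing `.`):

```javascript
// Alternative 1: [text](url)      -> group 1 (text), group 2 (url)
// Alternative 2: bare http(s) URL -> group 3
// The engine scans left to right and tries alternative 1 first at each "[",
// so a URL inside [..](..) is consumed there and never reaches alternative 2.
// That is why no temporary string or second pass is needed.
const LINK_RE = /\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)|(https?:\/\/[^\s<]+)/g;

function linkify(text) {
  return text.replace(LINK_RE, (_match, label, href, bare) => {
    // Groups that did not take part in the match are undefined.
    if (bare !== undefined) {
      return `<a href='${bare}'>${bare}</a>`;
    }
    return `<a href='${href}'>${label}</a>`;
  });
}

console.log(linkify("[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com"));
// <a href='http://www.htmlpage.com'>html title</a> wejwelfj <a href='http://www.htmlpage.com'>http://www.htmlpage.com</a>
```

The single quotes around `href` only mirror the expected output in the question; swap them for double quotes if you prefer.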

He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…

1 Reply


Try this regex:

/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g

var s = "[html title](http://www.htmlpage.com)[html title](http://www.htmlpage.com)\n" +
    "[html title](http://www.htmlpage.com)   [html title](http://www.htmlpage.com)\n" +
    "[html title](http://www.htmlpage.com) wejwelfj http://www.htmlpage.com";

s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>');

Regex Explanation:

# /                   - Regex start
# \[                  - A `[` character (escaped)
# (.+?)               - Followed by one or more characters (group 1), non-greedy, so it stops at the earliest:
# \]                  - `]` character (escaped)
# \(                  - Followed by a `(` character (escaped)
# (https?:\/\/
#   [a-zA-Z0-9/.(]+?) - Followed by a URL (group 2) that starts with `http://` or `https://` and continues with letters, digits, `/`, `.` or `(`
# \)                  - Followed by a `)` character (escaped)
# /g                  - End of the regex; search globally.

The two captured strings, the link text from the [] and the URL from the (), are then placed into the replacement string:

'<a href="$2">$1</a>';
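To see exactly what `$1` and `$2` stand for, the captures can be listed with `String.prototype.matchAll` (available since ES2020), and the same substitution can be written with a replacement callback instead of the `$n` placeholders. A small sketch using the answer's regex on one link from the question:

```javascript
const re = /\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g;
const s = "[html title](http://www.htmlpage.com)";

for (const m of s.matchAll(re)) {
  // m[0] is the whole match, m[1] feeds $1 (link text), m[2] feeds $2 (URL).
  console.log(m[1], "|", m[2]); // html title | http://www.htmlpage.com
}

// The same substitution, with a callback receiving (whole match, group 1, group 2):
const html = s.replace(re, (whole, text, url) => `<a href="${url}">${text}</a>`);
console.log(html); // <a href="http://www.htmlpage.com">html title</a>
```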

This works for your "problematic" string:

var s = "[This](http://i.imgur.com/iIlhrEu.jpg) one got me crying first, then once the floodgates were opened [this](http://i.imgur.com/IwSNFVD.jpg) one did it again and [this](http://i.imgur.com/hxIwPKJ.jpg). Ugh, feels. Gotta go hug someone/something."
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>')

// Result:

'<a href="http://i.imgur.com/iIlhrEu.jpg">This</a> one got me crying first, then once the floodgates were opened <a href="http://i.imgur.com/IwSNFVD.jpg">this</a> one did it again and <a href="http://i.imgur.com/hxIwPKJ.jpg">this</a>. Ugh, feels. Gotta go hug someone/something.'

Some more examples with "Incorrect" input:

var s = "[Th][][is](http://x.com)\n" +
    "    [this](http://x(.com)\n" +
    "    [this](http://x).com)";
s.replace(/\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g, '<a href="$2">$1</a>');

//   "<a href="http://x.com">Th][][is</a>
//    <a href="http://x(.com">this</a>
//    <a href="http://x">this</a>.com)"

You can't really blame the last line for breaking, since there's no way to know whether the user meant the URL to end at the first ) or not.
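One more thing worth knowing about the character class `[a-zA-Z0-9/.(]`: it only allows letters, digits, `/`, `.` and `(`, and it has to run right up to the closing `)`. So a link whose URL contains any other character (a hyphen, `?`, `=`, `_`, `%`, ...) is not converted at all. A quick check with a made-up URL, plus one possible looser variant that accepts anything except whitespace and `)` (an assumption about what your URLs look like, not part of the answer):

```javascript
const strict = /\[(.+?)\]\((https?:\/\/[a-zA-Z0-9/.(]+?)\)/g;
// Hypothetical looser variant: any run of characters that are not whitespace or ")".
const loose = /\[(.+?)\]\((https?:\/\/[^\s)]+)\)/g;

const s = "[docs](http://example.com/my-page?id=1)";
const tpl = '<a href="$2">$1</a>';

console.log(s.replace(strict, tpl)); // unchanged: [docs](http://example.com/my-page?id=1)
console.log(s.replace(loose, tpl));  // <a href="http://example.com/my-page?id=1">docs</a>
```

The looser variant still cannot tell where `http://x).com` was meant to end, so the ambiguity described above remains.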

To also catch loose URLs, chain this replacement:

.replace(/(?: |^)(https?:\/\/[a-zA-Z0-9/.(]+)/g, ' <a href="$1">$1</a>');

The (?: |^) bit matches either a space or the start of the string, so a URL at the very beginning of the input is converted too.
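Two side effects of that pattern are worth knowing. The replacement always begins with a literal space, so a URL at the very start of the input gains a leading space. And without the `m` flag, `^` only matches the start of the whole string, so a URL at the start of a later line (right after `\n`) is not converted. A minimal variant that captures the preceding whitespace and puts it back unchanged (a suggestion, not part of the original answer):

```javascript
const original = /(?: |^)(https?:\/\/[a-zA-Z0-9/.(]+)/g;
// (^|\s) captures what came before the URL (nothing, a space, or a newline),
// so "$1" in the replacement restores it as-is.
const variant = /(^|\s)(https?:\/\/[a-zA-Z0-9/.(]+)/g;

const s = "http://a.com\nhttp://b.com";

console.log(JSON.stringify(s.replace(original, ' <a href="$1">$1</a>')));
// " <a href=\"http://a.com\">http://a.com</a>\nhttp://b.com"
console.log(JSON.stringify(s.replace(variant, '$1<a href="$2">$2</a>')));
// "<a href=\"http://a.com\">http://a.com</a>\n<a href=\"http://b.com\">http://b.com</a>"
```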



...