Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
516 views
in Technique[技术] by (71.8m points)

python - replace URLs in text with links to URLs

Using Python I want to replace all URLs in a body of text with links to those URLs, like what Gmail does. Can this be done in a one liner regular expression?

Edit: by body of text I just meant plain text - no HTML

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can load the document up with a DOM/HTML parsing library ( see html5lib ), grab all text nodes, match them against a regular expression and replace the text nodes with a regex replacement of the URI with anchors around it using a PCRE such as:

/(https?:[;/?\@&=+$,[]A-Za-z0-9-_.!~*'()%][;/?:@&=+$,[]A-Za-z0-9-_.!~*'()%#]*|[KZ]:\*.*w+)/g

I'm quite sure you can scourge through and find some sort of utility that does this, I can't think of any off the top of my head though.

Edit: Try using the answers here: How do I get python-markdown to additionally "urlify" links when formatting plain text?

import re

urlfinder = re.compile("([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\.)[-A-Za-z0-9\.]+):[0-9]*)?/[-A-Za-z0-9_\$\.\+\!\*\(\),;:@&=\?/~\#\%]*[^]'\.}>\),"]")

def urlify2(value):
    return urlfinder.sub(r'<a href="1">1</a>', value)

call urlify2 on a string and I think that's it if you aren't dealing with a DOM object.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...