You can load the document up with a DOM/HTML parsing library ( see html5lib ), grab all text nodes, match them against a regular expression and replace the text nodes with a regex replacement of the URI with anchors around it using a PCRE such as:
/(https?:[;/?\@&=+$,[]A-Za-z0-9-_.!~*'()%][;/?:@&=+$,[]A-Za-z0-9-_.!~*'()%#]*|[KZ]:\*.*w+)/g
I'm quite sure you can scourge through and find some sort of utility that does this, I can't think of any off the top of my head though.
Edit: Try using the answers here: How do I get python-markdown to additionally "urlify" links when formatting plain text?
import re
urlfinder = re.compile("([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\.)[-A-Za-z0-9\.]+):[0-9]*)?/[-A-Za-z0-9_\$\.\+\!\*\(\),;:@&=\?/~\#\%]*[^]'\.}>\),"]")
def urlify2(value):
return urlfinder.sub(r'<a href="1">1</a>', value)
call urlify2 on a string and I think that's it if you aren't dealing with a DOM object.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…