python - replace URLs in text with links to URLs

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:38:20+0000

You can load the document with a DOM/HTML parsing library (see html5lib), collect all of its text nodes, match each one against a regular expression, and replace the text node with the result of a regex substitution that wraps each URI in an anchor. A PCRE such as this one would do:

/(https?:[;\/?\\@&=+$,\[\]A-Za-z0-9\-_.!~*'()%][;\/?:@&=+$,\[\]A-Za-z0-9\-_.!~*'()%#]*|[KZ]:\\*.*\w+)/g

I'm fairly sure you can scour around and find a utility that already does this, though I can't think of one off the top of my head.
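For illustration, here is a rough sketch of the text-node approach described above. It swaps html5lib for the standard library's html.parser purely to stay dependency-free, and it uses a much simpler URL pattern than the PCRE above. The names URL_RE, Linkifier and linkify_html are made up for this example, and the sketch silently drops constructs it does not handle (processing instructions, for instance), so treat it as a starting point rather than a finished tool.

```python
import re
from html import escape
from html.parser import HTMLParser

# Simplified stand-in pattern (an assumption, not the PCRE from the answer):
# an http(s) URL runs until whitespace, a quote or an angle bracket, and must
# not end in common sentence punctuation.
URL_RE = re.compile(r"https?://[^\s<>\"']*[^\s<>\"'.,;:!?)]")

RAW_TEXT = {"script", "style"}  # element content that is passed through as-is


def _linkify_text(text):
    """Escape one text node and wrap every URL found in it in an anchor."""
    parts, pos = [], 0
    for match in URL_RE.finditer(text):
        url = match.group(0)
        parts.append(escape(text[pos:match.start()], quote=False))
        parts.append(f'<a href="{escape(url)}">{escape(url, quote=False)}</a>')
        pos = match.end()
    parts.append(escape(text[pos:], quote=False))
    return "".join(parts)


class Linkifier(HTMLParser):
    """Re-emits the HTML it is fed, linkifying URLs in text nodes only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_anchor = 0  # > 0 while inside an existing <a> element
        self.in_raw = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor += 1
        elif tag in RAW_TEXT:
            self.in_raw += 1
        self.out.append(self.get_starttag_text())  # original tag text, verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = max(0, self.in_anchor - 1)
        elif tag in RAW_TEXT:
            self.in_raw = max(0, self.in_raw - 1)
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_raw:
            self.out.append(data)  # script/style content is not unescaped
        elif self.in_anchor:
            self.out.append(escape(data, quote=False))  # already a link
        else:
            self.out.append(_linkify_text(data))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def linkify_html(html_text):
    parser = Linkifier()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


print(linkify_html("<p>Docs at http://example.com/a.</p>"))
```

Because only text nodes go through the regex, URLs that already sit inside an href attribute or inside an existing anchor are left alone, which is the main reason to prefer this over running a substitution on the raw markup.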

Edit: Try using the answers here: How do I get python-markdown to additionally "urlify" links when formatting plain text?

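import re

# Group 1 captures the whole URL: an IPv4 address, a scheme:// prefix, or a
# www./ftp. host name, then an optional port and a path starting with "/".
# The final class keeps trailing punctuation and whitespace out of the match.
urlfinder = re.compile(r"""(([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|((news|telnet|nntp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\.)[-A-Za-z0-9\.]+(:[0-9]*)?/[-A-Za-z0-9_\$\.\+\!\*\(\),;:@&=\?/~\#\%]*[^]'\.}>\),"\s])""")

def urlify2(value):
    # Wrap every matched URL in an anchor tag that links to itself.
    return urlfinder.sub(r'<a href="\1">\1</a>', value)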

Call urlify2 on a string; I think that is all you need if you aren't dealing with a DOM object.
