Use tldextract. Unlike urlparse, which only gives you the full hostname, tldextract accurately separates the gTLD or ccTLD (generic or country code top-level domain) from the registered domain and subdomains of a URL.
>>> import tldextract
>>> ext = tldextract.extract('http://forums.news.cnn.com/')
>>> ext
ExtractResult(subdomain='forums.news', domain='cnn', suffix='com')
>>> ext.domain
'cnn'
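
To see why urlparse alone is not enough, here is a minimal standard-library sketch. The helper `naive_domain` is hypothetical (it is not part of tldextract or urllib); it just guesses the registered domain by taking the second-to-last label of the hostname. That guess works for `.com` but breaks on multi-part suffixes such as `.co.uk`, which is the case tldextract is meant to handle.

```python
from urllib.parse import urlparse


def naive_domain(url):
    """Hypothetical helper: guess the registered domain by taking the
    second-to-last dot-separated label of the hostname."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


# urlparse only returns the whole hostname, with no idea where the suffix starts.
print(urlparse("http://forums.news.cnn.com/").hostname)  # forums.news.cnn.com

# The naive guess happens to be right for a single-label suffix...
print(naive_domain("http://forums.news.cnn.com/"))  # cnn

# ...but wrong for a two-label suffix: the registered domain here is "bbc".
print(naive_domain("http://www.bbc.co.uk/"))  # co
```

With tldextract you would read `ext.domain` and `ext.suffix` instead of splitting the hostname yourself.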