There is a package called fuzzywuzzy
. Install via pip:
pip install fuzzywuzzy
Simple usage:
>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("this is a test", "this is a test!")
96
The package is built on top of difflib
. Why not just use that, you ask? Apart from being a bit simpler, it has a number of different matching methods (like token order insensitivity, partial string matching) which make it more powerful in practice. The process.extract
functions are especially useful: find the best matching strings and ratios from a set. From their readme:
Partial Ratio
>>> fuzz.partial_ratio("this is a test", "this is a test!")
100
Token Sort Ratio
>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
90
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100
Token Set Ratio
>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100
Process
>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
[('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
("Dallas Cowboys", 90)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…