Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
227 views
in Technique[技术] by (71.8m points)

python - Fuzzy String Comparison

What I am striving to complete is a program which reads in a file and will compare each sentence according to the original sentence. The sentence which is a perfect match to the original will receive a score of 1 and a sentence which is the total opposite will receive a 0. All other fuzzy sentences will receive a grade in between 1 and 0.

I am unsure which operation to use to allow me to complete this in Python 3.

I have included the sample text in which the Text 1 is the original and the other preceding strings are the comparisons.

Text: Sample

Text 1: It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats.

Text 20: It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines // Should score high point but not 1

Text 21: It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines // Should score lower than text 20

Text 22: I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night. // Should score lower than text 21 but NOT 0

Text 24: It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats. // Should score a 0!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There is a package called fuzzywuzzy. Install via pip:

pip install fuzzywuzzy

Simple usage:

>>> from fuzzywuzzy import fuzz
>>> fuzz.ratio("this is a test", "this is a test!")
    96

The package is built on top of difflib. Why not just use that, you ask? Apart from being a bit simpler, it has a number of different matching methods (like token order insensitivity, partial string matching) which make it more powerful in practice. The process.extract functions are especially useful: find the best matching strings and ratios from a set. From their readme:

Partial Ratio

>>> fuzz.partial_ratio("this is a test", "this is a test!")
    100

Token Sort Ratio

>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    90
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
    100

Token Set Ratio

>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
    100

Process

>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
    [('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
    ("Dallas Cowboys", 90)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...