Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Find a repeating pattern in a list of strings

I'm looking for a way to strip the longest common suffix from a list of strings.

I have a list of approximately 1000 web page titles, and they all share a common suffix, which is the name of the website.

They follow this pattern:

['art gallery - museum and visits | expand knowledge',
 'lasergame - entertainment | expand knowledge',
 'coffee shop - confort and food | expand knowledge',
 ...
]

How could I automatically strip the common suffix " | expand knowledge" from all of the strings?

Thanks!

Edit: Sorry, I did not make myself clear enough. I have no information about the " | expand knowledge" suffix in advance. I want to be able to clear a list of strings of a potential common suffix, even if I do not know what it is.

1 Reply

Here's a solution using the os.path.commonprefix function on the reversed titles:

import os

titles = ['art gallery - museum and visits | expand knowledge',
          'lasergame - entertainment | expand knowledge',
          'coffee shop - confort and food | expand knowledge',
]

# Find the longest common suffix by reversing the strings and using a
# library function to find the common "prefix", then reversing it back.
common_suffix = os.path.commonprefix([title[::-1] for title in titles])[::-1]

# Drop the last len(common_suffix) characters from every title. The end index
# is computed explicitly because title[:-0] would give an empty string when
# the titles share no suffix.
stripped_titles = [title[:len(title) - len(common_suffix)] for title in titles]

Result:

['art gallery - museum and visits', 'lasergame - entertainment', 'coffee shop - confort and food']

Because it finds the common suffix by itself, it should work on any group of titles, even if you don't know the suffix.
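
The same idea can be wrapped in a small helper. This is only a minimal sketch: the name strip_common_suffix and its return shape are assumptions made for illustration, not part of the original answer. It computes the end of each slice explicitly, since title[:-0] is the same as title[:0] and would produce empty strings when the titles have no suffix in common.

```python
import os


def strip_common_suffix(titles):
    """Return (stripped_titles, common_suffix) for a list of strings.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    if not titles:
        return [], ""
    # commonprefix compares character by character, so reversing each
    # title turns the shared suffix into a shared prefix.
    reversed_titles = [title[::-1] for title in titles]
    common_suffix = os.path.commonprefix(reversed_titles)[::-1]
    # Use an explicit end index so an empty suffix leaves titles unchanged.
    stripped = [title[:len(title) - len(common_suffix)] for title in titles]
    return stripped, common_suffix
```

Note that the comparison is purely character-based: if every title happens to end in the same letter just before the site name (for example 'cats | site' and 'dogs | site'), that letter is treated as part of the suffix too. Likewise, a list containing a single title is stripped down to an empty string, because the whole title counts as the "common" suffix.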

