
web scraping - Exclude a combination of characters with regex or add a letter

I'm trying to adjust KODI's search filter with regex so that the scrapers recognize TV shows from their original file names.

The files come in one of two patterns: "TV show name S04E01 some extra info" or "TV show name 01 some extra info". The first is not recognized because "S04" scrambles the search in a number of ways, so it needs to go. The second is not recognized because it needs an 'e' before the number; otherwise it isn't recognized as an episode number.

So I see two approaches.

  1. Make the filter ignore s01 through s99.

  2. Prepend an 'e' to any freestanding two-digit number, though I'm not sure regex can even do that.

I have no experience with regex, but after playing around I came up with this, which unsurprisingly doesn't do the trick:

^(?!s{00,99})\d{2}$
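To see why the attempt above cannot work, here is a small check in Python. Python's re module stands in for whatever regex engine Kodi uses (an assumption), and the backslash before d is assumed to have been lost when the question was copied. In a regex, s{00,99} means "the letter s repeated 0 to 99 times", not "s00 to s99"; because that can match the empty string, the negative lookahead (?!...) fails at every position. Separately, ^ and $ tie the match to the whole string, so even ^\d{2}$ only matches a string that is exactly two digits, never a full file name.

```python
import re

# The attempt from the question; the backslash in \d is assumed (lost in copying).
ATTEMPT = re.compile(r"^(?!s{00,99})\d{2}$")

# The same pattern without the lookahead, to isolate the effect of ^ and $.
ANCHORED = re.compile(r"^\d{2}$")

samples = ["01", "S04E01", "TV show name 01 some extra info"]

for s in samples:
    # s{00,99} is "s" repeated 0..99 times; it always matches the empty string,
    # so the negative lookahead (?!...) rejects every position: no match at all.
    print(f"{s!r}: attempt={ATTEMPT.search(s)}, anchored={ANCHORED.search(s)}")
```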
Question from: https://stackoverflow.com/questions/65886155/exclude-a-combination-of-characters-with-regex-or-add-a-letter


1 Reply


You can either find \b([0-9]{2})\b regex matches and replace them with E$1, or match the s(0[1-9]|[1-9][0-9]) pattern in an ignore filter.
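A minimal Python sketch of the replacement approach, assuming the two-digit pattern is meant to be wrapped in \b word boundaries (which is what the "not enclosed with letters, digits and _" description implies). Python's re.sub writes the group reference as \1 rather than $1, and add_episode_prefix is a hypothetical helper name; this only shows what the regex does, not how Kodi applies it.

```python
import re

# Two digits standing on their own: \b on each side means the digits do not
# touch a letter, digit or underscore.
EPISODE = re.compile(r"\b([0-9]{2})\b")


def add_episode_prefix(name: str) -> str:
    """Prepend 'E' to every freestanding two-digit number (hypothetical helper)."""
    # The answer's E$1 is spelled E\1 in Python's replacement syntax.
    return EPISODE.sub(r"E\1", name)


print(add_episode_prefix("TV show name 01 some extra info"))
print(add_episode_prefix("TV show name S04E01 some extra info"))
```

The digits in S04E01 are left alone because they touch letters. The trade-off is that every freestanding two-digit number gets the prefix, including one that happens to be part of a show's title.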

Details

  • \b([0-9]{2})\b - matches and captures into Group 1 any two digits that are not directly preceded or followed by a letter, digit or _ (this is what the \b word boundaries enforce). In the E$1 replacement, $1 refers to the Group 1 value, so the matched two digits are put back with E prepended.
  • s(0[1-9]|[1-9][0-9]) - matches an s followed by a number between 01 and 99. The capturing group (0[1-9]|[1-9][0-9]) matches either a 0 followed by any digit from 1 to 9 ([1-9]), or (|) any digit from 1 to 9 ([1-9]) followed by any digit ([0-9]).
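The ignore-filter approach can be checked the same way. This Python sketch simply deletes the matched season token; re.IGNORECASE is an assumption added so that the uppercase S04 in the example file name is matched, and strip_season is a hypothetical stand-in for Kodi's ignore filter. The last lines confirm that the alternation accepts exactly 01 through 99 and rejects 00.

```python
import re

# Season token: "s" followed by 01-99; the alternation deliberately excludes 00.
# re.IGNORECASE is an assumption so that "S04" in the file name also matches.
SEASON = re.compile(r"s(0[1-9]|[1-9][0-9])", re.IGNORECASE)


def strip_season(name: str) -> str:
    """Remove the season token (hypothetical stand-in for an ignore filter)."""
    return SEASON.sub("", name)


print(strip_season("TV show name S04E01 some extra info"))

# Check the number part on its own against every zero-padded value 00..99.
NUMBER = re.compile(r"0[1-9]|[1-9][0-9]")
accepted = [n for n in range(100) if NUMBER.fullmatch(f"{n:02d}")]
print(accepted == list(range(1, 100)))
```

With the season token gone, the example name keeps only E01, which is the 'e'-prefixed episode form the question says is recognized.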

NOTE: If you need to generate a number range regex, you may use this JSFiddle of mine.

