Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Match street number from different formats without suffixes

We have a "street_number" field that has been filled in freely over the years and that we now want to normalize. Using regular expressions, we'd like to extract the real "street_number" and the "street_number_suffix".

Example: for 17 b, "street_number" would be 17, and "street_number_suffix" would be b.

As there are a dozen different patterns, I'm having trouble tuning the regular expression correctly. I'm considering using 2 different regexes, one to extract the "street_number" and another to extract the "street_number_suffix".

Here's an exhaustive set of patterns we'd like to format and the expected output:

# Extract street_number using PCRE

input           street_number   street_number_suffix

19-21           19              null
2 G             2               G
A               null            A
1 bis           1               bis
3 C             3               C
N°10            10              null
17 b            17              b
76 B            76              B
7 ter           7               ter
9/11            9               null
21.3            21              3
42              42              null

I know I could use an expression that matches any digits up to a hyphen using \d+(?=-). It could be extended to match up to a hyphen OR a slash using \d+(?=-|/), though once I add \s to this pattern, 21 from 19-21 will match as well. Adding conditions may not be that simple, which is why I'm asking for your help.
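To make the problem concrete, here is a minimal sketch of the lookahead approach using Python's `re` module rather than PCRE. It assumes the inputs are tested together as one multi-line string, as in an online regex tester; the variable names are only illustrative.

```python
import re

# Assumption: several inputs are tested as one multi-line string.
sample = "19-21\n9/11\n2 G"

# Digits followed by a hyphen or a slash: only the first number of a range.
before_separator = re.findall(r"\d+(?=[-/])", sample)

# Adding \s to the lookahead also accepts digits followed by a space or a
# newline, so the second number of each range is matched too.
with_whitespace = re.findall(r"\d+(?=[-/\s])", sample)

print(before_separator)  # ['19', '9']
print(with_whitespace)   # ['19', '21', '9', '11', '2']
```

A lookahead only constrains what follows the digits, so it cannot tell 19 apart from 21 once whitespace is allowed; the distinguishing information is what precedes the digits.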

Could anyone give me a hand with this? In case it helps, here's a draft: https://regex101.com/r/jGK5Sa/4


Edit: at the time I'm editing, here's the closest regex I could find:

(?:(N°|(?<!-|/|\.|[a-z]|.{1})))\d+

However, the full match for N°10 isn't 10 but N°10 (and our ETL doesn't support capturing groups, so I can't use /......(\d+)/)
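The difference between consuming the prefix and merely asserting it can be sketched as follows. This uses Python's `re` module for illustration (the question targets PCRE), and the two simplified patterns are assumptions made for the example, not the draft regex above.

```python
import re

value = "N°10"

# A pattern that consumes the prefix includes it in the full match.
consumed = re.search(r"N°\d+", value).group(0)

# A lookbehind only checks the prefix; it is zero-width, so the prefix
# is not part of the full match.
asserted = re.search(r"(?<=N°)\d+", value).group(0)

print(consumed)  # N°10
print(asserted)  # 10
```

When capturing groups are unavailable, everything that must be excluded from the result has to sit inside a lookaround like this.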

question from:https://stackoverflow.com/questions/65904349/match-street-number-from-different-formats-without-suffixes


1 Reply


To get the street numbers, you could update the pattern to:

(?<![-/.a-z\d])\d+

Explanation

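  • (?<! Negative lookbehind, assert that what is directly to the left is not
    • [-/.a-z\d] Any one of the listed characters (hyphen, slash, dot, lowercase letter or digit), using a character class
  • ) Close the negative lookbehind
  • \d+ Match 1+ digits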
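
As a quick check, the pattern can be run against the inputs from the question. This is a sketch using Python's `re` module instead of PCRE (the single-character lookbehind behaves the same way in both); the helper `street_number` and the `expected` mapping are hypothetical names introduced for the example.

```python
import re

# Digits not directly preceded by a hyphen, slash, dot, lowercase letter or digit.
STREET_NUMBER = re.compile(r"(?<![-/.a-z\d])\d+")

def street_number(raw):
    """Return the first full match, or None when there is no street number."""
    match = STREET_NUMBER.search(raw)
    return match.group(0) if match else None

# Inputs and expected street numbers taken from the table in the question.
expected = {
    "19-21": "19", "2 G": "2", "A": None, "1 bis": "1",
    "3 C": "3", "N°10": "10", "17 b": "17", "76 B": "76",
    "7 ter": "7", "9/11": "9", "21.3": "21", "42": "42",
}

for raw, want in expected.items():
    got = street_number(raw)
    print(f"{raw!r:10} -> {got!r:6} {'ok' if got == want else 'MISMATCH'}")
```

Because the result is the full match, no capturing group is needed. Note that the character class only lists lowercase a-z, so a digit directly after an uppercase letter would still match unless the class is extended.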

Regex demo

