regex - Match street number from different formats without suffixes

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

We have a "street_number" field that has been filled in freely over the years and that we now want to normalize. Using regular expressions, we'd like to extract the real "street_number" and the "street_number_suffix".

Example: for 17 b, "street_number" would be 17 and "street_number_suffix" would be b.

As there are a dozen different patterns, I'm having trouble tuning the regular expression correctly. I'm considering using two different regexes, one to extract the "street_number" and another to extract the "street_number_suffix".

Here's an exhaustive set of patterns we'd like to format and the expected output:

# Extract street_number using PCRE

input           street_number   street_number_suffix

19-21           19              null
2 G             2               G
A               null            A
1 bis           1               bis
3 C             3               C
N°10            10              null
17 b            17              b
76 B            76              B
7 ter           7               ter
9/11            9               null
21.3            21              3
42              42              null

I know I could use an expression that matches any digits up to a hyphen, \d+(?=-). It could be extended to match up to a hyphen OR a slash using \d+(?=-|/). However, once I add \s to this pattern, the 21 from 19-21 matches as well. Adding conditions may not be that simple, which is why I'm asking for your help.
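To make the issue concrete, here is a small Python sketch. Python's `re` module stands in for PCRE here, and the newline-separated sample string is an assumption that mirrors how inputs are usually pasted one per line into a regex tester. Under that assumption, once `\s` is part of the lookahead, the line break after `19-21` counts as whitespace, so `21` is matched too.

```python
import re

# Assumed test setup: several inputs pasted one per line, as in an online tester.
sample = "19-21\n9/11\n2 G\n"

# Lookahead for a hyphen or a slash only.
narrow = re.findall(r"\d+(?=-|/)", sample)

# Same lookahead extended with \s: the newline after "21" and "11"
# (and the space after "2") now satisfies it.
wide = re.findall(r"\d+(?=-|/|\s)", sample)

print(narrow)  # ['19', '9']
print(wide)    # ['19', '21', '9', '11', '2']
```

The lookahead only describes what follows the digits, so it cannot tell a leading number from a trailing one; that is what a lookbehind on the preceding character addresses.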

Could anyone give me a hand with this? In case it helps, here's a draft: https://regex101.com/r/jGK5Sa/4

Edit: at the time I'm editing, here's the closest regex I could find:

(?:(N°|(?<!-|/|\.|[a-z]|.{1})))\d+

However, the full match for N°10 isn't 10 but N°10 (and our ETL doesn't support capturing groups, so I can't use /......(\d+)/).

question from:https://stackoverflow.com/questions/65904349/match-street-number-from-different-formats-without-suffixes

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:15:08+0000

To get the street numbers, you could update the pattern to:

(?<![-/.a-z\d])\d+

Explanation

(?<! Negative lookbehind, assert that what is directly to the left is not
- [-/.a-z\d] Any of the listed characters (a character class): -, /, ., a lowercase letter, or a digit
) Close the negative lookbehind
\d+ Match 1+ digits
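As a quick check, the pattern can be run against the inputs from the question's table. This is a minimal sketch in Python rather than in the asker's ETL; Python's `re` accepts this fixed-width lookbehind, but other engines may behave differently. The `street_number` helper and the `EXPECTED` mapping are illustrative names, not part of the original answer.

```python
import re

# Pattern from the answer: digits not directly preceded by
# -, /, ., a lowercase letter, or another digit.
STREET_NUMBER = re.compile(r"(?<![-/.a-z\d])\d+")

# Inputs and expected street_number values, copied from the question's table
# (None stands for "null").
EXPECTED = {
    "19-21": "19",
    "2 G": "2",
    "A": None,
    "1 bis": "1",
    "3 C": "3",
    "N°10": "10",
    "17 b": "17",
    "76 B": "76",
    "7 ter": "7",
    "9/11": "9",
    "21.3": "21",
    "42": "42",
}


def street_number(value):
    """Return the first full match of the pattern, or None if there is none."""
    match = STREET_NUMBER.search(value)
    return match.group(0) if match else None


results = {value: street_number(value) for value in EXPECTED}
for value, got in results.items():
    print(f"{value!r:10} -> {got}")
```

Because the result is the full match (`group(0)`), no capturing group is needed, which matches the constraint stated in the question. Note that the class only lists lowercase `a-z`, so a digit directly after an uppercase letter would still match; none of the listed inputs has that shape.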
