Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python 3.x - Regex to match words and those with an apostrophe

Update: As per comments regarding the ambiguity of my question, I've increased the detail in the question.

(Terminology: by "words" I am referring to any succession of alphanumeric characters.)

I'm looking for a regex to match the following, verbatim:

  • Words.
  • Words with one apostrophe at the beginning.
  • Words with any number of non-contiguous apostrophes throughout the middle.
  • Words with one apostrophe at the end.

I would also like to match the following, though not verbatim: the apostrophes should be removed.

  • Words with an apostrophe at the beginning and at the end would be matched to the word, without the apostrophes. So 'foo' would be matched to foo.
  • Words with more than one contiguous apostrophe in the middle would be resolved to two different words: the fragment before the contiguous apostrophes and the fragment after the contiguous apostrophes. So, foo''bar would be matched to foo and bar.
  • Words with more than one contiguous apostrophe at the beginning or at the end would be matched to the word, without the apostrophes. So, ''foo would be matched to foo and ''foo'' to foo.

Examples

These would be matched verbatim:

  • 'bout
  • it's
  • persons'

But these would be ignored:

  • '
  • ''

And, for 'open', open would be matched.



1 Reply


Try using this:

(?=.*\w)^(\w|')+$

'bout     # pass
it's      # pass
persons'  # pass
'         # fail
''        # fail
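
To reproduce the pass/fail list above in Python 3, a minimal check might look like the following. It assumes the pattern is meant to be read with the backslashes written out as `\w`, and that each sample is tested as a whole string; the names `PATTERN` and `samples` are only illustrative.

```python
import re

# The pattern from this answer, with the backslashes written out.
PATTERN = re.compile(r"(?=.*\w)^(\w|')+$")

# The five sample strings from the question.
samples = ["'bout", "it's", "persons'", "'", "''"]

for s in samples:
    # match() anchors at the start; the trailing $ anchors the end.
    verdict = "pass" if PATTERN.match(s) else "fail"
    print(f"{s!r:12} {verdict}")
```

Note that this pattern only validates a whole string. It also accepts 'open' with both apostrophes left in place, so the apostrophe-stripping cases from the question would need a separate step.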

Regex Explanation

NODE      EXPLANATION
  (?=       look ahead to see if there is:
    .*        any character except \n (0 or more times
              (matching the most amount possible))
    \w        word characters (a-z, A-Z, 0-9, _)
  )         end of look-ahead
  ^         the beginning of the string
  (         group and capture to \1 (1 or more times
            (matching the most amount possible)):
    \w        word characters (a-z, A-Z, 0-9, _)
   |         OR
    '         a literal apostrophe
  )+        end of \1 (NOTE: because you're using a
            quantifier on this capture, only the LAST
            repetition of the captured pattern will be
            stored in \1)
  $         before an optional \n, and the end of the
            string
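
The pattern explained above tests one whole string at a time. For pulling words out of running text and applying the apostrophe-stripping rules from the question, one possible approach is to tokenize first and clean up afterwards. The sketch below is only an illustration under stated assumptions: `extract_words`, `CHUNK` and `MULTI_APOS` are hypothetical names, "alphanumeric" is taken to mean Unicode letters and digits without the underscore (`[^\W_]`), and cases the question does not specify (such as ''foo') are resolved in whatever way falls out of the two rules in the comments.

```python
import re

# Assumption: a word character is a letter or digit (underscore excluded).
CHUNK = re.compile(r"(?:[^\W_]|')+")   # runs of alphanumerics and apostrophes
MULTI_APOS = re.compile(r"'{2,}")      # two or more contiguous apostrophes


def extract_words(text):
    """Hypothetical helper: list the words in `text` under the question's rules."""
    words = []
    for chunk in CHUNK.findall(text):
        # Contiguous apostrophes act as separators: foo''bar -> foo, bar
        for piece in MULTI_APOS.split(chunk):
            # An apostrophe on both ends is treated as quoting: 'foo' -> foo
            if piece.startswith("'") and piece.endswith("'"):
                piece = piece.strip("'")
            # Empty pieces (a lone ' or '') are ignored.
            if piece:
                words.append(piece)
    return words


print(extract_words("'bout it's persons' 'open' foo''bar ''foo'' ' ''"))
```

Doing the cleanup in plain Python rather than in one large regex keeps each rule visible on its own line, which makes it easier to adjust the unspecified cases.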
