Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - How to split long regular expression rules to multiple lines in Python

Is this actually doable? I have some very long regex patterns that are hard to understand because they don't fit on the screen at once. Example:

test = re.compile('(?P<full_path>.+):\d+:\s+warning:\s+Member\s+(?P<member_name>.+)\s+\((?P<member_type>%s)\) of (class|group|namespace)\s+(?P<class_name>.+)\s+is not documented' % (self.__MEMBER_TYPES), re.IGNORECASE)

A backslash line continuation or triple quotes won't work.
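
To illustrate why triple quotes alone don't help, here is a minimal sketch with a shortened, hypothetical pattern (not the one from the question). Without re.VERBOSE, the newline and the indentation inside the triple-quoted literal become part of the regex, so it no longer matches the same input:

```python
import re

# Hypothetical, shortened pattern used only for illustration.
single_line = re.compile(r'(?P<path>.+):\d+:\s+warning')

# Triple quotes keep the newline and the indentation inside the pattern,
# so the regex now requires that literal whitespace in the input.
triple_quoted = re.compile(r'''(?P<path>.+):\d+:
    \s+warning''')

line = 'foo.h:12: warning'
matched_single = single_line.match(line) is not None
matched_triple = triple_quoted.match(line) is not None
```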

EDIT: I ended up using VERBOSE mode. Here's how the regex pattern looks now:

test = re.compile('''
  (?P<full_path>                                  # Capture a group called full_path
    .+                                            #   It consists of one or more characters of any type
  )                                               # Group ends
  :                                               # A literal colon
  \d+                                             # One or more digits (line number)
  :                                               # A literal colon
  \s+warning:\s+parameters\sof\smember\s+         # An almost static string
  (?P<member_name>                                # Capture a group called member_name
    [^:]+                                         #   One or more characters other than a colon (so a colon ends the group)
  )                                               # Group ends
  (                                               # Start an unnamed group
    ::                                            #   Two literal colons
    (?P<function_name>                            #   Start another group called function_name
      \w+                                         #     It consists of one or more alphanumeric characters
    )                                             #   End group
  )*                                              # This group is entirely optional and does not apply to C
  \s+are\snot\s\(all\)\sdocumented''',            # And the line ends with an almost static string
  re.IGNORECASE|re.VERBOSE)                       # Let's not worry about case, because it seems to differ between Doxygen versions
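
For anyone who wants to try VERBOSE mode quickly, below is a minimal, self-contained sketch. The pattern is a simplified, hypothetical variant of the one above, and the sample input line is made up. Two things worth knowing: a raw string (r prefix) keeps \d, \s and \w intact, and whitespace or # inside a character class is not ignored even in VERBOSE mode, so a class like [^:\s] should stay on one line.

```python
import re

# Simplified, hypothetical pattern; the raw-string prefix keeps \d, \s and \w
# intact, and re.VERBOSE makes the regex engine skip whitespace and comments.
warning_re = re.compile(r'''
    (?P<full_path>.+)               # path of the file
    :\d+:                           # line number between two colons
    \s+warning:\s+                  # an almost static string
    (?P<member_name>[^:\s]+)        # keep the whole character class on one line
    (?:::(?P<function_name>\w+))?   # optional "::name" suffix
    ''', re.IGNORECASE | re.VERBOSE)

# Made-up input line, only for demonstration.
m = warning_re.match('src/foo.h:42: warning: Foo::bar')
groups = m.groupdict() if m else {}
```

If the pattern needs a literal space in VERBOSE mode, write it as \s, as [ ], or as a backslash followed by a space, since plain spaces are dropped.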

1 Reply

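You can split your regex pattern by quoting each segment separately. Python joins adjacent string literals inside the parentheses into a single string, so no line-continuation backslashes are needed.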

test = re.compile(('(?P<full_path>.+):\d+:\s+warning:\s+Member'
                   '\s+(?P<member_name>.+)\s+\((?P<member_type>%s)\) '
                   'of (class|group|namespace)\s+(?P<class_name>.+)'
                   '\s+is not documented') % (self.__MEMBER_TYPES), re.IGNORECASE)

You can also use the raw string prefix 'r'. If you do, put it before each segment, because the prefix applies to a single literal only.
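
A minimal runnable sketch of the raw-string variant follows. MEMBER_TYPES is a hypothetical stand-in for self.__MEMBER_TYPES (its real value isn't shown in the question), and the sample warning line is made up:

```python
import re

# Hypothetical stand-in for self.__MEMBER_TYPES from the question.
MEMBER_TYPES = 'function|variable'

# Adjacent literals inside the parentheses are joined into one string;
# each segment carries its own r prefix.
pattern = (r'(?P<full_path>.+):\d+:\s+warning:\s+Member'
           r'\s+(?P<member_name>.+)\s+\((?P<member_type>%s)\) '
           r'of (class|group|namespace)\s+(?P<class_name>.+)'
           r'\s+is not documented') % MEMBER_TYPES
test = re.compile(pattern, re.IGNORECASE)

# Made-up input line, only for demonstration.
m = test.match('foo.h:7: warning: Member bar (function) of class Foo is not documented')
```

Building the pattern string first and compiling it on a separate line also keeps the re.compile call short.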

See the Python docs on string literal concatenation.