Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Python re.sub multiline on string

I am trying to use the re.MULTILINE flag.

I have read these posts: "Bug in Python Regex? (re.sub with re.MULTILINE)" and "Python re.sub MULTILINE caret match", but my code still doesn't work. The code:

import re

if __name__ == '__main__':

    txt = """
<?php
/* Multi-line
comment */
$var = 1;
"""
    new_txt = re.sub(r'/\*[.\n]*?\*/', '', txt, flags=re.MULTILINE)
    print("\n=========== TXT ============")
    print(txt)
    print("\n=========== NEW TXT ============")
    print(new_txt)

The code output :

=========== TXT ============

<?php
/* Multi-line
comment */
$var = 1;


=========== NEW TXT ============

<?php
/* Multi-line
comment */
$var = 1;

But new_txt should not contain the multi-line comment. I want to get txt without the multi-line comment. Do you have any idea?


1 Reply


You need to replace re.MULTILINE with re.DOTALL (or its alias re.S) and move the period out of the character class, because inside a character class the dot matches only a literal . character.
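As a minimal sketch of the character-class point (the sample string and variable names are made up for illustration), the following compares a dot outside and inside square brackets, then tries the question's pattern on the comment text:

```python
import re

sample = "a.b"  # hypothetical sample text

# Outside a character class, . matches any character except a newline.
any_char = re.findall(r'.', sample)

# Inside a character class, . is just a literal period.
only_period = re.findall(r'[.]', sample)

# So [.\n] can only consume periods and newlines; it cannot cross the
# ordinary text between /* and */ in the question's input.
question_match = re.search(r'/\*[.\n]*?\*/', "/* Multi-line\ncomment */")

print(any_char)        # ['a', '.', 'b']
print(only_period)     # ['.']
print(question_match)  # None
```

The last result is why re.sub left the text unchanged: the pattern never matched, so there was nothing to replace.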

Note that re.MULTILINE only changes the behavior of ^ and $, which then match at the start and end of each line rather than only at the start and end of the whole string. The re.DOTALL flag changes the behavior of . only where it appears outside a character class: it then matches a newline, too.
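The difference between the two flags can be checked with a short sketch like this one (the two-line input string is a made-up example, not taken from the question):

```python
import re

text = "first line\nsecond line"  # hypothetical two-line input

# Without flags, ^ matches only at the very start of the string.
starts_default = re.findall(r'^\w+', text)

# re.MULTILINE lets ^ match at the start of every line.
starts_multiline = re.findall(r'^\w+', text, flags=re.MULTILINE)

# Without flags, . stops at the newline.
dot_default = re.match(r'.*', text).group()

# re.DOTALL (alias re.S) lets . cross the newline.
dot_dotall = re.match(r'.*', text, flags=re.DOTALL).group()

print(starts_default)     # ['first']
print(starts_multiline)   # ['first', 'second']
print(repr(dot_default))  # 'first line'
print(repr(dot_dotall))   # 'first line\nsecond line'
```

Since the comment pattern contains no ^ or $, re.MULTILINE has no effect on it at all; only re.DOTALL changes what it can match.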

So, the regex you could use for the current example is /\*.*?\*/. Here /\* matches a literal /*, then .*? matches as few characters as possible up to the first */, which is matched by \*/.

See the code demo:

import re

txt = """
<?php
/* Multi-line
comment */
$var = 1;
"""
# re.S (re.DOTALL) lets the dot match newlines, so the comment can span lines
new_txt = re.sub(r'/\*.*?\*/', '', txt, flags=re.S)
print("\n=========== TXT ============")
print(txt)
print("\n=========== NEW TXT ============")
print(new_txt)


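However, this is not the best solution: multiline comments are often very long, and the lazy .*? has to check for the closing */ after every single character it consumes. A more efficient approach is the unrolling-the-loop technique. The regex above can be "unrolled" like this: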

/\*[^*]*(?:\*(?!/)[^*]*)*\*/
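As a rough sketch, the unrolled pattern can be compared with the lazy one on a small input. The names LAZY, UNROLLED and strip_comments and the sample PHP text are illustrative assumptions. Note that the unrolled pattern needs no flag, because the negated class [^*] already matches newlines:

```python
import re

# Lazy version from above: needs re.DOTALL so that . can cross newlines.
LAZY = re.compile(r'/\*.*?\*/', flags=re.DOTALL)

# Unrolled version: [^*]* consumes runs of non-star characters in one step,
# and \*(?!/) allows a star only when it does not start the closing */.
UNROLLED = re.compile(r'/\*[^*]*(?:\*(?!/)[^*]*)*\*/')


def strip_comments(pattern, source):
    """Remove every match of the given compiled pattern from source."""
    return pattern.sub('', source)


# Hypothetical sample input with stars inside a comment body.
php = """
<?php
/* Multi-line
comment */
$var = 1; /* stars ** inside * are fine */
$other = 2;
"""

lazy_result = strip_comments(LAZY, php)
unrolled_result = strip_comments(UNROLLED, php)

print(unrolled_result)
print(lazy_result == unrolled_result)  # True
```

Both patterns are purely textual: neither knows about PHP string literals, so a /* inside a quoted string would also be treated as the start of a comment.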


