This handles C++-style comments, C-style comments, strings and simple nesting thereof.
def comment_remover(text):
def replacer(match):
s = match.group(0)
if s.startswith('/'):
return " " # note: a space and not an empty string
else:
return s
pattern = re.compile(
r'//.*?$|/*.*?*/|'(?:\.|[^\'])*'|"(?:\.|[^"])*"',
re.DOTALL | re.MULTILINE
)
return re.sub(pattern, replacer, text)
Strings needs to be included, because comment-markers inside them does not start a comment.
Edit: re.sub didn't take any flags, so had to compile the pattern first.
Edit2: Added character literals, since they could contain quotes that would otherwise be recognized as string delimiters.
Edit3: Fixed the case where a legal expression int/**/x=5;
would become intx=5;
which would not compile, by replacing the comment with a space rather then an empty string.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…