Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

regex - How to remove trailing comments via regexp?

For non-MATLAB-savvy readers: I'm not sure which regex family they belong to, but MATLAB's regular expressions are described in full detail in the MATLAB documentation. MATLAB's comment character is % (percent) and its string delimiter is ' (apostrophe). An apostrophe inside a string is written as a double apostrophe ('this is how you write "it''s" in a string.'). To complicate matters further, the matrix transpose operators are also apostrophes (A' for the complex-conjugate (Hermitian) transpose, A.' for the regular transpose).

Now, for dark reasons (that I will not elaborate on :), I'm trying to interpret MATLAB code in MATLAB's own language.

Currently I'm trying to remove all trailing comments in a cell-array of strings, each containing a line of MATLAB code. At first glance, this might seem simple:

>> str = 'simpleCommand(); % simple trailing comment';
>> regexprep(str, '%.*$', '')
ans =
    simpleCommand(); 

But of course, something like this might come along:

>> str = ' fprintf(''%d%*c%3.0f\n'', value, args{:}); % Let''s do this! ';
>> regexprep(str, '%.*$', '') 
ans = 
    fprintf('        %//   <-- WRONG!

Obviously, we need to exclude from the match all comment characters that reside inside strings, while also taking into account that a single apostrophe (or a dot-apostrophe) directly following a variable or expression is a transpose operator, not a string delimiter.
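
The two rules above can be made concrete with a small character-by-character scanner. What follows is a minimal sketch written as a function file; the name stripTrailingComment is hypothetical, and the transpose test (an apostrophe directly after a letter, digit, underscore, closing bracket, dot or another transpose counts as an operator, anything else opens a string) is an assumed heuristic, not MATLAB's own lexer. It does not handle double-quoted strings, block comments, or text after a ... continuation.

```matlab
function code = stripTrailingComment(codeLine)
%STRIPTRAILINGCOMMENT  Remove a trailing comment from one line of MATLAB code.
%   Sketch only. An apostrophe is treated as a transpose operator when it
%   directly follows a letter, digit, underscore, ), ], }, . or another
%   transpose; otherwise it opens a string. Inside a string, two
%   apostrophes in a row are an escaped apostrophe.

code = codeLine;       % default: no comment found, return the line unchanged
inString = false;      % are we inside a '...' string literal?
prevIsValue = false;   % does the previous character end a value (so ' is a transpose)?
k = 1;
while k <= numel(codeLine)
    c = codeLine(k);
    if inString
        if c == ''''
            if k < numel(codeLine) && codeLine(k+1) == ''''
                k = k + 1;            % skip the second half of an escaped apostrophe
            else
                inString = false;     % closing delimiter
                prevIsValue = true;   % a string literal is a value
            end
        end
    elseif c == '%'
        code = codeLine(1:k-1);       % first percent sign outside a string starts the comment
        return
    elseif c == ''''
        % After a value this is A' or A.' (prevIsValue stays true, so A'' works);
        % otherwise the apostrophe opens a string literal.
        inString = ~prevIsValue;
    else
        prevIsValue = isstrprop(c, 'alphanum') || c == '_' || any(c == ')]}.');
    end
    k = k + 1;
end
end
```

Saved as stripTrailingComment.m, it can be applied to a whole cell array with C = deblank(cellfun(@stripTrailingComment, str, 'UniformOutput', false)). Scanning makes the role of each apostrophe explicit, which is exactly the information an even-count rule cannot capture.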

Based on the assumption that the number of string opening/closing characters before the comment character must be even (which I know is incomplete, because of the matrix transpose operator), I conjured up the following dynamic regex to handle this sort of case:

>> str = {
       'myFun( {''test'' ''%''}); % let''s '
       'sprintf(str, ''%*8.0f%*s%c%3d\n''); % it''s '
       'sprintf(str, ''%*8.0f%*s%c%3d\n''); % let''s '
       'sprintf(str, ''%*8.0f%*s%c%3d\n'');  '
       'A = A.'';%tight trailing comment'
   };
>> 
>> C = regexprep(str, '(^.*)(?@mod(sum(1==''''''''),2)==0;)(%.*$)', '$1')

However, this gives:

C = 
    'myFun( {'test' '%'}); '              %// success
    'sprintf(str, '%*8.0f%*s%c%3d\n'); '  %// success
    'sprintf(str, '%*8.0f%*s%c%3d\n'); '  %// success
    'sprintf(str, '%*8.0f%*s%c'           %// FAIL
    'A = A.';'                            %// success (although I'm not sure why)

so I'm almost there, but not quite yet :)
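
A possible explanation for these results: as far as I can tell from the MATLAB regexp documentation, (?@cmd) runs cmd only for its side effects and discards its output, so it never rejects a match. The pattern above then behaves like (^.*)(%.*$), which cuts at the last percent sign on the line; that would account both for the failure on the line without a comment and for the unexplained success on the A.' line. A single-regex alternative, sketched below under the same assumed transpose heuristic (an apostrophe directly after a word character, ), ], } or . is taken to be a transpose), consumes the code part token by token and relies on MATLAB's possessive quantifier *+.

```matlab
% Sketch: strip trailing comments with one regexprep call, no dynamic expressions.
% The code before the comment is consumed as a sequence of:
%   [^'%]              a character that is neither an apostrophe nor a percent sign
%   (?<=[\w)\]}.])'    an apostrophe right after a word char, ), ], } or .
%                      (assumed to be a transpose operator, A' or A.')
%   '(?:[^']|'')*'     a complete string literal, with '' as an escaped apostrophe
% The possessive *+ forbids backtracking into a different tokenization, so only
% the first percent sign outside a string can start the comment.
str = {
    'myFun( {''test'' ''%''}); % let''s '
    'sprintf(str, ''%*8.0f%*s%c%3d\n''); % it''s '
    'sprintf(str, ''%*8.0f%*s%c%3d\n''); % let''s '
    'sprintf(str, ''%*8.0f%*s%c%3d\n'');  '
    'A = A.'';%tight trailing comment'
};
pattern = '^((?:[^''%]|(?<=[\w)\]}.])''|''(?:[^'']|'''')*'')*+)%.*$';
C = deblank(regexprep(str, pattern, '$1'))
```

The possessive quantifier is the important design choice: with a plain * the engine could backtrack, reinterpret a transpose apostrophe as the start of a string, and thereby expose a percent sign inside a later string as a "comment".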

Unfortunately, I've used up all the time I can spend thinking about this and need to move on to other things, so perhaps someone with more time would be kind enough to think about these questions:

  1. Are comment characters inside strings the only exception I need to look out for?
  2. What is the correct and/or more efficient way to do this?

1 Reply

How do you feel about using undocumented features? If you don't object, you can use the mtree function to parse the code and strip the comments. No regexps involved, and we all know that we shouldn't try to parse context-free grammars using regular expressions.

This function is a full parser of MATLAB code written in pure M-code. As far as I can tell, it is an experimental implementation, but it's already used by MathWorks in a few places (it is the same function MATLAB Cody and the MATLAB Contests use to measure code length), and it can be used for other useful things.

If the input is a cell array of strings, we do:

>> str = {..};
>> C = deblank(cellfun(@(s) tree2str(mtree(s)), str, 'UniformOutput',false))
C = 
    'myFun( { 'test', '%' } );'
    'sprintf( str, '%*8.0f%*s%c%3d\n' );'
    'sprintf( str, '%*8.0f%*s%c%3d\n' );'
    'sprintf( str, '%*8.0f%*s%c%3d\n' );'
    'A = A.';'

If you already have an M-file stored on disk, you can strip the comments simply as:

s = tree2str(mtree('myfile.m', '-file'))

If you want the comments back, add the '-comments' option: mtree(.., '-comments')

