For readers not familiar with MATLAB: I'm not sure which regex family they belong to, but MATLAB's regexes are described here in full detail. MATLAB's comment character is % (percent) and its string delimiter is ' (apostrophe). A string delimiter inside a string is written as a doubled apostrophe ('this is how you write "it''s" in a string.'). To complicate matters further, the matrix transpose operators are also apostrophes: A' (Hermitian) or A.' (regular).
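A small illustration of these rules may help readers who don't use MATLAB. The variable names are made up for the example:

```matlab
s = 'it''s';        % the doubled apostrophe becomes one literal ', so numel(s) is 4
c = 'rate: 50%';    % a % inside a string is data, not the start of a comment
A = [1+2i, 3-4i];
H = A';             % Hermitian (conjugate) transpose: [1-2i; 3+4i]
T = A.';            % regular transpose: [1+2i; 3-4i]
```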
Now, for dark reasons (that I will not elaborate on :), I'm trying to interpret MATLAB code in MATLAB's own language.
Currently I'm trying to remove all trailing comments in a cell-array of strings, each containing a line of MATLAB code. At first glance, this might seem simple:
>> str = 'simpleCommand(); % simple trailing comment';
>> regexprep(str, '%.*$', '')
ans =
simpleCommand();
But of course, something like this might come along:
>> str = ' fprintf(''%d%*c%3.0f\n'', value, args{:}); % Let''s do this! ';
>> regexprep(str, '%.*$', '')
ans =
fprintf(' %// <-- WRONG!
Obviously, we need to exclude from the match all comment characters that reside inside strings, while also taking into account that a single apostrophe (or a dot-apostrophe) directly following a statement is an operator, not a string delimiter.
Based on the assumption that the number of string opening/closing characters before the comment character must be even (which I know is incomplete, because of the matrix-transpose operator), I conjured up the following dynamic regex to handle this sort of case:
>> str = {
'myFun( {''test'' ''%''}); % let''s '
'sprintf(str, ''%*8.0f%*s%c%3d\n''); % it''s '
'sprintf(str, ''%*8.0f%*s%c%3d\n''); % let''s '
'sprintf(str, ''%*8.0f%*s%c%3d\n''); '
'A = A.'';%tight trailing comment'
};
>>
>> C = regexprep(str, '(^.*)(?@mod(sum(1==''''''''),2)==0;)(%.*$)', '$1')
However,
C =
'myFun( {'test' '%'}); ' %// success
'sprintf(str, '%*8.0f%*s%c%3d\n'); ' %// success
'sprintf(str, '%*8.0f%*s%c%3d\n'); ' %// success
'sprintf(str, '%*8.0f%*s%c' %// FAIL
'A = A.';' %// success (although I'm not sure why)
so I'm almost there, but not quite yet :)
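To make the even-count assumption concrete, here is a minimal sketch of it written without a dynamic regex. The function name firstCommentByParity is hypothetical, and this only illustrates the heuristic, including its known weakness with transposes:

```matlab
function idx = firstCommentByParity(txt)
% Index of the first % that is preceded by an even number of apostrophes,
% or empty if no % qualifies. Illustration of the parity heuristic only.
    pct = find(txt == '%');                                  % all % positions
    nQuotes = arrayfun(@(p) sum(txt(1:p-1) == ''''), pct);   % apostrophes before each %
    idx = pct(find(mod(nQuotes, 2) == 0, 1));                % first one with an even count
end
```

For 'myFun( {''test'' ''%''}); % let''s ' this picks the second %, which is the real comment. For 'A = A.'';%tight trailing comment' it returns empty, because the transpose contributes one unpaired apostrophe, so the comment is missed.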
Unfortunately I've exhausted the amount of time I can spend thinking about this and need to continue with other things, so perhaps someone else who has more time is friendly enough to think about these questions:
- Are comment characters inside strings the only exception I need to look out for?
- What is the correct and/or more efficient way to do this?
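One possible direction for the second question is to drop the regex and scan each line once from left to right, tracking whether the current position is inside a string. The following is only a sketch under my own assumptions, not a complete MATLAB lexer: the function name stripTrailingComment is hypothetical, only single-quoted strings are handled, and block comments, ... line continuations and command syntax are ignored. An apostrophe is treated as a transpose when it directly follows a letter, digit, underscore, closing bracket, dot or another apostrophe; otherwise it opens a string.

```matlab
function out = stripTrailingComment(txt)
% Sketch: remove a trailing % comment from one line of MATLAB code.
% Assumes single-quoted strings only; no block comments, no "..."
% continuations, no command syntax.
    out = txt;
    inString = false;
    n = numel(txt);
    k = 1;
    while k <= n
        c = txt(k);
        if inString
            if c == ''''
                if k < n && txt(k+1) == ''''
                    k = k + 1;            % doubled apostrophe: stay inside the string
                else
                    inString = false;     % closing delimiter
                end
            end
        elseif c == '%'
            out = txt(1:k-1);             % first % outside a string starts the comment
            return
        elseif c == ''''
            % Decide between transpose operator and string opener by
            % looking at the character immediately before the apostrophe.
            isTranspose = false;
            if k > 1
                p = txt(k-1);
                isTranspose = isstrprop(p, 'alphanum') || any(p == '_)]}.''');
            end
            inString = ~isTranspose;
        end
        k = k + 1;
    end
end
```

On the cell array from above this could be applied with cellfun(@stripTrailingComment, str, 'UniformOutput', false). Like the regexprep version, it leaves any whitespace before the comment in place.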