I have to handle an existing custom markup language (which is ugly, but unfortunately cannot be altered because I'm handling legacy data and it needs to stay compatible with a legacy app).
I need to parse command "ranges", and depending on the action taken by the user either replace these "ranges" in the data with something else (HTML or LaTeX code) or entirely remove these "ranges" from the input.
My current solution uses preg_replace_callback() in a loop until there are no matches left, but it is extremely slow for large documents (e.g. ~7 seconds for 394 replacements in a 57 KB document).
Recursive regular expressions don't seem to be flexible enough for this task, as I need to access all matches, including those found during recursion.
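To illustrate that limitation, here is a minimal sketch (not my real pattern: the simplified tags, `$sample` and `$recursive` are made up for this example). A pattern that recurses via `(?R)` does match nested ranges, but preg_match_all() only reports the outermost one, so the inner range never reaches a callback on its own:

```php
<?php
// Hypothetical, simplified input with one range nested inside another.
$sample = "a begin-command b begin-command c command-end d command-end e";

// A range is: opening tag, then any mix of plain characters and nested
// ranges (matched by recursing into the whole pattern), then closing tag.
$recursive = '~begin-command((?:(?!begin-command|command-end).|(?R))*)command-end~s';

preg_match_all($recursive, $sample, $m);

// Only the outer range shows up as a match; the inner one is just part
// of its captured content.
var_dump(count($m[0]));
var_dump($m[1][0]);
```

The inner range is consumed by the recursion, so it would have to be matched again in a second pass over the captured content.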
Question: How could I improve the performance of my parsing?
Regular expressions may be removed completely. They are not a requirement, just the only thing I could come up with.
Note: The code example below is heavily reduced (SSCCE). In reality there are many different "types" of ranges, and the closure does different things depending on the mode of operation (insert values from the DB, remove entire ranges, convert to another format, etc.). Please keep this in mind!
Example of what I'm currently doing:
<?php
$data = <<<EOF
some text 1
begin-command
some text 2
begin-command
some text 3
command-end
some text 4
begin-command-if "%VAR%" == "value"
some text 5
begin-command
some text 6
command-end
command-end
command-end
EOF;
$regex = '~
# opening tag
begin-(?P<type>command(?:-if)?)
# must not contain a nested "command" or "command-if" command!
(?!.*begin-command(?:-if)?.*command(?:-if)?-end)
# the parameters for "command-if" are optional
(?:
[\s\n]*?
(?:")[\s\n]*(?P<leftvalue>[^\\\\]*?)[\s\n]*(?:")
[\s\n]*
# the operator is optional
(?P<operator>[=<>!]*)
[\s\n]*
(?:")[\s\n]*(?P<rightvalue>[^\\\\]*?)[\s\n]*(?:")
[\s\n]*?
)?
# the real content
(?P<content>.*?)
# closing tag
command(?:-if)?-end
~smx';
$counter = 0;
$loop_replace = true;
while ($loop_replace) {
// bind $counter by reference so the closure can increment it (no "global" needed)
$data = preg_replace_callback($regex, function ($matches) use (&$counter) {
$counter++;
return "<command id='{$counter}'>{$matches['content']}</command>";
}, $data, -1, $loop_replace);
}
echo $data;
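One direction that avoids the repeated regex passes, as a rough sketch only and not a drop-in replacement: split the input once at every tag and keep a stack of open ranges, so each range is handled exactly once when its closing tag is reached. The function `replaceRanges()`, the frame array layout and the `$render` callback are hypothetical names introduced here. The sketch also assumes that the "-if" parameters sit on the same line as the opening tag, and it does not check that a `command-if-end` really closes a `command-if`.

```php
<?php
// Sketch only: replaceRanges() and the frame layout are hypothetical names,
// not part of the legacy application.
function replaceRanges(string $data, callable $render): string
{
    // One pass over the input: split at every opening or closing tag and
    // keep the tags themselves as tokens (odd indexes of the result).
    $tokens = preg_split(
        '~(begin-command(?:-if(?:[ \t]*"[^"]*"[ \t]*[=<>!]*[ \t]*"[^"]*")?)?|command(?:-if)?-end)~',
        $data,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    // The bottom frame collects the top-level output.
    $stack = [['type' => null, 'args' => '', 'raw' => '', 'content' => '']];

    foreach ($tokens as $i => $token) {
        $top = count($stack) - 1;
        if ($i % 2 === 0) {
            // Plain text belongs to the innermost open range.
            $stack[$top]['content'] .= $token;
        } elseif (strpos($token, 'begin-') === 0) {
            // Opening tag: start a new frame.
            $isIf = strpos($token, 'begin-command-if') === 0;
            $stack[] = [
                'type' => $isIf ? 'command-if' : 'command',
                'args' => $isIf ? trim(substr($token, strlen('begin-command-if'))) : '',
                'raw' => $token,
                'content' => '',
            ];
        } elseif ($top === 0) {
            // Closing tag without an opening tag: keep it as plain text.
            $stack[0]['content'] .= $token;
        } else {
            // Closing tag: render the finished range into its parent.
            $frame = array_pop($stack);
            $stack[$top - 1]['content'] .= $render($frame);
        }
    }

    // Unclosed ranges are written back unchanged.
    while (count($stack) > 1) {
        $frame = array_pop($stack);
        $stack[count($stack) - 1]['content'] .= $frame['raw'] . $frame['content'];
    }

    return $stack[0]['content'];
}

$counter = 0;
echo replaceRanges(
    "a begin-command b begin-command c command-end d command-end e",
    function (array $range) use (&$counter) {
        $counter++;
        return "<command id='{$counter}'>{$range['content']}</command>";
    }
);
```

The callback receives the range type, the raw parameter string and the already processed inner content, so the different modes of operation could live in different callbacks. Note that ids are assigned in closing order (inner ranges first), which may not match the numbering produced by the loop above.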