don't use regexps for parsing formal languages - you'll always run into haystacks you did not anticipate. like:
<?
$bla = '?> now what? <?';
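to make the problem concrete, here is a small sketch of what a naive regex does to that snippet. the pattern is only an assumption about what such a regex might look like (rewrite every `<?` that is not already followed by `php` or `=`), not one taken from the question:

```php
<?php
// the snippet from above, held as a plain string (nowdoc, so nothing is interpolated)
$source = <<<'PHP'
<?
$bla = '?> now what? <?';
PHP;

// hypothetical naive approach: rewrite every "<?" not followed by "php" or "="
$naive = preg_replace('/<\?(?!php|=)/', '<?php', $source);

echo $naive, "\n";
// prints:
// <?php
// $bla = '?> now what? <?php';
```

the open tag is fixed, but the string literal has been changed as well, so the rewritten program no longer does the same thing as the original.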
it's safer to use a processor that knows about the structure of the language. for html, that would be an xml processor; for php, the built-in tokenizer extension. it has the `T_OPEN_TAG` parser token, which matches `<?php`, `<?` or `<%`, and `T_OPEN_TAG_WITH_ECHO`, which matches `<?=` or `<%=`. to replace all short open tags, you find all these tokens and replace `T_OPEN_TAG` with `<?php` and `T_OPEN_TAG_WITH_ECHO` with `<?php echo`.
the implementation is left as an exercise for the reader :)
EDIT 1: ringmaster was kind enough to provide one.
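for a rough idea of what such a script boils down to, here is a minimal sketch of the token replacement described above. it is not ringmaster's implementation; the function name `replace_short_open_tags` is made up for illustration, and the code assumes php 7.1 or later with the tokenizer extension loaded:

```php
<?php
// hypothetical helper: rewrites open tags token by token, leaving everything else alone
function replace_short_open_tags(string $source): string
{
    $tokens = token_get_all($source);
    $out = '';
    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            // single-character tokens such as ';' or '=' come back as plain strings
            $out .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id !== T_OPEN_TAG && $id !== T_OPEN_TAG_WITH_ECHO) {
            // string literals, comments, inline html etc. are copied verbatim
            $out .= $text;
            continue;
        }
        // "<?php\n" carries its trailing whitespace inside the token; "<?" and "<?=" do not
        $trailing = (string) substr($text, strlen(rtrim($text)));
        if ($trailing === '') {
            // add a space only if the next token is not already whitespace
            $next = $tokens[$i + 1] ?? null;
            $nextIsSpace = is_array($next) && $next[0] === T_WHITESPACE;
            $trailing = $nextIsSpace ? '' : ' ';
        }
        $out .= ($id === T_OPEN_TAG ? '<?php' : '<?php echo') . $trailing;
    }
    return $out;
}

echo replace_short_open_tags('<?= $x ?>'), "\n"; // prints: <?php echo $x ?>
```

because the `<?` inside a string literal arrives as part of a string token rather than as `T_OPEN_TAG`, it is copied through unchanged, which is exactly what the regex approach cannot guarantee.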
EDIT 2: on systems with `short_open_tag` turned off in `php.ini`, `<?`, `<%`, and `<?=` won't be recognized by a replacement script. to make the script work on such systems, enable `short_open_tag` via command line option:
php -d short_open_tag=On short_open_tag_replacement_script.php
p.s. the man page for `token_get_all()` and googling for creative combinations of tokenizer, token_get_all, and the parser token names might help.
p.p.s. see also "Regex to parse define() contents, possible?" here on SO