Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - Converting indentation with preg_replace (no callback)

I have an XML chunk returned by DOMDocument::saveXML(). It's already nicely indented, with two spaces per level, like so:

<?xml version="1.0"?>
<root>
  <error>
    <a>eee</a>
    <b>sd</b>
  </error>
</root>
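For context, output shaped like the sample above can be produced by a DOMDocument with formatOutput enabled. libxml2, which DOMDocument uses for serialization, indents each level with two spaces. This is a minimal sketch that assumes the dom extension is loaded. The input markup is made up to match the sample:

```php
<?php
// Minimal sketch: produce two-space indented XML with DOMDocument.
// Assumes the dom extension is available (enabled by default in most builds).
$doc = new DOMDocument();
// Drop whitespace-only text nodes so formatOutput can re-indent from scratch.
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
// Hypothetical input, chosen to reproduce the sample above.
$doc->loadXML('<root><error><a>eee</a><b>sd</b></error></root>');

$xml_string = $doc->saveXML();
echo $xml_string;
```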

As far as I know, DOMDocument can't be configured to use a different indentation character, so I thought I could run a regular expression that replaces every pair of leading spaces with a tab. This can be done with a callback function (Demo):

$xml_string = $doc->saveXML();
function callback($m)
{
    $spaces = strlen($m[0]);
    $tabs = $spaces / 2;
    return str_repeat("\t", $tabs);
}
$xml_string = preg_replace_callback('/^(?:[ ]{2})+/um', 'callback', $xml_string);

I'm now wondering whether it's possible to do this without a callback function (and without the /e modifier, i.e. eval). Any regex wizards with an idea?


1 Reply


You can use \G:

preg_replace('/^  |\G  /m', "\t", $string);
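How this works: \G is an anchor that matches only where the previous match ended (or at the start of the subject on the first attempt). So ^ followed by two spaces grabs the first pair at a line start, and each further match of the \G alternative must begin exactly where the last one ended. The chain breaks at the first non-space character, so double spaces inside element content are left alone. A small sketch applying the pattern to a sample string; the helper name indent_to_tabs is hypothetical:

```php
<?php
// Hypothetical helper name; the pattern is the \G pattern from this answer.
function indent_to_tabs(string $xml): string
{
    // ^  : two spaces at the start of a line (/m makes ^ match after each newline)
    // \G : two spaces starting exactly where the previous match ended, so the
    //      replacement only continues through the run of leading space pairs
    return preg_replace('/^  |\G  /m', "\t", $xml);
}

// The interior "e  e" must survive; only leading indentation is converted.
$in = "<root>\n  <error>\n    <a>e  e</a>\n  </error>\n</root>";
$out = indent_to_tabs($in);
echo $out;
```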

I ran some benchmarks and got the following results on Win32 with PHP 5.2 and 5.4:

>php -v
PHP 5.2.17 (cli) (built: Jan  6 2011 17:28:41)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2010 Zend Technologies

>php -n test.php
XML length: 21100
Iterations: 1000
callback: 2.3627231121063
\G:       1.4221360683441
while:    3.0971200466156
/e:       7.8781840801239


>php -v
PHP 5.4.0 (cli) (built: Feb 29 2012 19:06:50)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies

>php -n test.php
XML length: 21100
Iterations: 1000
callback: 1.3771259784698
\G:       1.4414191246033
while:    2.7389969825745
/e:       5.5516891479492

Surprisingly, the callback is faster than \G in PHP 5.4 (although this seems to depend on the data; \G is faster in some other cases).

For \G, the pattern /^  |\G  /m is used; it is a bit faster than /(?:^|\G)  /m, and /(?>^|\G)  /m is even slower than /(?:^|\G)  /m. The /u, /S and /X modifiers didn't noticeably affect \G performance.
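The timing differences above are between spellings that should produce the same result. A quick sketch checking that the three \G variants agree on a made-up sample with an odd indentation width and interior double spaces:

```php
<?php
// Made-up sample: 2, 5 (odd) and 8 leading spaces, plus interior double spaces.
$sample = "<root>\n  <a>  x  </a>\n     <b>odd</b>\n        <c/>\n</root>\n";

$patterns = [
    'alternation'         => '/^  |\G  /m',
    'non-capturing group' => '/(?:^|\G)  /m',
    'atomic group'        => '/(?>^|\G)  /m',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    $results[$name] = preg_replace($pattern, "\t", $sample);
}

// All three variants should yield the same string.
var_dump(count(array_unique($results)) === 1);
// Show one result with tabs made visible; the odd fifth space is kept.
echo str_replace("\t", '\t', $results['alternation']);
```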

The while-loop replacement is fastest when the nesting depth is low (up to about 4 indentation levels, i.e. 8 spaces, in my test), but it gets slower as the depth increases.
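Why the while approach degrades with depth: ^ can match only once per line, so each preg_replace() pass with ^(\t*)[ ]{2} converts at most one pair of spaces per line, and a line indented n levels needs n passes over the whole string (plus a preg_match() per pass). A sketch that counts the passes; the function name and the pass counter are hypothetical additions, not part of the benchmark:

```php
<?php
// Same pattern as in conv_indent_while(): leading tabs captured, then one space pair.
const LEADING_PAIR = '/^(\t*)[ ]{2}/m';

// Hypothetical variant of conv_indent_while() that also returns the pass count.
function conv_indent_while_counted(string $s): array
{
    $passes = 0;
    while (preg_match(LEADING_PAIR, $s)) {
        // Each pass turns at most one space pair per line into a tab.
        $s = preg_replace(LEADING_PAIR, "$1\t", $s);
        $passes++;
    }
    return [$s, $passes];
}

[$shallow, $p1] = conv_indent_while_counted("<a>\n  <b/>\n</a>");
[$deep, $p10] = conv_indent_while_counted("<a>\n" . str_repeat('  ', 10) . "<b/>\n</a>");
echo "depth 1: $p1 pass(es), depth 10: $p10 pass(es)\n";
```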

The following code was used:

<?php

$base_iter = 1000;

$xml_string = str_repeat(<<<_STR_
<?xml version="1.0"?>
<root>
  <error>
    <a>  eee  </a>
    <b>  sd    </b>         
    <c>
            deep
                deeper  still
                    deepest  !
    </c>
  </error>
</root>
_STR_
, 100);


//*** while ***

$re = '%# Match leading spaces following leading tabs.
    ^                     # Anchor to start of line.
    (\t*)                 # $1: Preserve any/all leading tabs.
    [ ]{2}                # Match one pair of spaces.
    %mx';

function conv_indent_while($xml_string) {
    global $re;

    while(preg_match($re, $xml_string))
        $xml_string = preg_replace($re, "$1\t", $xml_string);

    return $xml_string;
}


//*** G ****

function conv_indent_g($string){
    return preg_replace('/^  |\G  /m', "\t", $string);
}


//*** callback ***

function callback($m)
{
    $spaces = strlen($m[0]);
    $tabs = $spaces / 2;
    return str_repeat("\t", $tabs);
}
function conv_indent_callback($str){
    return preg_replace_callback('/^(?:[ ]{2})+/m', 'callback', $str);
}


//*** callback /e *** 

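// Note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0,
// so this variant only runs on PHP versions older than 7.
function conv_indent_e($str){
    return preg_replace('/^(?:  )+/me', 'str_repeat("\t", strlen("$0")/2)', $str);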
}



//*** tests

function test2() {
    global $base_iter;
    global $xml_string;
    $t = microtime(true);

    for($i = 0; $i < $base_iter; ++$i){
        $s = conv_indent_while($xml_string);
        if(strlen($s) >= strlen($xml_string))
            exit("strlen invalid 2");
    }

    return (microtime(true) - $t);
}

function test1() {
    global $base_iter;
    global $xml_string;
    $t = microtime(true);

    for($i = 0; $i < $base_iter; ++$i){
        $s = conv_indent_g($xml_string);
        if(strlen($s) >= strlen($xml_string))
            exit("strlen invalid 1");
    }

    return (microtime(true) - $t);
}

function test0(){
    global $base_iter;
    global $xml_string;
    $t = microtime(true);

    for($i = 0; $i < $base_iter; ++$i){     
        $s = conv_indent_callback($xml_string);
        if(strlen($s) >= strlen($xml_string))
            exit("strlen invalid 0");
    }

    return (microtime(true) - $t);
}


function test3(){
    global $base_iter;
    global $xml_string;
    $t = microtime(true);

    for($i = 0; $i < $base_iter; ++$i){     
        $s = conv_indent_e($xml_string);
        if(strlen($s) >= strlen($xml_string))
            exit("strlen invalid 3");
    }

    return (microtime(true) - $t);
}



echo 'XML length: ' . strlen($xml_string) . "\n";
echo 'Iterations: ' . $base_iter . "\n";

echo 'callback: ' . test0() . "\n";
echo '\G:       ' . test1() . "\n";
echo 'while:    ' . test2() . "\n";
echo '/e:       ' . test3() . "\n";


?>
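The benchmark above only checks that each result is shorter than the input. A sketch of a tighter harness that first verifies the converters produce identical output and then times them with one shared loop. The helper time_it and the sample data are hypothetical, and the /e variant is left out because that modifier was removed in PHP 7.0:

```php
<?php
function conv_indent_g(string $s): string
{
    return preg_replace('/^  |\G  /m', "\t", $s);
}

function conv_indent_callback(string $s): string
{
    return preg_replace_callback('/^(?:[ ]{2})+/m', function (array $m): string {
        // One tab per pair of leading spaces.
        return str_repeat("\t", intdiv(strlen($m[0]), 2));
    }, $s);
}

function conv_indent_while(string $s): string
{
    $re = '/^(\t*)[ ]{2}/m';
    while (preg_match($re, $s)) {
        $s = preg_replace($re, "$1\t", $s);
    }
    return $s;
}

// Hypothetical timing helper: runs $fn $iterations times, returns elapsed seconds.
function time_it(callable $fn, string $input, int $iterations): float
{
    $t = microtime(true);
    for ($i = 0; $i < $iterations; ++$i) {
        $fn($input);
    }
    return microtime(true) - $t;
}

$xml = str_repeat("<root>\n  <error>\n    <a>  eee  </a>\n  </error>\n</root>\n", 100);
$converters = [
    'callback' => 'conv_indent_callback',
    '\G'       => 'conv_indent_g',
    'while'    => 'conv_indent_while',
];

// Timings only mean something if every converter yields the same string.
$outputs = array_map(function (string $fn) use ($xml): string {
    return $fn($xml);
}, $converters);
if (count(array_unique($outputs)) !== 1) {
    exit("converters disagree\n");
}

foreach ($converters as $label => $fn) {
    printf("%-9s %.4f\n", $label . ':', time_it($fn, $xml, 100));
}
```

Timings will vary by machine and PHP version, so no numbers are claimed here.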
