Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Regular expression to partially extract PHP code (array definition)

I have PHP code (an array definition) stored in a string like this:

$code=' array(

  0  => "a",
 "a" => $GlobalScopeVar,
 "b" => array("nested"=>array(1,2,3)),  
 "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },

); ';

Is there a regular expression that can extract this array? I mean, I want something like:

$array=array(

  0  => '"a"',
 'a' => '$GlobalScopeVar',
 'b' => 'array("nested"=>array(1,2,3))',
 'c' => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',

);

P.S.: I did some research trying to find such a regular expression, but found nothing.
P.S. 2: Gods of Stack Overflow, as soon as I can put a bounty on this I will offer 400 :3
P.S. 3: This will be used in an internal app, where I need to extract an array from a PHP file so it can be 'processed' in parts. I try to explain it with this: codepad.org/td6LVVme


1 Reply

Regex

So here's the MEGA regex I came up with:

\s*                                     # white spaces
########################## KEYS START ##########################
(?:                                     # We'll use this to make keys optional
(?P<keys>                               # named group: keys
\d+                                     # match digits
|                                       # or
"(?(?=\\")..|[^"])*"                    # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
'(?(?=\\')..|[^'])*'                    # match string between '', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
)                                       # close group: keys
########################## KEYS END ##########################
\s*                                     # white spaces
=>                                      # match =>
)?                                      # make keys optional
\s*                                     # white spaces
########################## VALUES START ##########################
(?P<values>                             # named group: values
\d+                                     # match digits
|                                       # or
"(?(?=\\")..|[^"])*"                    # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
'(?(?=\\')..|[^'])*'                    # match string between '', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
|                                       # or
array\s*\((?:[^()]|(?R))*\)             # match an array()
|                                       # or
\[(?:[^[\]]|(?R))*\]                    # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
|                                       # or
(?:function\s+)?\w+\s*                  # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\))                 # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\}                     # match { whatever}
)?;?                                    # match ; (optionally)
)                                       # close group: values
########################## VALUES END ##########################
\s*                                     # white spaces

I've added some comments. Note that you need to use 3 modifiers:
x : lets me put whitespace and comments in the pattern
s : makes the dot match newlines too
i : matches case-insensitively
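
To make the effect of the three modifiers concrete, here is a minimal sketch. The pattern and the input are made up for illustration and are not part of the answer's regex.

```php
<?php
// Hypothetical pattern and input, only to illustrate the x, s and i modifiers.
$pattern = '~
    begin   # literal text; the i modifier also accepts BEGIN or Begin
    (.*)    # the s modifier lets the dot run across line breaks
    end     # whitespace and comments like this one are ignored thanks to x
~xsi';

$input = "BEGIN first line\nsecond line END";
$found = preg_match($pattern, $input, $m);

var_dump($found); // 1 when the pattern matched
var_dump($m[1]);  // the text between the two markers, newline included
```

Without s the dot would stop at the line break, and without i the upper-case markers would not match.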

PHP

$code='array(0  => "a", 123 => 123, $_POST["hello"][\'world\'] => array("is", "actually", "An array !"), 1234, \'got problem ?\', 
 "a" => $GlobalScopeVar, $test_further => function test($noway){echo "this works too !!!";}, "yellow" => "blue",
 "b" => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3)), "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
  "bug", "fixed", "mwahahahaa" => "Yeaaaah", "this is" => "a test"
);'; // Sample data (single quotes inside the string must be escaped)

$code = preg_replace('#(^\s*array\s*\(\s*)|(\s*\)\s*;?\s*$)#s', '', $code); // Just to get rid of array( at the beginning, and ); at the end

preg_match_all('~
\s*                                     # white spaces
########################## KEYS START ##########################
(?:                                     # We\'ll use this to make keys optional
(?P<keys>                               # named group: keys
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
)                                       # close group: keys
########################## KEYS END ##########################
\s*                                     # white spaces
=>                                      # match =>
)?                                      # make keys optional
\s*                                     # white spaces
########################## VALUES START ##########################
(?P<values>                             # named group: values
\d+                                     # match digits
|                                       # or
"(?(?=\\\\")..|[^"])*"                  # match string between "", works even 4 escaped ones "hello \" world"
|                                       # or
\'(?(?=\\\\\')..|[^\'])*\'              # match string between \'\', same as above :p
|                                       # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])*          # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
|                                       # or
array\s*\((?:[^()]|(?R))*\)             # match an array()
|                                       # or
\[(?:[^[\]]|(?R))*\]                    # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
|                                       # or
(?:function\s+)?\w+\s*                  # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\))                 # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\}                     # match { whatever}
)?;?                                    # match ; (optionally)
)                                       # close group: values
########################## VALUES END ##########################
\s*                                     # white spaces
~xsi', $code, $m); // Matching :p

print_r($m['keys']); // Print keys
print_r($m['values']); // Print values


// Some keys are empty when the array did not specify them, so let's fill them in!
foreach($m['keys'] as $index => &$key){
    if($key === ''){
        $key = 'made_up_index_'.$index;
    }
}
unset($key); // break the reference left over from the foreach
$results = array_combine($m['keys'], $m['values']);
print_r($results); // printing results

Output

Array
(
    [0] => 0
    [1] => 123
    [2] => $_POST["hello"]['world']
    [3] => 
    [4] => 
    [5] => "a"
    [6] => $test_further
    [7] => "yellow"
    [8] => "b"
    [9] => "c"
    [10] => 
    [11] => 
    [12] => "mwahahahaa"
    [13] => "this is"
)
Array
(
    [0] => "a"
    [1] => 123
    [2] => array("is", "actually", "An array !")
    [3] => 1234
    [4] => 'got problem ?'
    [5] => $GlobalScopeVar
    [6] => function test($noway){echo "this works too !!!";}
    [7] => "blue"
    [8] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
    [9] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
    [10] => "bug"
    [11] => "fixed"
    [12] => "Yeaaaah"
    [13] => "a test"
)
Array
(
    [0] => "a"
    [123] => 123
    [$_POST["hello"]['world']] => array("is", "actually", "An array !")
    [made_up_index_3] => 1234
    [made_up_index_4] => 'got problem ?'
    ["a"] => $GlobalScopeVar
    [$test_further] => function test($noway){echo "this works too !!!";}
    ["yellow"] => "blue"
    ["b"] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
    ["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
    [made_up_index_10] => "bug"
    [made_up_index_11] => "fixed"
    ["mwahahahaa"] => "Yeaaaah"
    ["this is"] => "a test"
)


Known bug (fixed)

    $code='array("aaa", "sdsd" => "dsdsd");'; // fails
    $code='array(\'aaa\', \'sdsd\' => "dsdsd");'; // fails
    $code='array("aaa", \'sdsd\' => "dsdsd");'; // succeeds
    // In other words: when a value without a key is followed
    // by a key => value pair and both use the same kind of quote,
    // the match fails (the first value gets merged with the key)


Credits

Credit goes to Bart Kiers for his recursive pattern to match nested brackets.
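
For reference, the recursive idea can be tried on its own. This is a minimal sketch with a made-up subject string; it shows only the balanced-parentheses building block, not the full regex above.

```php
<?php
// An opening parenthesis, then any mix of non-parenthesis characters or a
// recursion into the whole pattern via (?R), then a closing parenthesis.
$balanced = '~\((?:[^()]|(?R))*\)~';

// Hypothetical subject, chosen only to show nesting.
$subject = 'array(1, array(2, 3), foo(bar(4))) trailing (text)';
preg_match_all($balanced, $subject, $matches);

print_r($matches[0]); // each entry is one balanced (...) group
```

Each match starts at an opening parenthesis and only ends once every nested pair inside it has been closed.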

Advice

You should probably go with a parser instead, since regexes are fragile for this kind of input. @bwoebi has done a great job in his answer.
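
As a rough illustration of the parser route, the sketch below uses PHP's built-in tokenizer (token_get_all) to split the top-level elements of the array source. It is not @bwoebi's code: splitArraySource() is a hypothetical helper written for this example. It assumes the string holds a single array literal, and it would mistake a top-level arrow function (fn() => ...) for a key.

```php
<?php
// Hypothetical helper: returns a list of ['key' => ?string, 'value' => string]
// pairs holding the raw source text of each top-level array element.
function splitArraySource(string $source): array
{
    $tokens = token_get_all('<?php ' . $source);
    array_shift($tokens); // drop the T_OPEN_TAG token added above

    $items = [];
    $key = null;   // source text of the current key, null when there is none
    $buffer = '';  // source text collected for the current key or value
    $depth = 0;    // nesting level of (), [] and {}

    foreach ($tokens as $token) {
        $id = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        // "{" inside interpolated strings arrives as a named token.
        $opens = $id === null
            ? in_array($text, ['(', '[', '{'], true)
            : in_array($id, [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true);
        $closes = $id === null && in_array($text, [')', ']', '}'], true);

        if ($opens && $depth++ === 0) {
            $buffer = ''; // discard the leading "array" and start collecting
            continue;
        }
        if ($closes && --$depth === 0) {
            break; // closing bracket of the outer array
        }
        if ($depth === 1 && $id === T_DOUBLE_ARROW) {
            $key = trim($buffer); // everything before => is the key
            $buffer = '';
            continue;
        }
        if ($depth === 1 && $id === null && $text === ',') {
            if (trim($buffer) !== '') {
                $items[] = ['key' => $key, 'value' => trim($buffer)];
            }
            $key = null;
            $buffer = '';
            continue;
        }
        $buffer .= $text;
    }
    if (trim($buffer) !== '') { // last element when there is no trailing comma
        $items[] = ['key' => $key, 'value' => trim($buffer)];
    }
    return $items;
}

// The array definition from the question.
$code = ' array(
  0  => "a",
 "a" => $GlobalScopeVar,
 "b" => array("nested"=>array(1,2,3)),
 "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';

$items = splitArraySource($code);
print_r($items);
```

Because the tokenizer already knows where strings, comments and brackets start and end, the helper only has to track nesting depth and never has to deal with quotes or escapes itself.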

