Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

php - Replacing variables in a string

I am working on a multilingual website in PHP, and my language files often contain strings with multiple variables that are filled in later to complete the sentences.

Currently I am placing {VAR_NAME} in the string and manually replacing each occurrence with its matching value when it is used.

So basically :

{X} created a thread on {Y}

becomes :

Dany created a thread on Stack Overflow

I have already thought of sprintf, but I find it inconvenient because it depends on the order of the variables, which can change from one language to another.

I have also checked "How replace variable in string with value in php?" and for now I basically use that method.

But I am interested in knowing whether there is a convenient way in PHP (built-in or not) to do this, considering that I already have variables named exactly X and Y as in the previous example, something like $$ for a variable variable.

So instead of doing str_replace on the string I would maybe call a function like so:

$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';

echo parse($lang['example']);

would also print out :

Dany created a thread on Stack Overflow

Thanks!

Edit

The strings serve as templates and can be used multiple times with different inputs.

So basically doing "{$X} ... {$Y}" won't do the trick, because I would lose the template: the string would be initialized with the starting values of $X and $Y, which aren't yet determined.


1 Reply


I'm going to add an answer here because none of the current answers really cut the mustard in my view. I'll dive straight in and show you the code I would use to do this:

function parse(
    /* string */ $subject,
    array        $variables,
    /* string */ $escapeChar = '@',
    /* string */ $errPlaceholder = null
) {
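    // Quote the escape sequence for use in the pattern, including the "/" delimiter
    $esc = preg_quote($escapeChar, '/');
    $expr = "/
        {$esc}{$esc}(?=(?:{$esc})*+\\{)
      | {$esc}\\{
      | \\{(\\w+)\\}
    /x";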

    $callback = function($match) use($variables, $escapeChar, $errPlaceholder) {
        switch ($match[0]) {
            case $escapeChar . $escapeChar:
                return $escapeChar;

            case $escapeChar . '{':
                return '{';

            default:
                if (isset($variables[$match[1]])) {
                    return $variables[$match[1]];
                }

                return isset($errPlaceholder) ? $errPlaceholder : $match[0];
        }
    };

    return preg_replace_callback($expr, $callback, $subject);
}

What does that do?

In a nutshell:

  • Create a regular expression using the specified escape character that will match one of three sequences (more on that below)
  • Feed that into preg_replace_callback(), where the callback handles two of those sequences exactly and treats everything else as a replacement operation.
  • Return the resulting string

The regex

The regex matches any one of these three sequences:

  • Two occurrences of the escape character, followed by zero or more occurrences of the escape character, followed by an opening curly brace. Only the first two occurrences of the escape character are consumed. This is replaced by a single occurrence of the escape character.
  • A single occurrence of the escape character followed by an opening curly brace. This is replaced by a literal open curly brace.
  • An opening curly brace, followed by one or more Perl word characters (alphanumerics and the underscore character), followed by a closing curly brace. This is treated as a placeholder: the name between the braces is looked up in the $variables array. If it is found, the replacement value is returned; if not, the value of $errPlaceholder is returned. By default $errPlaceholder is null, which is treated as a special case: the original placeholder is returned (i.e. the string is not modified).
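
To see the three alternatives in isolation, here is a small sketch that runs the pattern against a sample string. It assumes the default escape character @ and writes the pattern out literally rather than building it from $escapeChar; the sample string is made up for illustration.

```php
<?php
// The pattern for the default escape character '@', written out by hand.
$expr = '/
    @@(?=@*+\{)   # doubled escape character directly in front of a brace
  | @\{           # escaped opening brace
  | \{(\w+)\}     # placeholder
/x';

// Hypothetical sample: a lone @, a placeholder, an escaped brace,
// and an escaped escape character followed by a placeholder.
preg_match_all($expr, 'a@b {X} @{Y} @@{Z}', $matches);

print_r($matches[0]);
// The lone @ in "a@b" is not matched; the matches are: {X}, @{, @@, {Z}
```

Note that the @ in "a@b" is left alone because it is not followed by an opening brace, and that "@{Y}" only yields the match "@{", so the Y} that follows is passed through untouched.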

Why is it better?

To understand why it's better, let's look at the replacement approaches taken by other answers. With one exception (the only failing of which is compatibility with PHP<5.4 and slightly non-obvious behaviour), these fall into two categories:

  • strtr() - this provides no mechanism for handling an escape character. What if your input string needs a literal {X} in it? strtr() does not account for this, and the {X} would be replaced with the value of $X.
  • str_replace() - this suffers from the same issue as strtr(), and another problem as well. When you call str_replace() with array arguments for search and replace, it behaves as if you had called it multiple times, once for each replacement pair. This means that if one of your replacement strings contains a value that appears later in the search array, you will end up substituting that as well.

To demonstrate this issue with str_replace(), consider the following code:

$pairs = array('A' => 'B', 'B' => 'C');
echo str_replace(array_keys($pairs), array_values($pairs), 'AB');

Now, you'd probably expect the output here to be BC, but it will actually be CC. This is because the first iteration replaced A with B, so in the second iteration the subject string was BB, and both of these occurrences of B were replaced with C.
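
For comparison, here is a minimal sketch that runs the same pairs through both functions. It only illustrates the chained-replacement behaviour; strtr() with an array does not re-scan text it has already replaced, although it still has no notion of an escape character.

```php
<?php
$pairs = array('A' => 'B', 'B' => 'C');

// str_replace() applies the pairs one after another, so the B produced
// by the first pair is picked up again by the second pair.
$chained = str_replace(array_keys($pairs), array_values($pairs), 'AB');

// strtr() with an array walks the subject once and never revisits
// text it has already substituted.
$singlePass = strtr('AB', $pairs);

echo $chained, "\n";    // CC
echo $singlePass, "\n"; // BC
```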

This issue also betrays a performance consideration that might not be immediately obvious. Because each pair is handled separately, the cost grows with the number of pairs: for each replacement pair the entire string is searched and that single replacement handled, so n pairs mean n passes over the subject. If you had a very large subject string and a lot of replacement pairs, that's a sizeable operation going on under the bonnet.

Arguably this performance consideration is a non-issue - you would need a very large string and a lot of replacement pairs before you got a meaningful slowdown, but it's still worth remembering. It's also worth remembering that regex has performance penalties of its own, so in general this consideration shouldn't be included in the decision-making process.

Instead we use preg_replace_callback(). This visits any given part of the string looking for matches exactly once, within the bounds of the supplied regular expression. I add this qualifier because if you write an expression that causes catastrophic backtracking then it will be considerably more than once, but in this case that shouldn't be a problem (to help avoid this I made the only repetition in the expression possessive).

We use preg_replace_callback() instead of preg_replace() to allow us to apply custom logic while looking for the replacement string.
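
As a rough illustration of the "exactly once" point, the sketch below uses a stripped-down callback (placeholders only, no escape handling, so it is not the parse() function above) with a replacement value that itself looks like a placeholder. The pairs are invented for the demonstration.

```php
<?php
// Hypothetical pairs: the value for A happens to look like a placeholder.
$pairs = array('A' => '{B}', 'B' => 'C');

$result = preg_replace_callback(
    '/\{(\w+)\}/',
    function ($match) use ($pairs) {
        // Unknown names are left exactly as they were found.
        return isset($pairs[$match[1]]) ? $pairs[$match[1]] : $match[0];
    },
    '{A}{B}'
);

echo $result, "\n"; // {B}C
```

The {B} that was inserted in place of {A} is not scanned again, so only the {B} that was present in the original subject becomes C.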

What this allows you to do

The original example from the question

$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';

echo parse($lang['example']);

This becomes:

$pairs = array(
    'X' => 'Dany',
    'Y' => 'Stack Overflow',
);

$lang['example'] = '{X} created a thread on {Y}';

echo parse($lang['example'], $pairs);
// Dany created a thread on Stack Overflow

Something more advanced

Now let's say we have:

$lang['example'] = '{X} created a thread on {Y} and it contained {X}';
// Dany created a thread on Stack Overflow and it contained Dany

...and we want the second {X} to appear literally in the resulting string. Using the default escape character of @, we would change it to:

$lang['example'] = '{X} created a thread on {Y} and it contained @{X}';
// Dany created a thread on Stack Overflow and it contained {X}

OK, looks good so far. But what if that @ was supposed to be a literal?

$lang['example'] = '{X} created a thread on {Y} and it contained @@{X}';
// Dany created a thread on Stack Overflow and it contained @Dany

Note that the regular expression has been designed to only pay attention to escape sequences that immediately precede an opening curly brace. This means that you don't need to escape the escape character unless it appears immediately in front of a placeholder.

A note about the use of an array as an argument

Your original code sample uses variables named the same way as the placeholders in the string. Mine uses an array with named keys. There are two very good reasons for this:

  1. Clarity and security - it's much easier to see what will end up being substituted, and you don't risk accidentally substituting variables you don't want to be exposed. It wouldn't be much good if someone could simply feed in {dbPass} and see your database password, now would it?
  2. Scope - it's not possible to import variables from the calling scope unless the caller is the global scope. This makes the function useless if called from another function, and importing data from another scope is very bad practice.

If you really want to use named variables from the current scope (and I do not recommend this due to the aforementioned security issues) you can pass the result of a call to get_defined_vars() to the second argument.
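
If the variables already exist under the right names, one middle ground (my suggestion, not something the question asked for) is to build the array with compact(), which only picks up the names you list. The sketch below contrasts it with get_defined_vars(); the function name describeThread and the $dbPass value are hypothetical.

```php
<?php
function describeThread()
{
    $X = 'Dany';
    $Y = 'Stack Overflow';
    $dbPass = 'hunter2'; // hypothetical secret that must never reach a template

    // Everything defined in this scope so far, secret included.
    $everything = get_defined_vars();

    // Only the variables that are named explicitly.
    $whitelist = compact('X', 'Y');

    return array(array_keys($everything), $whitelist);
}

list($allNames, $whitelist) = describeThread();

print_r($allNames);  // includes dbPass alongside X and Y
print_r($whitelist); // only X and Y
```

Either array can be passed as the second argument to parse(), but only the compact() version keeps {dbPass} from ever being substitutable.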

A note about choosing an escape character

You'll notice I chose @ as the default escape character. You can use any character (or sequence of characters, it can be more than one) by passing it to the third argument - and you may be tempted to use \ since that's what many languages use, but hold on before you do that.

The reason you don't want to use \ is because many languages use it as their own escape character, which means that when you want to specify your escape character in, say, a PHP string literal, you run into this problem:

$lang['example'] = '\\{X}';   // results in {X}
$lang['example'] = '\\\{X}';  // results in \Dany
$lang['example'] = '\\\\{X}'; // results in \Dany

It can lead to a readability nightmare, and some non-obvious behaviour with complex patterns. Pick an escape character that is not used by any other language involved (for example, if you are using this technique to generate fragments of HTML, don't use & as an escape character either).

To sum up

What you are doing has edge-cases. To solve the problem properly, you need to use a tool capable of handling those edge-cases - and when it comes to string manipulation, the tool for the job is most often regex.

