Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
240 views
in Technique[技术] by (71.8m points)

php - Find first character that is different between two strings

Given two equal-length strings, is there an elegant way to get the offset of the first different character?

The obvious solution would be:

for ($offset = 0; $offset < $length; ++$offset) {
    if ($str1[$offset] !== $str2[$offset]) {
        return $offset;
    }
}

But that doesn't look quite right, for such a simple task.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can use a nice property of bitwise XOR (^) to achieve this: Basically, when you xor two strings together, the characters that are the same will become null bytes (""). So if we xor the two strings, we just need to find the position of the first non-null byte using strspn:

$position = strspn($string1 ^ $string2, "");

That's all there is to it. So let's look at an example:

$string1 = 'foobarbaz';
$string2 = 'foobarbiz';
$pos = strspn($string1 ^ $string2, "");

printf(
    'First difference at position %d: "%s" vs "%s"',
    $pos, $string1[$pos], $string2[$pos]
);

That will output:

First difference at position 7: "a" vs "i"

So that should do it. It's very efficient since it's only using C functions, and requires only a single copy of memory of the string.

Edit: A MultiByte Solution Along The Same Lines:

function getCharacterOffsetOfDifference($str1, $str2, $encoding = 'UTF-8') {
    return mb_strlen(
        mb_strcut(
            $str1,
            0, strspn($str1 ^ $str2, ""),
            $encoding
        ),
        $encoding
    );
}

First the difference at the byte level is found using the above method and then the offset is mapped to the character level. This is done using the mb_strcut function, which is basically substr but honoring multibyte character boundaries.

var_dump(getCharacterOffsetOfDifference('foo', 'foa')); // 2
var_dump(getCharacterOffsetOfDifference('?oo', 'foa')); // 0
var_dump(getCharacterOffsetOfDifference('f?o', 'faa')); // 1

It's not as elegant as the first solution, but it's still a one-liner (and if you use the default encoding a little bit simpler):

return mb_strlen(mb_strcut($str1, 0, strspn($str1 ^ $str2, "")));

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...