Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


What does it mean that string and character comparisons in Swift are not locale-sensitive?

I have started learning Swift and I am curious: what does it mean that string and character comparisons in Swift are not locale-sensitive? Does it mean that all characters are stored in Swift as UTF-8 characters?


1 Reply


(All code examples updated for Swift 3 now.)

Comparing Swift strings with < performs a lexicographical comparison based on the so-called "Unicode Normalization Form D" (NFD, canonical decomposition), which can be computed with decomposedStringWithCanonicalMapping.

For example, the decomposition of

"ä" = U+00E4 = LATIN SMALL LETTER A WITH DIAERESIS

is the sequence of two Unicode code points

U+0061,U+0308 = LATIN SMALL LETTER A + COMBINING DIAERESIS
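
As a quick check of this decomposition, the following minimal sketch prints the code points of the precomposed character before and after canonical decomposition. The helper name scalarValues is introduced here only for illustration and is not part of the original answer; Foundation is imported because decomposedStringWithCanonicalMapping comes from there.

```swift
import Foundation

// Hypothetical helper (not from the original answer): the code points of a string as numbers.
func scalarValues(_ s: String) -> [UInt32] {
    return s.unicodeScalars.map { $0.value }
}

let precomposed = "\u{E4}"   // ä, LATIN SMALL LETTER A WITH DIAERESIS
let decomposed = precomposed.decomposedStringWithCanonicalMapping

print(scalarValues(precomposed))  // [228], i.e. U+00E4
print(scalarValues(decomposed))   // [97, 776], i.e. U+0061, U+0308

// Both spellings are canonically equivalent, so Swift treats them as equal strings.
print(precomposed == decomposed)  // true
```

The last line is why the normalization form matters for ordering but not for equality: both spellings already compare as equal.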

For demonstration purposes, I have written a small String extension which dumps the contents of the String as a comma-separated list of Unicode code points:

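import Foundation  // needed for String(format:)

extension String {
    // The string's Unicode code points as comma-separated four-digit hex values.
    var unicodeData: String {
        return self.unicodeScalars.map {
            String(format: "%04X", $0.value)
        }.joined(separator: ",")
    }
}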

Now let's take some strings and sort them with <:

let someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"].sorted()
print(someStrings)
// ["a", "ã", "ă", "ä", "ǟ", "b"]

and dump the Unicode code points of each string (in original and decomposed form) in the sorted array:

for str in someStrings {
    print("\(str)  \(str.unicodeData)  \(str.decomposedStringWithCanonicalMapping.unicodeData)")
}

The output

a  0061  0061
ã  00E3  0061,0303
ă  0103  0061,0306
ä  00E4  0061,0308
ǟ  01DF  0061,0308,0304
b  0062  0062

nicely shows that the comparison is done by a lexicographic ordering of the Unicode code points in the decomposed form.
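
The ordering rule described above can also be spelled out by hand. The following is a minimal sketch, not the standard library's implementation: it decomposes both strings and compares the resulting code points one by one. The helper name precedesInNFD and the sample strings are assumptions made for this illustration. The comparison is written out explicitly rather than relying on <, because the standard library's own ordering may differ between Swift versions.

```swift
import Foundation

// Hypothetical helper: true if `lhs` precedes `rhs` when both are canonically
// decomposed (NFD) and compared code point by code point.
func precedesInNFD(_ lhs: String, _ rhs: String) -> Bool {
    let l = lhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    let r = rhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    return l.lexicographicallyPrecedes(r)
}

// Sample data chosen for this sketch, written as escapes: ǟ, ä, ã, a, ă, b.
let samples = ["\u{1DF}", "\u{E4}", "\u{E3}", "a", "\u{103}", "b"]
let sortedSamples = samples.sorted(by: precedesInNFD)

// Resulting order: a, ã (0061,0303), ă (0061,0306), ä (0061,0308), ǟ (0061,0308,0304), b
print(sortedSamples)
```

All accented variants of "a" land between "a" and "b" because their decomposed forms start with U+0061, and a shorter sequence that is a prefix of a longer one sorts first.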

This is also true for strings of more than one character, as the following example shows. With

let someStrings = ["ǟψ", "äψ", "ǟx", "äx"].sorted()

the output of the loop above is

äx  00E4,0078  0061,0308,0078
ǟx  01DF,0078  0061,0308,0304,0078
ǟψ  01DF,03C8  0061,0308,0304,03C8
äψ  00E4,03C8  0061,0308,03C8

which means that

"äx" < "ǟx", but "äψ" > "ǟψ"

(which was at least unexpected for me).
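
The reason becomes visible when looking at the first position where the decomposed forms differ: the combining macron U+0304 of the second string is compared against the letter that follows in the first string. The sketch below makes that position explicit; the helper names nfdScalars and firstDifference are hypothetical and exist only for this illustration.

```swift
import Foundation

// Hypothetical helper: the NFD code points of a string as numbers.
func nfdScalars(_ s: String) -> [UInt32] {
    return s.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
}

// Hypothetical helper: the first position at which the two decomposed strings
// differ, together with the code point each string has at that position.
func firstDifference(_ lhs: String, _ rhs: String) -> (index: Int, left: UInt32, right: UInt32)? {
    for (i, (l, r)) in zip(nfdScalars(lhs), nfdScalars(rhs)).enumerated() where l != r {
        return (index: i, left: l, right: r)
    }
    return nil  // no differing position within the common length
}

// äx vs ǟx: 0061,0308,0078 vs 0061,0308,0304,0078
if let d = firstDifference("\u{E4}x", "\u{1DF}x") {
    print(d.index, String(d.left, radix: 16), String(d.right, radix: 16))  // 2 78 304
}

// äψ vs ǟψ: 0061,0308,03C8 vs 0061,0308,0304,03C8
if let d = firstDifference("\u{E4}\u{3C8}", "\u{1DF}\u{3C8}") {
    print(d.index, String(d.left, radix: 16), String(d.right, radix: 16))  // 2 3c8 304
}
```

Since U+0078 (x) is smaller than U+0304 while U+03C8 (ψ) is larger, the relative order of the two prefixes flips depending on the letter that follows.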

Finally, let's compare this with a locale-sensitive ordering, for example Swedish:

let locale = Locale(identifier: "sv") // svenska
var someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"]
someStrings.sort {
    $0.compare($1, locale: locale) == .orderedAscending
}

print(someStrings)
// ["a", "ă", "ã", "b", "ä", "ǟ"]

As you see, the result is different from the Swift < sorting.
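
To see what "locale-sensitive" means in isolation, it may help to compare the same pair of strings under two locales. The following is a small sketch using the same Foundation compare(_:locale:) call as above; the choice of German as the second locale is an assumption made for this illustration, and the results depend on the collation data that Foundation uses on the platform.

```swift
import Foundation

let swedish = Locale(identifier: "sv")
let german = Locale(identifier: "de")

let aUmlaut = "\u{E4}"  // ä

// In Swedish collation "ä" is a separate letter placed after "z";
// in German collation it is sorted together with "a", so before "z".
let inSwedish = aUmlaut.compare("z", locale: swedish)
let inGerman = aUmlaut.compare("z", locale: german)

print(inSwedish == .orderedDescending)  // expected: true with standard Swedish collation
print(inGerman == .orderedAscending)    // expected: true with standard German collation
```

The Swift < operator takes no locale at all, so it cannot give different answers for different languages; that is the sense in which it is not locale-sensitive.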

