(All code examples updated for Swift 3 now.)
Comparing Swift strings with < does a lexicographical comparison
based on the so-called "Unicode Normalization Form D" (which can be computed with
decomposedStringWithCanonicalMapping).
For example, the decomposition of
"ä" = U+00E4 = LATIN SMALL LETTER A WITH DIAERESIS
is the sequence of two Unicode code points
U+0061,U+0308 = LATIN SMALL LETTER A + COMBINING DIAERESIS
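To check this decomposition yourself, here is a minimal sketch that prints the Unicode scalars of a string before and after canonical decomposition. It assumes Foundation is available; the helper name scalarHex is made up for this illustration and is not part of any library.

```swift
import Foundation

// Hypothetical helper for this illustration: the scalar values of a string as hex.
func scalarHex(_ s: String) -> [String] {
    return s.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
}

// "ä" written with an escape so the source file encoding does not matter.
let precomposed = "\u{E4}"
let decomposed = precomposed.decomposedStringWithCanonicalMapping

print(scalarHex(precomposed))      // ["E4"]
print(scalarHex(decomposed))       // ["61", "308"]
print(precomposed == decomposed)   // true: Swift's == treats canonically equivalent strings as equal
```

The two forms hold different scalars but compare equal, which is why the ordering has to be defined on a normalized form rather than on the raw code points.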
For demonstration purposes, I have written a small String extension which dumps the
contents of the String as an array of Unicode code points:
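import Foundation  // needed for String(format:)

extension String {
    /// The Unicode scalar values as comma-separated, four-digit hex numbers.
    var unicodeData: String {
        return self.unicodeScalars.map {
            String(format: "%04X", $0.value)
        }.joined(separator: ",")
    }
}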
Now let's take some strings and sort them with <:
let someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"].sorted()
print(someStrings)
// ["a", "ã", "ă", "ä", "ǟ", "b"]
and dump the Unicode code points of each string (in original and decomposed
form) in the sorted array:
for str in someStrings {
    print("\(str) \(str.unicodeData) \(str.decomposedStringWithCanonicalMapping.unicodeData)")
}
The output
a 0061 0061
ã 00E3 0061,0303
ă 0103 0061,0306
ä 00E4 0061,0308
ǟ 01DF 0061,0308,0304
b 0062 0062
nicely shows that the comparison is done by a lexicographic ordering of the Unicode
code points in the decomposed form.
This is also true for strings of more than one character, as the following example
shows. With
let someStrings = ["ǟψ", "äψ", "ǟx", "äx"].sorted()
the output of the above loop is
äx 00E4,0078 0061,0308,0078
ǟx 01DF,0078 0061,0308,0304,0078
ǟψ 01DF,03C8 0061,0308,0304,03C8
äψ 00E4,03C8 0061,0308,03C8
which means that
"äx" < "ǟx", but "äψ" > "ǟψ"
(which was at least unexpected for me).
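The ordering rule described here can be spelled out explicitly. The following is only a sketch of that rule under the assumption stated above (comparison of the decomposed forms), not how the standard library implements <. The function name precedesInNFD is hypothetical.

```swift
import Foundation

// Hypothetical helper: compares two strings by the scalar values of their
// canonical decompositions (NFD), element by element.
func precedesInNFD(_ lhs: String, _ rhs: String) -> Bool {
    let l = lhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    let r = rhs.decomposedStringWithCanonicalMapping.unicodeScalars.map { $0.value }
    return l.lexicographicallyPrecedes(r)
}

// U+00E4 = ä, U+01DF = ǟ, U+03C8 = ψ (escapes avoid source encoding issues).
print(precedesInNFD("\u{E4}x", "\u{1DF}x"))             // true:  0078 < 0304 at the third scalar
print(precedesInNFD("\u{1DF}\u{3C8}", "\u{E4}\u{3C8}")) // true:  0304 < 03C8 at the third scalar
```

In both pairs the first difference is at the third scalar, where the combining macron U+0304 of the decomposed ǟ is compared against the following letter. Since U+0078 (x) is smaller than U+0304 and U+03C8 (ψ) is larger, the relative order of ä and ǟ flips depending on what comes next.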
Finally, let's compare this with a locale-sensitive ordering, for example Swedish:
let locale = Locale(identifier: "sv") // svenska
var someStrings = ["ǟ", "ä", "ã", "a", "ă", "b"]
someStrings.sort {
    $0.compare($1, locale: locale) == .orderedAscending
}
print(someStrings)
// ["a", "ã", "ă", "b", "ä", "ǟ"]
As you can see, the result is different from the Swift < sorting.