Update for Swift 4 (Xcode 9)
As of Swift 4 (tested with Xcode 9 beta), an "Emoji ZWJ Sequence" is treated as a single Character, as mandated by the Unicode 9 standard:
let str = "👨‍👨‍👧‍👧😍"
print(str.count) // 2
print(Array(str)) // ["👨‍👨‍👧‍👧", "😍"]
Also, String is a collection of its characters (again), so we can call str.count to get the length, and Array(str) to get all characters as an array.
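To make the difference between the string's views concrete, here is a minimal sketch, assuming a Swift 4 or later toolchain. The names `family` and `sample` are illustrative and not part of the original answer. The string is written with explicit scalar escapes so that it does not depend on how the emoji survive copy and paste.

```swift
// The string from the answer, spelled out scalar by scalar:
// man, ZWJ, man, ZWJ, girl, ZWJ, girl, followed by U+1F60D.
let family = "\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}\u{200D}\u{1F467}"
let sample = family + "\u{1F60D}"

let characterCount = sample.count              // extended grapheme clusters
let scalarCount = sample.unicodeScalars.count  // Unicode scalar values
let utf16Count = sample.utf16.count            // UTF-16 code units

print(characterCount, scalarCount, utf16Count) // 2 8 13 (Swift 4 or later)
```

Only `count` on the string itself reflects what a user perceives as characters. The other two views count the underlying encoding.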
(Old answer for Swift 3 and earlier)
This is only a partial answer which may help in this particular case.
"👨‍👨‍👧‍👧" is indeed a combination of four separate characters:
let str = "👨‍👨‍👧‍👧😍"
print(Array(str.characters))
// Output: ["👨‍", "👨‍", "👧‍", "👧", "😍"]
which are glued together with U+200D (ZERO WIDTH JOINER):
for c in str.unicodeScalars {
    print(String(c.value, radix: 16))
}
/* Output:
1f468
200d
1f468
200d
1f467
200d
1f467
1f60d
*/
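The role of the joiner can also be made visible by splitting the scalar view at U+200D. This is only a sketch, assuming Swift 4 or later. The names `zwj`, `joined`, and `parts` are hypothetical helpers, not part of the original answer.

```swift
// Same string as above, written with explicit scalar escapes.
let joined = "\u{1F468}\u{200D}\u{1F468}\u{200D}\u{1F467}\u{200D}\u{1F467}\u{1F60D}"

// U+200D ZERO WIDTH JOINER, the scalar that glues the emoji together.
let zwj: Unicode.Scalar = "\u{200D}"

// Split the scalar view at every joiner and turn each piece back into a String.
let parts = joined.unicodeScalars
    .split(separator: zwj)
    .map { String(String.UnicodeScalarView($0)) }

print(parts) // ["👨", "👨", "👧", "👧😍"]
```

The last piece contains two scalars because no joiner separates the second girl from U+1F60D.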
Enumerating the string with the .ByComposedCharacterSequences option combines these characters correctly:
var chars: [String] = []
str.enumerateSubstringsInRange(str.characters.indices, options: .ByComposedCharacterSequences) {
    (substring, _, _, _) -> () in
    chars.append(substring!)
}
print(chars)
// Output: ["👨‍👨‍👧‍👧", "😍"]
But there are other cases where this does not work, e.g. the "flags", which are sequences of "Regional Indicator characters" (compare "Swift countElements() return incorrect value when count flag emoji"). With
let str = "🇩🇪"
the result of the above loop is
["🇩", "🇪"]
which is not the desired result.
The full rules are defined in section 3, "Grapheme Cluster Boundaries", of Unicode Standard Annex #29, "Unicode Text Segmentation".
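One of those rules pairs up regional indicator scalars, so each flag forms its own grapheme cluster. A minimal sketch of how this surfaces in String, assuming Swift 4 or later (the flags chosen and the names `germany`, `usa`, and `flags` are illustrative assumptions):

```swift
// Hypothetical samples: two flags built from regional indicator scalars.
let germany = "\u{1F1E9}\u{1F1EA}" // REGIONAL INDICATOR SYMBOL LETTER D + E
let usa = "\u{1F1FA}\u{1F1F8}"     // REGIONAL INDICATOR SYMBOL LETTER U + S
let flags = germany + usa

print(germany.count)              // 1
print(flags.count)                // 2
print(flags.unicodeScalars.count) // 4
```

Counting scalars or UTF-16 code units instead would report each flag as more than one unit, which is the mismatch described above.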