Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

ios - Swift countElements() returns incorrect value when counting flag emoji

let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
let str2 = "🇩🇪.🇩🇪.🇩🇪.🇩🇪.🇩🇪."

println("\(countElements(str1)), \(countElements(str2))")

Result: 1, 10

But shouldn't str1 have 5 elements?

The bug seems to occur only when I use flag emoji.

1 Reply

Update for Swift 4 (Xcode 9)

As of Swift 4 (tested with Xcode 9 beta), grapheme clusters break after every second regional indicator symbol, as mandated by the Unicode 9 standard:

let str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪"
print(str1.count) // 5
print(Array(str1)) // ["🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪", "🇩🇪"]

Also, String is a collection of its characters (again), so the character count is available directly as str1.count.
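To make the difference between characters and the underlying code points concrete, here is a minimal sketch, assuming Swift 4 or later. The flag is written as an escaped pair of regional indicator scalars (U+1F1E9 U+1F1EA, picked only for illustration), and the names flag and flags are not from the original post:

```swift
// One flag = two regional indicator scalars (illustrative choice of flag).
let flag = "\u{1F1E9}\u{1F1EA}"
let flags = String(repeating: flag, count: 5)

// Characters (extended grapheme clusters): one per flag in Swift 4 and later.
print(flags.count)                 // 5
// Unicode scalars: two regional indicators per flag.
print(flags.unicodeScalars.count)  // 10
// UTF-16 code units: each regional indicator lies outside the BMP and needs a surrogate pair.
print(flags.utf16.count)           // 20
```

The three views count different things over the same string, which is why older APIs that grouped regional indicators differently could report 1 for the same data.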


(Old answer for Swift 3 and older:)

From "3 Grapheme Cluster Boundaries" in "Unicode Standard Annex #29: Unicode Text Segmentation":

A legacy grapheme cluster is defined as a base (such as A or カ) followed by zero or more continuing characters. One way to think of this is as a sequence of characters that form a “stack”.

The base can be single characters, or be any sequence of Hangul Jamo characters that form a Hangul Syllable, as defined by D133 in The Unicode Standard, or be any sequence of Regional_Indicator (RI) characters. The RI characters are used in pairs to denote Emoji national flag symbols corresponding to ISO country codes. Sequences of more than two RI characters should be separated by other characters, such as U+200B ZWSP.

(Thanks to @rintaro for the link).

A Swift Character represents an extended grapheme cluster, so it is (according to this reference) correct that any sequence of regional indicator symbols is counted as a single character.

You can separate the "flags" by a ZERO WIDTH NON-JOINER:

let str1 = "🇩🇪\u{200C}🇩🇪"
print(str1.characters.count) // 2

or insert a ZERO WIDTH SPACE:

let str2 = "🇩🇪\u{200B}🇩🇪"
print(str2.characters.count) // 3

This also resolves possible ambiguities, e.g. should a run of four regional indicator symbols such as F, R, U, S be read as F + RU + S or as FR + US?

See also "How to know if two emojis will be displayed as one emoji?" for a possible method to count the number of "composed characters" in a Swift string, which would return 5 for your str1 = "🇩🇪🇩🇪🇩🇪🇩🇪🇩🇪".
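Independently of the method in the linked question, the "pair up regional indicators" rule can be applied by hand on the Unicode scalars, which gives the same answer on any Swift version with unicodeScalars and Unicode.Scalar. This is only a sketch: isRegionalIndicator and countFlagPairs are hypothetical helper names, and the code counts pairs of regional indicators without checking that a pair is a valid ISO country code.

```swift
// Regional indicator symbols occupy U+1F1E6 ... U+1F1FF.
func isRegionalIndicator(_ scalar: Unicode.Scalar) -> Bool {
    let range: ClosedRange<UInt32> = 0x1F1E6...0x1F1FF
    return range.contains(scalar.value)
}

// Counts flags by pairing consecutive regional indicators,
// breaking after every second one. An unpaired trailing
// indicator is not counted.
func countFlagPairs(in text: String) -> Int {
    var pairs = 0
    var run = 0  // regional indicators seen in the current unfinished pair
    for scalar in text.unicodeScalars {
        if isRegionalIndicator(scalar) {
            run += 1
            if run == 2 {
                pairs += 1
                run = 0
            }
        } else {
            run = 0  // any other scalar ends the current run
        }
    }
    return pairs
}

let fiveFlags = String(repeating: "\u{1F1E9}\u{1F1EA}", count: 5)
print(countFlagPairs(in: fiveFlags))  // 5
```

Because it works on scalars rather than on Character, the result does not depend on which grapheme-breaking rules the Swift runtime implements.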

