Since emoji (and any characters in Unicode planes above 0, i.e. outside the Basic Multilingual Plane) are stored as surrogate pairs in NSString's UTF-16 representation, this becomes difficult. It is necessary to enumerate the string by composed character sequences to get the number of user-perceived characters.
Note: The NSString method length does not return the number of characters but the number of UTF-16 code units (unichars). See NSString and Unicode, objc.io issue #9 (Strings).
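A quick illustration of the difference (a minimal sketch; 😀 is just an example of a character outside the BMP):
NSString *emoji = @"😀";   // U+1F600, stored as a surrogate pair
NSLog(@"length: %lu", (unsigned long)emoji.length);   // 2 UTF-16 code units
__block NSUInteger composedCount = 0;
[emoji enumerateSubstringsInRange:NSMakeRange(0, emoji.length)
                          options:NSStringEnumerationByComposedCharacterSequences
                       usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    composedCount++;
}];
NSLog(@"composed characters: %lu", (unsigned long)composedCount);   // 1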
Example code:
NSString *text = @"qqq??rrr";
int maxCharacters = 4;
__block NSInteger unicharCount = 0;
__block NSInteger charCount = 0;
[text enumerateSubstringsInRange:NSMakeRange(0, text.length)
options:NSStringEnumerationByComposedCharacterSequences
usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
unicharCount += substringRange.length;
if (++charCount >= maxCharacters)
*stop = YES;
}];
NSString *textStart = [text substringToIndex: unicharCount];
NSLog(@"textStart: '%@'", textStart);
textStart: 'qqq??'
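If this is needed in more than one place, the same enumeration can be wrapped in a small helper (a sketch; the function name PrefixWithComposedCharacterCount is just an illustrative choice, not an existing API):
// Returns the prefix of `text` containing at most `maxCharacters`
// user-perceived (composed) characters.
static NSString *PrefixWithComposedCharacterCount(NSString *text, NSUInteger maxCharacters) {
    if (maxCharacters == 0) {
        return @"";
    }
    __block NSUInteger unicharCount = 0;
    __block NSUInteger charCount = 0;
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        unicharCount += substringRange.length;
        if (++charCount >= maxCharacters) {
            *stop = YES;
        }
    }];
    return [text substringToIndex:unicharCount];
}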
An alternative approach is to use UTF-32 encoding:
NSUInteger byteCount = maxCharacters * 4;   // 4 bytes per UTF-32 code unit
uint8_t buffer[byteCount];
NSUInteger usedBufferCount = 0;
// Use an explicit-endianness encoding on both the encode and the decode so no byte-order mark is involved and the round trip is unambiguous.
[text getBytes:buffer maxLength:byteCount usedLength:&usedBufferCount encoding:NSUTF32LittleEndianStringEncoding options:0 range:NSMakeRange(0, text.length) remainingRange:NULL];
NSString *textStart = [[NSString alloc] initWithBytes:buffer length:usedBufferCount encoding:NSUTF32LittleEndianStringEncoding];
There is some rationale for this in Session 128, Advanced Text Processing, from WWDC 2011.
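Note that counting UTF-32 code points is still not quite the same as counting composed characters: a base character plus a combining mark, for example, is two code points but one user-perceived character. A minimal check (the string is just an illustrative example):
NSString *eAcute = @"e\u0301";   // 'e' followed by U+0301 COMBINING ACUTE ACCENT
NSUInteger codePointCount = [eAcute lengthOfBytesUsingEncoding:NSUTF32LittleEndianStringEncoding] / 4;
__block NSUInteger composedCount = 0;
[eAcute enumerateSubstringsInRange:NSMakeRange(0, eAcute.length)
                           options:NSStringEnumerationByComposedCharacterSequences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    composedCount++;
}];
NSLog(@"UTF-32 code points: %lu, composed characters: %lu",
      (unsigned long)codePointCount, (unsigned long)composedCount);   // 2 vs 1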