If you want the number of bytes a string takes up once it's encoded as UTF-8, this function will do it for you:
    def utf8len(s):
        # Encode to UTF-8 bytes first, then count the bytes (not the characters).
        return len(s.encode('utf-8'))
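Here's a quick sketch of how that differs from plain `len()`. The sample strings are just ones I picked for illustration; anything outside ASCII takes more than one byte per character in UTF-8.

```python
def utf8len(s):
    # Number of bytes in the UTF-8 encoding of s.
    return len(s.encode('utf-8'))

# Illustrative samples: ASCII, an accented Latin letter, CJK, and an emoji.
samples = ["hello", "h\u00e9llo", "\u65e5\u672c\u8a9e", "\U0001F642"]

for text in samples:
    # len() counts code points, utf8len() counts encoded bytes.
    print(repr(text), len(text), utf8len(text))
```

For pure ASCII the two numbers match; for everything else the byte count is larger.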
The reason you got weird numbers is that strings are actual objects in Python, so the in-memory size of a string includes a bunch of bookkeeping information on top of the characters themselves.
It's interesting, because my solution calls an `encode` method on the `s` object (which is a string), and you might guess that's where the extra bytes go. It isn't, though: methods like `encode` are stored once on the `str` type and shared by every string. The higher than normal byte count comes from the per-object header, which holds things like the reference count, a pointer to the type, the length and a cached hash :).
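To see where the extra bytes actually come from, here's a small check. It's a sketch that assumes CPython, where `sys.getsizeof` reports the size of the whole object. I'm guessing that's the kind of measurement that gave you the weird numbers. The exact sizes depend on the Python version and platform, so I'm not quoting any.

```python
import sys

text = "hello"

# Size of the whole string object in memory (CPython-specific figure).
object_size = sys.getsizeof(text)

# Size of just the encoded text.
payload_size = len(text.encode("utf-8"))

# What's left over is per-object bookkeeping, not the characters.
overhead = object_size - payload_size
print(object_size, payload_size, overhead)

# The encode method is defined on the str type, shared by all strings.
print("encode" in vars(str))

# String instances have no per-instance attribute dictionary to hold methods.
print(hasattr(text, "__dict__"))
```

So the overhead is a fixed per-object cost, and it doesn't grow with the number of methods `str` has.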