Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
209 views
in Technique[技术] by (71.8m points)

Removing control characters from a string in python

I currently have the following code

def removeControlCharacters(line):
    i = 0
    for c in line:
        if (c < chr(32)):
            line = line[:i - 1] + line[i+1:]
            i += 1
    return line

This is just does not work if there are more than one character to be deleted.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There are hundreds of control characters in unicode. If you are sanitizing data from the web or some other source that might contain non-ascii characters, you will need Python's unicodedata module. The unicodedata.category(…) function returns the unicode category code (e.g., control character, whitespace, letter, etc.) of any character. For control characters, the category always starts with "C".

This snippet removes all control characters from a string.

import unicodedata
def remove_control_characters(s):
    return "".join(ch for ch in s if unicodedata.category(ch)[0]!="C")

Examples of unicode categories:

>>> from unicodedata import category
>>> category('
')      # carriage return --> Cc : control character
'Cc'
>>> category('')      # null character ---> Cc : control character
'Cc'
>>> category('')      # tab --------------> Cc : control character
'Cc'
>>> category(' ')       # space ------------> Zs : separator, space
'Zs'
>>> category(u'u200A') # hair space -------> Zs : separator, space
'Zs'
>>> category(u'u200b') # zero width space -> Cf : control character, formatting
'Cf'
>>> category('A')       # letter "A" -------> Lu : letter, uppercase
'Lu'
>>> category(u'u4e21') # 両 ---------------> Lo : letter, other
'Lo'
>>> category(',')       # comma  -----------> Po : punctuation
'Po'
>>>

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...