Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Removing all Emojis from Text

This question has been asked before in "Python : How to remove all emojis", but without a solution. I have a step towards a solution, but need help finishing it off.

I went and got all the emoji hex code points from the emoji site: https://www.unicode.org/emoji/charts/emoji-ordering.txt

I then read in the file like so:

final_list = []

with open('emoji-ordering.txt', encoding='utf-8') as file:
    for line in file:
        # Skip comment lines and blank lines
        if line.startswith('#') or not line.strip():
            continue
        # The first field holds space-separated code points such as "U+1F600"
        code_points = line.split(';')[0].rstrip().split(' ')
        # Zero-pad each code point to 8 hex digits, e.g. "U+1F600" -> "uU0001F600"
        values = ["u" + (word[0] + ((8 - len(word[2:])) * '0' + word[2:]).rstrip()) for word in code_points]
        final_list = final_list + values

print(final_list)

I hoped this would give me unicode literals, but it does not. My goal is to get unicode literals so that I can use part of the solution from the earlier question and exclude all emojis. Any ideas on what is needed to get to a solution?


1 Reply

First install emoji:

pip install emoji

or

pip3 install emoji

Then do this:

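import emoji

def give_emoji_free_text(text):
    # Collect every character of the input that the emoji package lists as an emoji.
    # emoji.EMOJI_DATA is available in emoji 2.0 and later; older releases exposed emoji.UNICODE_EMOJI instead.
    emoji_list = [c for c in text if c in emoji.EMOJI_DATA]
    # Keep only the whitespace-separated words that contain none of those emojis.
    # Note that a word with an emoji attached to it is dropped as a whole.
    clean_text = ' '.join([word for word in text.split() if not any(e in word for e in emoji_list)])

    return clean_text

text = give_emoji_free_text(text)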

This works for me!

Or you can try:

import re

emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U0001F1F2-\U0001F1F4"  # Macau flag
        u"\U0001F1E6-\U0001F1FF"  # flags
        u"\U0001F600-\U0001F64F"
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"  # very broad range: also matches non-emoji characters such as CJK ideographs
        u"\U0001f926-\U0001f937"
        u"\U0001F1F2"
        u"\U0001F1F4"
        u"\U0001F620"
        u"\u200d"  # zero-width joiner
        u"\u2640-\u2642"
        "]+", flags=re.UNICODE)

text = emoji_pattern.sub(r'', text)
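
If you would rather stay with the approach from the question and build the list from emoji-ordering.txt, one possible way is to convert each hex code point with chr(int(..., 16)) instead of assembling a "\U..." string by hand. The following is only a minimal sketch: it assumes each data line looks like "U+XXXX [U+XXXX ...] ; version # name", it uses a few hard-coded sample lines in place of the real file, and the helper names (parse_emoji_sequences, build_emoji_pattern, remove_emojis) are made up for illustration.

```python
import re

# Sample lines in the assumed emoji-ordering.txt layout:
# "U+XXXX [U+XXXX ...] ; version # emoji name"
SAMPLE_LINES = [
    "# Emoji Ordering sample",
    "U+1F600 ; 6.1 # grinning face",
    "U+1F926 U+200D U+2642 U+FE0F ; 4.0 # man facepalming",
    "U+2764 U+FE0F ; 0.6 # red heart",
]


def parse_emoji_sequences(lines):
    """Return the set of emoji strings described by the given lines."""
    sequences = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        code_points = line.split(";")[0].split()
        # "U+1F600" -> int("1F600", 16) -> chr(...) gives the real character
        sequences.add("".join(chr(int(cp[2:], 16)) for cp in code_points))
    return sequences


def build_emoji_pattern(sequences):
    """Compile a regex matching any sequence, trying longer ones first."""
    ordered = sorted(sequences, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))


def remove_emojis(text, pattern):
    """Delete every match of the pattern from the text."""
    return pattern.sub("", text)


pattern = build_emoji_pattern(parse_emoji_sequences(SAMPLE_LINES))
print(remove_emojis("I \u2764\ufe0f Python \U0001F600!", pattern))
```

With the real file, you would pass the open file object to parse_emoji_sequences instead of SAMPLE_LINES. Matching whole sequences rather than individual code points is a design choice here: it avoids stripping ordinary characters that only count as an emoji when combined with other code points.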
