Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Remove u202a from Python string

I'm trying to open a file in Python, but I get an error, and a \u202a character shows up at the beginning of the string. Does anyone know how to remove it?

def carregar_uml(arquivo, variaveis):
    cadastro_uml = {}
    id_uml = 0

    for i in open(arquivo):
        linha = i.split(",")


carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

OSError: [Errno 22] Invalid argument: '\u202aH:\\7 - Script\\teste.csv'

1 Reply

When you initially created your .py file, your text editor introduced a non-printing character.

Consider this line:

carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)

Let's carefully select the string, including the quotes, and copy-paste it into an interactive Python session:

$ python
Python 3.6.1 (default, Jul 25 2017, 12:45:09) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "‪H:\\7 - Script\\teste.csv"
'\u202aH:\\7 - Script\\teste.csv'
>>> 

As you can see, there is a character with code point U+202A immediately before the H.

As someone else pointed out, the character at code point U+202A is LEFT-TO-RIGHT EMBEDDING. Returning to our Python session:

>>> s = "‪H:\\7 - Script\\teste.csv"
>>> import unicodedata
>>> unicodedata.name(s[0])
'LEFT-TO-RIGHT EMBEDDING'
>>> unicodedata.name(s[1])
'LATIN CAPITAL LETTER H'
>>> 

This further confirms that the first character in your string is not H, but the non-printing LEFT-TO-RIGHT EMBEDDING character.
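
If you want to run the same check over a whole string instead of indexing one character at a time, here is a minimal sketch. It assumes that "invisible" means Unicode general category Cf (format characters), which is the category LEFT-TO-RIGHT EMBEDDING belongs to. The name find_format_chars is my own invention, not something from the standard library.

```python
import unicodedata


def find_format_chars(text):
    """Return (index, code point, name) for each Unicode 'Cf' character in text.

    Hypothetical helper for illustration; 'Cf' is the general category for
    invisible format controls such as U+202A.
    """
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Cf":
            name = unicodedata.name(char, "<unnamed>")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


# The path from the question, with the hidden character written as an escape.
path = "\u202aH:\\7 - Script\\teste.csv"
print(find_format_chars(path))
# [(0, 'U+202A', 'LEFT-TO-RIGHT EMBEDDING')]
```

An empty list means the string contains no format characters.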

I don't know what text editor you used to create your program. Even if I knew, I'm probably not an expert in that editor. Regardless, some text editor that you used inserted, unbeknownst to you, U+202A.
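
If your editor gives you no way to see the character, a short script can at least tell you where it is hiding. This is only a sketch under my own assumptions: the file is UTF-8, and the characters worth reporting are the ones in Unicode category Cf. scan_source is a made-up name.

```python
import unicodedata


def scan_source(path):
    """Yield (line number, column, code point) for each 'Cf' character in a file.

    Hypothetical helper. The file is assumed to be UTF-8; 'utf-8-sig' drops a
    leading byte-order mark, which would otherwise be reported as well.
    """
    with open(path, encoding="utf-8-sig") as handle:
        for line_number, line in enumerate(handle, start=1):
            for column, char in enumerate(line, start=1):
                if unicodedata.category(char) == "Cf":
                    yield line_number, column, f"U+{ord(char):04X}"
```

Calling list(scan_source("your_script.py")) on the file from the question should point at the column right after the opening quote.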

One solution is to use a text editor that won't insert that character, and/or will highlight non-printing characters. For example, in vim that line appears like so:

carregar_uml("<202a>H:\\7 - Script\\teste.csv", variaveis)

Using such an editor, simply delete the character between " and H.

carregar_uml("H:\\7 - Script\\teste.csv", variaveis)

Even though this line is visually identical to your original line, I have deleted the offending character. Using this line will avoid the OSError that you report.
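
Deleting the character from the source is the real fix. If the path instead arrives at runtime (pasted by a user, read from a config file), you could clean it before calling open. Here is a minimal sketch, assuming it is acceptable to drop every character in Unicode category Cf. That is reasonable for a file path like this one, but too aggressive for general text, since the category also covers things like the zero-width joiner. strip_format_chars is a name I made up.

```python
import unicodedata


def strip_format_chars(text):
    """Return text without any Unicode 'Cf' (invisible format) characters.

    Hypothetical helper for illustration only.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# The path from the question, with the hidden character written as an escape.
dirty = "\u202aH:\\7 - Script\\teste.csv"
clean = strip_format_chars(dirty)
print(repr(clean))
# 'H:\\7 - Script\\teste.csv'
```

If you only ever expect this one character at the start, dirty.lstrip("\u202a") does the same job with less collateral.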

