Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Is it possible to specify the encoding of a file with Paramiko?

I'm trying to read a CSV over SFTP using pysftp/Paramiko. My code looks like this:

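import csv
import pysftp

# Pass credentials by keyword: the third positional parameter of pysftp.Connection is private_key, not password.
input_conn = pysftp.Connection(hostname, username=username, password=password)
file = input_conn.open("Data.csv")
file_contents = list(csv.reader(file))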

But when I do this, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 23: invalid start byte

I know that this means the file is expected to be in UTF-8 encoding but isn't. The strange thing is, if I download the file and then use my code to open the file, I can specify the encoding as "macroman" and get no error:

with open("Data.csv", "r", encoding="macroman") as csvfile:
    file_contents = list(csv.reader(csvfile))
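
As an aside, the error can be reproduced without any SFTP connection, which may help confirm that the encoding is the only problem. The sketch below assumes a made-up two-line CSV (not the real Data.csv) containing "ñ", which Mac OS Roman stores as the single byte 0x96. The names `raw` and `try_decode` are purely illustrative.

```python
# Hypothetical sample data: "ñ" is encoded as byte 0x96 in Mac OS Roman.
raw = "name,city\nPeña,Lima\n".encode("mac_roman")


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a description of the error if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# 0x96 cannot start a UTF-8 sequence, so this reports a decode error.
print(try_decode(raw, "utf-8"))

# "macroman" is an alias Python accepts for the mac_roman codec.
print(try_decode(raw, "macroman"))
```

The same bytes decode cleanly once the right codec is named, so the remaining question is only where to tell Python which codec to use.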

The Paramiko docs say that the encoding of a file is meaningless over SFTP because it treats all files as bytes – but then, how can I get Python's CSV module to recognize the encoding if I use Paramiko to open the file?

Question from: https://stackoverflow.com/questions/65853835/is-it-possible-to-specify-the-encoding-of-a-file-with-paramiko


1 Reply

If the file is small enough that holding it in memory twice is not a problem, you can download it into an in-memory buffer and decode the contents there:

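import csv
import io

with io.BytesIO() as bio:
    # Download the whole remote file into an in-memory bytes buffer.
    input_conn.getfo("Data.csv", bio)
    bio.seek(0)

    # Decode the bytes as text; newline='' lets the csv module handle line endings itself.
    with io.TextIOWrapper(bio, encoding='macroman', newline='') as f:
        file_contents = list(csv.reader(f))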

Partially based on "Convert io.BytesIO to io.StringIO to parse HTML page".
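
If you need this in more than one place, the same steps can be wrapped in a small helper. This is only a sketch: `read_remote_csv` is a hypothetical name, and `FakeConnection` is a made-up stand-in that mimics just the `getfo(remotepath, flo)` call so the helper can be tried without a server. With a real `pysftp.Connection` you would pass `input_conn` instead.

```python
import csv
import io


def read_remote_csv(conn, remote_path, encoding):
    """Download remote_path into memory via conn.getfo() and parse it as CSV."""
    with io.BytesIO() as bio:
        conn.getfo(remote_path, bio)
        bio.seek(0)
        # newline="" is what the csv module recommends for file objects it reads.
        with io.TextIOWrapper(bio, encoding=encoding, newline="") as f:
            return list(csv.reader(f))


class FakeConnection:
    """Hypothetical offline stand-in for pysftp.Connection (getfo only)."""

    def __init__(self, files):
        self._files = files

    def getfo(self, remotepath, flo):
        # Write the stored bytes into the supplied file-like object.
        flo.write(self._files[remotepath])


# Made-up sample file, encoded as Mac OS Roman.
conn = FakeConnection({"Data.csv": "name,city\nPeña,Lima\n".encode("mac_roman")})
rows = read_remote_csv(conn, "Data.csv", "macroman")
print(rows)
```

Taking the encoding as a parameter keeps the helper usable for files that turn out to be in some other legacy encoding.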

