python 3.x - Python3 textcoding issue: extra first character when reading from text file using for loop

Question


posted Jan 29, 2021 in Technique[Technology] by 深蓝 (71.8m points)

I'm trying to read a number of ticker symbols from a text file, but I seem to have a text encoding issue.

These are the contents of a test file, 'tickers.txt':

SPG
WBA

This is my test code:

    f = open("tickers.txt", 'r')
    for ticker in f:
        t = ticker.strip()
        if t:
            try:
                print(">" + t + "<" + ' length = ' + str(len(t)))
                i = 0
                while i < len(t):
                    print(t[i])
                    i += 1
                print('End')
            except ValueError:
                print('ValueError ticker')

And this is the resulting output:

>SPG< length = 4

S
P
G
End
>WBA< length = 3
W
B
A
End

For some reason there is an extra character in the first ticker symbol, which does not show when printed. Having read through several Q&As here on Stack Overflow, I now assume it is a text encoding issue, but I don't yet understand how to solve it. Do I need to add an 'encoding' argument to the open() call? If so, which one? And how can I detect which encoding the file uses?
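One way to approach the "how to detect" part is to look at the raw bytes before any decoding happens. The sketch below is a hedged example, not a general encoding detector: has_utf8_bom is a hypothetical helper name, the file contents are made up for the demo, and it only checks for the UTF-8 byte order mark (codecs.BOM_UTF8). A UTF-16 file would instead begin with codecs.BOM_UTF16_LE or codecs.BOM_UTF16_BE.

```python
import codecs
import os
import tempfile


def has_utf8_bom(path):
    """Return True if the file at path starts with the UTF-8 BOM bytes EF BB BF."""
    # Binary mode: nothing is decoded, so the BOM bytes cannot be hidden or transformed.
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Demo on two throwaway files with hypothetical contents: one with a BOM, one without.
with tempfile.TemporaryDirectory() as tmp:
    with_bom = os.path.join(tmp, "with_bom.txt")
    without_bom = os.path.join(tmp, "without_bom.txt")
    with open(with_bom, "wb") as out:
        out.write(codecs.BOM_UTF8 + b"SPG\nWBA\n")
    with open(without_bom, "wb") as out:
        out.write(b"SPG\nWBA\n")
    results = (has_utf8_bom(with_bom), has_utf8_bom(without_bom))

print(results)  # (True, False)
```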


深蓝 · Answer 1 · 2021-01-29T04:27:33+0000

Changing print(t[i]) to print(i, t[i], '{:04x}'.format(ord(t[i]))) gives the following output, which shows that the extra first character is U+FEFF, the byte order mark (BOM):

>?SPG< length = 4
0 ? feff
1 S 0053
2 P 0050
3 G 0047
End
>WBA< length = 3
0 W 0057
1 B 0042
2 A 0041
End
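The reason strip() did not remove the extra character is that Python does not count U+FEFF as whitespace. A minimal reproduction, assuming the file begins with the three UTF-8 BOM bytes EF BB BF (as the feff above suggests) and is read with a plain UTF-8 decoder; the file is a throwaway stand-in for tickers.txt:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tickers.txt")
    # Recreate a BOM-prefixed tickers.txt (assumed byte layout: EF BB BF, then the text).
    with open(path, "wb") as out:
        out.write(b"\xef\xbb\xbfSPG\nWBA\n")
    # Plain UTF-8 decodes the BOM bytes into the character U+FEFF instead of dropping them.
    with open(path, encoding="utf-8") as f:
        tickers = [line.strip() for line in f if line.strip()]

print(tickers)             # ['\ufeffSPG', 'WBA']
print("\ufeff".isspace())  # False, so str.strip() leaves it in place
```

The encoding is passed explicitly here so the result does not depend on the platform default. With a default such as cp1252, the same three bytes would decode as three separate characters (length 6 rather than 4), so the question's output suggests its default was UTF-8.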

Use the utf_8_sig codec ("UTF-8 codec with BOM signature"): on decoding, an optional UTF-8-encoded BOM at the start of the data is skipped. Open the file with

    f = open("tickers.txt", mode='r', encoding='utf_8_sig')

instead of

    f = open("tickers.txt", 'r')
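The skip behavior of utf_8_sig can be checked directly on bytes, without a file. A short sketch using made-up byte strings that mirror the question's tickers:

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF
data_with_bom = BOM + b"SPG\nWBA\n"
data_without_bom = b"SPG\nWBA\n"

# Plain utf_8 keeps the BOM as the character U+FEFF; utf_8_sig drops it.
print(repr(data_with_bom.decode("utf_8")))      # '\ufeffSPG\nWBA\n'
print(repr(data_with_bom.decode("utf_8_sig")))  # 'SPG\nWBA\n'

# Without a BOM, utf_8_sig decodes exactly like utf_8, so it is safe for both kinds of file.
print(data_without_bom.decode("utf_8_sig") == data_without_bom.decode("utf_8"))  # True
```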

BTW, do not forget f.close(), or open the file in a with statement, which closes it automatically.
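Rather than calling f.close() by hand, a with statement closes the file automatically, even if an exception is raised. A sketch of the question's loop with both changes applied; read_tickers is a hypothetical helper name, and the demo writes a throwaway BOM-prefixed copy of tickers.txt so it runs anywhere:

```python
import os
import tempfile


def read_tickers(path):
    """Return the non-empty, stripped lines of path, skipping a leading UTF-8 BOM if present."""
    # utf_8_sig drops an optional BOM; the with statement closes the file on exit.
    with open(path, mode="r", encoding="utf_8_sig") as f:
        stripped = (line.strip() for line in f)
        return [t for t in stripped if t]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tickers.txt")
    with open(path, "wb") as out:
        out.write(b"\xef\xbb\xbfSPG\nWBA\n")  # BOM + the question's two tickers
    tickers = read_tickers(path)

for t in tickers:
    print(">" + t + "<" + " length = " + str(len(t)))
# >SPG< length = 3
# >WBA< length = 3
```

The list is fully built before the with block exits, so the file is already closed by the time read_tickers returns.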
