I'm trying to find a reliable way to scan the files in a folder on Windows in Python, allowing for the possibility that the filenames contain arbitrary Unicode code points. I've seen several proposed solutions to this problem, but none of them handles every case I've actually run into when scanning filenames created by real-world software and users.
The code sample below is an attempt to isolate and demonstrate the core issue. It creates three files in a subfolder with the kinds of variations I've encountered, then scans that folder and prints each filename followed by the file's contents. It crashes when it tries to read the third test file, raising OSError: [Errno 22] Invalid argument.
```python
import os

# create files in .\temp that demonstrate various issues encountered in the wild
tempfolder = os.path.join(os.getcwd(), 'temp')
os.makedirs(tempfolder, exist_ok=True)
for name in ('simple.txt', 'with a ® symbol.txt', 'with these chars ΣΑΠΦΩ.txt'):
    with open(os.path.join(tempfolder, name), 'w') as f:
        print('file contents', file=f)

# goal is to scan the files in a manner that allows for printing
# the filename as well as opening/reading the file ...
for root, dirs, files in os.walk(tempfolder.encode('UTF-8')):
    for filename in files:
        fullname = os.path.join(root, filename)  # bytes, because root is bytes
        print(fullname)
        # this is the line that crashes on the third file
        print(open(fullname, 'r').read())
```
As the comment in the code says, I just want to be able to display the filenames and open/read the files. As for displaying the filename, I don't care whether the Unicode characters in the special cases are rendered correctly. I just want to print the filename in a way that uniquely identifies which file is being processed and doesn't raise an error for these unusual filenames.
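For the display requirement alone, one option is Python's built-in `ascii()`, which works like `repr()` but escapes every non-ASCII code point. This is a sketch, not a confirmed answer; the wrapper name `display_name` is hypothetical. Because the output is pure ASCII, printing it should not raise `UnicodeEncodeError` on any console, and because it is a valid string literal it maps back to exactly one filename.

```python
import ast


def display_name(path):
    # ascii() is a builtin: like repr(), but every non-ASCII code point is
    # escaped (Σ becomes \u03a3, ® becomes \xae). The result is pure ASCII,
    # so printing it cannot fail because of the console encoding.
    return ascii(path)


name = 'with these chars \u03a3\u0391\u03a0\u03a6\u03a9.txt'
shown = display_name(name)
print(shown)  # prints 'with these chars \u03a3\u0391\u03a0\u03a6\u03a9.txt'

# The rendering is a valid Python string literal, so it maps back to exactly
# one filename; that is what makes it a unique identifier.
print(ast.literal_eval(shown) == name)  # prints True
```

One side effect: backslashes in Windows paths are shown doubled (`C:\\temp`). `path.encode('ascii', 'backslashreplace').decode('ascii')` avoids that, but its output is not guaranteed to be unambiguous, since a path containing a real backslash followed by `u03a3` would render the same as one containing `Σ`.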
If you comment out the final line of code, this approach displays all three filenames without errors, but it can't open the file with the Greek characters in its name.
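A likely explanation, offered as an assumption because the Python version isn't stated: before Python 3.6, bytes paths on Windows went through the system's ANSI code page, and characters that code page can't represent come back as `?`. (PEP 529 switched bytes paths on Windows to UTF-8 in Python 3.6.) The sketch below only simulates that conversion with `str.encode`, assuming the ANSI code page is cp1252; it does not call any Windows API, and the helper name `ansi_bytes_name` is hypothetical.

```python
# Assumption: Python < 3.6 on Windows with ANSI code page cp1252, where a
# bytes-mode directory listing reports names in that code page and replaces
# unrepresentable characters with '?'. This only simulates that behavior.

def ansi_bytes_name(name, codepage='cp1252'):
    """Simulate how a bytes-mode listing would report *name*."""
    return name.encode(codepage, errors='replace')


for name in ['simple.txt',
             'with a \u00ae symbol.txt',
             'with these chars \u03a3\u0391\u03a0\u03a6\u03a9.txt']:
    print(ansi_bytes_name(name))
# b'simple.txt'
# b'with a \xae symbol.txt'
# b'with these chars ?????.txt'
```

A symbol such as `®` survives because cp1252 contains it, while the Greek capitals do not. Since `?` is not allowed in Windows filenames, opening the reported name fails, which is consistent with `OSError: [Errno 22] Invalid argument`.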
Is there a single approach that will reliably display/open all three of these filename variations in Python? I'm hoping there is, and my limited grasp of Unicode subtleties is preventing me from seeing it.
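One commonly suggested approach, sketched here under the assumption of Python 3: pass `os.walk` a `str` path instead of bytes, so the directory entries come back as full Unicode strings that can be opened as-is, and use `ascii()` only for display. The function name `scan_folder`, the use of `tempfile`, and `®` as the symbol in the second name are illustrative choices, not taken from the question.

```python
import os
import tempfile

# Sketch, assuming Python 3: os.walk() given a str path yields str names,
# so no lossy bytes conversion happens between listing and opening.
TEST_NAMES = [
    'simple.txt',
    'with a \u00ae symbol.txt',
    'with these chars \u03a3\u0391\u03a0\u03a6\u03a9.txt',
]


def scan_folder(folder):
    """Return (display_name, contents) for every file under *folder*."""
    results = []
    for root, dirs, files in os.walk(folder):  # str in, str out
        for filename in sorted(files):
            fullname = os.path.join(root, filename)
            # encoding= governs the file's contents, not its name
            with open(fullname, 'r', encoding='utf-8') as f:
                # ascii() escapes non-ASCII characters, so printing never fails
                results.append((ascii(fullname), f.read()))
    return results


with tempfile.TemporaryDirectory() as tempfolder:
    for name in TEST_NAMES:
        with open(os.path.join(tempfolder, name), 'w', encoding='utf-8') as f:
            print('file contents', file=f)
    scanned = scan_folder(tempfolder)

for display_name, contents in scanned:
    print(display_name)
    print(contents)
```

The key design choice is keeping the path as `str` from listing to `open()`; converting to bytes and back is where characters can be lost on older Windows Pythons.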