The simple answer is: read the first 4 bytes and look at them.
    with open("utf32le.file", "rb") as file:
        beginning = file.read(4)
        # The order of these if-statements is important:
        # the UTF-32 LE BOM starts with the UTF-16 LE BOM, so checking
        # UTF-16 LE first would misreport UTF-32 LE files as UTF-16 LE.
        if beginning == b'\x00\x00\xfe\xff':
            print("UTF-32 BE")
        elif beginning == b'\xff\xfe\x00\x00':
            print("UTF-32 LE")
        elif beginning[0:3] == b'\xef\xbb\xbf':
            print("UTF-8")
        elif beginning[0:2] == b'\xff\xfe':
            print("UTF-16 LE")
        elif beginning[0:2] == b'\xfe\xff':
            print("UTF-16 BE")
        else:
            print("Unknown or no BOM")
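If you need this in more than one place, the same check can be wrapped in a function. This is only a sketch: it assumes the BOM constants from the standard `codecs` module, and the names `BOMS` and `detect_bom` are made up for illustration.

```python
import codecs

# Longest BOMs first, so UTF-32 LE (FF FE 00 00) is not
# mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data):
    """Return the encoding implied by a leading BOM, or None (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

You would call it as `detect_bom(file.read(4))` on a file opened in `"rb"` mode.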
The not-so-simple answer is:
A binary file can happen to start with bytes that look like a BOM, so a match does not prove that the file is text in that encoding.
Conversely, a text file without a BOM can typically be treated as UTF-8.
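One way to act on that, as a rough sketch: when there is no BOM, try a strict UTF-8 decode and treat a failure as a hint that the file is binary or in some other encoding. The helper name `looks_like_utf8` is hypothetical, and a clean decode is only evidence, not proof.

```python
def looks_like_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # error handling is strict by default
    except UnicodeDecodeError:
        return False
    return True
```

Pass it the whole file content rather than a fixed-size prefix, since cutting the data in the middle of a multi-byte character would make valid UTF-8 fail the check.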