Detect Byte Order Mark (BOM) in Python

深蓝 · Answer 1 · 2021-10-06T19:31:30+0000

The simple answer is: read the first 4 bytes and look at them.

with open("utf32le.file", "rb") as file:
    beginning = file.read(4)
    # The order of these checks matters: the UTF-32 LE BOM starts with the
    # same two bytes as the UTF-16 LE BOM, so UTF-32 must be tested first.
    if beginning == b'\x00\x00\xfe\xff':
        print("UTF-32 BE")
    elif beginning == b'\xff\xfe\x00\x00':
        print("UTF-32 LE")
    elif beginning[0:3] == b'\xef\xbb\xbf':
        print("UTF-8")
    elif beginning[0:2] == b'\xff\xfe':
        print("UTF-16 LE")
    elif beginning[0:2] == b'\xfe\xff':
        print("UTF-16 BE")
    else:
        print("Unknown or no BOM")
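The same check can also be written against the BOM constants in Python's standard codecs module instead of hand-typed byte literals. This is only a sketch: the function name detect_bom and the returned encoding labels are my own choices, not part of any library.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM,
# so checking UTF-16 first would misreport UTF-32 LE files.
# The encoding labels are illustrative names chosen for this sketch.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no BOM matches."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0
```

To use it, open the file in "rb" mode, read the first 4 bytes, and pass them to detect_bom. Returning the BOM length as well makes it easy to skip the BOM before decoding the rest of the file.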

The not-so-simple answer is:

A binary file can begin with bytes that happen to look like a BOM, so a match does not prove that the file is text. The check is also ambiguous in one case: a UTF-16 LE file whose first character is U+0000 starts with the same four bytes as the UTF-32 LE BOM.

Other than that, you can typically treat text files without a BOM as UTF-8.
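As a rough illustration of that fallback, the sketch below decodes a byte string using the BOM if one is present and assumes UTF-8 otherwise. The helper name decode_with_bom is hypothetical, and the UTF-8 default is just the assumption described above; data that is not valid text in the chosen encoding will raise UnicodeDecodeError.

```python
import codecs

# UTF-32 entries come first because the UTF-32 LE BOM starts with the
# UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(data):
    """Decode bytes using the BOM if present; otherwise assume UTF-8."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM so it does not appear in the decoded text.
            return data[len(bom):].decode(encoding)
    # Assumption: no BOM means the data is UTF-8.
    return data.decode("utf-8")
```

For files known to be UTF-8, opening them with encoding="utf-8-sig" achieves the same BOM stripping without any manual check.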
