unicode - Difference between open and codecs.open in Python

Question


posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

There are two ways to open a text file in Python:

f = open(filename)

And

import codecs
f = codecs.open(filename, encoding="utf-8")

When is codecs.open preferable to open?


1 Reply

深蓝 · Answer 1 · 2021-10-17T00:13:16+0000

Since Python 2.6, a good practice is to use io.open(), which also takes an encoding argument, like the now obsolete codecs.open(). In Python 3, io.open is an alias for the open() built-in. So io.open() works in Python 2.6 and all later versions, including Python 3.4. See docs: http://docs.python.org/3.4/library/io.html
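As a quick check of the alias claim, here is a minimal sketch, assuming a Python 3 interpreter. The variable name same_object is only illustrative.

```python
import io

# In Python 3 the io module exposes the same callable as the built-in open(),
# so code written against io.open() keeps working unchanged.
same_object = io.open is open
print(same_object)
```

On Python 2.6 and 2.7 the two are different functions, which is why io.open() is the portable spelling there.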

Now, for the original question: when reading text (including "plain text", HTML, XML and JSON) in Python 2 you should always use io.open() with an explicit encoding, or open() with an explicit encoding in Python 3. Doing so means you get correctly decoded Unicode, or get an error right off the bat, making it much easier to debug.
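The following is a small sketch of that advice, assuming Python 3 and a throwaway temporary file; the sample string and variable names are made up for illustration. It reads the same file once with the matching encoding and once with a deliberately wrong one ("ascii") to show the early failure. Note that not every wrong encoding fails: a single-byte codec such as latin-1 accepts any byte sequence and would return mojibake instead of raising.

```python
import os
import tempfile

text = "naïve café"

# Create a temporary file and write the sample text to it as UTF-8.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the matching encoding returns the original string.
    with open(path, encoding="utf-8") as f:
        decoded = f.read()

    # Reading with a mismatched encoding raises immediately instead of
    # silently handing back corrupted text.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        failed_fast = False
    except UnicodeDecodeError:
        failed_fast = True
finally:
    os.remove(path)

print(decoded == text, failed_fast)
```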

Pure ASCII "plain text" is a myth from the distant past. Proper English text uses curly quotes, em-dashes, bullets, € (euro signs) and even diaeresis (¨). Don't be naïve! (And let's not forget the Façade design pattern!)

Because pure ASCII is not a real option, open() without an explicit encoding is only useful for reading binary files, which should be opened in binary mode ("rb").
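For the binary case, a minimal sketch assuming Python 3: in binary mode open() takes no encoding and returns bytes, and any decoding is done explicitly by the caller. The payload and variable names are illustrative only.

```python
import os
import tempfile

payload = "Façade €".encode("utf-8")

# Write raw bytes to a temporary file.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode: no encoding argument, and read() returns bytes.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

is_bytes = isinstance(raw, bytes)

# Decoding is a separate, explicit step when the bytes are known to be text.
decoded = raw.decode("utf-8")
print(is_bytes, decoded)
```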
