Since Python 2.6, a good practice is to use io.open(), which also takes an encoding argument, like the now obsolete codecs.open(). In Python 3, io.open is an alias for the open() built-in. So io.open() works in Python 2.6 and all later versions, including Python 3.4. See docs: http://docs.python.org/3.4/library/io.html
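As a rough illustration of the point above, here is a minimal sketch that writes and reads a UTF-8 file through io.open(). The temporary file name and the sample string are made up for the demonstration; only io.open() and its encoding argument come from the answer itself.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write some non-ASCII text with an explicit encoding.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve fa\u00e7ade \u20ac")

# Read it back with the same explicit encoding; the result is Unicode text.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# True on Python 3, where io.open and the built-in open are the same object.
same_as_builtin = io.open is open
```

The same script should run unchanged on Python 2.6+ and Python 3.3+, since the u"..." literal prefix is accepted by both.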
Now, for the original question: when reading text (including "plain text", HTML, XML and JSON) in Python 2 you should always use io.open() with an explicit encoding, or open() with an explicit encoding in Python 3. Doing so means you either get correctly decoded Unicode or get an error right off the bat, which makes debugging much easier.
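To make the "error right off the bat" behaviour concrete, here is a small sketch. It assumes a file that was saved as Latin-1 (the file name and contents are invented for the example) and tries to read it as UTF-8 first.

```python
import io
import os
import tempfile

# Hypothetical file holding bytes that are valid Latin-1 but not valid UTF-8.
path = os.path.join(tempfile.mkdtemp(), "latin1.txt")
with io.open(path, "wb") as f:
    f.write(u"caf\u00e9".encode("latin-1"))

# Reading with the wrong explicit encoding fails immediately and loudly.
try:
    with io.open(path, encoding="utf-8") as f:
        f.read()
    outcome = "decoded"
except UnicodeDecodeError:
    outcome = "error"

# Reading with the right explicit encoding yields correct Unicode text.
with io.open(path, encoding="latin-1") as f:
    text = f.read()
```

The failure surfaces at the read call, next to the line that names the encoding, rather than later when the bytes are mixed with other text.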
Pure ASCII "plain text" is a myth from the distant past. Proper English text uses curly quotes, em-dashes, bullets, € (euro signs) and even diaereses (¨). Don't be naïve! (And let's not forget the Façade design pattern!)
Because pure ASCII is not a real option, open() without an explicit encoding is only useful for reading binary files (in Python 3, by passing a binary mode such as 'rb').
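For the binary case, a sketch along these lines may help. It opens a file in 'rb' mode, which takes no encoding, and gets raw bytes back; decoding is then a separate, explicit step. The file name and contents are hypothetical.

```python
import io
import os
import tempfile

# Hypothetical file containing a euro sign followed by a digit, stored as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "price.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\u20ac5")

# Binary mode returns undecoded bytes: three bytes for the euro sign, one for "5".
with io.open(path, "rb") as f:
    raw = f.read()

# Turning those bytes into text still requires naming an encoding.
decoded = raw.decode("utf-8")
```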
Battle the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss will gaze back…