Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Wrap an open stream with io.TextIOWrapper

How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper?

I'm trying to write code that will work unchanged:

  • Running on Python 2.
  • Running on Python 3.
  • With binary streams generated from the standard library (i.e. I can't control what type they are).
  • With binary streams made to be test doubles (i.e. no file handle, so they can't be re-opened).
  • Producing an io.TextIOWrapper that wraps the specified stream.

The io.TextIOWrapper is needed because its API is expected by other parts of the standard library. Other file-like types exist, but don't provide the right API.

Example

Wrapping the binary stream presented as the subprocess.Popen.stdout attribute:

import subprocess
import io

gnupg_subprocess = subprocess.Popen(
        ["gpg", "--version"], stdout=subprocess.PIPE)
gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")

In unit tests, the stream is replaced with an io.BytesIO instance to control its content without touching any subprocesses or filesystems.

gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
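
A minimal sketch of how such a test double might be exercised. The helper name `read_all_text` is hypothetical (it is not part of the code above); it only assumes that the code under test receives the binary stream as an argument:

```python
import io


def read_all_text(binary_stream, encoding="utf-8"):
    """Hypothetical helper: decode everything readable from a binary stream."""
    text_stream = io.TextIOWrapper(binary_stream, encoding=encoding)
    return text_stream.read()


# Test double: an in-memory byte stream standing in for Popen.stdout,
# so no subprocess or file is created.
fake_stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
result = read_all_text(fake_stdout)
print(result)
```

Because the helper only sees "a binary stream", the same code path runs for the real pipe and for the test double.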

That works fine on the streams created by Python 3's standard library. The same code, though, fails on streams generated by Python 2:

[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'file' object has no attribute 'readable'
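
The failure can be reproduced without Python 2. The sketch below assumes Python 3 and uses a hypothetical class, `LegacyStream`, that imitates a Python 2 file only in the respect that matters here: it can read bytes but lacks the io.IOBase query methods such as readable():

```python
import io


class LegacyStream(object):
    """Hypothetical stand-in for a Python 2 file object: it has read(),
    but none of the io.IOBase methods that io.TextIOWrapper calls."""

    def __init__(self, data):
        self._data = data

    def read(self, size=-1):
        # Return all remaining bytes at once; enough for this illustration.
        data, self._data = self._data, b""
        return data


try:
    io.TextIOWrapper(LegacyStream(b"Lorem ipsum"), encoding="utf-8")
except AttributeError as exc:
    error = exc
else:
    error = None

print(repr(error))
```

So the problem is not the bytes themselves: io.TextIOWrapper expects its argument to follow the io buffered-stream interface, which the Python 2 file type predates.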

Not a solution: Special treatment for file

An obvious response is to have a branch in the code which tests whether the stream actually is a Python 2 file object, and handle that differently from io.* objects.

That's not an option for well-tested code, because it creates a branch that the unit tests can't exercise: to run as fast as possible, they must not create any real filesystem objects.

The unit tests provide test doubles, not real file objects, so a branch that those doubles never exercise defeats the purpose of the test suite.

Not a solution: io.open

Some respondents suggest re-opening (e.g. with io.open) the underlying file handle:

gnupg_stdout = io.open(
        gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")

That works on both Python 3 and Python 2:

[Python 3]
>>> type(gnupg_subprocess.stdout)
<class '_io.BufferedReader'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<class '_io.TextIOWrapper'>

[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<type '_io.TextIOWrapper'>

But of course it relies on re-opening a real file from its file descriptor. So it fails in unit tests when the test double is an io.BytesIO instance, which has no file descriptor:

>>> gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
>>> type(gnupg_subprocess.stdout)
<type '_io.BytesIO'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: fileno
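
A short sketch of the same limitation in isolation, using only the standard io module: an in-memory stream has no operating-system file descriptor, so there is nothing for io.open to re-open.

```python
import io

fake_stdout = io.BytesIO(b"Lorem ipsum")

# BytesIO lives entirely in memory, so asking for a descriptor fails.
try:
    fake_stdout.fileno()
except io.UnsupportedOperation:
    has_descriptor = False
else:
    has_descriptor = True

print(has_descriptor)
```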

Not a solution: codecs.getreader

The standard library also has the codecs module, which provides wrapper features:

import codecs

gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)

That's good because it doesn't attempt to re-open the stream. But it fails to provide the io.TextIOWrapper API. Specifically, it doesn't inherit from io.IOBase and doesn't have the encoding attribute:

>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
>>> type(gnupg_stdout)
<type 'instance'>
>>> isinstance(gnupg_stdout, io.IOBase)
False
>>> gnupg_stdout.encoding
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 643, in __getattr__
    return getattr(self.stream, name)
AttributeError: '_io.BytesIO' object has no attribute 'encoding'

So codecs doesn't provide objects which substitute for io.TextIOWrapper.
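
The gap can be checked side by side. This is a sketch assuming Python 3 and an io.BytesIO stream; the variable names are illustrative only:

```python
import codecs
import io

payload = "Lorem ipsum".encode("utf-8")

# Object produced by the codecs module.
reader = codecs.getreader("utf-8")(io.BytesIO(payload))
reader_is_iobase = isinstance(reader, io.IOBase)
# Unknown attributes are looked up on the wrapped stream, and BytesIO has no
# 'encoding', so the lookup fails.
reader_has_encoding = hasattr(reader, "encoding")

# Object produced by io.TextIOWrapper, for comparison.
wrapper = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
wrapper_is_iobase = isinstance(wrapper, io.IOBase)
wrapper_encoding = wrapper.encoding

print(reader_is_iobase, reader_has_encoding)
print(wrapper_is_iobase, wrapper_encoding)
```

Both objects decode the bytes correctly when read; they differ in the interface they expose to the code that consumes them.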

What to do?

So how can I write code that works on both Python 2 and Python 3, with both the test doubles and the real objects, and that wraps an io.TextIOWrapper around the already-open byte stream?


1 Reply


Use codecs.getreader to produce a wrapper object:

text_stream = codecs.getreader("utf-8")(bytes_stream)

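This works on both Python 2 and Python 3, and on io.BytesIO test doubles as well as real streams, because it never re-opens the underlying stream. Note that the result is a codecs.StreamReader, not an io.TextIOWrapper.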
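
A minimal usage sketch, assuming the consumer only needs to iterate over decoded lines. The helper name `wrap_text` and the sample content are hypothetical:

```python
import codecs
import io


def wrap_text(bytes_stream, encoding="utf-8"):
    """Hypothetical helper around the codecs.getreader call shown above."""
    return codecs.getreader(encoding)(bytes_stream)


# Sample content with non-ASCII characters, encoded as the stream would be.
bytes_stream = io.BytesIO(u"first line\nna\u00efve caf\u00e9\n".encode("utf-8"))

# The reader is iterable and yields decoded text lines.
lines = [line.rstrip(u"\n") for line in wrap_text(bytes_stream)]
print(lines)
```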

