Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How to get IDLE to accept paste of Unicode characters?

Oftentimes when I'm working interactively in IDLE, I'd like to paste a Unicode string into the IDLE window. It appears to paste properly but generates an error immediately. It has no trouble displaying the same character on output.

>>> c = u'ĉ'
Unsupported characters in input

>>> print u'\u0109'
ĉ

I suspect that the input window, like most Windows programs, uses UTF-16 internally and has no trouble dealing with the full Unicode set; the problem is that IDLE insists on coercing all input to the default mbcs code page, and anything not in that page gets rejected.

Is there any way to configure or cajole IDLE into accepting the full Unicode character set as input?

Python 3.2 handles this much better and has no trouble with anything I throw at it.

I know that I can simply save the code to a file in UTF-8 and import it, but I want to be able to work with Unicode characters in the interactive window.

1 Reply

I finally figured out a way. Since the sources to IDLE are part of the distribution, you can make a couple of quick edits to enable the capability. The files will typically be found in C:\Python27\Lib\idlelib.

The first step is to prevent IDLE from trying to encode all those nice Unicode characters into a character set that can't handle them. This is controlled by IOBinding.py. Edit the file, find the section after if sys.platform == 'win32': and comment out this line:

#encoding = locale.getdefaultlocale()[1]

Now add this line after it:

encoding = 'utf-8'

I was hoping that there would be a way to override this with an environment variable or something, but getdefaultlocale calls directly into a Win32 function that gets the globally configured Windows mbcs encoding.
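
To see why the default code page rejects the character, here is a minimal sketch that checks whether a string can be encoded in a given encoding. The helper name can_encode is made up for illustration, and cp1252 is only an assumed example of a Windows code page; your system's default may differ.

```python
import locale


def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeError:
        return False
    return True


sample = u"\u0109"  # the character from the question

# cp1252 is an assumed example code page, not necessarily your default.
print(can_encode(sample, "cp1252"))  # False
print(can_encode(sample, "utf-8"))   # True

# The encoding Python reports for the current locale (varies by machine).
print(locale.getpreferredencoding())
```

Anything that fails this check against the encoding IDLE picked is what triggers the "Unsupported characters in input" message, which is why forcing the encoding to utf-8 helps.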

That is half the battle. Now we need to get the command line interpreter to recognize that the input bytes are UTF-8 encoded. There didn't appear to be a way to pass an encoding into the interpreter, so I came up with the mother of all hacks. Maybe someone with a little more patience can come up with a better way, but this works for now. The input is processed in PyShell.py, in the runsource function. Change the following:

    if isinstance(source, types.UnicodeType):
        from idlelib import IOBinding
        try:
            source = source.encode(IOBinding.encoding)
        except UnicodeError:
            self.tkconsole.resetoutput()
            self.write("Unsupported characters in input\n")
            return

To:

    from idlelib import IOBinding  # line moved
    if isinstance(source, types.UnicodeType):
        try:
            source = source.encode(IOBinding.encoding)
        except UnicodeError:
            self.tkconsole.resetoutput()
            self.write("Unsupported characters in input\n")
            return
    source = "#coding=%s\n%s" % (IOBinding.encoding, source)  # line added

We're taking advantage of PEP 263 to specify the encoding for each line of input provided to the interpreter.
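
The same trick can be tried outside IDLE. This is only a sketch of the idea, not the actual IDLE code: the function name run_with_cookie and the "<pyshell>" filename are made up for illustration. It encodes the source text, prepends a PEP 263 coding line, and compiles the resulting bytes.

```python
def run_with_cookie(text, encoding):
    """Encode text, prepend a PEP 263 coding line, and execute the result.

    Returns the namespace the code ran in. Hypothetical helper, for
    illustration only.
    """
    encoded = text.encode(encoding)
    # The coding line itself is plain ASCII and must be on line 1 or 2.
    cookie = ("#coding=%s\n" % encoding).encode("ascii")
    namespace = {}
    exec(compile(cookie + encoded, "<pyshell>", "exec"), namespace)
    return namespace


ns = run_with_cookie(u"c = u'\u0109'\n", "utf-8")
print(ns["c"] == u"\u0109")  # True
```

One side effect of prepending a line is that line numbers reported in tracebacks for the compiled input are shifted down by one.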

Update: In Python 2.7.10 it is no longer necessary to make the change in PyShell.py; it already works properly if the encoding is set to utf-8. Unfortunately, I haven't found a way to bypass the change in IOBinding.py.

