Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
551 views
in Technique[技术] by (71.8m points)

python - ConfigParser with Unicode items

my troubles with ConfigParser continue. It seems it doesn't support Unicode very well. The config file is indeed saved as UTF-8, but when ConfigParser reads it it seems to be encoded into something else. I assumed it was latin-1 and I thougt overriding optionxform could help:

-- configfile.cfg -- 
[rules]
H?jsan = 3
? = my snowman

-- myapp.py --
# -*- coding: utf-8 -*-  
import ConfigParser

def _optionxform(s):
    try:
        newstr = s.decode('latin-1')
        newstr = newstr.encode('utf-8')
        return newstr
    except Exception, e:
        print e

cfg = ConfigParser.ConfigParser()
cfg.optionxform = _optionxform    
cfg.read("myconfig") 

Of course, when I read the config I get:

'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I've tried a couple of different variations of decoding 's' but the point seems moot, since it really should be a unicode object from the beginning. After all, the config file is UTF-8? I have confirmed that's something is wrong in the way ConfigParser reads the file by stubbing it out with this DummyConfig class. If I use that then everything is nice unicode, fine and dandy.

-- config.py --
# -*- coding: utf-8 -*-                
apa = {'rules': [(u'H?jsan', 3), (u'?', u'my snowman')]}

class DummyConfig(object):
    def sections(self):
        return apa.keys()
    def items(self, section):
       return apa[section]
    def add_section(self, apa):
        pass  
    def set(self, *args):
        pass  

Any ideas what could be causing this or suggestions of other config modules that supports Unicode better are most welcome. I don't want to use sys.setdefaultencoding()!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The ConfigParser.readfp() method can take a file object, have you tried opening the file object with the correct encoding using the codecs module before sending it to ConfigParser like below:

cfg.readfp(codecs.open("myconfig", "r", "utf8"))

For Python 3.2 or above, readfp() is deprecated. Use read_file() instead.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...