Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - UnicodeDecodeError when reading data from DBF database

I need to write a script that connects an ERP program to a production program. The production side is straightforward: I send it data via HTTP requests. The ERP side is harder, because its data must be read from a DBF file.

I use the dbf library because (if I'm not mistaken) it's the only one that lets me filter data fairly simply and quickly. I open the database this way:

table = dbf.Table(path).open()
dbf_index = dbf.pql(table, "select * where ident == 'M'")

I then loop over the records returned by the query. I need to package the selected data from the DBF database as JSON and send it to the production program's API.

data = {
    "warehouse_id" : parseDbfData(record['SYMBOL']),
    "code" : parseDbfData(record['SYMBOL']),
    "name" : parseDbfData(record['NAZWA']),
    "main_warehouse" : False,
    "blocked" : False
}
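A minimal sketch of this packaging step, assuming the record has already been read into a plain dict of strings. The `record` literal and the helper name `build_payload` are hypothetical, and the HTTP call to the production API is left out:

```python
import json

def build_payload(record):
    """Build the warehouse payload from a mapping with SYMBOL and NAZWA keys."""
    return {
        "warehouse_id": record["SYMBOL"].strip(),
        "code": record["SYMBOL"].strip(),
        "name": record["NAZWA"].strip(),
        "main_warehouse": False,
        "blocked": False,
    }

# Hypothetical record standing in for one row returned by the query.
record = {"SYMBOL": "M01   ", "NAZWA": "Magazyn   "}

payload = build_payload(record)
# ensure_ascii=False keeps non-ASCII characters readable in the JSON text.
body = json.dumps(payload, ensure_ascii=False)
print(body)
```

Stripping the values here mirrors what the question's own helper does; whether the fields actually carry trailing padding depends on the file.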

The parseDbfData function looks like this. It is not the cause of the problem, because the script failed the same way without it; I added it while trying to fix the error.

def parseDbfData(data):
    return str(data.strip())

When the script runs and encounters a non-ASCII character in the DBF data (for example, a Polish letter such as ł), it terminates with this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 15: ordinal not in range(128)

The error points to this line (where the JSON is built):

"name" : parseDbfData(record['NAZWA']),

The value the script is trying to read at this point is probably "Magazyn materiałów Podgórna". As you can see, it contains the characters "ł" and "ó". I think this is what breaks the script, but I don't know how to fix it.
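The byte named in the error message can be checked against that guess. A minimal sketch, assuming the value is the Polish phrase "Magazyn materiałów Podgórna" and that the file stores it in code page 852 (both are assumptions, not facts read from the file):

```python
# Assumption: the DBF field bytes are in code page 852 (DOS Central European).
raw = "Magazyn materiałów Podgórna".encode("cp852")

# The byte at index 15 is the encoded form of the letter "ł".
byte_at_15 = raw[15]

def decode_error(data, codec):
    """Return the decoding error message for `data`, or None if it decodes cleanly."""
    try:
        data.decode(codec)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None

ascii_error = decode_error(raw, "ascii")
cp852_text = raw.decode("cp852")

print(hex(byte_at_15))
print(ascii_error)
print(cp852_text)
```

If the printed byte and position match the traceback, the data is most likely not ASCII at all, and the ASCII codec simply fails on the first byte above 0x7F.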

I should mention that I'm using Python 3.9. I know there were character-encoding issues in the 2.x versions, but I thought Python 3.x had remedied that problem. It turns out it hasn't :(

Question from: https://stackoverflow.com/questions/65952363/unicodedecodeerror-when-reading-data-from-dbf-database

1 Reply

I concluded that I have to specify the encoding when reading the DBF database. However, I could not work out from the documentation exactly how to do this.

After a thorough look at the dbf module itself, I found that I need to pass the codepage parameter when opening the database. After some experimentation, I determined that, of all the encodings available in the module, cp852 suits my data best.

After the correction, the code to open a DBF database looks like this:

table = dbf.Table(path, codepage='cp852').open()
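One way to narrow down the code page, sketched here with the standard library only rather than the dbf module: decode a sample of the raw field bytes with each candidate codec and see which result reads as valid Polish. The sample bytes, the candidate list, and the helper name `try_codecs` are assumptions for illustration.

```python
# Hypothetical sample: raw bytes of the NAZWA field as stored in the file.
# Here they are produced by encoding a known string, purely for illustration.
sample = "Magazyn materiałów Podgórna".encode("cp852")

# An assumed shortlist of code pages to compare.
candidates = ["cp852", "cp1250", "iso8859_2", "cp437"]

def try_codecs(data, codecs):
    """Decode `data` with each codec; map codec name to text, or None on failure."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

results = try_codecs(sample, candidates)
for name, text in results.items():
    print(f"{name:10} {text!r}")
```

Only a codec that matches how the file was written returns the expected text; the others either fail or produce wrong letters, which is why a visual check of a few known values is usually enough to choose.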
