What's going wrong
After checking the link in your comment below and doing some more research and testing, I was able to reproduce the error with MySQLdb versions 1.2.4b4 and 1.2.5. As explained in unubtu's answer, this has to do with the limitations of a regular expression that appears in cursors.py. The exact regular expression is slightly different in each release, probably because people keep finding cases it doesn't handle and adjusting the expression instead of looking for a better approach entirely.
The regular expression tries to match the VALUES ( ... ) clause of the INSERT statement and identify the beginning and end of the tuple expression it contains. If the match succeeds, executemany tries to convert the single-row insert statement template into a multiple-row insert statement so that it runs faster. That is, instead of executing this for every row you want to insert:
INSERT INTO table
(foo, bar, ...)
VALUES
(%s, %s, ...);
It tries to rewrite the statement so that it only has to execute once:
INSERT INTO table
(foo, bar, ...)
VALUES
(1, 2, ...),
(3, 4, ...),
(5, 6, ...),
...;
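The rewrite can be sketched in a few lines of Python. This is a simplified illustration of the idea, not MySQLdb's actual code: the `VALUES_RE` pattern, the toy `literal()` quoting function and `rewrite_multirow()` are all hypothetical names, and real drivers escape values far more carefully.

```python
import re

# Hypothetical, simplified pattern (not MySQLdb's): find the first VALUES keyword
# and capture the parenthesized row template after it, allowing one level of
# nested parentheses such as NOW().
VALUES_RE = re.compile(r"\bVALUES\s*(\((?:[^()]|\([^()]*\))*\))", re.IGNORECASE)


def literal(value):
    """Toy SQL quoting for illustration only; real drivers escape properly."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def rewrite_multirow(template, rows):
    """Expand a single-row INSERT template into one multi-row INSERT string.

    Returns None when the pattern doesn't match; a driver would then fall back
    to calling execute() once per row.
    """
    m = VALUES_RE.search(template)
    if m is None:
        return None
    row_template = m.group(1)  # e.g. "(%s, %s)"
    rendered = [row_template % tuple(literal(v) for v in row) for row in rows]
    # Splice the rendered rows in place of the single row template.
    return template[:m.start(1)] + ",\n".join(rendered) + template[m.end(1):]


if __name__ == "__main__":
    sql = "INSERT INTO t (foo, bar) VALUES (%s, %s);"
    print(rewrite_multirow(sql, [(1, 2), (3, 4), (5, 6)]))
```

Running it prints the three-row statement with the literals spliced into the VALUES clause, which is the same shape as the multi-row statement above.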
The problem you're running into is that executemany assumes you only have parameter placeholders in the tuple immediately after VALUES. When you also have placeholders later on, it takes this:
INSERT INTO table
(foo, bar, ...)
VALUES
(%s, %s, ...)
ON DUPLICATE KEY UPDATE baz=%s;
And tries to rewrite it like this:
INSERT INTO table
(foo, bar, ...)
VALUES
(1, 2, ...),
(3, 4, ...),
(5, 6, ...),
...
ON DUPLICATE KEY UPDATE baz=%s;
The problem here is that MySQLdb is trying to do string formatting at the same time that it's rewriting the query. Only the VALUES ( ... ) clause needs to be rewritten, so MySQLdb tries to put all your parameters into the matching group (%s, %s, ...), not realizing that some parameters need to go into the UPDATE clause instead. Each row then has more parameters than that group has placeholders, so the string formatting fails, which is where your TypeError comes from.
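The failure can be reproduced with nothing but Python's `%` formatting. Here the captured group has two placeholders, while the row also carries a third value meant for the UPDATE clause; the row values are made up for illustration.

```python
row_template = "(%s, %s)"      # the group captured after VALUES
row = ("1", "2", "'Daniel'")   # two VALUES parameters plus one meant for UPDATE

try:
    row_template % row
    error = None
except TypeError as exc:
    # Python refuses to silently drop the extra argument
    error = str(exc)

print(error)
```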
If you only send parameters for the VALUES clause to executemany, you'll avoid the TypeError but run into a different problem. Notice that the rewritten INSERT ... ON DUPLICATE KEY UPDATE query has literal values in the VALUES clause, but there's still a %s placeholder in the UPDATE clause. MySQL doesn't understand %s, so the server will reject the statement with a syntax error.
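Doing the same splice by hand, with only the VALUES parameters supplied, shows the leftover placeholder. Plain `str` operations stand in for the driver's regular expression here, and the table and column names are the placeholders from the example above.

```python
template = "INSERT INTO t (foo, bar) VALUES (%s, %s) ON DUPLICATE KEY UPDATE baz=%s"
rows = [(1, 2), (3, 4)]  # only the VALUES parameters this time

row_template = "(%s, %s)"                # the group the rewrite captures
start = template.index(row_template)     # first occurrence is the VALUES tuple
end = start + len(row_template)
rendered = ",\n".join(row_template % row for row in rows)
rewritten = template[:start] + rendered + template[end:]

print(rewritten)
# The UPDATE clause still ends with a raw %s, which the MySQL server rejects.
```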
When I first tested your sample code, I was using MySQLdb 1.2.3c1 and couldn't reproduce your problem. Amusingly, the reason that particular version of the package avoids these problems is that its regular expression is broken and doesn't match the statement at all. Since it doesn't match, executemany doesn't attempt to rewrite the query, and instead just loops through your parameters, calling execute repeatedly.
What to do about it
First of all, don't go back and install 1.2.3c1 to make this work. You want to be using updated code where possible.
You could move to another package, as unubtu suggests in the linked Q&A, but that would involve some amount of adjustment and possibly changes to other code.
What I would recommend instead is to rewrite your query in a way that is more straightforward and takes advantage of the VALUES() function in your UPDATE clause. This function lets you refer back, by column name, to the value you would have inserted into that column in the absence of a duplicate key violation (examples are in the MySQL docs).
With that in mind, here's one way to do it:
# bsid, shop_id, dType and cur are assumed to be defined as in your original code
dData = [[u'Daniel', u'00-50-56-C0-00-12', u'Daniel']] # exact input you gave
sql = """
INSERT INTO app_network_white_black_list
(biz_id, shop_id, type, mac_phone, remarks, create_time)
VALUES
(%s, %s, %s, %s, %s, NOW())
ON DUPLICATE KEY UPDATE
type=VALUES(type), remarks=VALUES(remarks), create_time=VALUES(create_time);
""" # keep parameters in one part of the statement
# generator expression takes care of the repeated values; each input row has
# three items, and the third only fed the old UPDATE placeholder, so it's dropped
cur.executemany(sql, ((bsid, shop_id, dType, mac, rem) for mac, rem, _ in dData))
This approach should work because there are no parameters in the UPDATE clause, meaning MySQLdb will be able to convert the single-row insert template with parameters into a multi-row insert statement with literal values.
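If you write several statements like this, the placeholder-free UPDATE clause can also be generated from the column list, so placeholders only ever appear in the VALUES tuple. A small sketch; `build_upsert` is a hypothetical helper, and because it interpolates identifiers directly, the table and column names must come from your own code, never from user input.

```python
def build_upsert(table, insert_cols, update_cols):
    """Build an INSERT ... ON DUPLICATE KEY UPDATE template whose only
    %s placeholders are in the VALUES tuple (hypothetical helper)."""
    cols = ", ".join(insert_cols)
    placeholders = ", ".join(["%s"] * len(insert_cols))
    # Each updated column refers back to the value that would have been inserted.
    updates = ", ".join("{0}=VALUES({0})".format(c) for c in update_cols)
    return ("INSERT INTO {} ({}) VALUES ({}) ON DUPLICATE KEY UPDATE {}"
            .format(table, cols, placeholders, updates))


print(build_upsert("app_network_white_black_list",
                   ["biz_id", "shop_id", "type", "mac_phone", "remarks"],
                   ["type", "remarks"]))
```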
Some things to note:
- You don't have to supply a tuple to executemany; any iterable is fine.
- Multiline strings make for much more readable SQL statements in your Python code than implicitly concatenated strings; when you separate the statement from the string delimiters, it's easy to quickly grab the statement and copy it into a client application for testing.
- If you're going to parameterize part of your query, why not parameterize all of your query? Even if only part of it is user input, it's more readable and maintainable to handle all your input values the same way.
- That said, I didn't parameterize NOW(). My preferred approach here would be to use CURRENT_TIMESTAMP as the column default and take advantage of DEFAULT in the statement. Others might prefer to generate this value in the application and supply it as a parameter. If you're not worried about version compatibility, it's probably fine as-is.
- If you can't avoid having parameter placeholders in the UPDATE clause (e.g., because the UPDATE value(s) can't be hard-coded in the statement or derived from the VALUES tuple), you'll have to call execute in a loop instead of using executemany.
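Looping over execute is simple. A minimal sketch, assuming `cur` is a DB-API cursor such as MySQLdb's; `upsert_each`, the `RecordingCursor` stand-in and the sample row values are hypothetical, there only so the example runs without a database.

```python
def upsert_each(cur, sql, rows):
    """Execute the statement once per row. Needed when the UPDATE clause has
    its own placeholders, which the multi-row rewrite can't handle."""
    for params in rows:
        cur.execute(sql, params)


class RecordingCursor:
    """Hypothetical stand-in for a DB-API cursor; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))


sql = """
INSERT INTO app_network_white_black_list
(biz_id, shop_id, type, mac_phone, remarks, create_time)
VALUES
(%s, %s, %s, %s, %s, NOW())
ON DUPLICATE KEY UPDATE
remarks=%s, create_time=NOW();
"""
# Five parameters for the VALUES tuple plus one for the UPDATE clause
# (made-up example values).
rows = [(1, 2, "white", "00-50-56-C0-00-12", "Daniel", "Daniel (updated)")]

cur = RecordingCursor()  # with MySQLdb this would be conn.cursor()
upsert_each(cur, sql, rows)
```

Each row becomes its own round trip to the server, so this is slower than a successful multi-row rewrite, but every placeholder is filled from the matching parameter tuple.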