I have CSV files containing user-submitted form data from a few websites. The files can have any number of columns (one per form field), and the values can be anything. A few columns are constant, such as `Form ID` and `Form URL`. I need to dynamically create a table for each form/CSV and load the data into a predefined MySQL database.
I wrote a script some time ago leveraging MySQLdb that does exactly this, but at a rate of roughly 2 rows per second. In total I have about 90K rows of data.
My process was this:
- Grab a part of the site name, form name and form ID to dynamically create a table name
- Create the table if it does not exist, using string-concatenated SQL. Column names are derived from the CSV field names, stripped of special characters and converted to snake_case; every column is typed `VARCHAR`
- Loop through the CSV rows and INSERT each one into the table, using a dictionary and placeholders to prevent SQL injection (a rough sketch of this process follows below)
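
For reference, here is a minimal sketch of that process as described, using MySQLdb with row-by-row inserts. The connection details, file name, and the `sanitize()` helper are illustrative placeholders, not the actual script:

```python
import csv
import re

import MySQLdb


def sanitize(name):
    # Strip special characters and convert to snake_case.
    return re.sub(r"\W+", "_", name.strip()).lower()


conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="forms")
cur = conn.cursor()

with open("example_form.csv", newline="") as f:
    reader = csv.DictReader(f)
    columns = [sanitize(c) for c in reader.fieldnames]

    # Table name built from parts of the site name, form name and form ID.
    table = sanitize("mysite contact form 123")

    # CREATE TABLE IF NOT EXISTS with every column typed VARCHAR.
    cols_sql = ", ".join("`%s` VARCHAR(255)" % c for c in columns)
    cur.execute("CREATE TABLE IF NOT EXISTS `%s` (%s)" % (table, cols_sql))

    # Row-by-row INSERT with placeholders for the values
    # (this is the part that crawls at roughly 2 rows per second).
    insert_sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (
        table,
        ", ".join("`%s`" % c for c in columns),
        ", ".join(["%s"] * len(columns)),
    )
    for row in reader:
        cur.execute(insert_sql, [row[name] for name in reader.fieldnames])

conn.commit()
cur.close()
conn.close()
```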
Upon revision I noticed I didn't make use of `bulk_query()` or `executemany()`, which should make a difference. But it would likely also be better to do away with string concatenation and make use of SQLAlchemy instead. However, I understand that would require predefined classes to build the models from. Is that something that could be defined on the fly?
question from: https://stackoverflow.com/questions/65901327/python-speeding-up-imports-from-csv-files-with-unknown-columns-to-database