python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql

Question

Welcome To Ask or Share your Answers For Others

python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql

I have created a sqlite database using pandas df.to_sql however accessing it seems considerably slower than just reading in the 500mb csv file.

I need to:

set the primary key for each table using the df.to_sql method
tell the sqlite database what datatype each of the columns in my 3.dataframe are? - can I pass a list like [integer,integer,text,text]

code.... (format code button not working)

if ext == ".csv": 
df = pd.read_csv("/Users/data/" +filename) 
columns = df.columns columns = [i.replace(' ', '_') for i in columns]

df.columns = columns
df.to_sql(name,con,flavor='sqlite',schema=None,if_exists='replace',index=True,index_label=None, chunksize=None, dtype=None)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:51:19+0000

Unfortunately there is no way right now to set a primary key in the pandas df.to_sql() method. Additionally, just to make things more of a pain there is no way to set a primary key on a column in sqlite after a table has been created.

However, a work around at the moment is to create the table in sqlite with the pandas df.to_sql() method. Then you could create a duplicate table and set your primary key followed by copying your data over. Then drop your old table to clean up.

It would be something along the lines of this.

import pandas as pd
import sqlite3

df = pd.read_csv("/Users/data/" +filename) 
columns = df.columns columns = [i.replace(' ', '_') for i in columns]

#write the pandas dataframe to a sqlite table
df.columns = columns
df.to_sql(name,con,flavor='sqlite',schema=None,if_exists='replace',index=True,index_label=None, chunksize=None, dtype=None)

#connect to the database
conn = sqlite3.connect('database')
c = conn.curser()

c.executescript('''
    PRAGMA foreign_keys=off;

    BEGIN TRANSACTION;
    ALTER TABLE table RENAME TO old_table;

    /*create a new table with the same column names and types while
    defining a primary key for the desired column*/
    CREATE TABLE new_table (col_1 TEXT PRIMARY KEY NOT NULL,
                            col_2 TEXT);

    INSERT INTO new_table SELECT * FROM old_table;

    DROP TABLE old_table;
    COMMIT TRANSACTION;

    PRAGMA foreign_keys=on;''')

#close out the connection
c.close()
conn.close()

In the past I have done this as I have faced this issue. Just wrapped the whole thing as a function to make it more convenient...

In my limited experience with sqlite I have found that not being able to add a primary key after a table has been created, not being able to perform Update Inserts or UPSERTS, and UPDATE JOIN has caused a lot of frustration and some unconventional workarounds.

Lastly, in the pandas df.to_sql() method there is a a dtype keyword argument that can take a dictionary of column names:types. IE: dtype = {col_1: TEXT}

Categories

python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql

python - how to set the primary key when writing a pandas dataframe to a sqlite database table using df.to_sql

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags