In pandas, there is no convenient argument in to_sql
to append only non-duplicates to a final table. Consider using a staging temp table that pandas always replaces and then run a final append query to migrate temp table records to final table accounting only for unique PK's using the NOT EXISTS
clause.
engine = sqlalchemy.create_engine(...)
df.to_sql(name='myTempTable', con=engine, if_exists='replace')
with engine.begin() as cn:
sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
SELECT t.Col1, t.Col2, t.Col3, ...
FROM myTempTable t
WHERE NOT EXISTS
(SELECT 1 FROM myFinalTable f
WHERE t.MatchColumn1 = f.MatchColumn1
AND t.MatchColumn2 = f.MatchColumn2)"""
cn.execute(sql)
This would be an ANSI SQL solution and not restricted to vendor-specific methods like UPSERT
and so is compliant in practically all SQL-integrated relational databases.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…