I am using Pandas, python, and postgresql. I want to avoid inserting data from excel file that is duplicated in the database
The table is assembled with Django models:
the crq_number column has the unique attribute
class Crq(models.Model):
id = models.IntegerField(primary_key=True)
project = models.ForeignKey(Project, on_delete=models.CASCADE, null=True)
crq_number = models.CharField(max_length=200, unique=True)
crq_title = models.CharField(max_length=200)
crq_impact = models.CharField(max_length=32)
crq_type = models.CharField(max_length=32)
pub_date = models.DateTimeField(auto_now_add=True)
success = models.BooleanField(blank=True, null=True, default=True)
CREATE TABLE public.srv_crq
(
id bigint NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 9223372036854775807 CACHE 1 ),
crq_number character varying(200) COLLATE pg_catalog."default" NOT NULL,
crq_title character varying(200) COLLATE pg_catalog."default" NOT NULL,
crq_impact character varying(32) COLLATE pg_catalog."default" NOT NULL,
crq_type character varying(32) COLLATE pg_catalog."default",
pub_date timestamp with time zone NOT NULL,
success boolean,
project_id integer,
CONSTRAINT srv_crq_pkey PRIMARY KEY (id),
CONSTRAINT srv_crq_crq_number_f3f26821_uniq UNIQUE (crq_number),
CONSTRAINT srv_crq_project_id_745a788b_fk_srv_project_id FOREIGN KEY (project_id)
REFERENCES public.srv_project (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
DEFERRABLE INITIALLY DEFERRED
)
and I have the following script to import data to the database:
import pandas
from sqlalchemy import create_engine, MetaData, Column, Table, exists, String
df = pandas.read_excel('crq.xlsx')
select_column = df[[
'INFRASTRUCTURE_CHANGE_ID',
'DESCRIPTION',
'SCHEDULED_START_DATE',
'IMPACTO',
'ORIGEN',
]]
df = select_column.rename(columns={
"INFRASTRUCTURE_CHANGE_ID": "crq_number",
"DESCRIPTION": "crq_title",
"SCHEDULED_START_DATE": "pub_date",
"IMPACTO": "crq_impact",
"ORIGEN": "crq_type",
})
engine = create_engine('postgresql+psycopg2://srv:srv@localhost:5432/srv_db')
meta = MetaData(engine).reflect()
df.to_sql(
name='srv_crq',
con=engine,
if_exists='append',
index=False,
)
the error
sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) llave duplicada viola restricción de unicidad ?srv_crq_crq_number_f3f26821_uniq?
DETAIL: Ya existe la llave (crq_number)=(CRQ000000157475).
How can I avoid duplicate arrows and only insert data that does not exist in the database based on a specific field in the table?
question from:
https://stackoverflow.com/questions/65944600/pandas-excel-to-postgres-append-data-without-duplication