Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
456 views
in Technique[技术] by (71.8m points)

django - PANDAS: Excel to postgres append data without duplication

I am using Pandas, python, and postgresql. I want to avoid inserting data from excel file that is duplicated in the database

The table is assembled with Django models: the crq_number column has the unique attribute

class Crq(models.Model):
    id = models.IntegerField(primary_key=True)
    project = models.ForeignKey(Project, on_delete=models.CASCADE, null=True)
    crq_number = models.CharField(max_length=200, unique=True)
    crq_title = models.CharField(max_length=200)
    crq_impact = models.CharField(max_length=32)
    crq_type = models.CharField(max_length=32)
    pub_date = models.DateTimeField(auto_now_add=True)
    success = models.BooleanField(blank=True, null=True, default=True)


CREATE TABLE public.srv_crq
(
    id bigint NOT NULL GENERATED ALWAYS AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 9223372036854775807 CACHE 1 ),
    crq_number character varying(200) COLLATE pg_catalog."default" NOT NULL,
    crq_title character varying(200) COLLATE pg_catalog."default" NOT NULL,
    crq_impact character varying(32) COLLATE pg_catalog."default" NOT NULL,
    crq_type character varying(32) COLLATE pg_catalog."default",
    pub_date timestamp with time zone NOT NULL,
    success boolean,
    project_id integer,
    CONSTRAINT srv_crq_pkey PRIMARY KEY (id),
    CONSTRAINT srv_crq_crq_number_f3f26821_uniq UNIQUE (crq_number),
    CONSTRAINT srv_crq_project_id_745a788b_fk_srv_project_id FOREIGN KEY (project_id)
        REFERENCES public.srv_project (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
        DEFERRABLE INITIALLY DEFERRED
)

and I have the following script to import data to the database:

import pandas
from sqlalchemy import create_engine, MetaData, Column, Table, exists, String

df = pandas.read_excel('crq.xlsx')

select_column = df[[
    'INFRASTRUCTURE_CHANGE_ID',
    'DESCRIPTION',
    'SCHEDULED_START_DATE',
    'IMPACTO',
    'ORIGEN',
]]

df = select_column.rename(columns={
    "INFRASTRUCTURE_CHANGE_ID": "crq_number",
    "DESCRIPTION": "crq_title",
    "SCHEDULED_START_DATE": "pub_date",
    "IMPACTO": "crq_impact",
    "ORIGEN": "crq_type",
})

engine = create_engine('postgresql+psycopg2://srv:srv@localhost:5432/srv_db')
meta = MetaData(engine).reflect()

df.to_sql(
        name='srv_crq',
        con=engine,
        if_exists='append',
        index=False,
    )

the error

sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) llave duplicada viola restricción de unicidad ?srv_crq_crq_number_f3f26821_uniq?
DETAIL:  Ya existe la llave (crq_number)=(CRQ000000157475).

How can I avoid duplicate arrows and only insert data that does not exist in the database based on a specific field in the table?

question from:https://stackoverflow.com/questions/65944600/pandas-excel-to-postgres-append-data-without-duplication

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...