Import data
COPY
everything to a temporary staging table and insert only new titles into your target table.
CREATE TEMP TABLE tmp(title text);
COPY tmp FROM 'path/to/file.csv';
ANALYZE tmp;
INSERT INTO tbl
SELECT DISTINCT tmp.title
FROM tmp
LEFT JOIN tbl USING (title)
WHERE tbl.title IS NULL;
IDs should be generated automatically with a serial
column tbl_id
in tbl
.
The LEFT JOIN
/ IS NULL
construct disqualifies already existing titles. NOT EXISTS
would be another possibility.
DISTINCT
prevents duplicates in the incoming data in the temporary table tmp
.
ANALYZE
is useful to make sure the query planner picks a sensible plan, and temporary tables are not analyzed by autovacuum.
Since you have 3 million items, it might pay to raise the setting for temp_buffer
(for this session only):
SET temp_buffers = 1000MB;
Or however much you can afford and is enough to hold the temp table in RAM, which is much faster. Note: must be done first in the session - before any temp objects are created.
Retrieve IDs
To see all IDs for the imported data:
SELECT tbl.tbl_id, tbl.title
FROM tbl
JOIN tmp USING (title)
In the same session! A temporary table is dropped automatically at the end of the session.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…