sql - Deleting duplicate rows from Redshift

Question


posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)


I am trying to delete some duplicate rows from my Redshift table.

Below is my query:

With duplicates
As
(Select *, ROW_NUMBER() Over (PARTITION by record_indicator Order by record_indicator) as Duplicate From table_name)
delete from duplicates
Where Duplicate > 1 ;

This query is giving me an error.

Amazon Invalid operation: syntax error at or near "delete";

I'm not sure what the issue is, as the syntax for the WITH clause seems to be correct. Has anybody faced this situation before?


1 Reply

深蓝 · Answer 1 · 2021-10-17T02:58:08+0000

Because Redshift does not enforce uniqueness on any column, Ziggy's 3rd option is probably best. Once you decide to go the temp table route, it is more efficient to swap the whole table out, since deletes and inserts are expensive in Redshift.

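begin;
-- Build a de-duplicated copy (distinct compares entire rows).
create table table_name_new as select distinct * from table_name;
-- Swap the new table in under the original name.
alter table table_name rename to table_name_old;
alter table table_name_new rename to table_name;
-- Remove the original once the swap is done.
drop table table_name_old;
commit;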

If space isn't an issue, you can keep the old table around for a while (skip the drop) and use the other methods described here to validate that the row count of the original, after accounting for duplicates, matches the row count of the new table.
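One way to run that check is sketched below. It assumes the drop was skipped, so table_name_old still exists alongside the new table_name; the column aliases are made up for illustration.

```sql
-- Compare the number of distinct rows in the old table
-- with the total number of rows in the new table.
select o.distinct_rows_in_old,
       n.rows_in_new
from (
    select count(*) as distinct_rows_in_old
    from (select distinct * from table_name_old) as d
) as o
cross join (
    select count(*) as rows_in_new
    from table_name
) as n;
```

If the two numbers match, the swap kept exactly one copy of every distinct row and table_name_old can be dropped.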

If you're doing constant loads to such a table you'll want to pause that process while this is going on.

If the duplicates are a small percentage of a large table, you might want to try copying one distinct copy of each duplicated record to a temp table, deleting all records from the original that join with the temp table, and then appending the temp table back to the original. Make sure you vacuum the original table afterwards (which you should be doing for large tables on a schedule anyway).
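A rough sketch of that approach follows. It assumes, as in the question, that record_indicator is the column that identifies duplicates and that it is never null; dup_rows is a hypothetical temp table name. Treat it as a starting point and try it on a copy of the table first.

```sql
begin;

-- 1. Keep one distinct copy of every row whose record_indicator
--    appears more than once.
create temp table dup_rows as
select distinct t.*
from table_name t
join (
    select record_indicator
    from table_name
    group by record_indicator
    having count(*) > 1
) d on t.record_indicator = d.record_indicator;

-- 2. Delete every row in the original that joins with the temp table.
delete from table_name
using dup_rows
where table_name.record_indicator = dup_rows.record_indicator;

-- 3. Append the de-duplicated rows back to the original.
insert into table_name
select * from dup_rows;

drop table dup_rows;

commit;

-- 4. Reclaim the space left by the deleted rows.
--    VACUUM cannot run inside a transaction block, so it comes after the commit.
vacuum table_name;
```

Note that select distinct only collapses rows that are identical in every column. Rows that share a record_indicator but differ elsewhere are all kept, which matches the behavior of the full-table swap above.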
