oracle - SQL to find duplicate entries (within a group)

Question



posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)


I have a small problem and I'm not sure what the best way to fix it is, as I only have limited access to the database (Oracle) itself. Our table "EVENT" has about 160k rows. Each event has a GROUPID, and a normal group consists of exactly 5 rows with the same GROUPID. Due to a bug we currently get a couple of duplicated groups: 10 rows instead of 5, identical except for the EVENTID. The number may vary, so the only reliable condition is a row count <> 5. We need to find all the rows belonging to these groups.

Because our access to the database is limited, we cannot use a temporary table, nor can we add an index on the GROUPID column to speed things up.

We can get the affected GROUPIDs with this query, but we would then need a second query to fetch the actual rows:

select A."GROUPID"
from "EVENT" A
group by A."GROUPID"
having count(A."GROUPID") <> 5
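To make the first step concrete, here is a minimal sketch that runs the GROUP BY/HAVING query above unchanged against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Oracle here (an assumption; the poster's database is Oracle), and the table is reduced to three columns. GROUPID 123 uses the sample rows from the question; GROUPID 124 is a hypothetical healthy group of five rows added for contrast.

```python
import sqlite3

# The ten sample rows from the question (GROUPID 123 holds every event twice).
SAMPLE_ROWS = [
    (123456, 123, 12), (123457, 123, 145), (123458, 123, 2612),
    (123459, 123, 41), (123460, 123, 238),
    (234567, 123, 12), (234568, 123, 145), (234569, 123, 2612),
    (234570, 123, 41), (234571, 123, 238),
]
# Hypothetical healthy group with exactly five rows, added for contrast.
HEALTHY_ROWS = [(345001, 124, 12), (345002, 124, 145), (345003, 124, 2612),
                (345004, 124, 41), (345005, 124, 238)]


def make_db():
    """Build an in-memory SQLite copy of a reduced EVENT table."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "EVENT" (EVENTID INTEGER PRIMARY KEY, '
                 'GROUPID INTEGER, TYPEID INTEGER)')
    conn.executemany('INSERT INTO "EVENT" VALUES (?, ?, ?)',
                     SAMPLE_ROWS + HEALTHY_ROWS)
    return conn


def bad_group_ids(conn):
    """Return the GROUPIDs whose row count differs from 5 (query from the question)."""
    rows = conn.execute('''
        select A."GROUPID"
        from "EVENT" A
        group by A."GROUPID"
        having count(A."GROUPID") <> 5
    ''').fetchall()
    return [groupid for (groupid,) in rows]


if __name__ == "__main__":
    print(bad_group_ids(make_db()))  # [123]
```

Only GROUPID 123 comes back, so a second query is still needed to fetch its rows.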

One solution would be a subselect:

select *
from "EVENT" A
where A."GROUPID" IN (
  select B."GROUPID"
  from "EVENT" B
  group by B."GROUPID"
  having count(B."GROUPID") <> 5
)
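A minimal sketch, again assuming SQLite via Python's sqlite3 as a stand-in for Oracle, that runs the subselect above (with an added ORDER BY for a stable result) on the question's sample rows plus a hypothetical healthy GROUPID 124. It shows the query is logically correct: it returns all ten rows of group 123 and nothing from the five-row group. It says nothing about Oracle performance on 160k rows.

```python
import sqlite3

# The ten sample rows from the question (GROUPID 123 holds every event twice).
SAMPLE_ROWS = [
    (123456, 123, 12), (123457, 123, 145), (123458, 123, 2612),
    (123459, 123, 41), (123460, 123, 238),
    (234567, 123, 12), (234568, 123, 145), (234569, 123, 2612),
    (234570, 123, 41), (234571, 123, 238),
]
# Hypothetical healthy group with exactly five rows, added for contrast.
HEALTHY_ROWS = [(345001, 124, 12), (345002, 124, 145), (345003, 124, 2612),
                (345004, 124, 41), (345005, 124, 238)]


def make_db():
    """Build an in-memory SQLite copy of a reduced EVENT table."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "EVENT" (EVENTID INTEGER PRIMARY KEY, '
                 'GROUPID INTEGER, TYPEID INTEGER)')
    conn.executemany('INSERT INTO "EVENT" VALUES (?, ?, ?)',
                     SAMPLE_ROWS + HEALTHY_ROWS)
    return conn


def rows_in_bad_groups(conn):
    """IN-subquery version from the question; reads EVENT twice."""
    return conn.execute('''
        select *
        from "EVENT" A
        where A."GROUPID" IN (
          select B."GROUPID"
          from "EVENT" B
          group by B."GROUPID"
          having count(B."GROUPID") <> 5
        )
        order by A."EVENTID"
    ''').fetchall()


if __name__ == "__main__":
    for row in rows_in_bad_groups(make_db()):
        print(row)
```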

With 160k rows and no index on GROUPID, this takes much too long. I tried to come up with a join that handles this, but haven't found a good solution so far.
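One join-based formulation, offered as a sketch rather than a guaranteed speed-up, joins EVENT to an inline view holding the aggregated list of bad GROUPIDs. It still reads EVENT twice, so Oracle's optimizer may well choose a plan similar to the IN version. As before, SQLite through Python's sqlite3 stands in for Oracle (an assumption), and GROUPID 124 is a hypothetical healthy group.

```python
import sqlite3

# The ten sample rows from the question (GROUPID 123 holds every event twice).
SAMPLE_ROWS = [
    (123456, 123, 12), (123457, 123, 145), (123458, 123, 2612),
    (123459, 123, 41), (123460, 123, 238),
    (234567, 123, 12), (234568, 123, 145), (234569, 123, 2612),
    (234570, 123, 41), (234571, 123, 238),
]
# Hypothetical healthy group with exactly five rows, added for contrast.
HEALTHY_ROWS = [(345001, 124, 12), (345002, 124, 145), (345003, 124, 2612),
                (345004, 124, 41), (345005, 124, 238)]


def make_db():
    """Build an in-memory SQLite copy of a reduced EVENT table."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "EVENT" (EVENTID INTEGER PRIMARY KEY, '
                 'GROUPID INTEGER, TYPEID INTEGER)')
    conn.executemany('INSERT INTO "EVENT" VALUES (?, ?, ?)',
                     SAMPLE_ROWS + HEALTHY_ROWS)
    return conn


def rows_in_bad_groups_join(conn):
    """Join each EVENT row to the inline view of GROUPIDs whose count is not 5."""
    return conn.execute('''
        select A.*
        from "EVENT" A
        join (
          select "GROUPID"
          from "EVENT"
          group by "GROUPID"
          having count(*) <> 5
        ) G on G."GROUPID" = A."GROUPID"
        order by A."EVENTID"
    ''').fetchall()


if __name__ == "__main__":
    for row in rows_in_bad_groups_join(make_db()):
        print(row)
```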

Can anybody suggest a good solution for this?

Small edit: these are not 100% duplicates, as each row still has a unique ID, and GROUPID is not unique either (that's why we need GROUP BY). Or maybe I'm just missing an easy solution :)

A small example of the data (I don't want to delete it, just find it):

EVENTID | GROUPID | TYPEID
123456  | 123     | 12
123457  | 123     | 145
123458  | 123     | 2612
123459  | 123     | 41
123460  | 123     | 238
234567  | 123     | 12
234568  | 123     | 145
234569  | 123     | 2612
234570  | 123     | 41
234571  | 123     | 238
The table has more columns (timestamp etc.), but as you can see, everything is identical except the EVENTID.

We will run the query repeatedly during testing, to track down the bug and check whether it happens again.


1 Reply

深蓝 · Answer 1 · 2021-10-23T17:57:36+0000

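This is a classic problem for analytic (window) functions. Instead of aggregating in a subquery and reading EVENT a second time, attach the per-GROUPID row count to every row and filter on it: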

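select eventid,
       groupid,
       typeid
from   (
       -- count(*) over (...) attaches each group's row count to every row
       -- without collapsing the rows, so EVENT is read only once
       select eventid,
              groupid,
              typeid,
              count(*) over (partition by groupid) count_by_group_id
       from   EVENT
       )
where count_by_group_id <> 5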
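A minimal sketch that checks the analytic query returns the same rows as the question's IN subquery. It assumes SQLite via Python's sqlite3 as a stand-in for Oracle (window functions need SQLite 3.25 or later) and uses the question's sample rows plus a hypothetical healthy GROUPID 124. The inline view gets an alias `t`, which both SQLite and Oracle accept.

```python
import sqlite3

# The ten sample rows from the question (GROUPID 123 holds every event twice).
SAMPLE_ROWS = [
    (123456, 123, 12), (123457, 123, 145), (123458, 123, 2612),
    (123459, 123, 41), (123460, 123, 238),
    (234567, 123, 12), (234568, 123, 145), (234569, 123, 2612),
    (234570, 123, 41), (234571, 123, 238),
]
# Hypothetical healthy group with exactly five rows, added for contrast.
HEALTHY_ROWS = [(345001, 124, 12), (345002, 124, 145), (345003, 124, 2612),
                (345004, 124, 41), (345005, 124, 238)]


def make_db():
    """Build an in-memory SQLite copy of a reduced EVENT table."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "EVENT" (EVENTID INTEGER PRIMARY KEY, '
                 'GROUPID INTEGER, TYPEID INTEGER)')
    conn.executemany('INSERT INTO "EVENT" VALUES (?, ?, ?)',
                     SAMPLE_ROWS + HEALTHY_ROWS)
    return conn


def rows_analytic(conn):
    """Window-function version: one pass over EVENT, count kept per row."""
    return conn.execute('''
        select eventid, groupid, typeid
        from (
            select eventid, groupid, typeid,
                   count(*) over (partition by groupid) as count_by_group_id
            from "EVENT"
        ) t
        where count_by_group_id <> 5
        order by eventid
    ''').fetchall()


def rows_subquery(conn):
    """IN-subquery version from the question, for comparison."""
    return conn.execute('''
        select eventid, groupid, typeid
        from "EVENT"
        where groupid in (
            select groupid from "EVENT" group by groupid having count(*) <> 5
        )
        order by eventid
    ''').fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(rows_analytic(conn) == rows_subquery(conn))  # True
```

Both return the ten rows of group 123. The analytic form avoids the second read of EVENT, though the window still has to sort or hash by GROUPID, so actual timings on the real table remain to be measured.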
