SQL to find duplicate entries (within a group)
I have a small problem and I'm not sure what would be the best way to fix it, as I only have limited access to the database (Oracle) itself.
Our table "EVENT" has about 160k entries. Each event has a GROUPID, and a normal group consists of exactly 5 rows with the same GROUPID. Due to a bug we currently get a couple of duplicated groups (10 rows instead of 5, identical except for the EVENTID; the exact count may change, so the condition is simply <> 5). We need to select all the rows that belong to these groups.
Due to our limited access to the database we cannot use a temporary table, nor can we add an index on the GROUPID column to make the query faster.
We can get the affected GROUPIDs with this query, but we would need a second query to get the actual rows:
select A."GROUPID"
from "EVENT" A
group by A."GROUPID"
having count(A."GROUPID") <> 5
One solution would be a subselect:
select *
from "EVENT" A
where A."GROUPID" IN (
    select B."GROUPID"
    from "EVENT" B
    group by B."GROUPID"
    having count(B."GROUPID") <> 5
)
Without an index on GROUPID and with 160k entries, this takes far too long.
I tried to think of a join that could handle this, but I haven't found a good solution so far.
Can anybody suggest a good solution for this?
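One direction that might be worth trying is an analytic (window) count, which needs only a single reference to "EVENT" instead of two. This is a minimal sketch, not a measured fix: GROUP_SIZE is a hypothetical alias introduced here, and whether it is actually faster depends on the execution plan Oracle chooses.

```sql
-- Sketch: attach the size of its group to every row, then keep only
-- the rows whose group does not have exactly 5 members.
-- GROUP_SIZE is an illustrative alias, not an existing column.
select *
from (
    select A.*,
           count(*) over (partition by A."GROUPID") as GROUP_SIZE
    from "EVENT" A
)
where GROUP_SIZE <> 5
```

The result carries the extra GROUP_SIZE column. One difference from the IN version: rows with a NULL GROUPID form their own partition here and can be returned, whereas the IN subselect never matches them.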
Small edit:
We don't have 100% duplicates here, as each entry still has a unique ID, and the GROUPID is not unique either (that's why we need to use "group by"). Or maybe I'm just missing an easy solution for it :)
A small example of the data (I don't want to delete it, just find it):
EVENTID | GROUPID | TYPEID
123456  | 123     | 12
123457  | 123     | 145
123458  | 123     | 2612
123459  | 123     | 41
123460  | 123     | 238
234567  | 123     | 12
234568  | 123     | 145
234569  | 123     | 2612
234570  | 123     | 41
234571  | 123     | 238
The table has some more columns, such as a timestamp, but as you can see, everything is identical except for the EVENTID.
We will run the query repeatedly during testing, to track down the bug and check whether it happens again.
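To try a candidate query against exactly these rows without creating any table, the sample can be inlined as a subquery factored with WITH. This is only a sketch under assumptions: SAMPLE_EVENT and GROUP_SIZE are hypothetical names, the three columns are treated as plain numbers, and the remaining columns (timestamp etc.) are left out.

```sql
-- Sketch: the ten sample rows above, built from DUAL so that no DDL
-- or temporary table is needed. SAMPLE_EVENT is an illustrative name.
with SAMPLE_EVENT as (
    select 123456 as EVENTID, 123 as GROUPID, 12 as TYPEID from dual union all
    select 123457, 123, 145  from dual union all
    select 123458, 123, 2612 from dual union all
    select 123459, 123, 41   from dual union all
    select 123460, 123, 238  from dual union all
    select 234567, 123, 12   from dual union all
    select 234568, 123, 145  from dual union all
    select 234569, 123, 2612 from dual union all
    select 234570, 123, 41   from dual union all
    select 234571, 123, 238  from dual
)
-- Keep only rows whose group size differs from the expected 5.
select EVENTID, GROUPID, TYPEID
from (
    select S.*,
           count(*) over (partition by S.GROUPID) as GROUP_SIZE
    from SAMPLE_EVENT S
)
where GROUP_SIZE <> 5
order by EVENTID
```

Since group 123 has 10 rows in the sample, all ten rows should come back; a group with exactly 5 rows would be filtered out.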