Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
878 views
in Technique[技术] by (71.8m points)

oracle - SQL to find duplicate entries (within a group)

SQL to find duplicate entries (within a group)

I have a small problem and I'm not sure what would be the best way to fix it, as I only have limited access to the database (Oracle) itself. In our Table "EVENT" we have about 160k entries, each EVENT has a GROUPID and a normal entry has exactly 5 rows with the same GROUPID. Due to a bug we currently get a couple of duplicate entries (duplicate, so 10 rows instead of 5, just a different EVENTID. This may change, so it's just <> 5). We need to filter all the entries of these groups.

Due to limited access to the database we can not use a temporary table, nor can we add an index to the GROUPID column to make it faster.

We can get the GROUPIDs with this query, but we would need a second query to get the needed data

select A."GROUPID"
from "EVENT" A
group by A."GROUPID"
having count(A."GROUPID") <> 5

One solution would be a subselect:

select *
from "EVENT" A
where A."GROUPID" IN (
  select B."GROUPID"
  from "EVENT" B
  group by B."GROUPID"
  having count(B."GROUPID") <> 5
)

Without an index on GROUPID and 160k entries, this takes much too long. Tried thinking about a join that can handle this, but can't find a good solution so far.

Anybody can find a good solution for this maybe?

Small edit: We don't have 100% duplicates here, as each entry still has a unique ID and the GROUPID is not unique either (that's why we need to use "group by") - or maybe I just miss an easy solution for it :)

Small example about the data (I don't want to delete it, just find it)

EVENTID | GROUPID | TYPEID
123456    123       12
123457    123       145
123458    123       2612
123459    123       41
123460    123       238

234567    123       12
234568    123       145
234569    123       2612
234570    123       41
234571    123       238

It has some more columns, like timestamp etc, but as you can see already, everything is identical, besides the EVENTID.

We will run it more often for testing, to find the bug and check if it happens again.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

A classic problem for analytic queries to solve:

select eventid,
       groupid,
       typeid
from   (
       Select eventid,
              groupid,
              typeid,
              count(*) over (partition by group_id) count_by_group_id
       from   EVENT
       )
where count_by_group_id <> 5

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...