Generally speaking, is there a performance difference between using a JOIN to select rows versus an EXISTS where clause? Searching various Q&A web sites suggests that a join is more efficient, but I recall learning a long time ago that EXISTS was better in Teradata.
I do see other SO answers, like this and this, but my question is specific to Teradata.
For example, consider these two queries, which return identical results:
select svc.ltv_scr, count(*) as freq
from MY_BASE_TABLE svc
join MY_TARGET_TABLE x
on x.srv_accs_id=svc.srv_accs_id
group by 1
order by 1
-and-
select svc.ltv_scr, count(*) as freq
from MY_BASE_TABLE svc
where exists(
select 1
from MY_TARGET_TABLE x
where x.srv_accs_id=svc.srv_accs_id)
group by 1
order by 1
The primary index (unique) on both tables is 'srv_accs_id'. MY_BASE_TABLE is rather large (200 million rows) and MY_TARGET_TABLE relatively small (200,000 rows).
There is one significant difference in the EXPLAIN plans: The first says the two tables are joined "by way of a RowHash match scan" and the second says "by way of an all-rows scan". Both say it is "an all-AMPs JOIN step" and the total estimated time is identical (0.32 seconds).
Both queries perform the same (I'm using Teradata 13.10).
A similar experiment to find non-matches comparing a LEFT OUTER JOIN with a corresponding IS NULL where clause to a NOT EXISTS sub-query does show a performance difference:
select svc.ltv_scr, count(*) as freq
from MY_BASE_TABLE svc
left outer join MY_TARGET_TABLE x
on x.srv_accs_id=svc.srv_accs_id
where x.srv_accs_id is null
group by 1
order by 1
-and-
select svc.ltv_scr, count(*) as freq
from MY_BASE_TABLE svc
where not exists(
select 1
from MY_TARGET_TABLE x
where x.srv_accs_id=svc.srv_accs_id)
group by 1
order by 1
The second query plan is faster (2.21 versus 2.14 seconds as described by EXPLAIN).
My example may be too trivial to see a difference; I'm just looking for coding guidance.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…