google bigquery - Why the cost of a query on today cluster/partition is much higher than on previous dates?

Question

Welcome To Ask or Share your Answers For Others

google bigquery - Why the cost of a query on today cluster/partition is much higher than on previous dates?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

google bigquery - Why the cost of a query on today cluster/partition is much higher than on previous dates?

I have a partition/cluster table as follow:

When I run this query:

SELECT
  projectId
FROM
  `projectId.dataset.tables`
WHERE _PARTITIONTIME >= "2019-03-16 00:00:00" AND _PARTITIONTIME <= "2019-03-17 00:00:00" 
  AND projectId='myproject' 
GROUP BY
  projectId
limit 1

I see an actual scan of 597 MB

However, When I run the same query on the previous day as follow:

SELECT
  projectId
FROM
  `projectId.dataset.tables`
WHERE _PARTITIONTIME >= "2019-03-15 00:00:00" AND _PARTITIONTIME <= "2019-03-16 00:00:00" 
  AND projectId='myproject' 
GROUP BY
  projectId
limit 1

I see an actual scan of 122 MB

Note: The results are even worse if I add more columns.

To make sure my partition has the same size I counted the number of projectId in each partition

SELECT _partitionTime as date, count(projectId) as count
FROM
  `projectId.dataset.tables`
WHERE _PARTITIONTIME >= "2019-03-15 00:00:00" AND _PARTITIONTIME <= "2019-03-17 00:00:00" 
GROUP BY
  date

And as you can see today partition has even fewer rows than the previous 2 days

In addtion I tried to query the streaming buffer using this query which returned no result

SELECT projectId FROM `projectId.dataset.tables`
WHERE _PARTITIONTIME IS NULL

My conclusion is that the streaming buffer is impacting the cost of the query on a cluster table but I'm not sure how can that be and why.

Any ideas on what is going on here and why do I see higher cost when querying today partition

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

Categories

google bigquery - Why the cost of a query on today cluster/partition is much higher than on previous dates?

google bigquery - Why the cost of a query on today cluster/partition is much higher than on previous dates?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags