I have a partition/cluster table as follow:
When I run this query:
SELECT
projectId
FROM
`projectId.dataset.tables`
WHERE _PARTITIONTIME >= "2019-03-16 00:00:00" AND _PARTITIONTIME <= "2019-03-17 00:00:00"
AND projectId='myproject'
GROUP BY
projectId
limit 1
I see an actual scan of 597 MB
However, When I run the same query on the previous day as follow:
SELECT
projectId
FROM
`projectId.dataset.tables`
WHERE _PARTITIONTIME >= "2019-03-15 00:00:00" AND _PARTITIONTIME <= "2019-03-16 00:00:00"
AND projectId='myproject'
GROUP BY
projectId
limit 1
I see an actual scan of 122 MB
Note: The results are even worse if I add more columns.
To make sure my partition has the same size I counted the number of projectId in each partition
SELECT _partitionTime as date, count(projectId) as count
FROM
`projectId.dataset.tables`
WHERE _PARTITIONTIME >= "2019-03-15 00:00:00" AND _PARTITIONTIME <= "2019-03-17 00:00:00"
GROUP BY
date
And as you can see today partition has even fewer rows than the previous 2 days
In addtion I tried to query the streaming buffer using this query which returned no result
SELECT projectId FROM `projectId.dataset.tables`
WHERE _PARTITIONTIME IS NULL
My conclusion is that the streaming buffer is impacting the cost of the query on a cluster table but I'm not sure how can that be and why.
Any ideas on what is going on here and why do I see higher cost when querying today partition
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…