Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
299 views
in Technique[技术] by (71.8m points)

Check LAG and LEAD until NULL when condition is met BigQuery

DISCLAIMER: I apologize myself if the expression of this problem is too complicated. I tried my best. I had a really rough idea of how to solve this so I append the code down below although it doesn't contribute to solve my problem. Thank you.

I have a table like this:

enter image description here

WITH table1 AS (

  SELECT  '2020-12-01 00:00:06 UTC' Datetime,   NULL Column1,  'A0' Column2, 'x' Column3 UNION ALL
  SELECT  '2020-12-01 00:00:16 UTC',           'A1',           'A0',         'x'  UNION ALL
  SELECT  '2020-12-01 00:00:26 UTC',           'A1',           'A1',         'y'  UNION ALL
  SELECT  '2020-12-01 00:00:36 UTC',           'A1',           'A1',         'y'  UNION ALL
  SELECT  '2020-12-01 00:00:46 UTC',           'A1',           'A2',         'z'  UNION ALL
  SELECT  '2020-12-01 00:00:56 UTC',           'A1',           'A2',         'z'  UNION ALL
  SELECT  '2020-12-01 00:00:66 UTC',           NULL,           'A2',         'z'  UNION ALL
  SELECT  '2020-12-01 00:01:06 UTC',           NULL,           'A2',         'z'   
)  

select * from table1

I want to create a new column and append each value from Column3 for the whole length of Column1 where NULLS are the boundaries if the condition (Column1 = Column2) has been met at least once.

The detailed explanation: I want to first check for the condition Column1 = Column2; when this is true, I want to check the value in Column3. In my case this is 'y'. When the condition is met, I want to do LAG and LEAD until NULL in Column1 and then apply 'y' for the whole length of A1 in Column1 to the new column Column4. If the condition Column1 = Column2 would not be met in the first place, there should be NULL in Column4.

The output table would then be:

enter image description here

I tried with the code below but unfortunately the only thing it does is mess.

select *,
case when lag(Column1) over (partition by Column1 = Column2 order by DateTime) is null
             then Column3
     when lead(Column1) over (partition by Column1 = Column2 order by DateTime) is null
             then Column3
             else NULL
        end as Column4
from XX.YY.ZZ 
order by datetime
question from:https://stackoverflow.com/questions/65943156/check-lag-and-lead-until-null-when-condition-is-met-bigquery

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can solve it with a few window functions.

There are 2 things to do:
1- Getting the value from Column 3 if Column 1 = Column 2 (Easy part :) )
2- Finding the Column 1 groups (islands)

For the first one, it's pretty clear what I do:
IF(Column1 = Column2, Column3, NULL) Col4_temp,

For the second problem, 1- I try to find the changes in column1. If Column1 value is different then previous row, I mark it 1, otherwise 0.

To be able to do it, first I bring the value from the previous row with LAG(Column1). If it's equal to Column1 value, I return 0, otherwise 1.

If previous row is NULL, that comparison also returns NULL, so I fill it with COALESCE and mark it as 1 as well.

All change catching code is here:
COALESCE(IF(Lag(Column1) over wd = Column1, 0, 1), 1) as Col1_changes

In the next step, I cumulatively sum total number of changes. If there is no change, it means they're in the same group.
sum(Col1_changes) over (order by Datetime) as Col1_Group

In the Col1_Group, you can see that I grouped Column1 based on the changes. Finally, I get the maximum value within each Col1_Group and assign it to Column4.

I hope it's clear now :) If not, I'd recommend you to learn more about window functions.

All code is here:

WITH t1 AS (

  SELECT  '2020-12-01 00:00:06 UTC' Datetime,   NULL Column1,  'A0' Column2, 'x' Column3 UNION ALL
  SELECT  '2020-12-01 00:00:16 UTC',           'A1',           'A0',         'x'  UNION ALL
  SELECT  '2020-12-01 00:00:26 UTC',           'A1',           'A1',         'y'  UNION ALL
  SELECT  '2020-12-01 00:00:36 UTC',           'A1',           'A1',         'y'  UNION ALL
  SELECT  '2020-12-01 00:00:46 UTC',           'A1',           'A2',         'z'  UNION ALL
  SELECT  '2020-12-01 00:00:56 UTC',           'A1',           'A2',         'z'  UNION ALL
  SELECT  '2020-12-01 00:00:66 UTC',           NULL,           'A2',         'z'  UNION ALL
  SELECT  '2020-12-01 00:01:06 UTC',           NULL,           'A2',         'z'   
),
t2 AS
(
    select *, 
        IF(Column1 = Column2, Column3, NULL) Col4_temp,
        COALESCE(IF(Lag(Column1) over wd = Column1, 0, 1), 1) as Col1_changes
    from t1
    window wd as (order by Datetime)
),
t3 as
(
    select *,
        sum(Col1_changes) over (order by Datetime) as Col1_Group
    from t2
)
select *,
    MAX(Col4_temp) over (partition by Col1_Group) as Column4
from t3

Query output


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...