You can solve it with a few window functions.
There are 2 things to do:
1- Getting the value from Column 3 if Column 1 = Column 2 (Easy part :) )
2- Finding the Column 1 groups (islands)
For the first one, it's pretty clear what I do:
IF(Column1 = Column2, Column3, NULL) Col4_temp,
For the second problem,
1- I try to find the changes in column1. If Column1 value is different then previous row, I mark it 1, otherwise 0.
To be able to do it, first I bring the value from the previous row with LAG(Column1)
. If it's equal to Column1 value, I return 0, otherwise 1.
If previous row is NULL, that comparison also returns NULL, so I fill it with COALESCE and mark it as 1 as well.
All change catching code is here:
COALESCE(IF(Lag(Column1) over wd = Column1, 0, 1), 1) as Col1_changes
In the next step, I cumulatively sum total number of changes. If there is no change, it means they're in the same group.
sum(Col1_changes) over (order by Datetime) as Col1_Group
In the Col1_Group, you can see that I grouped Column1 based on the changes. Finally, I get the maximum value within each Col1_Group
and assign it to Column4.
I hope it's clear now :) If not, I'd recommend you to learn more about window functions.
All code is here:
WITH t1 AS (
SELECT '2020-12-01 00:00:06 UTC' Datetime, NULL Column1, 'A0' Column2, 'x' Column3 UNION ALL
SELECT '2020-12-01 00:00:16 UTC', 'A1', 'A0', 'x' UNION ALL
SELECT '2020-12-01 00:00:26 UTC', 'A1', 'A1', 'y' UNION ALL
SELECT '2020-12-01 00:00:36 UTC', 'A1', 'A1', 'y' UNION ALL
SELECT '2020-12-01 00:00:46 UTC', 'A1', 'A2', 'z' UNION ALL
SELECT '2020-12-01 00:00:56 UTC', 'A1', 'A2', 'z' UNION ALL
SELECT '2020-12-01 00:00:66 UTC', NULL, 'A2', 'z' UNION ALL
SELECT '2020-12-01 00:01:06 UTC', NULL, 'A2', 'z'
),
t2 AS
(
select *,
IF(Column1 = Column2, Column3, NULL) Col4_temp,
COALESCE(IF(Lag(Column1) over wd = Column1, 0, 1), 1) as Col1_changes
from t1
window wd as (order by Datetime)
),
t3 as
(
select *,
sum(Col1_changes) over (order by Datetime) as Col1_Group
from t2
)
select *,
MAX(Col4_temp) over (partition by Col1_Group) as Column4
from t3