I have a pyspark dataframe:
id | column
------------------------------
1 | [0.2, 2, 3, 4, 3, 0.5]
------------------------------
2 | [7, 0.3, 0.3, 8, 2,]
------------------------------
I would like to create a 3 columns:
Column 1
: contain the sum of the elements < 2
Column 2
: contain the sum of the elements > 2
Column 3
: contain the sum of the elements = 2 (some times I have duplicate values so I do their sum) In case if I don't have a
values I put null.
Expect result:
id | column | column<2 | column>2 | column=2
------------------------------|--------------------------------------------
1 | [0.2, 2, 3, 4, 3, 0.5]| [0.7] | [12] | null
---------------------------------------------------------------------------
2 | [7, 0.3, 0.3, 8, 2,] | [0.6] | [15] | [2]
---------------------------------------------------------------------------
Can you help me please ?
Thank you
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…