I'm trying to filter a PySpark DataFrame that has None
as a row value:
df.select('dt_mvmt').distinct().collect()
[Row(dt_mvmt=u'2016-03-27'),
Row(dt_mvmt=u'2016-03-28'),
Row(dt_mvmt=u'2016-03-29'),
Row(dt_mvmt=None),
Row(dt_mvmt=u'2016-03-30'),
Row(dt_mvmt=u'2016-03-31')]
and I can filter correctly with a string value:
df[df.dt_mvmt == '2016-03-31']
# some results here
but this fails:
df[df.dt_mvmt == None].count()
0
df[df.dt_mvmt != None].count()
0
But there are definitely values in each category. What's going on?
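For reference, here's a self-contained sketch that reproduces the behaviour (the sample data is made up; only the dt_mvmt column name is from my actual DataFrame):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy data; only the dt_mvmt column name comes from my real DataFrame
df = spark.createDataFrame(
    [('2016-03-27',), ('2016-03-28',), (None,), ('2016-03-30',)],
    ['dt_mvmt'],
)

df[df.dt_mvmt == '2016-03-28'].count()  # 1 -- comparing against a string works
df[df.dt_mvmt == None].count()          # 0 -- comparing against None matches nothing
df[df.dt_mvmt != None].count()          # 0 -- and neither does !=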