I use Spark to transform data that I then load into Redshift. Redshift does not support NaN values, so I need to replace all occurrences of NaN with NULL.
I tried something like this:
some_table = sql('SELECT * FROM some_table')
some_table = some_table.na.fill(None)
But I got the following error:
ValueError: value should be a float, int, long, string, bool or dict
So it seems like na.fill() doesn't support None. I specifically need to replace with NULL, not some other value, like 0.