You can simply call takeSample
on a RDD
:
df = sqlContext.createDataFrame(
[(1, "a"), (2, "b"), (3, "c"), (4, "d")], ("k", "v"))
df.rdd.takeSample(False, 1, seed=0)
## [Row(k=3, v='c')]
If you don't want to collect you can simply take a higher fraction and limit:
df.sample(False, 0.1, seed=0).limit(1)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…