Let's say I have a Spark data frame df1 with several columns (among them a column id), and a data frame df2 with two columns, id and other.
Is there a way to replicate the following command
sqlContext.sql("SELECT df1.*, df2.other FROM df1 JOIN df2 ON df1.id = df2.id")
using only PySpark functions such as join(), select() and the like?
I have to implement this join inside a function, and I don't want to be forced to pass sqlContext in as a function parameter.
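For context, this is roughly the shape of function I had in mind; it's just an untested sketch using the df1, df2 and id from above, and I'm not sure it's the idiomatic way:

def add_other(df1, df2):
    # hypothetical helper: join on the shared id column, then keep every
    # column of df1 plus the single column "other" from df2 -- the point
    # being that no sqlContext is needed inside the function
    return df1.join(df2, df1.id == df2.id).select(df1["*"], df2["other"])

If referencing id that way turns out to be ambiguous because both frames carry the column, I assume aliasing the two frames before the join would be the fallback, but I haven't verified that.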
Thanks!