I have a DataFrame df with a column named setp. To build a list of its distinct values, I wrote:
setp_list = df.select('setp').distinct().collect()
setp_array = [row.setp for row in setp_list]
setp_array = str(setp_array)[1:-1]
I want to use it in a spark.sql statement:
df1 = spark.sql(f"select * from table where setp in ({setp_array})")
I am not sure how to display the list to check how it was created, but mainly I want to include it in the Spark SQL statement. The spark.sql statement fails with an "invalid syntax" error.
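To see what the list turns into, the str(list)[1:-1] step can be reproduced in plain Python without a Spark session. This is a minimal sketch: the sample values are hypothetical stand-ins for whatever setp actually contains, and build_in_clause is a made-up helper name. Printing the string shows exactly what gets interpolated into the SQL text, and the empty-list case shows one way the query can become invalid.

```python
# Hypothetical sample values standing in for the collected setp column.
# In PySpark, [row.setp for row in setp_list] yields a plain Python list like this.
setp_values = ["A1", "B2", "C3"]


def build_in_clause(values):
    """Mimic str(list)[1:-1]: strip the surrounding brackets from the list's repr."""
    return str(values)[1:-1]


setp_array = build_in_clause(setp_values)
query = f"select * from table where setp in ({setp_array})"

# Printing the string is the simplest way to see what will be sent to spark.sql.
print(setp_array)
print(query)

# An empty list produces "in ()", which is not valid SQL.
print(f"select * from table where setp in ({build_in_clause([])})")
```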
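Avoid collecting values from one table to the driver and pasting them into a query against another table; that pulls every distinct value through the driver and builds the SQL text by hand. Use a JOIN to express relational queries like this one.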
df.createOrReplaceTempView('df')
# SEMI JOIN returns the rows of table whose setp also appears in df, like the IN filter
df1 = spark.sql("select * from table semi join df using(setp)")
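A semi join keeps each row of the left table at most once if its key has a match on the right, and returns only the left table's columns, which is why it can replace the IN filter. The plain-Python sketch below illustrates that semantics with hypothetical rows and contrasts it with an inner join, which repeats left rows when df contains duplicate setp values. It is a simplification for illustration (the inner join here keeps only left rows), not how Spark executes the query.

```python
# Hypothetical rows: `table` rows as dicts, and df's setp values (with a duplicate).
table_rows = [
    {"id": 1, "setp": "A1"},
    {"id": 2, "setp": "B2"},
    {"id": 3, "setp": "Z9"},
]
df_setp = ["A1", "B2", "B2"]


def semi_join(left, right_keys, key):
    """Keep each left row once if its key appears on the right (like SQL SEMI JOIN)."""
    keys = set(right_keys)
    return [row for row in left if row[key] in keys]


def inner_join(left, right_keys, key):
    """Emit one left row per matching pair, so duplicate right keys repeat left rows."""
    return [row for row in left for k in right_keys if row[key] == k]


semi = semi_join(table_rows, df_setp, "setp")
inner = inner_join(table_rows, df_setp, "setp")
print([r["id"] for r in semi])
print([r["id"] for r in inner])
```

Because the semi join never duplicates rows, there is no need to call distinct() on df first, as the collect-based approach did.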