I have a DataFrame with two columns, 'mobile' and 'totalcount', and I want to use its 'totalcount' values to update the 'totalcount' column that already exists in a MySQL table. I was doing this with a simple 'for' loop, but it takes too long to update. How can I speed this up, or how can I use UDFs to do the same?
for row in df.rdd.collect():
    query = f"UPDATE dummy.member_report SET totalcount = '{row.totalcount}' WHERE mobile = '{row.mobile}'"
    Mysql_insert(spark, query)
print("finished")
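One direction that may help, sketched under assumptions: instead of collecting every row to the driver and issuing one statement per row, each partition could open its own connection and send parameterized updates in batches with the DB-API `executemany` call. In the sketch below, `make_partition_updater`, `batched`, and the `connect` factory are hypothetical names that are not part of the original code, and the `%s` placeholder assumes a MySQL driver such as PyMySQL or mysql-connector-python. Whether this is actually faster depends on the driver, the batch size, and whether `mobile` is indexed in the table.

```python
from itertools import islice

# Parameterized statement (assumption: the driver uses the %s placeholder
# style, as PyMySQL and mysql-connector-python do).
UPDATE_SQL = "UPDATE dummy.member_report SET totalcount = %s WHERE mobile = %s"


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def make_partition_updater(connect, sql=UPDATE_SQL, batch_size=1000):
    """Build a function that can be passed to df.rdd.foreachPartition.

    `connect` is a hypothetical zero-argument factory returning a DB-API
    connection. It is called once per partition, on the executor.
    """
    def update_partition(rows):
        conn = connect()
        try:
            cur = conn.cursor()
            for chunk in batched(rows, batch_size):
                # Values are passed as parameters, not formatted into the SQL.
                params = [(r["totalcount"], r["mobile"]) for r in chunk]
                cur.executemany(sql, params)
            conn.commit()
        finally:
            conn.close()
    return update_partition


# Possible usage (connection details are placeholders to fill in):
#   import pymysql
#   df.rdd.foreachPartition(
#       make_partition_updater(lambda: pymysql.connect(host=..., user=...,
#                                                      password=..., database=...)))
```

Passing the values as parameters also avoids building SQL strings with f-strings, which is vulnerable to SQL injection and quoting problems. If the DataFrame is very large, another option worth considering is writing it to a staging table through Spark's JDBC writer and running a single UPDATE with a JOIN on the MySQL side.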