Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


apache spark - Capturing the result of explain() in pyspark

In pyspark, running:

sdf = sqlContext.sql("""SELECT * FROM t1 JOIN t2 on t1.c1 = t2.c1 """)

and then:

sdf.explain(extended=True)

prints the logical and physical plans of the query to standard output.

My question is: How can I capture the output in a variable, instead of printing it?

Naturally, v = sdf.explain(extended=True) does not work: explain() prints the plan and returns None.


1 Reply


If you look at the source code of explain() in Spark 2.4 or older, you will see:

def explain(self, extended=False):
    if extended:
        print(self._jdf.queryExecution().toString())
    else:
        print(self._jdf.queryExecution().simpleString())

So to get the plan as a string instead of printing it, call _jdf.queryExecution() on your DataFrame yourself. Note that _jdf is a private attribute, so this may change between versions:

v = sdf._jdf.queryExecution().toString()  # or .simpleString()

From Spark 3.0 onward, the code is:

print(
    self._sc._jvm.PythonSQLUtils.explainString(self._jdf.queryExecution(), explain_mode)
)

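Drop the print and evaluate the same expression on your own DataFrame (sdf in place of self) to get the plan as a string. Here explain_mode is a mode string such as "simple" or "extended".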
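A different approach, not part of the answer above, is to leave explain() alone and capture what it prints. Both source excerpts show that explain() writes through Python's print(), so redirecting sys.stdout around the call should collect the same text without touching private attributes. The sketch below assumes that this stays true in your Spark version; capture_explain is a hypothetical helper name, not a PySpark function.

```python
import contextlib
import io


def capture_explain(df, *args, **kwargs):
    """Return the text that df.explain(*args, **kwargs) would print.

    Assumption: explain() writes via Python's print() to sys.stdout,
    as in the PySpark source excerpts quoted above.
    """
    buffer = io.StringIO()
    # Temporarily point sys.stdout at an in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        df.explain(*args, **kwargs)
    return buffer.getvalue()


# Usage with the DataFrame from the question:
# v = capture_explain(sdf, extended=True)
```

The trade-off is that anything else printed from Python during the call also lands in the buffer, whereas the _jdf route returns only the plan.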



...