So, when running from the pyspark shell I would type in (without creating any contexts myself):

df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')

... and it works fine.
However, when I run my script via spark-submit, like

spark-submit script.py

I put the following in it:
from pyspark.sql import SQLContext
from pyspark import SparkConf, SparkContext
conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
df_openings_latest = sqlContext.sql('select * from experian_int_openings_latest_orc')
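For reference, here is a sketch of what I gather the difference might be: in Spark 1.6 the pyspark shell's built-in sqlContext is actually a HiveContext (when Spark is built with Hive support), so it can see tables registered in the Hive metastore, while the plain SQLContext created above cannot. Assuming the table lives in the Hive metastore, the script would instead look like:

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName('inc_dd_openings')
sc = SparkContext(conf=conf)

# HiveContext (not SQLContext) reads table definitions from the
# Hive metastore, matching what the pyspark shell provides by default.
sqlContext = HiveContext(sc)

df_openings_latest = sqlContext.sql(
    'select * from experian_int_openings_latest_orc')
```

This is only a sketch under that assumption; it needs a running Spark cluster with access to the Hive metastore to execute.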
But it gives me an error:
pyspark.sql.utils.AnalysisException: u'Table not found:
experian_int_openings_latest_orc;'
So it doesn't see my table.

What am I doing wrong? Please help.
P.S. The Spark version is 1.6, running on Amazon EMR.