I have been trying to run this code in PySpark:
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
datumDF = sqlContext.createDataFrame(datumX, schema)
But I have been receiving this error:
Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.
', JavaObject id=o44))
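For reference, here is a minimal self-contained sketch of what I'm attempting; the schema and rows below are illustrative stand-ins for my real datumX and schema, which are built elsewhere in my job:

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

sc = SparkContext(appName="hive-test")
sqlContext = HiveContext(sc)

# Stand-in schema and rows; the real datumX and schema come from my job
schema = StructType([
    StructField("name", StringType(), True),
    StructField("value", IntegerType(), True),
])
datumX = [("a", 1), ("b", 2)]

datumDF = sqlContext.createDataFrame(datumX, schema)
datumDF.show()

This fails with exactly the same exception.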
I log in to AWS and spin up clusters with this command:

/User/Downloads/spark-1.5.2-bin-hadoop2.6/ec2/spark-ec2 -k name -i /User/Desktop/pemfile.pem login clustername
However, all the docs I've found involve commands that exist in the directory
/users/downloads/spark-1.5.2/
I've run them anyway, and tried logging into AWS using the ec2 command in that folder afterwards. Still, I just got the same error.
I run

export SPARK_HIVE=true

before running these commands on my local machine, but I've seen messages saying it's deprecated and will be ignored anyway.
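For what it's worth, a quick diagnostic sketch (my own probe, not from the docs) to check at runtime whether the assembly actually has Hive support could look like this:

from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext

sc = SparkContext(appName="hive-probe")
try:
    sqlContext = HiveContext(sc)
    # Force the underlying JVM HiveContext to initialise
    sqlContext.sql("SHOW TABLES").collect()
    print("Hive support is available")
except Exception as e:
    print("No Hive support in this assembly: %s" % e)
    sqlContext = SQLContext(sc)  # fall back to the plain SQL context

Given the error above, this presumably always lands in the except branch for me.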
Build Spark with Hive support using Maven:

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
Build Spark with Hive support using sbt:

build/sbt -Pyarn -Phadoop-2.3 assembly

And another variant I found:

./sbt/sbt -Phive assembly
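To confirm whether a build actually produced an assembly containing the Hive classes, a quick check like this should work; the jar path below is a guess based on where sbt drops the assembly for Spark 1.5.2 with Scala 2.10, so adjust it to the actual build output:

import zipfile

# Hypothetical path: adjust to wherever the build puts the assembly jar
jar_path = "assembly/target/scala-2.10/spark-assembly-1.5.2-hadoop2.4.0.jar"

names = zipfile.ZipFile(jar_path).namelist()
has_hive = any(n.startswith("org/apache/spark/sql/hive/") for n in names)
print("Hive classes present: %s" % has_hive)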
I also took the hive-site.xml file and put it in both the /Users/Downloads/spark-1.5.2-bin-hadoop2.6/conf folder and the /Users/Downloads/spark-1.5.2/conf folder.
Still no luck.
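Since I now have two Spark directories side by side, one thing I can at least sanity-check is which install the pyspark shell is actually picking up, using Spark's standard environment variables:

import os

# SPARK_HOME is the install the shell was launched from;
# SPARK_CONF_DIR, if set, overrides where conf files like hive-site.xml are read
print(os.environ.get("SPARK_HOME"))
print(os.environ.get("SPARK_CONF_DIR"))

If SPARK_HOME points at the source tree rather than the prebuilt -bin-hadoop2.6 directory (or vice versa), the copied hive-site.xml might never be read.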
I can't seem to run the Hive commands no matter what I build Spark with or how I log in. Is there anything obvious I'm missing?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…