I'm trying to run the example code below:
import sparknlp
spark = sparknlp.start()  # starts a SparkSession with the Spark NLP jars on the classpath

from sparknlp.pretrained import PretrainedPipeline

# download the pretrained pipeline and annotate a sample sentence
explain_document_pipeline = PretrainedPipeline("explain_document_ml")
annotations = explain_document_pipeline.annotate("We are very happy about SparkNLP")
print(annotations)
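In case it's relevant, this is how I check which versions my environment actually sees (as far as I know, sparknlp.version() and pyspark.__version__ are the standard attributes for this):

import sparknlp
import pyspark

print(sparknlp.version())   # version of the pip-installed Spark NLP package (2.4.4 for me)
print(pyspark.__version__)  # version of Spark that the pyspark package wraps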
I'm using PyCharm with an Anaconda environment. I originally installed Spark NLP with pip (spark-nlp==2.4.4), but I saw someone online say that a plain pip install can leave JVM dependencies missing, and that it's better to launch PySpark with the package on the classpath:

pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4

However, that command gave me this error:
:: problems summary ::
:::: WARNINGS
[NOT FOUND ] com.typesafe#config;1.3.0!config.jar(bundle) (0ms)
==== local-m2-cache: tried
file:/C:/Users/xxxxxxxx/.m2/repository/com/typesafe/config/1.3.0/config-1.3.0.jar
[NOT FOUND ] com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(bundle) (0ms)
==== local-m2-cache: tried
file:/C:/Users/xxxxxxx/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.6.0/jackson-annotations-2.6.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: FAILED DOWNLOADS ::
:: ^ see resolution messages for details ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.typesafe#config;1.3.0!config.jar(bundle)
:: com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(bundle)
::::::::::::::::::::::::::::::::::::::::::::::
:::: ERRORS
unknown resolver null
unknown resolver null
unknown resolver null
unknown resolver null
unknown resolver default
unknown resolver null
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: com.typesafe#config;1.3.0!config.jar(bundle), download failed: com.fasterxml.jackson.core#jackson-annotations;2.6.0!jackson-annotations.jar(bundle)]
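(Side note: from what I've read, passing --packages on the command line is supposed to be equivalent to setting spark.jars.packages yourself when building the session in Python. A minimal sketch of what I mean, using the standard PySpark builder API and the same Maven coordinate as above:

from pyspark.sql import SparkSession

# supposed equivalent of `pyspark --packages ...`:
# Spark resolves the Maven coordinate through Ivy at session startup
spark = (SparkSession.builder
         .appName("spark-nlp-test")
         .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.11:2.4.4")
         .getOrCreate())

I mention it because I'm not sure whether the pip package, the --packages flag, or both are needed.)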
I then downloaded the two missing JARs manually and copied them into the corresponding folders under the local Maven cache (.m2/repository). After that, the command ran and everything seemed fine:
21/02/05 15:41:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/
Using Python version 3.7.1 (default, Oct 28 2018 08:39:03)
SparkSession available as 'spark'.
>>>
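Before re-running the script, I also double-checked that the two JARs had actually landed where Ivy was looking (plain Python; the relative paths are copied from the warnings above):

from pathlib import Path

# the two artifacts Ivy reported as NOT FOUND earlier
m2_repo = Path.home() / ".m2" / "repository"
for rel in ("com/typesafe/config/1.3.0/config-1.3.0.jar",
            "com/fasterxml/jackson/core/jackson-annotations/2.6.0/jackson-annotations-2.6.0.jar"):
    jar = m2_repo / rel
    print(jar, "exists" if jar.exists() else "MISSING")

Both showed up, so the manual copy itself seems fine.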
Then I tried to re-run the example Python script from the top, but it gave me an error. Here are the logs:
Ivy Default Cache set to: C:\Users\xxxx\.ivy2\cache
The jars for the packages stored in: C:\Users\xxxx\.ivy2\jars
:: loading settings :: url = jar:file:/C:/Users/xxxx/AppData/Local/Continuum/anaconda3/envs/workEnv-python3.7/lib/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.johnsnowlabs.nlp#spark-nlp_2.11 added as a dependency
............
I'm new to this and have been messing around with it for two days now. Could someone help me, please?
question from:
https://stackoverflow.com/questions/66065986/pyspark-sql-utils-illegalargumentexception-requirement-failed-was-not-found-a