Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


PySpark - Spark worker initially connecting and then disconnecting, trying to reconnect

My setup is simple: one CentOS master and one CentOS worker. In spark-env.sh on the master:

export STANDALONE_SPARK_MASTER_HOST=`hostname -f`
export SPARK_MASTER_HOST='10.0.0.6'
#export SPARK_EXECUTOR_CORES=1
[sudip@master sbin]$ hostname -f
master
[sudip@master sbin]$ cat /etc/hosts
10.0.0.6    master
10.0.0.20   slave01
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
[sudip@master sbin]$ 
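
The name-to-address mapping in the hosts file above can be double-checked with a short script. This is only a minimal sketch: `parse_hosts` and `HOSTS_TEXT` are hypothetical names introduced for illustration, and the text is a copy of the `cat /etc/hosts` output rather than a read of the real file.

```python
def parse_hosts(text):
    """Return {hostname: ip} for every active (non-comment) line of hosts-file text."""
    mapping = {}
    for raw in text.splitlines():
        # Drop everything after '#', which also removes whole-line comments.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first mapping seen for a given name.
            mapping.setdefault(name, ip)
    return mapping


# Copy of the /etc/hosts content shown in the question (illustrative input).
HOSTS_TEXT = """\
10.0.0.6    master
10.0.0.20   slave01
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
"""

hosts = parse_hosts(HOSTS_TEXT)
print(hosts)                 # mapping of active entries only
print("localhost" in hosts)  # both loopback lines are commented out
```

With this input the active entries are `master` and `slave01` only; `localhost` has no entry in the file because both loopback lines are commented out. Whether that matters for the problem described here is not established by this check.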

The master started and the worker started, but the worker is not able to connect to the master when started by start-all.sh. I posted about this earlier; can anybody please help? I am badly stuck.

Here are the logs from the worker:

[sudip@slave01 ~]$ cat /opt/spark/logs/spark-sudip-org.apache.spark.deploy.worker.Worker-1-slave01.out
Spark Command: /usr/java/jdk1.8.0_271-amd64/bin/java -cp /opt/spark/conf/:/opt/spark/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://10.0.0.6:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/01/25 01:30:26 INFO Worker: Started daemon with process name: 4519@slave01
21/01/25 01:30:26 INFO SignalUtils: Registered signal handler for TERM
21/01/25 01:30:26 INFO SignalUtils: Registered signal handler for HUP
21/01/25 01:30:26 INFO SignalUtils: Registered signal handler for INT
21/01/25 01:30:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/01/25 01:30:27 INFO SecurityManager: Changing view acls to: sudip
21/01/25 01:30:27 INFO SecurityManager: Changing modify acls to: sudip
21/01/25 01:30:27 INFO SecurityManager: Changing view acls groups to: 
21/01/25 01:30:27 INFO SecurityManager: Changing modify acls groups to: 
21/01/25 01:30:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(sudip); groups with view permissions: Set(); users  with modify permissions: Set(sudip); groups with modify permissions: Set()
21/01/25 01:30:27 INFO Utils: Successfully started service 'sparkWorker' on port 37195.
21/01/25 01:30:28 INFO Worker: Starting Spark worker 10.0.0.20:37195 with 8 cores, 22.3 GiB RAM
21/01/25 01:30:28 INFO Worker: Running Spark version 3.0.1
21/01/25 01:30:28 INFO Worker: Spark home: /opt/spark
21/01/25 01:30:28 INFO ResourceUtils: ==============================================================
21/01/25 01:30:28 INFO ResourceUtils: Resources for spark.worker:

21/01/25 01:30:28 INFO ResourceUtils: ==============================================================
21/01/25 01:30:28 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
21/01/25 01:30:28 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://slave01:8081
21/01/25 01:30:28 INFO Worker: Connecting to master 10.0.0.6:7077...
21/01/25 01:30:28 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:7077 after 40 ms (0 ms spent in bootstraps)
21/01/25 01:30:38 INFO Worker: Retrying connection to master (attempt # 1)
21/01/25 01:30:38 INFO Worker: Connecting to master 10.0.0.6:7077...
21/01/25 01:30:48 INFO Worker: Retrying connection to master (attempt # 2)
21/01/25 01:30:48 INFO Worker: Connecting to master 10.0.0.6:7077...
......................
21/01/25 01:42:42 INFO Worker: Retrying connection to master (attempt # 16)
21/01/25 01:42:42 INFO Worker: Connecting to master 10.0.0.6:7077...
21/01/25 01:43:44 ERROR Worker: All masters are unresponsive! Giving up.
[sudip@slave01 ~]$ 
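
The pattern in the log above (a TCP connection is created, then only retries follow) can be summarized mechanically. The following is a rough sketch, not a diagnosis: `summarize_worker_log` and `LOG_EXCERPT` are hypothetical names, the excerpt is a shortened copy of the log lines above, and the string `Successfully registered with master` is an assumption about what a worker logs once registration succeeds, since no such line appears in the log shown here.

```python
import re


def summarize_worker_log(text):
    """Extract a few coarse facts from Spark standalone worker log text."""
    attempts = [
        int(n)
        for n in re.findall(r"Retrying connection to master \(attempt # (\d+)\)", text)
    ]
    return {
        # The transport layer reported an open TCP connection to the master.
        "tcp_connected": "Successfully created connection to" in text,
        # ASSUMPTION: a successful registration is logged with this phrase.
        "registered": "Successfully registered with master" in text,
        # Highest retry attempt number seen (0 if there were no retries).
        "max_retry_attempt": max(attempts, default=0),
        # The worker stopped trying to reach the master.
        "gave_up": "All masters are unresponsive" in text,
    }


# Shortened copy of the worker log from the question (illustrative input).
LOG_EXCERPT = """\
21/01/25 01:30:28 INFO Worker: Connecting to master 10.0.0.6:7077...
21/01/25 01:30:28 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:7077 after 40 ms (0 ms spent in bootstraps)
21/01/25 01:30:38 INFO Worker: Retrying connection to master (attempt # 1)
21/01/25 01:42:42 INFO Worker: Retrying connection to master (attempt # 16)
21/01/25 01:43:44 ERROR Worker: All masters are unresponsive! Giving up.
"""

summary = summarize_worker_log(LOG_EXCERPT)
print(summary)
```

For this excerpt the summary reports an established TCP connection, no registration line, a highest retry attempt of 16, and a final give-up. That narrows the symptom to "the socket opens but registration is never acknowledged"; it does not by itself say why.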
Question from: https://stackoverflow.com/questions/65872172/spark-worker-initially-connecting-and-then-disconnecting-trying-to-reconnect

