Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


windows - Run multiple commands in different SSH servers in parallel using Python Paramiko

I have a script, SSH.py, that connects to many servers over SSH and runs a Python script (worker.py) on each of them. I am using Paramiko, but I am very new to it and learning as I go. The script has to keep running on every server I connect to: it trains a model in parallel, so it must run on all machines at once to update the model parameters jointly. So either none of the SSH connections may close, or I have to find a way for the remote script to keep running after I close the connection.

From extensive googling, it looks like you can achieve this with nohup or:

client = paramiko.SSHClient()
client.connect(ip_address, username=username, password=password)
transport = client.get_transport()
channel = transport.open_session()
channel.exec_command("python worker.py > /logs/'command output' 2>&1")

However, it is unclear to me how to close all the SSH connections. I am running SSH.py from cmd.exe; would closing cmd.exe be enough to stop all the remote processes?

In addition, is my use of client.close() correct for my purposes? My code is below.

# SSH.py

import paramiko
import argparse
import os

path = "path"
python_script = "worker.py"

# definitions for ssh connection and cluster
ip_list = ['XXX.XXX.XXX.XXX', 'XXX.XXX.XXX.XXX', 'XXX.XXX.XXX.XXX']
port_list = [':XXXX', ':XXXX', ':XXXX']
user_list = ['user', 'user', 'user']
password_list = ['pass', 'pass', 'pass']
node_list = list(map(lambda x: f'-node{x + 1} ', list(range(len(ip_list)))))
cluster = ' '.join([node + ip + port for node, ip, port in zip(node_list, ip_list, port_list)])

# run script on command line of local machine
os.system(f"cd {path} && python {python_script} {cluster} -type worker -index 0 -batch 64 > {path}/logs/'command output'/{ip_list[0]}.log 2>&1")

# loop for IP and password
for i, (ip, user, password) in enumerate(zip(ip_list[1:], user_list[1:], password_list[1:]), 1):
    try:
        print("Open session in: " + ip + "...")
        client = paramiko.SSHClient()
        client.connect(ip, username=user, password=password)
        transport = client.get_transport()
        channel = transport.open_session()
    except paramiko.SSHException:
        print("Connection Failed")
        quit()

    try:
        channel.settimeout(30)  # Channel.exec_command() takes no timeout argument
        channel.exec_command(f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1")
        client.close() # here I am closing connection but above command should be running, my question is can I safely close cmd.exe on which I am running SSH.py? 
    except paramiko.SSHException:
        print("Cannot run file. Continue with other IPs in list...")
        client.close()
        continue

The code is based on "Running process of remote SSH server in the background using Python Paramiko".

Edit: It seems that channel.exec_command() is not executing the command:

f"cd {path} && python {python_script} {cluster} -type worker -index {i} -batch 64 > {path}/logs/'command output'/{ip_list[i]}.log 2>&1"

So I wonder whether this is because of client.close(). What would happen if I commented out all the lines with client.close()? Would this help? Is this dangerous? When I quit my local Python script, would that close all my SSH connections, making client.close() unnecessary?

Also, all my machines run Windows.


1 Reply


Indeed, the problem is that you close the SSH connection. As the remote process is not detached from the terminal, closing the terminal terminates the process. On Linux servers, you can use nohup. I do not know whether there is a Windows equivalent.
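
For the Linux case mentioned above, a minimal sketch of how such a detached command line could be built before passing it to exec_command. The helper name detached_command and the log path are hypothetical, and the sketch assumes a POSIX shell on the server, so it does not apply to the Windows machines in the question:

```python
import shlex


def detached_command(command, log_path):
    """Wrap a shell command so it can outlive the SSH session.

    Assumption: the server runs a POSIX shell with nohup available.
    `command` is inserted as-is; only the log path is quoted.
    """
    # Redirecting stdin, stdout and stderr keeps the process from
    # holding on to the SSH channel; the trailing & backgrounds it.
    return f"nohup {command} > {shlex.quote(log_path)} 2>&1 < /dev/null &"


# Hypothetical usage: the resulting string is what you would hand
# to client.exec_command() on a Linux server.
print(detached_command("python worker.py -index 1", "/tmp/worker 1.log"))
```

With the process started this way, closing the connection afterwards should no longer stop it, but you also lose the ability to wait for it through the SSH channel.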

Anyway, it seems that you do not need to close the connection. As I understand it, you are OK with waiting for all the commands to complete.

stdouts = []
clients = []

# Start the commands
for i, (ip, user, password) in enumerate(zip(ip_list[1:], user_list[1:], password_list[1:]), 1):
    print("Open session in: " + ip + "...")
    client = paramiko.SSHClient()
    client.connect(ip, username=user, password=password)
    command = (
        f"cd {path} && "
        f"python {python_script} {cluster} -type worker -index {i} -batch 64 "
        f"> {path}/logs/'command output'/{ip_list[i]}.log 2>&1"
    )
    stdin, stdout, stderr = client.exec_command(command)
    clients.append(client)
    stdouts.append(stdout)

# Wait for commands to complete
for i in range(len(stdouts)):
    stdouts[i].read()
    clients[i].close()

Note that the simple solution above with stdout.read() works only because you redirect the commands' output to a remote file. If you did not, the commands might deadlock, because nothing would be reading stderr while read() waits on stdout.

Without the redirection (or if you want to see the command output locally), you will need code like this:

import time

while any(x is not None for x in stdouts):
    for i in range(len(stdouts)):
        stdout = stdouts[i]
        if stdout is not None:
            channel = stdout.channel
            # To prevent losing output at the end, first test for exit, then for output
            exited = channel.exit_status_ready()
            while channel.recv_ready():
                s = channel.recv(1024).decode('utf8')
                print(f"#{i} stdout: {s}")
            while channel.recv_stderr_ready():
                s = channel.recv_stderr(1024).decode('utf8')
                print(f"#{i} stderr: {s}")
            if exited:
                print(f"#{i} done")
                clients[i].close()
                stdouts[i] = None
    time.sleep(0.1)

If you do not need to separate stdout and stderr, you can greatly simplify the code by using Channel.set_combine_stderr. See "Paramiko ssh die/hang with big output".
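
A minimal sketch of what the combined-stream variant might look like for a single server. The function name run_combined is hypothetical; it assumes `client` is an already connected paramiko.SSHClient and uses only Transport.open_session, Channel.set_combine_stderr, Channel.exec_command, Channel.recv and Channel.recv_exit_status:

```python
def run_combined(client, command, chunk_size=1024):
    """Run `command` and return (output, exit_status).

    Assumption: `client` is a connected paramiko.SSHClient.
    stderr is merged into stdout on the channel, so one recv loop
    drains everything and the remote side cannot stall on a full
    stderr buffer.
    """
    channel = client.get_transport().open_session()
    # Enable merging before the command starts producing output.
    channel.set_combine_stderr(True)
    channel.exec_command(command)

    chunks = []
    while True:
        data = channel.recv(chunk_size)
        if not data:  # empty bytes: the remote side closed the stream
            break
        chunks.append(data)

    status = channel.recv_exit_status()
    return b"".join(chunks).decode("utf8", errors="replace"), status
```

This version blocks on one server at a time. To watch several servers at once, you would still need a polling loop like the one above, only without the separate recv_stderr branch.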


Regarding your question about SSHClient.close: if you do not call it, the connection is closed implicitly when the script finishes and the Python garbage collector cleans up the pending objects. Relying on that is bad practice. Even if Python did not do it, the local OS would terminate all connections of the local Python process, and relying on that is bad practice too. Either way, closing the connection terminates the remote processes along with it.
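
One way to make the explicit close reliable is to register it up front, so it runs even if a later step raises. The following is only a sketch: run_on_all is a hypothetical helper, `clients` is assumed to be a list of already connected paramiko.SSHClient objects, and the remote command is assumed to redirect its own output to a file (otherwise waiting on the exit status could hang, as discussed above):

```python
from contextlib import ExitStack


def run_on_all(clients, command):
    """Start `command` on every client, wait for all, then close them.

    Assumptions: each item in `clients` is a connected
    paramiko.SSHClient, and `command` redirects its output remotely.
    Returns the list of exit statuses in the same order as `clients`.
    """
    with ExitStack() as stack:
        for client in clients:
            # Registered callbacks run when the with-block exits,
            # whether normally or because of an exception.
            stack.callback(client.close)

        # exec_command returns (stdin, stdout, stderr); keep stdout.
        stdouts = [client.exec_command(command)[1] for client in clients]

        # recv_exit_status() blocks until the remote command exits.
        return [stdout.channel.recv_exit_status() for stdout in stdouts]
```

All commands are started before any of them is waited on, so they run concurrently on the servers, and every connection is closed only after its command has finished.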

