Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Paramiko Fails to download large files >1GB

import os
import re
import paramiko

# hostname, username, password, src_dir_path, dst_dir_path, file_mask,
# retry_threshold and logger are assumed to be defined at module level.

def download():
    if not os.path.exists( dst_dir_path ):
        logger.error( "Cannot access destination folder %s. Please check path and permissions. " % ( dst_dir_path ))
        return 1
    elif not os.path.isdir( dst_dir_path ):
        logger.error( "%s is not a folder. Please check path. " % ( dst_dir_path ))
        return 1

    file_list = None
    #transport = paramiko.Transport(( hostname, port))
    paramiko.util.log_to_file('paramiko.log')
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        ssh.connect( hostname, username=username, password=password, timeout=5.0)
        #transport.connect(username=username, password=password )
    except Exception as err:
        logger.error( "Failed to connect to the remote server. Reason: %s" % ( str(err) ) )
        return 1

    try:
        #sftp = paramiko.SFTPClient.from_transport(transport)
        sftp = ssh.open_sftp()
    except Exception as err:
        logger.error( "Failed to start SFTP session from connection to %s. Check that SFTP service is running and available. Reason: %s" % ( hostname, str(err) ))
        return 1

    try:
        sftp.chdir(src_dir_path)
        #file_list = sftp.listdir(path="%s" % ( src_dir_path ) )
        file_list = sftp.listdir()
    except Exception as err:
        logger.error( "Failed to list files in folder %s. Please check path and permissions. Reason: %s" % ( src_dir_path, str(err) ))
        return 1

    match_text = re.compile( file_mask )
    download_count = 0
    for file in file_list:
        # Here is an item name... but is it a file or directory?
        #logger.info( "Downloading file %s." % ( file ) )
        if not match_text.match( file ):
            continue
        logger.info( 'File "%s" name matched file mask "%s". Processing file...' % ( file, file_mask ) )
        src_file_path = "./%s" % ( file )
        dst_file_path = "/".join( [ dst_dir_path, file ] )
        retry_count = 0
        while True:
            try:
                logger.info( "Downloading file %s to %s." % ( file, dst_file_path ) )
                #sftp.get( file, dst_file_path, callback=printTotals ) #sftp.get( remote file, local file )
                sftp.get( file, dst_file_path ) #sftp.get( remote file, local file )
                logger.info( "Successfully downloaded file %s to %s." % ( file, dst_file_path ) )
                download_count += 1
                break
            except Exception as err:
                logger.error( "Failed to download %s to %s. Reason: %s." % ( file, dst_file_path, str(err) ) )
                if retry_count == retry_threshold:
                    # Out of retries: clean up and give up.
                    sftp.close()
                    ssh.close()
                    return 1
                retry_count += 1

    sftp.close()
    ssh.close()  # was transport.close(), but transport is never created above
    logger.info( "%d files downloaded." % ( download_count ) )
    return 0

When I run the function above, it downloads the source file for about 3 minutes and then closes the session, even though only 38-41 MB (it varies) of a 1-1.6 GB file has been downloaded.

From the Paramiko log file, it looks like the SSH connection stays open while the SFTP session closes:

DEB [20120913-10:05:00.894] thr=1 paramiko.transport: Switch to new keys ...
DEB [20120913-10:05:06.953] thr=1 paramiko.transport: Rekeying (hit 401 packets, 1053444 bytes received)
DEB [20120913-10:05:07.391] thr=1 paramiko.transport: kex algos:['diffie-hellman-group1-sha1', 'diffie-hellman-group-exchange-sha1'] server key:['ssh-dss'] client encrypt:['aes256-ctr', 'aes192-ctr', 'aes128-ctr', 'aes256-cbc', 'aes192-cbc', 'aes128-cbc', 'twofish-cbc', 'blowfish-cbc', '3des-cbc', 'arcfour'] server encrypt:['aes256-ctr', 'aes192-ctr', 'aes128-ctr', 'aes256-cbc', 'aes192-cbc', 'aes128-cbc', 'twofish-cbc', 'blowfish-cbc', '3des-cbc', 'arcfour'] client mac:['hmac-sha1', 'hmac-sha1-96', 'hmac-md5', 'hmac-md5-96', '[email protected]'] server mac:['hmac-sha1', 'hmac-sha1-96', 'hmac-md5', 'hmac-md5-96', '[email protected]'] client compress:['[email protected]', 'zlib', 'none'] server compress:['[email protected]', 'zlib', 'none'] client lang:[''] server lang:[''] kex follows?False
DEB [20120913-10:05:07.421] thr=1 paramiko.transport: Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
DEB [20120913-10:05:07.421] thr=1 paramiko.transport: using kex diffie-hellman-group1-sha1; server key type ssh-dss; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
DEB [20120913-10:05:07.625] thr=1 paramiko.transport: Switch to new keys ...
INF [20120913-10:05:10.374] thr=2 paramiko.transport.sftp: [chan 1] sftp session closed.
DEB [20120913-10:05:10.388] thr=2 paramiko.transport: [chan 1] EOF sent (1)

After this point, the script quits with this exception (from the sftp.get() try/except block):

There are insufficient resources to complete the request

The system itself has gigabytes of disk space free, so that isn't the problem.

The same transfer that Paramiko fails on works fine with FileZilla and with a Java app that I wrote years ago to do SFTP transfers. So I think it's a problem with Paramiko.

This is running on Windows XP and on Windows Server 2003.

I've tried patching Paramiko 1.7 so that it refreshes keys more often, but the transfer still throws an exception.

Environment: Python 2.7.3, Paramiko 1.7 with patch, Windows 2003 Server.

Ideas?

Additional information: It fails on Windows XP SP3 and Windows 2003 Server, with exactly the same behavior and error messages.

sys.version information:
Windows XP Workstation: '2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]'
Windows 2003 Server: '2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]'

I patched the packet.py file to decrease the time between key renewals. It had no effect on the behavior of sftp.get().


1 Reply


The SFTP protocol doesn't have a way to stream file data; instead what it has is a way to request a block of data from a particular offset in an open file. The naive method of downloading a file would be to request the first block, write it to disk, then request the second block, and so forth. This is reliable, but very slow.
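The naive approach described above can be sketched as a simple loop. This is only an illustration, not Paramiko's code: `read_block` and `naive_download` are hypothetical names, the 32 KiB block size is an assumption, and an in-memory `io.BytesIO` stands in for the remote file.

```python
import io

def read_block(remote, offset, length):
    """Stand-in for one SFTP read request: return up to `length` bytes at `offset`."""
    remote.seek(offset)
    return remote.read(length)

def naive_download(remote, local, block_size=32768):
    """Request one block, write it, then request the next. Returns bytes copied."""
    offset = 0
    while True:
        block = read_block(remote, offset, block_size)
        if not block:
            # An empty read means we have reached the end of the file.
            break
        local.write(block)
        offset += len(block)
    return offset

# Hypothetical usage with in-memory stand-ins for the remote and local files.
remote = io.BytesIO(b"x" * 100000)
local = io.BytesIO()
copied = naive_download(remote, local)
```

Each block costs a full round trip before the next request is sent, which is why this pattern is reliable but slow over a real network.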

Instead, Paramiko has a performance trick it uses: when you call .get() it immediately sends a request for every block in the file, and it remembers what offset they're supposed to be written to. Then as each response arrives, it makes sure it gets written to the correct offset on-disk. For more information, see the SFTPFile.prefetch() and SFTPFile.readv() methods in the Paramiko documentation. I suspect the book-keeping information it stores when downloading your 1GB file might be causing... something to run out of resources, generating your "insufficient resources" message.
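To make the bookkeeping concrete, here is a rough simulation of the idea, not Paramiko's actual implementation. `plan_requests` and `pipelined_download` are hypothetical names, the 32 KiB block size is an assumption, and shuffling the responses stands in for replies arriving in arbitrary order.

```python
import io
import random

def plan_requests(file_size, block_size):
    """List every (offset, length) pair needed to cover the whole file."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]

def pipelined_download(remote_bytes, local, block_size=32768, seed=0):
    """Issue all requests up front, then write each response at its own offset.

    Returns the number of requests that had to be tracked at once.
    """
    requests = plan_requests(len(remote_bytes), block_size)
    # Simulate the server's responses, then shuffle them to mimic
    # out-of-order arrival.
    responses = [(off, remote_bytes[off:off + length]) for off, length in requests]
    random.Random(seed).shuffle(responses)
    for off, data in responses:
        local.seek(off)
        local.write(data)
    return len(requests)

# Hypothetical usage with in-memory data.
data = bytes(range(256)) * 1000          # 256000 bytes
local = io.BytesIO()
outstanding = pipelined_download(data, local)
```

In this sketch the list of outstanding requests grows linearly with file size: at 32 KiB per request, a 1 GiB file needs 32768 entries. Whether that is what exhausts resources in the failing transfer is not confirmed here.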

Rather than using .get(), call .open() to get an SFTPFile instance, then either call .read() on that object or hand it to the Python standard library function shutil.copyfileobj() to download the contents. That should avoid the Paramiko prefetch cache and allow you to download the file, even if it's not quite as fast.

For example:

def lazy_loading_ftp_file(sftp_host_conn, filename):
    """
    Download a file with plain sequential reads, for use when a simple sftp.get() call fails.
    :param sftp_host_conn: callable returning an SFTP connection usable as a context manager
    :param filename: name of the remote file to download
    :return: dict with "status" and "msg"; the file is written to the current directory
    """
    import shutil
    try:
        with sftp_host_conn() as host:
            # Open the remote file without prefetching; reads happen on demand.
            sftp_file_instance = host.open(filename, 'r')
            with open(filename, 'wb') as out_file:
                shutil.copyfileobj(sftp_file_instance, out_file)
            return {"status": "success", "msg": "successfully downloaded file: {}".format(filename)}
    except Exception as ex:
        return {"status": "failed", "msg": "Exception in lazy reading too: {}".format(ex)}
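If you already hold an SFTP client, as the question's code does after `ssh.open_sftp()`, the same idea fits in a small helper. This is a sketch under assumptions: `download_without_prefetch` is a hypothetical name, the 32 KiB chunk size is arbitrary, and `sftp` only needs to behave like `paramiko.SFTPClient` in that it provides `open(path, mode)` returning a readable file-like object.

```python
import shutil

def download_without_prefetch(sftp, remote_path, local_path, chunk_size=32768):
    """Copy remote_path to local_path using plain sequential reads.

    `sftp` is assumed to provide open(path, mode), as paramiko.SFTPClient does.
    """
    remote_file = sftp.open(remote_path, "rb")
    try:
        with open(local_path, "wb") as local_file:
            # copyfileobj reads chunk_size bytes at a time until EOF.
            shutil.copyfileobj(remote_file, local_file, chunk_size)
    finally:
        # Close explicitly rather than relying on context-manager support.
        remote_file.close()
```

In the question's retry loop, the call would be `download_without_prefetch(sftp, file, dst_file_path)` in place of `sftp.get(file, dst_file_path)`.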
