Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - How can I only download files with timestamps in their names from the last 14 days from an SFTP server?

import os
from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil import parser
import pysftp

lt_all = []

# disable hostkey checking (insecure: accepts any server key)
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None

srv = pysftp.Connection('sftp.com', username='username', password='password', cnopts=cnopts)
srv.chdir('download')
server_file_list = srv.listdir()

# data_folder_path is the local base folder, defined elsewhere in the script
for lt_file in server_file_list:
    name = lt_file.lower()
    if srv.isfile(lt_file) and 'invoices' in name and 'daily' in name and lt_file.endswith('.csv'):
        try:
            srv.get(lt_file, os.path.join(data_folder_path, 'Invoices', lt_file), preserve_mtime=True)
        except OSError:
            print("No Invoices Today")

The good news: I have been successfully downloading all CSV files from the SFTP location.

The bad news: all CSV files are being downloaded. Downloading 300+ files every day is suboptimal, because re-downloading files that have already been downloaded is redundant.

These CSV files are generated daily, and they all follow the same naming convention: invoices_daily_20200204.csv. Notice the date comes at the very end in yyyymmdd format. Edit: The format is actually mmddyy (so that file would be named invoices_daily_020420.csv).

How can I limit my downloads to only files created in the last 14 days? Is pysftp the best module for this?

Question from: https://stackoverflow.com/questions/66057375/how-can-i-only-download-files-with-timestamps-in-their-names-from-the-last-14-da


1 Reply


With the fixed-width, sortable yyyymmdd timestamp format you originally described, this would be easy. If you know that there will always be exactly 14 files to download, use the solution by @lllrnr101. If that is not certain, generate a threshold file name with a timestamp from 14 days ago and compare it against the file names in the listing:

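from datetime import datetime, timedelta

# Name the file from 14 days ago would have, e.g. invoices_daily_20200121.csv
d14ago = datetime.now() - timedelta(days=14)
ts = d14ago.strftime('%Y%m%d')
threshold = f"invoices_daily_{ts}.csv"

for lt_file in server_file_list:
    # With yyyymmdd, plain string comparison orders the names by date
    if srv.isfile(lt_file) and lt_file.startswith("invoices_daily_") and lt_file >= threshold:
        srv.get(lt_file, preserve_mtime=True)  # Download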
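To see the threshold comparison in isolation, without an SFTP connection, here is a minimal sketch that applies it to a plain list of names. It assumes every relevant name looks like invoices_daily_yyyymmdd.csv; the function name `names_since` and the sample listing are hypothetical.

```python
from datetime import date, timedelta

PREFIX = "invoices_daily_"

def names_since(names, today, days=14):
    """Return names whose yyyymmdd timestamp is on or after today - days.

    Works only because yyyymmdd is fixed-width and most-significant-first,
    so comparing whole names as strings compares their dates.
    """
    cutoff = today - timedelta(days=days)
    threshold = f"{PREFIX}{cutoff:%Y%m%d}.csv"
    # The prefix check keeps unrelated files (e.g. "readme.txt") out
    return [n for n in names if n.startswith(PREFIX) and n >= threshold]

listing = [
    "invoices_daily_20200115.csv",
    "invoices_daily_20200121.csv",
    "invoices_daily_20200204.csv",
    "readme.txt",
]
print(names_since(listing, date(2020, 2, 4)))
# → ['invoices_daily_20200121.csv', 'invoices_daily_20200204.csv']
```

The file exactly 14 days old is included because the comparison is `>=`; use `>` if you want to exclude it.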

But it turned out that your timestamp format is mmddyy (%m%d%y), which is not lexicographically sortable: for example, 123119 (Dec 31, 2019) sorts after 020420 (Feb 4, 2020). That complicates the solution. One thing you can do is reorder the timestamp into yymmdd so that it becomes lexicographically sortable:

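# Threshold in the same reordered form as the file names below: yymmdd
ts = d14ago.strftime('%y%m%d')

for lt_file in server_file_list:
    if srv.isfile(lt_file) and lt_file.startswith("invoices_daily_"):
        # Name is invoices_daily_mmddyy.csv; rebuild the date as yy + mm + dd
        file_ts = lt_file[19:21] + lt_file[15:17] + lt_file[17:19]
        if file_ts >= ts:
            srv.get(lt_file, preserve_mtime=True)  # Download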
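As an alternative to slicing and reordering strings (a different technique, not the one used above), you could parse the timestamp into a real date with `datetime.strptime` and compare dates directly. A minimal sketch, assuming every daily file name matches invoices_daily_mmddyy.csv exactly; the regex, the function name `recent_invoice_files`, and the sample listing are hypothetical.

```python
import re
from datetime import date, datetime, timedelta

# Hypothetical pattern: invoices_daily_ + six digits (mmddyy) + .csv
NAME_RE = re.compile(r"invoices_daily_(\d{6})\.csv")

def recent_invoice_files(names, today, days=14):
    """Return names whose mmddyy timestamp is on or after today - days."""
    cutoff = today - timedelta(days=days)
    recent = []
    for name in names:
        match = NAME_RE.fullmatch(name)
        if not match:
            continue  # not a daily invoice file
        try:
            file_date = datetime.strptime(match.group(1), "%m%d%y").date()
        except ValueError:
            continue  # six digits, but not a valid calendar date
        if file_date >= cutoff:
            recent.append(name)
    return recent

listing = [
    "invoices_daily_123119.csv",  # 2019-12-31, too old
    "invoices_daily_012120.csv",  # 2020-01-21, exactly 14 days old
    "invoices_daily_020420.csv",  # 2020-02-04
    "invoices_daily_023020.csv",  # Feb 30 does not exist, skipped
    "notes.txt",
]
print(recent_invoice_files(listing, date(2020, 2, 4)))
# → ['invoices_daily_012120.csv', 'invoices_daily_020420.csv']
```

In the download loop you would call something like `recent_invoice_files(server_file_list, date.today())` and `srv.get` only the returned names. Comparing real dates also avoids relying on string order, and malformed names are skipped instead of being compared by accident.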

Two side notes:



...