python - How to download images to Colab / an AWS ML instance for ML purposes when the images are already hosted on AWS (S3)

So my image links look like: https://my_website_name.s3.ap-south-1.amazonaws.com/XYZ/image_id/crop_image.png

and I have almost 10M images that I want to use for deep learning. I already have a script that downloads the images and saves them into the desired directories using requests and PIL.
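For reference, a minimal sketch of that script; the URL list, output directory, and filename scheme are placeholders:

```python
# Minimal sketch of the requests + PIL downloader described above.
# `urls` and `out_dir` are placeholders; the filename scheme below
# (last URL component) would need adjusting to avoid collisions.
import os
from io import BytesIO

import requests
from PIL import Image

def download_image(url: str, out_dir: str) -> None:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()                      # fail loudly on 4xx/5xx
    img = Image.open(BytesIO(resp.content))      # validate it is an image
    os.makedirs(out_dir, exist_ok=True)
    img.save(os.path.join(out_dir, os.path.basename(url)))
```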

The most naïve idea, which I have used my whole life, is to first download all the images to my local machine, make a zip, and upload it to Google Drive, where I can just use gdown to download it anywhere (limited only by network speed), or simply copy it to Colab from the terminal.

But that data was never very big, always under 200K images. Now the data is huge, so downloading and re-uploading the images would take days, and on top of that 10M images would make Google Drive run out of space. So I am thinking about using AWS SageMaker or something else from AWS. Is there a better approach? How can I import the data directly onto an SSD-backed virtual machine?



1 Reply


You can use boto3, the AWS Python SDK, to connect to the S3 bucket directly from Colab: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
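A minimal sketch of listing and downloading objects with boto3, assuming the bucket is named my_website_name (taken from the URL in the question) and that credentials are supplied via environment variables; the prefix and destination directory are placeholders:

```python
# Minimal sketch: stream every object under a prefix from S3 to local disk.
# Assumptions: bucket "my_website_name" (from the question's URL), prefix
# "XYZ/", and credentials in AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
# (on SageMaker an IAM role can replace the explicit keys).
import os

import boto3

BUCKET = "my_website_name"
PREFIX = "XYZ/"
DEST = "/content/images"   # local directory on the Colab/SageMaker instance

s3 = boto3.client(
    "s3",
    region_name="ap-south-1",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)

# list_objects_v2 returns at most 1000 keys per call, so paginate.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        local_path = os.path.join(DEST, key)
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, key, local_path)
```

If you only need a one-time bulk copy, the AWS CLI equivalent is `aws s3 sync s3://my_website_name/XYZ /local/dir`, which parallelizes transfers. Either way, running this on a SageMaker instance in the same region (ap-south-1) keeps the transfer inside AWS's network, which should be far faster than routing 10M images through your local machine and Google Drive.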

