So I have my image links like:
https://my_website_name.s3.ap-south-1.amazonaws.com/XYZ/image_id/crop_image.png
and I have almost 10M images which I want to use for deep learning purposes. I already have a script that downloads the images and saves them into the desired directories using `requests` and `PIL`.
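For reference, my download script is roughly along these lines (the output directory and the way I pull the image id out of the URL are just placeholders, not my exact code):

```python
import os
from io import BytesIO

import requests
from PIL import Image


def download_image(url: str, out_dir: str) -> None:
    """Fetch one image from S3 and save it under out_dir (illustrative sketch)."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()

    # In my URL scheme the image_id is the second-to-last path component.
    image_id = url.rstrip("/").split("/")[-2]

    # Decode with PIL so corrupt files fail here instead of during training.
    img = Image.open(BytesIO(resp.content))

    os.makedirs(out_dir, exist_ok=True)
    img.save(os.path.join(out_dir, f"{image_id}.png"))


# Example:
# download_image(
#     "https://my_website_name.s3.ap-south-1.amazonaws.com/XYZ/image_id/crop_image.png",
#     "data/crops",
# )
```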
The most naïve idea I have, and the one I have been using my whole life, is to first download all the images to my local machine, make a zip, and upload it to Google Drive, where I can just use `gdown` to download it anywhere (limited only by my network speed), or simply copy it into Colab from the terminal, roughly like the snippet below.
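Something like this (the Drive file ID and archive name are placeholders):

```python
import zipfile

import gdown

# Pull the zipped dataset back down from Google Drive.
file_id = "YOUR_DRIVE_FILE_ID"  # placeholder
gdown.download(f"https://drive.google.com/uc?id={file_id}", "images.zip", quiet=False)

# Unpack onto the local disk (e.g. on Colab).
with zipfile.ZipFile("images.zip") as zf:
    zf.extractall("images")
```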
But that data was never this big, always under 200K images. Now the data is huge, so downloading and re-uploading the images would take days, and on top of that, 10M images would make Google Drive run out of space. So I am thinking about using AWS ML (SageMaker) or something else from AWS. Is there a better approach to this? How can I import the data directly onto an SSD-backed virtual machine?