Yes, you can:
- Make a module for each scraper.
- Make a main app.
- Import your modules into the main app.
- Scrape your target websites using multiprocessing or multithreading.
Conceptual Code:
```python
# This code will not run as-is; it is conceptual!
from multiprocessing import Pool

from Scrapers import Scraper1, Scraper2, Scraper3, ...

def run_each_scraper(scraper_object):
    # Each worker process runs one scraper to completion.
    scraper_object.run()

def launcher():
    list_of_websites = []
    # use loops here to fill in your target URLs

    # Each scraper object is constructed with the website it should
    # handle and exposes a run() method.
    scraper_objects_for_websites = [Scraper1(list_of_websites[0]),
                                    Scraper2(list_of_websites[1]),
                                    ...]

    with Pool(20) as pool:  # pool size depends on your system resources
        pool.map(run_each_scraper, scraper_objects_for_websites)

if __name__ == '__main__':
    launcher()
```
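If you want something that actually runs, here is a self-contained sketch of the same `Pool` pattern using only the standard library; the URLs and the `fetch_title()` helper are illustrative placeholders, not part of the conceptual code above:

```python
# Runnable sketch of the same pattern with only the standard library.
# The URLs and fetch_title() are illustrative placeholders.
from multiprocessing import Pool
from urllib.request import urlopen

def fetch_title(url):
    # Download the page and return its first 60 bytes as a crude sample;
    # a real scraper would parse the HTML here instead.
    with urlopen(url, timeout=10) as response:
        return url, response.read(60)

def launcher():
    list_of_websites = [
        "https://example.com",
        "https://example.org",
        "https://example.net",
    ]
    # Match the pool size to your system resources and number of targets.
    with Pool(3) as pool:
        for url, snippet in pool.map(fetch_title, list_of_websites):
            print(url, snippet)

if __name__ == "__main__":
    launcher()
```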
But if you need more robust technology, I suggest you switch to Scrapy and its Spider classes. You can also handle dynamic (JavaScript-rendered) websites with Splash, which integrates with Scrapy. They are made for big/massive web crawler apps, even in production.
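If you go that route, a minimal Scrapy spider looks roughly like this (the spider name, URL, and selector are illustrative assumptions, not from the original answer):

```python
# Minimal Scrapy spider sketch; assumes Scrapy is installed (pip install scrapy).
# The name, start URL, and CSS selector are placeholders for your targets.
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # Extract the page <title>; adapt the selectors to your target site.
        yield {"title": response.css("title::text").get()}
```

Save it as `example_spider.py` and run it with `scrapy runspider example_spider.py -o titles.json`; Scrapy schedules the requests concurrently for you, so you don't have to manage a process pool yourself.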