Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
114 views
in Technique by (71.8m points)

python - can we write a single script to scrape multiple sites?

I have written almost 30 different scraping scripts for 30 different websites. A friend of mine told me it is possible to have a single code file that scrapes all 30 websites and feeds the results into a dashboard for dynamic scraping (I didn't understand what he meant). I know every website has its own structure, and different data comes from different pages and elements. On top of that, some websites serve dynamic rather than static data, for which I used Selenium.

I am really not sure what he was thinking, or whether it is possible to write one single long script file and use it for scraping many websites.

I would appreciate it if anyone with knowledge of this could help me with ideas, tutorials, web content and...

question from:https://stackoverflow.com/questions/65867436/can-we-write-a-script-which-could-be-usable-to-scrape-multiple-sites


1 Reply

0 votes
by (71.8m points)

Yes, you can:

  1. Make a module for each scraper
  2. Make a main app
  3. Import your modules into the main app
  4. Scrape your target websites using multiprocessing or multithreading.

Conceptual code:

# This code will not run as-is!

from multiprocessing import Pool
from Scrapers import Scraper1, Scraper2, Scraper3, ...


def run_each_scraper(scraper_object):
    scraper_object.run()

def launcher():
    list_of_websites = []
    # fill the list with target URLs here
    scraper_objects = [Scraper1(list_of_websites[0]),
                       Scraper2(list_of_websites[1]),
                       ...]
    pool = Pool(20)  # pool size depends on your system resources
    pool.map(run_each_scraper, scraper_objects)

if __name__ == '__main__':
    launcher()
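To make the pattern above concrete, here is a minimal runnable sketch. It assumes each scraper class exposes the same `run()` interface so one launcher can drive them all; the class names, URLs, and return values are hypothetical placeholders, and a thread pool is used since scraping is I/O-bound.

```python
# Minimal runnable sketch: every scraper shares a run() interface,
# so a single launcher can drive all of them concurrently.
# Class names and URLs are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor


class BaseScraper:
    def __init__(self, url):
        self.url = url

    def run(self):
        raise NotImplementedError


class SiteAScraper(BaseScraper):
    def run(self):
        # site-specific request/parsing logic would go here
        return f"scraped {self.url} with SiteAScraper"


class SiteBScraper(BaseScraper):
    def run(self):
        return f"scraped {self.url} with SiteBScraper"


def launch(scrapers):
    # threads suit I/O-bound scraping; swap in processes for CPU-bound parsing
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda s: s.run(), scrapers))


results = launch([SiteAScraper("https://example.com/a"),
                  SiteBScraper("https://example.com/b")])
print(results)
```

The key design choice is the shared base class: the launcher never needs to know which site a scraper handles, so adding site number 31 means adding one subclass, not touching the launcher.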

If you need a more robust technology, I suggest switching to Scrapy spiders; you can also handle dynamic websites with Splash, which integrates with Scrapy. They are made for big/massive web crawler apps, even in production.
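For a sense of what a Scrapy spider looks like, here is a minimal sketch (it assumes Scrapy is installed via `pip install scrapy`; the spider name, target site, and CSS selectors are illustrative, not taken from the question):

```python
# Minimal Scrapy spider sketch; run with:
#   scrapy runspider quotes_spider.py -o quotes.json
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # CSS selectors are per-site; each spider encodes one site's structure
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
```

Scrapy gives you the scheduling, retries, and concurrency that the multiprocessing sketch above handles by hand; you would still write one spider per site, but run them all from one project.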

