javascript - How to bypass cloudflare bot/ddos protection in Scrapy?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I used to scrape an e-commerce site occasionally to get product price information. I had not used the scraper, which is built with Scrapy, in a while, and when I tried to run it yesterday I ran into a problem with bot protection.

The site is using CloudFlare's DDoS protection, which basically uses JavaScript evaluation to filter out browsers (and therefore scrapers) with JS disabled. Once the function is evaluated, a response containing the calculated number is generated. In return, the service sends back two authentication cookies which, when attached to each request, allow the site to be crawled normally. Here's the description of how it works.

I have also found a cloudflare-scrape Python module that uses an external JS evaluation engine to calculate the number and send the request back to the server. I'm not sure how to integrate it into Scrapy, though. Or maybe there's a smarter way that doesn't require JS execution? In the end, it's a form...

I'd appreciate any help.

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:46:33+0000

So I executed the JavaScript from Python with the help of cloudflare-scrape.

Add the following code to your spider:

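import cfscrape
from scrapy import Request

def start_requests(self):
    for url in self.start_urls:
        # The user agent argument is optional; send the returned agent with the cookies.
        token, agent = cfscrape.get_tokens(url, 'Your preferable user agent, _optional_')
        yield Request(url=url, cookies=token, headers={'User-Agent': agent})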

alongside your parsing functions. And that's it!

Of course, you need to install cloudflare-scrape first and import it in your spider. You also need a JS execution engine installed. I already had Node.js, and it worked without complaints.
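If the spider has many start URLs on the same site, solving the challenge once per URL is probably wasteful. Below is a minimal sketch of one way to reuse the cookies per host. `TokenCache` and `request_kwargs` are hypothetical names, not part of cloudflare-scrape or Scrapy, and the `fetch` argument is assumed to behave like `cfscrape.get_tokens(url)`, returning a `(cookies, user_agent)` tuple. The sketch does not handle cookie expiry.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

Tokens = Dict[str, str]
TokenFetcher = Callable[[str], Tuple[Tokens, str]]


class TokenCache:
    """Hypothetical helper: caches (cookies, user agent) pairs per host."""

    def __init__(self, fetch: TokenFetcher) -> None:
        # Assumption: `fetch` behaves like cfscrape.get_tokens(url) and
        # returns a (cookie dict, user agent string) tuple.
        self._fetch = fetch
        self._cache: Dict[str, Tuple[Tokens, str]] = {}

    def request_kwargs(self, url: str) -> dict:
        """Return keyword arguments suitable for building a request to `url`."""
        host = urlsplit(url).netloc
        if host not in self._cache:
            # Solve the challenge only on the first request to this host.
            self._cache[host] = self._fetch(url)
        tokens, agent = self._cache[host]
        # Send the cookies together with the user agent that obtained them.
        return {
            "url": url,
            "cookies": dict(tokens),
            "headers": {"User-Agent": agent},
        }
```

In a spider this would presumably be used as `cache = TokenCache(cfscrape.get_tokens)` followed by `yield Request(**cache.request_kwargs(url))` inside `start_requests`. The cookies and the user agent are kept together because the cloudflare-scrape documentation says the same user agent string must be used for obtaining the tokens and for the requests that carry them.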
