I am using scrapy
to screen scrape data from a website. However, the data I wanted wasn't inside the html itself, instead, it is from a javascript. So, my question is:
How to get the values (text values) of such cases?
This, is the site I'm trying to screen scrape:
https://www.mcdonalds.com.sg/locate-us/
Attributes I'm trying to get:
Address, Contact, Operating hours.
If you do a "right click", "view source" inside a chrome browser you will see that such values aren't available itself in the HTML.
Edit
Sry paul, i did what you told me to, found the admin-ajax.php
and saw the body but, I'm really stuck now.
How do I retrieve the values from the json object and store it into a variable field of my own? It would be good, if you could share how to do just one attribute for the public and to those who just started scrapy as well.
Here's my code so far
Items.py
class McDonaldsItem(Item):
name = Field()
address = Field()
postal = Field()
hours = Field()
McDonalds.py
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import re
from fastfood.items import McDonaldsItem
class McDonaldSpider(BaseSpider):
name = "mcdonalds"
allowed_domains = ["mcdonalds.com.sg"]
start_urls = ["https://www.mcdonalds.com.sg/locate-us/"]
def parse_json(self, response):
js = json.loads(response.body)
pprint.pprint(js)
Sry for long edit, so in short, how do i store the json value into my attribute? for eg
***item['address'] = * how to retrieve ****
P.S, not sure if this helps but, i run these scripts on the cmd line using
scrapy crawl mcdonalds -o McDonalds.json -t json ( to save all my data into a json file )
I cannot stress enough on how thankful i feel. I know it's kind of unreasonable to ask this of u, will totally be okay even if you dont have time for this.
See Question&Answers more detail:
os