I am trying to scrape an e-commerce site that uses AJAX calls to load its subsequent pages.
I can scrape the data on page 1, but page 2 is loaded automatically through an AJAX call when I scroll to the bottom of page 1.
My code:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as ureq

# Page 1 is served as ordinary HTML.
my_url = 'http://www.shopclues.com/mobiles-smartphones.html'
page = ureq(my_url).read()
page_soup = soup(page, "html.parser")
containers = page_soup.findAll("div", {"class": "column col3"})
for container in containers:
    name = container.h3.text
    price = container.find("span", {'class': 'p_price'}).text
    print("Name : " + name.replace(",", " "))
    print("Price : " + price)

# Pages 2 to 6 are requested from the AJAX endpoint, 36 products per page.
for i in range(2, 7):
    my_url = "http://www.shopclues.com/ajaxCall/moreProducts?catId=1431&filters=&pageType=c&brandName=&start=" + str(36 * (i - 1)) + "&columns=4&fl_cal=1&page=" + str(i)
    page = ureq(my_url).read()
    print(page)  # debug: show the raw response body
    page_soup = soup(page, "html.parser")
    containers = page_soup.findAll("div", {"class": "column col3"})
    for container in containers:
        name = container.h3.text
        price = container.find("span", {'class': 'p_price'}).text
        print("Name : " + name.replace(",", " "))
        print("Price : " + price)
I printed the response read by ureq for each AJAX URL to check whether I can open the AJAX page at all. Every call to
print(page)
outputs only:
b' '
How can I scrape the data from the remaining pages?
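One thing that may be worth trying, offered only as an untested sketch: some servers return an empty body for AJAX endpoints unless the request looks like it came from a browser. The code below rebuilds the same URL as in my loop with `urlencode` and attaches browser-like headers. The helper names (`build_ajax_url`, `build_ajax_request`, `fetch_ajax_page`) and the specific header values are my own assumptions, and whether this site actually checks these headers is not confirmed.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://www.shopclues.com/ajaxCall/moreProducts"
PAGE_SIZE = 36  # products per page, taken from the start offsets in the question


def build_ajax_url(page_number):
    """Build the same query string as the loop in the question."""
    params = {
        "catId": 1431,
        "filters": "",
        "pageType": "c",
        "brandName": "",
        "start": PAGE_SIZE * (page_number - 1),
        "columns": 4,
        "fl_cal": 1,
        "page": page_number,
    }
    return BASE_URL + "?" + urlencode(params)


def build_ajax_request(page_number):
    """Wrap the URL in a Request carrying browser-like headers.

    Assumption: the server may filter on these headers. The values
    below are illustrative guesses, not taken from the site.
    """
    headers = {
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
        "Referer": "http://www.shopclues.com/mobiles-smartphones.html",
    }
    return Request(build_ajax_url(page_number), headers=headers)


def fetch_ajax_page(page_number):
    """Perform the request and return the raw response bytes."""
    with urlopen(build_ajax_request(page_number)) as response:
        return response.read()
```

In the loop, `page = fetch_ajax_page(i)` would replace `page = ureq(my_url).read()`, and the rest of the parsing stays the same. If the body is still empty, comparing these headers with the ones the browser sends for the same request (visible in the network tab of the developer tools) would be the next step.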