It looks like the class ID used for <img class> on Instagram's web page changes every day. Right now it is FFVAD, and tomorrow it will be something else. For example (I made it shorter, the links are long):
<img class="FFVAD" alt="Tag your best friend" decoding="auto" style="" sizes="293px" src="https://scontent-lax3-2.cdninstagram.com/vp/0436c00a3ac9428b2b8c977b45abd022/5BAB3EBC/t51.2885-15/s640x640/sh0.08/e35/33110483_592294374461447_8669459880035221504_n.jpg">
Because of that, I have to fix the script and hardcode the new class ID every time in order to be able to scrape the web page:
var = driver.find_elements_by_class_name('FFVAD')
Somebody told me that I could use img.get_attribute('class') to find the class ID and store it for later. But I still don't understand how to do this, so that Selenium or Beautiful Soup can grab the class ID from the HTML tag and store or parse it later.
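One way to read that suggestion, as a minimal sketch rather than a confirmed approach: instead of searching by the class name, look at every <img> tag, read its class attribute, and store the values. The sketch below uses Python's standard html.parser on a static HTML string so that it runs without a browser. The names ImgClassCollector and collect_img_classes and the sample markup are made up for illustration; they are not part of Selenium or of my script.

```python
from html.parser import HTMLParser


class ImgClassCollector(HTMLParser):
    """Collects the class attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.classes = []  # one entry per <img> that has a class attribute

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag != "img":
            return
        class_value = dict(attrs).get("class")
        if class_value:
            self.classes.append(class_value)


def collect_img_classes(html):
    """Return the class attribute values of all <img> tags in the HTML string."""
    parser = ImgClassCollector()
    parser.feed(html)
    parser.close()
    return parser.classes


# Made-up sample markup standing in for the real page source
sample_html = (
    '<div><img class="FFVAD" alt="Tag your best friend" src="a.jpg">'
    '<img class="FFVAD" alt="second" src="b.jpg">'
    '<img alt="no class here" src="c.jpg"></div>'
)

img_classes = collect_img_classes(sample_html)
unique_classes = sorted(set(img_classes))  # stored for later lookups
print(img_classes)
print(unique_classes)
```

If this idea holds, the HTML could come from the browser (driver.page_source in Selenium) and the stored value could be passed to the later lookup instead of a hardcoded string. Whether the class really is the same for all post images on the page is an assumption here.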
This is all I have so far. It's a little dirty and not right, but the idea is there.
import time

import requests
import selenium.webdriver as webdriver

url = 'https://www.instagram.com/kitties'
scroll_delay = 2  # seconds to wait after each scroll (assumed value)

driver = webdriver.Firefox()
driver.get(url)
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Still depends on the hardcoded class name
    imgs_dedupe = driver.find_elements_by_class_name('FFVAD')
    for img in imgs_dedupe:
        posts = img.get_attribute('class')
        print(posts)
    # Scroll to the bottom and wait for more posts to load
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(scroll_delay)
    new_height = driver.execute_script("return document.body.scrollHeight")
    # Stop once the page no longer grows
    if new_height == last_height:
        break
    last_height = new_height
When I run it, I get the output below. Because there are 3 images on the page, the class ID is printed 3 times:
python tag_print.py
FFVAD
FFVAD
FFVAD