python - Decoding Class names on facebook through Selenium

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I noticed that Facebook uses odd-looking class names that appear to be computer generated. What I don't know is whether these class names stay constant over time or change at some interval. Maybe someone with experience can answer. The only thing I can see is that they are still the same after I exit Chrome and open it again, so at least they don't change with every browser session.

So I'd guess the best way to scrape Facebook is to anchor on elements of the user interface and assume the structure stays the same. For example, to get the address from the About section, something like this:

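from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style: the chromedriver path is passed through a Service object
driver = webdriver.Chrome(service=Service("C:/chromedriver.exe"))

driver.get("https://www.facebook.com/pg/Burma-Superstar-620442791345784/about/?ref=page_internal")
# wait some time for the page to render
# Locate the address spans relative to the visible "FIND US" and "Get Directions" texts
address_elements = driver.find_elements(By.XPATH, "//span[text()='FIND US']/../following-sibling::div//button[text()='Get Directions']/../../preceding-sibling::div[1]/div/span")
for item in address_elements:
    print(item.text)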

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:11:53+0000

You were pretty much correct. Facebook is built with ReactJS, which is evident from the following keywords and tags in the HTML DOM:

{"react_render":true,"reflow":true}

["React-prod"]
["ReactDOM-prod"]
ReactComposerTaggerType:{r:["t5r69"],be:1}

So the dynamically generated class names are not stable: expect them to change from time to time, and do not rely on them in your locators.

Solution

The solution is to build the locator from attributes that stay constant, such as the visible text of a label, rather than from the generated class names.
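
As a rough illustration of that idea, the sketch below builds an XPath that is anchored on a visible label instead of a class name. The helper names xpath_literal and xpath_after_label are hypothetical and not part of Selenium; the expression they produce has the same shape as the one used in this answer.

```python
def xpath_literal(text):
    """Quote `text` as an XPath 1.0 string literal (XPath 1.0 has no escape character)."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Text contains both quote types: join the pieces with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def xpath_after_label(label, tag="span", offset=1):
    """Build an XPath for the `offset`-th <tag> that follows a <span> whose
    whitespace-normalized text equals `label` (hypothetical helper)."""
    if offset < 1:
        raise ValueError("offset must be 1 or greater")
    return f"//span[normalize-space()={xpath_literal(label)}]//following::{tag}[{offset}]"


if __name__ == "__main__":
    # Second <span> after the "FIND US" label.
    print(xpath_after_label("FIND US", offset=2))
```

The resulting string can be passed to Selenium with By.XPATH. Because it depends only on the label text and on document order, it keeps working when class names change, although it will still break if the page layout itself changes.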

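To retrieve the first line of the address just below the text FIND US, wait for the element with WebDriverWait together with the expected condition visibility_of_element_located(), then print its text. The line below assumes WebDriverWait (from selenium.webdriver.support.ui), expected_conditions imported as EC (from selenium.webdriver.support) and By (from selenium.webdriver.common.by) are already imported:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[normalize-space()='FIND US']//following::span[2]"))).text)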
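
For readers unfamiliar with explicit waits: WebDriverWait(driver, 20).until(...) calls the condition repeatedly (every 0.5 seconds by default) until it returns a truthy value or 20 seconds have passed. The sketch below mimics that polling loop in plain Python so it can be tried without a browser. It is a simplified stand-in and not Selenium's implementation; wait_until is a hypothetical name, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call `condition()` until it returns a truthy value, then return that value.

    Raises TimeoutError if `timeout` seconds pass without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in for an element that only "appears" one second after page load.
    start = time.monotonic()
    print(wait_until(lambda: "address line" if time.monotonic() - start > 1 else None))
```

This is why an explicit wait is preferable to a fixed sleep: it returns as soon as the element is usable and fails with a clear error when it never shows up.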

Outro

Note: Scraping Facebook violates section 3.2.3 of their Terms of Service, and your account is liable to be questioned and may even land in "Facebook Jail". Use the Facebook Graph API instead.
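
As a rough sketch of the Graph API route, the snippet below only builds the request URL for a Page's address fields; it does not send the request. Several things here are assumptions rather than facts from this thread: that the number at the end of the page URL in the question (620442791345784) is the Page ID, the API version string, and the access token, which you must obtain yourself with the permissions Facebook requires for reading Page data. Check the current Graph API reference for the fields available to your app.

```python
from urllib.parse import urlencode

GRAPH_HOST = "https://graph.facebook.com"


def page_address_url(page_id, access_token, api_version):
    """Build a Graph API URL that asks for a Page's address fields.

    `api_version` (for example "v12.0") and `access_token` are placeholders
    supplied by the caller; nothing is sent over the network here.
    """
    query = urlencode({
        "fields": "location,single_line_address",
        "access_token": access_token,
    })
    return f"{GRAPH_HOST}/{api_version}/{page_id}?{query}"


if __name__ == "__main__":
    # Page ID assumed from the URL in the question; token and version are placeholders.
    print(page_address_url("620442791345784", "YOUR_ACCESS_TOKEN", "v12.0"))
```

Fetching that URL with any HTTP client returns JSON, so no locators or waits are needed and nothing depends on Facebook's class names.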
