python - Selecting text element having specific style color

Question

posted Jan 24, 2021 in Technique by 深蓝 (71.8m points)

I have a scraping task in which I have to collect some articles. I only need the paragraphs that are in red (#FF0000). Is there a way to use the Selenium WebDriver to extract only the text in this colour? Across all the pages that I have to scrape, the only attribute that is always the same is the text color.

For example, in the following URL: https://www.boatos.org/saude/ivermectina-mata-covid-dois-dias-dose-unica.html

I want the WebDriver to return just the following paragraph, which is painted in red on the original page:

Versão 1: “IVERMECTINA REALMENTE MATA COVID-19 EM 2 DIAS COMPROVA ESTUDO”. Versão 2: “Cientistas descobriram que dose única de ivermectina pode remover todo o RNA do novo coronavírus em um período de 48 horas. Mesmo no primeiro dia, a redução do material genético do vírus é significativo”.

1 Reply

深蓝 · Answer 1 · 2021-01-24T00:20:58+0000

To print the text Versão 1: “IVERMECTINA REALMENTE MATA COVID-19 EM... you can use either of the following Locator Strategies:

Using css_selector and text attribute:

driver.get("https://www.boatos.org/saude/ivermectina-mata-covid-dois-dias-dose-unica.html")
# find_element_by_css_selector() was removed in Selenium 4.3; use find_element() with By
print(driver.find_element(By.CSS_SELECTOR, "span[style] > em").text)

Using xpath and get_attribute("innerHTML"):

driver.get("https://www.boatos.org/saude/ivermectina-mata-covid-dois-dias-dose-unica.html")
# find_element_by_xpath() was removed in Selenium 4.3; use find_element() with By
print(driver.find_element(By.XPATH, "//span[@style]/em").get_attribute("innerHTML"))
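The locators above match any span that carries a style attribute, whatever color it sets. If the red color is written inline in the style attribute (an assumption about the page markup, not something confirmed here), a narrower XPath could test the attribute's content. The sketch below only builds the XPath string; the helper name inline_color_xpath is hypothetical.

```python
def inline_color_xpath(hex_color, tag="*"):
    """Build an XPath for elements whose inline style mentions the given color.

    Assumes the color is written as a hex value inside the style attribute.
    """
    # Normalize the wanted value the same way the XPath normalizes @style:
    # lowercase letters and no spaces.
    wanted = "color:" + hex_color.strip().lower()
    # translate() lowercases A-Z and drops spaces, because the trailing space
    # in its second argument has no counterpart in the third argument.
    normalized_style = (
        "translate(@style, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ', "
        "'abcdefghijklmnopqrstuvwxyz')"
    )
    return "//{}[contains({}, '{}')]".format(tag, normalized_style, wanted)


# Hypothetical usage with an existing driver and the By import:
#   red = driver.find_elements(By.XPATH, inline_color_xpath("#FF0000"))
#   print([element.text for element in red])
```

This approach has limits: it also matches background-color:#ff0000, and it misses colors written as rgb(255, 0, 0) or red, as well as colors that come from a stylesheet rather than from the style attribute.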

Ideally, wait until the element is visible by using WebDriverWait with visibility_of_element_located(); you can use either of the following Locator Strategies:

Using CSS_SELECTOR and get_attribute():

driver.get("https://www.boatos.org/saude/ivermectina-mata-covid-dois-dias-dose-unica.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span[style] > em"))).get_attribute("innerHTML"))

Using XPATH and text attribute:

driver.get("https://www.boatos.org/saude/ivermectina-mata-covid-dois-dias-dose-unica.html")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@style]/em"))).text)

Console Output:

Versão 1: “IVERMECTINA REALMENTE MATA COVID-19 EM 2 DIAS COMPROVA ESTUDO”. Versão 2: “Cientistas descobriram que dose única de ivermectina pode remover todo o RNA do novo coronavírus em um período de 48 horas. Mesmo no primeiro dia, a redução do material genético do vírus é significativo”.
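Since the question says the text color is the only attribute shared by all pages, another option is to compare each element's computed color instead of relying on span[style]. The following is a minimal sketch under stated assumptions: value_of_css_property() is the Selenium WebElement method for computed styles, browsers commonly report colors as rgb(...) or rgba(...), and the helper names (css_color_to_hex, texts_with_color, red_texts_on_page) and the candidate selector "p, span, em, strong" are hypothetical choices for illustration.

```python
import re


def css_color_to_hex(value):
    """Normalize '#rrggbb', 'rgb(r, g, b)' or 'rgba(r, g, b, a)' to '#rrggbb'.

    Returns None for formats this sketch does not handle (e.g. 'red').
    The alpha channel, if present, is ignored.
    """
    value = value.strip().lower()
    if re.fullmatch(r"#[0-9a-f]{6}", value):
        return value
    match = re.fullmatch(
        r"rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*[\d.]+\s*)?\)", value
    )
    if match is None:
        return None
    red, green, blue = (int(component) for component in match.groups())
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)


def texts_with_color(elements, wanted_hex="#ff0000"):
    """Return the distinct, non-empty texts of elements with the wanted color."""
    wanted = wanted_hex.lower()
    results = []
    for element in elements:
        color = css_color_to_hex(element.value_of_css_property("color"))
        if color != wanted:
            continue
        text = element.text.strip()
        # Nested elements inherit the color, so the same text can show up twice.
        if text and text not in results:
            results.append(text)
    return results


def red_texts_on_page(driver, url):
    """Load the page and collect red text from a hypothetical set of tags."""
    from selenium.webdriver.common.by import By  # imported lazily; needs selenium

    driver.get(url)
    candidates = driver.find_elements(By.CSS_SELECTOR, "p, span, em, strong")
    return texts_with_color(candidates)
```

Because color is inherited, a red span and a child of it are both reported as red, and the parent's text includes the child's; the exact-duplicate check only drops identical strings, so the candidate selector may need narrowing for a given site.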

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

References

Link to useful documentation:

get_attribute() method: gets the given attribute or property of the element.
text attribute: returns the text of the element.
Difference between text and innerHTML using Selenium
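To make the text versus innerHTML distinction concrete: get_attribute("innerHTML") returns markup, so nested tags and entities stay in the string, while text returns the rendered text. The sketch below uses only the standard library to reduce an innerHTML string to plain text; the helper name inner_html_to_text is hypothetical, and this is only an approximation of what text returns, since Selenium's text also takes CSS visibility into account.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect only the character data of an HTML fragment, skipping tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def inner_html_to_text(inner_html):
    """Strip tags from an innerHTML string and collapse runs of whitespace."""
    collector = _TextCollector()
    collector.feed(inner_html)
    collector.close()
    return " ".join("".join(collector.parts).split())
```

For example, inner_html_to_text("<strong>red</strong> <em>text</em>") gives "red text", whereas the raw innerHTML string would still contain the strong and em tags.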
