javascript - Scrape non-href link using selenium

Question

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am relatively new to scraping and have come across a complex site (https://adviserinfo.sec.gov/firm/summary/104518) where I cannot figure out how to follow a link using Selenium. The link is labelled "View Form ADV By Section". I have since found the data I wanted elsewhere, but I would still like to know how this can be done.

Usually a link has an href attribute, and hovering over the menu item displays the target URL, but neither is the case here. I've tried clicking the div / li / span via its XPath, but I get an error saying the element is not interactable.

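import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


def get_details(url):
    chrome_options = webdriver.ChromeOptions()
    # Block browser notification prompts
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument("--headless")
    # Selenium 4 style: the driver path is passed through a Service object
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    driver.get(url)
    print(driver.title)
    time.sleep(5)  # crude wait for the page's scripts to render the menu
    print(driver.current_url)
    soup = BeautifulSoup(driver.page_source, "html.parser")
    # The menu entry is an <li> with an analytics-label attribute and no href
    for tag in soup.find_all("li", {"analytics-label": "View Form ADV By Section"}):
        print(tag)
    # This click is where the "element not interactable" error is raised
    driver.find_element(By.XPATH, '/html/body/div[1]/div/div/div[1]/div/div/ul/div[6]/li').click()
    time.sleep(5)
    print(driver.current_url)
    driver.quit()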

If you click the link manually, a new tab opens whose URL appears to be built from the ORG_PK and the FLNG_PK (https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/sections/iapd_AdvIdentifyingInfoSection.aspx?ORG_PK=104518&FLNG_PK=024FFC7A000801B405611B1102CCD535056C8CC0). I thought I could construct the URL myself, but I cannot find where the FLNG_PK comes from.
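To make the structure of that URL concrete, here is a minimal sketch that splits it into its base address and query parameters and then rebuilds it. The helper names split_form_adv_url and build_form_adv_url are hypothetical, and the sketch only shows how the two keys fit into the URL; it says nothing about where the site obtains the FLNG_PK value.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# URL observed in the new tab after clicking the link manually
FORM_ADV_URL = (
    "https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/sections/"
    "iapd_AdvIdentifyingInfoSection.aspx"
    "?ORG_PK=104518&FLNG_PK=024FFC7A000801B405611B1102CCD535056C8CC0"
)


def split_form_adv_url(url):
    """Return (base_url, params) where params maps each query key to its value."""
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return base, params


def build_form_adv_url(base, org_pk, flng_pk):
    """Rebuild the URL from its base and the two keys (hypothetical helper)."""
    return f"{base}?{urlencode({'ORG_PK': org_pk, 'FLNG_PK': flng_pk})}"


base, params = split_form_adv_url(FORM_ADV_URL)
print(params["ORG_PK"])
print(params["FLNG_PK"])
```

The ORG_PK matches the firm number in the summary page URL, so the FLNG_PK is the only piece that would still have to be discovered.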

The questions I have are:

1. How is the website generating these links without any href attributes (I assume a script is doing it somehow)?
2. Is there a way to get Selenium to follow the link?
3. If not, is there a way to find out or construct the link some other way?
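One approach that might be worth trying for the second question is sketched below. It assumes the click opens a new tab, as described above, and that dispatching the click through JavaScript gets past the "not interactable" check; neither has been verified against this site. The helper click_and_get_new_tab_url is hypothetical and relies only on standard WebDriver members (window_handles, execute_script, switch_to.window, current_url).

```python
import time


def click_and_get_new_tab_url(driver, element, timeout=10.0, poll=0.25):
    """Click `element` via JavaScript and return the URL of the tab it opens.

    Hypothetical helper. Returns None if no new tab appears within `timeout`.
    """
    handles_before = set(driver.window_handles)
    # A JavaScript click bypasses Selenium's interactability check
    driver.execute_script("arguments[0].click();", element)
    deadline = time.monotonic() + timeout
    while True:
        new_handles = [h for h in driver.window_handles if h not in handles_before]
        if new_handles:
            # Move the driver's focus to the newly opened tab
            driver.switch_to.window(new_handles[0])
            return driver.current_url
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

The element could be located by the attribute already used in the BeautifulSoup loop, for example driver.find_element(By.CSS_SELECTOR, 'li[analytics-label="View Form ADV By Section"]'), instead of the absolute XPath. The new tab may still be loading when its URL is read, so a short wait before reading current_url may be needed.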

Thanks a lot.

Question from: https://stackoverflow.com/questions/65875313/scrape-non-href-link-using-selenium
