I am relatively new to scraping and have come across a complex site (https://adviserinfo.sec.gov/firm/summary/104518) where I cannot figure out how to follow a link using Selenium (the link is called "View Form ADV By Section"). I have since found the data I wanted elsewhere, but I would still love to find out how this can be done.
Usually there would be an `href` attribute, and hovering over the menu item would display the target URL; however, neither is present here. I've tried using the XPath of the div / li / span elements, but I get an error saying that the element is not interactable.
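import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


def get_details(url):
    chrome_options = webdriver.ChromeOptions()
    # Block notification pop-ups
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument("--headless")
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=chrome_options)
    # driver = webdriver.Chrome(service=Service("./chromedriver"), options=chrome_options)
    driver.get(url)
    print(driver.title)
    time.sleep(5)  # crude wait for the JavaScript-rendered menu
    print(driver.current_url)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    # The menu item is present in the rendered HTML...
    for tag in soup.find_all("li", {"analytics-label": "View Form ADV By Section"}):
        print(tag)
    # ...but clicking it raises "element not interactable"
    driver.find_element(By.XPATH, "/html/body/div[1]/div/div/div[1]/div/div/ul/div[6]/li").click()
    time.sleep(5)
    print(driver.current_url)
    driver.quit()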
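A minimal sketch of one common workaround for "element not interactable", under assumptions that are not verified against the live site: the `li` still carries the `analytics-label` attribute that the BeautifulSoup loop above finds, its click handler fires on a JavaScript `click()`, and the click opens exactly one new tab. `open_section_tab`, `MENU_SELECTOR` and the `timeout` parameter are hypothetical names. The helper waits only for the element to be present in the DOM (not visible), clicks it through `execute_script`, which skips WebDriver's interactability check, then switches to the new tab and returns its URL.

```python
# Sketch only. Assumptions (unverified): the <li> keeps
# analytics-label="View Form ADV By Section", a JavaScript click triggers
# its handler, and that handler opens exactly one new tab.

MENU_LABEL = "View Form ADV By Section"
# CSS attribute selector built from the label used in the BeautifulSoup loop
MENU_SELECTOR = f'li[analytics-label="{MENU_LABEL}"]'


def open_section_tab(driver, timeout=15):
    """Click the menu item via JavaScript and return the URL of the tab it opens."""
    # Imported inside the function so this module imports without Selenium present
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # Presence, not clickability: the element exists even though
    # WebDriver reports it as not interactable
    item = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, MENU_SELECTOR)))

    original = driver.current_window_handle
    existing = set(driver.window_handles)
    # A JavaScript click bypasses WebDriver's visibility/interactability checks
    driver.execute_script("arguments[0].click();", item)

    # Wait for a new window handle to appear, then switch to it
    wait.until(lambda d: len(d.window_handles) > len(existing))
    new_handle = next(h for h in driver.window_handles if h not in existing)
    driver.switch_to.window(new_handle)
    # A freshly opened tab starts at about:blank; wait for the real URL
    wait.until(lambda d: d.current_url not in ("", "about:blank"))
    url = driver.current_url

    # Close the new tab and return to the original page
    driver.close()
    driver.switch_to.window(original)
    return url
```

For example, after `driver.get("https://adviserinfo.sec.gov/firm/summary/104518")`, calling `open_section_tab(driver)` should return the section URL if the assumptions hold. If the handler is attached to the inner `span` rather than the `li`, the selector would need to target that element instead, and if the click navigates the same tab, the new-window wait will time out.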
If you click the link manually, a new tab opens whose URL appears to be built from a combination of the ORG_PK and the FLNG_PK (https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/sections/iapd_AdvIdentifyingInfoSection.aspx?ORG_PK=104518&FLNG_PK=024FFC7A000801B405611B1102CCD535056C8CC0). I thought I could construct the URL myself, but I cannot find where the FLNG_PK comes from...
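Once both values are known (for example, copied from the URL of a tab opened by clicking the link manually), rebuilding the URL is straightforward. A minimal sketch, assuming the base path and the `ORG_PK`/`FLNG_PK` parameter names from the example link stay the same for other firms; `build_section_url`, `extract_pks` and `SECTION_BASE` are hypothetical names, and this does not answer where `FLNG_PK` comes from:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL copied from the example link; assumed (not confirmed) to be stable
SECTION_BASE = ("https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/"
                "sections/iapd_AdvIdentifyingInfoSection.aspx")


def build_section_url(org_pk, flng_pk, base=SECTION_BASE):
    """Rebuild a section URL from its two query parameters."""
    # urlencode keeps the dict's insertion order: ORG_PK first, then FLNG_PK
    return f"{base}?{urlencode({'ORG_PK': org_pk, 'FLNG_PK': flng_pk})}"


def extract_pks(url):
    """Return (ORG_PK, FLNG_PK) from a section URL; None for a missing value."""
    query = parse_qs(urlsplit(url).query)
    return query.get("ORG_PK", [None])[0], query.get("FLNG_PK", [None])[0]
```

`build_section_url(104518, "024FFC7A000801B405611B1102CCD535056C8CC0")` reproduces the example URL exactly, and `extract_pks` reverses it.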
The questions I have are:
- How is the website generating these links without any `href` attributes (I assume a script is doing it somehow)?
- Is there a way to get Selenium to access the link?
- If not, is there another way to find out or construct the link?
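For the last question, one low-tech approach is to search whatever the browser downloaded for the `FLNG_PK` value. The value in the example URL is 40 hexadecimal characters, so a minimal sketch, assuming (unverified) that the value appears verbatim in the page source or in a response body copied from the browser's developer tools, is to scan that text for 40-hex-digit tokens. `flng_pk_candidates` and `HEX40` are hypothetical names, and unrelated hashes of the same length will also match, so each candidate still has to be checked:

```python
import re

# Assumption: FLNG_PK values have the same shape as the one in the example
# URL, i.e. exactly 40 hexadecimal characters, not part of a longer hex run.
HEX40 = re.compile(r"(?<![0-9A-Fa-f])[0-9A-Fa-f]{40}(?![0-9A-Fa-f])")


def flng_pk_candidates(text):
    """Return distinct 40-hex-digit tokens from text, in order of first appearance."""
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(HEX40.findall(text)))
```

For example, `flng_pk_candidates(driver.page_source)` lists candidates from the rendered summary page; an empty list suggests the value is fetched later by a script rather than embedded in the HTML.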
Thanks a lot!
question from:
https://stackoverflow.com/questions/65875313/scrape-non-href-link-using-selenium