Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Scrape non-href link using selenium

I am relatively new to scraping and have come across a complex site (https://adviserinfo.sec.gov/firm/summary/104518) where I cannot figure out how to follow a link using Selenium (the link is called "View Form ADV By Section"). I have since found the data I wanted elsewhere, but would still love to find out how it can be done.

Usually there would be an href attribute, and hovering over the menu would display the target URL; here neither is evident. I've tried using the XPath for the div / li / span, but I get an error that the element is not interactable.

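import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

def get_details(url):
    chrome_options = webdriver.ChromeOptions()
    # Block browser notification prompts.
    prefs = {"profile.default_content_setting_values.notifications": 2}
    chrome_options.add_experimental_option("prefs", prefs)
    chrome_options.add_argument("--headless")
    # Selenium 4 style: the driver path is passed through a Service object.
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=chrome_options)
    try:
        driver.get(url)
        print(driver.title)
        time.sleep(5)  # crude wait for the JavaScript-rendered page
        print(driver.current_url)
        soup = BeautifulSoup(driver.page_source, "html.parser")
        for tag in soup.find_all("li", {"analytics-label": "View Form ADV By Section"}):
            print(tag)
        # This is the click that reports "element not interactable".
        driver.find_element(By.XPATH, '/html/body/div[1]/div/div/div[1]/div/div/ul/div[6]/li').click()
        time.sleep(5)
        print(driver.current_url)
    finally:
        driver.quit()  # always release the browser, even after an error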

If you click the link manually, a new tab opens whose URL appears to be built from the ORG_PK and the FLNG_PK (https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/sections/iapd_AdvIdentifyingInfoSection.aspx?ORG_PK=104518&FLNG_PK=024FFC7A000801B405611B1102CCD535056C8CC0). I thought I could construct the URL myself, but I cannot find where the FLNG_PK comes from...
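To make the URL structure described above concrete, here is a minimal sketch that splits the example URL into its base and its two query parameters and then rebuilds it. The helper names split_adv_url and build_adv_url are hypothetical, and the sketch does not say where FLNG_PK comes from; it only shows that, once both values are known, assembling the URL is straightforward.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Example URL copied from the question.
ADV_URL = (
    "https://files.adviserinfo.sec.gov/IAPD/content/viewform/adv/sections/"
    "iapd_AdvIdentifyingInfoSection.aspx"
    "?ORG_PK=104518&FLNG_PK=024FFC7A000801B405611B1102CCD535056C8CC0"
)


def split_adv_url(url):
    """Return (base_url, params) where params maps each query key to its value."""
    parts = urlsplit(url)
    # parse_qs returns lists of values; each key appears once here.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, params


def build_adv_url(base, org_pk, flng_pk):
    """Assemble a section URL from a base URL and the two keys (hypothetical helper)."""
    return f"{base}?{urlencode({'ORG_PK': org_pk, 'FLNG_PK': flng_pk})}"


base, params = split_adv_url(ADV_URL)
print(params)
print(build_adv_url(base, params["ORG_PK"], params["FLNG_PK"]))
```

The FLNG_PK value still has to be captured from somewhere, for example from the URL of the tab that opens after a click.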

The questions I have are:

  1. How is the website generating these links without any kind of href attribute (I assume a script is doing it somehow...)?
  2. Is there a way to get Selenium to access the link?
  3. If not, is there a way to find out / construct the link some other way?
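One possible approach to questions 2 and 3, sketched under assumptions that have not been verified against this site: locate the li by the analytics-label attribute that the code above already searches for, trigger the click through JavaScript (which skips Selenium's interactability check), then switch to the tab that opens and read its URL. The function names click_and_get_new_tab_url and find_adv_link are hypothetical, and whether a JavaScript click actually fires the site's handler is an assumption.

```python
import time


def click_and_get_new_tab_url(driver, element, timeout=10.0, poll=0.25):
    """Click `element` via JavaScript and return the URL of the tab it opens.

    Returns None if no new tab appears within `timeout` seconds.
    """
    before = set(driver.window_handles)
    # A JavaScript click bypasses Selenium's "element not interactable" check.
    driver.execute_script("arguments[0].click();", element)

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_handles = [h for h in driver.window_handles if h not in before]
        if new_handles:
            # Move the driver's focus to the newly opened tab.
            driver.switch_to.window(new_handles[0])
            return driver.current_url
        time.sleep(poll)
    return None


def find_adv_link(driver, timeout=10):
    """Wait for the <li> carrying the analytics-label used in the question."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, '//li[@analytics-label="View Form ADV By Section"]')
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
```

Inside get_details, this could replace the failing click with `print(click_and_get_new_tab_url(driver, find_adv_link(driver)))`. The new tab may still report about:blank if the URL is read before navigation starts, so a short wait before reading it may be needed.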

Thanks a lot.

Question from: https://stackoverflow.com/questions/65875313/scrape-non-href-link-using-selenium
