I'm trying to parse all the company names from this webpage. There are around 2431
companies in there. However, the way I've tried below can fetches me 1000
results.
This is what I can see about the number of results in response while going through dev tools:
hitsPerPage: 1000
index: "YCCompany_production"
nbHits: 2431 <------------------------
nbPages: 1
page: 0
How can I get the rest of the results using requests?
I've tried so far:
import requests
url = 'https://45bwzj1sgc-dsn.algolia.net/1/indexes/*/queries?'
params = {
'x-algolia-agent': 'Algolia for JavaScript (3.35.1); Browser; JS Helper (3.1.0)',
'x-algolia-application-id': '45BWZJ1SGC',
'x-algolia-api-key': 'NDYzYmNmMTRjYzU4MDE0ZWY0MTVmMTNiYzcwYzMyODFlMjQxMWI5YmZkMjEwMDAxMzE0OTZhZGZkNDNkYWZjMHJlc3RyaWN0SW5kaWNlcz0lNUIlMjJZQ0NvbXBhbnlfcHJvZHVjdGlvbiUyMiU1RCZ0YWdGaWx0ZXJzPSU1QiUyMiUyMiU1RCZhbmFseXRpY3NUYWdzPSU1QiUyMnljZGMlMjIlNUQ='
}
payload = {"requests":[{"indexName":"YCCompany_production","params":"hitsPerPage=1000&query=&page=0&facets=%5B%22top100%22%2C%22isHiring%22%2C%22nonprofit%22%2C%22batch%22%2C%22industries%22%2C%22subindustry%22%2C%22status%22%2C%22regions%22%5D&tagFilters="}]}
with requests.Session() as s:
s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
r = s.post(url,params=params,json=payload)
print(len(r.json()['results'][0]['hits']))
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…