Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Amazon is detecting Selenium as a bot

Amazon is detecting Selenium as a bot, so I changed the user agent, but the problem persists.


I am scraping data from six different Amazon sites.


They are Amazon com, mx, uk, au, ae, and ca, but I only have this problem on com, mx, and ca: the data on these pages does not load because Amazon perceives me as a bot.


No relevant data appears on these sites.

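When a page loads but yields no data, it helps to tell a bot-check page apart from an offer listing that genuinely has no sellers. A minimal sketch of such a check, assuming the block page contains the string `validateCaptcha` (assumed here to be the form action of Amazon's robot-check page) or the text "Enter the characters you see below"; both markers and the helper name `classify_page` are assumptions to verify against the HTML your driver actually receives. `olpOffer` is the class the question's own code searches for.

```python
# Both markers are ASSUMPTIONS about Amazon's robot-check page; check them
# against the HTML your driver actually receives (driver.page_source).
CAPTCHA_MARKERS = ("validateCaptcha", "Enter the characters you see below")
# Class of one seller offer, taken from the question's own scraping code
OFFER_MARKER = "olpOffer"


def classify_page(html: str) -> str:
    """Label a fetched offer-listing page as 'blocked', 'offers' or 'empty'."""
    if any(marker in html for marker in CAPTCHA_MARKERS):
        return "blocked"
    if OFFER_MARKER in html:
        return "offers"
    return "empty"


if __name__ == "__main__":
    print(classify_page('<form action="/errors/validateCaptcha"></form>'))  # blocked
    print(classify_page('<div class="a-row olpOffer">Seller A</div>'))      # offers
    print(classify_page("<html><body></body></html>"))                     # empty
```

In the scraping loop, calling `classify_page(driver.page_source)` right after `driver.get(url2)` would show whether com, mx, and ca are returning a bot-check page or simply an empty listing.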

How does it know I'm using Selenium?


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
import pandas as pd

# Send a random user-agent string instead of Chrome's default one
ua = UserAgent()
user_agent = ua.random
options = Options()
options.add_argument(f'user-agent={user_agent}')

# Selenium 4: the chromedriver path goes into a Service object
chrome_driver_path = r'C:\chromedriver_win32\chromedriver.exe'
driver = webdriver.Chrome(service=Service(chrome_driver_path), options=options)

urls = ['https://www.amazon.com/gp/offer-listing/',
        'https://www.amazon.ca/gp/offer-listing/',
        'https://www.amazon.co.uk/gp/offer-listing/',
        'https://www.amazon.ae/gp/offer-listing/',
        'https://www.amazon.com.au/gp/offer-listing/',
        'https://www.amazon.com.mx/gp/offer-listing/']
marketler = ['USA', 'CA', 'UK', 'AE', 'AU', 'MX']  # market labels, same order as urls
asins = ['B07BGLT25K']
OfferData = []


def offerlisting():
    for asin in asins:
        for url, market in zip(urls, marketler):
            driver.get(url + asin)
            soup = BeautifulSoup(driver.page_source, 'lxml')

            # Each seller offer on the listing page is a div with class "olpOffer"
            sellers = soup.find_all('div', class_='olpOffer')
            print('Satıcı Sayısı:', len(sellers))  # Turkish: "seller count"

            seller_names = [o.get_text().strip().replace('\n', '')
                            for o in soup.find_all(class_='a-spacing-none olpSellerName')]
            print(seller_names)

            # One row per (ASIN, market); "Satıcılar" is Turkish for "sellers"
            OfferData.append({'Asin': asin,
                              'Market': market,
                              'Satıcı Sayısı': len(sellers),
                              'Satıcılar': ', '.join(seller_names)})


def save():
    df = pd.DataFrame(OfferData, columns=['Asin', 'Market', 'Satıcı Sayısı', 'Satıcılar'])
    # .xlsx has no text-encoding option; recent pandas rejects encoding= here
    df.to_excel('C:/chromedriver_win32/amazon_karsilastirma.xlsx', index=False, header=True)


try:
    offerlisting()
    save()
finally:
    driver.quit()
Asked by user1465063, translated from Stack Overflow
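The loop requests six marketplaces back to back. Spacing requests out and retrying with a growing delay when a block page comes back is a common mitigation, although there is no guarantee it changes how Amazon treats the session. A minimal sketch, assuming a hypothetical `fetch_with_backoff` helper that takes the fetch function and the "is this a block page?" test as parameters; the 5-second base delay and 3 attempts are arbitrary choices, and the `validateCaptcha` marker in the demo is an assumption.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], str],
    url: str,
    is_blocked: Callable[[str], bool],
    attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the page HTML, or None if every attempt looked blocked."""
    for attempt in range(attempts):
        html = fetch(url)
        if not is_blocked(html):
            return html
        if attempt < attempts - 1:
            # With the default base_delay: 5 s, 10 s, 20 s, ... plus up to 1 s of jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return None


if __name__ == "__main__":
    # Fake fetch: the first response is a block page, the second a listing
    responses = iter(["<form action='/errors/validateCaptcha'>",
                      "<div class='olpOffer'>Seller A</div>"])
    waits = []
    html = fetch_with_backoff(lambda u: next(responses),
                              "https://www.amazon.com/gp/offer-listing/B07BGLT25K",
                              lambda h: "validateCaptcha" in h,
                              sleep=waits.append)
    print(html, len(waits))
```

With the question's driver, `fetch` would be a small function that calls `driver.get(url)` and returns `driver.page_source`. Passing `sleep` as a parameter lets the demo and tests run without actually waiting.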
