python - Network capturing with Selenium/PhantomJS

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I want to capture the traffic to the sites I'm browsing using Selenium with Python. Since the traffic will be HTTPS, using a proxy won't get me far.

My idea was to run PhantomJS with Selenium and use PhantomJS to execute a script (not on the page via webdriver.execute_script(), but in PhantomJS itself). I was thinking of the netlog.js script (from https://github.com/ariya/phantomjs/blob/master/examples/netlog.js).

Since it works like this on the command line:

phantomjs --cookies-file=/tmp/foo netlog.js https://google.com

there must be a similar way to do this with Selenium?

Thanks in advance

Update:

Solved it with browsermob-proxy.

pip3 install browsermob-proxy

Python3 code

from selenium import webdriver
from browsermobproxy import Server

server = Server("<path to browsermob-proxy>")  # placeholder: path to the browsermob-proxy executable
server.start()
proxy = server.create_proxy({'captureHeaders': True, 'captureContent': True, 'captureBinaryContent': True})

service_args = ["--proxy=%s" % proxy.proxy, '--ignore-ssl-errors=yes']
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('https://google.com')
print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
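
Since the archive is a plain dictionary, it can be written to disk and inspected later. The following is a minimal sketch, not part of the original solution: save_har, load_har and the file name capture.har are hypothetical names, and sample_har stands in for proxy.har so the snippet runs without a browser or proxy.

```python
import json
import tempfile
from pathlib import Path


def save_har(har, path):
    """Write a HAR dict to disk as UTF-8 JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(har, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def load_har(path):
    """Read a HAR file written by save_har back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Stand-in for proxy.har (assumption: same nesting as the archive above).
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://google.com/"}},
        ]
    }
}

# A temporary directory keeps the sketch from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_har(sample_har, Path(tmp) / "capture.har")
    reloaded = load_har(saved)

urls = [entry["request"]["url"] for entry in reloaded["log"]["entries"]]
print(urls)
```

In a real run, save_har(proxy.har, "capture.har") would replace the sample data.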

1 Reply

深蓝 · Answer 1 · 2021-10-17T03:05:35+0000

I am using a proxy for this:

from selenium import webdriver
from browsermobproxy import Server

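# environment.b_mob_proxy_path is my own setting; presumably the path to the browsermob-proxy executable
server = Server(environment.b_mob_proxy_path)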
server.start()
proxy = server.create_proxy()
service_args = ["--proxy=%s" % proxy.proxy]  # PhantomJS expects --proxy=host:port
driver = webdriver.PhantomJS(service_args=service_args)

proxy.new_har()
driver.get('url_to_open')
print(proxy.har)  # this is the archive
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]

The HAR (HTTP Archive format) contains a lot of other information about the requests and responses; it's very useful to me.

Installing on Linux:
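
As an illustration of that extra information, here is a small sketch that pulls a few standard HAR 1.2 fields (method, URL, status, MIME type, timing) out of each entry. The helper summarize_entries and the sample data are hypothetical; sample_har stands in for proxy.har so the snippet runs on its own.

```python
def summarize_entries(har):
    """Return one small dict per HAR entry with the most commonly needed fields."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        rows.append({
            "method": request.get("method"),
            "url": request.get("url"),
            "status": response.get("status"),
            "mime_type": response.get("content", {}).get("mimeType"),
            "time_ms": entry.get("time"),
        })
    return rows


# Stand-in for proxy.har (assumption: made-up entries in HAR 1.2 layout).
sample_har = {
    "log": {
        "entries": [
            {
                "time": 120,
                "request": {"method": "GET", "url": "https://example.com/"},
                "response": {"status": 200, "content": {"mimeType": "text/html", "size": 1256}},
            },
            {
                "time": 35,
                "request": {"method": "GET", "url": "https://example.com/missing.js"},
                "response": {"status": 404, "content": {"mimeType": "text/plain", "size": 0}},
            },
        ]
    }
}

rows = summarize_entries(sample_har)

# Requests with no recorded status or an HTTP error status.
failed = [row["url"] for row in rows if row["status"] is None or row["status"] >= 400]

for row in rows:
    print(row["method"], row["status"], row["url"])
```

Using .get() throughout keeps the helper tolerant of entries where the proxy recorded only part of the exchange.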

pip install browsermob-proxy
