I am trying to log in to a financial service of which I am a customer, in order to retrieve some data automatically using Python requests.
I was inspired by this page:
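import requests
from lxml import html
from typing import Dict

# Login URL; the real host is obfuscated, so this value is a placeholder
URL = "https://somewebsite.com/scripts/customer.cgi"


def get_payload(username: str, password: str) -> Dict[str, str]:
    """Return dictionary for credentials"""
    return {
        "USERNAME": username, "PASSWORD": password, "option": "login"
    }


# Use a session so cookies set during login are kept for later requests
session_requests = requests.session()
result_login = session_requests.post(
    URL,
    data=get_payload("myusername", "MyPasswordSuperSafe"),
    headers=dict(referer=URL),
)
# Parse the HTML returned by the login request
tree = html.fromstring(result_login.text)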
I am able to submit the username and password. However, the site then uses what I assume is some kind of security mechanism: an automatic redirection (see screenshot).
In a web browser, this page automatically redirects to the page shown after a successful login.
However, I don't know how to handle it in my Python web-scraping program, which ends in a timeout.
For information, this is the code of the redirection page (with the name of the website obfuscated):
<!DOCTYPE html>
<!-- saved from url=(0059)https://somewebsite.com/scripts/customer.cgi?option=login -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<script language="JavaScript">
function redirect() {
top.location.href = 'https://somewebsite.com/scripts/customer.cgi/SC/';
}
</script>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="author" content="RL360">
<meta name="copyright" content="RL360">
<link href="./Online services redirection_files/screen.css" rel="styleSheet" media="screen">
<link href="./Online services redirection_files/print.css" rel="styleSheet" media="print">
<!--[if lte IE 8]>
<link href="https://somewebsite.com/scripts/customer.cgi/SF/stylesheets/desktop/ie8fix.css" rel="stylesheet" type="text/css" />
<![endif]-->
<script>
function setCookie(cname, cvalue, exdays, path) {
var d = new Date();
d.setTime(d.getTime() + (exdays * 24 * 60 * 60 * 1000));
var expires = "expires="+d.toUTCString();
document.cookie = cname + "=" + cvalue + ";" + expires + ";path=" + path;
}
function getCookie(cname) {
var name = cname + "=";
var ca = document.cookie.split(';');
for(var i = 0; i < ca.length; i++) {
var c = ca[i];
while (c.charAt(0) == ' ') {
c = c.substring(1);
}
if (c.indexOf(name) == 0) {
return c.substring(name.length, c.length);
}
}
return "";
}
</script>
<title>Online services redirection</title>
<link href="./Online services redirection_files/css" rel="stylesheet"></head><span id="warning-container"><i data-reactroot=""></i></span>
<body onload="redirect();" style="background-color: #ffffff;">
<div id="mainarea">
<div id="title"></div>
<!-- main content -->
<form action="https://somewebsite.com/scripts/customer.cgi/SC/" name="redirform" method="POST">
<div class="level1" style="width: 700px; margin-left: 123px; height: auto;"><h2>Online services redirection</h2>
<p><a href="https://somewebsite.com/scripts/customer.cgi/SC/" target="_top">Attempting to redirect, please click here if nothing happens after 30 seconds.</a></p>
</div>
</form>
</div>
</body></html>
How could I deal with this redirection?
I'm open to using requests, mechanize, BeautifulSoup or any other solution (but would prefer to avoid selenium if possible).
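One possible direction, sketched here under assumptions rather than tested against the real site: the redirection page does not send an HTTP redirect, it relies on the browser running `redirect()` on load, which requests will not do. The target URL is nevertheless present in the page, both in the `top.location.href = '...'` assignment and in the form's `action`, so it could be extracted and requested with the same session that holds the login cookies. The names `find_redirect_target` and `follow_js_redirect` are hypothetical helpers introduced for this illustration, and `session` is assumed to be a `requests.Session`.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Matches the assignment inside the page's redirect() function, e.g.
#   top.location.href = 'https://somewebsite.com/scripts/customer.cgi/SC/';
_JS_REDIRECT = re.compile(r"location\.href\s*=\s*['\"]([^'\"]+)['\"]")


class _FormActionParser(HTMLParser):
    """Record the action attribute of the first <form>, used as a fallback."""

    def __init__(self) -> None:
        super().__init__()
        self.action: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "form" and self.action is None:
            self.action = dict(attrs).get("action")


def find_redirect_target(page_html: str, base_url: str) -> Optional[str]:
    """Return the absolute URL a browser would be sent to, or None."""
    match = _JS_REDIRECT.search(page_html)
    if match:
        return urljoin(base_url, match.group(1))
    parser = _FormActionParser()
    parser.feed(page_html)
    if parser.action:
        return urljoin(base_url, parser.action)
    return None


def follow_js_redirect(session, response, timeout: float = 30.0):
    """Request the redirect target with the session holding the login cookies.

    Assumption: `session` is a requests.Session and `response` is the result
    of the login POST. A GET is used because the JavaScript redirect in a
    browser is also a plain navigation.
    """
    target = find_redirect_target(response.text, response.url)
    if target is None:
        return response  # no redirection page detected, nothing to follow
    return session.get(target, headers={"Referer": response.url}, timeout=timeout)
```

With the code from the question this would be called as `page = follow_js_redirect(session_requests, result_login)`. If the GET does not return the logged-in page, another thing to try would be a POST to the same URL, since the form on the redirection page declares `method="POST"`; whether the server requires that is not something the page source alone can confirm.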
question from:
https://stackoverflow.com/questions/65872784/webscraping-in-python-a-page-with-login-and-redirection