Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
625 views
in Technique[技术] by (71.8m points)

c - How do I use libcurl to login to a secure website and get at the html behind the login

I was wondering if you could help me work through accessing the html behind a login page using C and libcurl.

Specific Example:

The website I'm trying to access is https://onlineservices.ubs.com/olsauth/ex/pbl/ubso/dl

Is it possible to do something like this?

The problem is that we have a lot of clients each of which has a separate login. We need to get data from each of their accounts every day. It would be really slick if we could write something in C to do this and save all the pertinent data into a file. (like the values of the accounts and positions which I can parse from the html)

What do you guys think? Is this possible and could you help point me in the right direction with some examples, etc...?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

After a cursory glance at the login page, it is possible to do this with libcurl, by posting the username/password combo to their authenticating page, and assuming they use cookies to represent a login session. The first step is to make sure that you've got the following options set:

  • CURLOPT_FOLLOWLOCATION - The server may redirect after authenticating, this is quite common.
  • CURLOPT_POST - This tells libcurl to switch into post mode.
  • CURLOPT_POSTFIELDS - This tells libcurl the values to set for the post fields. Set this option to "userId=<insert username>&password=<insert password>". That value is derived from the source code for that page.
  • CURLOPT_USERAGENT - Set a simple user-agent, so that the web server won't throw it out (some strict ones will do this).

Then, once the post is complete, the libcurl instance should contain some sort of authorisation cookie used by the site to identify a logged-in user. Curl should keep track of cookies within a given instance. There are plenty of options for Curl if you want to tweak how cookies behave.

Make sure that once you are 'logged-in' that the same libcurl instance is used for each request under that account, otherwise it will see you as logged out.

As for parsing the resulting pages go, there are tonnes of HTML parsers for c - just google. The only thing I will say is do not try to write an HTML parser yourself. It is notoriously tricky, because a lot of sites don't produce good (or even working) HTML.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...