How to download file from website that requires login information using Python?

I am trying to download some data from a website using Python. If you simply copy and paste the URL into a browser, it shows nothing until you fill in the login information. I have the login name and password, but how should I include them in Python?

My current code is:

import urllib, urllib2, cookielib

username = 'my_user_name'  # placeholder
password = 'my_pwd'        # placeholder

link = 'http://www.google.com'  # just for instance; urllib2 needs the http:// scheme

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username': username, 'j_password': password})

opener.open(link, login_data)
resp = opener.open(link, login_data)
print resp.read()

No error is raised; however, resp.read() returns a bunch of CSS and only messages like "you have to login before reading news here."

So how can I retrieve the page as it appears after logging in?

I just noticed that the website requires three entries:

Company:
Username:
Password:

I have all of them, but how can I put all three into the login data?
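Since urllib2 and cookielib exist only in Python 2 (in Python 3 they became urllib.request and http.cookiejar), here is a minimal Python 3 sketch of how all three entries could go into a single POST body. The field names companyName, username and password (borrowed from the Requests answer below), the helper build_login_request, and the example.com URL are all hypothetical: the real names are whatever the name="..." attributes of the login form's <input> elements say, and the request has to go to the form's action URL rather than to the data page.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(login_url, company, username, password):
    """Build (but do not send) a POST request carrying all three login fields.

    The field names are assumptions; copy the real ones from the
    name="..." attributes of the form's <input> elements.
    """
    fields = {
        "companyName": company,
        "username": username,
        "password": password,
    }
    # urlencode gives "key=value&..."; urllib.request wants the body as bytes
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(login_url, data=body)  # data present => POST


# Cookie-aware opener: the Python 3 spelling of
# urllib2.build_opener(urllib2.HTTPCookieProcessor(cookielib.CookieJar()))
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

req = build_login_request(
    "http://example.com/client_login/client_authorise.asp?action=login",  # hypothetical
    "some_company", "some_user", "some_password",
)
# opener.open(req) would submit the form; later opener.open(data_url) calls
# send back whatever session cookies the server stored in `jar`.
```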

If I run it without logging in, it returns:

import urllib2, cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

resp = opener.open(dd)  # dd holds the URL of the data page
print resp.read()

Here is the printed output:

<DIV id=header>

<DIV id=strapline><!-- login_display -->

<P><FONT color=#000000>All third party users of this website and/or data produced by the Baltic do so at their own risk. The Baltic owes no duty of care or any other obligation to any party other than the contractual obligations which it owes to its direct contractual partners. </FONT></P><IMG src="images/top-strap.gif"> <!-- template [strapline]--></DIV><!-- end strapline -->

<DIV id=memberNav>

<FORM class=members id=form1 name=form1 action=client_login/client_authorise.asp?action=login method=post onsubmits="return check()">
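The printed HTML already shows where the login data has to go: the form's action attribute is the relative URL client_login/client_authorise.asp?action=login, submitted with method=post. A minimal standard-library sketch that pulls those values (and any input names) out of the page and turns the action into an absolute URL; the base URL http://example.com/index.asp and the class name FormFinder are placeholders, not the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFinder(HTMLParser):
    """Record the action/method of every <form> and the names of its <input>s."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action"),
                               "method": (attrs.get("method") or "get").lower(),
                               "inputs": []})
        elif tag == "input" and self.forms and attrs.get("name"):
            self.forms[-1]["inputs"].append(attrs["name"])


# The <FORM> tag exactly as it appears in the output above
page = ('<FORM class=members id=form1 name=form1 '
        'action=client_login/client_authorise.asp?action=login '
        'method=post onsubmits="return check()">')

finder = FormFinder()
finder.feed(page)
form = finder.forms[0]
# Resolve the relative action against the page it came from (placeholder URL)
login_url = urljoin("http://example.com/index.asp", form["action"])
print(form["method"], login_url)
```

Feeding the full login page instead of just this tag would also fill form["inputs"] with the field names to use in the POST body.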

edited Oct 31 '16 at 13:11

tmthydvnprt

5,31033158

asked Apr 2 '14 at 5:55

lsheng

1,71232133

It doesn't work; print resp.read() still returns "<td><p>Access to this data is by subscription only. <a href="freetrialapplication/">Click here</a> for a free trial.</p></td>"

– lsheng
Apr 2 '14 at 6:06

@André I have noticed that the page needs three items to log in. I have all of them, but I'm not sure how to put them into login_info.

– lsheng
Apr 2 '14 at 6:12

I have edited the question, but I'm not sure this is what you asked for. I didn't find <form> in the print resp.read() results.

– lsheng
Apr 2 '14 at 6:24

python html login web urllib2


2 Answers

This code should work, using Python-Requests; just replace the ... with the actual domain and, of course, the login data.

from requests import Session

s = Session()  # this session will hold the cookies

# here we first log in and get our session cookie
login = s.post("http://.../client_login/client_authorise.asp?action=login",
               data={"companyName": "some_company", "password": "some_password",
                     "username": "some_user", "status": ""})
login.raise_for_status()  # raise on HTTP 4xx/5xx errors

# now we're logged in and can request any page
resp = s.get("http://.../").text

print(resp)
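A failed login usually still comes back as HTTP 200 with the public page (which is what the comment below reports), so it can help to check the body for the site's "subscription only" text before trusting the result. A minimal sketch; the marker string is the page text quoted in this thread, and the helper name looks_logged_in is hypothetical.

```python
# Text the site serves to anonymous visitors (quoted in the comments on this question)
NOT_LOGGED_IN_MARKER = "Access to this data is by subscription only"


def looks_logged_in(html):
    """Return False if the page is the public 'subscription only' placeholder.

    This is a heuristic: it only shows the known placeholder text is absent,
    not that the server accepted the credentials.
    """
    return NOT_LOGGED_IN_MARKER not in html


# Usage with the Requests code above:
# resp = s.get("http://.../").text
# if not looks_logged_in(resp):
#     raise RuntimeError("login did not take effect; check the form's field names and action URL")
```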

answered Apr 2 '14 at 6:24

user2629998

Thanks, but the resp variable still contains "><p>Access to this data is by subscription only. <a href="freetrialapplication/">Click here</a> for a free trial.</p>"... I'm sure the login details are correct.

– lsheng
Apr 3 '14 at 1:09


Use Scrapy for crawling that data.

Then you can just do this:

from scrapy import FormRequest, Spider


class LoginSpider(Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # fill in the login form found on the page and submit it
        return [FormRequest.from_response(response,
                    formdata={'username': 'john', 'password': 'secret'},
                    callback=self.after_login)]

    def after_login(self, response):
        # check login succeeded before going on
        # (response.text is a str; response.body is bytes in Python 3)
        if "authentication failed" in response.text:
            self.logger.error("Login failed")
            return
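FormRequest.from_response pre-populates the request with the field values found in the page's HTML form (hidden inputs included) and then overrides them with formdata. Where Scrapy cannot be installed, as in the comment below, a rough standard-library approximation of that merging step might look like this. It is a simplification under stated assumptions: it only reads <input> elements of the first form and ignores <select>, <textarea> and submit-button handling, all of which Scrapy deals with; InputCollector and merge_form_fields are hypothetical names.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name -> value for <input> elements inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_first_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._seen_form:
            self._in_first_form = self._seen_form = True
        elif tag == "input" and self._in_first_form:
            name = attrs.get("name")
            kind = (attrs.get("type") or "text").lower()
            # skip unnamed inputs and buttons, which are not sent as plain fields
            if not name or kind in ("submit", "image", "reset", "button"):
                return
            # unchecked checkboxes/radios are not submitted by browsers either
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return
            self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_first_form = False


def merge_form_fields(html, formdata):
    """Pre-fill from the page's first form, then override with formdata."""
    collector = InputCollector()
    collector.feed(html)
    fields = dict(collector.fields)
    fields.update(formdata)
    return fields


# Hypothetical login page with a hidden token the server expects back
page = '''<form action="login.php" method="post">
<input type="hidden" name="csrf" value="abc123">
<input name="username"><input type="password" name="password">
<input type="submit" value="Log in"></form>'''
print(merge_form_fields(page, {"username": "john", "password": "secret"}))
```

The resulting dict can then be URL-encoded and POSTed to the form's action URL with a cookie-aware opener or a requests Session.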

edited Jan 20 at 13:59

Jakub Bláha

474326

answered Apr 2 '14 at 6:13

pythondjango

112

That may work, but I don't think he needs such a big library for a trivial task like logging in. The same can be done in one or two lines with Python-Requests or even urllib.

– user2629998
Apr 2 '14 at 6:18

I don't have Scrapy right now, and I would have to ask IT to install it for me, as Python is on a server.

– lsheng
Apr 2 '14 at 6:18
