How to download a file from a website that requires login information using Python?












I am trying to download some data from a website using Python. If you simply copy and paste the URL, it shows nothing unless you fill in the login information. I have the login name and password, but how should I include them in Python?



My current code is:



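# Python 2 (these modules were reorganized in Python 3)
import urllib, urllib2, cookielib

username = 'my_user_name'  # placeholder
password = 'my_pwd'        # placeholder

link = 'http://www.google.com' # just for instance
cj = cookielib.CookieJar()  # stores cookies set by the server
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})

# Passing data makes this a POST request; it is sent twice to the same URL
opener.open(link, login_data)
resp = opener.open(link, login_data)
print resp.read()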


No error is raised, but resp.read() returns a bunch of CSS and only contains messages like "you have to login before reading news here."

So how can I retrieve the page as it appears after logging in?



I just noticed that the website requires three entries:

Company:
Username:
Password:

I have all of them, but how can I put all three in the login variable?



If I run it without logging in, using this code:

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# dd is the URL of the data page; no login data is sent here
opener.open(dd)
resp = opener.open(dd)

print resp.read()

the output is:



<DIV id=header>
<DIV id=strapline><!-- login_display -->
<P><FONT color=#000000>All third party users of this website and/or data produced by the Baltic do so at their own risk. The Baltic owes no duty of care or any other obligation to any party other than the contractual obligations which it owes to its direct contractual partners. </FONT></P><IMG src="images/top-strap.gif"> <!-- template [strapline]--></DIV><!-- end strapline -->
<DIV id=memberNav>
<FORM class=members id=form1 name=form1 action=client_login/client_authorise.asp?action=login method=post onsubmits="return check()">









  • It doesn't work; print resp.read() still returns "<td><p>Access to this data is by subscription only. <a href="freetrialapplication/">Click here</a> for a free trial.</p></td>"

    – lsheng
    Apr 2 '14 at 6:06

  • @André I have noticed that the page needs three items to log in. I have all of them, but I'm not sure how I should put them in the login_info.

    – lsheng
    Apr 2 '14 at 6:12

  • I have edited it, but I'm not sure if this is what you asked for. I didn't find <form> in the print resp.read() results.

    – lsheng
    Apr 2 '14 at 6:24
















python html login web urllib2






edited Oct 31 '16 at 13:11 by tmthydvnprt
asked Apr 2 '14 at 5:55 by lsheng













2 Answers

This code should work using Python-Requests. Just replace the ... with the actual domain and, of course, fill in your own login data.

from requests import Session

s = Session() # this session will hold the cookies

# first log in so the session receives the login cookie
login_data = {
    "companyName": "some_company",
    "username": "some_user",
    "password": "some_password",
    "status": "",
}
s.post("http://.../client_login/client_authorise.asp?action=login", data=login_data)

# now we're logged in and can request any page
resp = s.get("http://.../").text

print(resp)
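
Since the asker reports that no error is raised even when the login did not take effect, it may help to check the returned page before saving it to disk. The following is only a minimal sketch: it assumes the logged-out page contains the "Access to this data is by subscription only" text quoted in the comments on the question, and the helper names and the output path are made up for illustration.

```python
from pathlib import Path

# Text the site shows to visitors who are not logged in (taken from the
# comments on the question); adjust it if the real page says something else.
LOGGED_OUT_MARKER = "Access to this data is by subscription only"


def looks_logged_out(page_text, marker=LOGGED_OUT_MARKER):
    """Return True if the page still shows the logged-out message."""
    return marker.lower() in page_text.lower()


def save_if_logged_in(page_text, destination):
    """Write the page to `destination` unless it is the logged-out page.

    Returns True when the file was written, False otherwise.
    """
    if looks_logged_out(page_text):
        return False
    Path(destination).write_text(page_text, encoding="utf-8")
    return True
```

With the code above, resp (the text returned by s.get(...)) could be passed as page_text, for example save_if_logged_in(resp, "data.html"). A False result would suggest that the login POST did not create a valid session.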





  • Thanks, but in the resp variable I still have "><p>Access to this data is by subscription only. <a href="freetrialapplication/">Click here</a> for a free trial.</p>". I'm sure the login names are correct.

    – lsheng
    Apr 3 '14 at 1:09
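
One possible cause of a login that silently fails is that the posted field names do not match what the form really submits (hidden inputs included). A rough way to check is to parse the login page and list every input inside the form. The sketch below uses only the standard library on Python 3. The FORM tag in the sample is modeled on the one printed in the question, but the INPUT tags are hypothetical, since the question's output is cut off before them; their names are borrowed from the answer above.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the action URL and input names/values of each <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per form: action, method, fields
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            name = attrs.get("name")
            if name:
                # keep any pre-filled value, e.g. from hidden inputs
                self._current["fields"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_forms(html):
    """Return a list of dicts describing the forms found in `html`."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical sample: only the FORM tag comes from the question.
SAMPLE_FORM = (
    "<FORM class=members id=form1 name=form1 "
    "action=client_login/client_authorise.asp?action=login method=post>"
    "<INPUT type=text name=companyName>"
    "<INPUT type=text name=username>"
    "<INPUT type=password name=password>"
    '<INPUT type=hidden name=status value="">'
    "</FORM>"
)

if __name__ == "__main__":
    for form in find_forms(SAMPLE_FORM):
        print(form["method"], form["action"], sorted(form["fields"]))
```

Running find_forms on the real login page and comparing the reported field names with the keys sent in the POST would show whether anything is missing or misspelled.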



















Use Scrapy to crawl that data.

Then you can just do this:

import logging

from scrapy import FormRequest, Spider


class LoginSpider(Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # fill in and submit the login form found on the page
        return [FormRequest.from_response(response,
                    formdata={'username': 'john', 'password': 'secret'},
                    callback=self.after_login)]

    def after_login(self, response):
        # check that the login succeeded before going on
        if b"authentication failed" in response.body:
            self.log("Login failed", level=logging.ERROR)
            return





  • That may work, but I don't think he needs such a big library for a trivial task like logging in. The same can be done in one or two lines with Python-Requests or even urllib.

    – user2629998
    Apr 2 '14 at 6:18

  • I don't have Scrapy right now, and I would have to ask IT to install it for me, as Python is on a server.

    – lsheng
    Apr 2 '14 at 6:18
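
For a setup where nothing extra can be installed, the urllib route mentioned in the comment above might look like the following on Python 3. This is a sketch under assumptions, not a verified solution for this site: the login path comes from the form's action attribute shown in the question, the field names are the ones used in the Python-Requests answer, and base_url and data_url are placeholders the reader has to supply. The function names are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Path taken from the form's action attribute shown in the question.
LOGIN_PATH = "client_login/client_authorise.asp?action=login"


def build_opener_with_cookies():
    """Return an opener that remembers cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(base_url, company, username, password):
    """Build (but do not send) the POST request that submits the login form."""
    fields = {
        "companyName": company,  # field names assumed from the Requests answer
        "username": username,
        "password": password,
        "status": "",
    }
    body = urllib.parse.urlencode(fields).encode("ascii")
    login_url = urllib.parse.urljoin(base_url, LOGIN_PATH)
    return urllib.request.Request(login_url, data=body, method="POST")


def fetch_after_login(base_url, data_url, company, username, password):
    """Log in, then fetch `data_url` with the session cookies attached."""
    opener, _jar = build_opener_with_cookies()
    with opener.open(build_login_request(base_url, company, username, password)):
        pass  # only the cookies set by the login response are needed
    with opener.open(data_url) as response:
        return response.read()
```

The important design point is that the same opener is used for both requests, so any cookie set by the login response is sent again when the data page is fetched. All three login values simply go into one dictionary before it is URL-encoded.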











Python-Requests answer: answered Apr 2 '14 at 6:24 by user2629998




















Scrapy answer: answered Apr 2 '14 at 6:13 by pythondjango, edited Jan 20 at 13:59 by Jakub Bláha












