How to download a file from a website that requires login information using Python?
I am trying to download some data from a website using Python. If you simply paste the URL into a browser, the page shows nothing until you fill in the login information. I have the login name and password, but how should I include them in my Python code?
My current code is:
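import urllib, urllib2, cookielib
username = 'my_user_name'  # placeholder
password = 'my_pwd'  # placeholder
link = 'http://www.google.com'  # just for instance (urllib2 needs the http:// prefix)
# the cookie jar keeps any session cookie the server sets
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# passing data to opener.open() makes it send a POST request
login_data = urllib.urlencode({'username': username, 'j_password': password})
opener.open(link, login_data)
resp = opener.open(link, login_data)
print resp.read()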
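For reference, here is a minimal Python 3 sketch of what the login step above sends (in Python 3, urllib2 became urllib.request and urllib.urlencode became urllib.parse.urlencode). It only builds the request without sending it, to show two things: supplying a data body is what turns the request into a POST, and the URL must include its scheme. The URL and credentials are placeholders, not the real site's.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url, fields):
    """Build (but do not send) a POST request carrying form-encoded fields."""
    body = urlencode(fields).encode("ascii")  # Python 3 needs bytes for the body
    return Request(login_url, data=body)


# Placeholder URL and credentials
req = build_login_request("http://www.google.com", {"username": "me", "j_password": "pw"})
print(req.get_method())  # POST, because a data body was given
print(req.data)          # b'username=me&j_password=pw'

# Without the scheme, urllib rejects the URL before any network access
try:
    Request("www.google.com")
except ValueError as exc:
    print(exc)  # reports an unknown url type
```

Whether the server accepts the login still depends on posting to the right URL with the right field names.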
No error is raised, but resp.read() returns mostly CSS, and the only readable text is messages like "you have to login before reading news here."
How can I retrieve the page as it appears after logging in?
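Rather than reading through the returned HTML by eye, a small check can flag the logged-out page automatically. This sketch assumes the marker phrases quoted in this thread ("you have to login" and "subscription only") are what the site shows to visitors who are not logged in; the helper name and marker list are hypothetical and should be adjusted to the real page.

```python
# Phrases this thread reports seeing on the logged-out page (assumed; adjust as needed).
# Kept lowercase because the page text is lowercased before comparing.
LOGGED_OUT_MARKERS = (
    "you have to login",
    "access to this data is by subscription only",
)


def looks_logged_out(html, markers=LOGGED_OUT_MARKERS):
    """Return True if the page contains any logged-out marker (case-insensitive)."""
    text = html.lower()
    return any(marker in text for marker in markers)


page = ('<td><p>Access to this data is by subscription only. '
        '<a href="freetrialapplication/">Click here</a> for a free trial.</p></td>')
print(looks_logged_out(page))                         # True
print(looks_logged_out("<table>...data...</table>"))  # False
```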
Edit: I just noticed that the website's login form requires three entries:
Company:
Username:
Password:
I have all three values, but how can I put all of them into the login data?
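All three values go into the same dictionary; urlencode joins them with & and escapes special characters. A Python 3 sketch (Python 2's urllib.urlencode behaves the same way), where the keys are hypothetical: the real names are the name="..." attributes of the three input elements in the site's login form (the first answer below guesses companyName, username and password).

```python
from urllib.parse import urlencode

# Hypothetical field names; copy the real ones from the form's <input name="..."> attributes
fields = {
    "companyName": "Acme & Co",  # spaces and & are escaped by urlencode
    "username": "some_user",
    "password": "some_password",
}
login_data = urlencode(fields)
print(login_data)  # companyName=Acme+%26+Co&username=some_user&password=some_password
# In Python 3, pass login_data.encode("ascii") as the data argument of opener.open()
```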
If I run it without logging in:
import urllib2, cookielib
# dd holds the URL of the page with the data
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open(dd)
print resp.read()
Here is the output:
<DIV id=header>
<DIV id=strapline><!-- login_display -->
<P><FONT color=#000000>All third party users of this website and/or data produced by the Baltic do so at their own risk. The Baltic owes no duty of care or any other obligation to any party other than the contractual obligations which it owes to its direct contractual partners. </FONT></P><IMG src="images/top-strap.gif"> <!-- template [strapline]--></DIV><!-- end strapline -->
<DIV id=memberNav>
<FORM class=members id=form1 name=form1 action=client_login/client_authorise.asp?action=login method=post onsubmits="return check()">
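The output above already contains the login form: its action is a relative URL (client_login/client_authorise.asp?action=login) and its method is post, which suggests the credentials should be POSTed to that address rather than to the data page. A standard-library sketch (Python 3 html.parser) that pulls the action, method and input names out of a page; the excerpt stops before the input elements, so the inputs in the sample and the page URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LoginFormParser(HTMLParser):
    """Record the action, method and input fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = {}  # input name -> default value ("" if none)
        self._in_form = False
        self._seen_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "form" and not self._seen_form:
            self._in_form = self._seen_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


page_url = "http://example.com/index.asp"  # hypothetical address of the fetched page
html = (
    '<FORM class=members id=form1 name=form1 '
    'action=client_login/client_authorise.asp?action=login method=post '
    'onsubmits="return check()">'
    # hypothetical inputs; the real names are in the full page source
    '<INPUT type=text name=companyName><INPUT type=text name=username>'
    '<INPUT type=password name=password><INPUT type=hidden name=status value="">'
    '</FORM>'
)
parser = LoginFormParser()
parser.feed(html)
parser.close()
print(parser.method)                     # post
print(urljoin(page_url, parser.action))  # http://example.com/client_login/client_authorise.asp?action=login
print(list(parser.fields))               # ['companyName', 'username', 'password', 'status']
```

Hidden inputs matter too: their default values are usually expected back in the POST body along with the credentials.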
python html login web urllib2
edited Oct 31 '16 at 13:11 by tmthydvnprt
asked Apr 2 '14 at 5:55 by lsheng
It doesn't work; print resp.read() still returns "<td><p>Access to this data is by subscription only. <a href="freetrialapplication/">Click here</a> for a free trial.</p></td>"
– lsheng
Apr 2 '14 at 6:06
@André I noticed that the login page needs three items. I have all of them, but I'm not sure how to put them into login_info.
– lsheng
Apr 2 '14 at 6:12
I have edited the question, but I'm not sure if this is what you asked for. I didn't find a <form> in the print resp.read() output.
– lsheng
Apr 2 '14 at 6:24
2 Answers
This code should work using Python-Requests; just replace the ... with the actual domain, and of course fill in your own login data.
from requests import Session
s = Session()  # this session will hold the cookies
# first log in; the session stores the cookie the server sets
login = s.post("http://.../client_login/client_authorise.asp?action=login",
               data={"companyName": "some_company", "password": "some_password",
                     "username": "some_user", "status": ""})
login.raise_for_status()  # stop early on an HTTP error status
# now we're logged in and can request any page
resp = s.get("http://.../").text
print(resp)
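If installing third-party packages on the server is not an option, the same cookie-keeping login can be sketched with only the Python 3 standard library, with http.cookiejar playing the role of the Session's cookie storage. This is an illustrative equivalent, not part of the original answer; the helper names are hypothetical, and the URLs and field names are placeholders to fill in from the site's form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return an opener whose CookieJar keeps cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, login_url, fields, data_url):
    """POST the login fields to login_url, then GET data_url with the same cookies."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(login_url, data=body) as resp:
        resp.read()  # HTTPCookieProcessor stores any Set-Cookie from this response
    with opener.open(data_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (placeholders; "..." is the site's domain):
# opener, jar = make_session_opener()
# html = login_and_fetch(opener,
#                        "http://.../client_login/client_authorise.asp?action=login",
#                        {"companyName": "some_company", "username": "some_user",
#                         "password": "some_password", "status": ""},
#                        "http://.../")
# print([c.name for c in jar])  # an empty list means the login set no cookie
```

One CookieJar per opener mirrors one Session: every request made through that opener sends back whatever cookies the login response set.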
answered Apr 2 '14 at 6:24 by user2629998
Thanks, but the resp variable still contains "><p>Access to this data is by subscription only. <a href="freetrialapplication/">Click here</a> for a free trial.</p>". I'm sure the login details are correct.
– lsheng
Apr 3 '14 at 1:09
Use Scrapy to crawl that data.
Then you can do something like this:
from scrapy import FormRequest, Spider

class LoginSpider(Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # from_response() pre-fills the fields of the page's <form>, then applies formdata on top
        return [FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login)]

    def after_login(self, response):
        # check that the login succeeded before going on
        if "authentication failed" in response.text:
            self.logger.error("Login failed")
            return
        # logged in: continue scraping from here (Scrapy keeps the session cookies)
edited Jan 20 at 13:59 by Jakub Bláha
answered Apr 2 '14 at 6:13 by pythondjango
That may work, but I don't think he needs such a big library for a trivial task like logging in. The same can be done in one or two lines with Python-Requests or even urllib.
– user2629998
Apr 2 '14 at 6:18
I don't have Scrapy right now and would have to ask IT to install it for me, as Python is on a server.
– lsheng
Apr 2 '14 at 6:18