How to get HTML with JavaScript-rendered source code using Selenium
I ran a query on a web page and got a result URL. If I right-click and view the HTML source, I can see the HTML generated by JS. With plain urllib, Python cannot get the JS-generated content, so I looked at solutions that use Selenium. Here's my code:
from selenium import webdriver
url = 'http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2'
driver = webdriver.PhantomJS(executable_path=r'C:\python27\scripts\phantomjs.exe')
driver.get(url)
print(driver.page_source)
>>> <html><head></head><body></body></html>
Obviously that's not right!
Here's the source I need, as shown by right-click > view source (I want the INFORMATION part):
</script></div><div class="searchColRight"><div id="topActions" class="clearfix
noPrint"><div id="breadcrumbs" class="left"><a title="Results Summary"
href="Default.aspx? _act=VitalSearchR ...... <<INFORMATION I NEED>> ...
to view the entire record.</p></div><script xmlns:msxsl="urn:schemas-microsoft-com:xslt">
jQuery(document).ready(function() {
jQuery(".ancestry-information-tooltip").actooltip({
href: "#AncestryInformationTooltip", orientation: "bottomleft"});
});
So my question is: how do I get the information generated by JS?
javascript python selenium
asked Mar 30 '14 at 2:19 by MacSanhe (edited Apr 1 '14 at 19:07)
What does the HTML you want look like on the page? You will want to use one of Selenium's find_element_by_* functions, but how exactly depends on the HTML itself.
– Victory
Mar 30 '14 at 2:25
I mean everything. For example, you search for something on Google, then on the results page you right-click and view source. That's the "everything" I want.
– MacSanhe
Mar 30 '14 at 2:29
6 Answers
You will need to get the document via JavaScript; you can use Selenium's execute_script function:
from time import sleep  # this should go at the top of the file
sleep(5)  # give the page's JavaScript time to run before reading the DOM
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print(html)
That will get everything inside the <html> tag.
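To check whether the DOM was read too early, it can help to fetch the load state together with the HTML in one call. The following is a minimal sketch, assuming a Selenium-style driver object with an execute_script method; the helper name dom_snapshot and the keys of the returned dictionary are made up for illustration and are not part of Selenium.

```python
def dom_snapshot(driver):
    """Return the load state and serialised DOM from a single script call.

    `driver` is assumed to be a Selenium WebDriver (anything exposing an
    execute_script method works). The helper name is hypothetical.
    """
    # A JavaScript array returned from the script arrives as a Python list.
    state, html = driver.execute_script(
        "return [document.readyState, document.documentElement.outerHTML];"
    )
    return {
        "ready_state": state,                  # "loading", "interactive" or "complete"
        "is_blank": "<body></body>" in html,   # the empty skeleton from the question
        "html": html,                          # outerHTML includes the <html> tag itself
    }
```

If ready_state is not "complete", or is_blank is true, the page had probably not finished loading (or had not loaded at all) when the snapshot was taken.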
answered Mar 30 '14 at 2:35 by Victory (edited Apr 1 '14 at 22:30)
Then I only get: <html><head></head><body></body></html>..... how .... ><
– MacSanhe
Mar 30 '14 at 3:24
It looks like it should work, but it only gives me <html><head></head><body></body></html>. I have revised my question; could you take another look at it, please? Thank you very much.
– MacSanhe
Apr 1 '14 at 19:08
@MacSanhe see my edits: if the page is not fully loaded, you will not get all the body content. Also try going to the page and running document.getElementsByTagName('html')[0].innerHTML in your debugger console to see how much of the DOM comes through.
– Victory
Apr 1 '14 at 22:32
Does anyone know if there's a way of getting the JavaScript of a page without using a browser, as Selenium does?
– user2540748
Nov 8 '14 at 0:01
Try loading PhantomJS with this parameter: browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']). It works for me.
– Jong Su Park
Jun 26 '15 at 5:01
That workaround is not necessary; you can use this instead:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get('http://www.google.com/')
html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')
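In newer Selenium 4 releases the find_element_by_* helpers are no longer available, and find_element(by, value) is used instead. The sketch below shows the same idea in that style. It assumes a driver exposing find_element, and it passes the plain string "tag name", which is the value behind By.TAG_NAME, so the snippet needs no imports. The helper name element_html is made up for illustration.

```python
def element_html(driver, tag="html", outer=True):
    """Return the serialised HTML of the first <tag> element in the live DOM.

    Hypothetical helper. `driver` is assumed to be a Selenium 4 WebDriver;
    "tag name" is the locator string that By.TAG_NAME stands for.
    """
    element = driver.find_element("tag name", tag)
    # outerHTML includes the element's own tag; innerHTML only its children.
    return element.get_attribute("outerHTML" if outer else "innerHTML")
```

After driver.get(url), calling element_html(driver) returns the whole document including the <html> tag, while element_html(driver, outer=False) matches the innerHTML call above.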
answered Apr 22 '17 at 22:06 by Darius Morawiec (edited May 2 '17 at 9:55)
I suspect you are getting the page source before the JavaScript has rendered the dynamic HTML.
As a first test, try sleeping for a few seconds between navigating to the page and reading the page source.
If that works, you can then switch to a different wait strategy.
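One possible "different wait strategy" is to poll for a condition instead of sleeping for a fixed time. Selenium ships WebDriverWait for this purpose; the sketch below shows the same idea using only the standard library so the logic is visible. The names wait_for and source_containing are hypothetical, and the marker "searchColRight" is simply a class name taken from the HTML in the question.

```python
import time


def wait_for(condition, timeout=10.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError after `timeout` seconds.
    Hypothetical helper, similar in spirit to Selenium's WebDriverWait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


def source_containing(driver, marker):
    """Build a condition that yields page_source once it contains `marker`."""
    def condition():
        source = driver.page_source
        return source if marker in source else None
    return condition


# Usage, assuming `driver` has already navigated to the results page:
# html = wait_for(source_containing(driver, "searchColRight"), timeout=15)
```

Waiting for a marker that only exists in the rendered page avoids both sleeping longer than needed and reading the source too early.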
answered Mar 30 '14 at 14:55 by Robbie Wareham
Try Dryscrape. This browser fully supports JavaScript-heavy pages; I hope it works for you.
answered Dec 11 '17 at 20:36 by Harry1992
this is a comment, not an answer
– Jack Flamp
Dec 11 '17 at 20:55
I ran into the same problem and finally solved it with desired_capabilities.
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy
from selenium.webdriver.common.proxy import ProxyType
proxy = Proxy(
    {
        'proxyType': ProxyType.MANUAL,
        'httpProxy': 'ip_or_host:port'
    }
)
desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()
proxy.add_to_capabilities(desired_capabilities)
driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)
driver.get('test_url')
print(driver.page_source)
answered May 24 '17 at 7:15 by Vida (edited Dec 3 '18 at 22:14 by strangeqargo)
it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1
– strangeqargo
Dec 3 '18 at 18:16
I had the same problem getting JavaScript-rendered source from the Internet, and I solved it using Victory's suggestion above.
*First, execute_script
from selenium import webdriver
from time import sleep
driver = webdriver.Chrome()
driver.get(urls)
sleep(1)  # wait one second so the page's JavaScript can run
innerHTML = driver.execute_script("return document.body.innerHTML")
# print(driver.page_source)
*Second, parse the HTML using BeautifulSoup (you can install it with pip, e.g. pip install beautifulsoup4 lxml)
import bs4  # BeautifulSoup
root = bs4.BeautifulSoup(innerHTML, "lxml")  # parse the HTML; the "lxml" parser must be installed too
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})  # find the values you need
*Third, print out the values you need
for span in viewcount:
    print(span.string)
*Full code
from time import sleep
import bs4
from selenium import webdriver
urls = "http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2"
driver = webdriver.PhantomJS()
## driver = webdriver.Chrome()
driver.get(urls)
sleep(1)  # wait before reading the DOM so the JavaScript has run
innerHTML = driver.execute_script("return document.body.innerHTML")
## print(driver.page_source)
root = bs4.BeautifulSoup(innerHTML, "lxml")  # requires the lxml package
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})
for span in viewcount:
    print(span.string)
driver.quit()
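If BeautifulSoup and lxml are not available, the extraction step can be approximated with the standard library. This is a swapped-in technique, not what the answer above uses: a small html.parser subclass that collects the text of every element of a given tag whose class attribute contains a given class name. The names TextByClass and text_by_class are made up for illustration.

```python
from html.parser import HTMLParser


class TextByClass(HTMLParser):
    """Collect the text inside every <tag> whose class list contains `cls`."""

    def __init__(self, tag, cls):
        super().__init__()
        self.tag = tag
        self.cls = cls
        self.depth = 0      # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:
            # Same tag nested inside a match: track nesting so the
            # matching element is closed by its own end tag.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.cls in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def text_by_class(html, tag, cls):
    """Return the stripped text of each matching element, in document order."""
    parser = TextByClass(tag, cls)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]
```

For example, text_by_class(innerHTML, "span", "short-view-count") plays the role of the find_all call above, matching on a single class name rather than the full class string.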
answered Jan 20 at 6:53 by kuo chang (edited Jan 20 at 11:29)