How to get the JavaScript-rendered HTML source using Selenium

I run a query on a web page and get a result URL. If I right-click and view the HTML source, I can see the HTML generated by JavaScript. If I simply use urllib, Python cannot get the JavaScript-generated content, so I looked at solutions that use Selenium. Here's my code:

from selenium import webdriver

url = 'http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2'

driver = webdriver.PhantomJS(executable_path=r'C:\python27\scripts\phantomjs.exe')
driver.get(url)
print(driver.page_source)

Output:

<html><head></head><body></body></html>

Obviously that's not right!

Here's the source I need, as shown by right-click > View Source (I want the INFORMATION part):

</script></div><div class="searchColRight"><div id="topActions" class="clearfix noPrint"><div id="breadcrumbs" class="left"><a title="Results Summary"
href="Default.aspx?    _act=VitalSearchR ...... <<INFORMATION I NEED>> ...
to view the entire record.</p></div><script xmlns:msxsl="urn:schemas-microsoft-com:xslt">

        jQuery(document).ready(function() {
            jQuery(".ancestry-information-tooltip").actooltip({
                href: "#AncestryInformationTooltip", orientation: "bottomleft"});
        });

So my question is: how do I get the information generated by JavaScript?

asked Mar 30 '14 at 2:19 by MacSanhe (edited Apr 1 '14 at 19:07)

What does the HTML you want look like on the page? You will want to use one of Selenium's find_element_by_* functions, but exactly how depends on the HTML itself.

– Victory Mar 30 '14 at 2:25

I mean everything. For example, you enter something in Google; on the results page, right-click and view source. That's the "everything" I want.

– MacSanhe Mar 30 '14 at 2:29

javascript python selenium

6 Answers

You will need to get the document via JavaScript; you can use Selenium's execute_script function:

from time import sleep  # this should go at the top of the file

sleep(5)
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print(html)

That will get everything inside the <html> tag.

answered Mar 30 '14 at 2:35 by Victory (edited Apr 1 '14 at 22:30)
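
As a hedged illustration of the execute_script approach above, the sketch below wraps the call in a small helper and adds a check for the empty skeleton that the asker reports. The names `rendered_html`, `looks_empty`, and the `FakeDriver` stand-in are assumptions made for this example and are not part of Selenium; the only Selenium call relied on is `driver.execute_script`, which the answer already uses.

```python
import re

# Matches markup that is nothing but an empty <head>/<body> pair,
# with or without the surrounding <html> tag.
EMPTY_SKELETON = re.compile(
    r"^\s*(<html[^>]*>)?\s*<head>\s*</head>\s*<body>\s*</body>\s*(</html>)?\s*$",
    re.IGNORECASE,
)


def rendered_html(driver):
    """Return the DOM as the browser currently sees it, serialized to a string."""
    return driver.execute_script("return document.documentElement.outerHTML")


def looks_empty(html):
    """True if the markup is only an empty <head>/<body> skeleton."""
    return bool(EMPTY_SKELETON.match(html))


class FakeDriver:
    """Hypothetical stand-in so the sketch runs without a browser."""

    def __init__(self, html):
        self._html = html

    def execute_script(self, script):
        # A real driver would run `script` in the page; this one returns canned markup.
        return self._html


html = rendered_html(FakeDriver("<html><head></head><body></body></html>"))
print(looks_empty(html))  # True
```

With a real session, pass the driver created by `webdriver.PhantomJS(...)` or `webdriver.Chrome()` in place of `FakeDriver`. If `looks_empty` returns True, the page most likely had not rendered, or had not loaded at all, when the DOM was read.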

Then I only get: <html><head></head><body></body></html>... how?

– MacSanhe Mar 30 '14 at 3:24

It looks like it works, but it only gives me <html><head></head><body></body></html>. I have revised my question; could you take another look, please? Thank you very much.

– MacSanhe Apr 1 '14 at 19:08

@MacSanhe see my edits. If the page is not fully loaded, you will not get all of the body content. Also try opening the page and running document.getElementsByTagName('html')[0].innerHTML in your browser's debugger console to see how much of the DOM comes through.

– Victory Apr 1 '14 at 22:32

Does anyone know if there's a way of getting the JavaScript of a page without using a browser, as Selenium does?

– user2540748 Nov 8 '14 at 0:01

Try loading PhantomJS with these parameters: browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']). It works for me.

– Jong Su Park Jun 26 '15 at 5:01


It's not necessary to use that workaround; you can use this instead:

driver = webdriver.PhantomJS()
driver.get('http://www.google.com/')
html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')

answered Apr 22 '17 at 22:06 by Darius Morawiec (edited May 2 '17 at 9:55)

I think you are getting the source code before the JavaScript has rendered the dynamic HTML.

As a first step, try putting a sleep of a few seconds between navigating to the page and getting the page source.

If this works, you can then switch to a different wait strategy.

answered Mar 30 '14 at 14:55 by Robbie Wareham
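
One possible "different wait strategy" is to poll until a condition holds rather than sleeping for a fixed time. The sketch below is a hand-rolled, standard-library stand-in for that idea (Selenium ships its own `WebDriverWait` helper for the same purpose). The function name `wait_until`, its parameters, and the example condition `body_has_content` are assumptions for illustration, not Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


def body_has_content(driver):
    """Hypothetical condition: the rendered <body> exists and is no longer empty."""
    return driver.execute_script(
        "return !!document.body && document.body.innerHTML.trim().length > 0"
    )
```

With a live driver, this could be used as `wait_until(lambda: body_has_content(driver), timeout=15)` before reading `driver.page_source`. Polling returns as soon as the content appears, so it is usually faster than a fixed sleep on quick pages and more tolerant on slow ones.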


Try dryscrape; it fully supports pages with heavy JavaScript. I hope it works for you.

answered Dec 11 '17 at 20:36 by Harry1992

This is a comment, not an answer.

– Jack Flamp Dec 11 '17 at 20:55

I ran into the same problem and finally solved it by using desired_capabilities.

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

proxy = Proxy(
    {
        'proxyType': ProxyType.MANUAL,
        'httpProxy': 'ip_or_host:port'
    }
)
desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()
proxy.add_to_capabilities(desired_capabilities)
driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)
driver.get('test_url')
print(driver.page_source)

answered May 24 '17 at 7:15 by Vida (edited Dec 3 '18 at 22:14 by strangeqargo)

It's an old and slightly outdated answer, but it gave me an idea for catching JavaScript with mitmproxy, so +1.

– strangeqargo Dec 3 '18 at 18:16

I had the same problem getting JavaScript-rendered source from the Internet, and I solved it using Victory's suggestion above.

First, call execute_script:

from time import sleep

driver = webdriver.Chrome()
driver.get(urls)
sleep(1)  # wait one second so the page's JavaScript can run before reading the DOM
innerHTML = driver.execute_script("return document.body.innerHTML")
# print(driver.page_source)

Second, parse the HTML using BeautifulSoup (you can install BeautifulSoup with pip):

import bs4  # import BeautifulSoup

root = bs4.BeautifulSoup(innerHTML, "lxml")  # parse HTML using BeautifulSoup
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})  # find the value you need

Third, print out the value you need:

for span in viewcount:
    print(span.string)

Full code:

from time import sleep

import bs4
import lxml  # parser backend used by BeautifulSoup below
from selenium import webdriver

urls = "http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2"

driver = webdriver.PhantomJS()
## driver = webdriver.Chrome()
driver.get(urls)
sleep(1)  # wait one second so the page's JavaScript can run before reading the DOM
innerHTML = driver.execute_script("return document.body.innerHTML")
## print(driver.page_source)

root = bs4.BeautifulSoup(innerHTML, "lxml")
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})

for span in viewcount:
    print(span.string)

driver.quit()

answered Jan 20 at 6:53 by kuo chang (edited Jan 20 at 11:29)
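
The parsing step can also be tried on its own, without a browser or third-party packages. The sketch below is a standard-library stand-in for the BeautifulSoup lookup above: it uses `html.parser.HTMLParser` rather than bs4 and only matches a `<span>` whose class attribute equals the given string exactly. The class `SpanTextCollector`, the helper `span_texts`, and the sample markup are made up for this illustration.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of every <span> whose class attribute equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0      # > 0 while inside a matching span
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1          # nested span inside a match
        elif dict(attrs).get("class") == self.wanted:
            self.depth = 1
            self.texts.append("")    # start a new text buffer for this match

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def span_texts(html, wanted):
    """Return the stripped text of each matching span, in document order."""
    parser = SpanTextCollector(wanted)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Made-up markup standing in for the innerHTML string returned by execute_script.
sample = '<div><span class="short-view-count">1.2K views</span><span class="other">x</span></div>'
print(span_texts(sample, "short-view-count"))  # ['1.2K views']
```

Testing the selector against a saved copy of the rendered HTML like this makes it easier to tell a parsing problem (wrong class name) apart from a rendering problem (the content never reached the DOM).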


6 Answers
6

active

oldest

votes

6 Answers
6

active

oldest

votes

You will need to get get the document via javascript you can use seleniums execute_script function

from time import sleep # this should go at the top of the file



sleep(5)

html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

print html

That will get everything inside of the <html> tag

edited Apr 1 '14 at 22:30

answered Mar 30 '14 at 2:35

Victory

4,5621640

1

Then I only get: <html><head></head><body></body></html>..... how .... ><

– MacSanhe
Mar 30 '14 at 3:24

It's looks work, but only gives me <html><head></head><body></body></html>, I redefined my question there, could you take a look at the question again please? THank you very much

– MacSanhe
Apr 1 '14 at 19:08

@MacSanhe see my edits, if the page is not fully loaded you will not get all the body content. Also try going to the page and running in your debugger console, document.getElementsByTagName('html')[0].innerHTML to see how much of the DOM comes through.

– Victory
Apr 1 '14 at 22:32

Does anyone know if theres a way of getting the javascript of a page without using a browser like with Selenium?

– user2540748
Nov 8 '14 at 0:01

1

try to load PhantomJS with this parameters. browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']) It works to me

– Jong Su Park
Jun 26 '15 at 5:01

|
show 4 more comments

You will need to get get the document via javascript you can use seleniums execute_script function

from time import sleep # this should go at the top of the file



sleep(5)

html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

print html

That will get everything inside of the <html> tag

edited Apr 1 '14 at 22:30

answered Mar 30 '14 at 2:35

Victory

4,5621640

1

Then I only get: <html><head></head><body></body></html>..... how .... ><

– MacSanhe
Mar 30 '14 at 3:24

It's looks work, but only gives me <html><head></head><body></body></html>, I redefined my question there, could you take a look at the question again please? THank you very much

– MacSanhe
Apr 1 '14 at 19:08

@MacSanhe see my edits, if the page is not fully loaded you will not get all the body content. Also try going to the page and running in your debugger console, document.getElementsByTagName('html')[0].innerHTML to see how much of the DOM comes through.

– Victory
Apr 1 '14 at 22:32

Does anyone know if theres a way of getting the javascript of a page without using a browser like with Selenium?

– user2540748
Nov 8 '14 at 0:01

1

try to load PhantomJS with this parameters. browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']) It works to me

– Jong Su Park
Jun 26 '15 at 5:01

|
show 4 more comments

You will need to get get the document via javascript you can use seleniums execute_script function

from time import sleep # this should go at the top of the file



sleep(5)

html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

print html

That will get everything inside of the <html> tag

edited Apr 1 '14 at 22:30

answered Mar 30 '14 at 2:35

Victory

4,5621640

You will need to get get the document via javascript you can use seleniums execute_script function

from time import sleep # this should go at the top of the file



sleep(5)

html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

print html

That will get everything inside of the <html> tag

edited Apr 1 '14 at 22:30

answered Mar 30 '14 at 2:35

Victory

4,5621640

edited Apr 1 '14 at 22:30

answered Mar 30 '14 at 2:35

Victory

4,5621640

answered Mar 30 '14 at 2:35

Victory

4,5621640

answered Mar 30 '14 at 2:35

Victory

4,5621640

1

Then I only get: <html><head></head><body></body></html>..... how .... ><

– MacSanhe
Mar 30 '14 at 3:24

It's looks work, but only gives me <html><head></head><body></body></html>, I redefined my question there, could you take a look at the question again please? THank you very much

– MacSanhe
Apr 1 '14 at 19:08

@MacSanhe see my edits, if the page is not fully loaded you will not get all the body content. Also try going to the page and running in your debugger console, document.getElementsByTagName('html')[0].innerHTML to see how much of the DOM comes through.

– Victory
Apr 1 '14 at 22:32

Does anyone know if theres a way of getting the javascript of a page without using a browser like with Selenium?

– user2540748
Nov 8 '14 at 0:01

1

try to load PhantomJS with this parameters. browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']) It works to me

– Jong Su Park
Jun 26 '15 at 5:01

|
show 4 more comments

1

Then I only get: <html><head></head><body></body></html>..... how .... ><

– MacSanhe
Mar 30 '14 at 3:24

It's looks work, but only gives me <html><head></head><body></body></html>, I redefined my question there, could you take a look at the question again please? THank you very much

– MacSanhe
Apr 1 '14 at 19:08

@MacSanhe see my edits, if the page is not fully loaded you will not get all the body content. Also try going to the page and running in your debugger console, document.getElementsByTagName('html')[0].innerHTML to see how much of the DOM comes through.

– Victory
Apr 1 '14 at 22:32

Does anyone know if theres a way of getting the javascript of a page without using a browser like with Selenium?

– user2540748
Nov 8 '14 at 0:01

1

try to load PhantomJS with this parameters. browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']) It works to me

– Jong Su Park
Jun 26 '15 at 5:01

Then I only get: <html><head></head><body></body></html>..... how .... ><

– MacSanhe
Mar 30 '14 at 3:24

It's looks work, but only gives me <html><head></head><body></body></html>, I redefined my question there, could you take a look at the question again please? THank you very much

– MacSanhe
Apr 1 '14 at 19:08

@MacSanhe see my edits, if the page is not fully loaded you will not get all the body content. Also try going to the page and running in your debugger console, document.getElementsByTagName('html')[0].innerHTML to see how much of the DOM comes through.

– Victory
Apr 1 '14 at 22:32

Does anyone know if theres a way of getting the javascript of a page without using a browser like with Selenium?

– user2540748
Nov 8 '14 at 0:01

try to load PhantomJS with this parameters. browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']) It works to me

– Jong Su Park
Jun 26 '15 at 5:01

|
show 4 more comments

It's not necessary to use that workaround, you can use instead:

driver = webdriver.PhantomJS()

driver.get('http://www.google.com/')

html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')

edited May 2 '17 at 9:55

answered Apr 22 '17 at 22:06

Darius Morawiec

3,12411934

add a comment |

It's not necessary to use that workaround, you can use instead:

driver = webdriver.PhantomJS()

driver.get('http://www.google.com/')

html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')

edited May 2 '17 at 9:55

answered Apr 22 '17 at 22:06

Darius Morawiec

3,12411934

add a comment |

It's not necessary to use that workaround, you can use instead:

driver = webdriver.PhantomJS()

driver.get('http://www.google.com/')

html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')

edited May 2 '17 at 9:55

answered Apr 22 '17 at 22:06

Darius Morawiec

3,12411934

It's not necessary to use that workaround, you can use instead:

driver = webdriver.PhantomJS()

driver.get('http://www.google.com/')

html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')

edited May 2 '17 at 9:55

answered Apr 22 '17 at 22:06

Darius Morawiec

3,12411934

edited May 2 '17 at 9:55

answered Apr 22 '17 at 22:06

Darius Morawiec

3,12411934

answered Apr 22 '17 at 22:06

Darius Morawiec

3,12411934

answered Apr 22 '17 at 22:06

Darius Morawiec

3,12411934

add a comment |

I am thinking that you are getting the source code before the JavaScript has rendered the dynamic HTML.

Initially try putting a few seconds sleep between the navigate and get page source.

If this works, then you can change to a different wait strategy.

answered Mar 30 '14 at 14:55

Robbie Wareham

3,21211334

add a comment |

I am thinking that you are getting the source code before the JavaScript has rendered the dynamic HTML.

Initially try putting a few seconds sleep between the navigate and get page source.

If this works, then you can change to a different wait strategy.

answered Mar 30 '14 at 14:55

Robbie Wareham

3,21211334

add a comment |

I am thinking that you are getting the source code before the JavaScript has rendered the dynamic HTML.

Initially try putting a few seconds sleep between the navigate and get page source.

If this works, then you can change to a different wait strategy.

answered Mar 30 '14 at 14:55

Robbie Wareham

3,21211334

I am thinking that you are getting the source code before the JavaScript has rendered the dynamic HTML.

Initially try putting a few seconds sleep between the navigate and get page source.

If this works, then you can change to a different wait strategy.

answered Mar 30 '14 at 14:55

Robbie Wareham

3,21211334

answered Mar 30 '14 at 14:55

Robbie Wareham

3,21211334

answered Mar 30 '14 at 14:55

Robbie Wareham

3,21211334

answered Mar 30 '14 at 14:55

Robbie Wareham

3,21211334

add a comment |

You try Dryscrape this browser is fully supported heavy js codes try it i hope it work for you

answered Dec 11 '17 at 20:36

Harry1992

10510

this is a comment, not an answer

– Jack Flamp
Dec 11 '17 at 20:55

add a comment |

You try Dryscrape this browser is fully supported heavy js codes try it i hope it work for you

answered Dec 11 '17 at 20:36

Harry1992

10510

this is a comment, not an answer

– Jack Flamp
Dec 11 '17 at 20:55

add a comment |

You try Dryscrape this browser is fully supported heavy js codes try it i hope it work for you

answered Dec 11 '17 at 20:36

Harry1992

10510

You try Dryscrape this browser is fully supported heavy js codes try it i hope it work for you

answered Dec 11 '17 at 20:36

Harry1992

10510

answered Dec 11 '17 at 20:36

Harry1992

10510

answered Dec 11 '17 at 20:36

Harry1992

10510

answered Dec 11 '17 at 20:36

Harry1992

10510

this is a comment, not an answer

– Jack Flamp
Dec 11 '17 at 20:55

add a comment |

this is a comment, not an answer

– Jack Flamp
Dec 11 '17 at 20:55

this is a comment, not an answer

– Jack Flamp
Dec 11 '17 at 20:55

add a comment |

I met the same problem and finally solved by desired_capabilities.

from selenium import webdriver

from selenium.webdriver.common.proxy import Proxy

from selenium.webdriver.common.proxy import ProxyType



proxy = Proxy(

     {

          'proxyType': ProxyType.MANUAL,

          'httpProxy': 'ip_or_host:port'

     }

)

desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()

proxy.add_to_capabilities(desired_capabilities)

driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)

driver.get('test_url')

print driver.page_source

edited Dec 3 '18 at 22:14

strangeqargo

1,0511818

answered May 24 '17 at 7:15

Vida

9514

it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1

– strangeqargo
Dec 3 '18 at 18:16

add a comment |

I met the same problem and finally solved by desired_capabilities.

from selenium import webdriver

from selenium.webdriver.common.proxy import Proxy

from selenium.webdriver.common.proxy import ProxyType



proxy = Proxy(

     {

          'proxyType': ProxyType.MANUAL,

          'httpProxy': 'ip_or_host:port'

     }

)

desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()

proxy.add_to_capabilities(desired_capabilities)

driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)

driver.get('test_url')

print driver.page_source

edited Dec 3 '18 at 22:14

strangeqargo

1,0511818

answered May 24 '17 at 7:15

Vida

9514

it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1

– strangeqargo
Dec 3 '18 at 18:16

add a comment |

I met the same problem and finally solved by desired_capabilities.

from selenium import webdriver

from selenium.webdriver.common.proxy import Proxy

from selenium.webdriver.common.proxy import ProxyType



proxy = Proxy(

     {

          'proxyType': ProxyType.MANUAL,

          'httpProxy': 'ip_or_host:port'

     }

)

desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()

proxy.add_to_capabilities(desired_capabilities)

driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)

driver.get('test_url')

print driver.page_source

edited Dec 3 '18 at 22:14

strangeqargo

1,0511818

answered May 24 '17 at 7:15

Vida

9514

I met the same problem and finally solved by desired_capabilities.

from selenium import webdriver

from selenium.webdriver.common.proxy import Proxy

from selenium.webdriver.common.proxy import ProxyType



proxy = Proxy(

     {

          'proxyType': ProxyType.MANUAL,

          'httpProxy': 'ip_or_host:port'

     }

)

desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()

proxy.add_to_capabilities(desired_capabilities)

driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)

driver.get('test_url')

print driver.page_source

edited Dec 3 '18 at 22:14

strangeqargo

1,0511818

answered May 24 '17 at 7:15

Vida

9514

edited Dec 3 '18 at 22:14

strangeqargo

1,0511818

edited Dec 3 '18 at 22:14

strangeqargo

1,0511818

edited Dec 3 '18 at 22:14

strangeqargo

1,0511818

answered May 24 '17 at 7:15

Vida

9514

answered May 24 '17 at 7:15

Vida

9514

answered May 24 '17 at 7:15

Vida

9514

it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1

– strangeqargo
Dec 3 '18 at 18:16

add a comment |

it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1

– strangeqargo
Dec 3 '18 at 18:16

it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1

– strangeqargo
Dec 3 '18 at 18:16

add a comment |

I have same problem about getting Javascript sourcecode from Internet, and I solved it using above Victory's suggestion.

*First, execute_script

driver=webdriver.Chrome()

driver.get(urls)

innerHTML = driver.execute_script("return document.body.innerHTML")

#print(driver.page_source)

*Second, parse html using beautifulsoup(You can Downloaded beautifulsoup by pip command)

 import bs4    #import beautifulsoup

 import re

 from time import sleep



 sleep(1)      #wait one second 

 root=bs4.BeautifulSoup(innerHTML,"lxml") #parse HTML using beautifulsoup

 viewcount=root.find_all("span",attrs={'class':'short-view-count style-scope yt-view-count-renderer'})   #find the value which you need.

*Third, print out the value you need

 for span in viewcount:

    print(span.string)

*Full code

from selenium import webdriver

import lxml



urls="http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2"



driver = webdriver.PhantomJS()





##driver=webdriver.Chrome()

driver.get(urls)

innerHTML = driver.execute_script("return document.body.innerHTML")

##print(driver.page_source)



import bs4

import re

from time import sleep



sleep(1)

root=bs4.BeautifulSoup(innerHTML,"lxml")

viewcount=root.find_all("span",attrs={'class':'short-view-count style-scope yt-view-count-renderer'})





for span in viewcount:

print(span.string)



driver.quit()

edited Jan 20 at 11:29

answered Jan 20 at 6:53

kuo chang

313

add a comment |

I have same problem about getting Javascript sourcecode from Internet, and I solved it using above Victory's suggestion.

*First, execute_script

driver=webdriver.Chrome()

driver.get(urls)

innerHTML = driver.execute_script("return document.body.innerHTML")

#print(driver.page_source)

*Second, parse html using beautifulsoup(You can Downloaded beautifulsoup by pip command)

 import bs4    #import beautifulsoup

 import re

 from time import sleep



 sleep(1)      #wait one second 

 root=bs4.BeautifulSoup(innerHTML,"lxml") #parse HTML using beautifulsoup

 viewcount=root.find_all("span",attrs={'class':'short-view-count style-scope yt-view-count-renderer'})   #find the value which you need.

*Third, print out the value you need

 for span in viewcount:

    print(span.string)

*Full code

from selenium import webdriver

import lxml



urls="http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2"



driver = webdriver.PhantomJS()





##driver=webdriver.Chrome()

driver.get(urls)

innerHTML = driver.execute_script("return document.body.innerHTML")

##print(driver.page_source)



import bs4

import re

from time import sleep



sleep(1)

root=bs4.BeautifulSoup(innerHTML,"lxml")

viewcount=root.find_all("span",attrs={'class':'short-view-count style-scope yt-view-count-renderer'})





for span in viewcount:

print(span.string)



driver.quit()

edited Jan 20 at 11:29

answered Jan 20 at 6:53

kuo chang

313

add a comment |

