How to get the JavaScript-rendered HTML source using Selenium












I run a query on a web page and get a result URL. If I right-click and view the HTML source, I can see the HTML generated by JS. If I simply use urllib, Python cannot get the JS-generated HTML. I have seen some solutions that use Selenium. Here's my code:



from selenium import webdriver

url = 'http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2'
# Raw string so the backslashes in the Windows path are kept literally
driver = webdriver.PhantomJS(executable_path=r'C:\python27\scripts\phantomjs.exe')
driver.get(url)
print(driver.page_source)

>>> <html><head></head><body></body></html>

Obviously that's not right!


Here's the source code I need, as shown by right-click > view source (I want the INFORMATION part):



</script></div><div class="searchColRight"><div id="topActions" class="clearfix 
noPrint"><div id="breadcrumbs" class="left"><a title="Results Summary"
href="Default.aspx? _act=VitalSearchR ...... <<INFORMATION I NEED>> ...
to view the entire record.</p></div><script xmlns:msxsl="urn:schemas-microsoft-com:xslt">

jQuery(document).ready(function() {
jQuery(".ancestry-information-tooltip").actooltip({
href: "#AncestryInformationTooltip", orientation: "bottomleft"});
});


So my question is: how do I get the information generated by JS?










  • What does the HTML you want look like on the page? You will want to use one of Selenium's find_element_by_* functions, but how exactly depends on the HTML itself.

    – Victory
    Mar 30 '14 at 2:25

  • I mean everything. For example, you enter something in Google. On the results page, right-click and view source. That's the "everything" I want.

    – MacSanhe
    Mar 30 '14 at 2:29
















javascript python selenium






edited Apr 1 '14 at 19:07







MacSanhe

















asked Mar 30 '14 at 2:19









MacSanhe

6 Answers

You will need to get the document via JavaScript; you can use Selenium's execute_script function:

from time import sleep  # this should go at the top of the file

sleep(5)  # crude fixed wait so the page's JavaScript has time to run
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print(html)

That will get everything inside of the <html> tag.
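
A minimal sketch of the same idea as a reusable helper, assuming `driver` is an already-created Selenium WebDriver that has finished `driver.get(url)`. The function name `rendered_html` is made up for illustration. `document.documentElement.outerHTML` is standard DOM and, unlike `innerHTML` on the `<html>` element, includes the `<html>` tag itself.

```python
def rendered_html(driver):
    """Return the live DOM, serialized by the browser, as one string.

    `driver` is assumed to be a Selenium WebDriver; any object that
    exposes execute_script(script) works. The script is plain DOM
    JavaScript, nothing Selenium-specific.
    """
    return driver.execute_script("return document.documentElement.outerHTML")
```

Calling `rendered_html(driver)` immediately after `driver.get(url)` can still return a nearly empty document if the page's scripts have not finished, so some form of waiting is still needed.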






  • Then I only get: <html><head></head><body></body></html>. How?

    – MacSanhe
    Mar 30 '14 at 3:24

  • It looks like it works, but it only gives me <html><head></head><body></body></html>. I have revised my question; could you take a look at it again, please? Thank you very much.

    – MacSanhe
    Apr 1 '14 at 19:08

  • @MacSanhe see my edits. If the page is not fully loaded, you will not get all the body content. Also try going to the page and running document.getElementsByTagName('html')[0].innerHTML in your debugger console to see how much of the DOM comes through.

    – Victory
    Apr 1 '14 at 22:32

  • Does anyone know if there's a way of getting the JavaScript of a page without using a browser, as you do with Selenium?

    – user2540748
    Nov 8 '14 at 0:01

  • Try loading PhantomJS with these parameters: browser = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true']). It works for me.

    – Jong Su Park
    Jun 26 '15 at 5:01



















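It's not necessary to use that workaround; you can instead use:

from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get('http://www.google.com/')
# Read the innerHTML of the root <html> element through the WebDriver API
html = driver.find_element_by_tag_name('html').get_attribute('innerHTML')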






I think you are getting the source code before the JavaScript has rendered the dynamic HTML.

Initially, try putting a few seconds of sleep between navigating to the page and getting the page source.

If this works, then you can change to a different wait strategy.
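
One possible "different wait strategy" is to poll the page until a condition holds instead of sleeping for a fixed time. The sketch below is a hand-rolled polling loop that only needs `execute_script`; the name `wait_for_dom` and its parameters are hypothetical, and the `searchColRight` selector in the docstring is taken from the HTML shown in the question. Selenium also ships `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose, which is probably the better choice in real code.

```python
import time


def wait_for_dom(driver, is_ready_js, timeout=10.0, poll=0.5):
    """Poll the page until the JavaScript statement `is_ready_js` is truthy.

    `driver` is assumed to be a Selenium WebDriver (anything exposing
    execute_script works). `is_ready_js` is a JavaScript statement such as
    "return document.querySelector('div.searchColRight') !== null".
    Returns True once the condition holds, False if `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script(is_ready_js):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)  # back off briefly before asking the page again
```

For example, `wait_for_dom(driver, "return document.querySelector('div.searchColRight') !== null")` followed by `driver.page_source` would read the source only once that element exists, assuming that element is what the page's JavaScript inserts.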







Try Dryscrape. This browser fully supports heavy JavaScript code. Try it; I hope it works for you.







      • this is a comment, not an answer

        – Jack Flamp
        Dec 11 '17 at 20:55




















I ran into the same problem and finally solved it with desired_capabilities.

from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType

# Route PhantomJS traffic through a manual HTTP proxy
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': 'ip_or_host:port',
})
desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()
proxy.add_to_capabilities(desired_capabilities)
driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)
driver.get('test_url')
print(driver.page_source)





      • it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1

        – strangeqargo
        Dec 3 '18 at 18:16



















I had the same problem getting JavaScript-rendered source code from the Internet, and I solved it using Victory's suggestion above.

First, execute_script:

from time import sleep

driver = webdriver.Chrome()
driver.get(urls)
sleep(1)  # wait one second so the page's JavaScript can run before reading the DOM
innerHTML = driver.execute_script("return document.body.innerHTML")
# print(driver.page_source)

Second, parse the HTML using BeautifulSoup (you can install it with pip):

import bs4  # BeautifulSoup

root = bs4.BeautifulSoup(innerHTML, "lxml")  # parse HTML using BeautifulSoup
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})  # find the values you need

Third, print out the values you need:

for span in viewcount:
    print(span.string)

Full code:

from time import sleep

import bs4  # BeautifulSoup; the "lxml" parser also needs the lxml package installed
from selenium import webdriver

urls = "http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2"

driver = webdriver.PhantomJS()
## driver = webdriver.Chrome()
driver.get(urls)
sleep(1)  # give the page's JavaScript time to run before reading the DOM
innerHTML = driver.execute_script("return document.body.innerHTML")
## print(driver.page_source)

root = bs4.BeautifulSoup(innerHTML, "lxml")
viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})

for span in viewcount:
    print(span.string)

driver.quit()
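
If installing BeautifulSoup and lxml is not an option, the parsing step can be approximated with the standard library's `html.parser`. This is a swapped-in technique, not what the answer above uses: the class `ClassTextCollector` and the function `texts_by_class` are hypothetical names, and the sketch matches on a single class token rather than on the full class string passed to `find_all` above.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text inside every <tag> element carrying `css_class`."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.results = []   # one text entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Count nested same-name tags too, so the end tags pair up correctly
        if self.depth or self.css_class in classes:
            self.depth += 1
            if self.depth == 1:
                self.results.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def texts_by_class(html, tag, css_class):
    """Return the stripped text of each <tag> whose class list has `css_class`."""
    parser = ClassTextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]
```

With the string returned by execute_script, a call such as `texts_by_class(innerHTML, "span", "short-view-count")` would play the role of the `find_all` loop, under the assumption that one class token is enough to identify the elements you want.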





Victory's answer (execute_script): answered Mar 30 '14 at 2:35, edited Apr 1 '14 at 22:30

Darius Morawiec's answer (find_element_by_tag_name): answered Apr 22 '17 at 22:06, edited May 2 '17 at 9:55

Robbie Wareham's answer (wait before reading the page source): answered Mar 30 '14 at 14:55

Harry1992's answer (Dryscrape): answered Dec 11 '17 at 20:36

                        10510




                        10510













                        • this is a comment, not an answer

                          – Jack Flamp
                          Dec 11 '17 at 20:55



















                        • this is a comment, not an answer

                          – Jack Flamp
                          Dec 11 '17 at 20:55

















                        this is a comment, not an answer

                        – Jack Flamp
                        Dec 11 '17 at 20:55





                        this is a comment, not an answer

                        – Jack Flamp
                        Dec 11 '17 at 20:55











                        0














                        I met the same problem and finally solved by desired_capabilities.



                        from selenium import webdriver
                        from selenium.webdriver.common.proxy import Proxy
                        from selenium.webdriver.common.proxy import ProxyType

                        proxy = Proxy(
                        {
                        'proxyType': ProxyType.MANUAL,
                        'httpProxy': 'ip_or_host:port'
                        }
                        )
                        desired_capabilities = webdriver.DesiredCapabilities.PHANTOMJS.copy()
                        proxy.add_to_capabilities(desired_capabilities)
                        driver = webdriver.PhantomJS(desired_capabilities=desired_capabilities)
                        driver.get('test_url')
                        print driver.page_source





                        share|improve this answer


























                        • it's an old and slightly outdated answer but it gave me an idea on catching javascript with mitmproxy, so +1

                          – strangeqargo
                          Dec 3 '18 at 18:16
























                        edited Dec 3 '18 at 22:14 by strangeqargo

                        answered May 24 '17 at 7:15 by Vida
























                        0














                        I had the same problem getting the JavaScript-rendered HTML of a page, and I solved it using Victory's suggestion above.



                        *First, fetch the rendered HTML with execute_script



                        from time import sleep
                        from selenium import webdriver

                        driver = webdriver.Chrome()
                        driver.get(urls)
                        sleep(1)  # wait one second so the page's scripts can run before reading the DOM
                        innerHTML = driver.execute_script("return document.body.innerHTML")
                        # print(driver.page_source)


                        *Second, parse the HTML using BeautifulSoup (you can install it with pip)



                        import bs4  # BeautifulSoup

                        # Parse the HTML; the "lxml" parser requires the lxml package to be installed.
                        root = bs4.BeautifulSoup(innerHTML, "lxml")
                        # Find the elements you need.
                        viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})


                        *Third, print out the values you need



                        for span in viewcount:
                            print(span.string)
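
The parse-and-print steps can be tried without a browser. The following is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser`, so it runs without extra installs; the class `SpanTextCollector` and the sample HTML string are made up for illustration. Note that it matches a span when all the listed classes are present, which may not be exactly how `find_all` treats a multi-class string.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of every <span> carrying all of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes.split())
        self.depth = 0      # greater than 0 while inside a matching span
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # nested span inside a match
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


# Stand-in for the innerHTML string returned by execute_script.
sample_html = (
    '<div><span class="short-view-count style-scope">1.2M views</span>'
    '<span class="other">ignored</span></div>'
)
collector = SpanTextCollector("short-view-count style-scope")
collector.feed(sample_html)
for text in collector.results:
    print(text)
```

For real pages BeautifulSoup is usually the more convenient choice; the point here is only that the extraction step works on any HTML string, however it was obtained.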


                        *Full code



                        from time import sleep

                        import bs4
                        from selenium import webdriver

                        urls = "http://www.archives.com/member/Default.aspx?_act=VitalSearchResult&lastName=Smith&state=UT&country=US&deathYear=2004&deathYearSpan=10&location=UT&activityID=9b79d578-b2a7-4665-9021-b104999cf031&RecordType=2"

                        driver = webdriver.PhantomJS()
                        # driver = webdriver.Chrome()
                        driver.get(urls)
                        sleep(1)  # give the page's scripts a moment to run before reading the DOM
                        innerHTML = driver.execute_script("return document.body.innerHTML")
                        # print(driver.page_source)

                        # The "lxml" parser requires the lxml package to be installed.
                        root = bs4.BeautifulSoup(innerHTML, "lxml")
                        viewcount = root.find_all("span", attrs={'class': 'short-view-count style-scope yt-view-count-renderer'})

                        for span in viewcount:
                            print(span.string)

                        driver.quit()
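
A fixed `sleep(1)` may be too short on a slow page and wasteful on a fast one. A hedged sketch of the alternative, polling until the rendered HTML is actually there, is shown below. `wait_until` and `FakeDriver` are hypothetical names invented for this example; the fake driver only imitates `execute_script` so the snippet runs without a browser. Selenium itself provides `WebDriverWait` for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


class FakeDriver:
    """Stand-in for a WebDriver: the body is empty for the first two reads."""

    def __init__(self):
        self.reads = 0

    def execute_script(self, script):
        self.reads += 1
        return "" if self.reads < 3 else "<div id='results'>ready</div>"


driver = FakeDriver()
inner_html = wait_until(
    lambda: driver.execute_script("return document.body.innerHTML"),
    timeout=5.0,
    interval=0.01,
)
print(inner_html)
```

With a real driver the same lambda can be passed unchanged; the condition could also check for a specific element instead of any non-empty body.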













                            edited Jan 20 at 11:29

                            answered Jan 20 at 6:53 by kuo chang





























