How to get plain text from Wiki API subcategory












I am not able to get the plain text of a single section (a "subcategory") of an article via the Wikipedia API.



I'm using



https://en.wikipedia.org/w/api.php?action=query&titles=Submarine&section=4&prop=extracts&explaintext&exsectionformat=plain&redirects


to fetch the abstract of an article from Wikipedia. Now I'd like to get only the content of, say, the 4th section. I tried simply adding:



&section=4


No matter what I try, this parameter just seems to be rejected.
I am, however, able to get the content of a section using:



https://en.wikipedia.org/w/api.php?action=parse&page=Submarine&prop=wikitext&explaintext&exsectionformat=plain&&format=json&origin=*&action=parse&section=4


But then I can't get the text without the wiki markup.



Most likely the solution is a combination of those two approaches but I just can't wrap my head around it…



These docs here may help.



Any help is highly appreciated!










  • There's no such thing. Your best bet is probably to fetch the HTML and turn it into text.

    – Tgr
    Jan 21 at 5:08











  • Hey @tgr, thanks a lot for your reply. Would you mind explaining a bit further how to do that?

    – Moritz
    Jan 21 at 13:20











  • Load it into some DOM library and take the textContent of the top node. This answer has an example for browser-based JavaScript.

    – Tgr
    Jan 21 at 22:59











  • Thanks, much appreciated. I'll dig into it!

    – Moritz
    Jan 22 at 9:52
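
For anyone following the comments above, here is a minimal sketch of what "fetch the HTML and turn it into text" could look like in browser-based JavaScript. It assumes action=parse with prop=text returns the rendered HTML of the requested section; the function names (buildSectionHtmlUrl, pickHtml, htmlToText, fetchSectionText) are made up for illustration and are not part of any API. DOMParser is only available in browsers, so under Node you would need a DOM library instead.

```javascript
// Build an action=parse URL that asks for one section as rendered HTML
// (prop=text) instead of wikitext.
function buildSectionHtmlUrl(page, section) {
  const params = new URLSearchParams({
    action: "parse",
    page: page,
    section: String(section),
    prop: "text",
    format: "json",
    origin: "*", // allows anonymous cross-origin requests from a browser
  });
  return "https://en.wikipedia.org/w/api.php?" + params.toString();
}

// The HTML sits at parse.text["*"] in the default JSON format and is a
// plain string at parse.text when formatversion=2 is requested.
function pickHtml(data) {
  const text = data.parse.text;
  return typeof text === "string" ? text : text["*"];
}

// Browser-only: parse the HTML string and read the text of the top node.
function htmlToText(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.textContent;
}

// Fetch one section of a page and return its text content.
async function fetchSectionText(page, section) {
  const response = await fetch(buildSectionHtmlUrl(page, section));
  const data = await response.json();
  return htmlToText(pickHtml(data));
}
```

In a browser console this could be tried with fetchSectionText("Submarine", 4).then(console.log). The result is simply all text inside the rendered section, so it will likely still contain things such as edit-link labels and citation markers that you may want to strip afterwards.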
















javascript json api wikipedia






asked Jan 19 at 23:35 by Moritz













1 Answer














I found a solution!



After struggling with the answer @Tgr gave me, I stumbled upon a mind-blowing JS lib:
wtf_wikipedia.
I hope Wiki decides to give this man a medal!
Fetching, for example, the text of the 8th section is as simple as this:



doc.sections(8).text()





answered Jan 25 at 23:21 by Moritz





























