Reading a pandas data frame having unequal columns in observations



























I am trying to read this small data file:
Link - https://drive.google.com/open?id=1nAS5mpxQLVQn9s_aAKvJt8tWPrP_DUiJ



I am using this code:



df = pd.read_table('/Data/123451_date.csv', sep=';', index_col=0, engine='python', error_bad_lines=False)


The file uses ';' as the separator, and some rows (observations) are missing values for some columns.



How can I read it properly? The DataFrame I currently get is not loaded correctly.

































  • @jezrael can you please look into it

    – Shivam_hbti
    Jan 19 at 14:30











  • I tested it and found the problem: the first 33 lines have odd values at the end of each line. I have no idea why.

    – jezrael
    Jan 19 at 15:09











  • What should I do then? Are there any changes I can make to the pandas reading code?

    – Shivam_hbti
    Jan 19 at 15:28
















python pandas csv
















asked Jan 19 at 14:08









Shivam_hbti

























1 Answer














It looks like your data file has some garbage in it. Specifically, rows 1-33 (inclusive) contain additional, unnecessary (non-GPS) information. You can either fix the file by manually removing that information, or use the following snippet to skip the affected rows:



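from pandas import read_table

# Keep the header (line 0), skip the 33 data rows after it that carry extra fields,
# then drop the empty column produced by the trailing ';' on each line.
data = read_table('34_2017-02-06.gpx.csv', sep=';', skiprows=list(range(1, 34))).drop("Unnamed: 28", axis=1)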


The drop("Unnamed: 28", axis=1) call removes an extra column that is probably created because each row in your file ends with a ';': pandas reads the empty field after the final separator as one more column.
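The trailing-separator effect can be reproduced on a tiny in-memory sample. This is only a sketch: the three-column sample text and the names sample, raw and clean are made up for illustration and are not taken from the original file, and read_csv with sep=';' is used in place of read_table.

```python
import io

import pandas as pd

# Hypothetical three-column sample; every line ends with a trailing ';'.
sample = "a;b;c;\n1;2;3;\n4;5;6;\n"

# The empty field after the last ';' is parsed as a fourth, unnamed column.
raw = pd.read_csv(io.StringIO(sample), sep=";")
print(list(raw.columns))

# Dropping that column leaves only the real data.
clean = raw.drop(columns="Unnamed: 3")
print(clean)
```

The extra column is named after its position ("Unnamed: 3" here, "Unnamed: 28" in the file from the question) and contains only NaN.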



The result of print(data.head()) is then as follows:



   index  cumdist    ele  ...  esttotalpower        lat       lon
0     49      340  -34.8  ...              9  52.077362  5.114530
1     51      350  -34.8  ...             17  52.077468  5.114543
2     52      360  -35.0  ...            -54  52.077521  5.114551
3     53      370  -35.0  ...           -173  52.077603  5.114505
4     54      380  -34.8  ...            335  52.077677  5.114387

[5 rows x 28 columns]


To make the role of the drop call clearer, here is the result without it (note the extra last column):



   index  cumdist    ele  ...        lat       lon  Unnamed: 28
0     49      340  -34.8  ...  52.077362  5.114530          NaN
1     51      350  -34.8  ...  52.077468  5.114543          NaN
2     52      360  -35.0  ...  52.077521  5.114551          NaN
3     53      370  -35.0  ...  52.077603  5.114505          NaN
4     54      380  -34.8  ...  52.077677  5.114387          NaN

[5 rows x 29 columns]
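The snippet above hard-codes rows 1-33. If the affected rows are not known in advance, one possible approach is to count the fields on each line and skip the lines that do not match the header. The following is a minimal sketch under stated assumptions: the sample data and the helper find_irregular_rows are hypothetical, the file is assumed to contain no blank lines or quoted multi-line fields, and read_csv with sep=';' is used in place of read_table.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: a header plus four data rows; rows 1 and 2 carry extra fields.
sample = (
    "index;lat;lon;\n"
    "1;52.1;5.1;x;y;\n"
    "2;52.2;5.2;x;y;\n"
    "3;52.3;5.3;\n"
    "4;52.4;5.4;\n"
)


def find_irregular_rows(text, sep=";"):
    """Return 0-based line numbers whose field count differs from the header's."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    expected = len(next(reader))  # field count of the header line
    return [i for i, fields in enumerate(reader, start=1) if len(fields) != expected]


bad_rows = find_irregular_rows(sample)

# Skip the irregular lines, then drop any column created by the trailing ';'.
df = pd.read_csv(io.StringIO(sample), sep=";", skiprows=bad_rows)
df = df.drop(columns=[c for c in df.columns if c.startswith("Unnamed")])

print(bad_rows)
print(df)
```

For a real file, the same idea applies by reading the file's text first and passing the file path to read_csv. Note that this discards the irregular rows entirely, just as the hard-coded skiprows does.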













edited Jan 19 at 15:44

answered Jan 19 at 15:35

Kacper Floriański





























