Reading a pandas data frame having unequal columns in observations

I am trying to read this small data file,
Link - https://drive.google.com/open?id=1nAS5mpxQLVQn9s_aAKvJt8tWPrP_DUiJ

I am using the code -

df = pd.read_table('/Data/123451_date.csv', sep=';', index_col=0,  engine='python', error_bad_lines=False)

It has ';' as a seprator, and values are missing in the file for some columns values in some observations (or rows).

How can I read it properly. I see the current dataframe, which is not loaded properly.
enter image description here

enter image description here

asked Jan 19 at 14:08

Shivam_hbti

496

@jezrael can you please look into it

– Shivam_hbti
Jan 19 at 14:30

I test it and find problem - first 33 lines have weird values to end each line, no idea what happens

– jezrael
Jan 19 at 15:09

What should I then, any changes I can make in pandas reading code?

– Shivam_hbti
Jan 19 at 15:28

add a comment |

I am trying to read this small data file,
Link - https://drive.google.com/open?id=1nAS5mpxQLVQn9s_aAKvJt8tWPrP_DUiJ

I am using the code -

df = pd.read_table('/Data/123451_date.csv', sep=';', index_col=0,  engine='python', error_bad_lines=False)

It has ';' as a seprator, and values are missing in the file for some columns values in some observations (or rows).

How can I read it properly. I see the current dataframe, which is not loaded properly.
enter image description here

enter image description here

asked Jan 19 at 14:08

Shivam_hbti

496

@jezrael can you please look into it

– Shivam_hbti
Jan 19 at 14:30

I test it and find problem - first 33 lines have weird values to end each line, no idea what happens

– jezrael
Jan 19 at 15:09

What should I then, any changes I can make in pandas reading code?

– Shivam_hbti
Jan 19 at 15:28

add a comment |

I am trying to read this small data file,
Link - https://drive.google.com/open?id=1nAS5mpxQLVQn9s_aAKvJt8tWPrP_DUiJ

I am using the code -

df = pd.read_table('/Data/123451_date.csv', sep=';', index_col=0,  engine='python', error_bad_lines=False)

It has ';' as a seprator, and values are missing in the file for some columns values in some observations (or rows).

How can I read it properly. I see the current dataframe, which is not loaded properly.
enter image description here

enter image description here

asked Jan 19 at 14:08

Shivam_hbti

496

I am trying to read this small data file,
Link - https://drive.google.com/open?id=1nAS5mpxQLVQn9s_aAKvJt8tWPrP_DUiJ

I am using the code -

df = pd.read_table('/Data/123451_date.csv', sep=';', index_col=0,  engine='python', error_bad_lines=False)

It has ';' as a seprator, and values are missing in the file for some columns values in some observations (or rows).

How can I read it properly. I see the current dataframe, which is not loaded properly.
enter image description here

enter image description here

python pandas csv

asked Jan 19 at 14:08

Shivam_hbti

496

asked Jan 19 at 14:08

Shivam_hbti

496

asked Jan 19 at 14:08

Shivam_hbti

496

asked Jan 19 at 14:08

Shivam_hbti

496

asked Jan 19 at 14:08

Shivam_hbti

496

@jezrael can you please look into it

– Shivam_hbti
Jan 19 at 14:30

I test it and find problem - first 33 lines have weird values to end each line, no idea what happens

– jezrael
Jan 19 at 15:09

What should I then, any changes I can make in pandas reading code?

– Shivam_hbti
Jan 19 at 15:28

add a comment |

@jezrael can you please look into it

– Shivam_hbti
Jan 19 at 14:30

I test it and find problem - first 33 lines have weird values to end each line, no idea what happens

– jezrael
Jan 19 at 15:09

What should I then, any changes I can make in pandas reading code?

– Shivam_hbti
Jan 19 at 15:28

@jezrael can you please look into it

– Shivam_hbti
Jan 19 at 14:30

I test it and find problem - first 33 lines have weird values to end each line, no idea what happens

– jezrael
Jan 19 at 15:09

What should I then, any changes I can make in pandas reading code?

– Shivam_hbti
Jan 19 at 15:28

add a comment |

1 Answer
1

active

oldest

votes

It looks like the data you use has some garbage in it. Precisely, rows 1-33 (inclusive) have additional, unnecessary (non-GPS) information included. You can either fix the database by manually removing the unneeded information from the datasheet, or use following code snippet to skip the rows that include it:

from pandas import read_table



data = read_table('34_2017-02-06.gpx.csv', sep=';', skiprows=list(range(1, 34)).drop("Unnamed: 28", axis=1)

The drop("Unnamed: 28", axis=1) is simply there to remove an additional column that is created probably due to each row in your datasheet ending with a ; (because it reads the empty space at the end of each line as data).

The result of print(data.head()) is then as follows:

   index  cumdist   ele    ...     esttotalpower        lat       lon

0     49      340 -34.8    ...                 9  52.077362  5.114530

1     51      350 -34.8    ...                17  52.077468  5.114543

2     52      360 -35.0    ...               -54  52.077521  5.114551

3     53      370 -35.0    ...              -173  52.077603  5.114505

4     54      380 -34.8    ...               335  52.077677  5.114387



[5 rows x 28 columns]

To explain the role of the drop command even more, here is what would happen without it (notice the last, weird column)

   index  cumdist   ele     ...             lat       lon  Unnamed: 28

0     49      340 -34.8     ...       52.077362  5.114530          NaN

1     51      350 -34.8     ...       52.077468  5.114543          NaN

2     52      360 -35.0     ...       52.077521  5.114551          NaN

3     53      370 -35.0     ...       52.077603  5.114505          NaN

4     54      380 -34.8     ...       52.077677  5.114387          NaN



[5 rows x 29 columns]

edited Jan 19 at 15:44

answered Jan 19 at 15:35

Kacper Floriański

45619

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54267930%2freading-a-pandas-data-frame-having-unequal-columns-in-observations%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

from pandas import read_table



data = read_table('34_2017-02-06.gpx.csv', sep=';', skiprows=list(range(1, 34)).drop("Unnamed: 28", axis=1)

The result of print(data.head()) is then as follows:

   index  cumdist   ele    ...     esttotalpower        lat       lon

0     49      340 -34.8    ...                 9  52.077362  5.114530

1     51      350 -34.8    ...                17  52.077468  5.114543

2     52      360 -35.0    ...               -54  52.077521  5.114551

3     53      370 -35.0    ...              -173  52.077603  5.114505

4     54      380 -34.8    ...               335  52.077677  5.114387



[5 rows x 28 columns]

To explain the role of the drop command even more, here is what would happen without it (notice the last, weird column)

   index  cumdist   ele     ...             lat       lon  Unnamed: 28

0     49      340 -34.8     ...       52.077362  5.114530          NaN

1     51      350 -34.8     ...       52.077468  5.114543          NaN

2     52      360 -35.0     ...       52.077521  5.114551          NaN

3     53      370 -35.0     ...       52.077603  5.114505          NaN

4     54      380 -34.8     ...       52.077677  5.114387          NaN



[5 rows x 29 columns]

edited Jan 19 at 15:44

answered Jan 19 at 15:35

Kacper Floriański

45619

add a comment |

from pandas import read_table



data = read_table('34_2017-02-06.gpx.csv', sep=';', skiprows=list(range(1, 34)).drop("Unnamed: 28", axis=1)

The result of print(data.head()) is then as follows:

   index  cumdist   ele    ...     esttotalpower        lat       lon

0     49      340 -34.8    ...                 9  52.077362  5.114530

1     51      350 -34.8    ...                17  52.077468  5.114543

2     52      360 -35.0    ...               -54  52.077521  5.114551

3     53      370 -35.0    ...              -173  52.077603  5.114505

4     54      380 -34.8    ...               335  52.077677  5.114387



[5 rows x 28 columns]

To explain the role of the drop command even more, here is what would happen without it (notice the last, weird column)

   index  cumdist   ele     ...             lat       lon  Unnamed: 28

0     49      340 -34.8     ...       52.077362  5.114530          NaN

1     51      350 -34.8     ...       52.077468  5.114543          NaN

2     52      360 -35.0     ...       52.077521  5.114551          NaN

3     53      370 -35.0     ...       52.077603  5.114505          NaN

4     54      380 -34.8     ...       52.077677  5.114387          NaN



[5 rows x 29 columns]

edited Jan 19 at 15:44

answered Jan 19 at 15:35

Kacper Floriański

45619

add a comment |

from pandas import read_table



data = read_table('34_2017-02-06.gpx.csv', sep=';', skiprows=list(range(1, 34)).drop("Unnamed: 28", axis=1)

The result of print(data.head()) is then as follows:

   index  cumdist   ele    ...     esttotalpower        lat       lon

0     49      340 -34.8    ...                 9  52.077362  5.114530

1     51      350 -34.8    ...                17  52.077468  5.114543

2     52      360 -35.0    ...               -54  52.077521  5.114551

3     53      370 -35.0    ...              -173  52.077603  5.114505

4     54      380 -34.8    ...               335  52.077677  5.114387



[5 rows x 28 columns]

To explain the role of the drop command even more, here is what would happen without it (notice the last, weird column)

   index  cumdist   ele     ...             lat       lon  Unnamed: 28

0     49      340 -34.8     ...       52.077362  5.114530          NaN

1     51      350 -34.8     ...       52.077468  5.114543          NaN

2     52      360 -35.0     ...       52.077521  5.114551          NaN

3     53      370 -35.0     ...       52.077603  5.114505          NaN

4     54      380 -34.8     ...       52.077677  5.114387          NaN



[5 rows x 29 columns]

edited Jan 19 at 15:44

answered Jan 19 at 15:35

Kacper Floriański

45619

from pandas import read_table



data = read_table('34_2017-02-06.gpx.csv', sep=';', skiprows=list(range(1, 34)).drop("Unnamed: 28", axis=1)

The result of print(data.head()) is then as follows:

   index  cumdist   ele    ...     esttotalpower        lat       lon

0     49      340 -34.8    ...                 9  52.077362  5.114530

1     51      350 -34.8    ...                17  52.077468  5.114543

2     52      360 -35.0    ...               -54  52.077521  5.114551

3     53      370 -35.0    ...              -173  52.077603  5.114505

4     54      380 -34.8    ...               335  52.077677  5.114387



[5 rows x 28 columns]

To explain the role of the drop command even more, here is what would happen without it (notice the last, weird column)

   index  cumdist   ele     ...             lat       lon  Unnamed: 28

0     49      340 -34.8     ...       52.077362  5.114530          NaN

1     51      350 -34.8     ...       52.077468  5.114543          NaN

2     52      360 -35.0     ...       52.077521  5.114551          NaN

3     53      370 -35.0     ...       52.077603  5.114505          NaN

4     54      380 -34.8     ...       52.077677  5.114387          NaN



[5 rows x 29 columns]

edited Jan 19 at 15:44

answered Jan 19 at 15:35

Kacper Floriański

45619

edited Jan 19 at 15:44

answered Jan 19 at 15:35

Kacper Floriański

45619

answered Jan 19 at 15:35

Kacper Floriański

45619

answered Jan 19 at 15:35

Kacper Floriański

45619

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Brtdku