Need to reshape my dataframe (lots of column names)
I am trying to reshape a dataframe in pandas. I currently have one id variable, and the rest of the variables are named in the format "variableyear", where year is between 2000 and 2016. I want to make a new variable, year (extracted from each variableyear column name), and one column per variable. Here is an example dataset that looks similar to my real dataset (my actual data is confidential):
| name | income2015 | income2016 | children2015 | children2016 | education2015 | education2016
---|---------|------------|------------|--------------|--------------|---------------|---------------
0 | John | 1 | 4 | 7 | 10 | 13 | 16
1 | Phillip | 2 | 5 | 8 | 11 | 14 | 17
2 | Carl | 3 | 6 | 9 | 12 | 15 | 18
This is what I want:
| name | year | income | children | education
---|---------|------|--------|----------|-----------
0 | John | 2015 | 1 | 7 | 13
1 | Phillip | 2015 | 2 | 8 | 14
2 | Carl | 2015 | 3 | 9 | 15
3 | John | 2016 | 4 | 10 | 16
4 | Phillip | 2016 | 5 | 11 | 17
5 | Carl | 2016 | 6 | 12 | 18
I have already tried the following:
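import pandas as pd
df2 = pd.melt(df, id_vars=['name'], value_vars=df.columns[1:])  # long form: name, variable, value
df2['year'] = df2['variable'].map(lambda x: x[-4:])  # last 4 characters are the year
df2['variable'] = df2['variable'].map(lambda x: x[:-4])  # strip the year suffix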
which gives me this:
| name | variable | value | year
---|---------|-----------|-------|------
0 | John | income | 1 | 2015
1 | Phillip | income | 2 | 2015
2 | Carl | income | 3 | 2015
3 | John | income | 4 | 2016
4 | Phillip | income | 5 | 2016
5 | Carl | income | 6 | 2016
6 | John | children | 7 | 2015
7 | Phillip | children | 8 | 2015
8 | Carl | children | 9 | 2015
9 | John | children | 10 | 2016
10 | Phillip | children | 11 | 2016
11 | Carl | children | 12 | 2016
12 | John | education | 13 | 2015
13 | Phillip | education | 14 | 2015
14 | Carl | education | 15 | 2015
15 | John | education | 16 | 2016
16 | Phillip | education | 17 | 2016
17 | Carl | education | 18 | 2016
But now I have to reshape again... Is there an easier way to do this?
Also, here is my df in dictionary format:
{'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12}, 'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18}, 'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6}, 'name': {0: 'John', 1: 'Phillip', 2: 'Carl'}}
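For reference, the melt attempt above can be finished with one more reshape. The sketch below (assuming a reasonably recent pandas) rebuilds df from the dictionary and then spreads variable back out into columns with set_index plus unstack, which is used here in place of pivot because every (name, year, variable) combination is unique. One caveat worth noting: a DataFrame built from that dictionary has name as its last column, so df.columns[1:] would pick up name and miss children2015; the sketch therefore selects the value columns by name instead. The names value_cols and result are illustrative only.

```python
import pandas as pd

# Sample data, rebuilt from the dictionary in the question.
# Built this way, 'name' ends up as the LAST column, not the first.
df = pd.DataFrame({
    'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12},
    'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18},
    'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6},
    'name': {0: 'John', 1: 'Phillip', 2: 'Carl'},
})

# Step 1 (the attempt from the question): melt to long form. Value columns
# are picked by name so the result does not depend on where 'name' sits.
value_cols = [c for c in df.columns if c != 'name']
df2 = pd.melt(df, id_vars=['name'], value_vars=value_cols)
df2['year'] = df2['variable'].str[-4:].astype(int)  # trailing 4 chars: the year
df2['variable'] = df2['variable'].str[:-4]          # the rest: the variable name

# Step 2 (the "reshape again"): move each variable back into its own column.
# unstack raises if a (name, year, variable) combination appears twice.
result = (
    df2.set_index(['name', 'year', 'variable'])['value']
    .unstack('variable')
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'variable' label on the columns
)
print(result)
```

The rows may come out in a different order than the desired table (for example sorted by name); add a sort_values call if a specific order matters.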
python python-3.x pandas dataframe reshape
asked Jan 19 at 20:49
Jimbo
1 Answer
You can actually use pd.wide_to_long for exactly this. For the stubnames argument, you can pass the set of variable names in your df, i.e. every column except name with its last 4 characters (the year) dropped: set([x[:-4] for x in df.columns[1:]]).
pd.wide_to_long(df, stubnames=set([x[:-4] for x in df.columns[1:]]), i=['name'], j='year').reset_index()
Output:
name year education income children
0 John 2015 13 1 7
1 Phillip 2015 14 2 8
2 Carl 2015 15 3 9
3 John 2016 16 4 10
4 Phillip 2016 17 5 11
5 Carl 2016 18 6 12
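A self-contained version of the wide_to_long approach is sketched below, under two assumptions that go slightly beyond the answer: the stub names are derived with a regular expression (anything before a trailing 4-digit year) instead of df.columns[1:], so it still works when name is not the first column (as happens when df is built from the dictionary in the question), and they are sorted so the output column order is deterministic. Iterating over a set gives an arbitrary order, which is why the output above lists education before income and children. The variable names stubs and long_df are illustrative only.

```python
import pandas as pd

# Sample data from the question's dictionary ('name' is the last column here).
df = pd.DataFrame({
    'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12},
    'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18},
    'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6},
    'name': {0: 'John', 1: 'Phillip', 2: 'Carl'},
})

# Stub = everything before a trailing 4-digit year. Columns without a year
# suffix (here only 'name') do not match, come back as NaN, and are dropped.
stubs = sorted(df.columns.str.extract(r'^(.*?)(\d{4})$')[0].dropna().unique())

# wide_to_long's default suffix pattern is r'\d+', so 'year' becomes numeric.
long_df = pd.wide_to_long(df, stubnames=stubs, i='name', j='year').reset_index()
print(long_df)
```

If a fixed row order matters (e.g. grouped by year as in the desired output), sort long_df explicitly rather than relying on the order wide_to_long happens to return.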
answered Jan 19 at 20:52
Joe Patten