Need to reshape my dataframe (lots of column names)












0















I am trying to reshape a dataframe in pandas. I currently have one id variable, and the rest of the variables are in the following format: "variableyear", where year is between 2000 and 2016. I want to to make a new variable year (which extracts the year from my variableyear variable) and creates a column named variable. Here is an example dataset that looks similar to my real dataset (as my data is confidential):




| name | income2015 | income2016 | children2015 | children2016 | education2015 | education2016
---|---------|------------|------------|--------------|--------------|---------------|---------------
0 | John | 1 | 4 | 7 | 10 | 13 | 16
1 | Phillip | 2 | 5 | 8 | 11 | 14 | 17
2 | Carl | 3 | 6 | 9 | 12 | 15 | 18


This is what I want:



    |  name   | year | income | children | education 
---|---------|------|--------|----------|-----------
0 | John | 2015 | 1 | 7 | 13
1 | Phillip | 2015 | 2 | 8 | 14
2 | Carl | 2015 | 3 | 9 | 15
3 | John | 2016 | 4 | 10 | 16
4 | Phillip | 2016 | 5 | 11 | 17
5 | Carl | 2016 | 6 | 12 | 18


I have already tried the following:



df2 = pd.melt(df, id_vars=['name'], value_vars=df.columns[1:])
df2['year'] = df2['variable'].map(lambda x: x[-4:])
df2['variable'] = df2['variable'].map(lambda x: x[:-4])


which gives me this:



       |          |           |      |      
------|----------|-----------|------|------
name | variable | value | year |
0 | John | income | 1 | 2015
1 | Phillip | income | 2 | 2015
2 | Carl | income | 3 | 2015
3 | John | income | 4 | 2016
4 | Phillip | income | 5 | 2016
5 | Carl | income | 6 | 2016
6 | John | children | 7 | 2015
7 | Phillip | children | 8 | 2015
8 | Carl | children | 9 | 2015
9 | John | children | 10 | 2016
10 | Phillip | children | 11 | 2016
11 | Carl | children | 12 | 2016
12 | John | education | 13 | 2015
13 | Phillip | education | 14 | 2015
14 | Carl | education | 15 | 2015
15 | John | education | 16 | 2016
16 | Phillip | education | 17 | 2016
17 | Carl | education | 18 | 2016


But now I have to reshape again... Is there an easier to do this?



Also, here is my df in dictionary format:



{'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12}, 'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18}, 'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6}, 'name': {0: 'John', 1: 'Phillip', 2: 'Carl'}}









share|improve this question



























    0















    I am trying to reshape a dataframe in pandas. I currently have one id variable, and the rest of the variables are in the following format: "variableyear", where year is between 2000 and 2016. I want to to make a new variable year (which extracts the year from my variableyear variable) and creates a column named variable. Here is an example dataset that looks similar to my real dataset (as my data is confidential):




    | name | income2015 | income2016 | children2015 | children2016 | education2015 | education2016
    ---|---------|------------|------------|--------------|--------------|---------------|---------------
    0 | John | 1 | 4 | 7 | 10 | 13 | 16
    1 | Phillip | 2 | 5 | 8 | 11 | 14 | 17
    2 | Carl | 3 | 6 | 9 | 12 | 15 | 18


    This is what I want:



        |  name   | year | income | children | education 
    ---|---------|------|--------|----------|-----------
    0 | John | 2015 | 1 | 7 | 13
    1 | Phillip | 2015 | 2 | 8 | 14
    2 | Carl | 2015 | 3 | 9 | 15
    3 | John | 2016 | 4 | 10 | 16
    4 | Phillip | 2016 | 5 | 11 | 17
    5 | Carl | 2016 | 6 | 12 | 18


    I have already tried the following:



    df2 = pd.melt(df, id_vars=['name'], value_vars=df.columns[1:])
    df2['year'] = df2['variable'].map(lambda x: x[-4:])
    df2['variable'] = df2['variable'].map(lambda x: x[:-4])


    which gives me this:



           |          |           |      |      
    ------|----------|-----------|------|------
    name | variable | value | year |
    0 | John | income | 1 | 2015
    1 | Phillip | income | 2 | 2015
    2 | Carl | income | 3 | 2015
    3 | John | income | 4 | 2016
    4 | Phillip | income | 5 | 2016
    5 | Carl | income | 6 | 2016
    6 | John | children | 7 | 2015
    7 | Phillip | children | 8 | 2015
    8 | Carl | children | 9 | 2015
    9 | John | children | 10 | 2016
    10 | Phillip | children | 11 | 2016
    11 | Carl | children | 12 | 2016
    12 | John | education | 13 | 2015
    13 | Phillip | education | 14 | 2015
    14 | Carl | education | 15 | 2015
    15 | John | education | 16 | 2016
    16 | Phillip | education | 17 | 2016
    17 | Carl | education | 18 | 2016


    But now I have to reshape again... Is there an easier to do this?



    Also, here is my df in dictionary format:



    {'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12}, 'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18}, 'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6}, 'name': {0: 'John', 1: 'Phillip', 2: 'Carl'}}









    share|improve this question

























      0












      0








      0








      I am trying to reshape a dataframe in pandas. I currently have one id variable, and the rest of the variables are in the following format: "variableyear", where year is between 2000 and 2016. I want to to make a new variable year (which extracts the year from my variableyear variable) and creates a column named variable. Here is an example dataset that looks similar to my real dataset (as my data is confidential):




      | name | income2015 | income2016 | children2015 | children2016 | education2015 | education2016
      ---|---------|------------|------------|--------------|--------------|---------------|---------------
      0 | John | 1 | 4 | 7 | 10 | 13 | 16
      1 | Phillip | 2 | 5 | 8 | 11 | 14 | 17
      2 | Carl | 3 | 6 | 9 | 12 | 15 | 18


      This is what I want:



          |  name   | year | income | children | education 
      ---|---------|------|--------|----------|-----------
      0 | John | 2015 | 1 | 7 | 13
      1 | Phillip | 2015 | 2 | 8 | 14
      2 | Carl | 2015 | 3 | 9 | 15
      3 | John | 2016 | 4 | 10 | 16
      4 | Phillip | 2016 | 5 | 11 | 17
      5 | Carl | 2016 | 6 | 12 | 18


      I have already tried the following:



      df2 = pd.melt(df, id_vars=['name'], value_vars=df.columns[1:])
      df2['year'] = df2['variable'].map(lambda x: x[-4:])
      df2['variable'] = df2['variable'].map(lambda x: x[:-4])


      which gives me this:



             |          |           |      |      
      ------|----------|-----------|------|------
      name | variable | value | year |
      0 | John | income | 1 | 2015
      1 | Phillip | income | 2 | 2015
      2 | Carl | income | 3 | 2015
      3 | John | income | 4 | 2016
      4 | Phillip | income | 5 | 2016
      5 | Carl | income | 6 | 2016
      6 | John | children | 7 | 2015
      7 | Phillip | children | 8 | 2015
      8 | Carl | children | 9 | 2015
      9 | John | children | 10 | 2016
      10 | Phillip | children | 11 | 2016
      11 | Carl | children | 12 | 2016
      12 | John | education | 13 | 2015
      13 | Phillip | education | 14 | 2015
      14 | Carl | education | 15 | 2015
      15 | John | education | 16 | 2016
      16 | Phillip | education | 17 | 2016
      17 | Carl | education | 18 | 2016


      But now I have to reshape again... Is there an easier to do this?



      Also, here is my df in dictionary format:



      {'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12}, 'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18}, 'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6}, 'name': {0: 'John', 1: 'Phillip', 2: 'Carl'}}









      share|improve this question














      I am trying to reshape a dataframe in pandas. I currently have one id variable, and the rest of the variables are in the following format: "variableyear", where year is between 2000 and 2016. I want to to make a new variable year (which extracts the year from my variableyear variable) and creates a column named variable. Here is an example dataset that looks similar to my real dataset (as my data is confidential):




      | name | income2015 | income2016 | children2015 | children2016 | education2015 | education2016
      ---|---------|------------|------------|--------------|--------------|---------------|---------------
      0 | John | 1 | 4 | 7 | 10 | 13 | 16
      1 | Phillip | 2 | 5 | 8 | 11 | 14 | 17
      2 | Carl | 3 | 6 | 9 | 12 | 15 | 18


      This is what I want:



          |  name   | year | income | children | education 
      ---|---------|------|--------|----------|-----------
      0 | John | 2015 | 1 | 7 | 13
      1 | Phillip | 2015 | 2 | 8 | 14
      2 | Carl | 2015 | 3 | 9 | 15
      3 | John | 2016 | 4 | 10 | 16
      4 | Phillip | 2016 | 5 | 11 | 17
      5 | Carl | 2016 | 6 | 12 | 18


      I have already tried the following:



      df2 = pd.melt(df, id_vars=['name'], value_vars=df.columns[1:])
      df2['year'] = df2['variable'].map(lambda x: x[-4:])
      df2['variable'] = df2['variable'].map(lambda x: x[:-4])


      which gives me this:



             |          |           |      |      
      ------|----------|-----------|------|------
      name | variable | value | year |
      0 | John | income | 1 | 2015
      1 | Phillip | income | 2 | 2015
      2 | Carl | income | 3 | 2015
      3 | John | income | 4 | 2016
      4 | Phillip | income | 5 | 2016
      5 | Carl | income | 6 | 2016
      6 | John | children | 7 | 2015
      7 | Phillip | children | 8 | 2015
      8 | Carl | children | 9 | 2015
      9 | John | children | 10 | 2016
      10 | Phillip | children | 11 | 2016
      11 | Carl | children | 12 | 2016
      12 | John | education | 13 | 2015
      13 | Phillip | education | 14 | 2015
      14 | Carl | education | 15 | 2015
      15 | John | education | 16 | 2016
      16 | Phillip | education | 17 | 2016
      17 | Carl | education | 18 | 2016


      But now I have to reshape again... Is there an easier to do this?



      Also, here is my df in dictionary format:



      {'children2015': {0: 7, 1: 8, 2: 9}, 'children2016': {0: 10, 1: 11, 2: 12}, 'education2015': {0: 13, 1: 14, 2: 15}, 'education2016': {0: 16, 1: 17, 2: 18}, 'income2015': {0: 1, 1: 2, 2: 3}, 'income2016': {0: 4, 1: 5, 2: 6}, 'name': {0: 'John', 1: 'Phillip', 2: 'Carl'}}






      python python-3.x pandas dataframe reshape






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Jan 19 at 20:49









      JimboJimbo

      32




      32
























          1 Answer
          1






          active

          oldest

          votes


















          0














          You can actually use pd.wide_to_long for just this. In the stubnames arg you could use a set of variable names (that excludes name and drop the last 4 characters) in your df using this code: set([x[:-4] for x in df.columns[1:]]).



          pd.wide_to_long(df,stubnames=set([x[:-4] for x in df.columns[1:]]),i=['name'],j='year').reset_index()


          Output:



              name    year    education   income  children
          0 John 2015 13 1 7
          1 Phillip 2015 14 2 8
          2 Carl 2015 15 3 9
          3 John 2016 16 4 10
          4 Phillip 2016 17 5 11
          5 Carl 2016 18 6 12





          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54271274%2fneed-to-reshape-my-dataframe-lots-of-column-names%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            You can actually use pd.wide_to_long for just this. In the stubnames arg you could use a set of variable names (that excludes name and drop the last 4 characters) in your df using this code: set([x[:-4] for x in df.columns[1:]]).



            pd.wide_to_long(df,stubnames=set([x[:-4] for x in df.columns[1:]]),i=['name'],j='year').reset_index()


            Output:



                name    year    education   income  children
            0 John 2015 13 1 7
            1 Phillip 2015 14 2 8
            2 Carl 2015 15 3 9
            3 John 2016 16 4 10
            4 Phillip 2016 17 5 11
            5 Carl 2016 18 6 12





            share|improve this answer




























              0














              You can actually use pd.wide_to_long for just this. In the stubnames arg you could use a set of variable names (that excludes name and drop the last 4 characters) in your df using this code: set([x[:-4] for x in df.columns[1:]]).



              pd.wide_to_long(df,stubnames=set([x[:-4] for x in df.columns[1:]]),i=['name'],j='year').reset_index()


              Output:



                  name    year    education   income  children
              0 John 2015 13 1 7
              1 Phillip 2015 14 2 8
              2 Carl 2015 15 3 9
              3 John 2016 16 4 10
              4 Phillip 2016 17 5 11
              5 Carl 2016 18 6 12





              share|improve this answer


























                0












                0








                0







                You can actually use pd.wide_to_long for just this. In the stubnames arg you could use a set of variable names (that excludes name and drop the last 4 characters) in your df using this code: set([x[:-4] for x in df.columns[1:]]).



                pd.wide_to_long(df,stubnames=set([x[:-4] for x in df.columns[1:]]),i=['name'],j='year').reset_index()


                Output:



                    name    year    education   income  children
                0 John 2015 13 1 7
                1 Phillip 2015 14 2 8
                2 Carl 2015 15 3 9
                3 John 2016 16 4 10
                4 Phillip 2016 17 5 11
                5 Carl 2016 18 6 12





                share|improve this answer













                You can actually use pd.wide_to_long for just this. In the stubnames arg you could use a set of variable names (that excludes name and drop the last 4 characters) in your df using this code: set([x[:-4] for x in df.columns[1:]]).



                pd.wide_to_long(df,stubnames=set([x[:-4] for x in df.columns[1:]]),i=['name'],j='year').reset_index()


                Output:



                    name    year    education   income  children
                0 John 2015 13 1 7
                1 Phillip 2015 14 2 8
                2 Carl 2015 15 3 9
                3 John 2016 16 4 10
                4 Phillip 2016 17 5 11
                5 Carl 2016 18 6 12






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Jan 19 at 20:52









                Joe PattenJoe Patten

                1,3801414




                1,3801414






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54271274%2fneed-to-reshape-my-dataframe-lots-of-column-names%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Liquibase includeAll doesn't find base path

                    How to use setInterval in EJS file?

                    Petrus Granier-Deferre