Joining Time Series Dataframes where duplicate columns contain the same values












1















I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are



df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})

df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})

df1
SubjectID Date Test1 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-10 2 M 1
2 B 2010-05-08 3 M 1
3 C 2010-05-08 4 F 1

df2
SubjectID Date Test2 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-09 2 M 1
2 A 2010-05-10 3 M 1
3 B 2010-05-08 4 M 1
4 C 2010-05-09 5 F 1


My expected output is



SubjectID   Date    Test1   Gender  StudyID     Test2   
0 A 2010-05-08 1.0 M 1.0 1.0
1 A 2010-05-09 NaN M 1.0 2.0
2 A 2010-05-10 2.0 M 1.0 3.0
3 B 2010-05-08 3.0 M 1.0 4.0
4 C 2010-05-08 4.0 F 1.0 NaN
5 C 2010-05-09 NaN F 1.0 5.0


I'm joining the dataframes by



merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()


but my output is



  SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r
0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0


Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.










share|improve this question



























    1















    I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are



    df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})

    df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})

    df1
    SubjectID Date Test1 Gender StudyID
    0 A 2010-05-08 1 M 1
    1 A 2010-05-10 2 M 1
    2 B 2010-05-08 3 M 1
    3 C 2010-05-08 4 F 1

    df2
    SubjectID Date Test2 Gender StudyID
    0 A 2010-05-08 1 M 1
    1 A 2010-05-09 2 M 1
    2 A 2010-05-10 3 M 1
    3 B 2010-05-08 4 M 1
    4 C 2010-05-09 5 F 1


    My expected output is



    SubjectID   Date    Test1   Gender  StudyID     Test2   
    0 A 2010-05-08 1.0 M 1.0 1.0
    1 A 2010-05-09 NaN M 1.0 2.0
    2 A 2010-05-10 2.0 M 1.0 3.0
    3 B 2010-05-08 3.0 M 1.0 4.0
    4 C 2010-05-08 4.0 F 1.0 NaN
    5 C 2010-05-09 NaN F 1.0 5.0


    I'm joining the dataframes by



    merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()


    but my output is



      SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r
    0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
    1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
    2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
    3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
    4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
    5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0


    Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.










    share|improve this question

























      1












      1








      1








      I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are



      df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})

      df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})

      df1
      SubjectID Date Test1 Gender StudyID
      0 A 2010-05-08 1 M 1
      1 A 2010-05-10 2 M 1
      2 B 2010-05-08 3 M 1
      3 C 2010-05-08 4 F 1

      df2
      SubjectID Date Test2 Gender StudyID
      0 A 2010-05-08 1 M 1
      1 A 2010-05-09 2 M 1
      2 A 2010-05-10 3 M 1
      3 B 2010-05-08 4 M 1
      4 C 2010-05-09 5 F 1


      My expected output is



      SubjectID   Date    Test1   Gender  StudyID     Test2   
      0 A 2010-05-08 1.0 M 1.0 1.0
      1 A 2010-05-09 NaN M 1.0 2.0
      2 A 2010-05-10 2.0 M 1.0 3.0
      3 B 2010-05-08 3.0 M 1.0 4.0
      4 C 2010-05-08 4.0 F 1.0 NaN
      5 C 2010-05-09 NaN F 1.0 5.0


      I'm joining the dataframes by



      merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()


      but my output is



        SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r
      0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
      1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
      2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
      3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
      4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
      5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0


      Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.










      share|improve this question














      I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are



      df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})

      df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})

      df1
      SubjectID Date Test1 Gender StudyID
      0 A 2010-05-08 1 M 1
      1 A 2010-05-10 2 M 1
      2 B 2010-05-08 3 M 1
      3 C 2010-05-08 4 F 1

      df2
      SubjectID Date Test2 Gender StudyID
      0 A 2010-05-08 1 M 1
      1 A 2010-05-09 2 M 1
      2 A 2010-05-10 3 M 1
      3 B 2010-05-08 4 M 1
      4 C 2010-05-09 5 F 1


      My expected output is



      SubjectID   Date    Test1   Gender  StudyID     Test2   
      0 A 2010-05-08 1.0 M 1.0 1.0
      1 A 2010-05-09 NaN M 1.0 2.0
      2 A 2010-05-10 2.0 M 1.0 3.0
      3 B 2010-05-08 3.0 M 1.0 4.0
      4 C 2010-05-08 4.0 F 1.0 NaN
      5 C 2010-05-09 NaN F 1.0 5.0


      I'm joining the dataframes by



      merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()


      but my output is



        SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r
      0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
      1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
      2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
      3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
      4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
      5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0


      Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.







      python-3.x pandas join time-series






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Jan 18 at 16:14









      m42op64m42op64

      207




      207
























          1 Answer
          1






          active

          oldest

          votes


















          0














          It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal value is appropriate. Taking your merged_df we can resolve it as:



          merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-09 M 1.0 A NaN 2.0
          2 2010-05-10 M 1.0 A 2.0 3.0
          3 2010-05-08 M 1.0 B 3.0 4.0
          4 2010-05-08 F 1.0 C 4.0 NaN
          5 2010-05-09 F 1.0 C NaN 5.0




          Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.



          df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))

          Gender StudyID Test1 Test2
          SubjectID Date
          A 2010-05-08 M 1.0 1.0 1.0
          2010-05-09 M 1.0 NaN 2.0
          2010-05-10 M 1.0 2.0 3.0
          B 2010-05-08 M 1.0 3.0 4.0
          C 2010-05-08 F 1.0 4.0 NaN
          2010-05-09 F 1.0 NaN 5.0




          If you have to merge many DataFrames it may be best to use reduce from functools.



          from functools import reduce

          merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
          [df1, df2 ,df1, df2, df2])


          You'll have lots of overlapping columns, but still can resolve them:



          merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-10 M 1.0 A 2.0 3.0
          2 2010-05-08 M 1.0 B 3.0 4.0
          3 2010-05-08 F 1.0 C 4.0 NaN
          4 2010-05-09 M 1.0 A NaN 2.0
          5 2010-05-09 F 1.0 C NaN 5.0





          share|improve this answer


























          • Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

            – m42op64
            Jan 18 at 17:24











          • @m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

            – ALollz
            Jan 18 at 18:31











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54257694%2fjoining-time-series-dataframes-where-duplicate-columns-contain-the-same-values%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          0














          It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal value is appropriate. Taking your merged_df we can resolve it as:



          merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-09 M 1.0 A NaN 2.0
          2 2010-05-10 M 1.0 A 2.0 3.0
          3 2010-05-08 M 1.0 B 3.0 4.0
          4 2010-05-08 F 1.0 C 4.0 NaN
          5 2010-05-09 F 1.0 C NaN 5.0




          Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.



          df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))

          Gender StudyID Test1 Test2
          SubjectID Date
          A 2010-05-08 M 1.0 1.0 1.0
          2010-05-09 M 1.0 NaN 2.0
          2010-05-10 M 1.0 2.0 3.0
          B 2010-05-08 M 1.0 3.0 4.0
          C 2010-05-08 F 1.0 4.0 NaN
          2010-05-09 F 1.0 NaN 5.0




          If you have to merge many DataFrames it may be best to use reduce from functools.



          from functools import reduce

          merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
          [df1, df2 ,df1, df2, df2])


          You'll have lots of overlapping columns, but still can resolve them:



          merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-10 M 1.0 A 2.0 3.0
          2 2010-05-08 M 1.0 B 3.0 4.0
          3 2010-05-08 F 1.0 C 4.0 NaN
          4 2010-05-09 M 1.0 A NaN 2.0
          5 2010-05-09 F 1.0 C NaN 5.0





          share|improve this answer


























          • Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

            – m42op64
            Jan 18 at 17:24











          • @m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

            – ALollz
            Jan 18 at 18:31
















          0














          It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal value is appropriate. Taking your merged_df we can resolve it as:



          merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-09 M 1.0 A NaN 2.0
          2 2010-05-10 M 1.0 A 2.0 3.0
          3 2010-05-08 M 1.0 B 3.0 4.0
          4 2010-05-08 F 1.0 C 4.0 NaN
          5 2010-05-09 F 1.0 C NaN 5.0




          Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.



          df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))

          Gender StudyID Test1 Test2
          SubjectID Date
          A 2010-05-08 M 1.0 1.0 1.0
          2010-05-09 M 1.0 NaN 2.0
          2010-05-10 M 1.0 2.0 3.0
          B 2010-05-08 M 1.0 3.0 4.0
          C 2010-05-08 F 1.0 4.0 NaN
          2010-05-09 F 1.0 NaN 5.0




          If you have to merge many DataFrames it may be best to use reduce from functools.



          from functools import reduce

          merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
          [df1, df2 ,df1, df2, df2])


          You'll have lots of overlapping columns, but still can resolve them:



          merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-10 M 1.0 A 2.0 3.0
          2 2010-05-08 M 1.0 B 3.0 4.0
          3 2010-05-08 F 1.0 C 4.0 NaN
          4 2010-05-09 M 1.0 A NaN 2.0
          5 2010-05-09 F 1.0 C NaN 5.0





          share|improve this answer


























          • Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

            – m42op64
            Jan 18 at 17:24











          • @m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

            – ALollz
            Jan 18 at 18:31














          0












          0








          0







          It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal value is appropriate. Taking your merged_df we can resolve it as:



          merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-09 M 1.0 A NaN 2.0
          2 2010-05-10 M 1.0 A 2.0 3.0
          3 2010-05-08 M 1.0 B 3.0 4.0
          4 2010-05-08 F 1.0 C 4.0 NaN
          5 2010-05-09 F 1.0 C NaN 5.0




          Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.



          df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))

          Gender StudyID Test1 Test2
          SubjectID Date
          A 2010-05-08 M 1.0 1.0 1.0
          2010-05-09 M 1.0 NaN 2.0
          2010-05-10 M 1.0 2.0 3.0
          B 2010-05-08 M 1.0 3.0 4.0
          C 2010-05-08 F 1.0 4.0 NaN
          2010-05-09 F 1.0 NaN 5.0




          If you have to merge many DataFrames it may be best to use reduce from functools.



          from functools import reduce

          merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
          [df1, df2 ,df1, df2, df2])


          You'll have lots of overlapping columns, but still can resolve them:



          merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-10 M 1.0 A 2.0 3.0
          2 2010-05-08 M 1.0 B 3.0 4.0
          3 2010-05-08 F 1.0 C 4.0 NaN
          4 2010-05-09 M 1.0 A NaN 2.0
          5 2010-05-09 F 1.0 C NaN 5.0





          share|improve this answer















          It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal value is appropriate. Taking your merged_df we can resolve it as:



          merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-09 M 1.0 A NaN 2.0
          2 2010-05-10 M 1.0 A 2.0 3.0
          3 2010-05-08 M 1.0 B 3.0 4.0
          4 2010-05-08 F 1.0 C 4.0 NaN
          5 2010-05-09 F 1.0 C NaN 5.0




          Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.



          df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))

          Gender StudyID Test1 Test2
          SubjectID Date
          A 2010-05-08 M 1.0 1.0 1.0
          2010-05-09 M 1.0 NaN 2.0
          2010-05-10 M 1.0 2.0 3.0
          B 2010-05-08 M 1.0 3.0 4.0
          C 2010-05-08 F 1.0 4.0 NaN
          2010-05-09 F 1.0 NaN 5.0




          If you have to merge many DataFrames it may be best to use reduce from functools.



          from functools import reduce

          merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
          [df1, df2 ,df1, df2, df2])


          You'll have lots of overlapping columns, but still can resolve them:



          merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])

          Date Gender StudyID SubjectID Test1 Test2
          0 2010-05-08 M 1.0 A 1.0 1.0
          1 2010-05-10 M 1.0 A 2.0 3.0
          2 2010-05-08 M 1.0 B 3.0 4.0
          3 2010-05-08 F 1.0 C 4.0 NaN
          4 2010-05-09 M 1.0 A NaN 2.0
          5 2010-05-09 F 1.0 C NaN 5.0






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Jan 18 at 18:30

























          answered Jan 18 at 16:35









          ALollzALollz

          11.9k31536




          11.9k31536













          • Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

            – m42op64
            Jan 18 at 17:24











          • @m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

            – ALollz
            Jan 18 at 18:31



















          • Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

            – m42op64
            Jan 18 at 17:24











          • @m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

            – ALollz
            Jan 18 at 18:31

















          Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

          – m42op64
          Jan 18 at 17:24





          Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

          – m42op64
          Jan 18 at 17:24













          @m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

          – ALollz
          Jan 18 at 18:31





          @m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

          – ALollz
          Jan 18 at 18:31


















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54257694%2fjoining-time-series-dataframes-where-duplicate-columns-contain-the-same-values%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Liquibase includeAll doesn't find base path

          How to use setInterval in EJS file?

          Petrus Granier-Deferre