Joining Time Series Dataframes where duplicate columns contain the same values

I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are

df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})



df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})



df1

    SubjectID   Date    Test1   Gender  StudyID

0         A     2010-05-08  1   M   1

1         A     2010-05-10  2   M   1

2         B     2010-05-08  3   M   1

3         C     2010-05-08  4   F   1



df2

    SubjectID   Date    Test2   Gender  StudyID

0   A   2010-05-08  1   M   1

1   A   2010-05-09  2   M   1

2   A   2010-05-10  3   M   1

3   B   2010-05-08  4   M   1

4   C   2010-05-09  5   F   1

My expected output is

SubjectID   Date    Test1   Gender  StudyID     Test2   

0   A   2010-05-08  1.0     M   1.0     1.0     

1   A   2010-05-09  NaN     M   1.0     2.0     

2   A   2010-05-10  2.0     M   1.0     3.0     

3   B   2010-05-08  3.0     M   1.0     4.0     

4   C   2010-05-08  4.0     F   1.0     NaN     

5   C   2010-05-09  NaN     F   1.0     5.0

I'm joining the dataframes by

merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()

but my output is

  SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r

0         A     2010-05-08  1.0     M   1.0     1.0     M   1.0

1         A     2010-05-09  NaN    NaN  NaN     2.0     M   1.0

2         A     2010-05-10  2.0     M   1.0     3.0     M   1.0

3         B     2010-05-08  3.0     M   1.0     4.0     M   1.0

4         C     2010-05-08  4.0     F   1.0     NaN     NaN NaN

5         C     2010-05-09  NaN    NaN  NaN     5.0     F   1.0

Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.

asked Jan 18 at 16:14

m42op64

207

add a comment |

I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are

df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})



df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})



df1

    SubjectID   Date    Test1   Gender  StudyID

0         A     2010-05-08  1   M   1

1         A     2010-05-10  2   M   1

2         B     2010-05-08  3   M   1

3         C     2010-05-08  4   F   1



df2

    SubjectID   Date    Test2   Gender  StudyID

0   A   2010-05-08  1   M   1

1   A   2010-05-09  2   M   1

2   A   2010-05-10  3   M   1

3   B   2010-05-08  4   M   1

4   C   2010-05-09  5   F   1

My expected output is

SubjectID   Date    Test1   Gender  StudyID     Test2   

0   A   2010-05-08  1.0     M   1.0     1.0     

1   A   2010-05-09  NaN     M   1.0     2.0     

2   A   2010-05-10  2.0     M   1.0     3.0     

3   B   2010-05-08  3.0     M   1.0     4.0     

4   C   2010-05-08  4.0     F   1.0     NaN     

5   C   2010-05-09  NaN     F   1.0     5.0

I'm joining the dataframes by

merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()

but my output is

  SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r

0         A     2010-05-08  1.0     M   1.0     1.0     M   1.0

1         A     2010-05-09  NaN    NaN  NaN     2.0     M   1.0

2         A     2010-05-10  2.0     M   1.0     3.0     M   1.0

3         B     2010-05-08  3.0     M   1.0     4.0     M   1.0

4         C     2010-05-08  4.0     F   1.0     NaN     NaN NaN

5         C     2010-05-09  NaN    NaN  NaN     5.0     F   1.0

Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.

asked Jan 18 at 16:14

m42op64

207

add a comment |

I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are

df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})



df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})



df1

    SubjectID   Date    Test1   Gender  StudyID

0         A     2010-05-08  1   M   1

1         A     2010-05-10  2   M   1

2         B     2010-05-08  3   M   1

3         C     2010-05-08  4   F   1



df2

    SubjectID   Date    Test2   Gender  StudyID

0   A   2010-05-08  1   M   1

1   A   2010-05-09  2   M   1

2   A   2010-05-10  3   M   1

3   B   2010-05-08  4   M   1

4   C   2010-05-09  5   F   1

My expected output is

SubjectID   Date    Test1   Gender  StudyID     Test2   

0   A   2010-05-08  1.0     M   1.0     1.0     

1   A   2010-05-09  NaN     M   1.0     2.0     

2   A   2010-05-10  2.0     M   1.0     3.0     

3   B   2010-05-08  3.0     M   1.0     4.0     

4   C   2010-05-08  4.0     F   1.0     NaN     

5   C   2010-05-09  NaN     F   1.0     5.0

I'm joining the dataframes by

merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()

but my output is

  SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r

0         A     2010-05-08  1.0     M   1.0     1.0     M   1.0

1         A     2010-05-09  NaN    NaN  NaN     2.0     M   1.0

2         A     2010-05-10  2.0     M   1.0     3.0     M   1.0

3         B     2010-05-08  3.0     M   1.0     4.0     M   1.0

4         C     2010-05-08  4.0     F   1.0     NaN     NaN NaN

5         C     2010-05-09  NaN    NaN  NaN     5.0     F   1.0

Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.

asked Jan 18 at 16:14

m42op64

207

I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are

df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})



df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})



df1

    SubjectID   Date    Test1   Gender  StudyID

0         A     2010-05-08  1   M   1

1         A     2010-05-10  2   M   1

2         B     2010-05-08  3   M   1

3         C     2010-05-08  4   F   1



df2

    SubjectID   Date    Test2   Gender  StudyID

0   A   2010-05-08  1   M   1

1   A   2010-05-09  2   M   1

2   A   2010-05-10  3   M   1

3   B   2010-05-08  4   M   1

4   C   2010-05-09  5   F   1

My expected output is

SubjectID   Date    Test1   Gender  StudyID     Test2   

0   A   2010-05-08  1.0     M   1.0     1.0     

1   A   2010-05-09  NaN     M   1.0     2.0     

2   A   2010-05-10  2.0     M   1.0     3.0     

3   B   2010-05-08  3.0     M   1.0     4.0     

4   C   2010-05-08  4.0     F   1.0     NaN     

5   C   2010-05-09  NaN     F   1.0     5.0

I'm joining the dataframes by

merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()

but my output is

  SubjectID     Date    Test1   Gender_l    StudyID_l   Test2   Gender_r    StudyID_r

0         A     2010-05-08  1.0     M   1.0     1.0     M   1.0

1         A     2010-05-09  NaN    NaN  NaN     2.0     M   1.0

2         A     2010-05-10  2.0     M   1.0     3.0     M   1.0

3         B     2010-05-08  3.0     M   1.0     4.0     M   1.0

4         C     2010-05-08  4.0     F   1.0     NaN     NaN NaN

5         C     2010-05-09  NaN    NaN  NaN     5.0     F   1.0

Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.

python-3.x pandas join time-series

asked Jan 18 at 16:14

m42op64

207

asked Jan 18 at 16:14

m42op64

207

asked Jan 18 at 16:14

m42op64

207

asked Jan 18 at 16:14

m42op64

207

asked Jan 18 at 16:14

m42op64

207

add a comment |

1 Answer
1

active

oldest

votes

It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal value is appropriate. Taking your merged_df we can resolve it as:

merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-09      M      1.0         A    NaN    2.0

2  2010-05-10      M      1.0         A    2.0    3.0

3  2010-05-08      M      1.0         B    3.0    4.0

4  2010-05-08      F      1.0         C    4.0    NaN

5  2010-05-09      F      1.0         C    NaN    5.0

Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.

df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))



                     Gender  StudyID  Test1  Test2

SubjectID Date                                    

A         2010-05-08      M      1.0    1.0    1.0

          2010-05-09      M      1.0    NaN    2.0

          2010-05-10      M      1.0    2.0    3.0

B         2010-05-08      M      1.0    3.0    4.0

C         2010-05-08      F      1.0    4.0    NaN

          2010-05-09      F      1.0    NaN    5.0

If you have to merge many DataFrames it may be best to use reduce from functools.

from functools import reduce



merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']), 

                   [df1, df2 ,df1, df2, df2])

You'll have lots of overlapping columns, but still can resolve them:

merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-10      M      1.0         A    2.0    3.0

2  2010-05-08      M      1.0         B    3.0    4.0

3  2010-05-08      F      1.0         C    4.0    NaN

4  2010-05-09      M      1.0         A    NaN    2.0

5  2010-05-09      F      1.0         C    NaN    5.0

edited Jan 18 at 18:30

answered Jan 18 at 16:35

ALollz

11.9k31536

Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

– m42op64
Jan 18 at 17:24

@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

– ALollz
Jan 18 at 18:31

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54257694%2fjoining-time-series-dataframes-where-duplicate-columns-contain-the-same-values%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-09      M      1.0         A    NaN    2.0

2  2010-05-10      M      1.0         A    2.0    3.0

3  2010-05-08      M      1.0         B    3.0    4.0

4  2010-05-08      F      1.0         C    4.0    NaN

5  2010-05-09      F      1.0         C    NaN    5.0

Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.

df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))



                     Gender  StudyID  Test1  Test2

SubjectID Date                                    

A         2010-05-08      M      1.0    1.0    1.0

          2010-05-09      M      1.0    NaN    2.0

          2010-05-10      M      1.0    2.0    3.0

B         2010-05-08      M      1.0    3.0    4.0

C         2010-05-08      F      1.0    4.0    NaN

          2010-05-09      F      1.0    NaN    5.0

If you have to merge many DataFrames it may be best to use reduce from functools.

from functools import reduce



merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']), 

                   [df1, df2 ,df1, df2, df2])

You'll have lots of overlapping columns, but still can resolve them:

merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-10      M      1.0         A    2.0    3.0

2  2010-05-08      M      1.0         B    3.0    4.0

3  2010-05-08      F      1.0         C    4.0    NaN

4  2010-05-09      M      1.0         A    NaN    2.0

5  2010-05-09      F      1.0         C    NaN    5.0

edited Jan 18 at 18:30

answered Jan 18 at 16:35

ALollz

11.9k31536

Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

– m42op64
Jan 18 at 17:24

@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

– ALollz
Jan 18 at 18:31

add a comment |

merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-09      M      1.0         A    NaN    2.0

2  2010-05-10      M      1.0         A    2.0    3.0

3  2010-05-08      M      1.0         B    3.0    4.0

4  2010-05-08      F      1.0         C    4.0    NaN

5  2010-05-09      F      1.0         C    NaN    5.0

Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.

df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))



                     Gender  StudyID  Test1  Test2

SubjectID Date                                    

A         2010-05-08      M      1.0    1.0    1.0

          2010-05-09      M      1.0    NaN    2.0

          2010-05-10      M      1.0    2.0    3.0

B         2010-05-08      M      1.0    3.0    4.0

C         2010-05-08      F      1.0    4.0    NaN

          2010-05-09      F      1.0    NaN    5.0

If you have to merge many DataFrames it may be best to use reduce from functools.

from functools import reduce



merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']), 

                   [df1, df2 ,df1, df2, df2])

You'll have lots of overlapping columns, but still can resolve them:

merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-10      M      1.0         A    2.0    3.0

2  2010-05-08      M      1.0         B    3.0    4.0

3  2010-05-08      F      1.0         C    4.0    NaN

4  2010-05-09      M      1.0         A    NaN    2.0

5  2010-05-09      F      1.0         C    NaN    5.0

edited Jan 18 at 18:30

answered Jan 18 at 16:35

ALollz

11.9k31536

Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

– m42op64
Jan 18 at 17:24

@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

– ALollz
Jan 18 at 18:31

add a comment |

merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-09      M      1.0         A    NaN    2.0

2  2010-05-10      M      1.0         A    2.0    3.0

3  2010-05-08      M      1.0         B    3.0    4.0

4  2010-05-08      F      1.0         C    4.0    NaN

5  2010-05-09      F      1.0         C    NaN    5.0

Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.

df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))



                     Gender  StudyID  Test1  Test2

SubjectID Date                                    

A         2010-05-08      M      1.0    1.0    1.0

          2010-05-09      M      1.0    NaN    2.0

          2010-05-10      M      1.0    2.0    3.0

B         2010-05-08      M      1.0    3.0    4.0

C         2010-05-08      F      1.0    4.0    NaN

          2010-05-09      F      1.0    NaN    5.0

If you have to merge many DataFrames it may be best to use reduce from functools.

from functools import reduce



merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']), 

                   [df1, df2 ,df1, df2, df2])

You'll have lots of overlapping columns, but still can resolve them:

merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-10      M      1.0         A    2.0    3.0

2  2010-05-08      M      1.0         B    3.0    4.0

3  2010-05-08      F      1.0         C    4.0    NaN

4  2010-05-09      M      1.0         A    NaN    2.0

5  2010-05-09      F      1.0         C    NaN    5.0

edited Jan 18 at 18:30

answered Jan 18 at 16:35

ALollz

11.9k31536

merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-09      M      1.0         A    NaN    2.0

2  2010-05-10      M      1.0         A    2.0    3.0

3  2010-05-08      M      1.0         B    3.0    4.0

4  2010-05-08      F      1.0         C    4.0    NaN

5  2010-05-09      F      1.0         C    NaN    5.0

Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first.

df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))



                     Gender  StudyID  Test1  Test2

SubjectID Date                                    

A         2010-05-08      M      1.0    1.0    1.0

          2010-05-09      M      1.0    NaN    2.0

          2010-05-10      M      1.0    2.0    3.0

B         2010-05-08      M      1.0    3.0    4.0

C         2010-05-08      F      1.0    4.0    NaN

          2010-05-09      F      1.0    NaN    5.0

If you have to merge many DataFrames it may be best to use reduce from functools.

from functools import reduce



merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']), 

                   [df1, df2 ,df1, df2, df2])

You'll have lots of overlapping columns, but still can resolve them:

merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])



         Date Gender  StudyID SubjectID  Test1  Test2

0  2010-05-08      M      1.0         A    1.0    1.0

1  2010-05-10      M      1.0         A    2.0    3.0

2  2010-05-08      M      1.0         B    3.0    4.0

3  2010-05-08      F      1.0         C    4.0    NaN

4  2010-05-09      M      1.0         A    NaN    2.0

5  2010-05-09      F      1.0         C    NaN    5.0

edited Jan 18 at 18:30

answered Jan 18 at 16:35

ALollz

11.9k31536

edited Jan 18 at 18:30

answered Jan 18 at 16:35

ALollz

11.9k31536

answered Jan 18 at 16:35

ALollz

11.9k31536

answered Jan 18 at 16:35

ALollz

11.9k31536

Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

– m42op64
Jan 18 at 17:24

@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

– ALollz
Jan 18 at 18:31

add a comment |

Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

– m42op64
Jan 18 at 17:24

@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

– ALollz
Jan 18 at 18:31

Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the modal solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?

– m42op64
Jan 18 at 17:24

@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with reduce work for you? In my example, I specified 5 DataFrames to merge, and the resulting mode still works.

– ALollz
Jan 18 at 18:31

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Brtdku