Joining Time Series Dataframes where duplicate columns contain the same values
I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are
df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})
df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})
df1
SubjectID Date Test1 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-10 2 M 1
2 B 2010-05-08 3 M 1
3 C 2010-05-08 4 F 1
df2
SubjectID Date Test2 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-09 2 M 1
2 A 2010-05-10 3 M 1
3 B 2010-05-08 4 M 1
4 C 2010-05-09 5 F 1
My expected output is
SubjectID Date Test1 Gender StudyID Test2
0 A 2010-05-08 1.0 M 1.0 1.0
1 A 2010-05-09 NaN M 1.0 2.0
2 A 2010-05-10 2.0 M 1.0 3.0
3 B 2010-05-08 3.0 M 1.0 4.0
4 C 2010-05-08 4.0 F 1.0 NaN
5 C 2010-05-09 NaN F 1.0 5.0
I'm joining the dataframes by
merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()
but my output is
SubjectID Date Test1 Gender_l StudyID_l Test2 Gender_r StudyID_r
0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0
Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.
python-3.x pandas join time-series
add a comment |
I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are
df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})
df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})
df1
SubjectID Date Test1 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-10 2 M 1
2 B 2010-05-08 3 M 1
3 C 2010-05-08 4 F 1
df2
SubjectID Date Test2 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-09 2 M 1
2 A 2010-05-10 3 M 1
3 B 2010-05-08 4 M 1
4 C 2010-05-09 5 F 1
My expected output is
SubjectID Date Test1 Gender StudyID Test2
0 A 2010-05-08 1.0 M 1.0 1.0
1 A 2010-05-09 NaN M 1.0 2.0
2 A 2010-05-10 2.0 M 1.0 3.0
3 B 2010-05-08 3.0 M 1.0 4.0
4 C 2010-05-08 4.0 F 1.0 NaN
5 C 2010-05-09 NaN F 1.0 5.0
I'm joining the dataframes by
merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()
but my output is
SubjectID Date Test1 Gender_l StudyID_l Test2 Gender_r StudyID_r
0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0
Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.
python-3.x pandas join time-series
add a comment |
I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are
df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})
df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})
df1
SubjectID Date Test1 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-10 2 M 1
2 B 2010-05-08 3 M 1
3 C 2010-05-08 4 F 1
df2
SubjectID Date Test2 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-09 2 M 1
2 A 2010-05-10 3 M 1
3 B 2010-05-08 4 M 1
4 C 2010-05-09 5 F 1
My expected output is
SubjectID Date Test1 Gender StudyID Test2
0 A 2010-05-08 1.0 M 1.0 1.0
1 A 2010-05-09 NaN M 1.0 2.0
2 A 2010-05-10 2.0 M 1.0 3.0
3 B 2010-05-08 3.0 M 1.0 4.0
4 C 2010-05-08 4.0 F 1.0 NaN
5 C 2010-05-09 NaN F 1.0 5.0
I'm joining the dataframes by
merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()
but my output is
SubjectID Date Test1 Gender_l StudyID_l Test2 Gender_r StudyID_r
0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0
Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.
python-3.x pandas join time-series
I'm trying to combine multiple dataframes that contain time series data. These dataframes can have up to 100 columns and roughly 5000 rows. Two sample dataframes are
df1 = pd.DataFrame({'SubjectID': ['A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-10', '2010-05-08', '2010-05-08'], 'Test1':[1, 2, 3, 4], 'Gender': ['M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1]})
df2 = pd.DataFrame({'SubjectID': ['A', 'A', 'A', 'B', 'C'], 'Date': ['2010-05-08', '2010-05-09', '2010-05-10', '2010-05-08', '2010-05-09'], 'Test2': [1, 2, 3, 4, 5], 'Gender': ['M', 'M', 'M', 'M', 'F'], 'StudyID': [1, 1, 1, 1, 1]})
df1
SubjectID Date Test1 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-10 2 M 1
2 B 2010-05-08 3 M 1
3 C 2010-05-08 4 F 1
df2
SubjectID Date Test2 Gender StudyID
0 A 2010-05-08 1 M 1
1 A 2010-05-09 2 M 1
2 A 2010-05-10 3 M 1
3 B 2010-05-08 4 M 1
4 C 2010-05-09 5 F 1
My expected output is
SubjectID Date Test1 Gender StudyID Test2
0 A 2010-05-08 1.0 M 1.0 1.0
1 A 2010-05-09 NaN M 1.0 2.0
2 A 2010-05-10 2.0 M 1.0 3.0
3 B 2010-05-08 3.0 M 1.0 4.0
4 C 2010-05-08 4.0 F 1.0 NaN
5 C 2010-05-09 NaN F 1.0 5.0
I'm joining the dataframes by
merged_df = df1.set_index(['SubjectID', 'Date']).join(df2.set_index(['SubjectID', 'Date']), how = 'outer', lsuffix = '_l', rsuffix = '_r').reset_index()
but my output is
SubjectID Date Test1 Gender_l StudyID_l Test2 Gender_r StudyID_r
0 A 2010-05-08 1.0 M 1.0 1.0 M 1.0
1 A 2010-05-09 NaN NaN NaN 2.0 M 1.0
2 A 2010-05-10 2.0 M 1.0 3.0 M 1.0
3 B 2010-05-08 3.0 M 1.0 4.0 M 1.0
4 C 2010-05-08 4.0 F 1.0 NaN NaN NaN
5 C 2010-05-09 NaN NaN NaN 5.0 F 1.0
Is there a way to combine columns while joining the dataframes if all the values in both dataframes are equal? I can do it after the merge, but that will get tedious for my large datasets.
python-3.x pandas join time-series
python-3.x pandas join time-series
asked Jan 18 at 16:14
m42op64m42op64
207
207
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal
value is appropriate. Taking your merged_df
we can resolve it as:
merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-09 M 1.0 A NaN 2.0
2 2010-05-10 M 1.0 A 2.0 3.0
3 2010-05-08 M 1.0 B 3.0 4.0
4 2010-05-08 F 1.0 C 4.0 NaN
5 2010-05-09 F 1.0 C NaN 5.0
Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first
.
df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))
Gender StudyID Test1 Test2
SubjectID Date
A 2010-05-08 M 1.0 1.0 1.0
2010-05-09 M 1.0 NaN 2.0
2010-05-10 M 1.0 2.0 3.0
B 2010-05-08 M 1.0 3.0 4.0
C 2010-05-08 F 1.0 4.0 NaN
2010-05-09 F 1.0 NaN 5.0
If you have to merge many DataFrames
it may be best to use reduce
from functools.
from functools import reduce
merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
[df1, df2 ,df1, df2, df2])
You'll have lots of overlapping columns, but still can resolve them:
merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-10 M 1.0 A 2.0 3.0
2 2010-05-08 M 1.0 B 3.0 4.0
3 2010-05-08 F 1.0 C 4.0 NaN
4 2010-05-09 M 1.0 A NaN 2.0
5 2010-05-09 F 1.0 C NaN 5.0
Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, themodal
solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?
– m42op64
Jan 18 at 17:24
@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution withreduce
work for you? In my example, I specified 5DataFrames
to merge, and the resulting mode still works.
– ALollz
Jan 18 at 18:31
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54257694%2fjoining-time-series-dataframes-where-duplicate-columns-contain-the-same-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal
value is appropriate. Taking your merged_df
we can resolve it as:
merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-09 M 1.0 A NaN 2.0
2 2010-05-10 M 1.0 A 2.0 3.0
3 2010-05-08 M 1.0 B 3.0 4.0
4 2010-05-08 F 1.0 C 4.0 NaN
5 2010-05-09 F 1.0 C NaN 5.0
Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first
.
df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))
Gender StudyID Test1 Test2
SubjectID Date
A 2010-05-08 M 1.0 1.0 1.0
2010-05-09 M 1.0 NaN 2.0
2010-05-10 M 1.0 2.0 3.0
B 2010-05-08 M 1.0 3.0 4.0
C 2010-05-08 F 1.0 4.0 NaN
2010-05-09 F 1.0 NaN 5.0
If you have to merge many DataFrames
it may be best to use reduce
from functools.
from functools import reduce
merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
[df1, df2 ,df1, df2, df2])
You'll have lots of overlapping columns, but still can resolve them:
merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-10 M 1.0 A 2.0 3.0
2 2010-05-08 M 1.0 B 3.0 4.0
3 2010-05-08 F 1.0 C 4.0 NaN
4 2010-05-09 M 1.0 A NaN 2.0
5 2010-05-09 F 1.0 C NaN 5.0
Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, themodal
solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?
– m42op64
Jan 18 at 17:24
@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution withreduce
work for you? In my example, I specified 5DataFrames
to merge, and the resulting mode still works.
– ALollz
Jan 18 at 18:31
add a comment |
It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal
value is appropriate. Taking your merged_df
we can resolve it as:
merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-09 M 1.0 A NaN 2.0
2 2010-05-10 M 1.0 A 2.0 3.0
3 2010-05-08 M 1.0 B 3.0 4.0
4 2010-05-08 F 1.0 C 4.0 NaN
5 2010-05-09 F 1.0 C NaN 5.0
Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first
.
df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))
Gender StudyID Test1 Test2
SubjectID Date
A 2010-05-08 M 1.0 1.0 1.0
2010-05-09 M 1.0 NaN 2.0
2010-05-10 M 1.0 2.0 3.0
B 2010-05-08 M 1.0 3.0 4.0
C 2010-05-08 F 1.0 4.0 NaN
2010-05-09 F 1.0 NaN 5.0
If you have to merge many DataFrames
it may be best to use reduce
from functools.
from functools import reduce
merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
[df1, df2 ,df1, df2, df2])
You'll have lots of overlapping columns, but still can resolve them:
merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-10 M 1.0 A 2.0 3.0
2 2010-05-08 M 1.0 B 3.0 4.0
3 2010-05-08 F 1.0 C 4.0 NaN
4 2010-05-09 M 1.0 A NaN 2.0
5 2010-05-09 F 1.0 C NaN 5.0
Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, themodal
solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?
– m42op64
Jan 18 at 17:24
@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution withreduce
work for you? In my example, I specified 5DataFrames
to merge, and the resulting mode still works.
– ALollz
Jan 18 at 18:31
add a comment |
It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal
value is appropriate. Taking your merged_df
we can resolve it as:
merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-09 M 1.0 A NaN 2.0
2 2010-05-10 M 1.0 A 2.0 3.0
3 2010-05-08 M 1.0 B 3.0 4.0
4 2010-05-08 F 1.0 C 4.0 NaN
5 2010-05-09 F 1.0 C NaN 5.0
Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first
.
df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))
Gender StudyID Test1 Test2
SubjectID Date
A 2010-05-08 M 1.0 1.0 1.0
2010-05-09 M 1.0 NaN 2.0
2010-05-10 M 1.0 2.0 3.0
B 2010-05-08 M 1.0 3.0 4.0
C 2010-05-08 F 1.0 4.0 NaN
2010-05-09 F 1.0 NaN 5.0
If you have to merge many DataFrames
it may be best to use reduce
from functools.
from functools import reduce
merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
[df1, df2 ,df1, df2, df2])
You'll have lots of overlapping columns, but still can resolve them:
merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-10 M 1.0 A 2.0 3.0
2 2010-05-08 M 1.0 B 3.0 4.0
3 2010-05-08 F 1.0 C 4.0 NaN
4 2010-05-09 M 1.0 A NaN 2.0
5 2010-05-09 F 1.0 C NaN 5.0
It depends how you want to implement the logic of resolving information that may not exactly match. Had you merged several frames, I think taking the modal
value is appropriate. Taking your merged_df
we can resolve it as:
merged_df = merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-09 M 1.0 A NaN 2.0
2 2010-05-10 M 1.0 A 2.0 3.0
3 2010-05-08 M 1.0 B 3.0 4.0
4 2010-05-08 F 1.0 C 4.0 NaN
5 2010-05-09 F 1.0 C NaN 5.0
Or perhaps, you want to give priority to the non-null value value in the first frame, then this is .combine_first
.
df1.set_index(['SubjectID', 'Date']).combine_first(df2.set_index(['SubjectID', 'Date']))
Gender StudyID Test1 Test2
SubjectID Date
A 2010-05-08 M 1.0 1.0 1.0
2010-05-09 M 1.0 NaN 2.0
2010-05-10 M 1.0 2.0 3.0
B 2010-05-08 M 1.0 3.0 4.0
C 2010-05-08 F 1.0 4.0 NaN
2010-05-09 F 1.0 NaN 5.0
If you have to merge many DataFrames
it may be best to use reduce
from functools.
from functools import reduce
merged_df = reduce(lambda l,r: l.merge(r, on=['SubjectID', 'Date'], how='outer', suffixes=['_l', '_r']),
[df1, df2 ,df1, df2, df2])
You'll have lots of overlapping columns, but still can resolve them:
merged_df.groupby([x.split('_')[0] for x in merged_df.columns], 1).apply(lambda x: x.mode(1)[0])
Date Gender StudyID SubjectID Test1 Test2
0 2010-05-08 M 1.0 A 1.0 1.0
1 2010-05-10 M 1.0 A 2.0 3.0
2 2010-05-08 M 1.0 B 3.0 4.0
3 2010-05-08 F 1.0 C 4.0 NaN
4 2010-05-09 M 1.0 A NaN 2.0
5 2010-05-09 F 1.0 C NaN 5.0
edited Jan 18 at 18:30
answered Jan 18 at 16:35
ALollzALollz
11.9k31536
11.9k31536
Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, themodal
solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?
– m42op64
Jan 18 at 17:24
@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution withreduce
work for you? In my example, I specified 5DataFrames
to merge, and the resulting mode still works.
– ALollz
Jan 18 at 18:31
add a comment |
Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, themodal
solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?
– m42op64
Jan 18 at 17:24
@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution withreduce
work for you? In my example, I specified 5DataFrames
to merge, and the resulting mode still works.
– ALollz
Jan 18 at 18:31
Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the
modal
solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?– m42op64
Jan 18 at 17:24
Thanks, both of these are good solutions. However, when I try to merge an odd number of dataframes, the
modal
solution returns an empty DataFrame, but produces the expected result when I merge an even number of dataframes. Any idea why this is the case?– m42op64
Jan 18 at 17:24
@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with
reduce
work for you? In my example, I specified 5 DataFrames
to merge, and the resulting mode still works.– ALollz
Jan 18 at 18:31
@m42op64 see my update. I'm not entirely sure what is going wrong, perhaps the names are being messed up with multiple merges. Does my solution with
reduce
work for you? In my example, I specified 5 DataFrames
to merge, and the resulting mode still works.– ALollz
Jan 18 at 18:31
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54257694%2fjoining-time-series-dataframes-where-duplicate-columns-contain-the-same-values%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown