Combine/merge DataFrames with different indexes and different column names
I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge
and join
methods but I did not achieve the desired result.
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
Console output:
A B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1
A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34
I'm looking for a way to compute the following: Via the index of df
I can look up in column A2
of df2
the value of B2
which should be added to df
.
Desired result:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
(This is only dummy data, just duplicating the index and write it in column B2
of df
is not sufficient)
pandas dataframe
add a comment |
I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge
and join
methods but I did not achieve the desired result.
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
Console output:
A B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1
A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34
I'm looking for a way to compute the following: Via the index of df
I can look up in column A2
of df2
the value of B2
which should be added to df
.
Desired result:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
(This is only dummy data, just duplicating the index and write it in column B2
of df
is not sufficient)
pandas dataframe
1
where isDD
come from
– W-B
Jan 18 at 20:28
df2 is duplicated with respect to'A2'
so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is justdrop_duplicates
good or can A ever be mapped to multiple values like AA and AB?)
– ALollz
Jan 18 at 20:29
@W-B i fixed the issue in the MWE
– d4rty
Jan 18 at 20:30
1
@ALollz Indf2
if two rows have the sameA2
value than it follows that they also have the sameB2
value. This is guaranteed by the data
– d4rty
Jan 18 at 20:32
add a comment |
I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge
and join
methods but I did not achieve the desired result.
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
Console output:
A B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1
A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34
I'm looking for a way to compute the following: Via the index of df
I can look up in column A2
of df2
the value of B2
which should be added to df
.
Desired result:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
(This is only dummy data, just duplicating the index and write it in column B2
of df
is not sufficient)
pandas dataframe
I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge
and join
methods but I did not achieve the desired result.
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
Console output:
A B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1
A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34
I'm looking for a way to compute the following: Via the index of df
I can look up in column A2
of df2
the value of B2
which should be added to df
.
Desired result:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
(This is only dummy data, just duplicating the index and write it in column B2
of df
is not sufficient)
pandas dataframe
pandas dataframe
edited Jan 18 at 20:30
d4rty
asked Jan 18 at 20:22
d4rtyd4rty
1,32821232
1,32821232
1
where isDD
come from
– W-B
Jan 18 at 20:28
df2 is duplicated with respect to'A2'
so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is justdrop_duplicates
good or can A ever be mapped to multiple values like AA and AB?)
– ALollz
Jan 18 at 20:29
@W-B i fixed the issue in the MWE
– d4rty
Jan 18 at 20:30
1
@ALollz Indf2
if two rows have the sameA2
value than it follows that they also have the sameB2
value. This is guaranteed by the data
– d4rty
Jan 18 at 20:32
add a comment |
1
where isDD
come from
– W-B
Jan 18 at 20:28
df2 is duplicated with respect to'A2'
so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is justdrop_duplicates
good or can A ever be mapped to multiple values like AA and AB?)
– ALollz
Jan 18 at 20:29
@W-B i fixed the issue in the MWE
– d4rty
Jan 18 at 20:30
1
@ALollz Indf2
if two rows have the sameA2
value than it follows that they also have the sameB2
value. This is guaranteed by the data
– d4rty
Jan 18 at 20:32
1
1
where is
DD
come from– W-B
Jan 18 at 20:28
where is
DD
come from– W-B
Jan 18 at 20:28
df2 is duplicated with respect to
'A2'
so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates
good or can A ever be mapped to multiple values like AA and AB?)– ALollz
Jan 18 at 20:29
df2 is duplicated with respect to
'A2'
so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates
good or can A ever be mapped to multiple values like AA and AB?)– ALollz
Jan 18 at 20:29
@W-B i fixed the issue in the MWE
– d4rty
Jan 18 at 20:30
@W-B i fixed the issue in the MWE
– d4rty
Jan 18 at 20:30
1
1
@ALollz In
df2
if two rows have the same A2
value than it follows that they also have the same B2
value. This is guaranteed by the data– d4rty
Jan 18 at 20:32
@ALollz In
df2
if two rows have the same A2
value than it follows that they also have the same B2
value. This is guaranteed by the data– d4rty
Jan 18 at 20:32
add a comment |
2 Answers
2
active
oldest
votes
set_index
and assign it
df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
df
Out[728]:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
add a comment |
I know this has been already answered by W-B in a very elegant way.
However, since I have spent the time to solve this in a less professional way, let me post also my solution.
From:
I'm looking for a way to compute the following: Via the index of df I
can look up in column A2 of df2 the value of B2 which should be added
to df.
I understood I should do:
- get index list form df. So A, B, C...
- look values in df2['B2'] in the same index than df2['A2'] for each element of df index
- create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']
This is my code:
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
llista=
for i in df.index:
m=df2['A2'].loc[df2['A2']==i].index
if m[0]:
print m[0],i
llista.append(df2['B2'].iloc[m[0]])
else:
llista.append()
df['B2'] = llista
Output is:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0
E 0 0 EE
F 1 1 FF
G 0 1 GG
As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']
Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.
– stram hp
Jan 19 at 14:23
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54260920%2fcombine-merge-dataframes-with-different-indexes-and-different-column-names%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
set_index
and assign it
df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
df
Out[728]:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
add a comment |
set_index
and assign it
df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
df
Out[728]:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
add a comment |
set_index
and assign it
df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
df
Out[728]:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
set_index
and assign it
df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
df
Out[728]:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG
answered Jan 18 at 20:32
W-BW-B
107k83165
107k83165
add a comment |
add a comment |
I know this has been already answered by W-B in a very elegant way.
However, since I have spent the time to solve this in a less professional way, let me post also my solution.
From:
I'm looking for a way to compute the following: Via the index of df I
can look up in column A2 of df2 the value of B2 which should be added
to df.
I understood I should do:
- get index list form df. So A, B, C...
- look values in df2['B2'] in the same index than df2['A2'] for each element of df index
- create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']
This is my code:
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
llista=
for i in df.index:
m=df2['A2'].loc[df2['A2']==i].index
if m[0]:
print m[0],i
llista.append(df2['B2'].iloc[m[0]])
else:
llista.append()
df['B2'] = llista
Output is:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0
E 0 0 EE
F 1 1 FF
G 0 1 GG
As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']
Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.
– stram hp
Jan 19 at 14:23
add a comment |
I know this has been already answered by W-B in a very elegant way.
However, since I have spent the time to solve this in a less professional way, let me post also my solution.
From:
I'm looking for a way to compute the following: Via the index of df I
can look up in column A2 of df2 the value of B2 which should be added
to df.
I understood I should do:
- get index list form df. So A, B, C...
- look values in df2['B2'] in the same index than df2['A2'] for each element of df index
- create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']
This is my code:
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
llista=
for i in df.index:
m=df2['A2'].loc[df2['A2']==i].index
if m[0]:
print m[0],i
llista.append(df2['B2'].iloc[m[0]])
else:
llista.append()
df['B2'] = llista
Output is:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0
E 0 0 EE
F 1 1 FF
G 0 1 GG
As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']
Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.
– stram hp
Jan 19 at 14:23
add a comment |
I know this has been already answered by W-B in a very elegant way.
However, since I have spent the time to solve this in a less professional way, let me post also my solution.
From:
I'm looking for a way to compute the following: Via the index of df I
can look up in column A2 of df2 the value of B2 which should be added
to df.
I understood I should do:
- get index list form df. So A, B, C...
- look values in df2['B2'] in the same index than df2['A2'] for each element of df index
- create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']
This is my code:
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
llista=
for i in df.index:
m=df2['A2'].loc[df2['A2']==i].index
if m[0]:
print m[0],i
llista.append(df2['B2'].iloc[m[0]])
else:
llista.append()
df['B2'] = llista
Output is:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0
E 0 0 EE
F 1 1 FF
G 0 1 GG
As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']
I know this has been already answered by W-B in a very elegant way.
However, since I have spent the time to solve this in a less professional way, let me post also my solution.
From:
I'm looking for a way to compute the following: Via the index of df I
can look up in column A2 of df2 the value of B2 which should be added
to df.
I understood I should do:
- get index list form df. So A, B, C...
- look values in df2['B2'] in the same index than df2['A2'] for each element of df index
- create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']
This is my code:
import pandas as pd
d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)
d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}
df2 = pd.DataFrame(data=d)
print(df2)
llista=
for i in df.index:
m=df2['A2'].loc[df2['A2']==i].index
if m[0]:
print m[0],i
llista.append(df2['B2'].iloc[m[0]])
else:
llista.append()
df['B2'] = llista
Output is:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0
E 0 0 EE
F 1 1 FF
G 0 1 GG
As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']
answered Jan 18 at 21:20
danieldaniel
4471717
4471717
Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.
– stram hp
Jan 19 at 14:23
add a comment |
Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.
– stram hp
Jan 19 at 14:23
Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.
– stram hp
Jan 19 at 14:23
Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.
– stram hp
Jan 19 at 14:23
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54260920%2fcombine-merge-dataframes-with-different-indexes-and-different-column-names%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
where is
DD
come from– W-B
Jan 18 at 20:28
df2 is duplicated with respect to
'A2'
so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is justdrop_duplicates
good or can A ever be mapped to multiple values like AA and AB?)– ALollz
Jan 18 at 20:29
@W-B i fixed the issue in the MWE
– d4rty
Jan 18 at 20:30
1
@ALollz In
df2
if two rows have the sameA2
value than it follows that they also have the sameB2
value. This is guaranteed by the data– d4rty
Jan 18 at 20:32