Combine/merge DataFrames with different indexes and different column names












4















I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge and join methods but I did not achieve the desired result.



import pandas as pd

d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)

d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


df2 = pd.DataFrame(data=d)
print(df2)


Console output:



   A  B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1

A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34


I'm looking for a way to compute the following: Via the index of df I can look up in column A2 of df2 the value of B2 which should be added to df.



Desired result:



   A  B  B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG


(This is only dummy data, just duplicating the index and write it in column B2 of df is not sufficient)










share|improve this question




















  • 1





    where is DD come from

    – W-B
    Jan 18 at 20:28











  • df2 is duplicated with respect to 'A2' so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates good or can A ever be mapped to multiple values like AA and AB?)

    – ALollz
    Jan 18 at 20:29













  • @W-B i fixed the issue in the MWE

    – d4rty
    Jan 18 at 20:30






  • 1





    @ALollz In df2 if two rows have the same A2 value than it follows that they also have the same B2 value. This is guaranteed by the data

    – d4rty
    Jan 18 at 20:32
















4















I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge and join methods but I did not achieve the desired result.



import pandas as pd

d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)

d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


df2 = pd.DataFrame(data=d)
print(df2)


Console output:



   A  B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1

A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34


I'm looking for a way to compute the following: Via the index of df I can look up in column A2 of df2 the value of B2 which should be added to df.



Desired result:



   A  B  B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG


(This is only dummy data, just duplicating the index and write it in column B2 of df is not sufficient)










share|improve this question




















  • 1





    where is DD come from

    – W-B
    Jan 18 at 20:28











  • df2 is duplicated with respect to 'A2' so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates good or can A ever be mapped to multiple values like AA and AB?)

    – ALollz
    Jan 18 at 20:29













  • @W-B i fixed the issue in the MWE

    – d4rty
    Jan 18 at 20:30






  • 1





    @ALollz In df2 if two rows have the same A2 value than it follows that they also have the same B2 value. This is guaranteed by the data

    – d4rty
    Jan 18 at 20:32














4












4








4








I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge and join methods but I did not achieve the desired result.



import pandas as pd

d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)

d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


df2 = pd.DataFrame(data=d)
print(df2)


Console output:



   A  B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1

A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34


I'm looking for a way to compute the following: Via the index of df I can look up in column A2 of df2 the value of B2 which should be added to df.



Desired result:



   A  B  B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG


(This is only dummy data, just duplicating the index and write it in column B2 of df is not sufficient)










share|improve this question
















I have problems to merge two dataframes in the desired way. I unsuccessfully tried out a lot with merge and join methods but I did not achieve the desired result.



import pandas as pd

d = {'A': [1, 1, 0, 1, 0, 1, 0],
'B': [0, 0, 0, 0, 0, 1, 1]
}
df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
print(df)

d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


df2 = pd.DataFrame(data=d)
print(df2)


Console output:



   A  B
A 1 0
B 1 0
C 0 0
D 1 0
E 0 0
F 1 1
G 0 1

A2 B2 C3
0 A AA 1
1 A AA 11
2 B BB 35
3 C CC 53
4 C CC 2
5 E EE 76
6 X XX 45
7 F FF 5
8 G GG 34


I'm looking for a way to compute the following: Via the index of df I can look up in column A2 of df2 the value of B2 which should be added to df.



Desired result:



   A  B  B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG


(This is only dummy data, just duplicating the index and write it in column B2 of df is not sufficient)







pandas dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 18 at 20:30







d4rty

















asked Jan 18 at 20:22









d4rtyd4rty

1,32821232




1,32821232








  • 1





    where is DD come from

    – W-B
    Jan 18 at 20:28











  • df2 is duplicated with respect to 'A2' so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates good or can A ever be mapped to multiple values like AA and AB?)

    – ALollz
    Jan 18 at 20:29













  • @W-B i fixed the issue in the MWE

    – d4rty
    Jan 18 at 20:30






  • 1





    @ALollz In df2 if two rows have the same A2 value than it follows that they also have the same B2 value. This is guaranteed by the data

    – d4rty
    Jan 18 at 20:32














  • 1





    where is DD come from

    – W-B
    Jan 18 at 20:28











  • df2 is duplicated with respect to 'A2' so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates good or can A ever be mapped to multiple values like AA and AB?)

    – ALollz
    Jan 18 at 20:29













  • @W-B i fixed the issue in the MWE

    – d4rty
    Jan 18 at 20:30






  • 1





    @ALollz In df2 if two rows have the same A2 value than it follows that they also have the same B2 value. This is guaranteed by the data

    – d4rty
    Jan 18 at 20:32








1




1





where is DD come from

– W-B
Jan 18 at 20:28





where is DD come from

– W-B
Jan 18 at 20:28













df2 is duplicated with respect to 'A2' so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates good or can A ever be mapped to multiple values like AA and AB?)

– ALollz
Jan 18 at 20:29







df2 is duplicated with respect to 'A2' so it's not guaranteed to be a unique mapping. What do we do in this case (i.e. is just drop_duplicates good or can A ever be mapped to multiple values like AA and AB?)

– ALollz
Jan 18 at 20:29















@W-B i fixed the issue in the MWE

– d4rty
Jan 18 at 20:30





@W-B i fixed the issue in the MWE

– d4rty
Jan 18 at 20:30




1




1





@ALollz In df2 if two rows have the same A2 value than it follows that they also have the same B2 value. This is guaranteed by the data

– d4rty
Jan 18 at 20:32





@ALollz In df2 if two rows have the same A2 value than it follows that they also have the same B2 value. This is guaranteed by the data

– d4rty
Jan 18 at 20:32












2 Answers
2






active

oldest

votes


















3














set_index and assign it



df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
df
Out[728]:
A B B2
A 1 0 AA
B 1 0 BB
C 0 0 CC
D 1 0 DD
E 0 0 EE
F 1 1 FF
G 0 1 GG





share|improve this answer































    1














    I know this has been already answered by W-B in a very elegant way.



    However, since I have spent the time to solve this in a less professional way, let me post also my solution.



    From:




    I'm looking for a way to compute the following: Via the index of df I
    can look up in column A2 of df2 the value of B2 which should be added
    to df.




    I understood I should do:




    1. get index list form df. So A, B, C...

    2. look values in df2['B2'] in the same index than df2['A2'] for each element of df index

    3. create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']


    This is my code:



    import pandas as pd

    d = {'A': [1, 1, 0, 1, 0, 1, 0],
    'B': [0, 0, 0, 0, 0, 1, 1]
    }
    df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
    print(df)

    d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
    'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
    'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


    df2 = pd.DataFrame(data=d)
    print(df2)

    llista=
    for i in df.index:
    m=df2['A2'].loc[df2['A2']==i].index
    if m[0]:
    print m[0],i
    llista.append(df2['B2'].iloc[m[0]])
    else:
    llista.append()
    df['B2'] = llista


    Output is:



       A  B  B2
    A 1 0 AA
    B 1 0 BB
    C 0 0 CC
    D 1 0
    E 0 0 EE
    F 1 1 FF
    G 0 1 GG


    As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']






    share|improve this answer
























    • Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.

      – stram hp
      Jan 19 at 14:23











    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54260920%2fcombine-merge-dataframes-with-different-indexes-and-different-column-names%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    set_index and assign it



    df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
    df
    Out[728]:
    A B B2
    A 1 0 AA
    B 1 0 BB
    C 0 0 CC
    D 1 0 DD
    E 0 0 EE
    F 1 1 FF
    G 0 1 GG





    share|improve this answer




























      3














      set_index and assign it



      df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
      df
      Out[728]:
      A B B2
      A 1 0 AA
      B 1 0 BB
      C 0 0 CC
      D 1 0 DD
      E 0 0 EE
      F 1 1 FF
      G 0 1 GG





      share|improve this answer


























        3












        3








        3







        set_index and assign it



        df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
        df
        Out[728]:
        A B B2
        A 1 0 AA
        B 1 0 BB
        C 0 0 CC
        D 1 0 DD
        E 0 0 EE
        F 1 1 FF
        G 0 1 GG





        share|improve this answer













        set_index and assign it



        df['B2']=df2.drop_duplicates('A2').set_index('A2')['B2']
        df
        Out[728]:
        A B B2
        A 1 0 AA
        B 1 0 BB
        C 0 0 CC
        D 1 0 DD
        E 0 0 EE
        F 1 1 FF
        G 0 1 GG






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 18 at 20:32









        W-BW-B

        107k83165




        107k83165

























            1














            I know this has been already answered by W-B in a very elegant way.



            However, since I have spent the time to solve this in a less professional way, let me post also my solution.



            From:




            I'm looking for a way to compute the following: Via the index of df I
            can look up in column A2 of df2 the value of B2 which should be added
            to df.




            I understood I should do:




            1. get index list form df. So A, B, C...

            2. look values in df2['B2'] in the same index than df2['A2'] for each element of df index

            3. create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']


            This is my code:



            import pandas as pd

            d = {'A': [1, 1, 0, 1, 0, 1, 0],
            'B': [0, 0, 0, 0, 0, 1, 1]
            }
            df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
            print(df)

            d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
            'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
            'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


            df2 = pd.DataFrame(data=d)
            print(df2)

            llista=
            for i in df.index:
            m=df2['A2'].loc[df2['A2']==i].index
            if m[0]:
            print m[0],i
            llista.append(df2['B2'].iloc[m[0]])
            else:
            llista.append()
            df['B2'] = llista


            Output is:



               A  B  B2
            A 1 0 AA
            B 1 0 BB
            C 0 0 CC
            D 1 0
            E 0 0 EE
            F 1 1 FF
            G 0 1 GG


            As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']






            share|improve this answer
























            • Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.

              – stram hp
              Jan 19 at 14:23
















            1














            I know this has been already answered by W-B in a very elegant way.



            However, since I have spent the time to solve this in a less professional way, let me post also my solution.



            From:




            I'm looking for a way to compute the following: Via the index of df I
            can look up in column A2 of df2 the value of B2 which should be added
            to df.




            I understood I should do:




            1. get index list form df. So A, B, C...

            2. look values in df2['B2'] in the same index than df2['A2'] for each element of df index

            3. create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']


            This is my code:



            import pandas as pd

            d = {'A': [1, 1, 0, 1, 0, 1, 0],
            'B': [0, 0, 0, 0, 0, 1, 1]
            }
            df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
            print(df)

            d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
            'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
            'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


            df2 = pd.DataFrame(data=d)
            print(df2)

            llista=
            for i in df.index:
            m=df2['A2'].loc[df2['A2']==i].index
            if m[0]:
            print m[0],i
            llista.append(df2['B2'].iloc[m[0]])
            else:
            llista.append()
            df['B2'] = llista


            Output is:



               A  B  B2
            A 1 0 AA
            B 1 0 BB
            C 0 0 CC
            D 1 0
            E 0 0 EE
            F 1 1 FF
            G 0 1 GG


            As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']






            share|improve this answer
























            • Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.

              – stram hp
              Jan 19 at 14:23














            1












            1








            1







            I know this has been already answered by W-B in a very elegant way.



            However, since I have spent the time to solve this in a less professional way, let me post also my solution.



            From:




            I'm looking for a way to compute the following: Via the index of df I
            can look up in column A2 of df2 the value of B2 which should be added
            to df.




            I understood I should do:




            1. get index list form df. So A, B, C...

            2. look values in df2['B2'] in the same index than df2['A2'] for each element of df index

            3. create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']


            This is my code:



            import pandas as pd

            d = {'A': [1, 1, 0, 1, 0, 1, 0],
            'B': [0, 0, 0, 0, 0, 1, 1]
            }
            df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
            print(df)

            d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
            'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
            'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


            df2 = pd.DataFrame(data=d)
            print(df2)

            llista=
            for i in df.index:
            m=df2['A2'].loc[df2['A2']==i].index
            if m[0]:
            print m[0],i
            llista.append(df2['B2'].iloc[m[0]])
            else:
            llista.append()
            df['B2'] = llista


            Output is:



               A  B  B2
            A 1 0 AA
            B 1 0 BB
            C 0 0 CC
            D 1 0
            E 0 0 EE
            F 1 1 FF
            G 0 1 GG


            As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']






            share|improve this answer













            I know this has been already answered by W-B in a very elegant way.



            However, since I have spent the time to solve this in a less professional way, let me post also my solution.



            From:




            I'm looking for a way to compute the following: Via the index of df I
            can look up in column A2 of df2 the value of B2 which should be added
            to df.




            I understood I should do:




            1. get index list form df. So A, B, C...

            2. look values in df2['B2'] in the same index than df2['A2'] for each element of df index

            3. create a new column ['B2'] in df, where we copy these values from df2['B2'] matching the index from df to the elements on df2['A2']


            This is my code:



            import pandas as pd

            d = {'A': [1, 1, 0, 1, 0, 1, 0],
            'B': [0, 0, 0, 0, 0, 1, 1]
            }
            df = pd.DataFrame(data=d, index=["A", "B", "C", "D", "E", "F", "G"])
            print(df)

            d = {'A2': ["D", "A", "A", "B", "C", "C", "E", "X", "F", "G"],
            'B2': ["DD", "AA", "AA", "BB", "CC", "CC", "EE", "XX", "FF", "GG"],
            'C3': [1, 1, 11, 35, 53, 2, 76, 45, 5, 34]}


            df2 = pd.DataFrame(data=d)
            print(df2)

            llista=
            for i in df.index:
            m=df2['A2'].loc[df2['A2']==i].index
            if m[0]:
            print m[0],i
            llista.append(df2['B2'].iloc[m[0]])
            else:
            llista.append()
            df['B2'] = llista


            Output is:



               A  B  B2
            A 1 0 AA
            B 1 0 BB
            C 0 0 CC
            D 1 0
            E 0 0 EE
            F 1 1 FF
            G 0 1 GG


            As you can see is different than the accepted post. This is because there is no 'D' index in df2['A2']







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Jan 18 at 21:20









            danieldaniel

            4471717




            4471717













            • Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.

              – stram hp
              Jan 19 at 14:23



















            • Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.

              – stram hp
              Jan 19 at 14:23

















            Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.

            – stram hp
            Jan 19 at 14:23





            Both answers are right and they deserve a vote up. However, I have to say that I found this one to be the right one.

            – stram hp
            Jan 19 at 14:23


















            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54260920%2fcombine-merge-dataframes-with-different-indexes-and-different-column-names%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Liquibase includeAll doesn't find base path

            How to use setInterval in EJS file?

            Petrus Granier-Deferre