Fastest way to sort each row in a pandas dataframe












9















I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.



So something like this:



A   B   C   D
3 4 8 1
9 2 7 2


Needs to become:



A   B   C   D
8 4 3 1
9 7 2 2


Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?










share|improve this question























  • Transpose it, sort it, transpose it back?

    – Jon Clements
    Sep 12 '14 at 22:51











  • How would transposing it make the sorting quicker?

    – Luke
    Sep 12 '14 at 22:53











  • You just change the "view" of the mapping... so you still need to do the sort, but you turn a 1mx100 into 100x1m in the same space, sort that, then reversing it, you just have the different view on the data back

    – Jon Clements
    Sep 12 '14 at 22:56













  • I'm still confused. I would just have to sort a million columns instead of a million rows.

    – Luke
    Sep 12 '14 at 23:05
















9















I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.



So something like this:



A   B   C   D
3 4 8 1
9 2 7 2


Needs to become:



A   B   C   D
8 4 3 1
9 7 2 2


Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?










share|improve this question























  • Transpose it, sort it, transpose it back?

    – Jon Clements
    Sep 12 '14 at 22:51











  • How would transposing it make the sorting quicker?

    – Luke
    Sep 12 '14 at 22:53











  • You just change the "view" of the mapping... so you still need to do the sort, but you turn a 1mx100 into 100x1m in the same space, sort that, then reversing it, you just have the different view on the data back

    – Jon Clements
    Sep 12 '14 at 22:56













  • I'm still confused. I would just have to sort a million columns instead of a million rows.

    – Luke
    Sep 12 '14 at 23:05














9












9








9


1






I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.



So something like this:



A   B   C   D
3 4 8 1
9 2 7 2


Needs to become:



A   B   C   D
8 4 3 1
9 7 2 2


Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?










share|improve this question














I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.



So something like this:



A   B   C   D
3 4 8 1
9 2 7 2


Needs to become:



A   B   C   D
8 4 3 1
9 7 2 2


Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?







python performance pandas






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Sep 12 '14 at 22:45









LukeLuke

1,60042346




1,60042346













  • Transpose it, sort it, transpose it back?

    – Jon Clements
    Sep 12 '14 at 22:51











  • How would transposing it make the sorting quicker?

    – Luke
    Sep 12 '14 at 22:53











  • You just change the "view" of the mapping... so you still need to do the sort, but you turn a 1mx100 into 100x1m in the same space, sort that, then reversing it, you just have the different view on the data back

    – Jon Clements
    Sep 12 '14 at 22:56













  • I'm still confused. I would just have to sort a million columns instead of a million rows.

    – Luke
    Sep 12 '14 at 23:05



















  • Transpose it, sort it, transpose it back?

    – Jon Clements
    Sep 12 '14 at 22:51











  • How would transposing it make the sorting quicker?

    – Luke
    Sep 12 '14 at 22:53











  • You just change the "view" of the mapping... so you still need to do the sort, but you turn a 1mx100 into 100x1m in the same space, sort that, then reversing it, you just have the different view on the data back

    – Jon Clements
    Sep 12 '14 at 22:56













  • I'm still confused. I would just have to sort a million columns instead of a million rows.

    – Luke
    Sep 12 '14 at 23:05

















Transpose it, sort it, transpose it back?

– Jon Clements
Sep 12 '14 at 22:51





Transpose it, sort it, transpose it back?

– Jon Clements
Sep 12 '14 at 22:51













How would transposing it make the sorting quicker?

– Luke
Sep 12 '14 at 22:53





How would transposing it make the sorting quicker?

– Luke
Sep 12 '14 at 22:53













You just change the "view" of the mapping... so you still need to do the sort, but you turn a 1mx100 into 100x1m in the same space, sort that, then reversing it, you just have the different view on the data back

– Jon Clements
Sep 12 '14 at 22:56







You just change the "view" of the mapping... so you still need to do the sort, but you turn a 1mx100 into 100x1m in the same space, sort that, then reversing it, you just have the different view on the data back

– Jon Clements
Sep 12 '14 at 22:56















I'm still confused. I would just have to sort a million columns instead of a million rows.

– Luke
Sep 12 '14 at 23:05





I'm still confused. I would just have to sort a million columns instead of a million rows.

– Luke
Sep 12 '14 at 23:05












3 Answers
3






active

oldest

votes


















12














I think I would do this in numpy:



In [11]: a = df.values

In [12]: a.sort(axis=1) # no ascending argument

In [13]: a = a[:, ::-1] # so reverse

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
[9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
A B C D
0 8 4 3 1
1 9 7 2 2




I had thought this might work, but it sorts the columns:



In [21]: df.sort(axis=1, ascending=False)
Out[21]:
D C B A
0 1 8 4 3
1 2 7 2 9


Ah, pandas raises:



In [22]: df.sort(df.columns, axis=1, ascending=False)



ValueError: When sorting by column, axis must be 0 (rows)







share|improve this answer

































    4














    To Add to the answer given by @Andy-Hayden, to do this inplace to the whole frame... not really sure why this works, but it does. There seems to be no control on the order.



        In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

    In [98]: A
    Out[98]:
    one two three four five
    0 22 63 72 46 49
    1 43 30 69 33 25
    2 93 24 21 56 39
    3 3 57 52 11 74

    In [99]: A.values.sort
    Out[99]: <function ndarray.sort>

    In [100]: A
    Out[100]:
    one two three four five
    0 22 63 72 46 49
    1 43 30 69 33 25
    2 93 24 21 56 39
    3 3 57 52 11 74

    In [101]: A.values.sort()

    In [102]: A
    Out[102]:
    one two three four five
    0 22 46 49 63 72
    1 25 30 33 43 69
    2 21 24 39 56 93
    3 3 11 52 57 74
    In [103]: A = A.iloc[:,::-1]

    In [104]: A
    Out[104]:
    five four three two one
    0 72 63 49 46 22
    1 69 43 33 30 25
    2 93 56 39 24 21
    3 74 57 52 11 3


    I hope someone can explain the why of this, just happy that it works 8)






    share|improve this answer
























    • A.values returns the numpy representation of A, so this sort is just a numpy sort, done in place.

      – ptrj
      May 6 '16 at 17:30



















    1














    You could use pd.apply.



    Eg:

    A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
    print (A)

    one two three four five
    0 2 75 44 53 46
    1 18 51 73 80 66
    2 35 91 86 44 25
    3 60 97 57 33 79

    A = A.apply(np.sort, axis = 1)
    print(A)

    one two three four five
    0 2 44 46 53 75
    1 18 51 66 73 80
    2 25 35 44 86 91
    3 33 57 60 79 97


    Since you want it in descending order, you can simply multiply the dataframe with -1 and sort it.



    A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
    A = A * -1
    A = A.apply(np.sort, axis = 1)
    A = A * -1





    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f25817930%2ffastest-way-to-sort-each-row-in-a-pandas-dataframe%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      12














      I think I would do this in numpy:



      In [11]: a = df.values

      In [12]: a.sort(axis=1) # no ascending argument

      In [13]: a = a[:, ::-1] # so reverse

      In [14]: a
      Out[14]:
      array([[8, 4, 3, 1],
      [9, 7, 2, 2]])

      In [15]: pd.DataFrame(a, df.index, df.columns)
      Out[15]:
      A B C D
      0 8 4 3 1
      1 9 7 2 2




      I had thought this might work, but it sorts the columns:



      In [21]: df.sort(axis=1, ascending=False)
      Out[21]:
      D C B A
      0 1 8 4 3
      1 2 7 2 9


      Ah, pandas raises:



      In [22]: df.sort(df.columns, axis=1, ascending=False)



      ValueError: When sorting by column, axis must be 0 (rows)







      share|improve this answer






























        12














        I think I would do this in numpy:



        In [11]: a = df.values

        In [12]: a.sort(axis=1) # no ascending argument

        In [13]: a = a[:, ::-1] # so reverse

        In [14]: a
        Out[14]:
        array([[8, 4, 3, 1],
        [9, 7, 2, 2]])

        In [15]: pd.DataFrame(a, df.index, df.columns)
        Out[15]:
        A B C D
        0 8 4 3 1
        1 9 7 2 2




        I had thought this might work, but it sorts the columns:



        In [21]: df.sort(axis=1, ascending=False)
        Out[21]:
        D C B A
        0 1 8 4 3
        1 2 7 2 9


        Ah, pandas raises:



        In [22]: df.sort(df.columns, axis=1, ascending=False)



        ValueError: When sorting by column, axis must be 0 (rows)







        share|improve this answer




























          12












          12








          12







          I think I would do this in numpy:



          In [11]: a = df.values

          In [12]: a.sort(axis=1) # no ascending argument

          In [13]: a = a[:, ::-1] # so reverse

          In [14]: a
          Out[14]:
          array([[8, 4, 3, 1],
          [9, 7, 2, 2]])

          In [15]: pd.DataFrame(a, df.index, df.columns)
          Out[15]:
          A B C D
          0 8 4 3 1
          1 9 7 2 2




          I had thought this might work, but it sorts the columns:



          In [21]: df.sort(axis=1, ascending=False)
          Out[21]:
          D C B A
          0 1 8 4 3
          1 2 7 2 9


          Ah, pandas raises:



          In [22]: df.sort(df.columns, axis=1, ascending=False)



          ValueError: When sorting by column, axis must be 0 (rows)







          share|improve this answer















          I think I would do this in numpy:



          In [11]: a = df.values

          In [12]: a.sort(axis=1) # no ascending argument

          In [13]: a = a[:, ::-1] # so reverse

          In [14]: a
          Out[14]:
          array([[8, 4, 3, 1],
          [9, 7, 2, 2]])

          In [15]: pd.DataFrame(a, df.index, df.columns)
          Out[15]:
          A B C D
          0 8 4 3 1
          1 9 7 2 2




          I had thought this might work, but it sorts the columns:



          In [21]: df.sort(axis=1, ascending=False)
          Out[21]:
          D C B A
          0 1 8 4 3
          1 2 7 2 9


          Ah, pandas raises:



          In [22]: df.sort(df.columns, axis=1, ascending=False)



          ValueError: When sorting by column, axis must be 0 (rows)








          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 2 days ago









          nick

          528414




          528414










          answered Sep 12 '14 at 23:06









          Andy HaydenAndy Hayden

          179k51422411




          179k51422411

























              4














              To Add to the answer given by @Andy-Hayden, to do this inplace to the whole frame... not really sure why this works, but it does. There seems to be no control on the order.



                  In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

              In [98]: A
              Out[98]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [99]: A.values.sort
              Out[99]: <function ndarray.sort>

              In [100]: A
              Out[100]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [101]: A.values.sort()

              In [102]: A
              Out[102]:
              one two three four five
              0 22 46 49 63 72
              1 25 30 33 43 69
              2 21 24 39 56 93
              3 3 11 52 57 74
              In [103]: A = A.iloc[:,::-1]

              In [104]: A
              Out[104]:
              five four three two one
              0 72 63 49 46 22
              1 69 43 33 30 25
              2 93 56 39 24 21
              3 74 57 52 11 3


              I hope someone can explain the why of this, just happy that it works 8)






              share|improve this answer
























              • A.values returns the numpy representation of A, so this sort is just a numpy sort, done in place.

                – ptrj
                May 6 '16 at 17:30
















              4














              To Add to the answer given by @Andy-Hayden, to do this inplace to the whole frame... not really sure why this works, but it does. There seems to be no control on the order.



                  In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

              In [98]: A
              Out[98]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [99]: A.values.sort
              Out[99]: <function ndarray.sort>

              In [100]: A
              Out[100]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [101]: A.values.sort()

              In [102]: A
              Out[102]:
              one two three four five
              0 22 46 49 63 72
              1 25 30 33 43 69
              2 21 24 39 56 93
              3 3 11 52 57 74
              In [103]: A = A.iloc[:,::-1]

              In [104]: A
              Out[104]:
              five four three two one
              0 72 63 49 46 22
              1 69 43 33 30 25
              2 93 56 39 24 21
              3 74 57 52 11 3


              I hope someone can explain the why of this, just happy that it works 8)






              share|improve this answer
























              • A.values returns the numpy representation of A, so this sort is just a numpy sort, done in place.

                – ptrj
                May 6 '16 at 17:30














              4












              4








              4







              To Add to the answer given by @Andy-Hayden, to do this inplace to the whole frame... not really sure why this works, but it does. There seems to be no control on the order.



                  In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

              In [98]: A
              Out[98]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [99]: A.values.sort
              Out[99]: <function ndarray.sort>

              In [100]: A
              Out[100]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [101]: A.values.sort()

              In [102]: A
              Out[102]:
              one two three four five
              0 22 46 49 63 72
              1 25 30 33 43 69
              2 21 24 39 56 93
              3 3 11 52 57 74
              In [103]: A = A.iloc[:,::-1]

              In [104]: A
              Out[104]:
              five four three two one
              0 72 63 49 46 22
              1 69 43 33 30 25
              2 93 56 39 24 21
              3 74 57 52 11 3


              I hope someone can explain the why of this, just happy that it works 8)






              share|improve this answer













              To Add to the answer given by @Andy-Hayden, to do this inplace to the whole frame... not really sure why this works, but it does. There seems to be no control on the order.



                  In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

              In [98]: A
              Out[98]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [99]: A.values.sort
              Out[99]: <function ndarray.sort>

              In [100]: A
              Out[100]:
              one two three four five
              0 22 63 72 46 49
              1 43 30 69 33 25
              2 93 24 21 56 39
              3 3 57 52 11 74

              In [101]: A.values.sort()

              In [102]: A
              Out[102]:
              one two three four five
              0 22 46 49 63 72
              1 25 30 33 43 69
              2 21 24 39 56 93
              3 3 11 52 57 74
              In [103]: A = A.iloc[:,::-1]

              In [104]: A
              Out[104]:
              five four three two one
              0 72 63 49 46 22
              1 69 43 33 30 25
              2 93 56 39 24 21
              3 74 57 52 11 3


              I hope someone can explain the why of this, just happy that it works 8)







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Apr 24 '15 at 7:56









              SpmPSpmP

              322211




              322211













              • A.values returns the numpy representation of A, so this sort is just a numpy sort, done in place.

                – ptrj
                May 6 '16 at 17:30



















              • A.values returns the numpy representation of A, so this sort is just a numpy sort, done in place.

                – ptrj
                May 6 '16 at 17:30

















              A.values returns the numpy representation of A, so this sort is just a numpy sort, done in place.

              – ptrj
              May 6 '16 at 17:30





              A.values returns the numpy representation of A, so this sort is just a numpy sort, done in place.

              – ptrj
              May 6 '16 at 17:30











              1














              You could use pd.apply.



              Eg:

              A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
              print (A)

              one two three four five
              0 2 75 44 53 46
              1 18 51 73 80 66
              2 35 91 86 44 25
              3 60 97 57 33 79

              A = A.apply(np.sort, axis = 1)
              print(A)

              one two three four five
              0 2 44 46 53 75
              1 18 51 66 73 80
              2 25 35 44 86 91
              3 33 57 60 79 97


              Since you want it in descending order, you can simply multiply the dataframe with -1 and sort it.



              A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
              A = A * -1
              A = A.apply(np.sort, axis = 1)
              A = A * -1





              share|improve this answer




























                1














                You could use pd.apply.



                Eg:

                A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
                print (A)

                one two three four five
                0 2 75 44 53 46
                1 18 51 73 80 66
                2 35 91 86 44 25
                3 60 97 57 33 79

                A = A.apply(np.sort, axis = 1)
                print(A)

                one two three four five
                0 2 44 46 53 75
                1 18 51 66 73 80
                2 25 35 44 86 91
                3 33 57 60 79 97


                Since you want it in descending order, you can simply multiply the dataframe with -1 and sort it.



                A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
                A = A * -1
                A = A.apply(np.sort, axis = 1)
                A = A * -1





                share|improve this answer


























                  1












                  1








                  1







                  You could use pd.apply.



                  Eg:

                  A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
                  print (A)

                  one two three four five
                  0 2 75 44 53 46
                  1 18 51 73 80 66
                  2 35 91 86 44 25
                  3 60 97 57 33 79

                  A = A.apply(np.sort, axis = 1)
                  print(A)

                  one two three four five
                  0 2 44 46 53 75
                  1 18 51 66 73 80
                  2 25 35 44 86 91
                  3 33 57 60 79 97


                  Since you want it in descending order, you can simply multiply the dataframe with -1 and sort it.



                  A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
                  A = A * -1
                  A = A.apply(np.sort, axis = 1)
                  A = A * -1





                  share|improve this answer













                  You could use pd.apply.



                  Eg:

                  A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
                  print (A)

                  one two three four five
                  0 2 75 44 53 46
                  1 18 51 73 80 66
                  2 35 91 86 44 25
                  3 60 97 57 33 79

                  A = A.apply(np.sort, axis = 1)
                  print(A)

                  one two three four five
                  0 2 44 46 53 75
                  1 18 51 66 73 80
                  2 25 35 44 86 91
                  3 33 57 60 79 97


                  Since you want it in descending order, you can simply multiply the dataframe with -1 and sort it.



                  A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
                  A = A * -1
                  A = A.apply(np.sort, axis = 1)
                  A = A * -1






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Mar 1 '16 at 19:21









                  Pradeep VairamaniPradeep Vairamani

                  1,97422242




                  1,97422242






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f25817930%2ffastest-way-to-sort-each-row-in-a-pandas-dataframe%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Liquibase includeAll doesn't find base path

                      How to use setInterval in EJS file?

                      Petrus Granier-Deferre