In this article, we will see how to rank rows in a pandas dataframe based on multiple columns.

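We can do this by using the rank() method with the method parameter set to 'min' and the ascending parameter set to False. With these settings the largest value receives rank 1, and tied values share the lowest of the ranks they span.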

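Note that rank() has no by parameter. To bring several columns into the ranking, we group the dataframe by a list of column names with groupby() and then call rank() on the column that should be ranked within each group.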
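To see what the method and ascending parameters do on their own, here is a minimal sketch on a small Series. The scores values are made up for illustration, and 'dense' is shown only for comparison with 'min'.

```python
import pandas as pd

# Hypothetical scores containing a tie, used only for illustration
scores = pd.Series([10, 20, 20, 30])

# method='min': tied values share the lowest rank they span, leaving a gap after the tie
min_rank = scores.rank(method='min', ascending=False)

# method='dense': tied values share a rank and the next rank follows without a gap
dense_rank = scores.rank(method='dense', ascending=False)

print(pd.DataFrame({'score': scores, 'min': min_rank, 'dense': dense_rank}))
```

With ascending=False the score 30 is ranked first in both cases. The two methods differ only in the rank given to the value that follows the tie.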

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': [4, 5, 5, 6, 6, 6], 'C': [7, 8, 9, 10, 11, 12]})
# Rank C within each (A, B) group; the largest C in a group gets rank 1
ranked_df = df.assign(rank=df.groupby(['A', 'B'])['C'].rank(method='min', ascending=False)).sort_values(['A', 'B'])
print(ranked_df)

In the above example, we created the pandas dataframe df with three columns A, B, and C. We used the groupby() method to group the dataframe by columns A and B, and the rank() method to rank each row within its group based on column C. Because ascending is set to False, the largest C value in each group gets rank 1.

Using the assign() method, we add the rank column to the dataframe, and with the sort_values() method we sort the dataframe by columns A and B.

Following is the dataframe printed on the console, with the rows ranked by column C within each group of columns A and B.

   A  B   C  rank
0  1  4   7   1.0
1  2  5   8   2.0
2  2  5   9   1.0
3  3  6  10   3.0
4  3  6  11   2.0
5  3  6  12   1.0
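If we instead want a single rank for each distinct combination of A and B, rather than a rank inside each group, one possible approach is to number the groups with ngroup(). This is a sketch under that assumption, not the only way to do it, and the column name pair_rank is chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3],
                   'B': [4, 5, 5, 6, 6, 6],
                   'C': [7, 8, 9, 10, 11, 12]})

# ngroup() numbers each distinct (A, B) pair in sorted key order, starting at 0,
# so adding 1 gives a gap-free rank over the pairs
overall = df.assign(pair_rank=df.groupby(['A', 'B']).ngroup() + 1)
print(overall)
```

Rows that share the same A and B values receive the same pair_rank, and the smallest pair is ranked first.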
