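In this article, we will see how to rank rows in a pandas DataFrame based on multiple columns. We group the DataFrame on some columns with groupby() and then call the rank() method on another column, with the method parameter set to 'min' and the ascending parameter set to False. With method='min', tied values all receive the lowest rank in the tie, and with ascending=False the largest value is ranked first.
The columns that define the groups are passed to groupby() as a list of column names (its by parameter). The rank() method itself has no by parameter.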
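import pandas as pd

# Sample data: A and B define the groups, C holds the values to rank
df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 3], 'B': [4, 5, 5, 6, 6, 6], 'C': [7, 8, 9, 10, 11, 12]})
# Rank C within each (A, B) group, largest value first, then sort by A and B
ranked_df = df.assign(rank=df.groupby(['A', 'B'])['C'].rank(method='min', ascending=False)).sort_values(['A', 'B'])
print(ranked_df)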
In the above example, we created the pandas DataFrame df with three columns A, B, and C. We used the groupby() method to group the DataFrame by columns A and B, and the rank() method to assign each row a rank within its group based on the value in column C.
Using the assign() method, we add the rank column to the DataFrame, and we use the sort_values() method to sort the DataFrame by columns A and B.
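The following is the DataFrame printed on the console, with the rows ranked by column C within each combination of columns A and B. Because ascending is set to False, the largest C value in each group receives rank 1.
   A  B   C  rank
0  1  4   7   1.0
1  2  5   8   2.0
2  2  5   9   1.0
3  3  6  10   3.0
4  3  6  11   2.0
5  3  6  12   1.0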
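
The sample data has no repeated C values inside a group, so the effect of method='min' is not visible in it. The following is a minimal sketch of how ties behave, using a small hypothetical DataFrame named ties (not part of the original example) and comparing method='min' with method='dense' for contrast.

```python
import pandas as pd

# Hypothetical data: a single (A, B) group in which the C value 9 appears twice.
ties = pd.DataFrame({
    'A': [2, 2, 2, 2],
    'B': [5, 5, 5, 5],
    'C': [9, 9, 8, 7],
})

grouped = ties.groupby(['A', 'B'])['C']

ranked_ties = ties.assign(
    # method='min': tied rows share the lowest rank of the tie; the next rank is skipped.
    rank_min=grouped.rank(method='min', ascending=False),
    # method='dense': tied rows share a rank and no rank is skipped afterwards.
    rank_dense=grouped.rank(method='dense', ascending=False),
)

print(ranked_ties)
```

Here the two rows with C equal to 9 both get rank 1. With method='min' the next row gets rank 3, while with method='dense' it gets rank 2. Which one to use depends on whether gaps after ties are wanted.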