Python’s pandas library is a popular tool for handling and analyzing data. It supports many kinds of data, including numerical data, and when you work with a large dataset it is often helpful to identify which columns of a dataframe are numeric.
In this article, we will see how to find numerical columns in pandas.
First, let’s create a sample dataframe:
import pandas as pd
import numpy as np
# Sample data: two integer columns (A, B), a string column (C) and a float column (D)
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10],
                   'C': ['a', 'b', 'c', 'd', 'e'],
                   'D': [0.1, 0.2, 0.3, 0.4, 0.5]})
The sample dataframe df will look like this:
   A   B  C    D
0  1   6  a  0.1
1  2   7  b  0.2
2  3   8  c  0.3
3  4   9  d  0.4
4  5  10  e  0.5
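Since the methods below filter columns by their data type, it can help to look at each column's dtype first. The sketch below rebuilds the sample dataframe so it runs on its own and prints df.dtypes. On a typical setup A and B come out as int64, D as float64, and C as object (newer pandas versions may report a dedicated string dtype for C instead), so the exact printout can vary by version.

```python
import pandas as pd

# Rebuild the sample dataframe so this snippet is self-contained
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10],
                   'C': ['a', 'b', 'c', 'd', 'e'],
                   'D': [0.1, 0.2, 0.3, 0.4, 0.5]})

# One dtype per column; this is what select_dtypes filters on
print(df.dtypes)
```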
To find the numeric columns, we can use the select_dtypes method and pass in the argument include='number':
numeric_cols = df.select_dtypes(include='number').columns
In this example, the select_dtypes method selects the columns whose data types are numeric (such as int and float). The resulting numeric_cols is a pandas Index containing the names of the numeric columns:
Index(['A', 'B', 'D'], dtype='object')
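A few closely related variations may also be useful. The sketch below, using the same sample data, shows that include=[np.number] selects the same columns as include='number', that exclude='number' returns the complementary non-numeric columns, and that pandas.api.types.is_numeric_dtype can be applied column by column to build a plain Python list. One difference worth knowing: is_numeric_dtype also returns True for boolean columns, which select_dtypes(include='number') leaves out.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [6, 7, 8, 9, 10],
                   'C': ['a', 'b', 'c', 'd', 'e'],
                   'D': [0.1, 0.2, 0.3, 0.4, 0.5]})

# np.number is the parent type of NumPy's integer and float types,
# so this picks the same columns as include='number'
numeric_np = df.select_dtypes(include=[np.number]).columns

# exclude= keeps every column that is NOT numeric
non_numeric = df.select_dtypes(exclude='number').columns

# Checking each column's dtype directly yields a plain list of names
numeric_list = [col for col in df.columns if is_numeric_dtype(df[col])]

print(list(numeric_np))   # ['A', 'B', 'D']
print(list(non_numeric))  # ['C']
print(numeric_list)       # ['A', 'B', 'D']
```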
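You can also use the .apply method together with the .str.isnumeric method to find numeric columns. Because the .str accessor only works on string values, each column is first converted to strings with .astype(str):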
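# Convert each column to strings, check every value, and keep the columns where all values pass
numeric_cols = df.columns[df.apply(lambda x: x.astype(str).str.isnumeric().all())]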
In this example, the .apply method runs the lambda on each column of the dataframe. Inside it, .str.isnumeric returns a Boolean for every element, indicating whether that string consists only of numeric characters, and .all() reduces these to a single True or False for the whole column. The resulting Boolean Series is then used to index df.columns, which keeps only the names of the columns that passed.
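The resulting numeric_cols is a pandas Index of the matching column names:
Index(['A', 'B'], dtype='object')
Note that column D is missing here. A value such as '0.1' contains a decimal point, which is not a numeric character, so this approach only recognizes columns whose values are written entirely with digits (no decimal point or minus sign).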
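The two approaches answer slightly different questions: select_dtypes looks at how a column is stored, while the string check looks at what the values look like. The sketch below makes that visible with a hypothetical column E that holds digits stored as text. As an extra comparison it also uses pd.to_numeric with errors='coerce', a different technique not covered above, which treats a column as numeric when every value can be parsed as a number.

```python
import pandas as pd

# Hypothetical variant of the sample data: column 'E' stores digits as text
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'C': ['a', 'b', 'c', 'd', 'e'],
                   'D': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'E': ['1', '2', '3', '4', '5']})

# dtype-based: 'E' has a text dtype, so it is skipped
by_dtype = list(df.select_dtypes(include='number').columns)

# content-based: 'E' passes, but 'D' fails because '0.1' contains a '.'
by_content = list(df.columns[df.apply(lambda x: x.astype(str).str.isnumeric().all())])

# parse-based: values that cannot be parsed become NaN, so a column counts
# as numeric only if nothing turned into NaN
by_coerce = list(df.columns[df.apply(lambda x: pd.to_numeric(x, errors='coerce').notna().all())])

print(by_dtype)    # ['A', 'D']
print(by_content)  # ['A', 'E']
print(by_coerce)   # ['A', 'D', 'E']
```

Which one to use depends on whether text columns that merely contain numbers should count as numeric for your task.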