In this article, we will discuss how to check for NaN values in pandas data frames.
In pandas, it is essential to identify and handle NaN values in data frames to ensure accurate and reliable analysis.
Checking for NaN values
We can use the isna() method to check for NaN values in a pandas dataframe. This method returns a Boolean mask with the same shape as the dataframe that indicates where the NaN values are located. It returns True for NaN values and False for non-NaN values.
See the example below:
import pandas as pd
# create a pandas dataframe
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, None, 9, 10]})
# check for NaN values
print(df.isna())
In the above example, we created a pandas dataframe with two columns. The None entry in col2 is stored by pandas as NaN, because col2 is a numeric column. We then used the isna() method to check for NaN values in the dataframe.
Output:
    col1   col2
0  False  False
1  False  False
2  False   True
3  False  False
4  False  False
The above output shows a Boolean mask indicating that the third row (index 2) of the col2 column contains a NaN value.
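On a larger dataframe, the full Boolean mask is hard to read, so it is often summarized. The following is a minimal sketch, reusing the same sample dataframe, of how the mask returned by isna() could be reduced to a per-column count, a single yes/no answer, and the rows that contain a NaN. The variable names (mask, nan_per_column, has_nan, rows_with_nan) are illustrative choices, not part of pandas.

```python
import pandas as pd

# same sample dataframe as in the example above
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, None, 9, 10]})

# Boolean mask: True where a value is NaN
mask = df.isna()

# True counts as 1 when summed, so this gives the number of NaN values per column
nan_per_column = mask.sum()

# reduce over columns, then over the result: True if the dataframe has any NaN at all
has_nan = mask.any().any()

# keep only the rows in which at least one column is NaN
rows_with_nan = df[mask.any(axis=1)]

print(nan_per_column)
print(has_nan)
print(rows_with_nan)
```

Here nan_per_column reports one NaN in col2 and none in col1, and rows_with_nan contains only the row at index 2.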
As an alternative, we can also use the isnull() method, which is an alias for the isna() method.
import pandas as pd
# create a pandas dataframe
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, None, 9, 10]})
# check for NaN values
print(df.isnull())
Output:
    col1   col2
0  False  False
1  False  False
2  False   True
3  False  False
4  False  False
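Since isnull() is an alias for isna(), the two masks should always be identical. A short sketch of how this could be confirmed on the sample dataframe, using the DataFrame.equals() method (the variable name same_mask is an illustrative choice):

```python
import pandas as pd

# same sample dataframe as in the examples above
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, None, 9, 10]})

# equals() compares shape and elements of the two Boolean masks
same_mask = df.isna().equals(df.isnull())

print(same_mask)
```

Which of the two names to use is a matter of preference; using one consistently within a project keeps the code easier to read.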
Handling NaN values
Once we have identified the NaN values in a pandas dataframe, we can handle them in different ways. One common approach is to fill the NaN values with a specific value, such as a mean, median, or mode value.
For example, you can use the fillna() method to fill the NaN values in a column with the mean value of the column.
import pandas as pd
# create a pandas dataframe
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, None, 9, 10]})
# fill NaN values with the mean of the column
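df['col2'] = df['col2'].fillna(df['col2'].mean())
# print the dataframe
print(df)
Output:
   col1  col2
0     1   6.0
1     2   7.0
2     3   8.0
3     4   9.0
4     5  10.0
In the above example, we used the fillna() method to fill the NaN values in the col2 column with the mean value of the column (8.0, the average of 6, 7, 9, and 10).
The original dataframe is modified because the result is assigned back to df['col2']. Calling fillna() with the parameter inplace=True on a selected column, as in df['col2'].fillna(..., inplace=True), is best avoided: recent pandas versions warn about this pattern, and with Copy-on-Write enabled it does not update the original dataframe.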
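The median and mode mentioned earlier can be used in the same way as the mean. The sketch below assumes a slightly different sample column (with the value 6 repeated, so that the mode is well defined); that data is made up for illustration. Note that mode() returns a Series because several values can tie for most frequent, and taking the first one with .iloc[0] is a simplifying assumption.

```python
import pandas as pd

# illustrative data: 6 appears twice so the column has a single mode
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, 6, None, 9, 10]})

# median() skips NaN by default, so it is computed from 6, 6, 9, 10
median_filled = df['col2'].fillna(df['col2'].median())

# mode() returns a Series of the most frequent value(s); take the first one
mode_value = df['col2'].mode().iloc[0]
mode_filled = df['col2'].fillna(mode_value)

print(median_filled)
print(mode_filled)
```

Both calls return new Series and leave df unchanged. To keep one of the results, assign it back, for example df['col2'] = median_filled.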