Handling missing data in Python Pandas
Handling missing data is an important part of data analysis, as datasets are rarely complete and can contain null or missing values. Pandas offers several functions and methods for working with missing data.
Creating a Sample DataFrame with Missing Values
import pandas as pd
import numpy as np

# np.nan marks the missing entries
data = {'Name': ['John', 'Jane', 'Bob', 'Alice'],
        'Age': [30, np.nan, 35, 28],
        'City': ['New York', np.nan, 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)
Output:
    Name   Age         City
0   John  30.0     New York
1   Jane   NaN          NaN
2    Bob  35.0      Chicago
3  Alice  28.0  Los Angeles
isnull() and notnull(): these methods return a Boolean DataFrame with the same shape as the original. isnull() marks each missing value as True, and notnull() marks each non-missing value as True.
print(df.isnull())
print(df.notnull())
Output:
    Name    Age   City
0  False  False  False
1  False   True   True
2  False  False  False
3  False  False  False
   Name    Age   City
0  True   True   True
1  True  False  False
2  True   True   True
3  True   True   True
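A Boolean mask is rarely the end goal. In practice it is usually summarized. The sketch below is one possible way to do that, assuming the same sample data as above; the variable names missing_per_column, has_missing, and rows_with_missing are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [30, np.nan, 35, 28],
    'City': ['New York', np.nan, 'Chicago', 'Los Angeles'],
})

# Count the missing values in each column (True counts as 1)
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Check whether the DataFrame contains at least one missing value
has_missing = bool(df.isnull().values.any())
print(has_missing)

# Select only the rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```

Counting per column is often a useful first step, since it shows where the gaps are before deciding whether to drop or fill them.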
dropna(): this method drops rows (the default) or columns that contain at least one missing value. It returns a new DataFrame and leaves the original unchanged.
# drop rows with missing values
df_dropped = df.dropna()
print(df_dropped)
# drop columns with missing values
df_dropped_cols = df.dropna(axis=1)
print(df_dropped_cols)
Output:
    Name   Age         City
0   John  30.0     New York
2    Bob  35.0      Chicago
3  Alice  28.0  Los Angeles
    Name
0   John
1   Jane
2    Bob
3  Alice
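Dropping every row or column with any missing value can discard more data than intended. As a rough sketch, assuming the same sample data, the subset, how, and thresh parameters of dropna() give finer control; the result variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [30, np.nan, 35, 28],
    'City': ['New York', np.nan, 'Chicago', 'Los Angeles'],
})

# Drop a row only if 'Age' is missing; other columns are ignored
rows_with_age = df.dropna(subset=['Age'])
print(rows_with_age)

# Drop a row only if every value in it is missing
not_all_missing = df.dropna(how='all')
print(not_all_missing)

# Keep only rows that have at least 2 non-missing values
at_least_two = df.dropna(thresh=2)
print(at_least_two)
```

With this data, no row is entirely empty, so how='all' keeps all four rows, while thresh=2 removes the row for Jane because only her name is present.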
fillna(): this method fills missing values with a specified value or with a statistic such as the mean or median of a column.
# fill missing values with a specific value
df_filled = df.fillna('Unknown')
print(df_filled)
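# fill missing values in numeric columns with the column mean
# (numeric_only=True skips the string columns, which have no mean)
df_filled_mean = df.fillna(df.mean(numeric_only=True))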
print(df_filled_mean)
Output:
    Name      Age         City
0   John     30.0     New York
1   Jane  Unknown      Unknown
2    Bob     35.0      Chicago
3  Alice     28.0  Los Angeles
    Name   Age         City
0   John  30.0     New York
1   Jane  31.0          NaN
2    Bob  35.0      Chicago
3  Alice  28.0  Los Angeles
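Filling every column with the same value mixes strings into numeric columns, and filling with the mean leaves text columns untouched. One possible alternative, sketched here under the assumption of the same sample data, is to pass a dictionary to fillna() so that each column gets its own fill value. The choice of the median for Age and 'Unknown' for City is only an illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [30, np.nan, 35, 28],
    'City': ['New York', np.nan, 'Chicago', 'Los Angeles'],
})

# Fill each column with its own value: median for Age, a label for City
df_filled_by_column = df.fillna({'Age': df['Age'].median(), 'City': 'Unknown'})
print(df_filled_by_column)

# Forward fill: copy the last valid value above each gap
df_forward_filled = df.ffill()
print(df_forward_filled)
```

Forward fill tends to make sense only when row order carries meaning, for example in time series. For unordered records like these, it would assign John's age and city to Jane.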
In summary, Pandas offers several functions and methods for working with missing data in a DataFrame. These functions can be used to check for missing values, drop rows or columns containing missing values, and fill missing values with specific values or statistical values.
Happy Learning!! Happy Coding!!