Handling missing data in Python Pandas

- Friday, May 05, 2023

Handling missing data is an important part of data analysis, as datasets are rarely complete and can contain null or missing values. Pandas offers several functions and methods for working with missing data.

Creating a Sample DataFrame with Missing Values

import pandas as pd
import numpy as np

# np.nan marks the missing entries in 'Age' and 'City'
data = {'Name': ['John', 'Jane', 'Bob', 'Alice'],
        'Age': [30, np.nan, 35, 28],
        'City': ['New York', np.nan, 'Chicago', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)

Output:

    Name   Age         City
0   John  30.0     New York
1   Jane   NaN          NaN
2    Bob  35.0      Chicago
3  Alice  28.0  Los Angeles

isnull() and notnull(): these methods return a Boolean DataFrame of the same shape as the original, marking each cell as missing (isnull) or present (notnull).

print(df.isnull())

print(df.notnull())

Output:

    Name    Age   City
0  False  False  False
1  False   True   True
2  False  False  False
3  False  False  False
   Name    Age   City
0  True   True   True
1  True  False  False
2  True   True   True
3  True   True   True
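The Boolean masks are usually more useful once they are summarized. The sketch below rebuilds the sample DataFrame and counts missing values per column, checks whether anything is missing at all, and selects the rows that have at least one missing value. The variable names (missing_per_column, has_missing, rows_with_missing) are illustrative, not part of Pandas.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [30, np.nan, 35, 28],
    'City': ['New York', np.nan, 'Chicago', 'Los Angeles'],
})

# True counts as 1, so summing the mask gives missing values per column
missing_per_column = df.isnull().sum()

# Single yes/no answer: is anything missing anywhere?
has_missing = bool(df.isnull().values.any())

# Rows where at least one column is missing
rows_with_missing = df[df.isnull().any(axis=1)]

print(missing_per_column)
print(has_missing)
print(rows_with_missing)
```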

dropna(): this method is used to drop rows or columns that contain missing values.

# drop rows with missing values
df_dropped = df.dropna()
print(df_dropped)

# drop columns with missing values
df_dropped_cols = df.dropna(axis=1)
print(df_dropped_cols)

Output:

    Name   Age         City
0   John  30.0     New York
2    Bob  35.0      Chicago
3  Alice  28.0  Los Angeles
    Name
0   John
1   Jane
2    Bob
3  Alice
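Dropping every row or column that has any gap can discard more data than intended. As a rough sketch, dropna() also accepts the subset, how, and thresh parameters to narrow what gets removed. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [30, np.nan, 35, 28],
    'City': ['New York', np.nan, 'Chicago', 'Los Angeles'],
})

# Only look at 'Age' when deciding which rows to drop
dropped_by_age = df.dropna(subset=['Age'])

# Drop a row only if every value in it is missing (none are, so all rows stay)
dropped_all_missing = df.dropna(how='all')

# Keep rows that have at least 2 non-missing values (row 1 has only 'Name')
dropped_thresh = df.dropna(thresh=2)

print(dropped_by_age)
print(dropped_all_missing)
print(dropped_thresh)
```

Note that dropna() returns a new DataFrame in each case; df itself is left unchanged.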

fillna(): this method is used to fill missing values with a specific value or a statistical value such as mean or median.

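# fill missing values with a specific value
df_filled = df.fillna('Unknown')
print(df_filled)

# fill missing numeric values with the column mean
# (numeric_only=True skips the text columns; without it, recent Pandas versions raise a TypeError)
df_filled_mean = df.fillna(df.mean(numeric_only=True))
print(df_filled_mean)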

Output:

    Name      Age         City
0   John     30.0     New York
1   Jane  Unknown      Unknown
2    Bob     35.0      Chicago
3  Alice     28.0  Los Angeles
    Name   Age         City
0   John  30.0     New York
1   Jane  31.0          NaN
2    Bob  35.0      Chicago
3  Alice  28.0  Los Angeles
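Filling every column with the same value mixes strings into the numeric Age column, and filling with the mean leaves City untouched. One possible alternative, sketched here, is to pass fillna() a dictionary that maps each column to its own fill value: the median for Age and a placeholder string for City. The choice of median and of the 'Unknown' placeholder is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Bob', 'Alice'],
    'Age': [30, np.nan, 35, 28],
    'City': ['New York', np.nan, 'Chicago', 'Los Angeles'],
})

# One fill value per column: numeric median for 'Age', a label for 'City'
fill_values = {
    'Age': df['Age'].median(),   # median of 30, 35, 28 is 30.0
    'City': 'Unknown',
}
df_filled_per_column = df.fillna(fill_values)

print(df_filled_per_column)
```

With this approach Age stays a float column, so later numeric operations on it still work.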

In summary, Pandas offers several functions and methods for working with missing data in a DataFrame. These functions can be used to check for missing values, drop rows or columns containing missing values, and fill missing values with specific values or statistical values.

Happy Learning!! Happy Coding!!