Dropping Features
Dropping features is a common task in data cleaning. I use Pandas to drop the majority of features and observations in my workflow. Below is a collection of drop statements in Python. You can find the full documentation on dropping null values in the Pandas documentation.
Import Preliminaries
# Import modules
import pandas as pd
Import Data
# Import data
df = pd.read_csv('Data/Pokemon.csv')
# View the head of the dataframe
df.head()
Dropping a Feature
# Drop the "Name" feature and view the head of the dataframe
df.drop('Name', axis=1).head()
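As a minimal sketch of the same idea, the snippet below uses a small hand-made DataFrame (the toy frame and its values are made up for illustration and are not taken from Pokemon.csv). It shows that drop returns a new DataFrame and leaves the original untouched, and that the columns= keyword is an equivalent spelling of axis=1.

```python
import pandas as pd

# Hypothetical toy data standing in for the Pokemon dataset
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'HP': [45, 60, 80],
    'Attack': [49, 62, 82],
})

# Two equivalent ways to drop a column; both return a new DataFrame
dropped_axis = toy.drop('Name', axis=1)
dropped_kw = toy.drop(columns='Name')

print(list(dropped_axis.columns))
print(list(toy.columns))  # the original still contains 'Name'
```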
Dropping Observations
# Drop the observation with index label 3 (the fourth row under the default index)
df.drop(3, axis=0).head()
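Because drop matches index labels rather than row positions, the two can differ once the index is no longer 0, 1, 2, and so on. The sketch below assumes a made-up toy frame with a deliberately shuffled index to make the difference visible.

```python
import pandas as pd

# Hypothetical toy data with a non-default index
toy = pd.DataFrame({'HP': [45, 60, 80, 39]}, index=[10, 11, 12, 3])

# drop() matches index labels: this removes the row labelled 3 (the last row here)
by_label = toy.drop(3, axis=0)

# To drop by position, look up the label at that position first
by_position = toy.drop(toy.index[2])  # removes the third row (label 12)

print(list(by_label.index))
print(list(by_position.index))
```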
Filtering Observations
While you can drop observations using the drop function in Pandas, it is often simpler to use loc and the Boolean filtering available in the package.
# View all the observations where "Type 1" is not 'Grass' via filtering
df[df['Type 1'] != 'Grass'].head()
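To illustrate how filtering relates to drop, here is a small sketch on a made-up toy frame (the names and types are hypothetical). It builds the same result both ways: once with a Boolean mask, and once by collecting the index labels of the unwanted rows and passing them to drop.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Type 1': ['Grass', 'Fire', 'Grass', 'Water'],
})

# Boolean filtering keeps the rows where the condition is True
filtered = toy[toy['Type 1'] != 'Grass']

# The drop-based equivalent needs the index labels of the rows to remove
grass_labels = toy[toy['Type 1'] == 'Grass'].index
dropped = toy.drop(grass_labels)

print(filtered.equals(dropped))
```

The mask version is a single expression, which is why it tends to be the more convenient choice for condition-based removal.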
Loc Statements
# Select a subset of the dataset using loc: rows 0 through 2, columns 'Name' through 'HP' (both ends inclusive)
df.loc[:2, 'Name':'HP']
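One detail worth spelling out is that loc slices by label and includes both endpoints, while iloc slices by position and excludes the stop value. The sketch below assumes a made-up toy frame with the default integer index and selects the same block both ways.

```python
import pandas as pd

# Hypothetical toy data with the default 0..n-1 index
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Type 1': ['Grass', 'Fire', 'Grass', 'Water'],
    'HP': [45, 39, 80, 44],
    'Attack': [49, 52, 82, 48],
})

# loc: label-based, both endpoints included (rows 0, 1, 2; columns Name..HP)
by_label = toy.loc[:2, 'Name':'HP']

# iloc: position-based, stop value excluded (same selection)
by_position = toy.iloc[:3, 0:3]

print(by_label.shape)
```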
Dropping Multiple Features
# Drop several features at once and view the head of the dataframe
df.drop(['#', 'Name', 'Generation', 'Legendary'], axis=1).head()
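When a list of columns is reused across DataFrames, some of the names may be missing, and drop raises a KeyError by default. As a hedged sketch on a made-up toy frame, the errors='ignore' argument of drop skips labels that are not present.

```python
import pandas as pd

# Hypothetical toy data; note there is no 'Generation' column
toy = pd.DataFrame({'#': [1, 2], 'Name': ['A', 'B'], 'HP': [45, 60]})

# errors='ignore' skips the missing 'Generation' label instead of raising KeyError
trimmed = toy.drop(columns=['#', 'Name', 'Generation'], errors='ignore')

print(list(trimmed.columns))
```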
Dropping Columns Inplace
By default, the inplace parameter for all the drop functions is set to False, but you can easily pass in the opposite Boolean value.
# Copy the DataFrame
df_copy = df.copy()
# Drop the Name column
df_copy.drop('Name', axis=1, inplace=True)
# View the head of the dataframe
df_copy.head()
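A point that is easy to miss is that drop with inplace=True returns None, so its result should not be assigned to a variable. The sketch below, on a made-up toy frame, compares the in-place form with plain reassignment, which produces the same final DataFrame.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'Name': ['A', 'B'], 'HP': [45, 60]})

# inplace=True modifies the object itself and returns None
toy_inplace = toy.copy()
result = toy_inplace.drop('Name', axis=1, inplace=True)

# Reassignment reaches the same end state without inplace
toy_reassigned = toy.copy()
toy_reassigned = toy_reassigned.drop('Name', axis=1)

print(result)
print(toy_inplace.equals(toy_reassigned))
```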
Dropping Index
If you want to reset your index, the best way is to use the reset_index function. If you have a DataFrame with a MultiIndex, you can also use drop to remove the data points associated with a given index value.
# Filtered dataframe for Grass Pokemon
df[(df['Type 1'] == 'Grass') | (df['Type 2'] == 'Grass')].head()
# Filtered dataframe for Grass Pokemon + reset index
df[(df['Type 1'] == 'Grass') | (df['Type 2'] == 'Grass')].reset_index().head()
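Note that reset_index keeps the old index as a new column named index by default. If the old labels are not needed, passing drop=True discards them. A minimal sketch on a made-up toy frame:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Type 1': ['Grass', 'Fire', 'Grass'],
})
grass = toy[toy['Type 1'] == 'Grass']       # keeps index labels 0 and 2

kept = grass.reset_index()                  # old labels stored in an 'index' column
discarded = grass.reset_index(drop=True)    # old labels thrown away

print(list(kept.columns))
print(list(discarded.columns))
```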
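# Create a MultiIndex DataFrame via the groupby function
# (the columns are selected with a list; indexing a groupby with a bare tuple of names fails in recent Pandas versions)
sum_stats = df.groupby(['Type 1', 'Type 2'])[['Total', 'HP', 'Attack', 'Defense',
                                              'Sp. Atk', 'Sp. Def', 'Speed']].mean()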
sum_stats.head(19)
# Drop the "Bug" type from the first level of the index
sum_stats.drop('Bug', axis=0, level=0).head(19)
# Drop the "Fire" type from the second level of the index
sum_stats.drop('Fire', axis=0, level=1).head(19)
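The level argument removes every row that matches a label at one level. To remove a single combination of both levels instead, a tuple can be passed as the label. The sketch below assumes a made-up toy frame (values are hypothetical) and shows all three variants.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Type 1': ['Bug', 'Bug', 'Grass', 'Grass'],
    'Type 2': ['Fire', 'Poison', 'Poison', 'Fire'],
    'HP': [50, 60, 70, 80],
})
stats = toy.groupby(['Type 1', 'Type 2'])[['HP']].mean()

no_bug = stats.drop('Bug', level=0)              # all rows with 'Bug' at level 0
no_fire = stats.drop('Fire', level=1)            # all rows with 'Fire' at level 1
no_pair = stats.drop(index=('Grass', 'Poison'))  # one specific (Type 1, Type 2) pair

print(list(no_pair.index))
```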
Author: Kavi Sekhon