01 Aug

Standardization

Import Modules

In [11]:
%matplotlib inline

# Import modules
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import warnings
from sklearn.preprocessing import StandardScaler

# Set warning option
warnings.filterwarnings("ignore")  # Silence library warnings (e.g., deprecation notices) in the output

Create DataFrame

In [12]:
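# Create example dataframe of random integers (no seed is set, so values change on every run;
# randint's upper bound is exclusive, e.g. Feature 1 is drawn from 50-99)
df = pd.DataFrame(data={'Feature 1': np.random.randint(50, 100, size=10),
                        'Feature 2': np.random.randint(0, 100, size=10),
                        'Feature 3': np.random.randint(0, 100, size=10),
                        'Feature 4': np.random.randint(0, 100, size=10)})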

# View the dataframe
df
Out[12]:
   Feature 1  Feature 2  Feature 3  Feature 4
0         83         68         44         25
1         79         36         79         44
2         55         74         28         14
3         59         49         18         88
4         98         41          9         20
5         74         58         27         90
6         60         25         74         88
7         92         20         20         39
8         95         39         22         15
9         77          3         87         75

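Because the cell above draws random integers without setting a seed, re-running it produces numbers different from the table shown. A minimal sketch of a reproducible version follows, assuming NumPy's Generator API (`np.random.default_rng`) is acceptable in place of the legacy `np.random.randint`; the seed value 42 and the helper name `make_example_df` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical seed value; any fixed integer makes the draw repeatable.
SEED = 42


def make_example_df(seed):
    """Hypothetical helper: build a 10x4 example DataFrame from a seeded generator."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        # Generator.integers excludes the upper bound by default, like randint.
        'Feature 1': rng.integers(50, 100, size=10),
        'Feature 2': rng.integers(0, 100, size=10),
        'Feature 3': rng.integers(0, 100, size=10),
        'Feature 4': rng.integers(0, 100, size=10),
    })


df_a = make_example_df(SEED)
df_b = make_example_df(SEED)
print(df_a.equals(df_b))  # True: same seed, same numbers
print(df_a.shape)         # (10, 4)
```

The seeded values will not match the Out[12] table; they are simply the same on every run.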
Standardize Features

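Standardization turns each value into a z-score: how many standard deviations it lies from its column's mean. As a sketch of what StandardScaler computes for column j over n rows (scikit-learn documents that it uses the biased, population standard deviation, i.e. dividing by n rather than n - 1):

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_{ij} - \mu_j\right)^2}
```

As a worked check against the tables in this post: Feature 1 has mean 77.2 and population standard deviation of about 14.61, so row 0 becomes (83 - 77.2) / 14.61 ≈ 0.397, matching the first entry of the standardized table.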
In [13]:
# Standardize features
scaler = StandardScaler()
df_standardized = pd.DataFrame(data=scaler.fit_transform(df), columns=df.columns)

# View standardized DataFrame
df_standardized
Out[13]:
   Feature 1  Feature 2  Feature 3  Feature 4
0   0.396888   1.281626   0.117813  -0.812543
1   0.123172  -0.254405   1.406390  -0.190030
2  -1.519123   1.569632  -0.471251  -1.172945
3  -1.245407   0.369608  -0.839416   1.251578
4   1.423323  -0.014400  -1.170765  -0.976362
5  -0.218973   0.801616  -0.508068   1.317106
6  -1.176978  -0.782416   1.222308   1.251578
7   1.012749  -1.022421  -0.765783  -0.353849
8   1.218036  -0.110402  -0.692150  -1.140181
9  -0.013686  -1.838438   1.700922   0.825648

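To see what StandardScaler is doing, the sketch below recomputes the standardized table by hand from the Out[12] values (copied in so the block runs on its own). It assumes only pandas, NumPy, and scikit-learn; the variable names are illustrative. It also shows a common pitfall: pandas' `DataFrame.std` defaults to the sample standard deviation (`ddof=1`), while StandardScaler divides by the population standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Values copied from the Out[12] table so the result can be compared with Out[13].
df = pd.DataFrame({
    'Feature 1': [83, 79, 55, 59, 98, 74, 60, 92, 95, 77],
    'Feature 2': [68, 36, 74, 49, 41, 58, 25, 20, 39, 3],
    'Feature 3': [44, 79, 28, 18, 9, 27, 74, 20, 22, 87],
    'Feature 4': [25, 44, 14, 88, 20, 90, 88, 39, 15, 75],
})

# Manual z-score: subtract each column's mean and divide by its
# population standard deviation (ddof=0), as StandardScaler does.
manual = (df - df.mean()) / df.std(ddof=0)

scaler = StandardScaler()
sklearn_result = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
print(np.allclose(manual, sklearn_result))  # True
print(manual.round(6).iloc[0].tolist())     # first row, to compare with Out[13]

# pandas' default std uses ddof=1 (sample std), which does NOT match StandardScaler.
sample_version = (df - df.mean()) / df.std()
print(np.allclose(sample_version, sklearn_result))  # False

# The fitted scaler stores per-column statistics and can undo the transform.
print(scaler.mean_)   # column means (77.2 for Feature 1)
print(scaler.scale_)  # column population standard deviations
restored = scaler.inverse_transform(sklearn_result.to_numpy())
print(np.allclose(restored, df))  # True
```

Keeping the fitted `scaler` around is what makes `inverse_transform` possible: the means and scales learned during `fit` are reused rather than recomputed.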
Visualize Original Feature

In [14]:
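# Plot the distribution of the original Feature 1 (histogram with a KDE curve).
# histplot replaces distplot, which is deprecated since seaborn 0.11.
sns.histplot(df['Feature 1'], kde=True, stat='density')
plt.title('Distribution of Feature 1')
plt.ylabel('Density')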
[Figure: distribution of Feature 1 on its original scale; y-axis: Density]

Visualize Standardized Feature

In [16]:
# Note the x-axis: values are now centered at 0 and measured in standard deviations
sns.histplot(df_standardized['Feature 1'], kde=True, stat='density')
plt.title('Distribution of Standardized Feature 1')
plt.ylabel('Density')
[Figure: distribution of standardized Feature 1; y-axis: Density]

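Notice that the shape of the distribution has not changed; only the axes have. The x-axis is now in units of standard deviations from the mean, centered at 0. Because the values were divided by the standard deviation (about 14.6 for Feature 1), the density values on the y-axis are roughly 14.6 times larger, so the area under the curve is still 1.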
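A quick numerical check of that claim, as a sketch using the Feature 1 values from the Out[12] table: skewness is unchanged by shifting and positive rescaling, and a histogram with the same number of equal-width bins has identical counts on both scales but bins that are σ times narrower, so each density value is σ times larger. The choice of 5 bins is arbitrary, and seaborn's automatic bin selection may differ from this.

```python
import numpy as np
import pandas as pd

# Feature 1 values from the Out[12] table.
x = pd.Series([83, 79, 55, 59, 98, 74, 60, 92, 95, 77], dtype=float)

# Standardize with the population standard deviation, as StandardScaler does.
sigma = x.std(ddof=0)
z = (x - x.mean()) / sigma

# Shape check: skewness is invariant under shifting and positive rescaling.
print(np.isclose(x.skew(), z.skew()))  # True

# Same number of equal-width bins on both scales: bin counts match, but each
# standardized bin is sigma times narrower, so its density is sigma times larger.
dens_x, edges_x = np.histogram(x, bins=5, density=True)
dens_z, edges_z = np.histogram(z, bins=5, density=True)
print(np.allclose(dens_z, dens_x * sigma))  # True
print(round(sigma, 2))                      # 14.61
```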

Author: Kavi Sekhon