16 Sep

Sparse Matrix

A sparse matrix is a different way to hold a dataset whose features contain many zeros. A DataFrame stores every cell, zero or not. A sparse matrix stores only the non-zero values together with their positions (row and column) and treats every other cell as zero. When most of the values are zero, this takes far less memory, which lets us process much larger datasets.
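To make the idea concrete, here is a minimal sketch of how SciPy's `csr_matrix` (the format used later in this notebook) lays out a tiny, mostly-zero matrix. The 4x3 matrix is made up for illustration; `data`, `indices` and `indptr` are the standard attributes of a CSR matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small, made-up dense matrix in which most entries are zero
dense = np.array([
    [0, 0, 3],
    [4, 0, 0],
    [0, 0, 0],
    [0, 5, 6],
])

sparse = csr_matrix(dense)

# CSR keeps three arrays instead of the full grid of cells:
print(sparse.data)     # the non-zero values, row by row: [3 4 5 6]
print(sparse.indices)  # the column of each stored value: [2 0 1 2]
print(sparse.indptr)   # where each row starts in data/indices: [0 1 2 2 4]
```

Row `i` owns the slice `data[indptr[i]:indptr[i+1]]`, so the empty third row costs nothing beyond one entry in `indptr`.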

Import Preliminaries

In [1]:
%matplotlib inline
%config InlineBackend.figure_format='retina'


# Import modules
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib as mpl
import numpy as np
import pandas as pd 
import sklearn
import seaborn as sns
import warnings

from scipy.sparse import csr_matrix

# Import Model Selection 
from sklearn.model_selection import train_test_split, cross_val_score

# Set pandas options (full option names; the short forms are ambiguous in pandas >= 1.4)
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 30)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Set plotting options
mpl.rcParams['figure.figsize'] = (8.0, 7.0)

Create a DataFrame

In [2]:
df = pd.DataFrame(
    data ={'name': ['Group A','Group B']*5000,
           'number': np.random.choice([0,1], size=10000), 
           'value': np.random.choice([0,1], size=10000),
          })

df.head(10)
Out[2]:
      name  number  value
0  Group A       1      0
1  Group B       1      0
2  Group A       1      1
3  Group B       1      1
4  Group A       1      1
5  Group B       1      0
6  Group A       0      0
7  Group B       0      1
8  Group A       0      1
9  Group B       1      0

Encode Data

In [3]:
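# One-hot encode 'name'; dtype=int keeps 0/1 integers (pandas >= 2.0 returns booleans by default)
df = pd.get_dummies(df, dtype=int)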
df.head()
Out[3]:
   number  value  name_Group A  name_Group B
0       1      0             1             0
1       1      0             0             1
2       1      1             1             0
3       1      1             0             1
4       1      1             1             0
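One-hot encoding is where many of the zeros come from: each `name_` column is 1 only in its own group's rows. As an alternative sketch (not what this notebook does), pandas can build the indicator columns as sparse arrays directly with `get_dummies(..., sparse=True)`. The series below reuses the `name` values from the example frame.

```python
import pandas as pd

# Same group labels as the example frame
names = pd.Series(['Group A', 'Group B'] * 5000, name='name')

# sparse=True backs each indicator column with a pandas SparseArray
dummies = pd.get_dummies(names, prefix='name', sparse=True, dtype=int)

print(dummies.dtypes)           # each column reports a Sparse[...] dtype
print(dummies.sparse.density)   # fraction of cells actually stored
```

Here each row has a 1 in exactly one of the two columns, so half the cells are stored.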

DataFrame to Sparse Matrix

In [4]:
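# Compress the encoded frame into Compressed Sparse Row (CSR) format
sparse_matrix = csr_matrix(df.to_numpy())
# Keep the column names; the sparse matrix does not store them
feature_names = df.columns
sparse_matrix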
Out[4]:
<10000x4 sparse matrix of type '<class 'numpy.int64'>'
	with 19980 stored elements in Compressed Sparse Row format>
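The output reports 19980 stored elements out of 10000 x 4 = 40000 cells, so roughly half the values are non-zero. A minimal sketch for checking how much memory that actually saves is below; it rebuilds a frame shaped like the one above with a seeded generator (an assumption, so the exact counts differ from the output shown) and forces `int64` values.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Rebuild an encoded frame shaped like the notebook's (seeded, so values differ)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'name': ['Group A', 'Group B'] * 5000,
    'number': rng.choice([0, 1], size=10000),
    'value': rng.choice([0, 1], size=10000),
})
df = pd.get_dummies(df, dtype=int)

dense = df.to_numpy(dtype=np.int64)
sparse_matrix = csr_matrix(dense)

# Dense storage: every cell, zero or not
dense_bytes = dense.nbytes
# CSR storage: the non-zero values plus the column-index and row-pointer arrays
sparse_bytes = (sparse_matrix.data.nbytes
                + sparse_matrix.indices.nbytes
                + sparse_matrix.indptr.nbytes)

print(f"stored elements: {sparse_matrix.nnz} of {dense.size}")
print(f"dense:  {dense_bytes:,} bytes")
print(f"sparse: {sparse_bytes:,} bytes")
```

Because CSR spends a column index on every stored value plus one pointer per row, a frame that is about half non-zero saves little or nothing; the savings grow as the fraction of non-zero cells falls.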

Sparse Matrix to DataFrame

In [5]:
# Expand back to a dense array and restore the column names
df = pd.DataFrame(sparse_matrix.toarray(), columns=feature_names)
df.head(10)
Out[5]:
   number  value  name_Group A  name_Group B
0       1      0             1             0
1       1      0             0             1
2       1      1             1             0
3       1      1             0             1
4       1      1             1             0
5       1      0             0             1
6       0      0             1             0
7       0      1             0             1
8       0      1             1             0
9       1      0             0             1
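Converting the matrix to a dense array materialises every zero again. If the goal is to keep the memory savings inside pandas, one option is `pd.DataFrame.sparse.from_spmatrix`, which wraps the matrix in a DataFrame with sparse columns. The small 4x4 matrix below is a made-up stand-in for the notebook's `sparse_matrix`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# A small CSR matrix standing in for the one built above
sparse_matrix = csr_matrix(np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]))
feature_names = ['number', 'value', 'name_Group A', 'name_Group B']

# Columns use pandas' SparseDtype, so the zeros are never stored
sparse_df = pd.DataFrame.sparse.from_spmatrix(sparse_matrix, columns=feature_names)

print(sparse_df.dtypes)          # each column reports a Sparse[...] dtype
print(sparse_df.sparse.density)  # fraction of cells actually stored

# Convert to an ordinary dense frame only when a downstream step needs it
dense_df = sparse_df.sparse.to_dense()
```

Here 9 of the 16 cells are non-zero, so the reported density is 0.5625.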

Author: Kavi Sekhon