Collaborative Based Recommender

Collaborative recommendation systems work under the assumption that users' tastes are similar to each other. If that is the case then users reco..{

Time Series Forecasting

This notebook demonstrates my solution to a time series problem posted on Kaggle. In the dataset, we have nearly 1 million training records from 1..{

Hierarchical Clustering

Hierarchical Clustering methods are a group of clustering algorithms that cluster data points into larger clusters based on a similarity metr..{

Grid Search

Beware: a full grid search can be computationally intensive, depending on your parameters. We use a K-Nearest Neighbours model for this example. Aft..{

XGBoost Classifier

XGBoost is a pretty famous machine learning algorithm introduced in 2014. XGBoost is a gradient boosting decision tree algorithm. Instead of t..{

Connecting to Databases

This notebook goes over how to connect to a local database and run an SQL query to pull the data into our notebook. We want to connect to o..{
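As a rough illustration of the connect-then-query pattern described above, here is a minimal sketch. It assumes SQLite (via Python's built-in `sqlite3` module) as a stand-in for the local database, since the excerpt does not say which database is used; the `sales` table and its columns are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the local database.
# The table name and columns are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 42.5)],
)
conn.commit()

# Run an SQL query and pull the result rows into Python objects.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

With a real database, only the connection step would change; the query-and-fetch step stays much the same.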

Hatch, Linestyle, Marker

Matplotlib has many hatches for its bar plots, and many linestyles and markers for its line plots. Sometimes it is hard to visualize each "hatch" ..{

Dummy Classifier

It is useful, when developing machine learning models, to compare against a benchmark. If your new model cannot beat your benchmark model, t..{
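To make the benchmark idea concrete, here is a small sketch of a majority-class baseline written in plain Python. It is an assumption about what such a benchmark might look like, not the notebook's code; scikit-learn's `DummyClassifier` with `strategy="most_frequent"` offers comparable behaviour. The labels and helper names are hypothetical.

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return the most frequent label seen in training (hypothetical helper)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels, for illustration only.
y_train = ["ham", "ham", "ham", "spam"]
y_test = ["ham", "spam", "ham", "ham"]

# The baseline ignores the features and always predicts the majority class.
majority = fit_majority_baseline(y_train)
baseline_pred = [majority] * len(y_test)
baseline_accuracy = accuracy(y_test, baseline_pred)

print(majority, baseline_accuracy)
```

Any real model should score above `baseline_accuracy` on the same test labels before it is worth keeping.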

Reweighting Classes

If you have an underrepresented class in your dataset, you should reweight the classes to account for the lack of data. Sklearn's implementation of this allow..{
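A small sketch of how such weights can be computed, assuming the "balanced" heuristic that scikit-learn documents for `class_weight="balanced"`: `n_samples / (n_classes * count_of_class)`. The helper name and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced labels: eight of class 0, two of class 1.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)

print(weights)
```

The rare class receives the larger weight, so mistakes on it cost more during training.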

Sparse Matrix

A sparse matrix is a different approach to storing dataframes that contain many zeros among the values of the dataset's features. A ..{

Gradient Boosting Classifier

Gradient Boosting Trees follow the same type of logic as Random Forest Classifiers: instead of using one tree, let's build an ensemble. The key dif..{

Random Forest Classifier

The Random Forest Classification algorithm uses an ensemble of Decision Trees when training its model. Each decision tree in the ensemble is tr..{

Time Zone Conversion

Data will not always be in the right format. Most time data is stored in a database in the UTC time zone. Therefore, after exporting the inf..{
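A minimal sketch of converting a UTC timestamp to a local time with the standard library. The timestamp and the fixed UTC-5 offset are assumptions for illustration; for real regions with daylight saving time, a named zone (for example via `zoneinfo.ZoneInfo`) is usually the better choice.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timestamp as it might be exported from a database (naive, in UTC).
raw = "2020-01-15 14:30:00"
utc_time = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

# Assumption: a fixed UTC-5 offset stands in for the target local time zone.
local_zone = timezone(timedelta(hours=-5), "EST")
local_time = utc_time.astimezone(local_zone)

print(utc_time.isoformat())
print(local_time.isoformat())
```

Attaching `tzinfo=timezone.utc` first matters: converting a naive datetime would otherwise treat it as the machine's local time.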

Linear Regression

Below is an example of a regression problem using linear regression to predict house prices from a dataset provided on Kaggle. The dataset can b..{

Dropping Features

Dropping features is a common task in cleaning data. I use Pandas for most of the feature and observation dropping in my workflow. Below is an e..{
