Collaborative Based Recommender
Collaborative recommendation systems work under the assumption that users' tastes are similar to one another. If that is the case then users reco...
Time Series Forecasting
This notebook demonstrates my solution to a time series problem posted on Kaggle. In the dataset, we have nearly 1 million training records from 1...
Hierarchical Clustering
Hierarchical clustering methods are a group of clustering algorithms that merge data points into larger clusters based on a similarity metr...
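As a rough illustration of merging points into larger clusters, here is a minimal sketch using SciPy. The toy points, the Ward linkage and the choice of two final clusters are all assumptions made for this example, not necessarily what the notebook uses.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data (made up for this sketch): two tight groups of 2-D points, far apart.
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# Agglomerative clustering: every point starts as its own cluster and the two
# closest clusters are merged repeatedly. Ward linkage on Euclidean distance
# is one common choice of similarity, assumed here.
merges = linkage(points, method="ward")

# Cut the merge tree so that at most two flat clusters remain.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)
```

Each row of merges records one merge step, so six points produce five rows.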
Grid Search
Beware: a full grid search can be computationally intensive depending on your parameters. We use a K-Nearest Neighbours model for the example. Aft...
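A minimal sketch of a full grid search over a K-Nearest Neighbours model with scikit-learn. The iris dataset, the parameter grid and the 5-fold setting are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)

# Every combination is tried: 4 values x 2 values = 8 candidates,
# each fitted once per fold, which is why the cost grows quickly.
param_grid = {
    "n_neighbors": [1, 3, 5, 7],
    "weights": ["uniform", "distance"],
}

search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

With 8 candidates and 5 folds this already means 40 model fits, so it pays to keep the grid narrow.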
XGBoost Classifier
XGBoost is a well-known machine learning algorithm first released in 2014. XGBoost is a gradient-boosted decision tree algorithm. Instead of t...
Connecting to Databases
This notebook goes over how to connect to a local database and run a SQL query to pull the data into our notebook. We want to connect to o...
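A minimal sketch of the connect-and-query pattern using Python's built-in sqlite3 module. The in-memory database and the sales table are hypothetical stand-ins for this example; the notebook's actual database and driver may differ.

```python
import sqlite3

# ":memory:" creates a throwaway database; a real local database would use
# a file path or a driver-specific connection string instead.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created here so the query has data to read.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 4.5), ("north", 2.5)],
)
conn.commit()

# Let the database do the wrangling: aggregate before pulling rows into Python.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rows)
```

If pandas is available, pandas.read_sql_query can run the same SQL against an open connection and return a DataFrame directly.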
Hatch, Linestyle, Marker
Matplotlib has many hatches for its bar plots, and many linestyles and markers for its line plots. Sometimes it is hard to visualize each "hatch" ...
Dummy Classifier
It is useful when developing machine learning models to compare against a benchmark. If your new model cannot beat your benchmark model, t...
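A minimal sketch of a benchmark built with scikit-learn's DummyClassifier. The synthetic 70/30 labels and the "most_frequent" strategy are assumptions for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Synthetic data: the features are irrelevant to a dummy model,
# and the labels are split 70/30 between two classes.
X = np.zeros((100, 1))
y = np.array([0] * 70 + [1] * 30)

# Always predict the most common class seen during fit.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)

baseline_accuracy = baseline.score(X, y)
print(baseline_accuracy)
```

Here the baseline accuracy equals the share of the majority class, so any real model should score above that to be worth keeping.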
Reweighting Classes
If a class is underrepresented in your dataset, you should reweight the classes to account for the lack of data. Sklearn's implementation of this allow...
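A minimal sketch of class reweighting in scikit-learn. The 90/10 synthetic labels, the random features and the logistic regression model are assumptions; many other scikit-learn estimators also accept a class_weight argument.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced labels: 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)

# Synthetic features, shifted for class 1 so the model has something to learn.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) + 2.0 * y[:, None]

# "balanced" weights are n_samples / (n_classes * count_of_each_class).
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weights = dict(zip(classes.tolist(), weights.tolist()))
print(class_weights)

# Passing class_weight="balanced" applies the same weighting during fitting.
model = LogisticRegression(class_weight="balanced")
model.fit(X, y)
```

The rare class ends up with a weight of 5.0 against roughly 0.56 for the common class, so mistakes on the rare class cost more during training.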
Sparse Matrix
A sparse matrix is an alternative way to store dataframes whose features contain many zeros. A ...
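A minimal sketch of converting a mostly-zero array to a sparse representation with SciPy. The small array and the CSR format are assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse

# A mostly-zero array standing in for a wide, sparse feature table.
dense = np.array([
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
])

# CSR (compressed sparse row) stores only the non-zero values
# together with their positions.
compact = sparse.csr_matrix(dense)
print(compact.shape)
print(compact.nnz)

# Converting back gives the original dense array.
restored = compact.toarray()
```

Only 3 of the 12 cells are stored explicitly, which is where the memory saving comes from on larger data.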
Gradient Boosting Classifier
Gradient Boosting Trees follow the same type of logic as Random Forest Classifiers: instead of using one tree, let's build an ensemble. The key dif...
Random Forest Classifier
The Random Forest Classification algorithm uses an ensemble of Decision Trees when training its model. Each decision tree in the ensemble is tr...
Time Zone Conversion
Data will not always be in the right format. Most time data is stored in a database in the UTC timezone. Therefore after exporting the inf...
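A minimal sketch of converting UTC timestamps to a local time zone with pandas. The created_at column, the sample timestamps and the America/New_York target zone are hypothetical values chosen for this example.

```python
import pandas as pd

# Hypothetical export: naive timestamps that are known to be in UTC.
df = pd.DataFrame({"created_at": ["2021-01-15 12:00:00", "2021-07-15 12:00:00"]})

# Parse the strings, then declare that the naive values are UTC.
utc_times = pd.to_datetime(df["created_at"]).dt.tz_localize("UTC")

# Convert to the target zone; daylight saving time is handled per row.
df["created_at_local"] = utc_times.dt.tz_convert("America/New_York")
print(df)
```

The January row shifts by five hours and the July row by four, because the second date falls in daylight saving time.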
Linear Regression
Below is an example of a regression problem using linear regression to predict house prices from a dataset provided on Kaggle. The dataset can b...
Dropping Features
Dropping features is a common task in cleaning data. I use Pandas to drop the majority of features and observations in my workflow. Below is an e...
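A minimal sketch of dropping features (columns) and observations (rows) with pandas. The small DataFrame and its column names are made up for this example.

```python
import pandas as pd

# Made-up data: three features, one with a missing value.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "price": [10.0, 12.5, None, 9.0],
    "notes": ["a", "b", "c", "d"],
})

# Drop a feature by column name. drop() returns a new DataFrame by default.
without_notes = df.drop(columns=["notes"])

# Drop an observation by its index label.
without_first_row = without_notes.drop(index=[0])

# Drop observations that are missing a value in a specific column.
complete_rows = without_first_row.dropna(subset=["price"])
print(complete_rows)
```

Because each call returns a new DataFrame, the original df is left untouched, which makes it easy to compare before and after.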