Tools of a Data Scientist

This is a comprehensive list of the main tools I use in my workflow, plus an additional section on client-related tools I have been exposed to. No...

Train-Test Split

Splitting a dataset into a train and test set is a requirement for evaluating any machine learning model. Sklearn's train_test_split can be use...
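As a rough illustration of the split described above, here is a minimal sketch using scikit-learn's `train_test_split`. The toy data, the 20% test size, and the `random_state` value are assumptions made for the example, not taken from the post.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with one feature each, plus binary labels.
X = [[i] for i in range(10)]
y = [0, 1] * 5

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
```

The model is then fit on the training rows only and scored on the held-out test rows.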


The fizzbuzz question is a basic interview question for coders. Basically, write some code that says 'Fizzbuzz' when a value is divisible by 15,...
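One possible solution is sketched below. Only the divisible-by-15 rule comes from the excerpt; the 'Fizz' for multiples of 3 and 'Buzz' for multiples of 5 rules are the usual convention and are assumed here, as is the function name `fizzbuzz`.

```python
def fizzbuzz(n: int) -> str:
    """Return the Fizzbuzz word for n, or n itself as a string."""
    # Check 15 first: a multiple of 15 is also a multiple of 3 and of 5.
    if n % 15 == 0:
        return "Fizzbuzz"
    if n % 3 == 0:  # assumed rule, not stated in the excerpt
        return "Fizz"
    if n % 5 == 0:  # assumed rule, not stated in the excerpt
        return "Buzz"
    return str(n)


for value in range(1, 16):
    print(fizzbuzz(value))
```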

Regex in Python

As you deal with more and messier string data, regex starts becoming a very valuable skill that can save you a lot of time. It also has a second...
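To give a flavor of the kind of string cleanup regex helps with, here is a small sketch using Python's built-in `re` module. The sample strings and patterns are made up for illustration.

```python
import re

# Hypothetical messy text containing order numbers and dates.
messy = "Order #123 shipped 2021-03-04; order #77 shipped 2021-03-09."

# Capture the digits that follow each '#'.
order_ids = re.findall(r"#(\d+)", messy)

# Match ISO-style dates (YYYY-MM-DD).
dates = re.findall(r"\d{4}-\d{2}-\d{2}", messy)

# Collapse runs of whitespace into single spaces.
cleaned = re.sub(r"\s+", " ", "too    many   spaces").strip()

print(order_ids, dates, cleaned)
```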

Binning Feature

Binning data into a few categorical groups can help us summarize sparse continuous values as a few data points. This can be u...
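A minimal sketch of binning with `pd.cut` follows. The excerpt does not say which function the post uses, so `pd.cut`, the age values, the bin edges, and the labels are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical continuous feature.
ages = pd.Series([3, 17, 25, 42, 68, 90])

# Bin edges are right-inclusive by default: (0, 18], (18, 65], (65, 120].
bins = [0, 18, 65, 120]
labels = ["child", "adult", "senior"]

age_group = pd.cut(ages, bins=bins, labels=labels)
print(age_group.value_counts())
```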

Converting Notebook to Slides

You can convert any notebook to slides using the following commands. Note that scrolling for fragments is now disabled by default. You can enabl...

Label Encoding

Encoding a categorical feature into numeric values before processing the data through your machine learning model is now easier than ever, give...
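One common way to do this is scikit-learn's `LabelEncoder`, sketched below. The excerpt is cut off before naming a tool, so the choice of `LabelEncoder` and the sample color values are assumptions.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]

encoder = LabelEncoder()
# Classes are sorted alphabetically, then mapped to 0, 1, 2, ...
codes = encoder.fit_transform(colors)

print(list(encoder.classes_))
print(list(codes))

# The mapping can be reversed to recover the original labels.
restored = list(encoder.inverse_transform(codes))
```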

Resampling Datetime

When plotting time series data, it becomes advantageous to reframe and aggregate the data by a period of time. Data scientists who deal with ti...
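As a small sketch of aggregating by a period of time, the example below uses the pandas `resample` method on a datetime-indexed series. The dates, values, and two-day window are invented for illustration.

```python
import pandas as pd

# Hypothetical daily readings over six days.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Re-bucket the series into two-day windows and sum within each window.
every_two_days = daily.resample("2D").sum()
print(every_two_days)
```

Swapping `.sum()` for `.mean()` or `.max()` changes how each window is aggregated.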

Styling a DataFrame

Styling a data frame is pleasant surprisingly often. You can define a coloring function for your data frame, then apply the styling whe...

Using Select Dtypes

Once you get used to using pandas, filtering a dataframe for content quickly and efficiently can be a huge asset. Pandas introduced a new feature ...
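Going by the post title, the feature in question is presumably `DataFrame.select_dtypes`. A minimal sketch, with a made-up dataframe:

```python
import pandas as pd

# Hypothetical dataframe mixing text and numeric columns.
df = pd.DataFrame({
    "name": ["a", "b"],
    "age": [30, 41],
    "score": [0.5, 0.9],
})

# Keep only numeric columns (integers and floats).
numeric = df.select_dtypes(include="number")

# Keep everything that is not numeric.
non_numeric = df.select_dtypes(exclude="number")

print(list(numeric.columns), list(non_numeric.columns))
```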

Using the OS Module

This notebook is a combination of little snippets of Python code from Python's OS module that I found useful for a variety of tasks. These t...
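The excerpt does not list which snippets the notebook covers, so the following is only a sketch of a few commonly used `os` calls. It works inside a temporary directory so it leaves nothing behind; the directory and file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build paths portably rather than concatenating strings.
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir, exist_ok=True)

    file_path = os.path.join(data_dir, "example.csv")
    with open(file_path, "w") as fh:
        fh.write("a,b\n1,2\n")

    # List directory contents and split a file name into stem and extension.
    listing = os.listdir(data_dir)
    base, ext = os.path.splitext(os.path.basename(file_path))
    exists_inside = os.path.exists(file_path)

print(listing, base, ext, exists_inside)
```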

Create Dummy Variables

Pandas to the rescue again. The pandas "pd.get_dummies" function can be applied to a dataframe to take one categorical feature and create a one-...
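A minimal sketch of creating dummy variables with `pd.get_dummies` is shown below; the `color` column and its values are made up for the example.

```python
import pandas as pd

# Hypothetical dataframe with one categorical feature.
df = pd.DataFrame({"color": ["red", "green", "red"]})

# Each distinct category becomes its own indicator column.
dummies = pd.get_dummies(df, columns=["color"])
print(dummies)
```

Depending on the pandas version, the indicator columns hold booleans or 0/1 integers.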

Removing Outliers

How to handle outliers in a dataset requires a bit of intuition and domain expertise to get right for descriptive and predictive analytics. Yet ...
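The excerpt does not say which method the post uses. As one common option, the sketch below drops values outside 1.5 times the interquartile range (IQR); the rule, the 1.5 multiplier, and the sample values are all assumptions.

```python
import pandas as pd

# Hypothetical measurements with one obvious outlier.
values = pd.Series([10, 12, 11, 13, 12, 95])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond the first and third quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the values that fall inside the fences.
filtered = values[values.between(lower, upper)]
print(filtered.tolist())
```

Whether a flagged point should actually be removed is still a judgment call that depends on the domain.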
