21 Sep

Connecting to Databases

This notebook goes over how to connects to a local database and runs an SQL query to wrangle the data into our notebook. We want to connect to our database using the notebook to make reproducibility easier by removing any manual data wrangling process. The parameters for this notebook are easy to adjust so that you can change the values and connect to your production or slave database. In second part of the notebook, we will look at connecting via a SSH connection.

Import Preliminaries

In [3]:
# Import modules
import pandas as pd
from sqlalchemy import create_engine

Connect to Database

In [5]:
# Set your local database parameters 
db_username = 'root'
db_password = 'mypassword'
host = '127.0.0.1'
port = '3306'
db_name = 'world'

# Create a MySQLEngine
engine = create_engine('mysql+mysqldb://'+db_username+':'+db_password+'@\
'+host+':'+port+'/'+db_name)

# Connect to database
engine.connect();

Query Database

In [3]:
# Run a query on your database
df = pd.read_sql(""" SELECT * FROM city""", engine)

# View dataframe 
df.head()
Out[3]:
ID Name CountryCode District Population
0 1 Kabul AFG Kabol 1780000
1 2 Qandahar AFG Qandahar 237500
2 3 Herat AFG Herat 186800
3 4 Mazar-e-Sharif AFG Balkh 127800
4 5 Amsterdam NLD Noord-Holland 731200

Connecting to a Database with SHH

Now lets assume your have a remote database that uses an SSH tunnel. The parameters for this notebook are easy to adjust so that you can change the values and connect to a production or slave database via SSH. Note that this SSH tunnel connection uses a ssh_key, so you'll need to know the location of this key on your local machine.

Import Preliminaries

In [4]:
# Import modules
import pandas as pd
from sqlalchemy import create_engine
from sshtunnel import SSHTunnelForwarder

Create SSH Tunnel

In [ ]:
# Enter the your ssh and database credentials
ssh_host = 'ssh_tunnel.devops.world.com'
ssh_port = 22
sss_username = "root"
ssh_key_path = "/Users/root/.ssh/ssh_key"
database_address = '127.0.0.1'
database_port = 8889

# Create a SSH Tunnel
server =  SSHTunnelForwarder(
     (ssh_host,ssh_port),
     ssh_username=sss_username,
     ssh_pkey=ssh_key_path,
     remote_bind_address=(database_address,database_port))

# Start a SSH Tunnel
server.start()

Connect to Database

In [1]:
# Set your database parameters 
db_username = 'root'
db_password = 'mypassword'
host = '127.0.0.1'
port = '3306'
db_name = 'world'

# Create a MySQLEngine (this will depend on your database)
engine = create_engine('mysql+mysqldb://'+db_username+':'+db_password+'@\
'+host+':%s/'+db_name %server.local_bind_port))

# Connect to database
engine.connect()
Out[1]:
<sqlalchemy.engine.base.Connection at 0x10efebb70>

Query Database

In [5]:
# Run a query on the database (this is on a local MAMP database)
df = pd.read_sql(""" SELECT * FROM city""", engine)

# View dataframe 
df.head()
Out[5]:
ID Name CountryCode District Population
0 1 Kabul AFG Kabol 1780000
1 2 Qandahar AFG Qandahar 237500
2 3 Herat AFG Herat 186800
3 4 Mazar-e-Sharif AFG Balkh 127800
4 5 Amsterdam NLD Noord-Holland 731200

Stop SSH Connection

In [ ]:
# Stop ssh connection
server.stop()

Author: Kavi Sekhon