SQL Alchemy with python

2019-02-08
Python SQL SQL Alchemy



sql_alchemy_logo

SQL is one of the most relevant language for databases. So at one point you will need to interact to get, store or modify some data. Fortunatelly you can do it in python using SQL Alchemy.

In this post I will show to do it while giving some advices based on problems I had.

Conect to a SQL database

In order to connect to a SQL database you will need to create an engine. The best way is to use the functions create_engine and engine.url.URL. For example:

import pandas as pd # Used later to read/insert data
import sqlalchemy as sa

engine = sa.create_engine(
    sa.engine.url.URL(
        drivername=drivername,
        username="root",
        password="1234",
        host="localhost",
        port=port,
        database="db",
    ),
)

You obviously will need to use your credentials for username, password, host and database.

For the drivername and port below you have and example for some popular SQL databases:

postgres:
  drivername: "postgresql"
  port: 5439

redshift:
  drivername: "redshift+psycopg2"
  port: 5432

my_sql:
  drivername: "mysql+pymysql"
  port: 3306

More info about SQL engines at SQL Alchemy documentation.

Read data from SQL

To retrive a pandas dataframe it is easy. You can directly use pandas read_sql_query function:

# MySQL
with engine.connect() as connection:
    df = pd.read_sql_query("SELECT * FROM my_table", connection)

Run a SQL sentence

You can also run an UPDATE, DELETE or other SQL sentences by creating a connection with the engine as follows:

with engine.connect() as connection:
    connection.execute("DELETE FROM my_table WHERE id > 10")

Insert data to SQL

Ideally to insert data to SQL is as easy as calling the datafram function to_sql. For example:

df.to_sql(name="my_table", con=engine, if_exists="append", index=False)

Be careful with using if_exists="replace" because it will delete the SQL table an create it another time. This caused me some problems when I was inserting japanese text in a table because the encoding was lost in the process.

I usually use index=False because the index will be a dummy one. If it is not the case first call df.reset_index().

Inserting data to PostgreSQL

The function df.to_sql works well with MySQL but I had serious performance problems when working with PostgreSQL. Luckily there is a workaround:

from io import StringIO

output = StringIO()
df.to_csv(output, sep=';', header=False, index=False, columns=df.columns)
output.getvalue()
# jump to start of stream
output.seek(0)

# Insert df into postgre
connection = engine.raw_connection()
with connection.cursor() as cursor:
    cursor.copy_from(output, "my_table", sep=';', null="NULL", columns=(df.columns))
    connection.commit()

It is not as simple as before but I got a 10x improvement.

Be careful to use the same separator in df.to_csv and cursor.copy_from