SQL Alchemy with python
0. Why SQL with python?
SQL is one of the most relevant language for databases. So at one point you will need to interact to get, store or modify some data. Fortunatelly you can do it in python using SQL Alchemy.
In this post I will show to do it while giving some advices based on problems I had.
1. Connect to a SQL database
In order to connect to a SQL database you will need to create an engine. The best way is to use the functions create_engine
and engine.url.URL
. For example:
import pandas as pd # Used later to read/insert data
import sqlalchemy as sa
engine = sa.create_engine(
sa.engine.url.URL(
drivername=drivername,
username="root",
password="1234",
host="localhost",
port=port,
database="db",
),
)
You obviously will need to use your credentials for username
, password
, host
and database
.
For the drivername
and port
below you have and example for some popular SQL databases:
postgres:
drivername: "postgresql"
port: 5439
redshift:
drivername: "redshift+psycopg2"
port: 5432
my_sql:
drivername: "mysql+pymysql"
port: 3306
More info about SQL engines at SQL Alchemy documentation.
2. Read data from SQL
To retrive a pandas dataframe it is easy. You can directly use pandas read_sql_query
function:
with engine.connect() as connection:
df = pd.read_sql_query("SELECT * FROM my_table", connection)
3. Run a SQL sentence
You can also run an UPDATE
, DELETE
or other SQL sentences by creating a connection with the engine as follows:
with engine.connect() as connection:
connection.execute("DELETE FROM my_table WHERE id > 10")
4. Run stored procedures
It is also posible to run stored procedures and to pass parameters to it. To do so you need to access the raw connection of the engine.
with engine.raw_connection().cursor() as cursor:
cursor.execute("my_stored_procedure ?, ?", ['param_1', 'param_2'])
cursor.commit()
If you don’t have input parameters you can run cursor.execute("my_stored_procedure")
.
5. Insert data to SQL
Ideally to insert data to SQL is as easy as calling the datafram function to_sql
. For example:
df.to_sql(name="my_table", con=engine, if_exists="append", index=False)
Be careful with using if_exists="replace"
because it will delete the SQL table an create it another time. This caused me some problems when I was inserting japanese text in a table because the encoding was lost in the process.
I usually use index=False
because the index will be a dummy one. If it is not the case first call df.reset_index()
.
5.1. Inserting data to PostgreSQL
The function df.to_sql
works well with MySQL
but I had serious performance problems when working with PostgreSQL
. Luckily there is a workaround:
from io import StringIO
output = StringIO()
df.to_csv(output, sep=';', header=False, index=False, columns=df.columns)
output.getvalue()
# jump to start of stream
output.seek(0)
# Insert df into postgre
connection = engine.raw_connection()
with connection.cursor() as cursor:
cursor.copy_from(output, "my_table", sep=';', null="NULL", columns=(df.columns))
connection.commit()
It is not as simple as before but I got a 10x improvement.
Be careful to use the same separator in df.to_csv
and cursor.copy_from