SQL is one of the most relevant language for databases. So at one point you will need to interact to get, store or modify some data. Fortunatelly you can do it in python using SQL Alchemy.
In this post I will show to do it while giving some advices based on problems I had.
In order to connect to a SQL database you will need to create an engine. The best way is to use the functions
engine.url.URL. For example:
import pandas as pd # Used later to read/insert data import sqlalchemy as sa engine = sa.create_engine( sa.engine.url.URL( drivername=drivername, username="root", password="1234", host="localhost", port=port, database="db", ), )
You obviously will need to use your credentials for
port below you have and example for some popular SQL databases:
postgres: drivername: "postgresql" port: 5439 redshift: drivername: "redshift+psycopg2" port: 5432 my_sql: drivername: "mysql+pymysql" port: 3306
More info about SQL engines at SQL Alchemy documentation.
To retrive a pandas dataframe it is easy. You can directly use pandas
with engine.connect() as connection: df = pd.read_sql_query("SELECT * FROM my_table", connection)
You can also run an
DELETE or other SQL sentences by creating a connection with the engine as follows:
with engine.connect() as connection: connection.execute("DELETE FROM my_table WHERE id > 10")
It is also posible to run stored procedures and to pass parameters to it. To do so you need to access the raw connection of the engine.
with engine.raw_connection().cursor() as cursor: cursor.execute("my_stored_procedure ?, ?", ['param_1', 'param_2']) cursor.commit()
If you don't have input parameters you can run
Ideally to insert data to SQL is as easy as calling the datafram function
to_sql. For example:
df.to_sql(name="my_table", con=engine, if_exists="append", index=False)
Be careful with using
if_exists="replace"because it will delete the SQL table an create it another time. This caused me some problems when I was inserting japanese text in a table because the encoding was lost in the process.
I usually use
index=Falsebecause the index will be a dummy one. If it is not the case first call
df.to_sql works well with
MySQL but I had serious performance problems when working with
PostgreSQL. Luckily there is a workaround:
from io import StringIO output = StringIO() df.to_csv(output, sep=';', header=False, index=False, columns=df.columns) output.getvalue() # jump to start of stream output.seek(0) # Insert df into postgre connection = engine.raw_connection() with connection.cursor() as cursor: cursor.copy_from(output, "my_table", sep=';', null="NULL", columns=(df.columns)) connection.commit()
It is not as simple as before but I got a 10x improvement.
Be careful to use the same separator in