SQLAlchemy Tutorial

22 Jan, 2016

This is a quick tutorial for getting started with SQLAlchemy Core API. In this quick start guide, we'll assume that you're using Windows and PostgreSQL is already installed in some location (let's say, %POSTGRESQL_DIR%), and %POSTGRESQL_DIR%/bin/ directory is in our PATH and the below command successfully works:

c:\>psql -U postgres
psql (9.4.4)
Type "help" for help.

postgres=#

By default, PostgreSQL runs at port 5432. Let's work with that. We also assume that python is installed and in PATH, so the below command should work:

c:\>python
Python 2.7.10 |Anaconda 2.3.0 (32-bit)| (default, Dec  1 2015, 11:53:35) [MSC v.1500 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>>

I'm using the Anaconda distribution of python. It's a handy version of python that comes off the shelf with many features. We need to install the sqlalchemy library along with the database adapter for PostgreSQL, which is psycopg2.

pip install psycopg2 sqlalchemy

If things get notorious while installing dependencies, you can manually download a .whl file for psycopg2 from ~Gohlke and then install it by running pip install /path/to/<downloaded file>.whl. If all goes well, we should be able to import sqlalchemy and psycopg2 in a python REPL.

>>> import sqlalchemy
>>> sqlalchemy.__version__ 
1.1.0

That completes our setup.

For the sake of this tutorial, let's create a database called tennis and a user called federer with password grandestslam. Here's how we do it:

postgres=# CREATE DATABASE tennis;
CREATE DATABASE
postgres=# CREATE USER federer WITH PASSWORD 'grandestslam';
CREATE ROLE
postgres=# GRANT ALL PRIVILEGES ON DATABASE tennis TO federer;
GRANT

That should do. Now let's connect to this database and create some tables. Our first step will be to connect with the PostgreSQL server. Let's write a function that connects to the database and returns two things: a connection object and a metadata object. This is important, as we'll use these two objects to interact with the database later.

import sqlalchemy

def connect(user, password, db, host='localhost', port=5432):
    '''Returns a connection and a metadata object'''
    # We connect with the help of the PostgreSQL URL
    # postgresql://federer:grandestslam@localhost:5432/tennis
    url = 'postgresql://{}:{}@{}:{}/{}'
    url = url.format(user, password, host, port, db)

    # The return value of create_engine() is our connection object
    con = sqlalchemy.create_engine(url, client_encoding='utf8')

    # We then bind the connection to MetaData()
    meta = sqlalchemy.MetaData(bind=con, reflect=True)

    return con, meta

Let's now call connect to get con and meta:

>>> con, meta = connect('federer', 'grandestslam', 'tennis')
>>> con
Engine(postgresql://federer:***@localhost:5432/tennis)
>>> meta
MetaData(bind=Engine(postgresql://federer:***@localhost:5432/tennis))

We're now connected. Let's create two tables, slams and results that represents Roger Federer's results in the major slams in his long and illustrous career.

This is the rough vision for our two tables:

# slams
---------------------------------
| name         | country        |
---------------------------------
| Wimbledon    | United Kingdom |
| French Open  | France         |


# results
---------------------------------
| slam        | year  | result  |
---------------------------------
| Wimbledon   | 2003  | W       |
| French Open | 2004  | 3R      |

The slam column of results table is naturally referenced from the name column of slams table.

from sqlalchemy import Table, Column, Integer, String, ForeignKey

slams = Table('slams', meta,
    Column('name', String, primary_key=True),
    Column('country', String)
)

results = Table('results', meta,
    Column('slam', String, ForeignKey('slams.name')),
    Column('year', Integer),
    Column('result', String)
)

# Create the above tables
meta.create_all(con)

That is it. We can now confirm if the above code worked by checking if the tables have been created from the PostgreSQL prompt.

c:\>psql -U federer tennis
Password for user federer:
psql (9.4.4)
Type "help" for help.

tennis=> \d+
                       List of relations
 Schema |  Name   | Type  |  Owner  |    Size    | Description
--------+---------+-------+---------+------------+-------------
 public | results | table | federer | 8192 bytes |
 public | slams   | table | federer | 8192 bytes |
(2 rows)

Before we move on to inserting some values in these tables, let's take a look at meta.tables first. To see all the tables that have been created, we just iterate over meta.tables.

>>> for table in meta.tables:
...     print table
...
slams
results

We can further get the sqlalchemy.schema.Table object on which we can perform inserts by table = meta.tables['slams'] or more generally for some table_name, table = meta.tables[table_name].

As you might have guessed, meta.tables is an immutabledict that has mapping of table names to corresponding Table object. Once our tables are ready, the next step is to actually add records to these tables. For this, we usually create an object of Insert relative to a target table, for example: slams.insert() or results.insert().

>>> clause = slams.insert().values(name='Wimbledon', country='United Kingdom')

>>> con.execute(clause)
<sqlalchemy.engine.result.ResultProxy at 0x4508af0>

The execute returns a ResultProxy object. After an insert, we can use this object to get the successfully inserted Primary key.

>>> clause = slams.insert().values(name='Roland Garros', country='France')

>>> result = con.execute(clause)

>>> result.inserted_primary_key
['Roland Garros']

As a practice, when we have lots of records to insert, we instead pass a list of values directly to the execute function.

victories = [
    {'slam': 'Wimbledon', 'year': 2003, 'result': 'W'},
    {'slam': 'Wimbledon', 'year': 2004, 'result': 'W'},
    {'slam': 'Wimbledon', 'year': 2005, 'result': 'W'}
]

>>> con.execute(meta.tables['results'].insert(), victories)
<sqlalchemy.engine.result.ResultProxy at 0x45983d0>

Next, we'll take a look at the very special table_name.c object. To access all the columns of a table, say table_name, we iterate through the table_name.c object. It's an object of ImmutableColumnCollection from sqlalchemy.sql.base.

>>> results = meta.tables['result']

>>> results.c
<sqlalchemy.sql.base.ImmutableColumnCollection at 0x44c8b10>

>>> for col in results.c:
...     print col
...
results.slam
results.year
results.result

We'll now use our knowledge of table_name.c to select some specific things from the results table. The simplest way of selecting from a table would be to create an object of Select relative to our target table, and running it with execute().

>>> for row in con.execute(results.select()):
...     print row
...
(u'Wimbledon', 2003, u'W')
(u'Wimbledon', 2004, u'W')
(u'Wimbledon', 2005, u'W')

But what if we needed only results from 2005? For that we use Select.where().

>>> clause = results.select().where(results.c.year == 2005)

>>> for row in con.execute(clause):
...     print row
...
(u'Wimbledon', 2005, u'W')

If you happen to use Pandas for data manipulations, reading data is as simple as df = pd.read_sql(clause, con). Congratulations on getting started with SQLAlchemy Core and PostgreSQL. A more comprehensive documentation and tutorial is available here.

#programming