. 5 min read
This is a quick tutorial for getting started with SQLAlchemy Core API. In this quick start guide, we’ll assume that you’re using Windows and PostgreSQL is already installed in some location (let’s say, %POSTGRESQL_DIR%
), and %POSTGRESQL_DIR%/bin/
directory is in our PATH
and the below command successfully works:
c:\>psql -U postgres
psql (9.4.4)
Type "help" for help.
postgres=#
By default, PostgreSQL runs at port 5432. Let’s work with that. We also assume that python
is installed and in PATH
, so the below command should work:
c:\>python
Python 2.7.10 |Anaconda 2.3.0 (32-bit)| (default, Dec 1 2015, 11:53:35) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>>
I’m using the Anaconda distribution of python
. It’s a handy version of python
that comes off the shelf with many features. We need to install the sqlalchemy
library along with the database adapter for PostgreSQL, which is psycopg2
.
pip install psycopg2 sqlalchemy
If things get notorious while installing dependencies, you can manually download a .whl
file for psycopg2
from ~Gohlke and then install it by running pip install /path/to/<downloaded file>.whl
. If all goes well, we should be able to import sqlalchemy
and psycopg2
in a python
REPL.
>>> import sqlalchemy
>>> sqlalchemy.__version__
1.1.0
That completes our setup.
For the sake of this tutorial, let’s create a database called tennis
and a user called federer
with password grandestslam
. Here’s how we do it:
postgres=# CREATE DATABASE tennis;
CREATE DATABASE
postgres=# CREATE USER federer WITH PASSWORD 'grandestslam';
CREATE ROLE
postgres=# GRANT ALL PRIVILEGES ON DATABASE tennis TO federer;
GRANT
That should do. Now let’s connect to this database and create some tables. Our first step will be to connect with the PostgreSQL server. Let’s write a function that connects to the database and returns two things: a connection object and a metadata object. This is important, as we’ll use these two objects to interact with the database later.
import sqlalchemy
def connect(user, password, db, host='localhost', port=5432):
'''Returns a connection and a metadata object'''
# We connect with the help of the PostgreSQL URL
# postgresql://federer:grandestslam@localhost:5432/tennis
url = 'postgresql://{}:{}@{}:{}/{}'
url = url.format(user, password, host, port, db)
# The return value of create_engine() is our connection object
con = sqlalchemy.create_engine(url, client_encoding='utf8')
# We then bind the connection to MetaData()
meta = sqlalchemy.MetaData(bind=con, reflect=True)
return con, meta
Let’s now call connect
to get con
and meta
:
>>> con, meta = connect('federer', 'grandestslam', 'tennis')
>>> con
Engine(postgresql://federer:***@localhost:5432/tennis)
>>> meta
MetaData(bind=Engine(postgresql://federer:***@localhost:5432/tennis))
We’re now connected. Let’s create two tables, slams
and results
that represents Roger Federer’s results in the major slams in his long and illustrous career.
This is the rough vision for our two tables:
# slams
---------------------------------
| name | country |
---------------------------------
| Wimbledon | United Kingdom |
| French Open | France |
# results
---------------------------------
| slam | year | result |
---------------------------------
| Wimbledon | 2003 | W |
| French Open | 2004 | 3R |
The slam
column of results
table is naturally referenced from the name
column of slams
table.
from sqlalchemy import Table, Column, Integer, String, ForeignKey
slams = Table('slams', meta,
Column('name', String, primary_key=True),
Column('country', String)
)
results = Table('results', meta,
Column('slam', String, ForeignKey('slams.name')),
Column('year', Integer),
Column('result', String)
)
# Create the above tables
meta.create_all(con)
That is it. We can now confirm if the above code worked by checking if the tables have been created from the PostgreSQL prompt.
c:\>psql -U federer tennis
Password for user federer:
psql (9.4.4)
Type "help" for help.
tennis=> \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+---------+-------+---------+------------+-------------
public | results | table | federer | 8192 bytes |
public | slams | table | federer | 8192 bytes |
(2 rows)
Before we move on to inserting some values in these tables, let’s take a look at meta.tables
first. To see all the tables that have been created, we just iterate over meta.tables
.
>>> for table in meta.tables:
... print table
...
slams
results
We can further get the sqlalchemy.schema.Table object on which we can perform inserts by table = meta.tables['slams']
or more generally for some table_name, table = meta.tables[table_name]
.
As you might have guessed, meta.tables
is an immutabledict
that has mapping of table names to corresponding Table
object. Once our tables are ready, the next step is to actually add records to these tables. For this, we usually create an object of Insert
relative to a target table, for example: slams.insert()
or results.insert()
.
>>> clause = slams.insert().values(name='Wimbledon', country='United Kingdom')
>>> con.execute(clause)
<sqlalchemy.engine.result.ResultProxy at 0x4508af0>
The execute
returns a ResultProxy
object. After an insert, we can use this object to get the successfully inserted Primary key.
>>> clause = slams.insert().values(name='Roland Garros', country='France')
>>> result = con.execute(clause)
>>> result.inserted_primary_key
['Roland Garros']
As a practice, when we have lots of records to insert, we instead pass a list of values directly to the execute
function.
victories = [
{'slam': 'Wimbledon', 'year': 2003, 'result': 'W'},
{'slam': 'Wimbledon', 'year': 2004, 'result': 'W'},
{'slam': 'Wimbledon', 'year': 2005, 'result': 'W'}
]
>>> con.execute(meta.tables['results'].insert(), victories)
<sqlalchemy.engine.result.ResultProxy at 0x45983d0>
Next, we’ll take a look at the very special table_name.c
object. To access all the columns of a table, say table_name, we iterate through the table_name.c
object. It’s an object of ImmutableColumnCollection
from sqlalchemy.sql.base
.
>>> results = meta.tables['result']
>>> results.c
<sqlalchemy.sql.base.ImmutableColumnCollection at 0x44c8b10>
>>> for col in results.c:
... print col
...
results.slam
results.year
results.result
We’ll now use our knowledge of table_name.c
to select some specific things from the results table. The simplest way of selecting from a table would be to create an object of Select
relative to our target table, and running it with execute()
.
>>> for row in con.execute(results.select()):
... print row
...
(u'Wimbledon', 2003, u'W')
(u'Wimbledon', 2004, u'W')
(u'Wimbledon', 2005, u'W')
But what if we needed only results from 2005? For that we use Select.where()
.
>>> clause = results.select().where(results.c.year == 2005)
>>> for row in con.execute(clause):
... print row
...
(u'Wimbledon', 2005, u'W')
If you happen to use Pandas for data manipulations, reading data is as simple as df = pd.read_sql(clause, con)
. Congratulations on getting started with SQLAlchemy Core and PostgreSQL. A more comprehensive documentation and tutorial is available here.