Recursively querying hierarchical data in Postgres

. 3 min read

Suhas

I want to learn how to use the “WITH RECURSIVE” clause in postgresql. Can you, only using SQL statements and examples, give me an excellent pedagogical example for this? (Incl. create table and insert statements so I can test it in a playground). Ensure data is fun and memorable.

ChatGPT

Sure! Let’s use a fun example involving a family tree. We’ll create a table to store family members and their parent-child relationships, and then use the WITH RECURSIVE directive to query the data.

Step 1: Create the family Table

CREATE TABLE family (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT
);

Step 2: Insert Sample Data

We’ll insert some family members into the table. Imagine this family tree:

    Grandparent
       |
    Parent
    /    \
Child1  Child2
              \
             Grandchild

Here’s the SQL to insert this data:

INSERT INTO family (name, parent_id) VALUES
('Grandparent', NULL), -- id 1
('Parent', 1),         -- id 2
('Child1', 2),         -- id 3
('Child2', 2),         -- id 4
('Grandchild', 4);     -- id 5

Step 3: Use WITH RECURSIVE to Query the Family Tree Now we’ll use the WITH RECURSIVE directive to traverse the family tree. For example, let’s find all descendants of the ‘Grandparent’.

WITH RECURSIVE family_tree AS (
    SELECT id, name, parent_id, name AS path
    FROM family
    WHERE name = 'Grandparent'
    UNION ALL
    SELECT f.id, f.name, f.parent_id, ft.path || ' -> ' || f.name
    FROM family f
    INNER JOIN family_tree ft ON f.parent_id = ft.id
)
SELECT id, name, path FROM family_tree;

Explanation

  1. Base Case: The initial member (in this case, ‘Grandparent’) is selected.

  2. Recursive Case: The recursive part joins the family table with the family_tree CTE, adding each child to the tree and constructing the path from the ancestor to the current member.

  3. Final Select: This retrieves the final results, showing each member and the path from the ancestor.

Example Output

Running the above query should give you something like this:

 id |     name     |                path
----+--------------+-------------------------------------
  1 | Grandparent  | Grandparent
  2 | Parent       | Grandparent -> Parent
  3 | Child1       | Grandparent -> Parent -> Child1
  4 | Child2       | Grandparent -> Parent -> Child2
  5 | Grandchild   | Grandparent -> Parent -> Child2 -> Grandchild

This output shows the entire family tree starting from the ‘Grandparent’, with the path to each descendant clearly displayed.

You can experiment with changing the base case to another family member to see the subtree of descendants for that member. For instance, if you change WHERE name = ‘Grandparent’ to WHERE name = ‘Parent’, you’ll see the descendants of ‘Parent’ only.