Advent of Code 2018

A Sunday morning with the Advent of Code


I happened to stumble upon the very interesting site: https://adventofcode.com; where there are some challenges posted every day until Christmas. It felt more adventurous than CodeForces, SPOJ or CodeChef – so I took to it.

Day 01

(UPDATE: After authoring this post, I also came across https://www.reddit.com/r/adventofcode/comments/a20646/2018_day_1_solutions/eaukxu5/, which offers a new and interesting perspective. It is worth a read.)

https://adventofcode.com/2018/day/1. There are two parts to day 01, and solving each will give you exactly 1 gold star. Obtaining the first gold star is perhaps too trivial to write about. The second part is a classic data structures 101 problem, which I enjoyed. The problem is well described in the site. In summary, you have a stream of numbers, which you have to continuously accumulate until you find a duplicated state.

My first attempt was to brute force it, something like this:

def find_duplicate():
    register, counter, memory = 0, 0, []
    while 1:
        print("epoch: %s" % counter)
        counter = counter + 1
        for freq in freqs:
            register = register + freq
            if register in memory:
                return register
            memory.append(register)

but that took a significant amount of time… more than 2 minutes, on my MacBook Air 2015.

python 01.py  133.25s user 0.51s system 99% cpu 2:14.39 total

Running a profiler confirms that the time hog is the step where we check for item existence in list.

Total time: 16.9317 s
File: 01.py
Function: find_duplicate at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def find_duplicate():
     5         1        262.0    262.0      0.0      register, counter, memory = 0, 0, []
     6         1          2.0      2.0      0.0      while 1:
     7        51       2223.0     43.6      0.0          print("epoch: %s" % counter)
     8        51         51.0      1.0      0.0          counter = counter + 1
     9     52111      26479.0      0.5      0.2          for freq in freqs:
    10     52061      29430.0      0.6      0.2              register = register + freq
    11     52061   16822038.0    323.1     99.4              if register in memory:
    12                                                           return register
    13     52060      51251.0      1.0      0.3              memory.append(register)

From the CPython implementation of in (https://github.com/python/cpython/blob/master/Objects/listobject.c#L456), we know that it is an O(n) operation. Below is the list_contains function:

static int
list_contains(PyListObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

Bloom Filters

We can do much better… with probabilistic data structures. We could try this problem with a Bloom Filter. For the sake of it, I also switched the language to Golang.

I’m using an already written implementation of Bloom Filters in Golang that I Googled and found. It is here: https://github.com/AndreasBriese/bbloom. Below is the core of the implementation.

func duplicatedFrequency(frequencies []string) string {
    var register int
    var registerStr string
    bf := bbloom.New(float64(1<<18), float64(0.001))

    for {
        for _, freq := range frequencies {
            val, _ := strconv.Atoi(freq)
            register = register + val
            registerStr = strconv.Itoa(register)

            if bf.Has([]byte(registerStr)) {
                return registerStr
            }

            bf.Add([]byte(registerStr))
        }
    }
}

This runs almost 400x faster than the brute force method:

go run 01.go  0.32s user 0.16s system 94% cpu 0.511 total

The full gist can be found here… and now off to a nice Sunday breakfast.


Share article

Also recommended

Styling And Building Custom Maps
Notes on building a data-styled map with TileMill

First, you’ll need to install TileMill (the now deprecated software in favour of Mapbox Studio). Even if it has …

Read now ▶
Parsing Python Abstract Syntax Trees
Featuring my talk at Pycon India 2016

Update: This blog post inspired my talk at Pycon India 2016, Hacking the Python AST. Watch it on YouTube: …

Read now ▶