Only the first day of Advent of Code 2018

2018-12-09

I happened to stumble upon the very interesting site: https://adventofcode.com; where there are some challenges posted every day until Christmas. It felt more adventurous than LeetCode, CodeForces, SPOJ or CodeChef -- so I took to it. After authoring this post, I also came across a reddit post, which offers a new and interesting perspective. It is worth a read.

In this post, I just speak about day 01. There are two parts to day 01, and solving each will give you exactly 1 gold star. Obtaining the first gold star is perhaps too trivial to write about. The second part is a classic data structures 101 problem, which I enjoyed. The problem is well described in the site. In summary, you have a stream of numbers, which you have to continuously accumulate until you find a duplicated state.

My first attempt was to brute force it, something like this:

def find_duplicate():
    register, counter, memory = 0, 0, []
    while 1:
        print("epoch: %s" % counter)
        counter = counter + 1
        for freq in freqs:
            register = register + freq
            if register in memory:
                return register
            memory.append(register)

but that took a significant amount of time... more than 2 minutes, on my MacBook Air 2015.

python 01.py  133.25s user 0.51s system 99% cpu 2:14.39 total

Running a profiler confirms that the time hog is the step where we check for item existence in list.

Total time: 16.9317 s
File: 01.py
Function: find_duplicate at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def find_duplicate():
     5         1        262.0    262.0      0.0      register, counter, memory = 0, 0, []
     6         1          2.0      2.0      0.0      while 1:
     7        51       2223.0     43.6      0.0          print("epoch: %s" % counter)
     8        51         51.0      1.0      0.0          counter = counter + 1
     9     52111      26479.0      0.5      0.2          for freq in freqs:
    10     52061      29430.0      0.6      0.2              register = register + freq
    11     52061   16822038.0    323.1     99.4              if register in memory:
    12                                                           return register
    13     52060      51251.0      1.0      0.3              memory.append(register)

From the CPython implementation of in (https://github.com/python/cpython/blob/master/Objects/listobject.c#L456), we know that it is an O(n) operation. Below is the list_contains function:

static int
list_contains(PyListObject *a, PyObject *el)
{
    Py_ssize_t i;
    int cmp;

    for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
        cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
                                           Py_EQ);
    return cmp;
}

Bloom Filters

We can do better with probabilistic data structures. We could try this problem with a Bloom Filter. For the sake of it, I also switched the language to Golang. I'm using an already written implementation of Bloom Filters in Golang that I Googled and found. It is here: https://github.com/AndreasBriese/bbloom. Below is the core of the implementation.

func duplicatedFrequency(frequencies []string) string {
    var register int
    var registerStr string
    bf := bbloom.New(float64(1<<18), float64(0.001))

    for {
        for _, freq := range frequencies {
            val, _ := strconv.Atoi(freq)
            register = register + val
            registerStr = strconv.Itoa(register)

            if bf.Has([]byte(registerStr)) {
                return registerStr
            }

            bf.Add([]byte(registerStr))
        }
    }
}

This runs almost 400x faster than the brute force method:

go run 01.go  0.32s user 0.16s system 94% cpu 0.511 total

The full gist can be found here... and now off to a nice Sunday breakfast.