Advent of code

Advent of Code

The contest

Advent of Code is an advent calendar of programming puzzles made by Eric Wastl. It’s been going since 2015 and really gained popularity in 2020. I’d heard about AOC peripherally (is… peripheral hearing a thing?) for the past couple of years but 2022 was the first time I tried it myself. Taking part was a huge amount of fun and I learned a lot.

A new problem is released each day at midnight Eastern Time for the first 25 days of December. The problems are really nicely designed and there’s a thread of storyline/festive schtich connecting them. Each puzzle is split into two parts with part 2 building on part 1. The second part only becomes visible once you’ve solved part one, so there’s an incentive to try to make your initial solution somewhat generic as that will usually mean you can solve the rest more quickly/easily. Every competitor gets their own personalized input data[1] and the problems are designed to have an answer which is a simple number or string you paste into the AOC webpage when you think you have the right answer. You get a gold star for completing each part correctly. There’s nothing much more satisfying than finally seeing this little message:

Both parts of this puzzle are complete! They provide two gold stars: **

The puzzles start out quite easy but by day 15 or so they are reasonably challenging. Every one is plausible to complete before the next day’s puzzle is released, but it requires real commitment past day 15 or so. The shape of the problem set was pleasingly diverse - some required graph algorithms, others involved simulating a cellular automata, playing a kind of infinite game of Tetris, or implementing a recursive depth first search. Many of the later problems require thought about the complexity of your solution to write code that completes quickly enough but “[you don’t] need a fancy computer; every problem has a solution that completes in at most 15 seconds on ten-year-old hardware”.

There is an extremely competitive global leaderboard of folks trying to solve each problem as fast as possible. The top leaderboard competitor solved the easiest puzzle this year in just 53 seconds – here’s a crazy video of them doing just that. They managed to get both stars in less time than it took me to read the prompt! The fastest solver of the hardest (or at least most time consuming to code up) problem this year did it in a shade under 26 mins. For a mere mortal like me getting a place on the global leaderboard (it is capped at 100 people per day) was not realistic – my goal was simply to keep up and try to complete all the challenges by Christmas day. AOC also has a private leaderboard feature so you can compare your progress with a set of friends or colleagues. We had a private leaderboard at Stripe and a really friendly slack channel where veteran contestants provided tips and tricks for newcomers. There’s also a genuinely welcoming and supportive AOC community on Reddit, often centered around each day’s “Solution mega-thread” where folks share their solutions once the global leaderboard has filled out. In fact, it’s a really consistent theme across the contest that the experts engage super positively to help everyone learn. Many people share their solutions on GitHub and several of the folks at the top of the leaderboard in 2022 have YouTube channels where they carefully explain how they solved the day’s problem and share general tips and tricks - my favorite of those was hyper-neutrino’s channel.

I’m quite proud that I managed to keep up and got all 50 stars by Christmas morning. I had so much fun doing this that I also completed the 2021 puzzles over the remainder of the holiday break. It’s definitely more fun solving the problems live as they are released and while the rest of the community is thinking about them too, but still super rewarding going back over previous years’ puzzles. If you’re reading this and it sounds interesting you can get started right away. My solutions are in this GitHub repo - there are better ones out there for sure, but these ones are mine and are all reasonably fast and maybe intelligible?

Stuff I learned

I decided to write my solutions in python, which is a language I thought knew pretty well. I now feel a lot more adept at using many of the language features and the standard library thanks to AOC. My general approach to every problem was to write the solution entirely on my own[2], without reading other people’s solutions, until I had acquired both stars. After that, I’d engage with as many discussion threads and alternative solutions as I could to learn about stuff I had missed. When there was a much better approach possible I would go back and update my solution to take advantage of what I had learned. Reading lots of other peoples solutions to the same problem helped me improve my own code substantially.

Becoming more pythonic

While most python code I used to read and write was very imperative, python has excellent primitives to express stuff in a more functional way. I learned to prefer those to for loops and if statements in most places. For instance many of my solutions made liberal use of sum(), min(), max() across list comprehension statements and generators, often with filters applied inline. For example, this line in my solution to day 16 is doing a lot of work:

max(v1 + v2 for bitmask1, v1 in B.items() for bitmask2, v2 in B.items() if not bitmask1 & bitmask2)

The list/generator comprehension syntax is expressive and allows you to write your intent more clearly. A one-liner like this is less verbose but crucially means you write much less code than the imperative alternative. Write less code is a good general lesson learned. The best/fastest competitors all have short solutions - less code equals fewer bugs.

Building on this, I made frequent use of the functools and itertools packages which I’d hardly touched in the past. Here are a few highlights:

functools.reduce

The general form of sum(), etc above - applies a function of two args cumulatively on a list, replacing a more verbose for loop. This was useful on day 18 when I had a custom addition function over a weird kind of numeral and wanted to sum a list of “numbers” in that format:

from itertools import reduce
...
magnitude(reduce(add, [parse_sn(x) for x in input]))

functools.cache

Several of the later problems required the use of a depth first search [DFS]. Writing a recursive function is usually the easiest way to implement such a solution. Often an exhaustive depth first search will not complete fast enough but the problem is set up so that dynamic programming / memoization of the value of repeated branches in the search tree will make DFS tractable. I was vaguely familiar with this approach from my decades-old CS knowledge, but ended up coding up the memoization by hand the first time I needed it using a python dict – something like:

DP = {}
def expensive_function(state):
  if state in DP:
    return DP[state]
  result = actual_expensive_work()
  DP[state] = result
  return result

This is a bit tedious to code up, and also requires that you remember to write the return value to the DP dict on every return path. This is much easier with functools.cache which allows you to simply decorate your function with @cache to automatically benefit from memoization. The above code becomes:

@cache
def expensive_function(state):
  return actual_expensive_work()

@cache works only with parameter values which are hashable - in practice that means you need to use immutable value types for parameters of functions you want to memoize. (The same is true of the dict approach as dictionary keys also ultimately need to be hashed too.) The first time I used @cache I wanted to pass a set in as part of the state. I ended up doing something very hacky to work around this but later discovered frozenset from other people’s solutions. frozenset is an immutable set.

>>> DP = {}
>>> DP[set('demo')] = 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>> DP[frozenset({"demo"})] = 42

Finally, @cache also has a useful cache_info() method which helps understand how valuable memoization is for the function in question. Here’s the cache info for the memoized call to dfs in my day 16 solution for instance:

    print(dfs.cache_info()) ->

CacheInfo(hits=23211, misses=2329760, maxsize=None, currsize=2329760)

itertools

There are a bunch of useful functional tools for creating and using iterators in this package. I found myself reaching confidently for itertools.product and itertools.combinations to generate states to work through (replacing nested for loops).

For instance on day 19 I needed to process each pair of scanners in the input which was easily generated with:

for c in itertools.combinations(scanner, 2):

itertools.zip_longest is also pretty handy in cases where you want to zip two lists of different lengths and do something with the elements than only appear in the longer one:

>>> list(zip([0,1,2],[0,1,2,3,4]))
[(0, 0), (1, 1), (2, 2)]
>>> # but I needed 3 and 4!
>>> list(itertools.zip_longest([0,1,2],[0,1,2,3,4]))
[(0, 0), (1, 1), (2, 2), (None, 3), (None, 4)]

Sets and complex numbers for 2D planes

Many Advent of Code problems require working with points on a 2D plane. One common way to represent a 2D plane is with a two-dimensional array, using tuples of (x, y) or (row, column) to reference the array. I wrote a custom Grid class to read in and store 2D arrays given how frequently they come up in the problems. Quite frequently though the plane can be infinite or is very large. A useful technique for those problems is to represent the plane in a set holding all (x, y) coordinates that are occupied in the problem world.

This set approach was particularly valuable for day 17’s challenge, which was one of my favorites. Here we needed to simulate a very long sequence of falling rocks moving sideways and then settling into the “terrain” (a bit like Tetris). It turned out to be super convenient to use set intersection (to test if a rock had hit the terrain) and union functions (to merge stopped rocks into the world).

# Very pleasing impedance match with sets :-)
rock = shift(world, rock, move)
if stop_rock(world, rock):
    world = world.union(rock)

Using tuples to represent points works fine but is a bit error prone and verbose once you start writing expressions that work on pairs of points etc…

A useful trick is to use python’s built-in complex number type to represent points in the plane. A complex number is of the form a + b j, where a is the real component and b is the imaginary component. (Python uses j instead of i for √-1.) For our trick the real component corresponds to the x-coordinate, and the imaginary component corresponds to the y-coordinate. Now a single value represents each point and many of the operations we might need to do on them reduce to a single mathematical operation. This produces shorter and much less fiddly (hence error prone) code.

world = set()
...
point = (x,y)
DIR = [(1,0), (-1,0), (0, 1), (0, -1)]
# Got to get all the x and y cases right
neighbors = [(point[0]+delta[0], point[1]+delta[1]) for delta in DIR]

# With complex numbers
point = x+y*1j
DIR = [1,-1,1j,-1j]
neighbors = [point+delta for delta in DIR]

collections.defaultdict and collections.Counter

Another useful couple of tools when working with Advent of Code problems are the defaultdict and Counter classes from the Python standard library’s collections module. defaultdict allows you to set a default value for a dictionary, so that if a key is not found, it will automatically be added with the default value. This can be especially useful when working with counting or tallying items. collections.Counter is a specialized dictionary that is specifically designed for counting occurrences of items in a collection. It provides a convenient way to count the items in a list, for example, and can be useful when working with problems that require counting specific elements. Both defaultdict and collections.Counter help a lot with write less code.

>>> from collections import Counter
>>> Counter("abracadabra")
Counter({'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1})

My utils

A lot of concepts come up again and again between problems, so it’s useful to keep a collection of previous utility functions handy. Here’s some advice from mcpower who was one of the top leaderboard finishers in 2021:

Write your own utility library. You could use someone else’s, but if you write it yourself you know exactly how things work, and exactly when things can go wrong. This should include very common things like “getting all integers from a line of input”, “distance between two points”, etc. You should definitely use other people’s code as inspiration, but I personally recommend writing it yourself for the aforementioned reasons.

Writing utilities that I really understood well and expected to be able to reuse was very rewarding. Mine include collection of common algorithms for graph traversal, such as the Floyd-Warshall algorithm for finding the shortest path between all nodes in a graph, Dijkstra’s algorithm and A* algorithm for finding the shortest path between two nodes, as well as helper functions for working with grids and lists. (See here for the actual code.)

I learned most by optimizing runtime

Getting stars each day was fun, but I think I learned most by taking some of my functional solutions to some of the harder problems and optimizing them to run as fast as possible. I set myself a goal of getting every solution to run in under 5 seconds. Here was my progress tuning day 16’s solution for instance:

  • Initial DFS implementation: 1m 36s
  • With frozenset: 56.7s
  • With pypy: 42.7s
  • Remove non-flow valves (BFS dist table) 39.5s
  • int bitmask instead of frozenset 35.2s
  • floyd-warshall instead of BFS 31.7s
  • opened valves statemap 1.624s

The most fun and interesting puzzles

I’m not going to say too much about the problems themselves because… you know – spoilers! I hope you’ll try them for yourself, but here are a few I particularly enjoyed or learned from.

Day 15: Beacon Exclusion Zone - this was the first one in 2022 where computational complexity of the solution really came into play. Day 16: Proboscidea Volcanium - from my perspective, this was the most challenging problem of them all. But I learned a ton which I was able to apply on Day 19: Not Enough Minerals (also a tough one).

Day 17: Pyroclastic Flow (the Tetris one) - this was my favorite puzzle. It has several conceptual cruxes but ultimately an elegant solution.

Finally, one puzzle I really didn’t enjoy, though I learned a valuable lesson. Day 22: Monkey Map. This one was conceptually uncomplicated but required some very intricate coding of 14 different wrapping conditions. I did it in the passenger seat on a long car journey in the UK which was not super conducive to the kind of concentration required, so I was really pleased when I finished the final case, ran the code on the sample input and got the right answer. That code promptly failed on the (much larger) real input which had a completely different shape to the example. I nearly threw my laptop out the car window when I realized that would require re-writing all fourteen of those intricate cases! The lesson here is to read the full problem input alongside the problem text before investing a lot of time in coding.

Reflections

Participating in Advent of Code was a fun and stimulating experience that allowed me to improve my python skills. One of the biggest benefits I found from the contest was being able to learn from others who were solving the same challenges. I engaged in discussion threads, read alternative solutions and compared my progress with friends and colleagues on the leaderboard. This was not only an opportunity to improve my own code, but also to learn new and more efficient ways of solving problems. I found that the community was extremely welcoming and supportive, and I enjoyed the opportunity to share my own solutions and learn from other participants on Reddit and GitHub. Overall, Advent of Code was a great opportunity to improve my skills and learn from others in a fun and festive environment.

[1] In practice there are a finite number of input data files for each day so some competitors get the same data but the point is you can’t just copy someone else’s answer and expect it to work for you, which is nice. [2] I managed to keep this rule every day except day 16 where I wrote a pretty inefficient solution to part 2 and opened up the solutions megathread while it was running and learned a trick to speed it up that I don’t think I would have figured out by myself.