Practical Use of the Monad Abstraction

Insights from Advent of Code 2020 pt. 1

Posted on February 2, 2021

This year I participated in the Advent of Code challenge, where once a day between the 1st and 25th of December a pair of new code challenges is released. Usually solving the second problem requires refactoring or optimizing the solution to the first. I solved most of the problems this year in Haskell, one of my favorite programming languages. While I probably won’t be publishing all of my solutions 1, I wanted to write up and share some of the more interesting pieces.

The infamous Monad is well known for its confusing name and countless useless tutorials. I won’t attempt to explain what a Monad is here, but rather why it is useful to have one available as an abstraction. One common objection to Monads is that, while they do seem to capture a shared interface, it’s hard to see what concrete benefit is derived from programming against that interface. In Haskell, it’s easy enough to say that the do-notation syntactic sugar requires the Monad interface and no more, but why are some people so intent on getting Monads working in Rust, OCaml, and even Java and Python?

To see why, let’s look at one of the refactors in Advent of Code’s day 14. Part 1 of this challenge introduces the operation of “masking” an integer. It gives us, as an example,

value:  000000000000000000000000000000001011  (decimal 11)
mask:   XXXXXXXXXXXXXXXXXXXXXXXXXXXXX1XXXX0X
result: 000000000000000000000000000001001001  (decimal 73)

We can see that an X indicates that the bit should remain unchanged, a 1 indicates that the bit at that position should be overwritten with a 1, and a 0 indicates that the bit at that position should be overwritten with a 0. Using the Data.Bits module, this is fairly simple to implement in Haskell.

import Data.Bits
import Data.Foldable

maskBit :: Bits a => a -> (Int, Char) -> a
maskBit x (_, 'X') = x
maskBit x (i, '0') = clearBit x i
maskBit x (i, '1') = setBit x i

mask :: Bits a => String -> a -> a
mask maskSpec original = foldl maskBit original
  (zip [0..] (reverse maskSpec))

In the above code, maskBit is a function which sets, clears, or ignores a specified bit in some value x. Then, in mask, we use a left fold. The three arguments to foldl are the maskBit function, the original value, and a list of the characters in the maskSpec string paired with their indices. (But reversed, so that "1X0X" becomes [(0,'X'), (1,'0'), (2,'X'), (3,'1')].) The left fold calls maskBit with the original value and the first element of the list, then passes the result back into maskBit along with the second element, and so on until the list is exhausted.
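
As a quick sanity check, here’s the example from the problem statement in GHCi (the Int annotation just picks a concrete Bits instance):

ghci> mask "XXXXXXXXXXXXXXXXXXXXXXXXXXXXX1XXXX0X" (11 :: Int)
73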

Take a moment to look at the code and really understand it. Note a few things:

  1. This code is generic over what type we are setting the bits of. In my full solution for day 14 and in the example, it’s always an integer, but it could be anything with a specified bit structure. Hopefully I’ll get around to writing up why I wrote it this way in a future post.
  2. The direction of the fold doesn’t really matter here. I could have used a right fold if I rewrote maskBit to have its arguments swapped. However, in my actual code, I used foldl', the strict left fold, which for operations like this avoids building up a chain of lazy thunks and is much faster than foldl or foldr (see the sketch just below).
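
For reference, here’s a sketch of that strict variant (mask' is just my name for it; foldl' comes from the Data.Foldable import above):

mask' :: Bits a => String -> a -> a
mask' maskSpec original = foldl' maskBit original
  (zip [0..] (reverse maskSpec))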

Now let’s see how this operation changes in part 2 of day 14. The meaning of our mask has indeed changed:

If the bitmask bit is 0, the corresponding bit is unchanged.

If the bitmask bit is 1, the corresponding bit is overwritten with 1.

If the bitmask bit is X, the corresponding bit is floating.

A floating bit is not connected to anything and instead fluctuates unpredictably. In practice, this means the floating bits will take on all possible values.

In the example, we see that this means that we no longer apply the mask and get a single value out, but instead we get many values out:

value:   000000000000000000000000000000101010  (decimal 42)
mask:    000000000000000000000000000000X1001X
result:  000000000000000000000000000000X1101X

Or, written another way:

results:
        000000000000000000000000000000011010  (decimal 26)
        000000000000000000000000000000011011  (decimal 27)
        000000000000000000000000000000111010  (decimal 58)
        000000000000000000000000000000111011  (decimal 59)

Now, how do we need to refactor our code from part 1 to do part 2? Refactoring maskBit is fairly easy. Instead of returning one result, we return a list of the possible results.

import Data.Bits
import Data.Foldable

maskBit2 :: Bits a => a -> (Int, Char) -> [a]
maskBit2 x (i, 'X') = [setBit x i, clearBit x i]
maskBit2 x (_, '0') = [x]
maskBit2 x (i, '1') = [setBit x i]

Refactoring mask is slightly more of a pain. We have to stop using foldl, because our maskBit2 function no longer returns the same type (a list of results) as the accumulator it takes (a single value). Using recursion, we can do the following:

import Control.Monad (join)

mask2 :: Bits a => String -> a -> [a]
mask2 maskSpec original = go (length maskSpec - 1) original
  where
    reversed = reverse maskSpec

    -- Walk the mask from the highest index down to 0, branching on
    -- every 'X' and flattening the branches back into one list.
    go (-1) value = [value]
    go i value = let
        values = maskBit2 value (i, reversed !! i)
        in join (map (go (i-1)) values)

This works just fine, especially since this is Haskell and recursion is sort of expected. Some notes about this implementation:

  1. go is a Haskell convention for helper functions that are declared within the scope of a top-level function. They are often recursive, but need different arguments from the top-level function.
  2. join is equivalent to flatten in many other languages. While it has a highly abstract type, here it specializes to join :: [[a]] -> [a].
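
For instance, in GHCi:

ghci> join [[1, 2], [3, 4], []]
[1,2,3,4]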

However, there’s actually a second definition, which both saves lines of code and time when you’re competing in a coding competition.

mask2 :: Bits a => String -> a -> [a]
mask2 maskSpec original = foldlM maskBit2 original
  (zip [0..] (reverse maskSpec))
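
Running this on the part 2 example gives back all four expected values, though not in the order the problem statement lists them:

ghci> mask2 "000000000000000000000000000000X1001X" (42 :: Int)
[59,27,58,26]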

This definition of mask2 is identical to mask except for the type signature, with foldl swapped out for foldlM and maskBit for maskBit2. What is foldlM and what does it do?

Well, just like foldl, it folds over a list and accumulates a single value. However, unlike foldl, its first parameter is a function of type Monad m => a -> b -> m a instead of a -> b -> a, and returns m a instead of a. If you look at Monads from an “effectful computation” point of view, it’s a fold that can perform effects while it folds. For example, you could use this to concatenate the contents of a list of files.

concatenated :: [String] -> IO String
concatenated filenames = foldlM (\string filename -> do
    -- the accumulated contents come first, the list element second
    s <- readFile filename
    return (string ++ s)
  ) "" filenames

However, in mask2 the function we’re passing into foldlM is maskBit2, of type Bits a => a -> (Int, Char) -> [a], which means we’re using the list Monad here. For effectful computations, like IO or State, it makes sense to fold over a list while performing some effects. But what does it mean to fold over a list while performing the effects of a list?

Here we have to turn to the fact that lists do represent an effectful computation. Specifically, they represent a branching or forking computation. To demonstrate this, let’s first talk about Maybe’s Monad effects.

Maybe (and Either) represent fallible computations. As data structures, they contain a single value or they contain nothing (or an error message). In the computational sense, they are fallible, and using that sense, we can infer what foldlM does when used with them: it’s a fold that can fail partway through. So, for instance, if we wrote:

divideBy :: Int -> [Int] -> Maybe Int
divideBy original divisors = foldlM divOpt original divisors
  where
    divOpt x d = if x `mod` d == 0
      then Just (x `div` d)
      else Nothing

We can see that this divides the original value by each divisor in turn, but only when each division is clean. Otherwise, the whole fold fails, returning Nothing.
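
In GHCi:

ghci> divideBy 100 [2, 5]
Just 10
ghci> divideBy 100 [3]
Nothing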

From this perspective, we can draw parallels to lists. A computation that returns the empty list is just like a computation that returns Nothing. No value of type a is available, even though the empty list is of type [a] and Nothing is of type Maybe a. In the case of the value [x] or Just x, we have a single value of type a available. So lists can still model fallible computations. But lists can also have multiple values. What does a computation that yields [x, y] mean? It’s a computation that, when completed, has two possible results.

One way of looking at this is the traditional notion of a non-deterministic Turing machine. Unlike a standard Turing machine, a non-deterministic Turing machine can take more than one possible action at a time, branching into many copies of itself and finding all of the results of each possible action. This is actually where the complexity class NP gets its name: NP stands for nondeterministic polynomial time, the class of problems that can be solved in polynomial time by such a branching, self-duplicating machine. This perspective gives us another viewpoint, if we only want one answer (or only care whether the computation failed or succeeded): we can view a non-deterministic Turing machine as a Turing machine which chooses randomly but is infinitely lucky; it always chooses the right direction to go when given a choice.

Obviously, Haskell programs are usually run on deterministic (aka normal) Turing machines, so we don’t get the ability to solve NP-complete problems in polynomial time with the list Monad. The program still has to explore every possible choice all the way to the end on a single non-duplicating computer, so we end up taking exponential time on such problems.

But for problems where we have a set of choices, and we want to explore the solutions we get for all possible combinations of choices, the list monad works very well.

When we use do-notation in Haskell, this is what we’re telling the program:

sums = do
    x <- [1, 2, 3, 4]
    y <- [-1, 1, 0]
    return (x + y)
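    -- evaluates to [0,2,1,1,3,2,2,4,3,3,5,4]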

We’re saying "choose all possible values for x, and all possible values for y, and then give me back all possible values for x + y." This can also be interpreted as a pseudo-for loop: for each value x in a list, and for each value y in another list, compute x + y. The only difference between an imperative for loop and this one is that this for loop has the idea of returning a value, instead of using side effects to construct another data structure. We can even use the Monadic combinator guard :: Alternative f => Bool -> f (), a function that takes a Boolean value and either fails or succeeds without producing anything, as a continue equivalent. 2 You might even notice that in the type for guard, there’s a typeclass Alternative, which is a typeclass for Monads (actually, any Applicative) that can represent fallible nondeterminism. 3

inBothLists xs ys = do  -- def inBothLists(xs, ys):
                        --     both = []
  x <- xs               --     for x in xs:
  guard (elem x ys)     --         if x not in ys:
                        --             continue
  return x              --         both.append(x)
                        --     return both
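
Trying it out in GHCi:

ghci> inBothLists [1, 2, 3, 4] [2, 4, 6]
[2,4]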

This behavior is mechanically derivable from the definition of >>= for lists, but it’s unenlightening to stare at the proof. Instead, this behavior can be best understood as a fusion of the standard imperative for loop and the forking computation method described above.

List is also not the only nondeterministic Monad. Any data structure that contains multiple values is somehow nondeterministic. For example, the non-empty list (data NonEmpty a = Cons a (NonEmpty a) | End a) represents a non-fallible nondeterministic monad. The infinite stream of values (data Stream a = Cons a (Stream a)) represents a nondeterministic monad where each choice has to have infinite possibilities. The many-branched tree (data Tree a = Leaf a | Branch [Tree a]) is a non-fallible nondeterministic monad that keeps track of how it got to each solution. And so on. It would probably be possible to write a Monad which creates a new thread on each choice, giving parallelism to these computations.
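
As a sketch of what one of these instances might look like, here is a plausible Monad instance for the non-empty list type above (appendNE is a helper name I’m introducing for illustration, not something from a library):

import Control.Monad (ap)

data NonEmpty a = Cons a (NonEmpty a) | End a

-- Concatenate two non-empty lists; the result is still non-empty.
appendNE :: NonEmpty a -> NonEmpty a -> NonEmpty a
appendNE (End a)     ys = Cons a ys
appendNE (Cons a xs) ys = Cons a (appendNE xs ys)

instance Functor NonEmpty where
  fmap f (End a)     = End (f a)
  fmap f (Cons a xs) = Cons (f a) (fmap f xs)

instance Applicative NonEmpty where
  pure  = End
  (<*>) = ap

instance Monad NonEmpty where
  -- Run f on every choice and concatenate the branches. Every branch
  -- produces at least one value, so the computation can never fail.
  End a     >>= f = f a
  Cons a xs >>= f = f a `appendNE` (xs >>= f)

Here pure = End corresponds to a computation with exactly one possible result, and >>= explores every choice, just like the list Monad, but with failure ruled out by the type.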

But the very fact that we understand lists as a Monad gives us so much power. We can use lists as nondeterminism in folds and maps, while looping and recursing, and even manually with do-notation.

Compare this with other languages. Sure, Rust has Iterators, Java has Streams, and Python has generators. But in those languages it’s very rare to see a foldlM, probably because it would apply to so few types that writing a custom recursive function whenever you need one is easier. But if you can take the Async and Maybe and List and Exception and State and IO (and …) Monads and abstract them all away behind one single function, then that function becomes worth writing.

And in some contexts, for some specific Monads, it still appears in some codebases. There’s a library of promise-related functions in the codebase where I work, which reimplements foldM in Java, specialized to Iterables (instead of any Foldable) and Promises (instead of any Monad), but with exactly the same functionality.


  1. There are plenty already published on the Advent of Code subreddit↩︎

  2. () is the zero-element tuple, also known as Unit, which has exactly one value, and therefore is the least interesting type. Compare to the imperative void, used for functions which don’t return anything.↩︎

  3. The function we care about that allows us to write guard from Alternative is empty :: Alternative f => f a. For lists, empty = [] and for Maybe a, empty = Nothing.↩︎