I made a spreadsheet

Visicalc, the world’s first spreadsheet, was also the original killer app. Steve Jobs said that Visicalc “propelled the Apple ][ to the success it achieved”. As one of the first mainstream digital tools, I’m fairly sure the first spreadsheet inspired Jobs’ later quote that “…the computer is the most remarkable tool that we’ve ever come up with. It’s the equivalent of a bicycle for our minds.” – and that quote often inspires me.

Having completed a proof of concept compiler, I was reminded of this list of challenging projects every programmer should try. I’ve built a text editor before, and now, a compiler, but a spreadsheet - my very own “magic sheet of paper that can perform calculations and recalculations” seemed like a next fun Saturday challenge.

Here’s the app I made - it’s very simple but demonstrates the core concepts of all spreadsheets. You can try it yourself right here, or full screen at dps-spreadsheet.netlify.app.

You can read the code on GitHub. The UI is implemented in React with Material UI and you’ll find most of the implementation in SpreadsheetItems.js and grammar.jison.

What even is a spreadsheet?

You’ve probably used a spreadsheet before, but it’s worth taking a step back to think about what a spreadsheet really is.

The essence of any spreadsheet is a 2D array of cells in rows (numbered 1...N) and columns (labeled A...Z). Into each cell you can enter either a value e.g. Item, 3.14159 or formulae which usually start with an equals sign e.g. =A1+1. The spreadsheet constantly updates to run all the calculations provided in its formulae. What’s really clever is that the formula for each cell can refer to values in other cells in the spreadsheet, including those that contain formulae. So you can construct a model where changing the value in a single cell (e.g. a tax rate, or currency exchange rate) magically updates every other cell that needs to change in the sheet.

Recalculating on change

Since each spreadsheet cell can depend on arbitrarily many values of other cells across the whole sheet, some thought is required to figure out how to end up with the correct values everywhere.

Visicalc achieved this by recalculating all the formulae left-to-right and top-to-bottom repeatedly until none of them changed. That works, but there’s a better and more efficient way. Lotus 1-2-3 was the first spreadsheet to introduce “natural-order recalculation”, which I’ve implemented here. (Note there are even more efficient schemes, which you can read about in this great blog post).

Natural-order recalculation involves figuring out the dependency order of the formulae in the sheet and, as values change, recalculating each cell only once and crucially before any other cells that depend on its value.

For instance, say we have the following cells and formulae:

A1: 5
A2: =A1*2
A3: =A2+A4
A4: =A2

The value of A2 depends on A1 since it references it in the formula for that cell. Similarly, A4 depends on A2 and A3 depends on both A4 and A2.

A3 --> A4 --> A2 --> A1
 |___________⤴

This dependency ordering forms a directed acyclic graph [DAG]. By forming the graph and then determining its ordering with respect to those dependencies we can come up with an sequence through the cells where, if we compute cells values in that sequence no repeated re-calculation is required. We’ll get the right answer first time. In the example above the order A1, A2, A4, A3 would achieve the desired result. A popular algorithm for performing this topological sort is Kahn’s algorithm. There also happens to be an npm module which implements this called toposort.

Here’s my entire function that builds the DAG from the formulae in the sheet (or stuffs values in the final values structure) and does the toposort:

  const toposortCells = () => {
    var graph = [];

    for (var item in cells) {
        const formula = String(cells[item]);
        if (!formula.startsWith("=")) {
            window.values[item] = formula;
        } else {
            var edgesAdded = 0;

            const matches = formula.match(/[A-Z][0-9]+/g);
            for (var match in matches) {
                edgesAdded++;
                graph.push([matches[match], item]);
            }

            if (edgesAdded === 0) {
                // This cell has no dependencies, but it is a formula so
                // compute its value right now. (e.g. pure math)
                window.values[item] = parse(formula);
            }
        }

      }
      const topo = toposort(graph);
      parseCells(topo); // simply calls parse on each cell in the order in topo.
  }

I think that’s rather neat - the heart of a spreadsheet in ~12 lines of code! This design keeps a separate cells map (which contains the values or formulae input by the user) and values map where computed values are stored and then displayed in the UI. cells and values are sparse data structures - i.e. we don’t predefine a value for all of the A...Z x 1...N cells but simply store data with the cell name as the key in a map when we first see it.

In the little code snippet above, we iterate through every cell which has a defined formula. If it doesn’t start with an = sign it’s a value and we put it straight into the values map with the same cell name as the key. Otherwise, we find all the cell names that appear in the formula using the regular expression /[A-Z][0-9]+/g and add an edge to the DAG between each cell appearing in the formula and the cell currently being inspected. The DAG (graph) is modeled as an array of two-element arrays, on which we can call toposort to produce the natural ordering.

cells = {           graph = [['A1', 'A2'], ['A2', 'A3'], ['A4', 'A3'], ['A2', 'A4']]
  'A1': '5',
  'A2': '=A1*2',    toposort(graph) -> ["A1", "A2", "A4", "A3"]
  'A3': '=A2+A4',
  'A4': '=A2',
}

Now we have the correct ordering, we can iterate over each cell in that sequence, parsing the formula and computing the current value as we go.

Parsing formulae

Parsing the formulae is a very similar task to the lexing and parsing we saw for the compiler. Again, there are lots of good off-the-shelf parser generators available. Since I was working with Javascript I chose the yacc-like jison. It was very familiar from last week’s goyacc adventure.

There’s an example jison grammar for parsing and interpreting mathematical expressions to which I made minimal additions to parse cell names in the lexer, and grab values for those cells in the parser.

\s+                   /* skip whitespace */
"="                   /* skip = */
[0-9]+("."[0-9]+)?\b  return 'NUMBER'
[A-Z][0-9]+           return 'CELL'    // <------ match cell names like "A1"
"*"                   return '*'
...
expressions
    : e EOF
        { return $1; }
    ;

e
    : e '+' e
        {$$ = $1+$3;}
...
    | CELL
      {$$ = Number(window.values[yytext]);}  // <------ When interpreting a cell, grab its value

Unlike the compiler example, we’re not building a parse tree as we visit each node. Instead we’re simply interpreting values, math operations and cell values on the spot to compute the ultimate value of the formula. jison generates a (complicated) javascript file which defines the method parse which we can call for each formula, in the order we determined above, to compute all the values.

  const parseCells = (topo) => {
      for (var i in topo) {
          const cell = topo[i];
          const formula = String(cells[cell]);

          if (formula.startsWith("=")) {
              try {
                const val = parse(cells[cell]);
                window.values[cell] = val;
                if (isNaN(val)) {
                    window.values[cell] = "NAN";
                }
                delete errorCellText[cell];
              } catch (err) {
                errorCellText[cell] = err.message;
                window.values[cell] = "ERR";
              }
          }
      }
  }

That’s it - the whole spreadsheet engine.

User interface

The rest of the project involved building a UI in React to present all this in a familiar package. I spent longer on the UI implementation than the core of the engine, but it’s conceptually quite simple.

React provided a neat impedance match for the task of building this UI. While this was my first time working with Material UI, I found it easy to pick up too. The definition of the core spreadsheet view is also delightfully compact:

<TableBody>
    {[...Array(rowCount).keys()].map((r, row) => (
    <TableRow key={"row-" + row}>
        <TableCell component="th" scope="row" key={"rowl-" + row}>
        {row + 1}
        </TableCell>
        {[...Array(colCount).keys()].map((i) => {
            const cellName = String.fromCharCode('A'.charCodeAt(0) + i) + (row + 1);
            return <TableCell 
                className={cellName === editCell ? classes.outlined : inSelection(cellName) ? classes.selectedCell : classes.tableCell}
                key={"cell-" + cellName + "-" + (cells[cellName] || '')}
                onClick={() => {setEditCell(cellName);}}
                align="right">
                    {window.values[cellName] || cells[cellName] || ''}
                </TableCell>;
        })}
    </TableRow>
    ))}
</TableBody>

Selection and fill

The final thing that, in my view, is part of the essence of a spreadsheet is the ability to fill formulae across a whole selection. If you’ve used a spreadsheet, you have probably done this many times. Filling formulae means the user doesn’t have to type out repetitive incantations in every cell across a whole table of data. It’s the ability to type =A1+1 in cell A2 and then fill the whole A column, with the A1 becoming A2, A3, A4 etc as the fill moves progressively down the sheet.

I added a UI for managing and showing the range of selected cells (using shift + arrow keys) and the fillDown function triggered by ctrl+d.

  const fillDown = () => {
      const startCell = selection.split(":")[0];
      const endCell = selection.split(":")[1];
      const startParts = cellParts(startCell);
      const endParts = cellParts(endCell);

      var templateFormula = cells[startCell];

      for (var i = startParts.row + 1; i <= endParts.row; i++) {
        const regex = /([A-Z])([0-9]+)/g;
        var computedFormula = "";
        var matches;
        var pos = 0;
        while ((matches = regex.exec(templateFormula))) {
            // Add the next bit that isn't a cell name
            computedFormula += templateFormula.substr(pos, matches.index - pos);

            // Then grab the cell name and increment it by one row (filling DOWN).
            computedFormula += matches[1] + (Number(matches[2]) + 1);
            pos = matches.index + matches[0].length;
        }
        // copy over anything left at the end of the template after all matches.
        computedFormula += templateFormula.substring(pos);

        cells[startParts.col + i] = computedFormula;
        templateFormula = computedFormula;
      }
      toposortCells();
  }

This was probably the trickiest part of the whole project to get right - the API for repeatedly matching a regex in the same string is a bit subtle!

It’s a wrap

Cells, formulae and filling - now we have a spreadsheet. That’s all I could do in an afternoon, but there are lots of fun extensions. For instance, the next order of business would be to add SUM and AVERAGE as spreadsheet functions operating on ranges of cells. VLOOKUP would be useful too.

If you’d like to play with the natural order computation and parsing functions, I’ve created a these RunKit playgrounds:

And here’s a video of the final product:

Thanks to Luke Miles for reviewing a draft of this post (and telling me to put the spreadsheet inline!)

Contents