LAMBDA: The Ultimate Excel Spreadsheet Feature

0

Since its release in the 1980s, Microsoft Excel has changed the way people organize, analyze and visualize their data, providing a foundation for decision-making for the millions of people who use it every day. It is also the most used in the world programming language. Excel formulas are written by an order of magnitude more users than all the C, C++, C#, Java, and Python programmers in the world combined. Despite its success, considered a programming language Excel has fundamental weaknesses. Over the years, two particular shortcomings have stood out: (1) the Excel formula language only supported scalar values ​​(numbers, strings, and Booleans) and (2) it did not allow users to define new functions.

Until now.

Microsoft Research Cambridge’s Calc Intelligence project has a long-standing partnership with the Excel team to turn spreadsheet formulas into a full-fledged programming language. The fruits of this partnership are beginning to show in the product itself. At the ACM SIGPLAN 2019 Symposium on Principles of Programming Languages ​​(POPL 2019), we announced two important developments: Data Types take Excel beyond text and numbers and allow cells to contain first-class records, including entities linked to external data, and dynamic tables allow ordinary formulas to calculate entire arrays that spill into adjacent cells. These changes are a substantial start to our first challenge: rich, first-class structured data in Excel.

In December 2020, we announced LAMBDA, which allows users to define new functions written in Excel’s own formula language, directly addressing our second challenge. These newly defined functions can call other functions defined by LAMBDA, at arbitrary depth, even recursively. With LAMBDA, Excel has become Turing-complete. You can now, in principle, write any calculation in Excel formula language. LAMBDA is available to members of the Insiders: Beta program. The initial release has some implementation restrictions that we plan to lift in the future. We discussed LAMBDA and some of our spreadsheet research in a sponsored video featured at POPL 2021.

The power of LAMBDA

Scholars have known since the 1960s that Church’s lambda notation is a basis for a wide range of programming languages ​​and is therefore a highly expressive programming construct in its own right. Its incorporation into Excel represents a qualitative change, not just an incremental change.

To illustrate the power of LAMBDA, here is a function written using notation to calculate the length of the hypotenuse of a right triangle:

=LAMBDA( X, Y, SQRT( X*X+Y*Y ) )

LAMBDA complements the March 2020 version of LET, which allows us to structure the same example like this:

=LAMBDA( X, Y, LET( XS, X*X, YS, Y*Y, SQRT( XS+YS ) ) )

The function takes two named arguments X and Yes, binds the value of X*X the name XS, binds the value of A*A at YS, and returns SQUARE( XS+YS) as a result.

The existing name manager in Excel allows you to give a name to any formula. If we name our function PYTHAGORAS, then a formula such as PYTHAGORAS(3,4) evaluates to 5. Once named, you call the function by name, eliminating the need to repeat entire formulas when you want to use them.

Moreover, LAMBDA is the real lambda we know and love: a lambda can be an argument for another lambda or its result; you can set Church figures; lambdas can return lambdas, so you can make curry; you can define a fixed-point combinator using LAMBDA and therefore write recursive functions; etc (Also, since lambdas can be named, they can call each other directly recursively, which is much more convenient than using a fixed-point combinator.)

Examples: Recursive Inversion of a String and a Fixed-Point Combinator

Reversing a string is beyond Excel’s built-in functions and previously could only be written outside of the formula language, using Visual Basic or JavaScript. Here is a definition of REVERSE as a recursive LAMBDA, which uses two helper functions – HEAD and TAIL – to calculate the first character and everything but the first character respectively.

Even without relying on a recursively defined name, the formula language is Turing-complete because we can encode recursive function definitions using the classic call-by-value fixed-point combinator. In fact, running this combinator was one of LAMBDA’s first stress tests in the Excel code base. Here it is, applied to define a factorial function.

(On the other hand, Felienne Hermans nice blog post on writing a Turing machine in Excel does not, strictly speaking, establish Turing completeness because it uses successive rows for successive states, so the number of steps is limited by the number of rows.)

And after?

There’s a lot more to come. In the near term, we expect to see fully nested arrays and efficient implementations of array processing combinators, such as MAP and REDUCE, that take lambda functions as arguments. Beyond that, we hope to define functions not just by a single formula but by an entire worksheet, called sheet-defined functions, or even elastic functions defined by the sheet. In essence, sheet-defined functions will “follow the flow” of a typical spreadsheet design by allowing users to define a larger function through multiple formulas spread across multiple cells.

Community response

A programming language is only successful if its user base can easily and effectively use its power. It is debatable whether LAMBDA is good for programming language enthusiasts, but too difficult for end users to understand. By launching LAMBDA, Microsoft effectively launched a world-wide experience on end-user programming with higher-order functions.

The first feedback is encouraging. Within 24 hours of LAMBDA’s release in December, there were several videos, including one on splitting data across multiple columns and one on the use LAMBDA and LET to create single cell reports, and blog posts, like this one on calculation of axis scales, from the Excel community describing applications of LAMBDA that we had never thought of. A trade item described transition from Excel to a Turing-complete programming language.

Moreover, even if it takes more skills and knowledge to author a lambda, no additional skills are required to to call this. LAMBDA allows skilled authors to extend Excel with application domain-specific functions that appear transparently as part of Excel to their colleagues, who simply call them.

It will be interesting to see how users continue to experiment and apply not only LAMBDA, but also data types and dynamic arrays. We believe these new functional programming features will transform the way people make decisions with Excel.

From research to product and back to research

Our partnership with the product team illustrates a symbiosis between research and practice. For example, like many long-lived programming systems, the only truly accurate documentation of Excel’s semantics is its source code. So, to understand what Excel really does, we developed written semantics for it and a reference implementation of formula evaluation in TypeScript, Calc.ts.

We initially thought the Excel formula language was quite simple – that’s part of what makes it so appealing to end users – but we’ve found that it incorporates a variety of interesting and little-known features. For example, in Excel, a interval—a rectangular area of ​​the grid—is a first-class value. The ROW( function interval ) returns the row number of the first row in the range, so ROW( A7:A99 ) returns 7. But since ranges are first class, they can be returned by functions: INDEX( A7:A99, 3 ), for example, returns a reference to the third cell of range A7:A99—that is, range A9:A9—not the value of cell A9. So ROW( INDEX( A7:A99, 3 ) ) returns 9. Excel has several functions that take or return ranges, including union and intersection operators. It even has an INDIRECT function, which takes a string and interprets it as a range: INDIRECT(“A9” & “9”) returns the range A99:A99 (here & is the concatenation of strings).

We also discovered that Excel has a mechanism for “self-lifting” functions on arrays. For example, SUM( A2:A100 + 1 ) takes the range A2:A100, dereferences a vector of 99 values ​​(since (+) doesn’t accept references), then adds one to each element (lifting (+ ) on the table), and summarizes the result. It took us a while to figure out the precise rules that govern this behavior, but they turned out to be simple and systematic.

Writing semantics independent of the Excel formula language led to extended dialogues with the Excel team, numerous experimental validations, and occasional conversations with Excel engineers, who verified the code source.

Our independent reference implementation not only fleshed out and brought the semantics to life, but also proved extremely useful for the web version of Excel, which needs to evaluate formulas in the browser. Read the story of Calc.ts, which powers client-side calculations for the web version of Excel, on the Microsoft Garage Wall of Fame. Beyond its product impact, Calc.ts is also a fantastic research asset, as it allows prototype evaluation variants of formulas for rapid experimentation, for internal or university projects, for example. Let us know if you would like access to Calc.ts for research purposes.

More broadly, we have relied on our partnership to develop a research program around the theme of end-user programming, specifically involving a research crossover between programming languages, human-computer interaction and machine learning, such as This selection of articles bears witness to this:

Join our team!

Calc Intelligence is recruiting. We aim to be a diverse and inclusive team representing a range of disciplines, including human-computer interaction, machine learning and programming languages. Your work on our team could have a real impact on the most popular programming language in the world! Find out more about the 2021 internship opportunities in human machine interaction and in Programming languages and on the Calc Intelligence project.

Share.

About Author

Comments are closed.