Overview: Working with Immutable Data
Applies to: Functional Programming
Published: January 2010
Summary: This overview looks at one of the important functional concepts—immutability. It explains how immutable programs are written and introduces F# concepts such as functions and recursion.
This topic contains the following sections.
- Using Immutability Everywhere
- Programming with Immutable Values
- Programming with Immutable Data Structures
- Using Recursion to Change Program State
- Reusing Patterns by Using Functions
- Additional Resources
- See Also
This article is an excerpt from Real World Functional Programming: With Examples in F# and C# by Tomas Petricek with Jon Skeet from Manning Publications (ISBN 9781933988924, copyright Manning Publications 2009, all rights reserved). No part of these chapters may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, electrostatic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the publisher, except in the case of brief quotations embodied in critical articles or reviews.
Immutability is one of the key concepts in functional programming. It largely changes how programs are written. This article looks at basic functional constructs in the F# language that make it possible to elegantly write programs that use immutable values. It introduces value and function declarations and talks about recursion, which is a technique for writing code that works with real-world functional data structures.
In functional languages, immutability is present in two ways:
Variables in functional languages are replaced by symbols or values that cannot be changed. After assigning a value such as number or a string to a symbol (for example, num), the value referred to by the symbol cannot be changed. This is reflected in the F# syntax: num = 10 no longer modifies a variable. It is a logical expression testing whether the value assigned to num is 10.
Data structures that are used in functional programs are also immutable. Once an object is created (e.g., to represent a character or item in a game) it cannot be changed. To update the game state, the program needs to create a new object representing the changed state.
This article focuses on what immutability looks like in F# and doesn't discuss why it is a good thing. More information about how immutability makes code more readable and maintainable can be found in:
Overview: Understanding What a Program Does demonstrates how immutability makes it easier to reason about programs and also how it simplifies testing. The article uses mostly C#.
Overview: Parallel and Asynchronous Programming explains how immutability makes it easier to parallelize programs.
The first aspect of immutability in functional programming languages is that "variables" cannot be changed.
If a variable cannot change, then it isn't really a variable. For this reason, functional languages usually prefer a different term. F# uses the term value or symbol instead of variable. The F# language also supports traditional mutable variables. This is useful mainly for performance optimizations or when working with a .NET library that uses mutation.
The following example demonstrates the difference. A version that uses mutable variables in F# looks like this:
// Declare mutable variable and initialize it let mutable res = GetInitialValue() // Mutate the value by adding number two times res <- res + ReadInt32() res <- res * ReadInt32() WriteInt32(res);
A mutable variable is declared using the let mutable syntax. This is essentially the same thing as var in C#. Just like in C#, the F# compiler automatically infers the type of variable. The next two lines modify the value stored in a variable. This is done using the <- operator, which is used to assign a new value to mutable variables.
The following snippet implements the same functionality without modifying the value of any variables:
let res = GetInitialValue() let res = res + ReadInt32() let res = res * ReadInt32() WriteInt32(res)
When declaring a value using the standard let syntax, it cannot be modified using the <- operator. Instead of modifying an existing value, the snippet declares a new value using the let syntax again. The example is slightly tricky, because it reuses the same name for a newly declared value. The code would mean the same thing if the symbols were named res0, res1, and res2.
The repeated use of the same name demonstrates the shadowing of bindings. When declaring a new value with the same name, it hides the previous value of the same name. When the code refers to res on the third line, it gets the value of the symbol declared on the second line and not the one from the first line. Technically, shadowing doesn't just overwrite the value in the memory. If the previous value is captured by some construct, it will not be modified. Readers familiar with C# may know that capturing mutable variables in lambda functions causes a lot of confusion in C#. More information on this topic can be found in the articles referenced the end of this document.
Using immutable values instead of variables requires expressing algorithms in a different way. The overview gets back to this topic but, first, you'll see another use of immutability in functional languages.
The approach to using immutable values isn’t limited just to variables but extends to declarations of all functional data types. A declaration of a functional data type (or a class in object-oriented terms) may store, for example, an integer and a string. These data items need to be declared as members of a data type. Functional languages prefer immutable values over variables, and the same principle is used when declaring data type members. The members of the type (in this example, an integer and a string) will also be immutable. This means that the type cannot modify its fields after it is created.
Working with Immutable Strings
Though immutable types may seem a bit strange at first, they aren't actually that uncommon, even in .NET or Java world. A typical example of such immutable data type, known to all .NET programmers, is the type string. When working with strings, every operation that acts on the string (e.g., Substring) doesn’t alter the original string but, instead, returns a new string value:
let input = "hElLo wOrLD!" let first = input.Substring(0, 1).ToUpper() let rest = input.Substring(1).ToLower() let result = first + rest
This example takes an input string and turns it into a version that starts with an uppercase letter followed by the rest of the string in lowercase. This demonstrates how to work with data types that are immutable in general. The solution is quite simple—the operations that work on these types (like ToUpper or Substring) return a new string value as a result.
Working with Immutable Lists
This approach can be used for any immutable types. For example, what would a functional list of integers look like? The list is created using two operations. One creates an empty list and the other “adds” a number to a list. Since the list is immutable, adding an element has to return a new list containing the items from the original list and the newly added element.
The following example uses a fictional type FunctionalList that has the two operations as methods (in a similar way as string):
let list = FunctionalList.Empty let list = list.Add(1) let list = list.Add(3) let list = list.Add(5) let list = list.Add(7)
This can, of course, be written more elegantly, without declaring all of the unnecessary temporary variables. The Add operation can be called straight away on the returned temporary list:
Using purely immutable types changes the way code is written. This example shows that immutable data types aren’t by any means less powerful then their mutable alternatives. They just have to be used in a different way and, indeed, they also have to be designed and implemented in a different way!
Immutable types have many benefits. Many of them are explained in Overview: Understanding What a Program Does. Another benefit that is less important, but is nicely shown in the previous example is that working with immutable types tends to be very succinct. The above snippet chains operations (such as Empty and Add) in a single expression instead of adding elements in a separate statements.
More information about immutable data types can be found in Tutorial: Functional Programming in C# and F#. The tutorial also shows how to implement an immutable list such as FunctionalList from this example in C# and how to use a similar type in F#.
An earlier section demonstrated how to rewrite simple code that changes a variable to use immutable values instead. The upcoming section returns to this topic and discusses a more interesting case.
The F# library provides a large collection of reusable functions that are often sufficient for solving many real-world problems. This section shows how these functions are implemented (and what to do when a library doesn't provide a required function).
The next example defines a function that sums numbers in a specified range. The sum could be calculated directly but it serves as an example of calculation that uses a loop. The function can be written in F# using a mutable variable:
The snippet shows an F# function declaration. Just as when creating values, functions are written using the let keyword. The difference is that the function name (sumNumbers) is followed by parameters. The function in the example has two parameters nfrom and nto. The types of parameters are not written explicitly because F# infers the type from the way parameters are used. The function body declares a mutable variable res using the let mutable binding as in the earlier example. Then, it uses a for loop and, inside the body, changes the value of the variable using the <- operator.
In this case, it is not possible to replace directly the variable with value bindings because a new value is created during every evaluation of the XE "loops:using recursion" loop. The program has to keep a certain state, and that state changes on each iteration of the loop. There is no way to declare a new value for every change of the state as in the earlier example. The code needs a more fundamental change, which is to use recursion XE "recursion:encoding loops" :
let rec sumNumbers nfrom nto = if nfrom > nto then 0 else let sumRest = sumNumbers (nfrom + 1) nto nfrom + sumRest
A recursive function in F# needs to be explicitly marked as recursive using let rec. Then it is allowed to make recursive calls in the body of the function. The sumNumbers function first tests if the range to be processed is empty. If yes, it immediately returns zero. If there are some numbers to be added, it first recursively processes a range starting from the lower bound incremented by one. Then, it adds the lower bound (nfrom) to the recursively added rest (sumRest).
The body contains only a single value binding to declare the value sumRest, so it’s purely functional. The state of the computation, which was originally stored in a mutable variable, is now expressed using recursion. The earlier statement that there is no way to declare a new variable for every change of the state was in some sense incorrect. That’s what the new recursive implementation does. Every time the function recursively calls itself, it skips the first number and calculates the sum of the remaining numbers. This result is bound to a value sumRest, which is declared in a new binding during every execution of the recursive function.
Recursion has a reputation for being difficult. After gaining some experience with functional programming, most of the people realize that this is not the case. Nevertheless, it would be unfortunate to repeat the same recursive pattern over and over again, so functional languages provide a way for "hiding" the recursive part and expressing most of the problems without using recursion explicitly...
The question that motivates this section is, “How to separate the functionality that will vary with every use from the recursive nature of the code that always stays the same?” The answer is simple: write the recursive part as a function with parameters. These parameters specify the unique operation that the function should perform.
The sumNumbers function from the previous section defines one such recursive pattern that stays the same for many different. It starts with some initial value and calculates a new state of the calculation for each iteration (in a specified range) using some operation. The previous section used zero as the initial value and addition as the operation. The resulting computation for a range from 5 to 10 looked like this:
5 + (6 + (7 + (8 + (9 + (10 + 0)))))
Is there some way to make the function more general to allow performing computations using different operations? For example, it could multiply all the numbers in the range together, generating the following computation:
5 * (6 * (7 * (8 * (9 * (10 * 1)))))
There are only two changes: the initial value is 1 instead of 0, and the operator used during the computation becomes * instead of +. A function that can perform both calculations looks as follows:
let rec aggregateNumbers init op nfrom nto = if nfrom > nto then init else let sumRest = aggregateNumbers init op (nfrom + 1) nto op nfrom sumRest
The function has two additional parameters—the initial value init and the operation op that specifies how to transform the intermediate result and a number from the range into the next result. Again, the F# compiler infers the types of parameters automatically. For example, the last line contains op followed by two other identifiers. This is the F# syntax for calling functions with multiple arguments; therefore, F# infers that op is actually a function. Before explaining several concepts related to functions, the following three snippets show how to use aggregateNumbers from F# Interactive:
> aggregateNumbers 0 (+) 1 5;; val it : int = 15 > aggregateNumbers 1 (*) 1 5;; val it : int = 120 > aggregateNumbers "" (sprintf "%d%s") 1 5;; val it : string = "12345"
The first two snippets demonstrate the two uses discussed earlier. Note that F# allows using operator as a function by enclosing it in parentheses. It is not necessary to write a separate function explicitly (for example, add or multiply) just so that it can be used as an argument.
The last example is more interesting. It uses an empty string as the initial value. The function that is passed as an argument takes a number (the current value from the range) and a string (the result accumulated so far) and creates a new string containing the number prepended to the original string. The references at the end of this article contain more information on what exactly is going on here, but here is a brief summary of the two interesting aspects:
The function aggregateNumbers is automatically generalized. This means that the result doesn't have to be just an integer. F# automatically recognizes that the function is generic and it compiles it as such. The only requirement is that the initial value init as well as the second parameter and the result of the op function have the same type. In the first two examples, this type was int, but, in the last one, we used string.
The last line uses partial function application to construct a function that is used as the op parameter. The sprintf function takes a format string that specifies what arguments it expects. In the example above, the format string creates a function that expects an integer and a string, but it is not provided with the values of these arguments. The incomplete call can be wrapped in parentheses and used as an argument of aggregateNumbers. This is a very easy way of creating functions—the missing arguments will be provided when the op function is called inside aggregateNumbers.
In functional languages, a function can take other functions as parameters. This is exactly what the current example does. It is also possible to write functions that return a function as the result or even create a list of functions, and so on. Functions can simply appear in any place where you can use values of other types such as int. This makes the language very flexible and allows us to express complex ideas with a very small amount of code.
This overview explored one of the key concepts in functional programming, which is immutability. Immutability affects both variables and data structures. When creating a variable in F# using let, it cannot be changed later. (Therefore, it doesn't make sense to call it a "variable," and F# uses the term "value" instead.) Similarly, when declaring a field in a class, it also becomes immutable.
These two aspects change the way programs are written. When working with values, a newly calculated state is assigned to new values. When working with data structures, all operations return new states as the result. In this scenario, many need to be written using recursion. However, recursion (just like other programming patterns) can be elegantly hidden inside generally useful functions.
Immutability is one of several key functional concepts, but it is not the only one. The following articles discuss other concepts that form the functional programming style:
This overview focused on explaining what immutability is and how it manifests itself in functional programs. It didn't explain the benefits of using immutability and other functional concepts. This topic is covered in the following articles:
To download the code snippets shown in this article, go to http://code.msdn.microsoft.com/Chapter-1-Introducing-F-c967460d
This article is based on Real World Functional Programming: With Examples in F# and C#. Book chapters related to the content of this article are:
Book Chapter 2: “Core concepts in functional programming” explains how immutability is used in functional programming and how this changes the way we write code.
Book Chapter 3: “Meet tuples, lists and functions in F# and C#” gives examples of two fundamental immutable types used in functional programming—a tuple and a list.
Book Chapter 11: “Refactoring and testing functional programs” gives numerous examples of how functional programming simplifies refactoring and testing and enables more powerful refactorings.
The following MSDN documents are related to the topic of this article:
string (C# Reference) demonstrates how to use the C# string type, which is the most widely used immutable type in .NET.
Lists (F#) discusses immutable functional lists that are used in F#.
let Bindings (F#) explains how to use the F# let keyword.
Functions as First-Class Values (F#) contains more information about writing reusable code by using functions as arguments of other functions.
Automatic Generalization (F#) explains how F# automatically infers the most general generic type of functions.
Closing over the loop variable considered harmful (Eric Lippert's blog) explains the problem that can occur when capturing mutable variable in C#.