In this week in Exercises in Programming Style’s post, we will explore one of the foundation of Functional Programming, I/O and how to cope with it.
This is the 13th post in the Exercises in Programming Style focus series.Other posts include:
- Introducing Exercises in Programming Style
- Exercises in Programming Style, stacking things up
- Exercises in Programming Style, Kwisatz Haderach-style
- Exercises in Programming Style, recursion
- Exercises in Programming Style with higher-order functions
- Composing Exercises in Programming Style
- Exercises in Programming Style, back to Object-Oriented Programming
- Exercises in Programming Style: maps are objects too
- Exercises in Programming Style: Event-Driven Programming
- Exercises in Programming Style and the Event Bus
- Reflecting over Exercises in Programming Style
- Exercises in Aspect-Oriented Programming Style
- Exercises in Programming Style: FP & I/O (this post)
- Exercises in Relational Database Style
- Exercises in Programming Style: spreadsheets
- Exercises in Concurrent Programming Style
- Exercises in Programming Style: sharing data among threads
- Exercises in Programming Style with Hazelcast
- Exercises in MapReduce Style
- Conclusion of Exercises in Programming Style
Purity, side-effects, and I/O
In FP, functions should be pure: a pure function’s return value must only depend on its input parameters. Once a function is pure, it also has referential transparency:
Referential transparency means that an expression can be replaced with its value. This requires that the expression is pure, that is to say the expression must be deterministic and side-effect free.
This works well in theory. As per the definition above, side-effects make functions "impure". Example of side effects include:
- Reading the standard in
- Reading from a file
- Writing to the standard out
- Writing to a file
- etc.
When functions are pure, they can be composed. For example, here’s a sample of functions composed together:
Because of referential transparency, the previous design can be changed to the following design:
Let’s repeat it again. For that to work, every function involved needs to be pure: it should always return the same value when a specific parameter is passed, and be free of side-effects.
However, the real-world code is riddled with side-effects. For example, programs read parameters e.g. read from the command-line, do some transformation, and output the results e.g. write to the database. To cope with that, FP applied in day-to-day programming pushes I/O at the boundaries of the functions' "chain".
Applying the theory
The original Python code reads from the standard input, and prints the results on the standard output. Compared to this, in all previous exercises, we already updated the design for automated testing purposes.
Function | Description | Pure |
---|---|---|
|
Read the words from a file |
|
|
Read the stop words from a file and filter them out from the words list |
|
|
Transform the words list into a frequencies map |
|
|
Transform the frequencies map into a list and sort the list by frequency |
|
|
Keep the top 25 words and transform the frequencies back into a list |
|
One can improve the design by pushing the I/O further:
the removeStopWords()
function is impure because it reads stop words from a file, a strong side-effect.
It can be broken into an impure extracting part, and a pure removing part.
Higher-Order functions: purity and laziness
Yet, functions that are at the boundaries are still impure. One straightforward way to achieve purity across the whole chain is to wrap every function… in a function. That’s what Higher-Order functions are for. When a function always returns the same function, it’s pure by definition.
Moreover, using higher-order functions allows for laziness: it’s very cheap to pass computations around. Executing such a computation can be deferred until absolutely necessary - or not all, if not.
For the exercise, let’s model a simple list of functions of type (Any) → Any
.
There should be two features available:
- One to add a function at the end of the list
- One to execute the functions' pipeline
class Quarantine(private val filename: String) {
private var functions = listOf<(Any) -> Any>()
fun bind(function: Function1<*, Any>): Quarantine {
functions = functions + (function as (Any) -> Any) (1)
return this
}
fun execute(): Any {
tailrec fun internalExecute(functions: List<Function1<*, Any>>, result: Any): Any =
if (functions.isEmpty()) result (2)
else {
val function = functions.first()
internalExecute(functions - function, (function as (Any) -> Any)(result))
}
return internalExecute(functions, filename)
}
}
1 | Do nothing but store the computation |
2 | Actually kickstart the whole computation pipeline using recursion |
This class is generic enough to be relevant in different contexts. Functions themselves are pretty straightforward. Here’s one as an example:
fun frequencies(): (List<String>) -> Map<String, Int> = { words ->
words.groupingBy { it }
.eachCount()
}
Handling different function signatures
The above Quarantine
class binds any function that accepts a single parameter and returns a value.
With that design, it’s possible to add the extractWords()
function, but not the extractStopWords()
one:
fun extractWords(): (String) -> List<String> = { (1)
read(it)
.flatMap { it.toLowerCase().split("\\W|_".toRegex()) }
.filter { it.isNotBlank() && it.length >= 2 }
}
fun extractStopWords(): () -> List<String> = { read("stop_words.txt") } (2)
1 | Accepts a parameter, and conforms to (Any) → Any |
2 | Doesn’t accept a parameter |
Even worse, those functions shouldn’t be added one by one to the pipeline, because they are not meant to be called sequentially. Hence, they should be first "merged", and added as such to the beginning of the composition chain. This requires the following code:
class Quarantine(private val filename: String,
extractWords: (String) -> List<String>,
extractStopWords: () -> List<String>) {
private var functions: List<(Any) -> Any>
init {
functions = arrayListOf({it: String -> (1)
extractWords.invoke(it) to extractStopWords.invoke()
} as (Any) -> Any)
}
// Same as above
}
1 | Compose a function of type (String) → List<String> and a function of type () → List<String> into a function of type (String) → Pair<List<String>, List<String>> |
Conclusion
Purity is one of the foundations of FP. Side-effects in general, and I/O in particular, breaks purity. A straightforward way to cope with I/O is to wrap functions into other functions to create higher-order functions. Even better, higher-order functions provide laziness, and all their associated benefits.