Aug 4, 2019 / EXERCISE, PROGRAMMING, STYLE

Exercises in Programming Style: FP & I/O

In this week in Exercises in Programming Style’s post, we will explore one of the foundation of Functional Programming, I/O and how to cope with it.

This is the 13^th post in the Exercises in Programming Style focus series.Other posts include:

Purity, side-effects, and I/O

In FP, functions should be pure: a pure function’s return value must only depend on its input parameters. Once a function is pure, it also has referential transparency:

Referential transparency means that an expression can be replaced with its value. This requires that the expression is pure, that is to say the expression must be deterministic and side-effect free.

This works well in theory. As per the definition above, side-effects make functions "impure". Example of side effects include:

Reading the standard in
Reading from a file
Writing to the standard out
Writing to a file
etc.

When functions are pure, they can be composed. For example, here’s a sample of functions composed together:

Because of referential transparency, the previous design can be changed to the following design:

An equivalent sample of functions composed together

Let’s repeat it again. For that to work, every function involved needs to be pure: it should always return the same value when a specific parameter is passed, and be free of side-effects.

However, the real-world code is riddled with side-effects. For example, programs read parameters e.g. read from the command-line, do some transformation, and output the results e.g. write to the database. To cope with that, FP applied in day-to-day programming pushes I/O at the boundaries of the functions' "chain".

A real-world sample of functions composed together

Applying the theory

The original Python code reads from the standard input, and prints the results on the standard output. Compared to this, in all previous exercises, we already updated the design for automated testing purposes.

Function Description Pure

Function	Description	Pure
`extractWords()`	Read the words from a file
`removeStopWords()`	Read the stop words from a file and filter them out from the words list
`frequencies()`	Transform the words list into a frequencies map
`sorted()`	Transform the frequencies map into a list and sort the list by frequency
`top()`	Keep the top 25 words and transform the frequencies back into a list

extractWords()

Read the words from a file

removeStopWords()

Read the stop words from a file and filter them out from the words list

frequencies()

Transform the words list into a frequencies map

sorted()

Transform the frequencies map into a list and sort the list by frequency

top()

Keep the top 25 words and transform the frequencies back into a list

Functions composition applied to the word frequencies problem

One can improve the design by pushing the I/O further: the removeStopWords() function is impure because it reads stop words from a file, a strong side-effect. It can be broken into an impure extracting part, and a pure removing part.

Better functions composition applied to the word frequencies problem

Higher-Order functions: purity and laziness

Yet, functions that are at the boundaries are still impure. One straightforward way to achieve purity across the whole chain is to wrap every function… in a function. That’s what Higher-Order functions are for. When a function always returns the same function, it’s pure by definition.

Moreover, using higher-order functions allows for laziness: it’s very cheap to pass computations around. Executing such a computation can be deferred until absolutely necessary - or not all, if not.

For the exercise, let’s model a simple list of functions of type (Any) → Any. There should be two features available:

One to add a function at the end of the list
One to execute the functions' pipeline

class Quarantine(private val filename: String) {

  private var functions = listOf<(Any) -> Any>()

  fun bind(function: Function1<*, Any>): Quarantine {
    functions = functions + (function as (Any) -> Any)               (1)
    return this
  }

  fun execute(): Any {
    tailrec fun internalExecute(functions: List<Function1<*, Any>>, result: Any): Any =
      if (functions.isEmpty()) result                                (2)
      else {
        val function = functions.first()
        internalExecute(functions - function, (function as (Any) -> Any)(result))
      }
    return internalExecute(functions, filename)
  }
}

1	Do nothing but store the computation
2	Actually kickstart the whole computation pipeline using recursion

This class is generic enough to be relevant in different contexts. Functions themselves are pretty straightforward. Here’s one as an example:

fun frequencies(): (List<String>) -> Map<String, Int> = { words ->
  words.groupingBy { it }
       .eachCount()
}

Handling different function signatures

The above Quarantine class binds any function that accepts a single parameter and returns a value. With that design, it’s possible to add the extractWords() function, but not the extractStopWords() one:

fun extractWords(): (String) -> List<String> = {                         (1)
  read(it)
    .flatMap { it.toLowerCase().split("\\W|_".toRegex()) }
    .filter { it.isNotBlank() && it.length >= 2 }
}

fun extractStopWords(): () -> List<String> = { read("stop_words.txt") }  (2)

1	Accepts a parameter, and conforms to `(Any) → Any`
2	Doesn’t accept a parameter

Even worse, those functions shouldn’t be added one by one to the pipeline, because they are not meant to be called sequentially. Hence, they should be first "merged", and added as such to the beginning of the composition chain. This requires the following code:

class Quarantine(private val filename: String,
                         extractWords: (String) -> List<String>,
                     extractStopWords: () -> List<String>) {

  private var functions: List<(Any) -> Any>

  init {
    functions = arrayListOf({it: String ->                    (1)
      extractWords.invoke(it) to extractStopWords.invoke()
    } as (Any) -> Any)
  }
  // Same as above
}

1	Compose a function of type `(String) → List<String>` and a function of type `() → List<String>` into a function of type `(String) → Pair<List<String>, List<String>>`

Conclusion

Purity is one of the foundations of FP. Side-effects in general, and I/O in particular, breaks purity. A straightforward way to cope with I/O is to wrap functions into other functions to create higher-order functions. Even better, higher-order functions provide laziness, and all their associated benefits.

The complete source code for this post can be found on Github.

Follow me Follow me