In previous posts, we had a simple problem to handle - find and sort the 25 most frequent words from a file. Then, we had to comply with different sets of constraints regarding the code.
This week, the constraint is to achieve the goal with the shortest code possible. Of course, this depends a lot on the programming language, and on the API available - whether baked-in or from a standard library.
This is the 3rd post in the Exercises in Programming Style focus series.Other posts include:
- Introducing Exercises in Programming Style
- Exercises in Programming Style, stacking things up
- Exercises in Programming Style, Kwisatz Haderach-style (this post)
- Exercises in Programming Style, recursion
- Exercises in Programming Style with higher-order functions
- Composing Exercises in Programming Style
- Exercises in Programming Style, back to Object-Oriented Programming
- Exercises in Programming Style: maps are objects too
- Exercises in Programming Style: Event-Driven Programming
- Exercises in Programming Style and the Event Bus
- Reflecting over Exercises in Programming Style
- Exercises in Aspect-Oriented Programming Style
- Exercises in Programming Style: FP & I/O
- Exercises in Relational Database Style
- Exercises in Programming Style: spreadsheets
- Exercises in Concurrent Programming Style
- Exercises in Programming Style: sharing data among threads
- Exercises in Programming Style with Hazelcast
- Exercises in MapReduce Style
- Conclusion of Exercises in Programming Style
The name of this post is a reference found in the Dune book by Frank Herbert, where Kwisatz Haderach means "The Shortening of the Way". |
For that, the usage of Kotlin’s stdlib is more than welcome. Without further ado, here’s the code:
fun run(filename: String) = (read(filename)
.flatMap { it.toLowerCase().split("\\W|_".toRegex()) }
.filter { it.isNotBlank() && it.length >= 2 }
- read("stop_words.txt").flatMap { it.split(",") })
.groupBy { it }
.map { it.key to it.value.size }
.sortedBy { it.second }
.takeLast(25)
.toMap()
Let’s analyze the previous snippet.
read(filename).flatMap { it.toLowerCase().split("\\W|_".toRegex()) }
-
Read the file’s content into a list of strings, one element per line. Then, transform each line into lower case, and split those lines along non-word tokens.
filter { it.isNotBlank() && it.length >= 2 }
-
Remove blank strings, as well as those that have a length of 1.
- read("stop_words.txt").flatMap { it.split(",") }
-
Read the stop words file, and dispatch its single-string comma-separated list into a list of individual stop words. Remove them from the list of words.
groupBy { it }
-
Group the words by individual word. The key is the word itself, the value the repeated list of the same word.
map { it.key to it.value.size }
-
Transform each entry into a pair, the first item being the key, the second item the list’s size.
I believe the rest of the code to be pretty self-explanatory.
There’s an alternative using fold()
, which can replace the groupBy()
and map()
function calls:
.fold(mutableMapOf<String, Int>()) { map, word ->
map.merge(word, 1) { existing, new -> existing + new }
map
}.toList()
Fold’s signature - also called reduce in some languages is:
fun <T, R> Iterable<T>.fold(initial: R, operation: (acc: R, T) -> R): R
It requires two parameters:
- An initial value of type
R
- A function that transforms:
- a
R
value - and the next item in the collection
- into another
R
- a
Here, R
is a mutable map.
The reason to prefer a mutable data structure over an immutable one is because of the merge()
method.
This allows to put values in a map, handling the case whether the key already exists or not, in a very concise way.
Conclusion
While previous chapters required to think in a different way, this one requires to know one’s language API and libraries. This is a requirement that I’ve observed in other areas domains: Functional Programming in general, every kind of Reactive Programming API, the Clojure language, etc. This is the shortest I could come with, but I welcome you to add your suggestions in the comments.