As a follow-up to point 4 of my previous article, here’s a first little cheatsheet on the Scala collections API. As in Java, knowing the API is a big step toward writing code that is more relevant, productive and maintainable. Collections play such an important part in Scala that knowing the collections API is a big step toward better Scala knowledge.
Type inference
In Scala, collections are typed, which means you have to be extra-careful with the type of their elements. Fortunately, constructors and companion object factories are able to infer the type by themselves (most of the time). For example:
scala> val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")
countries: List[java.lang.String] = List(France, Switzerland, Germany, Spain, Italy, Finland)
Now, the countries value is of type List[String] since all elements of the collection are String.
As a corollary, if you create an empty collection without explicitly setting its type, you’ll get a collection typed with Nothing.
scala> val empty = List()
empty: List[Nothing] = List()
scala> 1 :: empty
res0: List[Int] = List(1)
scala> "1" :: empty
res1: List[java.lang.String] = List(1)
Adding a new element to the empty list will return a new list, typed according to the added element. This is also the case if an element of another type is added to a typed collection: the element type of the new list is widened to the closest common supertype, here Any.
scala> 1 :: countries
res2: List[Any] = List(1, France, Switzerland, Germany, Spain, Italy, Finland)
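To avoid ending up with List[Nothing], the element type can be stated up front. The following is a minimal sketch; the value names (noCountries, alsoEmpty, one, mixed) are made up for illustration.

```scala
// Give the empty list an explicit element type instead of letting it default to Nothing.
val noCountries = List.empty[String]      // List[String]
val alsoEmpty: List[String] = Nil         // same type, through a type annotation

// Prepending a String keeps the element type as String.
val one = "France" :: noCountries         // List[String]

// Prepending an Int widens the element type to the closest common supertype.
// The annotation makes the widening to Any explicit.
val mixed: List[Any] = 1 :: one
```

Annotating the widened list is optional, but it makes the jump to Any visible in the code rather than leaving it to inference.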
Default immutability
In Functional Programming, state is banished in favor of "pure" functions.
Scala being both Object-Oriented and Functional in nature, it offers both mutable and immutable collections under the same name but in different packages: scala.collection.mutable and scala.collection.immutable.
For example, Set and Map are found in both packages (interestingly enough, there’s a scala.collection.immutable.List but a scala.collection.mutable.MutableList).
By default, the collections that are in scope are the immutable ones, through aliases defined in the scala.Predef object and the scala package object (both of which are imported implicitly).
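The difference between the two flavors can be sketched as follows. The value names are illustrative only; the sketch assumes nothing beyond the standard library.

```scala
import scala.collection.mutable

// Without any import, Set resolves to scala.collection.immutable.Set.
val immutableSet = Set("France", "Spain")
val bigger = immutableSet + "Italy"   // returns a new set; immutableSet is unchanged

// The mutable variant has to be referenced explicitly.
val mutableSet = mutable.Set("France", "Spain")
mutableSet += "Italy"                 // modifies mutableSet in place
```

Importing the mutable package and writing mutable.Set, rather than importing mutable.Set directly, keeps it obvious at the call site which flavor is in use.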
The collections API
The heart of the matter lies in the API itself.
Beyond expected methods also found in Java (like size() and indexOf()), Scala brings to the table a functional approach to collections.
Filtering and partitioning
Scala collections can be filtered so that they return:
- either a new collection that retains only the elements that satisfy a predicate (filter())
- or a new collection that retains only those that do not (filterNot())
Both take a function that takes the element as a parameter and returns a Boolean. The following example returns a collection which only retains countries whose name has more than 6 characters.
scala> countries.filter(_.length > 6)
res3: List[java.lang.String] = List(Switzerland, Germany, Finland)
Additionally, the same function type can be used to partition the original collection into a pair of two collections, one that satisfies the predicate and one that doesn’t.
scala> countries.partition(_.length > 6)
res4: (List[java.lang.String], List[java.lang.String]) = (List(Switzerland, Germany, Finland),List(France, Spain, Italy))
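As a complement, here is a small sketch of filterNot() and of how partition() relates to the two filtering methods. The isLong predicate and the value names are hypothetical helpers introduced for this example.

```scala
val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

// A named predicate can be reused across filter, filterNot and partition.
val isLong: String => Boolean = _.length > 6

// Keeps the elements that do NOT satisfy the predicate.
val shortNames = countries.filterNot(isLong)   // List(France, Spain, Italy)

// partition(p) yields the same two collections as (filter(p), filterNot(p)).
val (matching, rest) = countries.partition(isLong)
```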
Taking, dropping and splitting
- Taking from a collection means returning a collection that keeps only the first n elements of the original one.
scala> countries.take(2)
res5: List[java.lang.String] = List(France, Switzerland)
- Dropping from a collection consists of returning a collection that keeps all elements but the first n elements of the original one.
scala> countries.drop(2)
res6: List[java.lang.String] = List(Germany, Spain, Italy, Finland)
- Splitting a collection consists of returning a pair of two collections, the first one holding the elements before the specified index, the second one holding the elements from that index onward.
scala> countries.splitAt(2)
res7: (List[java.lang.String], List[java.lang.String]) = (List(France, Switzerland),List(Germany, Spain, Italy, Finland))
Scala also offers takeRight(Int) and dropRight(Int) variant methods that do the same but start from the end of the collection.
Additionally, there are takeWhile(f: A ⇒ Boolean) and dropWhile(f: A ⇒ Boolean) variant methods that respectively take and drop elements from the collection sequentially (starting from the left) while the predicate is satisfied.
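The variants can be sketched on the same list. The predicate used here (names longer than 5 characters) is an arbitrary choice for illustration.

```scala
val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

// Same as take/drop, but counted from the end of the collection.
val lastTwo = countries.takeRight(2)         // List(Italy, Finland)
val allButLastTwo = countries.dropRight(2)   // List(France, Switzerland, Germany, Spain)

// takeWhile stops at the first element that fails the predicate ("Spain"),
// even though a later element ("Finland") would satisfy it again.
val leading = countries.takeWhile(_.length > 5)   // List(France, Switzerland, Germany)

// dropWhile returns everything from that first failing element onward.
val remaining = countries.dropWhile(_.length > 5) // List(Spain, Italy, Finland)
```

Unlike filter(), takeWhile() and dropWhile() depend on the order of the elements, which is why "Finland" ends up in the dropped-from part rather than the taken part.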
Grouping
The elements of a Scala collection can be grouped into a Map, where each key is computed from the elements by a function and each value is the collection of elements sharing that key. The following example groups countries by their name’s first character.
scala> countries.groupBy(_(0))
res8: scala.collection.immutable.Map[Char,List[java.lang.String]] = Map(F -> List(France, Finland), S -> List(Switzerland, Spain), G -> List(Germany), I -> List(Italy))
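Any function of the element can serve as the key. The sketch below groups by name length instead of first character and then reads from the resulting Map; the value names are illustrative.

```scala
val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

// Key = length of the name, value = the countries with that length (original order kept).
val byLength = countries.groupBy(_.length)

val sevenLetters = byLength(7)                 // List(Germany, Finland)
val threeLetters = byLength.getOrElse(3, Nil)  // no such key, so the default is returned

// Turning each group into its size gives a simple histogram.
val counts = byLength.map { case (len, names) => len -> names.size }
```

The iteration order of the resulting Map is not something to rely on, so lookups by key are safer than assumptions about which group comes first.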
Set algebra
Three operations are available in the set algebra domain:
- union (union(), or | on a Set)
- difference (diff())
- intersection (intersect())
Note that ::: is not a set operation: it concatenates two lists and keeps duplicates.
Those are pretty self-explanatory.
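A short sketch of the three operations on Set values, contrasted with list concatenation. The two sets are arbitrary subsets of the countries used in this article.

```scala
val a = Set("France", "Spain", "Italy")
val b = Set("Italy", "Finland")

val all = a.union(b)          // every element found in a or b, without duplicates
val onlyInA = a.diff(b)       // elements of a that are not in b
val inBoth = a.intersect(b)   // elements found in both a and b

// List concatenation with ::: is a different operation: duplicates are kept.
val concatenated = List("France", "Italy") ::: List("Italy", "Finland")
```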
Map
The map(f: A ⇒ B) method returns a new collection whose length is the same as the original one, and whose elements are the results of applying the function to each original element.
For example, the following returns a new collection in which each country name is reversed.
scala> countries.map(_.reverse)
res9: List[String] = List(ecnarF, dnalreztiwS, ynamreG, niapS, ylatI, dnalniF)
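Since the function goes from A to B, the element type of the result may differ from the original one. A minimal sketch, mapping each name to its length:

```scala
val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

// String => Int: the result is a List[Int] with one entry per country.
val lengths = countries.map(_.length)   // List(6, 11, 7, 5, 5, 7)
```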
Folding
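Folding consists of starting from an initial value (the accumulator) and applying a function to the accumulator and each element in turn, the result becoming the accumulator for the next element.
Considering that, it can emulate the above map if the accumulator is a collection. Note that since each element is prepended with ::, the result comes out in reverse order: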
scala> countries.foldLeft(List[String]())((list, x) => x.reverse :: list)
res10: List[String] = List(dnalniF, ylatI, niapS, ynamreG, dnalreztiwS, ecnarF)
Alternatively, you can provide other types of accumulator, like a string, to get different results:
scala> countries.foldLeft("")((concat, x) => concat + x.reverse)
res11: java.lang.String = ecnarFdnalreztiwSynamreGniapSylatIdnalniF
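Two more folds, as a sketch: one with a numeric accumulator, and one using foldRight(), which walks the list from the right so that prepending preserves the original order. The value names are made up for this example.

```scala
val countries = List("France", "Switzerland", "Germany", "Spain", "Italy", "Finland")

// Numeric accumulator: total number of characters across all names.
val totalLength = countries.foldLeft(0)((sum, x) => sum + x.length)   // 41

// foldRight takes (element, accumulator); prepending while going right-to-left
// yields the same result as countries.map(_.reverse).
val reversedNames = countries.foldRight(List.empty[String])((x, list) => x.reverse :: list)
```

Note the argument order of the function: foldLeft() passes the accumulator first, while foldRight() passes it second.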
Zipping
Zipping creates a list of pairs from a list of single elements. There are two variants:
- zipWithIndex() pairs each element with its index, like so:
scala> countries.zipWithIndex
res12: List[(java.lang.String, Int)] = List((France,0), (Switzerland,1), (Germany,2), (Spain,3), (Italy,4), (Finland,5))
Zipping with index is very useful when you want to iterate over a collection but still need the index of each element. It keeps you from declaring a counter variable outside the iteration and incrementing it inside.
- Additionally, you can also zip two lists together:
scala> countries.zip(List("Paris", "Bern", "Berlin", "Madrid", "Rome", "Helsinki"))
res13: List[(java.lang.String, java.lang.String)] = List((France,Paris), (Switzerland,Bern), (Germany,Berlin), (Spain,Madrid), (Italy,Rome), (Finland,Helsinki))
The original collections don’t need to have the same size. The returned collection’s size will be the smaller of the two original sizes: the extra elements of the longer collection are dropped.
The reverse operation is also available, in the form of the unzip() method, which returns a pair of lists when called on a list of pairs.
The unzip3() method does the same with a list of triples, returning three lists.
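The zipping operations above can be sketched together as follows. The shortened lists and the value names are assumptions made for the example.

```scala
val countries = List("France", "Switzerland", "Germany")
val capitals = List("Paris", "Bern")   // deliberately shorter than countries

// Using the index while iterating, without an external counter.
val labels = countries.zipWithIndex.map { case (country, i) => s"$i: $country" }

// Zipping lists of different sizes: "Germany" has no partner and is dropped.
val pairs = countries.zip(capitals)    // List((France,Paris), (Switzerland,Bern))

// unzip turns a list of pairs back into a pair of lists.
val (names, cities) = pairs.unzip

// unzip3 does the same for a list of triples.
val triples = List(("France", "Paris", 0), ("Switzerland", "Bern", 1))
val (ns, cs, is) = triples.unzip3
```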
Conclusion
I’ve written this article in the form of a simple fact-oriented cheat sheet, so you can use it as such. In the coming months, I’ll try to add other such cheatsheets.