Among the many features introduced in Java 8, streams seem to be one of the biggest game changers in the way Java code is written.
Usage is quite straightforward: a stream is created from a collection (or from a static method of a utility class), processed using one or more of the available stream methods, and then collected back into a collection or an object.
One generally uses one of the static methods that the Collectors utility class offers:
- Collectors.toList()
- Collectors.toSet()
- Collectors.toMap()
- etc.
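As a quick illustration of that create/process/collect cycle, here is a minimal sketch using two of the built-in collectors. The class name, method names and sample data are made up for the occasion.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original post.
class BuiltInCollectorsDemo {

    // Keeps the words longer than three characters, in encounter order.
    static List<String> longWords(List<String> words) {
        return words.stream()
                .filter(word -> word.length() > 3)
                .collect(Collectors.toList());
    }

    // Collects the distinct word lengths into a set.
    static Set<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "three", "four", "six");
        System.out.println(longWords(words)); // [three, four]
        System.out.println(lengths(words));   // the set holds 3, 4 and 5
    }
}
```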
Sometimes, however, there’s a need for more. The goal of this post is to describe how to achieve that.
The Collector interface
Every one of the above static methods returns a Collector. But what is a Collector? The following is a simplified diagram:
Interface | From the JavaDocs |
---|---|
Supplier | Represents a supplier of results. There is no requirement that a new or distinct result be returned each time the supplier is invoked. |
BiConsumer | Represents an operation that accepts two input arguments and returns no result. Unlike most other functional interfaces, BiConsumer is expected to operate via side-effects. |
Function | Represents a function that accepts one argument and produces a result. |
BinaryOperator | Represents an operation upon two operands of the same type, producing a result of the same type as the operands. This is a specialization of BiFunction for the case where the operands and the result are all of the same type. |
The documentation of each dependent interface doesn’t tell much, apart from the obvious.
Looking at the Collector documentation yields a little more:
A Collector is specified by four functions that work together to accumulate entries into a mutable result container, and optionally perform a final transform on the result. They are:
- creation of a new result container (supplier())
- incorporating a new data element into a result container (accumulator())
- combining two result containers into one (combiner())
- performing an optional final transform on the container (finisher())
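To make the four functions more concrete, here is a minimal sketch that assembles a collector with the Collector.of() factory method. It concatenates strings, with a StringBuilder as the mutable result container. The class and method names are assumptions made for this illustration.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original post.
class FourFunctionsDemo {

    static Collector<String, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,                   // supplier: creates a new result container
                (builder, s) -> builder.append(s),    // accumulator: incorporates one element
                (left, right) -> left.append(right),  // combiner: merges two containers into one
                StringBuilder::toString);             // finisher: final transform on the container
    }

    public static void main(String[] args) {
        String result = Stream.of("a", "b", "c").collect(concatenating());
        System.out.println(result); // abc
    }
}
```

The second type parameter of Collector is the type of the intermediate container, the third one is the type handed back by the finisher.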
The Stream.collect() method
The real insight comes from the Stream.collect() method documentation:
Performs a mutable reduction operation on the elements of this stream.
A mutable reduction is one in which the reduced value is a mutable result container, such as an ArrayList, and elements are incorporated by updating the state of the result rather than by replacing the result.
This produces a result equivalent to:
R result = supplier.get();
for (T element : this stream)
    accumulator.accept(result, element);
return result;
Note that the combiner() method is not used here: it only comes into play with parallel streams and, for the sake of simplicity, will be set aside for the rest of this post.
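The equivalence stated in the JavaDoc can be checked with a small sketch: a hand-written loop next to the three-argument Stream.collect() overload, both filling an ArrayList. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the original post.
class MutableReductionDemo {

    // Hand-written version of the loop quoted from the JavaDoc.
    static List<String> collectWithLoop(List<String> source) {
        List<String> result = new ArrayList<>();   // supplier.get()
        for (String element : source) {
            result.add(element);                   // accumulator.accept(result, element)
        }
        return result;
    }

    // Same mutable reduction through Stream.collect(supplier, accumulator, combiner).
    static List<String> collectWithStream(List<String> source) {
        return source.stream()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c");
        System.out.println(collectWithLoop(source).equals(collectWithStream(source))); // true
    }
}
```

The third argument, ArrayList::addAll, plays the role of the combiner: the API requires it, even though a sequential stream has no partial results to merge.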
Examples
Let’s have some examples to demo the development of custom collectors.
Single-value example
To start, let’s compute the size of a collection using a collector. Though not very useful, it’s a good introduction. Here are the requirements for the interfaces, combiner aside:
- Since the end result should be an integer, the supplier should probably also return some kind of integer. The problem is that neither int nor Integer is mutable, and mutability is required for the next step. A good candidate type would be MutableInt from Apache Commons Lang.
- The accumulator should only increment the MutableInt, whatever the element in the collection is.
- Finally, the finisher just returns the int value wrapped by the MutableInt.
Source is available on GitHub.
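For readers who do not want to follow the link, the requirements above could be sketched as follows. This is not the code from the repository: to keep the snippet free of dependencies, a hypothetical Counter class stands in for MutableInt, and the SizeCollector name is an assumption as well.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical stand-in for MutableInt from Apache Commons Lang.
class Counter {
    private int value;
    void increment() { value++; }
    void add(Counter other) { value += other.value; }
    int intValue() { return value; }
}

// Hypothetical collector that counts the elements of a stream.
class SizeCollector<T> implements Collector<T, Counter, Integer> {

    @Override
    public Supplier<Counter> supplier() {
        return Counter::new;                        // a fresh mutable counter
    }

    @Override
    public BiConsumer<Counter, T> accumulator() {
        return (counter, element) -> counter.increment(); // the element itself is ignored
    }

    @Override
    public BinaryOperator<Counter> combiner() {
        // Only needed for parallel streams, but the interface requires it.
        return (left, right) -> { left.add(right); return left; };
    }

    @Override
    public Function<Counter, Integer> finisher() {
        return Counter::intValue;                   // unwraps the int value
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        int size = Arrays.asList("a", "b", "c").stream().collect(new SizeCollector<String>());
        System.out.println(size); // 3
    }
}
```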
Grouping example
The second example shall be more useful. From a collection of strings, let’s create an Apache Commons Collections multi-valued map:
- The key should be a char
- The corresponding values should be the strings that start with this char
- The supplier is pretty straightforward, it returns a MultiValuedMap instance
- The accumulator just calls the put method from the multi-valued map, using the above "specs"
- The finisher returns the map itself
Source is available on GitHub.
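Here is a rough sketch of the same idea, under one big assumption: a plain Map<Character, List<String>> replaces the MultiValuedMap so that the snippet runs without any external library. The class and method names are hypothetical, and the code assumes no string is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical demo class: a JDK map of lists stands in for MultiValuedMap.
class FirstCharGrouping {

    static Collector<String, ?, Map<Character, List<String>>> byFirstChar() {
        // The three-function Collector.of() overload uses the container itself
        // as the result, i.e. the finisher returns the map as is.
        return Collector.of(
                TreeMap::new,
                // Equivalent of put() on a multi-valued map; assumes non-empty strings.
                (map, string) -> map
                        .computeIfAbsent(string.charAt(0), key -> new ArrayList<>())
                        .add(string),
                // Combiner, only used by parallel streams.
                (left, right) -> {
                    right.forEach((key, values) ->
                            left.computeIfAbsent(key, k -> new ArrayList<>()).addAll(values));
                    return left;
                });
    }

    public static void main(String[] args) {
        Map<Character, List<String>> grouped =
                Stream.of("apple", "avocado", "banana").collect(byFirstChar());
        System.out.println(grouped); // {a=[apple, avocado], b=[banana]}
    }
}
```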
Partitioning example
The third example matches a use-case I encountered this week: given a collection and a predicate, dispatch the elements that match into one collection and those that do not into another.
- As the supplier returns a single instance, a new data structure, e.g. DoubleList, should first be designed
- The accumulator must be initialized with the predicate, so that the signature of its accept() method stays the one the contract requires
- As in the above example, the finisher should return the DoubleList itself
Source is available on GitHub.
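One possible shape for this collector is sketched below. Only the DoubleList name comes from the description above: its fields, the DoubleListCollectors class and the dispatching() method are assumptions, and the repository code may well look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical result container holding both target collections.
class DoubleList<T> {
    final List<T> matching = new ArrayList<>();
    final List<T> nonMatching = new ArrayList<>();
}

// Hypothetical factory class for the collector.
class DoubleListCollectors {

    static <T> Collector<T, ?, DoubleList<T>> dispatching(Predicate<? super T> predicate) {
        return Collector.of(
                DoubleList::new,
                // The predicate is captured by the lambda, so the accumulator
                // keeps the two-argument accept() signature.
                (lists, element) -> {
                    if (predicate.test(element)) {
                        lists.matching.add(element);
                    } else {
                        lists.nonMatching.add(element);
                    }
                },
                // Combiner, only used by parallel streams.
                (left, right) -> {
                    left.matching.addAll(right.matching);
                    left.nonMatching.addAll(right.nonMatching);
                    return left;
                });
    }

    public static void main(String[] args) {
        DoubleList<Integer> result = Stream.of(1, 2, 3, 4)
                .collect(DoubleListCollectors.<Integer>dispatching(n -> n % 2 == 0));
        System.out.println(result.matching);    // [2, 4]
        System.out.println(result.nonMatching); // [1, 3]
    }
}
```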
Final consideration
Developing a custom collector is not that hard, provided one understands the basic concepts behind it.
The real issue behind collectors is the whole Stream API: streams need to be created first and then collected afterwards. Newer languages designed with the Functional Programming paradigm in mind from the start, such as Scala or Kotlin, provide collections with such capabilities directly baked in.
For example, to filter the entries of a map in Java:
map.entrySet().stream()
    .filter(entry -> entry.getKey().length() == 4)
    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
That would translate as the following in Kotlin:
map.entries.filter { it.key.length == 4 }