Java has a vast ecosystem of APIs, not all of which are effective or easy to learn. Developing a good API is not trivial: avoiding misdesigned key elements, defining simple abstractions, and choosing a threading model are among the concerns that must be addressed. The official Elasticsearch Java SDK is a project in which a deliberate design effort has gone into addressing these concerns.
Recently, this project surprised me, so I set out to examine the design ideas that make it interesting and effective, along with the inevitable trade-offs to be aware of.
Code Generation and the Single Source of Truth
The Elasticsearch Java SDK is not entirely handwritten: most of it is generated from a canonical API specification written in TypeScript, complemented by handcrafted artifacts. The client generation pipeline produces the Java model classes, builders, serializers, and top-level namespace methods from this specification. This generated approach explains the consistency of naming, shape, and coverage across hundreds of endpoints and multiple language clients. The parts handcrafted by Elastic engineers are:
- Transport integration with the Low Level REST Client (LLRC)
- Core infrastructure: authentication, TLS, retries, test harnesses, Continuous Integration, JSON mapper setup.
- API ergonomics: builder patterns, nullability conventions, naming.
- ADR-driven decisions: handcrafted parts are backed by Architecture Decision Records (ADRs), which document why certain design choices were made.
The Builder Pattern Done (Mostly) Right
The Elasticsearch SDK leans heavily on the Builder Pattern, expressed in a base `ObjectBuilder<T>` interface with a single `build()` method. You’re building search queries, index mappings, bulk operations, and so on; without builders, it’d be chaos.
The SDK provides builder-lambda overloads that take a function from a builder to a built value (for example, `Function<Query.Builder, ObjectBuilder<Query>>`), so nested objects can be constructed inline with type-safe closures; these overloads are visible in the public Javadocs as function-typed builder setters. Internally, the generated builders inherit from `ObjectBuilderBase`, which enforces single use: `build()` first calls `_checkSingleUse()`, which throws an exception if the builder has already been used.
Once you call `build()`, you’re done. Reusing the builder is considered unsafe because internal structures, especially collections, may be shared between the builder and the built object: mutating one could silently corrupt the other. So the SDK closes the gate after construction with a clean contract: configure once, build, and forget. Every request and response class follows this pattern.
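A minimal, stdlib-only sketch can make the single-use contract concrete. `ObjectBuilder` below has the same shape as the interface described above, but `SingleUseBuilderBase`, `IndexSettings`, and their methods are hypothetical names for illustration, not the client’s actual code. The built object keeps the builder’s list without copying it, which is exactly why the builder must not be reused.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Same shape as the client's ObjectBuilder<T>: a single build() method.
interface ObjectBuilder<T> {
    T build();
}

// Hypothetical base class: remembers whether build() has already been called.
abstract class SingleUseBuilderBase {
    private boolean used = false;

    protected void checkSingleUse() {
        if (used) {
            throw new IllegalStateException("Object builders can only be used once");
        }
        used = true;
    }
}

// Hypothetical value type whose builder hands over its list without copying it.
final class IndexSettings {
    private final String name;
    private final List<String> aliases;

    private IndexSettings(Builder b) {
        this.name = b.name;
        // Read-only view of the SAME list: cheap, but only safe if the builder is never reused.
        this.aliases = Collections.unmodifiableList(b.aliases);
    }

    String name() { return name; }
    List<String> aliases() { return aliases; }

    static final class Builder extends SingleUseBuilderBase implements ObjectBuilder<IndexSettings> {
        private String name;
        private final List<String> aliases = new ArrayList<>();

        Builder name(String value) { this.name = value; return this; }
        Builder alias(String value) { this.aliases.add(value); return this; }

        @Override
        public IndexSettings build() {
            checkSingleUse(); // a second build() fails fast instead of sharing state again
            return new IndexSettings(this);
        }
    }
}
```

A second `build()` on the same builder fails with an `IllegalStateException`. The alternative, defensively copying every collection, would make misuse harmless but would add allocations to every construction.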
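```java
import co.elastic.clients.elasticsearch.core.SearchRequest;

SearchRequest request = SearchRequest.of(s -> s
    .index("products")          // target index
    .query(q -> q               // q is a Query.Builder
        .match(m -> m           // m is a MatchQuery.Builder
            .field("name")
            .query("laptop")
        )
    )
);
```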
At first glance, this looks like a lambda-heavy mess. But what it’s really doing is chaining a set of typed, nested DSL constructs. `of(…)` is a static shortcut that instantiates the builder, applies a configuration function, and finalizes the build. You’re writing declarative Java code that directly mirrors the Elasticsearch query DSL.
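The three steps of such a shortcut fit in a few lines. The following sketch uses hypothetical `Builders` and `MatchSpec` types (not the SDK’s classes) and only illustrates the mechanics: the lambda receives a fresh builder, returns it configured, and the shortcut calls `build()`.

```java
import java.util.function.Function;
import java.util.function.Supplier;

interface ObjectBuilder<T> {
    T build();
}

// Hypothetical helper spelling out what a static of(...) shortcut does.
final class Builders {
    private Builders() {}

    static <T, B> T of(Supplier<B> newBuilder, Function<B, ObjectBuilder<T>> configure) {
        B builder = newBuilder.get();                            // 1. instantiate the builder
        ObjectBuilder<T> configured = configure.apply(builder);  // 2. apply the configuration function
        return configured.build();                               // 3. finalize the build
    }
}

// Hypothetical immutable type whose fluent setters return the builder itself.
final class MatchSpec {
    final String field;
    final String text;

    private MatchSpec(String field, String text) {
        this.field = field;
        this.text = text;
    }

    static final class Builder implements ObjectBuilder<MatchSpec> {
        private String field;
        private String text;

        Builder field(String value) { this.field = value; return this; }
        Builder query(String value) { this.text = value; return this; }

        @Override
        public MatchSpec build() { return new MatchSpec(field, text); }
    }

    // Type-specific shortcut in the style of SearchRequest.of(...).
    static MatchSpec of(Function<Builder, ObjectBuilder<MatchSpec>> fn) {
        return Builders.of(Builder::new, fn);
    }
}
```

For example, `MatchSpec.of(b -> b.field("name").query("laptop"))` compiles because every setter returns the builder, which is already an `ObjectBuilder<MatchSpec>`.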
DSL-Style Lambdas That Actually Work
These lambdas are IDE-friendly, since every lambda parameter is a typed builder that auto-completion can explore, and they make it easier to author deeply nested DSL structures while remaining strongly typed. Java isn’t a functional language: it supports some functional idioms, but it is object-oriented at heart. The SDK still manages to create an idiomatic syntax with lambdas that lets developers declare their intent directly.
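```java
// Product is an application-defined document class (assumed here); search() throws IOException.
SearchResponse<Product> response = client.search(s -> s
    .index("products")
    .query(q -> q
        .bool(b -> b
            .must(m -> m
                .match(t -> t
                    .field("description")
                    .query("wireless")
                )
            )
        )
    ),
    Product.class
);
```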
Each lambda represents a nested configuration step, with type-safe closures. You’re not passing in strings or magic maps; you’re composing a statically typed tree. That’s the win. It avoids a common anti-pattern: chaining `.withX()` and `.setY()` calls on dozens of mutable objects, with nulls lurking everywhere. Here, each level is scoped and focused, and the objects it produces are immutable.
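To see how nested lambdas compose a typed tree, here is a hypothetical miniature of the `bool`/`must`/`match` chain above. `QueryNode`, `MatchNode`, `BoolNode`, the builder classes, and `toDsl()` are illustrative names, not the SDK’s API, and records (Java 16+) are used only for brevity. Each function-typed setter creates the child builder, hands it to the caller’s lambda, and stores the built child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

interface ObjectBuilder<T> {
    T build();
}

// Hypothetical query tree: every node can render itself as query-DSL JSON.
interface QueryNode {
    String toDsl();
}

record MatchNode(String field, String text) implements QueryNode {
    public String toDsl() {
        return "{\"match\":{\"" + field + "\":\"" + text + "\"}}";
    }
}

record BoolNode(List<QueryNode> must) implements QueryNode {
    public String toDsl() {
        List<String> parts = new ArrayList<>();
        for (QueryNode clause : must) {
            parts.add(clause.toDsl());
        }
        return "{\"bool\":{\"must\":[" + String.join(",", parts) + "]}}";
    }
}

final class MatchBuilder implements ObjectBuilder<MatchNode> {
    private String field;
    private String text;

    MatchBuilder field(String value) { this.field = value; return this; }
    MatchBuilder query(String value) { this.text = value; return this; }

    public MatchNode build() { return new MatchNode(field, text); }
}

final class BoolBuilder implements ObjectBuilder<BoolNode> {
    private final List<QueryNode> clauses = new ArrayList<>();

    // Each call appends one clause, configured by its own nested lambda.
    BoolBuilder must(Function<QueryBuilder, ObjectBuilder<QueryNode>> fn) {
        clauses.add(fn.apply(new QueryBuilder()).build());
        return this;
    }

    public BoolNode build() { return new BoolNode(List.copyOf(clauses)); }
}

// Picks exactly one variant; each variant setter takes a lambda for that variant's builder.
final class QueryBuilder implements ObjectBuilder<QueryNode> {
    private QueryNode result;

    QueryBuilder match(Function<MatchBuilder, ObjectBuilder<MatchNode>> fn) {
        result = fn.apply(new MatchBuilder()).build();
        return this;
    }

    QueryBuilder bool(Function<BoolBuilder, ObjectBuilder<BoolNode>> fn) {
        result = fn.apply(new BoolBuilder()).build();
        return this;
    }

    public QueryNode build() {
        if (result == null) {
            throw new IllegalStateException("no query variant was set");
        }
        return result;
    }
}

final class Queries {
    private Queries() {}

    static QueryNode of(Function<QueryBuilder, ObjectBuilder<QueryNode>> fn) {
        return fn.apply(new QueryBuilder()).build();
    }
}
```

Calling `Queries.of(q -> q.bool(b -> b.must(m -> m.match(t -> t.field("description").query("wireless")))))` and then `toDsl()` yields `{"bool":{"must":[{"match":{"description":"wireless"}}]}}`, the same shape as the query DSL.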
Tagged Unions and Type-Safe Variants
Let’s talk polymorphism. Elasticsearch queries aren’t flat structures: they’re variant types. A `Query` can be a `MatchQuery`, `BoolQuery`, `RangeQuery`, and so on. The SDK models this with a Tagged Union pattern.
A tagged union is a data structure that can hold values of different data types, but only one at a time. It’s similar to a regular union, but includes a tag (or discriminator) that shows which data type is currently stored. This tag allows type-safe access to the stored value, preventing accidental data misuse.
The client implements a generalized `TaggedUnion` pattern for many of these variant domains (queries, aggregations, analyzers, etc.). A `TaggedUnion` exposes its current kind and the strongly typed value, and the builders expose an explicit method for each variant, which makes discovery and correctness easier. The pattern trades a small amount of indirection for explicit, type-checked variant access and better discoverability in the IDE.
Each such union implements:
```java
public interface TaggedUnion<Tag extends Enum<?>, BaseType> {
    Tag _kind();
    BaseType _get();
}
```
You inspect `_kind()` to find out which variant you have, then call `_get()` and cast it safely:
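```java
import co.elastic.clients.elasticsearch._types.query_dsl.MatchQuery;
import co.elastic.clients.elasticsearch._types.query_dsl.Query;

Query query = Query.of(q -> q
    .match(m -> m.field("title").query("elasticsearch"))
);

if (query._kind() == Query.Kind.Match) {
    // The tag was checked first, so this cast cannot fail
    MatchQuery match = (MatchQuery) query._get();
    // Work with match safely
}
```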
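The sketch below shows, with hypothetical names (`MiniQuery`, `MiniKind`, `MatchVariant`, `TermVariant`, `asMatch()`), how a class can implement the `TaggedUnion` shape shown earlier. A private constructor and per-variant factory methods keep the tag and the stored value consistent, and a checked accessor turns a wrong-variant access into an immediate, descriptive error instead of a `ClassCastException` somewhere downstream.

```java
// Same shape as the TaggedUnion interface shown above.
interface TaggedUnion<Tag extends Enum<?>, BaseType> {
    Tag _kind();
    BaseType _get();
}

// Hypothetical tag enum: one constant per variant.
enum MiniKind { Match, Term }

// Hypothetical variant payloads.
final class MatchVariant {
    final String field;
    final String text;
    MatchVariant(String field, String text) { this.field = field; this.text = text; }
}

final class TermVariant {
    final String field;
    final String value;
    TermVariant(String field, String value) { this.field = field; this.value = value; }
}

// Holds exactly one variant at a time; the tag records which one.
final class MiniQuery implements TaggedUnion<MiniKind, Object> {
    private final MiniKind kind;
    private final Object value;

    private MiniQuery(MiniKind kind, Object value) {
        this.kind = kind;
        this.value = value;
    }

    // Factories are the only way in, so tag and value can never disagree.
    static MiniQuery match(MatchVariant v) { return new MiniQuery(MiniKind.Match, v); }
    static MiniQuery term(TermVariant v) { return new MiniQuery(MiniKind.Term, v); }

    @Override public MiniKind _kind() { return kind; }
    @Override public Object _get() { return value; }

    boolean isMatch() { return kind == MiniKind.Match; }

    // Checked accessor: a wrong-variant access fails here, with a clear message.
    MatchVariant asMatch() {
        if (kind != MiniKind.Match) {
            throw new IllegalStateException("Expected Match, but this union holds " + kind);
        }
        return (MatchVariant) value;
    }
}
```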
This design lets you reason about unions even without the syntactic support of recent Java versions, and it keeps the client compatible with the older Java runtimes that existing Elastic users may still depend on. Newer Java releases bring pattern matching (for `instanceof` since Java 16, and for `switch` since Java 21), which could support a future evolution of the SDK. A first step, already possible with the arrow-style `switch` available since Java 14, could be:
```java
switch (query._kind()) {
    case Match -> {
        MatchQuery match = (MatchQuery) query._get();
        // use match
    }
    case Term -> {
        TermQuery term = (TermQuery) query._get();
        // use term
    }
    default -> {
        // other query variants (bool, range, ...)
    }
}
```
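Looking further ahead, the same variant domain could be modeled with a sealed interface and records (Java 17+). This is a hypothetical sketch, not how the SDK is built today: `QueryVariant`, `MatchQ`, `TermQ`, and `QueryDescriber` are illustrative names. Pattern matching for `instanceof` binds a typed variable, so the separate tag and the manual cast disappear.

```java
// Hypothetical variant domain: the compiler knows every permitted subtype.
sealed interface QueryVariant permits MatchQ, TermQ {}

record MatchQ(String field, String text) implements QueryVariant {}

record TermQ(String field, String value) implements QueryVariant {}

final class QueryDescriber {
    private QueryDescriber() {}

    // instanceof patterns test the type and bind a typed variable in one step.
    static String describe(QueryVariant q) {
        if (q instanceof MatchQ m) {
            return "match " + m.field() + "=" + m.text();
        } else if (q instanceof TermQ t) {
            return "term " + t.field() + "=" + t.value();
        }
        // Unreachable for the sealed hierarchy above, but an if-chain is not exhaustiveness-checked.
        throw new IllegalArgumentException("unknown variant: " + q);
    }
}
```

On Java 21, the `if` chain could become a pattern-matching `switch` over `QueryVariant`, which the compiler checks for exhaustiveness because the hierarchy is sealed.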
Modular by Design: The Namespace Client Pattern
Elasticsearch’s API is wide: it has endpoints for search, index management, mappings, ingest pipelines, security, cluster health, and more. Cramming all of this into one class would be a nightmare.
Instead, the SDK scopes things by domain:
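```java
import co.elastic.clients.elasticsearch.ElasticsearchClient;

ElasticsearchClient client = new ElasticsearchClient(transport);
client.indices().create(c -> c.index("catalog"));  // index-management namespace
// Core operations live on the root client; Product is an assumed document class
client.search(s -> s
    .index("products")
    .query(q -> q.match(m -> m.field("name").query("laptop"))),
    Product.class
);
```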
Each namespace client (`indices()`, `cluster()`, `ingest()`, etc.) exposes only the operations relevant to its domain, while core operations such as `search()` live directly on the root client. This is effective support for both IDEs and our brains. It mirrors how the Elasticsearch REST API specification groups endpoints into namespaces (for example, `indices.create` or `cluster.health`), making it easier to reason about what goes where. It also makes the SDK highly maintainable: adding a new API group is simple, with no giant god interface to refactor.
Consider a full-featured Java network SDK that must support:
- per-connection configuration (auth, headers, timeouts)
- multiple concurrent clients
- good testability and dependency injection (DI)
- an ergonomic developer experience
For such an SDK, the namespaced instance client pattern is a rational trade-off. You accept slightly more ceremony and many small classes, but in exchange you get modularity under the hood and a single discoverable root object for users.
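The structure can be sketched with a few small classes. All names here (`Transport`, `MiniClient`, `IndicesNamespace`, `ClusterNamespace`) are hypothetical, and the transport is reduced to a single method returning a string. The point is that each namespace is a thin, cheap view over one shared transport, so connection configuration lives in a single place. The paths used are the standard create-index, delete-index, cluster-health, and search endpoints.

```java
// Hypothetical transport: one instance (and one configuration) shared by all namespaces.
interface Transport {
    String perform(String method, String path);
}

// Hypothetical namespace client: only index-management operations live here.
final class IndicesNamespace {
    private final Transport transport;

    IndicesNamespace(Transport transport) { this.transport = transport; }

    String create(String index) { return transport.perform("PUT", "/" + index); }
    String delete(String index) { return transport.perform("DELETE", "/" + index); }
}

// Hypothetical namespace client for cluster-level operations.
final class ClusterNamespace {
    private final Transport transport;

    ClusterNamespace(Transport transport) { this.transport = transport; }

    String health() { return transport.perform("GET", "/_cluster/health"); }
}

// Hypothetical root client: core operations plus one accessor per namespace.
final class MiniClient {
    private final Transport transport;

    MiniClient(Transport transport) { this.transport = transport; }

    String search(String index) { return transport.perform("POST", "/" + index + "/_search"); }

    // Namespace objects are cheap wrappers around the shared transport.
    IndicesNamespace indices() { return new IndicesNamespace(transport); }
    ClusterNamespace cluster() { return new ClusterNamespace(transport); }
}
```

Because every namespace receives the same `Transport`, a test can pass a recording or fake transport to the root client and observe every call, whichever namespace issued it.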
Immutability and Transport Abstraction
Built request objects don’t change, and builders build only once. You pass data, not behavior, so requests can be shared safely across threads and the SDK’s behavior stays predictable.
Transport concerns are separated from the typed models. By default, the Java client delegates protocol handling to a `RestClientTransport`, which in turn uses a low-level HTTP client (for example, the Apache HTTP client) to manage connections, pooling, and retries across nodes, with node discovery available through the optional sniffer. This separation lets the Java client focus on typed request/response modeling and (de)serialization while the transport handles operational concerns. The transport is pluggable, so the client can be configured to run on different HTTP stacks when needed.
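```java
// restClient: an org.elasticsearch.client.RestClient built elsewhere (e.g. via RestClient.builder(...))
ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
ElasticsearchClient client = new ElasticsearchClient(transport);
```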
This separation of concerns makes the SDK easier to test, extend, and debug.
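Testability is the easiest of these benefits to demonstrate. In the following sketch, `HttpTransport`, `CountClient`, and `FakeTransport` are hypothetical names: the typed layer builds the `_count` request and decodes the response, while an in-memory transport stands in for the network. The regular expression is a deliberately crude stand-in for a real JSON mapper such as the client’s `JsonpMapper` implementations.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical transport contract: moves strings, knows nothing about request types.
interface HttpTransport {
    String send(String method, String path, String body);
}

// Hypothetical typed layer: knows the endpoint and response shape, delegates all I/O.
final class CountClient {
    // Crude stand-in for a real JSON mapper.
    private static final Pattern COUNT = Pattern.compile("\"count\"\\s*:\\s*(\\d+)");

    private final HttpTransport transport;

    CountClient(HttpTransport transport) { this.transport = transport; }

    long count(String index) {
        String json = transport.send("GET", "/" + index + "/_count", null);
        Matcher m = COUNT.matcher(json);
        if (!m.find()) {
            throw new IllegalStateException("Unexpected response: " + json);
        }
        return Long.parseLong(m.group(1));
    }
}

// In-memory transport for tests: canned responses keyed by "METHOD path", no network.
final class FakeTransport implements HttpTransport {
    private final Map<String, String> responses;

    FakeTransport(Map<String, String> responses) { this.responses = responses; }

    @Override
    public String send(String method, String path, String body) {
        String response = responses.get(method + " " + path);
        if (response == null) {
            throw new IllegalStateException("No stub for " + method + " " + path);
        }
        return response;
    }
}
```

Swapping `FakeTransport` for a real HTTP-backed implementation changes nothing in `CountClient`, which is the property the separation of concerns is meant to buy.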
Could It Be Better?
Creating many immutable objects has a cost in memory usage. Avoiding an excessive number of objects can therefore push developers toward more advanced features of the product, such as bulk requests, data sharing, temporary entities, and paging through the data themselves.
A measurable trade-off of immutability and single-use builders is short-lived object allocation. In most applications this overhead is negligible; in very hot loops or high-throughput pipelines, you should benchmark and, if necessary, build reusable immutable fragments or tune your bulk/batching strategy. Also consider the cost of serialization: the client lets you plug in different `JsonpMapper` implementations (for example, Jackson-based mappers) if you need custom parsing or want to send pre-serialized payloads.
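Reusing immutable fragments can be sketched without the client. Here, Java records stand in for built, immutable request objects, and `TermFilter`, `ProductSearch`, and `FragmentReuse` are hypothetical names. The shared filter is allocated once and referenced by every request; only the per-request part is created inside the loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable fragment: safe to share because it never changes.
record TermFilter(String field, String value) {}

// Hypothetical immutable request: a shared filter plus per-request search text.
record ProductSearch(String index, TermFilter filter, String text) {}

final class FragmentReuse {
    private FragmentReuse() {}

    // Built once and referenced by every request instead of being rebuilt in the loop.
    static final TermFilter ACTIVE_ONLY = new TermFilter("status", "active");

    static List<ProductSearch> buildRequests(List<String> searchTexts) {
        List<ProductSearch> requests = new ArrayList<>(searchTexts.size());
        for (String text : searchTexts) {
            // Only the per-request object is allocated; the fragment is shared by reference.
            requests.add(new ProductSearch("products", ACTIVE_ONLY, text));
        }
        return requests;
    }
}
```

With the real client, the same idea should apply to built objects such as a `Query`, since built objects are immutable and can be referenced from several requests; benchmark before and after to confirm that the gain is real.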
Today, Java offers more constructs: records, sealed classes, and pattern matching for `instanceof` and `switch`, including record deconstruction. Adopting them would have meant raising the minimum supported Java version and continuously refactoring the SDK to chase new language features.
This SDK strikes a balance between longevity and developer-friendliness without invalidating what users already know from earlier Elastic clients.
Conclusion
What can you learn from this SDK? Here is a summary table of the main concepts that could be useful for your next project. If you’re building a client library (or even just a public-facing API), you could do worse than borrowing a few ideas from this one. No magic, just good design. But beware: context is king!
Pattern | What It Solves |
---|---|
Builder (Single-Use) | Safer immutable construction, avoids shared mutable state |
Fluent Interface (Lambdas) | Declarative, nested, type-safe request definitions |
Tagged Union Pattern | Models Elasticsearch variant types safely and explicitly |
Namespace Client Pattern | Logical API grouping aligned with the REST structure |
Transport Abstraction | Swappable HTTP and serialization layers |
Functional Thinking | Fewer side effects, fewer temporary variables, more declarative code |