So far, we solved the word frequencies problem with code alone. This week, we will solve it with the help of infrastructure. And with our specific problem at hand, what better fit than a relational database?
This is the 14th post in the Exercises in Programming Style focus series.Other posts include:
- Introducing Exercises in Programming Style
- Exercises in Programming Style, stacking things up
- Exercises in Programming Style, Kwisatz Haderach-style
- Exercises in Programming Style, recursion
- Exercises in Programming Style with higher-order functions
- Composing Exercises in Programming Style
- Exercises in Programming Style, back to Object-Oriented Programming
- Exercises in Programming Style: maps are objects too
- Exercises in Programming Style: Event-Driven Programming
- Exercises in Programming Style and the Event Bus
- Reflecting over Exercises in Programming Style
- Exercises in Aspect-Oriented Programming Style
- Exercises in Programming Style: FP & I/O
- Exercises in Relational Database Style (this post)
- Exercises in Programming Style: spreadsheets
- Exercises in Concurrent Programming Style
- Exercises in Programming Style: sharing data among threads
- Exercises in Programming Style with Hazelcast
- Exercises in MapReduce Style
- Conclusion of Exercises in Programming Style
We can approach the problem by loading the data read from the files in the database, and retrieving the top 25 words with the relevant query. In SQL, this query would simply translate into the following:
SELECT value, COUNT(*) as count
FROM words
GROUP BY value
ORDER BY count DESC
LIMIT 25;
Both Python or Kotlin offer low-level database access API natively. However, none of them are very interesting, as it’s just connecting to the database and then sending SQL strings for queries. Instead, let’s take advantage of the occasion to use Exposed, the Kotlin SQL Framework:
Exposed is a prototype for a lightweight SQL library written over JDBC driver for Kotlin language. It does have two layers of database access: typesafe SQL wrapping DSL and lightweight data access objects.
Querying
Here’s a simplified diagram of Exposed’s querying feature:
This is how it can be used:
object Words : Table() {
val id = long("id").primaryKey().autoIncrement()
val docId = long("docId")
val value = varchar("value", 50)
}
Words.slice(Words.value, Words.value.count()).selectAll()
.groupBy(Words.value)
.orderBy(Words.id.count(), SortOrder.DESC)
.limit(25)
.map {
it[Words.value] to it[Words.value.count()]
}.toMap()
DDL, connections and transactions
Of course, Exposed doesn’t only provide querying capabilities, but also a lot of other features, such as connecting to the database. Here’s a diagram of those used in the code:
Here’s a sample of provided features, with code samples:
- Table creation
-
object Words : Table() { (1) val id = long("id").primaryKey().autoIncrement() val docId = long("docId") val value = varchar("value", 50) } SchemaUtils.create(Words) (2)
1 Define a table by creating a singleton extending the provided Table
class2 Create the table in the database - Database connection
-
Database.connect( "jdbc:h2:mem:test", (1) "org.h2.Driver" (2) )
1 JDBC URL 2 JDBC drive class name. The JAR containing the driver must be accessible - Transactions handling
-
transaction { createDbSchema() loadFileIntoDatabase(filename) query() }
Every code snippet defined in the
transaction
block will be executed inside… a transaction. By default, the transaction is stored - and retrieved - from the thread local.
Conclusion
Loading text files in the database to execute queries is far from optimal, to say the least. Actually, that’s a very bad use-case for it.
However, in most cases, letting the database do all the work is the way to go: be sure to explore all capabilities of your datastore instead of doing the work on the memory side through the API provided by the language.