Aug 11, 2019 / EXERCISE, PROGRAMMING, STYLE, RDBMS

Exercises in Relational Database Style

So far, we solved the word frequencies problem with code alone. This week, we will solve it with the help of infrastructure. And with our specific problem at hand, what better fit than a relational database?

This is the 14^th post in the Exercises in Programming Style focus series.Other posts include:

We can approach the problem by loading the data read from the files in the database, and retrieving the top 25 words with the relevant query. In SQL, this query would simply translate into the following:

SELECT value, COUNT(*) as count
  FROM words
  GROUP BY value
  ORDER BY count DESC
  LIMIT 25;

Both Python or Kotlin offer low-level database access API natively. However, none of them are very interesting, as it’s just connecting to the database and then sending SQL strings for queries. Instead, let’s take advantage of the occasion to use Exposed, the Kotlin SQL Framework:

Exposed is a prototype for a lightweight SQL library written over JDBC driver for Kotlin language. It does have two layers of database access: typesafe SQL wrapping DSL and lightweight data access objects.

Querying

Here’s a simplified diagram of Exposed’s querying feature:

This is how it can be used:

object Words : Table() {
  val id = long("id").primaryKey().autoIncrement()
  val docId = long("docId")
  val value = varchar("value", 50)
}

Words.slice(Words.value, Words.value.count()).selectAll()
  .groupBy(Words.value)
  .orderBy(Words.id.count(), SortOrder.DESC)
  .limit(25)
  .map {
    it[Words.value] to it[Words.value.count()]
  }.toMap()

DDL, connections and transactions

Of course, Exposed doesn’t only provide querying capabilities, but also a lot of other features, such as connecting to the database. Here’s a diagram of those used in the code:

Here’s a sample of provided features, with code samples:

Table creation

object Words : Table() {                              (1)
  val id = long("id").primaryKey().autoIncrement()
  val docId = long("docId")
  val value = varchar("value", 50)
}

SchemaUtils.create(Words)                             (2)

1	Define a table by creating a singleton extending the provided `Table` class
2	Create the table in the database

Database connection

Database.connect(
  "jdbc:h2:mem:test",            (1)
  "org.h2.Driver"                (2)
)

1	JDBC URL
2	JDBC drive class name. The JAR containing the driver must be accessible

Transactions handling

transaction {
  createDbSchema()
  loadFileIntoDatabase(filename)
  query()
}

Every code snippet defined in the transaction block will be executed inside… a transaction. By default, the transaction is stored - and retrieved - from the thread local.

Conclusion

Loading text files in the database to execute queries is far from optimal, to say the least. Actually, that’s a very bad use-case for it.

However, in most cases, letting the database do all the work is the way to go: be sure to explore all capabilities of your datastore instead of doing the work on the memory side through the API provided by the language.

The complete source code for this post can be found on Github.

To go further:

Follow me Follow me

Exercises in Relational Database Style

Querying

DDL, connections and transactions

Conclusion

To go further:

Exercises in Programming Style: spreadsheets

Exercises in Programming Style: FP & I/O