Leveraging annotation macros to generate caching boilerplate in Scala

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

Since functional programming has reached the world of Serious Real Projects, a bunch of new buzzwords have started to creep into the language of software engineers: monads, tail recursion, immutability, purity and currying, to name a few. Another one is memoization. The term refers to caching the results of calls to pure functions, which have no side effects and always return the same result for the same arguments. It is nothing special in itself, but as functional paradigms have a bigger and bigger impact on our codebases, pure functions are becoming very common. Programming languages and platforms themselves don’t offer many elegant ways to apply memoization to such functions, so let’s see what options we have with external libraries.
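To make the idea concrete, here is a minimal, unbounded memoizer for a one-argument pure function, using only the standard library. It has no eviction or expiry, which is exactly the cache-invalidation part that the rest of this post hands over to Guava. The names `Memo.memoize` and `slowSquare` are made up for the illustration and are not part of any library.

```scala
import scala.collection.mutable

object Memo {
  // Wraps a pure function with an unbounded, non-thread-safe cache.
  // Illustration only: entries are never evicted.
  def memoize[A, B](f: A => B): A => B = {
    val cache = mutable.HashMap.empty[A, B]
    (a: A) => cache.getOrElseUpdate(a, f(a))
  }
}

object MemoDemo {
  var calls = 0 // counts how often the underlying function really runs

  def slowSquare(n: Int): Int = {
    calls += 1
    n * n
  }

  val fastSquare: Int => Int = Memo.memoize[Int, Int](slowSquare)
}
```

Calling `MemoDemo.fastSquare(4)` twice runs `slowSquare` only once; the second call is served from the map.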

Memoization is just a kind of caching, and there are a lot of ways to cache stuff. I’m not going to dive into aspect-oriented caching with magic annotations offered by some frameworks. Such an approach is of course powerful and useful in some contexts, but let’s focus on simpler cases, where we don’t want to pull in a framework.

Manual approach

One pretty straightforward method is to use the decorator pattern and wrap our function with a decorator object which stores values and manages the hideous process of cache invalidation. This painful job can be delegated to Guava with its slick API. First, let’s take a look at an example of some expensive function that might need caching:

class GraphBuilder {

  def createGraph(elementCount: Int): Graph = {
    someExpensiveCode()
  }

}

To do our “manual AOP” and “weave in” the aspect of caching, let’s extract a trait and create a decorator:


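import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
import java.util.concurrent.TimeUnit

trait GraphBuilder {
  def createGraph(elementCount: Int): Graph
}

class ExpensiveGraphBuilder extends GraphBuilder {

  override def createGraph(elementCount: Int): Graph = {
    someExpensiveCode()
  }
}

class CachedGraphBuilder(inner: GraphBuilder) extends GraphBuilder {

  // Java generics need reference types, hence java.lang.Integer rather than Int
  private val graphs: LoadingCache[Integer, Graph] = CacheBuilder.newBuilder()
    .maximumSize(10000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .build[Integer, Graph](
      new CacheLoader[Integer, Graph] {
        // called by Guava on a cache miss
        override def load(elementCount: Integer): Graph =
          inner.createGraph(elementCount)
      })

  override def createGraph(elementCount: Int): Graph = {
    // hits the cache or calls the inner builder
    graphs.get(elementCount)
  }
}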

Then, when we wire our dependencies, we can instantiate the builder as


val graphBuilder = new CachedGraphBuilder(new ExpensiveGraphBuilder())

It’s a decent approach which keeps our code compliant with the open-closed principle (if you don’t count the trait extraction). However, some functions are part of traits or objects and cannot be decorated this way, and it’s annoying to write the same boilerplate over and over again. If you find yourself tired of it, it’s time to try something leaner than a framework.

Generating boilerplate in compilation time with macro annotations

MacMemo is a simple library that I implemented as an experiment. It introduces an annotation which can be placed over a function definition. When the compiler runs, it expands the annotation and generates boilerplate around the function body that instantiates a Guava cache and uses it. Long story short, all of the code above becomes:


import com.softwaremill.macmemo.memoize
import scala.concurrent.duration._
import scala.language.postfixOps

class GraphBuilder {

  @memoize(maxSize = 20000, expiresAfter = 2 hours)
  def createGraph(elementCount: Int): Graph = {
    someExpensiveCode()
  }
}

This gives us a very concise way to make sure our function’s results get cached, with the desired invalidation settings. The macro generates all the necessary boilerplate and inserts it directly into the createGraph() method, wrapping its real code with cache calls.
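The exact tree MacMemo generates is an implementation detail of the library, so the following is only a hand-written approximation of the idea, not the macro's real output: a per-instance cache field is added to the class, and the original method body is evaluated only on a miss, keyed by the list of arguments (the same `List[Any]` key shape used by the `Cache` trait shown in the next section). `Graph`, `someExpensiveCode` and the plain `HashMap` (with no size limit or expiry) are stand-ins chosen for this sketch.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the article's Graph type.
final case class Graph(elementCount: Int)

class GraphBuilder {
  var expensiveCalls = 0 // instrumentation for this sketch only

  // Roughly what an annotation macro could add: one cache per instance.
  // A real implementation would also bound the size and expire entries.
  private val createGraphCache = mutable.HashMap.empty[List[Any], Graph]

  def createGraph(elementCount: Int): Graph =
    // The original body is passed by name and runs only on a cache miss.
    createGraphCache.getOrElseUpdate(List(elementCount), {
      someExpensiveCode(elementCount)
    })

  private def someExpensiveCode(elementCount: Int): Graph = {
    expensiveCalls += 1
    Graph(elementCount)
  }
}
```

Because the cache is a field, two `GraphBuilder` instances do not share cached results.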

If you like MacMemo, please star it on GitHub🙂

Custom cache providers

What if you don’t want Guava? A clever recent contribution from Marcin Kubala extends MacMemo with the ability to define custom cache providers, so you can easily write your own extension and use memcached, EhCache or whatever you like.

You can do this by bringing an appropriate implicit MemoCacheBuilder instance into the scope of the memoized class, meaning the scope of the method definition, not of its usage: for example a companion object, an implicit val within the class, or explicit imports available in the scope of the class definition.

The MemoCacheBuilder trait has the following definition:

case class MemoizeParams(maxSize: Long, expiresAfterMillis: Long, concurrencyLevel: Option[Int])

trait MemoCacheBuilder {
  def build[V <: Object](bucketId: String, params: MemoizeParams): Cache[V]
}

trait Cache[V] {
  def get(key: List[Any], computeValue: => V): V
}

MemoizeParams is a config class instantiated from the parameters passed to the annotation. The bucketId argument identifies your annotated method uniquely by its name plus the enclosing type path (the Guava-based implementation doesn’t use this key, but you may find it handy in your own implementation). Your MemoCacheBuilder is responsible for creating an instance of Cache, which takes a list of method arguments and returns a cached result. This cache is instantiated exactly once per instance of the class that declares the annotated method.
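As an illustration of such a provider, here is a sketch of a builder backed by an access-ordered `java.util.LinkedHashMap`: it evicts the least recently used entry once `maxSize` is exceeded and treats entries older than `expiresAfterMillis` as missing. To compile on its own, it re-declares the three definitions quoted above; in a real project you would use MacMemo's own types instead (check the library for their exact package). `LruCacheBuilder` and `Stamped` are hypothetical names, `bucketId` is ignored just as in the Guava-based default, and `concurrencyLevel` is ignored in favour of a single lock per cache.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Re-declared from the article so the sketch compiles standalone.
case class MemoizeParams(maxSize: Long, expiresAfterMillis: Long, concurrencyLevel: Option[Int])

trait MemoCacheBuilder {
  def build[V <: Object](bucketId: String, params: MemoizeParams): Cache[V]
}

trait Cache[V] {
  def get(key: List[Any], computeValue: => V): V
}

// A cached value together with the time it was computed.
final case class Stamped[V](value: V, createdAt: Long)

// Hypothetical provider: LRU eviction plus time-based expiry.
class LruCacheBuilder(clock: () => Long = () => System.currentTimeMillis())
    extends MemoCacheBuilder {

  def build[V <: Object](bucketId: String, params: MemoizeParams): Cache[V] =
    new Cache[V] {
      // accessOrder = true keeps the least recently used entry first.
      private val entries: JLinkedHashMap[List[Any], Stamped[V]] =
        new JLinkedHashMap[List[Any], Stamped[V]](16, 0.75f, true) {
          override protected def removeEldestEntry(
              eldest: JMap.Entry[List[Any], Stamped[V]]): Boolean =
            size() > params.maxSize
        }

      // The value is computed under the lock, which is simple but serializes misses.
      def get(key: List[Any], computeValue: => V): V = entries.synchronized {
        val now = clock()
        val cached = entries.get(key)
        if (cached != null && now - cached.createdAt < params.expiresAfterMillis)
          cached.value
        else {
          val fresh = computeValue
          entries.put(key, Stamped(fresh, now))
          fresh
        }
      }
    }
}
```

Injecting the clock keeps expiry testable without sleeping; the default reads the system time.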

How exactly can you bring your builder into the right scope? Here’s an example of one simple way:

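import com.softwaremill.macmemo.memoize
import scala.concurrent.duration._
import scala.language.postfixOps

class ClassWithMemo {

  // MyCustomCacheBuilder stands for your own MemoCacheBuilder implementation
  implicit val cacheProvider = new MyCustomCacheBuilder

  @memoize(2, 5 days)
  def someMethod(param: Int) = {
    someExpensiveCode()
  }
}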

The builder can also be an object, in which case we don’t need the ‘new’ keyword. For more, check Marcin’s examples on GitHub.

Last but not least, MacMemo allows you to disable caching globally with a system property, so you can easily switch it off for testing purposes.

Handling services that require explicit shutdown in Scala

tail -f development.log

While working on our projects at SoftwareMill we have recently started depending on a few services that require explicit closing. Until now they were closed in a shutdown hook we manually registered. That has started to become error-prone, so I have decided to introduce a simple mechanism for registering those services during their initialization in MacWire-based modules and having a single centralized shutdown handler closing them. In this post I briefly go through the experimental shutdownables API.

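The full post describes MacWire's experimental shutdownables API, which is not reproduced here. The underlying idea, registering each closeable service when it is created and closing all of them from one central place, can be sketched with the standard library alone. `ShutdownRegistry` and its methods are hypothetical names, and closing in reverse registration order is a design choice of this sketch (services created later may depend on earlier ones).

```scala
import scala.collection.mutable.ListBuffer
import scala.util.control.NonFatal

// Hypothetical registry: services register themselves at construction time.
class ShutdownRegistry {
  private val services = ListBuffer.empty[AutoCloseable]

  // Returns the service, so registration can be inlined into wiring code.
  def register[S <: AutoCloseable](service: S): S = synchronized {
    services += service
    service
  }

  // Closes everything in reverse registration order; one failure does not stop the rest.
  def shutdownAll(): Unit = synchronized {
    services.reverseIterator.foreach { s =>
      try s.close()
      catch { case NonFatal(e) => System.err.println(s"Failed to close $s: ${e.getMessage}") }
    }
    services.clear()
  }

  // A single JVM shutdown hook instead of one hand-written hook per service.
  def installHook(): Unit = {
    sys.addShutdownHook(shutdownAll())
    ()
  }
}
```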

A few hints about Scala sequences

This blog post is inspired by a cool talk, “The Dark Side of Scala”, given by Tomek Nurkiewicz at the Scalar conference. I’m going to focus on one particular problem he mentioned: the confusing variety of Scala sequence types like Seq, IndexedSeq, Traversable, List, Vector and others.
As non-sequential types like Map or Set are pretty straightforward to use, let’s put them aside for now.
The information gathered in this article is mostly a summary of different discussions I found on the Internet, especially these two threads:

http://stackoverflow.com/questions/11702798/scala-guidelines-on-return-type-when-prefer-seq-iterable-traversable
http://stackoverflow.com/questions/6928327/when-should-i-choose-vector-in-scala

My goal is to put it all in the form of a simple Q&A which can serve as a cheat sheet for most non-exotic cases.

Q: What type should my API accept as input?
Answer: As general as possible. In most cases this will be Traversable <- Seq <- List.
Explanation: We want our API consumers to be able to call our code without having to convert types. If our function takes a Traversable, the caller can pass almost any type of collection. This is usually sufficient if we just map(), fold(), head(), tail(), drop(), find(), filter() or groupBy(). If you want to use length(), make it a Seq. If you need to prepend efficiently with ::, use a List.
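A small sketch of that rule of thumb. In Scala 2.13 `Traversable` became a deprecated alias for `Iterable`, so the most general parameter type below is written as `Iterable`; the function names are made up for the example.

```scala
object InputTypes {
  // Only needs traversal, so any collection will do: List, Vector, Set, ...
  def total(xs: Iterable[Int]): Int = xs.foldLeft(0)(_ + _)

  // Needs length and positional access, so ask for a Seq.
  def middle(xs: Seq[Int]): Option[Int] =
    if (xs.isEmpty) None else Some(xs(xs.length / 2))

  // Relies on cheap prepending with ::, which List provides.
  def withHeader(xs: List[Int]): List[Int] = 0 :: xs
}
```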

Q: What type should my API return?
Answer: As specific as possible. Typically you want to return a List, Vector, Stack or Queue.
Explanation: This puts no constraints on your API consumers, but allows them to process the returned data in the optimal way and worry less about conversions.
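For example, declaring the concrete `Vector` as the return type, rather than `Seq` or `Iterable`, tells callers they can rely on Vector's strengths, such as indexed access and `updated`, without converting first. Both functions below are purely illustrative.

```scala
object ReturnTypes {
  // Returning the concrete type documents what the caller actually gets.
  def squares(n: Int): Vector[Int] = Vector.tabulate(n)(i => i * i)

  // A caller can use Vector-specific operations directly, no conversion needed.
  def replaceMiddle(v: Vector[Int], x: Int): Vector[Int] =
    if (v.isEmpty) v else v.updated(v.length / 2, x)
}
```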

Q: Should I use List or Vector?
Answer: You most probably want a Vector, unless your algorithm can be expressed using only ::, head and tail; in that case, use a List.
Explanation: Some people compare List vs Vector in Scala to LinkedList vs ArrayList in Java. This is partially right, because:

  • Scala Vector is a collection with good random access (like java.util.ArrayList)
  • Scala List is indeed a linked list with a very fast prepend operation (::), but the links are unidirectional (for bi-directional links there is the mutable DoubleLinkedList, deprecated since Scala 2.11).

However, Scala Vector has a very efficient iteration implementation and, compared to List, is faster at linear traversal (which may be surprising), so unless you are planning to do quite a lot of prepending, Vector is a better choice. Additionally, Lists don’t work well with parallel algorithms: it’s hard to break them down into pieces and put them back together efficiently.
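The difference in idioms can be sketched as follows: a List is naturally consumed with `::` patterns and recursion over head and tail, while a Vector is used through indices and `updated`. The functions are illustrative only and make no performance claims beyond the usual complexity characteristics of the two types.

```scala
import scala.annotation.tailrec

object ListVsVector {
  // Idiomatic List processing: prepend with :: and recurse on head/tail.
  def reverseList[A](xs: List[A]): List[A] = {
    @tailrec
    def loop(rest: List[A], acc: List[A]): List[A] = rest match {
      case Nil          => acc
      case head :: tail => loop(tail, head :: acc)
    }
    loop(xs, Nil)
  }

  // Idiomatic Vector processing: effectively constant-time indexed reads and updates.
  def swapEnds[A](v: Vector[A]): Vector[A] =
    if (v.length < 2) v
    else v.updated(0, v(v.length - 1)).updated(v.length - 1, v(0))
}
```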

Q: What about other traits like IndexedSeq, LinearSeq, GenTraversable, TraversableOnce, Iterable or IterableLike?
Answer: In many cases you don’t need to refer specifically to these types.
Explanation: Most of these types reveal some additional information about the underlying implementation, which may be important if your code is really performance-critical. Iterable may be familiar from the Java world and is useful when you really need an iterator with state (which is not really a functional approach). I encourage you not to dig into the other types unless you are unsatisfied with your current performance and want to squeeze out some more.


Easy suite tagging with ScalaTest 2.0

If you are using ScalaTest 1.x and you need to tag some tests to make them easily skippable, you have to tag each method separately:

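import org.scalatest.{FlatSpec, Tag}
import org.scalatest.matchers.ShouldMatchers

// the tag name is a placeholder; any unique string works
object SlowTest extends Tag("tags.SlowTest")

class MySpec extends FlatSpec with ShouldMatchers {

  // FlatSpec needs a subject before the first "it" clause
  behavior of "My exercises"

  it should "pass this exercise" taggedAs SlowTest in {
    // ...
  }

  it should "pass another exercise" taggedAs SlowTest in {
    // ...
  }
}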

This approach has two major flaws:

  • It’s easy to forget about your tag.
  • If you use BeforeAndAfterAll, the code in beforeAll() will execute anyway. It’s possible that in slow tests this code will initialize some infrastructure (for example bring up the database) which is exactly the thing we want to avoid when we tag our tests.

One solution is to use nested suites and keep the beforeAll() initialization in the master suite. This requires some additional plumbing if we want to execute suites individually. However, with ScalaTest 2.0, we have a better option, allowing us to keep things simple and flexible: suite tagging.

In short, you can now annotate a whole suite and then exclude that annotation when running tests. This eliminates the whole suite, including its beforeAll(), afterAll() and any other blocks surrounding test invocations.

How to prepare a tag

Just create a simple Java annotation like the following:

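package tags;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.TYPE;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

@org.scalatest.TagAnnotation
@Retention(RUNTIME)
@Target({METHOD, TYPE})
public @interface RequiresDb {
}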

Remember: it has to be a plain Java annotation. ScalaTest discovers the tag through Java reflection at runtime, so if you try Scala techniques like extending StaticAnnotation, it will probably not work.
Now you can annotate your suite:

import org.scalatest.{FlatSpec, Matchers}
import tags.RequiresDb

@RequiresDb
class MySpec extends FlatSpec with Matchers {
  // ...
}

That’s it. Now execute your tests with this sbt command:

test-only * -- -l tags.RequiresDb

and it should do the trick. Note: use the fully qualified class name, here tags.RequiresDb, where “tags” is just the Java package name.
Moreover, you can use a cool technique from this blog post and replace this rather ugly incantation with a simple

local:test

If you want a full working example, check my GitHub for one.
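The linked post is not reproduced here, but one way to get a `local:test` task is sbt's documented pattern for an additional test configuration that extends `Test`, combined with passing ScalaTest's `-l` (exclude tag) argument only in that configuration. This is a sketch assuming sbt 1.x slash syntax; the `LocalTest` name is an assumption of the sketch, and plain `test` still runs everything.

```scala
// build.sbt: a sketch assuming sbt 1.x and ScalaTest as the test framework
lazy val LocalTest = config("local").extend(Test)

lazy val root = (project in file("."))
  .configs(LocalTest)
  .settings(
    // Reuse the standard test tasks (test, testOnly, ...) in the new configuration.
    inConfig(LocalTest)(Defaults.testTasks),
    // In this configuration only, exclude suites and tests tagged with tags.RequiresDb.
    LocalTest / testOptions += Tests.Argument(TestFrameworks.ScalaTest, "-l", "tags.RequiresDb")
  )
```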

Dynamic queries in Rogue

I’ve spent some time googling for information about how to build Rogue queries dynamically, but surprisingly I couldn’t find any straightforward answer. This is why I wrote the following short post describing how to do it.

A typical query may look like this:

UserRecord where (_.surname eqs "Gates") and (_.age eqs 20) limit(5) fetch()

What if we want to add our criteria conditionally? My first attempt looked, naively, like this:

It turns out you can’t compile this code, because the type of queryByName is lost: the map().getOrElse() expression returns one of two different types. What you need here is to add a ‘query’ invocation to your record:

That’s pretty much it; for some reason it is hard to find in any examples. The queryByName object can now be used as the base for an elegant, dynamic and typesafe Rogue query. Also, note that orderDesc() and limit() are pulled up to the first line as well. Special thanks to Piotr Buda, who showed me his queries in Slick, which inspired me to go this way.
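The Rogue snippets themselves are not reproduced here, but the typing problem is not Rogue-specific. In the sketch below, a hypothetical `Query` builder and `UserRecord` object (not Rogue's API) show the shape of the fix: start from a value that already has the query type, so every conditional branch produces the same type and further calls keep type-checking.

```scala
// Hypothetical, heavily simplified stand-ins for a record and its query builder.
final case class Query(conditions: List[String], order: Option[String], max: Option[Int]) {
  def where(condition: String): Query = copy(conditions = conditions :+ condition)
  def orderDesc(field: String): Query = copy(order = Some(s"$field DESC"))
  def limit(n: Int): Query = copy(max = Some(n))
}

object UserRecord {
  // Entry point that already has the Query type.
  def query: Query = Query(Nil, None, None)
}

object DynamicQuery {
  // Mixing UserRecord and Query in the two branches of getOrElse would widen
  // the result to a common supertype such as AnyRef, which has no where/limit.
  // Starting from UserRecord.query keeps both branches typed as Query.
  def build(surname: Option[String], age: Option[Int]): Query = {
    val base = UserRecord.query.orderDesc("age").limit(5)
    val byName = surname.map(s => base.where(s"surname = $s")).getOrElse(base)
    age.map(a => byName.where(s"age = $a")).getOrElse(byName)
  }
}
```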

Will Node and Scala really dry up?

There’s been quite a significant rise in buzz around Google’s Go programming language. Some may think that it’s just another peak of excitement coming after a period of calm, because the Scala + Akka / Node plateau is over. Derek Collison says: “The management layers and infrastructure layers of the newer technologies that provide this cloud delivery model? Within two years, a majority will be written in Go.” Finally, we have this very interesting talk by Paul Dix, boldly titled “Why Node and Scala will dry up: Go will drink their milkshake”. I strongly recommend watching this video, which was actually the reason I decided to write this blog post. As a big enthusiast of Scala, I want to address some of Paul’s concerns, so let’s take a look at his “allegations” (but remember to watch the talk first!):

Node.js

I don’t have experience writing backends in Node, but I code in JS and Paul’s arguments look pretty convincing to me. The performance vs. coding-in-JavaScript tradeoff just doesn’t appeal to me in most cases. If I were about to write a super-performant, mission-critical system, then maybe I’d investigate it more deeply. Other than that, EOT.

Dependencies (language / library versions)

Paul mentions that this is the thing that hurts him most in Scala. Well, I agree that it is a bit painful: dealing with compatibility takes time and effort, and sometimes we just get led into a dead end. On the other hand, there is the opposite extreme of Java, where a strong focus on backward compatibility results in really nasty technical debt, and yet it still has lots of quirks when it comes to dealing with versions. Maybe it’s wishful thinking, but I suppose that Scala’s aggressive strategy of breaking version compatibility will pay off in the long term. Early adopters have to pay the price, but I hope the reward will be worth it.

No centralized home for libraries

That was never a big deal for me, neither with Scala + SBT nor with Java + Maven. The notion of a repository is a decent standard, and looking up the right repo for our libraries is usually very easy. I had a few struggles finding the location of some exotic libs on the Internet, but it never took much time.

Option sucks

I guess this part of the video was the moment when I decided to write my blog post😉 Paul states that using Option requires vague and verbose handling, which is pretty much comparable to handling nulls. Well, the main difference is that Option makes the implicit explicit and brings null handling to a much safer level. Of course, you can always call .get() on a None and hit an exception (a NoSuchElementException rather than an NPE), but the difference is that you explicitly ignore a warning. This should be obvious ever since Tony Hoare explained why he calls the null reference “the billion dollar mistake”. As for using map to safely work on wrapped values: it’s not an idiom specific to Scala. Take Maybe in Haskell. However, I agree that other approaches may be more elegant in terms of style. Cedric shows a cool approach used in Fantom in this blog post.
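A short illustration of that explicitness: the possible absence of a value is visible in the type, the combinators force the caller to decide what happens in the `None` case, and calling `.get` on a `None` throws `NoSuchElementException`. `findEmail` and its data are made up for the example.

```scala
object OptionDemo {
  private val emails = Map("alice" -> "alice@example.com")

  // The return type says "maybe missing"; no null can sneak through.
  def findEmail(user: String): Option[String] = emails.get(user)

  // map transforms the value only if present; getOrElse supplies the fallback.
  def domainOf(user: String): String =
    findEmail(user).map(_.split('@').last).getOrElse("unknown")

  // The same decision written as a pattern match.
  def greeting(user: String): String = findEmail(user) match {
    case Some(email) => s"Mail sent to $email"
    case None        => s"No address for $user"
  }
}
```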

Method invocation without dots or parens needs to die in a fire

Okay, I can agree with that🙂 There are some exceptions though, like test frameworks that offer neat DSLs using this capability, but in a typical codebase it’s just confusing, especially if you browse code written by different people.
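For readers who have not met it yet, this is the syntax in question: a single-argument method can be called infix, without the dot and parentheses. Both values below are computed the same way; which form is clearer is exactly the debate above. Postfix calls such as `xs size` go a step further and are gated behind `import scala.language.postfixOps`.

```scala
object InfixDemo {
  val xs = List(1, 2, 3, 4)

  // Conventional dot-and-parens syntax.
  val dotted: List[Int] = xs.filter(_ % 2 == 0).map(_ * 10)

  // The same calls written infix, as often seen in DSLs.
  val infix: List[Int] = xs filter (_ % 2 == 0) map (_ * 10)
}
```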

Pattern matching

As for using pattern matching with Option, I already pointed out that it may look a bit clumsy. However, pattern matching with extractors and case classes is much more powerful than if/else. Some neat examples can be found in Martin Odersky’s course on Coursera.
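A small example of what pattern matching offers beyond if/else: deconstructing case classes, adding guards, and using a custom extractor, each in a single expression. The `Shape` hierarchy and the `Even` extractor are invented for illustration.

```scala
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rect(width: Double, height: Double) extends Shape

// A custom extractor: matches even integers and binds their half.
object Even {
  def unapply(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None
}

object Matching {
  // Because Shape is sealed, the compiler can warn about missing cases.
  def describe(shape: Shape): String = shape match {
    case Circle(r) if r <= 0  => "degenerate circle"
    case Circle(r)            => s"circle with radius $r"
    case Rect(w, h) if w == h => s"square with side $w"
    case Rect(w, h)           => s"rectangle ${w}x$h"
  }

  def halve(n: Int): String = n match {
    case Even(half) => s"$n is even, half is $half"
    case _          => s"$n is odd"
  }
}
```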

IO

A very good point. I was also disappointed that a language as mature as Scala requires you to browse many resources on the Internet to find a decent third-party library for filesystem manipulation or network operations.

Too many concurrency options

My experience here is still limited, but all the community buzz around the new Akka that I observe (especially on Twitter) has always made me think that there are no other serious options for most cases. However, feel free to comment if I’m wrong🙂

Tuples are an abomination

I agree. I always feel uncomfortable when working with tuples; the whole ._1, ._2 stuff is just hard to read in many cases. It often feels like hitting a wall when I read concise, elegant Scala code and run into a ._1. What was that? I have to go back and check again. And then bang! ._2. What was that? The whole flow of pleasant code reading gets disturbed.
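The readability problem in a nutshell, with two common remedies: the same data accessed through tuple accessors, where the reader has to remember what `._1` and `._2` mean, then through a destructuring pattern and through a small case class, where the parts have names. `UserStats` and the data are illustrative.

```scala
final case class UserStats(name: String, logins: Int)

object TupleDemo {
  val raw: List[(String, Int)] = List(("ann", 3), ("bob", 7))

  // Tuple accessors: correct, but what is ._2 again?
  val busiestTuple: String = raw.maxBy(_._2)._1

  // Destructuring names the parts at the point of use.
  val busiestPattern: String = raw.maxBy { case (_, logins) => logins } match {
    case (name, _) => name
  }

  // A case class carries the names along with the data.
  val stats: List[UserStats] = raw.map { case (name, logins) => UserStats(name, logins) }
  val busiestCaseClass: String = stats.maxBy(_.logins).name
}
```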

Language footprint is massive

Unfortunately, yes. I often get the feeling that Scala is overwhelming, with too much stuff to learn, know and follow. Martin Odersky even proposed splitting Scala into different language levels to make it easier, but it started a controversial discussion (Sigh… I can’t find the source. Was it on Stack Overflow? Please post a comment if you know it).

Go

I won’t comment too much on this language, because all I know so far is what I learned from Paul’s talk and from another cool introductory talk given recently by Konrad at SoftwareMill (in Polish). Just a few first impressions:

  • Controversial approach to versioning and dependencies.
  • Statements are not expressions. I got really used to expressions, imperative code looks a little stiff to me now🙂
  • Goroutines and channels: +1, quite interesting concepts
  • Garbage collection: may be risky if it’s really too simple?
  • Compilation to native code: +1 for deployment simplicity. Cool for microservices, which are a pretty strong trend lately.
  • No inheritance: very good!
  • Syntax: a bit too C-ish

Nature abhors a vacuum

We may be observing interesting progress in the adoption of the Go language. Let’s see what the future brings, and never stop learning.

Using Swagger with Scalatra

If you’re considering Scalatra for your web services, you should probably check out Swagger as a support library. One of its coolest features is the ability to automatically generate interactive API docs which you can open in your browser and exercise. The official documentation shows one way to describe your services, but it seems that the new version offers a better way to do that, with type-safe references. Let’s take a look at an example to make it clearer (a full working app is available on GitHub). This short tutorial does not focus on Scalatra itself, and I assume that you have some knowledge of how to configure and use it. We are going to explore the subject of integration with Swagger. Okay, fine, here we go.

Project setup

Our example project is prepared for Scala 2.10 with Scalatra 2.2. Assuming we have a working web app build configuration, we now need to add new dependencies:

"org.scalatra" %% "scalatra-swagger" % "2.2.0"
"com.wordnik" % "swagger-core_2.10.0" % "1.2.0"

Now we have to define a main servlet for Swagger. This servlet will provide our documentation as a service. All other servlets that need to be exposed are going to use this one:

Exposing the browser app

To make your documentation available for exploration and execution via a browser, add the api-docs web resources to the application. They contain some JavaScript that calls the Swagger servlet, plus styles, which you can play with to customize the look and feel of your documentation.

Defining API

Now we can document a simple service. Let’s say we have a servlet consuming simple GET requests with an optional query parameter:

It parses a query parameter named “type” and returns an object defined by a simple case class, ExampleItemList, which can be easily transformed to JSON. Now what is this additional parameter defined as operation()? It’s a special API operation definition, declared in an additional trait (extending SwaggerSupport):

It’s cool that we can use strong Scala typing to define both parameter and response types. Swagger will manage to expand our custom type (ExampleItemList) and prepare a nice documentation (with an example!).

If you browse to http://localhost:8080/api-docs/ now, you can see the example endpoint definition:

[Screenshot swagger-api-01: example endpoint definition in the Swagger UI]

Below, you can fill out a form and send a test request. If you describe a POST and your request body is also represented as a case class, you are given a template to fill in. Neat!

[Screenshot swagger-api-02: test request form generated by Swagger]

Here’s all the code you need to make Swagger generate the form above:

Shortcomings

Swagger has some problems serializing non-trivial classes. For example, it cannot handle org.joda.time.DateTime. If you get some crazy exceptions with a vague message during deployment, this may be the cause.

I hope this short post clarified a bit how you can use Swagger to generate your ‘living documentation’ and utility forms for testing services. Hopefully the official Scalatra / Swagger docs will get more organized and up-to-date soon.
Special thanks to Michał Ostruszka for discovering all this stuff and sharing it with me!