From Solr to elasticsearch
Search is right at the centre of GOV.UK. It’s the main focus of the homepage and it appears in the corner of every single page. Many of our recent and upcoming apps, such as licence finder, also rely heavily on search. So making sure we have the right tool for the job is vital. Recently we decided to begin switching away from Solr to elasticsearch for our search server. Rob Young, a developer at GDS, explains in some detail the basis for our decisions – the usual disclaimers about this being quite technical apply.
A little background
Both Solr and elasticsearch are Lucene-based search servers that expose an HTTP interface. They both provide a lot of the same features and, in fact, both depend on a lot of the same code. So why on earth do we want to go through the effort of switching?
Controlling the index
One of the great features of elasticsearch is that it exposes all sorts of index management operations through the HTTP interface, such as creating, deleting or modifying the schema of an index. Solr does allow you to create a new index based on an existing one, but nothing more. This extra control is great for a few reasons:
- it is easy to experiment with
- it puts the control of the index firmly in the hands of the application, where it belongs
- it makes temporary indexes for integration testing or A/B testing possible
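As a rough sketch of what this control looks like from application code, the Python below builds (but does not send) the HTTP requests for creating an index, changing its schema and deleting it. The host `http://localhost:9200`, the index name `test-index` and the field names are placeholders rather than our real configuration, and the endpoint shapes follow recent elasticsearch versions.

```python
import json
from urllib.request import Request

# Assumed values for illustration only.
BASE_URL = "http://localhost:9200"
INDEX = "test-index"


def es_request(method, path, body=None):
    """Build (without sending) a JSON request against the search server."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Create an index with a small, hypothetical schema.
create_index = es_request("PUT", f"/{INDEX}", {
    "mappings": {"properties": {"title": {"type": "text"}}},
})

# Add a field to the schema of the existing index.
update_mapping = es_request("PUT", f"/{INDEX}/_mapping", {
    "properties": {"description": {"type": "text"}},
})

# Throw the index away again, e.g. at the end of an integration test.
delete_index = es_request("DELETE", f"/{INDEX}")

for req in (create_index, update_mapping, delete_index):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running server, which is all an integration test needs to set up and tear down its own temporary index.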
Elasticsearch also allows us to have rich, nested documents that better model our data. This may be difficult to visualise, so take a look at the two JSON documents below.
The first shows the structure of a Solr document where we want to nest additional links; the second shows how we could model that in elasticsearch.
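To illustrate the difference, here is a sketch in Python with hypothetical field names (`additional_links__title` and friends are assumptions, not the real GOV.UK schema). The flat form keeps each attribute of a link in its own parallel multi-valued field, a common workaround in a flat schema; the nested form groups them into objects.

```python
import json

# Flat document: each link attribute lives in its own
# multi-valued field, and positions must be kept in step by hand.
flat_doc = {
    "title": "Example page",
    "link": "/example-page",
    "additional_links__title": ["First link", "Second link"],
    "additional_links__link": ["/first", "/second"],
}


def nest_links(doc, prefix="additional_links"):
    """Turn the parallel `<prefix>__*` arrays into a list of objects."""
    marker = prefix + "__"
    nested = {k: v for k, v in doc.items() if not k.startswith(marker)}
    columns = {k[len(marker):]: v for k, v in doc.items() if k.startswith(marker)}
    # zip(*values) walks the parallel arrays one position at a time.
    nested[prefix] = [
        dict(zip(columns.keys(), row)) for row in zip(*columns.values())
    ]
    return nested


# Nested document: each link is a self-contained object.
nested_doc = nest_links(flat_doc)
print(json.dumps(nested_doc, indent=2))
```

In the nested form a link's title and URL can no longer drift out of step, because they are stored together rather than matched up by array position.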
Just about the most important feature of any search engine is the ability to query it. Both Solr and elasticsearch expose their query APIs over HTTP, but they do so in quite different ways. Solr queries are made up of two- and three-letter URL parameters, while elasticsearch queries are clear, self-documenting JSON objects passed in the HTTP body.
Here is a curl command that can be run from the terminal to query a Solr index. It’s quite difficult to interpret: some of the fields can be worked out, but ultimately you would have to resort to the documentation to find out what they mean.
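A request of this kind might look like the following sketch, which builds a Solr select URL in Python rather than with curl. The host, path and field names are assumptions; the parameter names (`q`, `defType`, `qf`, `fq`, `fl`, `rows`, `wt`) are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and field names.
SOLR_SELECT = "http://localhost:8983/solr/select"

params = {
    "q": "tax",                      # the user's search terms
    "defType": "dismax",             # which query parser to use
    "qf": "title^3 description",     # fields to search, with boosts
    "fq": "format:guide",            # filter query (no effect on scoring)
    "fl": "title,link",              # fields to return
    "rows": 10,                      # number of results
    "wt": "json",                    # response format
}

solr_url = f"{SOLR_SELECT}?{urlencode(params)}"
print(solr_url)
```

Without the comments, very little in the resulting URL tells you which parameter searches, which filters and which merely shapes the response.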
Compare that to the (mostly) equivalent query using elasticsearch.
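A roughly equivalent elasticsearch request could be sketched like this, using the same hypothetical field names. The body follows the query DSL of recent elasticsearch versions (`bool`, `multi_match`, `term`), which may differ from the syntax of older releases; the host and index name are again assumptions.

```python
import json
from urllib.request import Request

# Hypothetical elasticsearch location and index name.
SEARCH_URL = "http://localhost:9200/test-index/_search"

query = {
    "query": {
        "bool": {
            # Score documents by how well they match the search terms.
            "must": {
                "multi_match": {
                    "query": "tax",
                    "fields": ["title^3", "description"],
                }
            },
            # Restrict results without affecting the score.
            "filter": {"term": {"format": "guide"}},
        }
    },
    "_source": ["title", "link"],  # fields to return
    "size": 10,                    # number of results
}

# Built but not sent; urllib.request.urlopen(search_request) would send it.
search_request = Request(
    SEARCH_URL,
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(json.dumps(query, indent=2))
```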
It’s much more verbose, but it’s also much more obvious what is happening.
Not every aspect of elasticsearch is an improvement on Solr, and performance is one example. Solr performs very well on small indexes that don’t change very often, which describes ours. We set up a few very simple performance tests. Our goal wasn’t to get an accurate picture of production performance but rather to get an idea of the relative difference between the two. Elasticsearch turned out to be more than fast enough, so performance is not a compelling reason to stick with Solr.
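A very simple comparison of this kind can be sketched as a timing loop. The harness below is an assumption about the general approach, not the tests we actually ran; `run_query` stands in for whatever sends one query to a server, and a dummy function takes its place here so the sketch runs on its own.

```python
import statistics
import time


def time_queries(run_query, queries, repeats=5):
    """Return the median wall-clock time in seconds for each query."""
    results = {}
    for query in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            samples.append(time.perf_counter() - start)
        results[query] = statistics.median(samples)
    return results


def dummy_run_query(query):
    """Placeholder: a real test would send an HTTP request here."""
    return query.upper()


timings = time_queries(dummy_run_query, ["tax", "passport", "licence"])
for query, seconds in timings.items():
    print(f"{query}: {seconds:.6f}s")
```

Running the same function against both servers with the same list of queries gives a rough relative difference, which is all a test like this is meant to show.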
Another concern raised was index stability and the risk of corruption. This is a serious concern. It is also a difficult one to refute, as stability issues often only arise in specific circumstances, such as under consistently high load for an extended period of time. We spoke to 37Signals and Mozilla, amongst others, and did not find anything that worried us.
Taking this together with all the other reasons, on the 25th of June we gave elasticsearch the thumbs up, and we’re now using it in production.