Sameer Siruguri

My Blog

Apache Solr vs Elastic Search: Another Take

I asked a co-worker to help me understand the differences between the two systems/applications, and here’s what he said (with PII removed). I am publishing it here because I think it’s a great narration:

My background using Lucene, Solr, and ElasticSearch: a few years ago, I was asked to participate in a project as the architect, lead developer, and manufacturing SME.  Let’s call it KTRP.

KTRP served many functions; one was the aggregation, indexing, search, and other NLP capabilities over product engineering documents, tech specifications, and support documentation. I chose Lucene as the index/search engine. We then designed and added a way to distribute separate instances of Lucene: sending each instance the docs to index, sending queries to each instance, and aggregating and returning the results for a client to make sense of, etc. A ton of support code …
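To make the "send the query to each instance, then aggregate" idea concrete, here is a minimal scatter-gather sketch in Python. It is not the KTRP code: the in-memory `make_instance` function is a hypothetical stand-in for a real Lucene instance, the term-count scoring is a toy, and all document ids and texts are made up for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]  # (relevance score, document id)


def make_instance(docs: Dict[str, str]) -> Callable[[str], List[Hit]]:
    """Return a toy search function standing in for one index instance."""
    def search(query: str) -> List[Hit]:
        terms = query.lower().split()
        hits: List[Hit] = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            # Toy score: how many times the query terms occur in the document.
            score = float(sum(words.count(term) for term in terms))
            if score > 0:
                hits.append((score, doc_id))
        return hits
    return search


def scatter_gather(instances: List[Callable[[str], List[Hit]]],
                   query: str, top_k: int = 3) -> List[Hit]:
    """Send the query to every instance and merge the hits by score."""
    all_hits: List[Hit] = []
    for search in instances:          # scatter: each instance answers alone
        all_hits.extend(search(query))
    return heapq.nlargest(top_k, all_hits)  # gather: keep the best overall


instances = [
    make_instance({"spec-1": "torque spec for pump housing",
                   "spec-2": "paint colour chart"}),
    make_instance({"man-7": "pump maintenance manual for pump torque checks",
                   "man-8": "warranty terms"}),
]
results = scatter_gather(instances, "pump torque")
print(results)
```

One caveat with this kind of merge: scores computed by independent indexes are not necessarily comparable, because each index only knows the statistics of its own documents. That is part of the support code a hand-rolled distribution layer ends up carrying.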

Solr at that time was more about the creation of a web-site crawler, and the related index and query capabilities. Solr matured during the build-out of the KTRP project, but never to the point of being as flexible as we really needed and had already addressed within our own design. Solr has since morphed into a framework for distributed Lucene, but the design relies heavily on components from the Hadoop ecosystem, e.g., ZooKeeper, and the complexity level is moderate to high.

As KTRP came to an end (the government body that was behind it pulled funding), I became aware of ElasticSearch (2011). One can read the text on the motivation for ElasticSearch, and to me it reads like my journal of many of the issues that we faced and addressed (or kicked down the road; e.g., sharding), except that there the solutions were designed in rather than bolted on or added as a supporting component. ElasticSearch requires a minimum of support code …

Solr feels like 2005 while ElasticSearch feels like now and the future; e.g., JSON-based API rather than SOAP and/or parameterized URI only, Node.js server, and JavaScript admin interface. And ElasticSearch supports sharding directly and simply – simple without being simplistic; also very easy to proxy.
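For readers who have not met sharding before, the core idea is just a deterministic rule that maps each document to one of N partitions. The sketch below is a simplified illustration, not ElasticSearch's actual routing: `zlib.crc32` is only a stand-in for whatever hash function the engine uses, and the document ids and shard count are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group document ids by the shard each one routes to."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)


doc_ids = [f"doc-{n}" for n in range(10)]
placement = distribute(doc_ids, num_shards=3)
for shard, ids in sorted(placement.items()):
    print(shard, ids)
```

Because the rule is deterministic, the same id always lands on the same shard, so a lookup by id only has to ask one shard. The flip side is that changing the shard count changes where ids route, which is one reason this is easier to design in up front than to bolt on later.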

In the end, all that matters is indexing and querying the index. Both are based on the excellent and battle-tested Lucene engine. ElasticSearch is dynamic as to which fields to index, and one can set up custom mappings for all sorts of indices and optimize the query capability – this is the basis for ElasticSearch, and the previously mentioned capabilities are icing on the cake. Solr is kinda “meh,” in that “I can do that in Solr, but it’s probably going to hurt …”
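As a rough picture of what "custom mappings" and a JSON-based query look like, here is a small Python sketch that only builds the request bodies and prints them; it does not talk to a server. The body shapes follow the mapping and `match` query format of recent ElasticSearch versions (older releases nested mappings under a type name), and the field names `title`, `part_number`, and `revised_on` are hypothetical.

```python
import json


def build_mapping() -> dict:
    """Index-creation body declaring how each (hypothetical) field is indexed."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},           # analyzed full-text field
                "part_number": {"type": "keyword"},  # exact-match field
                "revised_on": {"type": "date"},
            }
        }
    }


def build_match_query(field: str, text: str) -> dict:
    """Search body for a full-text match on a single field."""
    return {"query": {"match": {field: text}}}


mapping_json = json.dumps(build_mapping(), indent=2)
query_json = json.dumps(build_match_query("title", "torque spec"))
print(mapping_json)
print(query_json)
```

Fields left out of the mapping can still be indexed, since the engine infers a type for them dynamically; an explicit mapping is what you reach for when the inferred type is not the one you want to query against.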

IMHO, ElasticSearch is well designed to address the issues one faces in building a non-trivial and scalable index/search solution. Solr is a solid solution too, but is an evolutionary result, rather like a platypus …

Jim can demo some of the real-time capabilities we’ll have with ElasticSearch, and I can show the plumbing and Automation queue and pipe interfaces …

 
