Understanding ElasticSearch analyzers

Sadly, many early Internet beer recipes aren’t in an easily digestible format: they are unstructured lists of directions and ingredients mixed together, often originally composed in an email or forum post.

So while these recipes are hard to put into traditional data stores (presumably to make them easier to search), they’re a perfect fit for ElasticSearch in their current form.

Accordingly, imagine an ElasticSearch index full of beer recipes, since…well…I enjoy making beer (and drinking it too).

First, I’ll add some beer recipes to ElasticSearch using Node’s ElasticSearch client (note that the code is CoffeeScript, though). I’ll be adding these beer recipes to a beer_recipes index like so:

Adding a beer recipe
```coffeescript
# The recipe document; the ingredients field holds the free-form text
beer_1 = {
  name: "Todd Enders' Witbier",
  style: "wit, Belgian ale, wheat beer",
  ingredients: "4.0 lbs Belgian pils malt, 4.0 lbs raw soft red winter wheat, 0.5 lbs rolled oats, 0.75 oz coriander, freshly ground Zest from two table oranges and two lemons, 1.0 oz 3.1% AA Saaz, 3/4 corn sugar for priming, Hoegaarden strain yeast"
}

# Index beer_1 into the beer_recipes index (type 'beer') and log the response
client.index('beer_recipes', 'beer', beer_1).on('data', (data) ->
  console.log(data)
).exec()
```

Note that the interesting part of the recipe JSON document, dubbed beer_1, is the ingredients field. This field is basically one big string of valuable text (you can imagine that it was originally the body of an email). So while the ingredients field is unstructured, it’s clearly something people will want to search on.

I will add one more recipe for good measure:

Adding a second beer recipe
```coffeescript
# A second recipe document with the same shape as beer_1
beer_2 = {
  name: "Wit",
  style: "wit, Belgian ale, wheat beer",
  ingredients: "4 lbs DeWulf-Cosyns 'Pils' malt, 3 lbs brewers' flaked wheat (inauthentic; will try raw wheat nest time), 6 oz rolled oats, 1 oz Saaz hops (3.3% AA), 0.75 oz bitter (Curacao) orange peel quarters (dried), 1 oz sweet orange peel (dried), 0.75 oz coriander (cracked), 0.75 oz anise seed, one small pinch cumin, 0.75 cup corn sugar (priming), 10 ml 88% food-grade lactic acid (at bottling), BrewTek 'Belgian Wheat' yeast"
}

# Index beer_2 into the same index and type, then log the response
client.index('beer_recipes', 'beer', beer_2).on('data', (data) ->
  console.log(data)
).exec()
```
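For readers not using the Node client, each indexing call above roughly boils down to a single HTTP request. The sketch below only builds that request rather than sending it. The helper name buildIndexRequest is hypothetical, and it assumes a local node at http://localhost:9200 and the older /index/type URL layout (here, /beer_recipes/beer) that matches the 'beer' type used above; newer ElasticSearch releases dropped mapping types, so the URL differs there.

```javascript
// Hypothetical helper: describes the HTTP request that indexing one recipe
// roughly corresponds to. Assumes the pre-7.x "/<index>/<type>" URL layout.
function buildIndexRequest(host, index, type, doc) {
  return {
    // POST without a document id asks ElasticSearch to generate one
    method: "POST",
    url: `${host}/${encodeURIComponent(index)}/${encodeURIComponent(type)}`,
    headers: { "Content-Type": "application/json" },
    // The recipe is sent as-is; its text fields are analyzed at index time
    body: JSON.stringify(doc),
  };
}

// Usage with a trimmed-down copy of the second recipe
const req = buildIndexRequest("http://localhost:9200", "beer_recipes", "beer", {
  name: "Wit",
  style: "wit, Belgian ale, wheat beer",
});
console.log(req.method, req.url); // prints "POST http://localhost:9200/beer_recipes/beer"
```

With Node 18 or later, the result could be sent with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).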

It’s a hot summer’s day and I’m thinking I’d like to make a beer with lemon as an ingredient (to be clear: I want to use lemon zest, which comes from the lemon’s peel). So naturally, I need to find (i.e. search for) a recipe with lemons in it.

Consequently, I’ll search my index with a term query for recipes whose ingredients contain the word “lemon”, like so:

Searching for lemon
```coffeescript
# A term query for the term "lemon" in the ingredients field
query = { "query" : { "term" : { "ingredients" : "lemon" } } }

# Run the search and print the style, name, and ingredients of each hit
client.search('beer_recipes', 'beer', query).on('data', (data) ->
  data = JSON.parse(data)
  for doc in data.hits.hits
    console.log doc._source.style
    console.log doc._source.name
    console.log doc._source.ingredients
).exec()
```

But nothing shows up – there are no results! Why is that?

If you look closely at the earlier code example (specifically, the beer_1 JSON document), you can see that the word “lemons” is in the text (i.e. “…two table oranges and two lemons…”). It turns out that, because of the way ElasticSearch indexes values by default, a search for “lemon” doesn’t match that text, while a search for “lemons” does.
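Why does “lemons” match but “lemon” doesn’t? By default, ElasticSearch runs text fields like ingredients through its standard analyzer, which splits the text into words and lowercases them but doesn’t reduce words to a common stem, so the index holds the term lemons. A term query, on the other hand, isn’t analyzed: it looks for exactly the value it’s given. The sketch below mimics that behavior with a tiny in-memory inverted index. The functions simpleAnalyze, buildIndex, and termQuery are hypothetical, the regex tokenizer is only a rough approximation of the standard analyzer, and the document text is an excerpt from the two recipes above.

```javascript
// Rough, hypothetical stand-in for ElasticSearch's default (standard) analyzer:
// lowercase the text and split it on runs of non-alphanumeric characters.
// Unlike the real analyzer, "3.1" becomes two tokens here; that doesn't
// matter for this example.
function simpleAnalyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Inverted index: each analyzed term maps to the set of document ids containing it.
function buildIndex(docs) {
  const index = new Map();
  for (const [id, doc] of Object.entries(docs)) {
    for (const term of simpleAnalyze(doc.ingredients)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// A term query is NOT analyzed: the value is looked up exactly as given.
function termQuery(index, value) {
  return [...(index.get(value) ?? [])];
}

// Excerpts of the ingredients fields from the two recipes above
const docs = {
  beer_1: { ingredients: "freshly ground Zest from two table oranges and two lemons, 1.0 oz 3.1% AA Saaz" },
  beer_2: { ingredients: "0.75 oz bitter (Curacao) orange peel quarters (dried), 1 oz sweet orange peel (dried)" },
};
const index = buildIndex(docs);

console.log(termQuery(index, "lemon"));  // prints []
console.log(termQuery(index, "lemons")); // prints [ 'beer_1' ]
console.log(termQuery(index, "Lemons")); // prints []
console.log(termQuery(index, "orange")); // prints [ 'beer_2' ]
```

The same mismatch runs the other way: orange finds only the second recipe, because the first one only contains oranges. Capitalized query values miss as well, since the indexed terms were lowercased but the term query's value was not.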
