Mongoid batch inserts

In SQL land, most databases support batch inserts. Batch inserts are an efficient mechanism for inserting a lot of similar data: instead of issuing x insert statements, you execute 1 insert with x records. This is more efficient because the insert statement doesn’t need to be re-parsed x times, there is only 1 network trip as opposed to x, and in the case of transactions, there is only 1 transaction instead of x. For any sizable x, a batch insert is almost always faster than x individual inserts.
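To make the "1 insert with x records" idea concrete, here is a minimal sketch in plain Ruby that only builds the SQL strings. The table and column names are made up for illustration, and it assumes a database that accepts a multi-row VALUES list.

```ruby
# Hypothetical rows for a hypothetical tags table (illustration only).
rows = [['a', true, 1], ['bunch', true, 1], ['of', true, 1]]

# One statement per row: x statements, so x parses and x round trips.
single_statements = rows.map do
  'INSERT INTO tags (name, system_tag, account_id) VALUES (?, ?, ?)'
end

# One statement for all rows: a single multi-row VALUES list.
placeholders = rows.map { '(?, ?, ?)' }.join(', ')
batch_statement =
  "INSERT INTO tags (name, system_tag, account_id) VALUES #{placeholders}"

# Bind values for the batch statement, flattened in row order.
bind_values = rows.flatten
```

Nothing here talks to a database; the point is just that the batched form is one statement carrying all the rows.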

As it turns out, MongoDB supports batch inserts too! And just like in SQL land, inserting a lot of documents with one batched insert is much faster than issuing x separate inserts.

For example, the Mongo Ruby driver’s insert method accepts an array of documents; thus, you can insert an array of hashes quite efficiently. Even if you are using an ODM like Mongoid, you can still perform batch inserts: all you need to do is get a reference to the model’s underlying collection and then issue an insert with an array of hashes matching the collection’s intended document structure.
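The difference in round trips can be sketched without a running MongoDB. The FakeCollection class below is a hypothetical in-memory stand-in, not the real driver; it only mimics an insert that accepts either one hash or an array of hashes, and counts how many times it was called. The account_id value is arbitrary.

```ruby
# Hypothetical in-memory stand-in for a driver collection (illustration only).
class FakeCollection
  attr_reader :documents, :calls

  def initialize
    @documents = []
    @calls = 0
  end

  # Accepts a single hash or an array of hashes; each call counts as one trip.
  def insert(doc_or_docs)
    @calls += 1
    docs = doc_or_docs.is_a?(Array) ? doc_or_docs : [doc_or_docs]
    @documents.concat(docs)
  end
end

docs = %w[a bunch of tags].map do |name|
  { name: name, system_tag: true, account_id: 42 }
end

# x inserts: one call per document.
one_by_one = FakeCollection.new
docs.each { |doc| one_by_one.insert(doc) }

# 1 insert: the whole array in a single call.
batched = FakeCollection.new
batched.insert(docs)
```

Both stand-ins end up holding the same documents; only the number of calls differs.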

For instance, to insert a collection of Tag models (each having 3 fields: name, system_tag, and account_id) in one fell swoop I can do the following:

Batch inserts with Mongoid model example
<code class='ruby'>tags = ['a', 'bunch', 'of', 'tags'].collect { |tag| {name: tag, system_tag: true, account_id: id} }
Tag.collection.insert tags</code>

In the code above, insert takes an array of hashes; what’s more, the insert is tied to the tags collection via the Tag.collection call, which returns the driver-level collection backing the Tag model.
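If the array gets very large, one option is to send it in slices rather than all at once. The helper below is a hypothetical sketch, not part of Mongoid or the driver: it assumes the collection responds to insert_many, which is the batch call in 2.x versions of the Mongo Ruby driver, and the default batch size of 1000 is an arbitrary choice.

```ruby
# Hypothetical helper: insert documents in fixed-size slices.
# `collection` is anything that responds to insert_many, e.g. Tag.collection
# when Mongoid sits on a 2.x Mongo Ruby driver (an assumption, not from the post).
def insert_in_batches(collection, documents, batch_size: 1000)
  batches = 0
  documents.each_slice(batch_size) do |slice|
    collection.insert_many(slice) # one call per slice
    batches += 1
  end
  batches # number of insert_many calls made
end
```

Under those assumptions, usage would look like insert_in_batches(Tag.collection, tags). With the older driver used in this post, the call inside the loop would be insert instead. Because this goes straight to the driver, Mongoid validations and callbacks are not run for these documents.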

Batch inserts are almost always faster if you have a lot of similar documents; in our case, we saw a tremendous performance increase when employing batching.