Elasticsearch: Not just another NoSQL store

Elastic has already published a very detailed article on whether Elasticsearch can be used as a NoSQL store. Although there is no formal definition of a NoSQL data store, most share a common characteristic: flexible schema management. Elasticsearch is also schema-flexible, but its schema management behaves differently from that of typical NoSQL stores. This article describes the automatic process that manages schema in Elasticsearch and explains why you should not treat Elasticsearch as just another NoSQL store.


Elasticsearch and Schemas


Elasticsearch has schema capabilities called mappings, which determine the structure of a document and each field's configuration, including its type. Nevertheless, Elasticsearch does not require a mapping to be defined. Instead, it uses a process called dynamic mapping, which allows for some interesting approaches to schema management.


Dynamic Mapping

If no explicit field mappings are provided, mappings are generated on the fly based on the data's structure and concrete values.
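
To make the idea concrete, here is a minimal Python sketch that approximates how dynamic mapping picks a type from a JSON value. It is an illustration only, not Elasticsearch's implementation: the function name `infer_dynamic_type` is hypothetical, the date check covers just a couple of common formats, and the keyword sub-field that Elasticsearch adds to dynamically mapped strings is left out.

```python
import re

# Rough stand-in for default date detection: ISO-style dates (optionally
# with a time part) and yyyy/MM/dd. The real parser accepts more variants.
_DATE_LIKE = re.compile(
    r"(\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?"
    r"|\d{4}/\d{2}/\d{2})"
)


def infer_dynamic_type(value):
    """Return the type dynamic mapping would roughly pick, or None if no field is added."""
    if value is None:
        return None  # null values do not create a field
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return infer_dynamic_type(item)
        return None
    if isinstance(value, str):
        return "date" if _DATE_LIKE.fullmatch(value) else "text"
    raise TypeError(f"unsupported JSON value: {value!r}")


if __name__ == "__main__":
    doc = {"views": 42, "score": 1.5, "published": "2015-01-01", "title": "Hello"}
    for field, value in doc.items():
        print(field, "->", infer_dynamic_type(value))
```

Note that under this sketch an empty string is treated like any other non-date string and comes back as text, which matters when the first document to contain a field holds a placeholder value.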


Disabled Dynamic Mapping


If dynamic mapping is disabled (the dynamic parameter is set to false), fields that are not explicitly defined are ignored at index time: they will not be searchable, but they will still be included as part of the stored document.


Strict Dynamic Mapping


If dynamic mapping is set to strict, any unknown field will cause the document to be rejected during indexing with an error.

Elasticsearch is therefore neither schema-strict nor schema-less but schema-flexible. Since the dynamic attribute can be set on individual object fields as well as on the mapping as a whole, different strategies can be used for different parts of a document. Combining the approaches supports use cases where part of the data model is well understood while other parts are loosely defined and vary from document to document.
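
As a rough sketch of how the three settings can be combined, the Python snippet below builds a mapping body that is strict at the top level, dynamic for one object field, and non-dynamic for another. The field names (`title`, `attributes`, `raw_payload`) and the helper `effective_dynamic` are hypothetical and only illustrate that inner objects inherit the setting of their parent unless they override it.

```python
import json

# Hypothetical mapping body: strict by default, relaxed for selected objects.
mapping = {
    "dynamic": "strict",
    "properties": {
        "title": {"type": "text"},
        # New sub-fields under "attributes" are mapped automatically.
        "attributes": {"type": "object", "dynamic": True},
        # New sub-fields under "raw_payload" are kept in the document but not indexed.
        "raw_payload": {"type": "object", "dynamic": False},
    },
}


def effective_dynamic(mapping, path):
    """Resolve the dynamic setting that applies to new fields under a dotted object path."""
    setting = mapping.get("dynamic", True)  # Elasticsearch defaults to true
    node = mapping
    for part in (path.split(".") if path else []):
        node = node.get("properties", {}).get(part)
        if node is None:
            break  # unmapped path: the last known setting applies
        setting = node.get("dynamic", setting)
    return setting


if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    for path in ("", "attributes", "raw_payload"):
        print(repr(path), "->", effective_dynamic(mapping, path))
```

The printed JSON is the kind of body that would be sent as the `mappings` section when creating an index; the helper is only there to show which rule governs each part of the document.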


Dynamic Mapping All The Way?


Not requiring a predefined schema removes the burden of schema management as the model evolves. However, always relying on dynamic mapping can sometimes negatively impact:

  • Indexing performance

  • Index size

  • Query behavior and performance

Incorrect Field Types


Dynamic mapping determines the type of a new field based on its contents when it first appears. But what happens if the first occurrence of a field contains malformed data? For example, if the first occurrence of a date field is blank (an empty string), the field will be mapped as text instead of date. Because the type of an existing field cannot simply be changed afterwards, the incorrect mapping will prevent date operations on the field from working as expected.
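
One way to avoid this, sketched below in Python, is to declare the field explicitly before any data arrives. The field name `created_at` is hypothetical. The `ignore_malformed` setting is an optional addition not discussed above: it is assumed here so that a document with an unparsable date is still indexed without that field rather than rejected.

```python
import json

# Hypothetical explicit mapping: the date type is fixed up front, so a blank
# first value can no longer turn the field into text.
mapping = {
    "properties": {
        "created_at": {
            "type": "date",
            # Assumption: skip unparsable values instead of rejecting the document.
            "ignore_malformed": True,
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```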


Analyzing Unnecessary Fields


If a new field is detected as a string during dynamic mapping, it will be mapped as text (by default with an additional keyword sub-field). Text fields are analyzed by default: the text is broken down into tokens that are stored in the index, separately from the actual document. Analyzing fields that do not need to be searched slows down indexing and increases index size for no benefit.
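
A minimal sketch of the alternative, with hypothetical field names: only the field that needs full-text search is mapped as text, an identifier is mapped as keyword (indexed as a single unanalyzed term), and a value that is only ever read back from the document is neither indexed nor given doc values.

```python
import json

# Hypothetical explicit mapping that analyzes only what must be searched.
mapping = {
    "properties": {
        "description": {"type": "text"},      # analyzed for full-text search
        "session_id": {"type": "keyword"},    # exact-match only, not analyzed
        # Kept in the stored document only: not searchable, not aggregatable.
        "trace_blob": {"type": "keyword", "index": False, "doc_values": False},
    }
}

# Fields that will go through analysis at index time.
analyzed_fields = [
    name for name, config in mapping["properties"].items()
    if config["type"] == "text"
]

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print("analyzed:", analyzed_fields)
```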


Beyond Dynamic Mapping


Elasticsearch provides a few tools to guide the dynamic mapping process:

  1. Dynamic Templates

  2. Elastic Common Schema

Dynamic Templates


Dynamic templates are global rules that control the dynamic mapping behavior for a field based on different conditions:

  1. The datatype detected by Elasticsearch.

  2. The name of the field.

  3. The path of the field.

Dynamic templating is a great tool for managing schema when the data follows certain conventions.
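
The three conditions correspond to the `match_mapping_type`, `match`, and `path_match` options of a dynamic template. The Python sketch below builds a `dynamic_templates` section with one rule per condition and includes a small helper that imitates first-match-wins evaluation. The template names, field names, and the helper `first_matching_template` are hypothetical, and `fnmatchcase` is only an approximation of Elasticsearch's wildcard matching.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical templates; order matters because the first match wins.
dynamic_templates = [
    {"dates_by_suffix": {
        "match_mapping_type": "string",   # condition 1: detected datatype
        "match": "*_at",                  # condition 2: field name
        "mapping": {"type": "date"},
    }},
    {"labels_as_keyword": {
        "path_match": "labels.*",         # condition 3: full dotted path
        "mapping": {"type": "keyword"},
    }},
    {"strings_as_keyword": {
        "match_mapping_type": "string",
        "mapping": {"type": "keyword"},
    }},
]


def first_matching_template(field_path, detected_type):
    """Return the name of the first template whose conditions all hold, else None."""
    field_name = field_path.rsplit(".", 1)[-1]
    for template in dynamic_templates:
        name, rule = next(iter(template.items()))
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type:
            continue
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue
        if "path_match" in rule and not fnmatchcase(field_path, rule["path_match"]):
            continue
        return name
    return None


if __name__ == "__main__":
    print(json.dumps({"dynamic_templates": dynamic_templates}, indent=2))
    for path, kind in [("created_at", "string"), ("labels.env", "string"), ("count", "long")]:
        print(path, "->", first_matching_template(path, kind))
```

Placing the most specific rules first keeps the catch-all string rule from shadowing them.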


Elastic Common Schema (ECS)


Elastic Common Schema is not an automated mechanism for managing schema but a specification for dealing with data originating from diverse sources. The standardization of certain fields combined with dynamic templating allows for a more optimized schema and helps avoid some of the drawbacks of relying only on dynamic mapping.


Understand Your Data Model


Elasticsearch schema mechanisms like dynamic mapping and dynamic templates allow it to be schema-flexible. However, you should not rely solely on dynamic mappings; instead, try to understand your model and define mappings where possible. Eventually, if you are serious about improving recall and precision, you will end up creating custom analyzers for fields, and thus defining your mappings to some extent. Future posts will provide ways to optimize your mappings and describe the approaches you can take when a mapping needs to change over time.

You can email me at andreas@inventaconsulting.net or schedule a call to find out how Inventa Consulting can help you with Elasticsearch.
