In a previous article, I described how dynamic mapping works and some of its possible drawbacks:
Slower indexing performance
Higher space usage
This article provides a guide for optimizing Elasticsearch mappings by:
Disabling dynamic mapping and
Optimizing each field mapping explicitly
Optimizing mappings is definitely worth exploring for use cases with a large number of fields or high indexing traffic.
Disabling Dynamic Mapping
There are two ways to disable dynamic mapping, depending on how much control over and knowledge of the model sent to Elasticsearch for indexing you have.
Strict Dynamic Mapping
Strict dynamic mapping is preferred when you have control over the model mapped in Elasticsearch. It is applied by setting the dynamic property to strict at the top of your mapping. Strict dynamic mapping instructs Elasticsearch to reject all documents that introduce properties that are not explicitly mapped. The benefits of strict dynamic mapping are:
Eliminating unwanted fields
Ensuring new fields are mapped optimally
Eliminating Unwanted Fields
Strict dynamic mapping will catch any fields that are sent to Elasticsearch by accident. In most cases, the data sent to Elasticsearch originates from another data source, but not every field from that data source needs to be in the index.
Ensuring New Fields Are Mapped Optimally
Strict dynamic mapping will also catch any fields that are new to Elasticsearch since all fields have to be explicitly mapped before being used. The field configuration can be optimized as the new field is added to the field mappings.
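As a minimal sketch of strict dynamic mapping, the index body below is built as a Python dictionary and printed as JSON. The field names (title, price) are hypothetical, and the layout assumes the typeless mapping format of Elasticsearch 7 and later; older versions nest the mapping under a type name.

```python
import json

# Hypothetical index body: the field names "title" and "price" are
# illustrative assumptions, not taken from a real model.
strict_mapping = {
    "mappings": {
        # Reject any document that contains a field not listed below.
        "dynamic": "strict",
        "properties": {
            "title": {"type": "text"},
            "price": {"type": "float"},
        },
    }
}

# The printed JSON is what would be sent as the body of a create-index request.
print(json.dumps(strict_mapping, indent=2))
```

With a mapping like this in place, indexing a document that carries an extra field should fail with a strict dynamic mapping error instead of silently extending the mapping.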
Disabled Dynamic Mapping
The model of an Elasticsearch document cannot always be known in advance, for example when ingesting data from external sources that are out of your control. In such cases, strict dynamic mapping is not an option; instead, dynamic mapping must be disabled by setting the dynamic property to false. Disabled dynamic mapping instructs Elasticsearch to ignore any properties that are not explicitly mapped, which means they will not be searchable. Disabled dynamic mapping is not as effective as strict dynamic mapping since it does not detect unwanted or new fields. It also does not prevent unwanted fields from being stored as part of the document source.
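A comparable sketch for disabled dynamic mapping follows. Again, the field name (message) is a hypothetical example and the typeless mapping format of Elasticsearch 7 and later is assumed.

```python
import json

# Hypothetical index body for data whose full model is not known up front.
# Only "message" is mapped; the field name is an illustrative assumption.
lenient_mapping = {
    "mappings": {
        # Python False serializes to JSON false: unmapped fields are accepted
        # and kept in the document source, but they are not indexed.
        "dynamic": False,
        "properties": {
            "message": {"type": "text"},
        },
    }
}

print(json.dumps(lenient_mapping, indent=2))
```

Unlike the strict variant, a document with extra fields is accepted here; the extra fields are simply not searchable.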
Optimizing Field Mappings
Elasticsearch generally provides sensible defaults. For field mappings, it enables the following capabilities without any additional configuration:
Fields are searchable
Sorting, aggregations and scripting are allowed on fields
Field metadata for text scoring is enabled
However, when dealing with a large number of fields over a large number of documents, these capabilities can negatively affect your indexing speed and storage requirements.
Fields that need to be returned by queries but are not involved in the queries themselves can be marked as non-searchable. By setting the index property to false, Elasticsearch does not include the field in the inverted index and the field is only stored in the document source so it can be retrieved as part of search results.
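As an illustration, a field that is only ever returned with search results could be mapped as sketched below. The field name (thumbnail_url) and its keyword type are assumptions made for the example.

```python
import json

# Hypothetical mapping fragment: "thumbnail_url" is returned with results
# but never queried, so it is left out of the inverted index.
non_searchable_field = {
    "properties": {
        "thumbnail_url": {
            "type": "keyword",
            "index": False,  # do not add this field to the inverted index
        }
    }
}

print(json.dumps(non_searchable_field, indent=2))
```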
Sorting, aggregations and scripts do not use the inverted index. They rely on another on-disk structure called doc_values, which is enabled by default for all non-analyzed fields. If you know that a certain field will not be used for sorting, aggregations or scripting, then doc_values can be disabled for it.
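A sketch of a field that stays searchable but gives up sorting, aggregations and scripting might look like this. The field name (session_id) is a hypothetical example.

```python
import json

# Hypothetical mapping fragment: "session_id" is used in filters only,
# never for sorting, aggregations or scripts.
filter_only_field = {
    "properties": {
        "session_id": {
            "type": "keyword",
            "doc_values": False,  # skip building doc_values for this field
        }
    }
}

print(json.dumps(filter_only_field, indent=2))
```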
Field norms represent metadata used for scoring the match of a query against a text field. For fields not participating in scoring queries the norms can be disabled. This is common when a field is only used for filtering.
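For a text field that is only used for filtering, norms could be turned off as in the sketch below. The field name (tags) is a hypothetical example.

```python
import json

# Hypothetical mapping fragment: "tags" is matched in filters, so the
# scoring metadata kept for text fields is not needed.
unscored_field = {
    "properties": {
        "tags": {
            "type": "text",
            "norms": False,  # do not store norms for this field
        }
    }
}

print(json.dumps(unscored_field, indent=2))
```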