Define Atlas Search Indexes¶
Atlas Search can index data in different ways. When you define an Atlas Search index, you can specify a particular analyzer or multiple analyzers to index certain fields. You can also index certain fields and omit others, or you can dynamically index all the fields in a collection. You can define Atlas Search indexes through the Atlas User Interface and Atlas Search API.
Atlas Search indexes are eventually consistent.
Limitation¶
Atlas Search cannot index numeric, date, or boolean values if they are part of an array.
Syntax¶
{
  "name": "<index-name>",
  "analyzer": "<analyzer-for-index>",
  "searchAnalyzer": "<analyzer-for-query>",
  "mappings": {
    "dynamic": <boolean>,
    "fields": { <field-definition> }
  },
  "analyzers": [ <custom-analyzer> ],
  "synonyms": [
    {
      "name": "<synonym-mapping-name>",
      "source": {
        "collection": "<source-collection-name>"
      },
      "analyzer": "<synonym-mapping-analyzer>"
    }
  ]
}
Options¶
Field | Type | Necessity | Description |
---|---|---|---|
analyzer | string | Optional | Specifies the analyzer to apply to string fields when indexing. If you set this only at the top level and don't specify an analyzer for the fields in the index definition, Atlas Search applies this analyzer to all the fields. To use a different analyzer for a field, specify that analyzer in the field's definition. If omitted, defaults to the Standard Analyzer. |
analyzers | array of Custom Analyzers | Optional | Specifies the Custom Analyzers to use in this index. |
mappings | document | Required | Specifies how to index fields at different paths for this index. |
mappings.dynamic | boolean | Optional | Enables or disables dynamic mapping of fields for this index. If set to true, Atlas Search indexes all fields of supported types. If set to false, you must specify the fields to index using mappings.fields. If omitted, defaults to false. Important: Atlas indexes all fields of supported types in a dynamically mapped document, including fields in its embedded documents. See the index configuration examples on this page. |
mappings.fields | document | Conditional | Required only if dynamic mapping is disabled. Specifies the fields that you would like to index. See the examples on this page. |
name | string | Optional | Specifies a name for the index. Index names must be unique within each namespace. If omitted, defaults to default. |
searchAnalyzer | string | Optional | Specifies the analyzer to apply to query text before searching with it. If omitted, defaults to the Standard Analyzer. |
synonyms | array of Synonym Mapping Definition | Optional | Synonym mappings to use in your index. To learn more, see Define Synonym Mappings in Your Atlas Search Index. |
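The following minimal Python sketch makes the defaults in the table concrete. It fills in omitted top-level options and enforces the rule that mappings.fields is required when dynamic mapping is disabled. The helper name with_defaults is hypothetical and is not part of any MongoDB driver. Atlas applies these defaults itself when it builds the index.

```python
from copy import deepcopy

# Defaults for omitted top-level options, as listed in the table above.
TOP_LEVEL_DEFAULTS = {
    "name": "default",
    "analyzer": "lucene.standard",
    "searchAnalyzer": "lucene.standard",
}


def with_defaults(definition: dict) -> dict:
    """Return a copy of an index definition with omitted options filled in.

    Hypothetical helper for illustration; Atlas resolves these defaults
    server-side and this is not a MongoDB driver API.
    """
    if "mappings" not in definition:
        raise ValueError("mappings is required")
    resolved = deepcopy(definition)  # never mutate the caller's dict
    for key, value in TOP_LEVEL_DEFAULTS.items():
        resolved.setdefault(key, value)
    mappings = resolved["mappings"]
    mappings.setdefault("dynamic", False)
    # mappings.fields is required only when dynamic mapping is disabled.
    if not mappings["dynamic"] and "fields" not in mappings:
        raise ValueError("mappings.fields is required when mappings.dynamic is false")
    return resolved


print(with_defaults({"mappings": {"dynamic": True}}))
```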
Static and Dynamic Mappings¶
For static mappings, set mappings.dynamic to false and specify the fields to index using mappings.fields. Atlas Search indexes only the specified fields, with the options you set for each one.
Use static mappings to configure index options for fields that should not be indexed dynamically, or to configure a single field independently from others in an index.
You must specify static mappings when mappings.dynamic is false.
For dynamic mappings, set mappings.dynamic to true. Atlas Search automatically indexes the fields of supported types in each document.
Use dynamic mappings if your schema changes regularly or is unknown, or when experimenting with Atlas Search. You can configure an entire index to use dynamic mappings, or specify individual fields, such as fields of type document, to be dynamically mapped.
Dynamically mapped indexes occupy more disk space than statically mapped indexes and may be less performant.
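The difference between the two mapping styles can be sketched as a function that reports which dotted paths of a sample document a mapping covers. This is a simplified model. It treats every value type as indexable and ignores the BSON type and array rules described below. The name indexed_paths is hypothetical.

```python
def indexed_paths(doc: dict, mapping: dict, prefix: str = "") -> set:
    """Return the dotted paths in `doc` that `mapping` would index.

    `mapping` is either the index's "mappings" object or a document-type
    field definition; both use the same "dynamic"/"fields" keys.
    Simplification: every value type is treated as indexable.
    """
    paths = set()
    dynamic = mapping.get("dynamic", False)  # omitted means static
    fields = mapping.get("fields", {})
    for key, value in doc.items():
        path = prefix + key
        spec = fields.get(key)
        if isinstance(spec, dict) and spec.get("type") == "document" and isinstance(value, dict):
            # Explicit document-type field: apply its own dynamic/fields settings.
            paths |= indexed_paths(value, spec, path + ".")
        elif spec is not None:
            # Explicitly mapped (static) field.
            paths.add(path)
        elif dynamic and isinstance(value, dict):
            # Dynamic mapping recurses into embedded documents.
            paths |= indexed_paths(value, {"dynamic": True}, path + ".")
        elif dynamic:
            paths.add(path)
    return paths


doc = {"title": "Heat", "year": 1995, "address": {"city": "LA", "state": "CA"}}
static = {"dynamic": False, "fields": {"title": {"type": "string"}}}
print(sorted(indexed_paths(doc, static)))             # ['title']
print(sorted(indexed_paths(doc, {"dynamic": True})))  # ['address.city', 'address.state', 'title', 'year']
```

The static mapping covers only the listed field. The dynamic mapping covers every field, including the fields nested in the embedded document.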
BSON Data Types¶
The table below enumerates all the BSON data types and indicates whether they are included in an Atlas Search index with dynamic mappings.
BSON Type | Included in Dynamic Index? | Atlas Search Field Type |
---|---|---|
Double | yes | number |
32-bit integer | yes | number |
64-bit integer | yes | number |
String | yes * | string, autocomplete, stringFacet |
Date | yes | date |
Object | yes | document, geo (GeoJSON objects) |
ObjectId | no | objectId |
Boolean | no | boolean |
Timestamp | no | |
Array | yes | type of the array elements (see array) |
Binary Data | no | |
Null | no | |
Regular Expression | no | |
JavaScript | no | |
Decimal128 | no | |
Min key | no | |
Max key | no | |
* You can't use dynamic mapping to automatically index string fields for faceting. You must index the fields using stringFacet to run a facet query on string fields.
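The "Included in Dynamic Index?" column can be approximated for Python values. The sketch assumes PyMongo's default type mapping between Python and BSON. The function name included_in_dynamic_index is hypothetical. It does not model the array limitations described on this page, and any type it does not recognize counts as "no".

```python
import datetime


def included_in_dynamic_index(value) -> bool:
    """Approximate the "Included in Dynamic Index?" column for Python values.

    Assumes PyMongo's default type mapping: int -> 32/64-bit integer,
    float -> Double, str -> String, datetime -> Date, dict -> Object,
    list -> Array, bool -> Boolean, None -> Null, bytes -> Binary Data.
    Other types (ObjectId, Decimal128, ...) fall through to False,
    matching their "no" rows.
    """
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool) or value is None:
        return False
    return isinstance(value, (int, float, str, datetime.datetime, dict, list))


sample = {"title": "Heat", "runtime": 170, "rated": None, "verified": True}
print({k: included_in_dynamic_index(v) for k, v in sample.items()})
# {'title': True, 'runtime': True, 'rated': False, 'verified': False}
```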
array¶
For indexing arrays, Atlas Search only requires the data type of the array elements. You don't have to specify that the data is contained in an array in the index definition.
Atlas Search doesn't index documents inside an array.
The following index definition for the sample_mflix.movies collection in the sample dataset indexes the genres field, which contains an array of string values.
{ "mappings": { "dynamic": false, "fields": { "genres": { "type": "string" } } } }
Atlas Search Field Types¶
autocomplete¶
You can use the autocomplete data type to index text values for autocompletion. You can configure an autocomplete field to satisfy a variety of use cases. To learn more about the configuration options available in the autocomplete data type, such as tokenization strategy and diacritic folding, see autocomplete. The autocomplete operator can query only fields indexed with the autocomplete type.
You can't use the autocomplete type to index fields whose value is an array of strings.
The autocomplete type takes the following options:
Option | Type | Necessity | Purpose | Default |
---|---|---|---|---|
type | string | required | The type of field. Value must be autocomplete. | |
analyzer | string | optional | Name of the analyzer to use with this autocomplete mapping. You can use any Atlas Search analyzer except the … | lucene.standard |
maxGrams | int | optional | The maximum number of characters per indexed sequence. The value limits the character length of indexed tokens. When you search for terms longer than the maxGrams value, Atlas Search truncates the tokens to the maxGrams length. | 15 |
minGrams | int | optional | The minimum number of characters per indexed sequence. We recommend 4 for the minimum value. A value that is less than 4 could impact performance because the size of the index can become very large. We recommend the default value of 2 for edgeGram only. | 2 |
tokenization | enum | optional | The tokenization strategy to use when indexing the field for autocompletion. Value can be one of the following: edgeGram (create grams starting at the left edge of each word), rightEdgeGram (create grams starting at the right edge of each word), or nGram (create grams by sliding a window across each word). Note: Indexing a field for autocomplete with an … | edgeGram |
foldDiacritics | boolean | optional | Indicates whether diacritics should be included in or removed from the indexed text. Value can be true (remove diacritics) or false (keep diacritics). | true |
{ "mappings": { "dynamic": true|false, "fields": { "<field-name>": [ { "type": "autocomplete", "analyzer": "lucene.standard", "tokenization": "edgeGram|rightEdgeGram|nGram", "minGrams": <2>, "maxGrams": <15>, "foldDiacritics": true|false } ] } } }
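A small Python model can illustrate the three tokenization strategies and the effect of minGrams, maxGrams, and foldDiacritics. It is a sketch and not Atlas's implementation. The name autocomplete_grams is hypothetical. Words are split on whitespace and lowercased as a stand-in for the configured analyzer, and words shorter than minGrams produce no grams in this model.

```python
import unicodedata

STRATEGIES = ("edgeGram", "rightEdgeGram", "nGram")


def fold_diacritics(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def autocomplete_grams(text: str, tokenization: str = "edgeGram",
                       min_grams: int = 2, max_grams: int = 15,
                       fold: bool = True) -> list:
    """Return the grams a field value would produce under each strategy.

    Assumption: whitespace splitting plus lowercasing stands in for the
    analyzer. Defaults mirror the option table (edgeGram, 2, 15, true).
    """
    if tokenization not in STRATEGIES:
        raise ValueError(f"unknown tokenization: {tokenization}")
    if fold:
        text = fold_diacritics(text)
    grams = []
    for word in text.lower().split():
        # maxGrams caps gram length; words shorter than minGrams yield nothing.
        for n in range(min_grams, min(max_grams, len(word)) + 1):
            if tokenization == "edgeGram":
                grams.append(word[:n])  # anchored at the left edge
            elif tokenization == "rightEdgeGram":
                grams.append(word[-n:])  # anchored at the right edge
            else:
                # nGram: every window of length n slid across the word
                grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


for strategy in STRATEGIES:
    print(strategy, autocomplete_grams("Quick fox", strategy, min_grams=2, max_grams=3))
# edgeGram ['qu', 'qui', 'fo', 'fox']
# rightEdgeGram ['ck', 'ick', 'ox', 'fox']
# nGram ['qu', 'ui', 'ic', 'ck', 'qui', 'uic', 'ick', 'fo', 'ox', 'fox']
```

nGram produces many more grams per word than the edge strategies. This helps explain why autocomplete settings that generate more grams increase index size.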
boolean¶
The boolean data type is used for indexing true and false values. It works in conjunction with the equals operator.
Fields of type boolean can't be dynamically indexed. You must index fields of type boolean using static mappings.
The following example index definition maps a field named verified_user to the boolean data type and a field named teammates to the objectId data type.
{ "mappings": { "dynamic": false, "fields": { "verified_user": { "type": "boolean" }, "teammates": { "type": "objectId" } } } }
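Both fields are mapped statically, so they can be queried with the equals operator. The following Python sketch builds the $search stage as a plain dict. The helper name equals_stage and the index name "default" are assumptions. With PyMongo, you would pass the resulting pipeline list to collection.aggregate(). An objectId field would take a bson.ObjectId value rather than a bool.

```python
def equals_stage(path: str, value, index: str = "default") -> dict:
    """Build a $search stage that uses the equals operator.

    Hypothetical helper: the stage shape (index, equals.path, equals.value)
    follows the Atlas Search equals operator. Use a bool for a field mapped
    as boolean, or a bson.ObjectId for a field mapped as objectId.
    """
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty field name")
    return {"$search": {"index": index, "equals": {"path": path, "value": value}}}


pipeline = [equals_stage("verified_user", True), {"$limit": 5}]
# With PyMongo: results = list(collection.aggregate(pipeline))
print(pipeline)
```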
date¶
The date type is used for indexing date values. It takes the type option. The value of type must be date. A date can't be indexed if it is part of an array.
document¶
The document data type is used for fields with embedded documents. It takes the following parameters:
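Option | Type | Necessity | Purpose | Default |
---|---|---|---|---|
type | string | Required | The type of field. Value must be document. | |
dynamic | boolean | Conditional | If set to true, Atlas Search dynamically indexes all fields of supported types in the embedded document. If omitted or set to false, you must specify the fields to index using fields. Important: Atlas indexes all fields of supported types in a dynamically mapped document, including fields in its embedded documents. | false |
fields | document | Conditional | Maps document field names to field definitions. To learn more, see the examples on this page. Required if dynamic is omitted or set to false. | |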
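The rule that fields is required when dynamic is omitted or false applies at every nesting level. The following minimal recursive check assumes that definitions are plain Python dicts shaped like the JSON on this page. The name check_field_definitions is hypothetical.

```python
def check_field_definitions(fields: dict, prefix: str = "") -> list:
    """Report document-type fields that lack "fields" while static.

    Hypothetical helper; it checks only the rule in the table above and
    ignores all other options.
    """
    problems = []
    for name, spec in fields.items():
        # A field can map to one definition or to a list of definitions.
        for definition in spec if isinstance(spec, list) else [spec]:
            if definition.get("type") != "document":
                continue
            path = prefix + name
            if not definition.get("dynamic", False) and "fields" not in definition:
                problems.append(f"{path}: 'fields' is required when dynamic is false")
            # Recurse into nested field definitions, if any.
            problems.extend(check_field_definitions(definition.get("fields", {}), path + "."))
    return problems


print(check_field_definitions({"address": {"type": "document"}}))
```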
geo¶
The geo type is used for indexing geographic point and shape coordinates. For this type, the indexed field must be a GeoJSON object.
Option | Type | Necessity | Purpose | Default |
---|---|---|---|---|
type | string | Required | The type of field. Value must be geo. | |
indexShapes | boolean | Optional | Specifies whether to index shapes. By default, Atlas Search indexes points and doesn't index shapes. Value can be true (index both points and shapes) or false (index only points). | false |
{ "mappings": { "dynamic": false, "fields": { "<field-name>": { "type": "geo", "indexShapes": true|false } } } }
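The geo type requires a GeoJSON object, so it can help to check documents before indexing. The following sketch validates only GeoJSON Points. It follows the GeoJSON format (RFC 7946), with coordinates in [longitude, latitude] order. Shapes such as Polygon also require indexShapes set to true, and this sketch doesn't check them. is_geojson_point is a hypothetical helper.

```python
from numbers import Real


def is_geojson_point(value) -> bool:
    """Return True if `value` is a well-formed GeoJSON Point.

    Coordinates are [longitude, latitude] with an optional altitude;
    longitude must be in [-180, 180] and latitude in [-90, 90].
    """
    if not isinstance(value, dict) or value.get("type") != "Point":
        return False
    coords = value.get("coordinates")
    if not isinstance(coords, (list, tuple)) or len(coords) not in (2, 3):
        return False
    # bool is a Real subclass in Python, so exclude it explicitly.
    if not all(isinstance(c, Real) and not isinstance(c, bool) for c in coords):
        return False
    lon, lat = coords[0], coords[1]
    return -180 <= lon <= 180 and -90 <= lat <= 90


print(is_geojson_point({"type": "Point", "coordinates": [-73.97, 40.77]}))  # True
```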
number¶
The number type is used for fields with numeric values of int32, int64, and double data types. The number type has the following options:
Option | Type | Necessity | Purpose | Default |
---|---|---|---|---|
type | string | Required | The type of field. Value must be number. | |
representation | string | Optional | The data type of the field to index. Value can be int64 or double. To learn more, see the example below. | double |
indexIntegers | boolean | Optional | Indicates whether to index or omit indexing int32 and int64 type values. Value can be true or false. To learn more, see the example below. | true |
indexDoubles | boolean | Optional | Indicates whether to index or omit indexing double type values. Value can be true or false. To learn more, see the example below. | true |
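The effect of indexIntegers and indexDoubles on individual values can be modeled in a few lines of Python. The sketch assumes PyMongo's mapping of Python int to BSON int32/int64 and of float to double. number_is_indexed is a hypothetical name.

```python
def number_is_indexed(value, index_integers: bool = True, index_doubles: bool = True) -> bool:
    """Model the indexIntegers and indexDoubles options for a single value.

    Assumption: Python int stands for BSON int32/int64 and float for
    double, as in PyMongo. bool is rejected because BSON booleans are not
    numbers, even though bool subclasses int in Python.
    """
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return index_integers
    if isinstance(value, float):
        return index_doubles
    return False  # not a number type


bathrooms = [1, 1.5, 2, 2.5]
# Like the indexIntegers example below: only double values are kept.
print([b for b in bathrooms if number_is_indexed(b, index_integers=False)])  # [1.5, 2.5]
```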
representation Example¶
The following index definition for the sample_analytics.accounts collection in the sample dataset indexes the account_id field with 64-bit integer (int64) representation. The index also:
- Indexes all other integer values in the account_id field.
- Rounds any decimal values and indexes small double values in the account_id field.
{ "mappings": { "dynamic": false, "fields": { "account_id": { "type": "number", "representation": "int64" } } } }
indexIntegers Example¶
The following index definition for the sample_airbnb.listingsAndReviews collection in the sample dataset omits 32-bit and 64-bit integer values in the bathrooms field from the index. The index still includes double values in the bathrooms field.
{ "mappings": { "dynamic": false, "fields": { "bathrooms": { "type": "number", "indexIntegers": false } } } }
indexDoubles Example¶
The following index definition for the sample_analytics.accounts collection in the sample dataset:
- Indexes integer values in the account_id field.
- Omits double values in the account_id field.
{ "mappings": { "dynamic": false, "fields": { "account_id": { "type": "number", "representation": "int64", "indexDoubles": false } } } }
objectId¶
The objectId data type is used for indexing ObjectId fields. It works in conjunction with the equals operator.
Fields of type objectId can't be dynamically indexed. You must index fields of type objectId using static mappings. To learn more, see the example in the boolean section on this page.
string¶
You can't use dynamic mapping to automatically index string fields for faceting. You must index the fields using stringFacet to run a facet query on string fields.
The string data type takes the following parameters:
Option | Type | Necessity | Purpose | Default |
---|---|---|---|---|
type | string | Required | The type of field. Value must be string. | |
analyzer | string | Optional | The name of a built-in or overridden analyzer to use for indexing the field. | lucene.standard |
searchAnalyzer | string | Optional | The analyzer to use when querying the field. | lucene.standard |
indexOptions | string | Optional | Specifies the amount of information to store for the indexed field. Value can be one of the following: docs (index documents only; term frequency and position are omitted), freqs (index documents and term frequency), positions (index documents, term frequency, and term positions), or offsets (index documents, term frequency, term positions, and term offsets). | offsets |
store | boolean | Optional | Specifies whether to store the exact document text as well as the analyzed values in the index. Value can be true or false. The value for this option must be true for Highlighting. | true |
ignoreAbove | int | Optional | The maximum number of characters in the value of the field to index. Atlas Search doesn't index the field if its value has more than the specified number of characters. | |
multi | String Field Definition | Optional | Alternate string field definitions, each named in the multi object, that index the field with an alternate analyzer. To learn more about specifying the multi object, see the example below. | |
norms | string | Optional | Specifies whether to include or omit the field length when scoring. The length of the field is determined by the number of tokens produced by the analyzer for the field. Value can be include or omit. If the value is include, Atlas Search includes the field length when scoring. If the value is omit, Atlas Search omits the field length when scoring. | include |
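The ignoreAbove rule reduces to a length comparison. The following sketch counts characters, as the table states. string_is_indexed is a hypothetical name, and ignore_above=None stands for the option being unset.

```python
def string_is_indexed(value: str, ignore_above=None) -> bool:
    """Model ignoreAbove: values longer than the limit are not indexed.

    Length is counted in characters with len(), following the table above.
    ignore_above=None models the option being unset (no limit).
    """
    return ignore_above is None or len(value) <= ignore_above


cities = ["Paris", "x" * 300]
print([string_is_indexed(c, ignore_above=255) for c in cities])  # [True, False]
```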
multi Example¶
The following index definition for a library.books collection indexes string values in the field text with the lucene.english and lucene.french analyzers in addition to the default lucene.standard analyzer:
{ "mappings": { "dynamic": false, "fields": { "text": { "type": "string", "multi": { "english": { "type": "string", "analyzer": "lucene.english" }, "french": { "type": "string", "analyzer": "lucene.french" } } } } } }
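To query one of the alternate analyzers, the path names both the field and the multi key. The following Python sketch builds the resulting $search stage. text_stage is a hypothetical helper. The {"value": ..., "multi": ...} path form follows the Atlas Search path syntax for multi analyzers, and the index name "default" is an assumption.

```python
def text_stage(query: str, field: str, multi=None, index: str = "default") -> dict:
    """Build a $search stage using the text operator.

    With multi=None the field's primary analyzer is used; otherwise the
    path targets the named alternate analyzer from the multi object.
    """
    path = field if multi is None else {"value": field, "multi": multi}
    return {"$search": {"index": index, "text": {"query": query, "path": path}}}


# Search the French-analyzed copy of the text field from the example above.
print(text_stage("bibliothèque", "text", multi="french"))
```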
stringFacet¶
The stringFacet data type is used for indexing string fields for faceting, which allows you to run a facet query on that field. Atlas Search doesn't apply the analyzer when indexing string fields for faceting. The stringFacet data type takes the following parameter:
Option | Type | Necessity | Purpose | Default |
---|---|---|---|---|
type | string | Required | The type of field. Value must be stringFacet. | |
Example¶
The following index definition for the sample_mflix.movies collection in the sample dataset indexes the genres field as string for faceting.
{ "mappings": { "dynamic": false, "fields": { "genres": { "type": "stringFacet" } } } }
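On Atlas, the bucket counts come from a facet query, using the facet collector, typically in a $searchMeta stage. The following local Python simulation only illustrates what a stringFacet bucket counts. It compares exact string values with no analyzer applied, and it counts each distinct value once per document. string_facet_buckets is hypothetical and is not an Atlas API.

```python
from collections import Counter


def string_facet_buckets(docs, path: str) -> Counter:
    """Count documents per exact string value at a top-level `path`.

    Local simulation only: values are compared verbatim (no analyzer), so
    "Drama" and "drama" are separate buckets. An array contributes each
    distinct string element once per document.
    """
    counts = Counter()
    for doc in docs:
        value = doc.get(path)
        values = value if isinstance(value, list) else [value]
        counts.update({v for v in values if isinstance(v, str)})
    return counts


movies = [
    {"genres": ["Drama", "Crime"]},
    {"genres": ["Drama"]},
    {"genres": "drama"},
    {"title": "No genres"},
]
print(string_facet_buckets(movies, "genres").most_common())
```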
Examples¶
Static Mapping Example¶
The following example index definition uses static mappings.
- The default index analyzer is lucene.standard.
- The default search analyzer is lucene.standard. You can change the search analyzer if you want the query term to be parsed differently than how it is stored in your Atlas Search index.
The index specifies static field mappings (dynamic: false), which means fields that are not explicitly mentioned are not indexed. So, the index definition includes:
- The address field, which is of type document. It has two embedded sub-fields, city and state.
  - The city sub-field uses the lucene.simple analyzer by default for queries. It uses the ignoreAbove option to ignore any string of more than 255 bytes in length.
  - The state sub-field uses the lucene.english analyzer by default for queries.
- The company field, which is of type string. It uses the lucene.whitespace analyzer by default for queries. It has a multi analyzer named mySecondaryAnalyzer, which uses the lucene.french analyzer by default for queries. For more information on multi analyzers, see Path Construction.
- The employees field, which is an array of strings. It uses the lucene.standard analyzer by default for queries. For indexing arrays, Atlas Search only requires the data type of the array elements. You don't have to specify that the data is contained in an array in the index definition.
{ "analyzer": "lucene.standard", "searchAnalyzer": "lucene.standard", "mappings": { "dynamic": false, "fields": { "address": { "type": "document", "fields": { "city": { "type": "string", "analyzer": "lucene.simple", "ignoreAbove": 255 }, "state": { "type": "string", "analyzer": "lucene.english" } } }, "company": { "type": "string", "analyzer": "lucene.whitespace", "multi": { "mySecondaryAnalyzer": { "type": "string", "analyzer": "lucene.french" } } }, "employees": { "type": "string", "analyzer": "lucene.standard" } } } }
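One way to check a static definition like the one above is to list which analyzer indexes each string path. The following Python sketch does that. analyzers_by_path is a hypothetical helper. The "path[multi=name]" key notation is invented for this sketch. Applying the index-level fallback to multi alternates that omit an analyzer is an assumption.

```python
STATIC_EXAMPLE = {
    "analyzer": "lucene.standard",
    "searchAnalyzer": "lucene.standard",
    "mappings": {
        "dynamic": False,
        "fields": {
            "address": {
                "type": "document",
                "fields": {
                    "city": {"type": "string", "analyzer": "lucene.simple", "ignoreAbove": 255},
                    "state": {"type": "string", "analyzer": "lucene.english"},
                },
            },
            "company": {
                "type": "string",
                "analyzer": "lucene.whitespace",
                "multi": {"mySecondaryAnalyzer": {"type": "string", "analyzer": "lucene.french"}},
            },
            "employees": {"type": "string", "analyzer": "lucene.standard"},
        },
    },
}


def analyzers_by_path(definition: dict) -> dict:
    """Map each statically mapped string path to its index analyzer.

    Fields without an analyzer fall back to the index-level analyzer
    (lucene.standard if omitted). "path[multi=name]" keys are a notation
    invented for this sketch.
    """
    default = definition.get("analyzer", "lucene.standard")
    result = {}

    def walk(fields: dict, prefix: str) -> None:
        for name, spec in fields.items():
            path = prefix + name
            # A field can map to one definition or to a list of definitions.
            for d in spec if isinstance(spec, list) else [spec]:
                if d.get("type") == "document":
                    walk(d.get("fields", {}), path + ".")
                elif d.get("type") == "string":
                    result[path] = d.get("analyzer", default)
                    for alt, alt_def in d.get("multi", {}).items():
                        result[f"{path}[multi={alt}]"] = alt_def.get("analyzer", default)

    walk(definition["mappings"].get("fields", {}), "")
    return result


for path, analyzer in analyzers_by_path(STATIC_EXAMPLE).items():
    print(f"{path}: {analyzer}")
```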
Combined Mapping Example¶
The following example index definition uses both static and dynamic mappings.
- The default index analyzer is lucene.standard.
- The default search analyzer is lucene.standard. You can change the search analyzer if you want the query term to be parsed differently than how it is stored in your Atlas Search index.
The index specifies static field mappings (dynamic: false), which means fields that are not explicitly mentioned are not indexed. So, the index definition includes:
- The company field, which is of type string. It uses the lucene.whitespace analyzer by default for queries. It has a multi analyzer named mySecondaryAnalyzer, which uses the lucene.french analyzer by default for queries. For more information on multi analyzers, see Path Construction.
- The employees field, which is an array of strings. It uses the lucene.standard analyzer by default for queries.
- The address field, which is of type document. It has two embedded sub-fields, city and state. Instead of explicitly mentioning each nested field in the document, the index definition enables dynamic mapping for all the sub-fields in the document. It uses the lucene.standard analyzer by default for queries.
{ "analyzer": "lucene.standard", "searchAnalyzer": "lucene.standard", "mappings": { "dynamic": false, "fields": { "company": { "type": "string", "analyzer": "lucene.whitespace", "multi": { "mySecondaryAnalyzer": { "type": "string", "analyzer": "lucene.french" } } }, "employees": { "type": "string", "analyzer": "lucene.standard" }, "address": { "type": "document", "dynamic": true, "analyzer": "lucene.standard" } } } }