Define Atlas Search Indexes¶

Atlas Search can index data in different ways. When you define an Atlas Search index, you can specify a particular analyzer or multiple analyzers to index certain fields. You can also index certain fields and omit others, or you can dynamically index all the fields in a collection. You can define Atlas Search indexes through the Atlas User Interface and Atlas Search API.

Important

If you use the $out aggregation stage to modify a collection with an Atlas Search index, you must delete and re-create the search index. If possible, consider using $merge instead of $out.

Note

Atlas Search indexes are eventually consistent.

Limitation¶

Atlas Search cannot index numeric, date, or boolean values if they are part of an array.

Syntax¶

{
  "name": "<index-name>",
  "analyzer": "<analyzer-for-index>",
  "searchAnalyzer": "<analyzer-for-query>",
  "mappings": {
    "dynamic": <boolean>,
    "fields": { <field-definition> }
  },
  "analyzers": [ <custom-analyzer> ],
  "synonyms": [
    {
      "name": "<synonym-mapping-name>",
      "source": {
        "collection": "<source-collection-name>"
      },
      "analyzer": "<synonym-mapping-analyzer>"
    }
  ]
}

Options¶

Field	Type	Necessity	Description
`analyzer`	string	Optional	Specifies the analyzer to apply to string fields when indexing. If you set this only at the top and do not specify an analyzer for the fields in the index definition, Atlas Search applies this analyzer to all the fields. To use a different analyzer for each field, you must specify a different analyzer for the field. If omitted, defaults to Standard Analyzer.
`analyzers`	array of Custom Analyzers	Optional	Specifies the Custom Analyzers to use in this index.
`mappings`	Document Field Definition	Required	Specifies how to index fields at different paths for this index.
`mappings.dynamic`	boolean	Optional	Enables or disables dynamic mapping of fields for this index. If set to `true`, Atlas Search recursively indexes all fields and embedded documents in the `document` except: Fields of certain data types. To learn more, see BSON Data Types. Any fields explicitly excluded by the `mappings.fields` parameter. If set to `false`, you must specify individual fields to index using `mappings.fields`. If omitted, defaults to `false`. Important Atlas indexes all fields in a `dynamic` `document` using the default settings for the detected data type. All nested documents under the dynamic `document` are treated as `dynamic`, unless explicitly overridden. See index configuration example on this page.
`mappings.fields`	document	Conditional	Required only if dynamic mapping is disabled. Specifies the fields that you would like to index. See the example on this page.
`name`	string	Optional	Specifies a name for the index. In each namespace, names of all indexes in the namespace must be unique. If omitted, defaults to `default`.
`searchAnalyzer`	string	Optional	Specifies the analyzer to apply to query text before searching with it. If omitted, defaults to Standard Analyzer.
`synonyms`	array of Synonym Mapping Definition	Optional	Synonym mappings to use in your index. To learn more, see Define Synonym Mappings in Your Atlas Search Index.
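
As a minimal sketch of how these options fit together, the following definition sets each optional top-level field explicitly to the default listed in the table above. The `title` field is a hypothetical field name used only for illustration; it is not part of any collection described on this page.

```json
{
  "name": "default",
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string"
      }
    }
  }
}
```

Because the values shown match the documented defaults, omitting `name`, `analyzer`, and `searchAnalyzer` should produce an equivalent index. Only `mappings` is required.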

Static and Dynamic Mappings¶

For static mappings, set mappings.dynamic to false and specify the fields to index using mappings.fields. Atlas Search indexes only the specified fields, using the options that you define for each field.

Use static mappings to configure index options for fields that should not be indexed dynamically, or to configure a single field independently from others in an index.

Note

You must specify static mappings when mappings.dynamic is false.

For dynamic mappings, set mappings.dynamic to true. Atlas Search automatically indexes the fields of supported types in each document.

Use dynamic mappings if your schema changes regularly or is unknown, or when experimenting with Atlas Search. You can configure an entire index to use dynamic mappings, or specify individual fields, such as fields of type document, to be dynamically mapped.

Note

Dynamically mapped indexes occupy more disk space than statically mapped indexes and may be less performant.
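
For comparison with the static definitions elsewhere on this page, a fully dynamic index definition can be as small as the following sketch. It relies on the documented defaults for the index name and analyzers, so it contains only the required `mappings` option.

```json
{
  "mappings": {
    "dynamic": true
  }
}
```

With this definition, fields of the types marked "yes" in the BSON Data Types table are indexed automatically. Fields of other types, such as boolean and objectId, still need static mappings.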

BSON Data Types¶

The table below enumerates all the BSON data types and indicates whether they are included in an Atlas Search index with dynamic mappings.

BSON Type	Included in Dynamic Index?	Atlas Search Field Type
Double	yes	number
32-bit integer	yes	number
64-bit integer	yes	number
String	yes ^*	string stringFacet
Date	yes	date
Object	yes	document
ObjectId	no	objectId
Boolean	no	boolean
Timestamp	no
Array	yes
Binary Data	no
Null	no
Regular Expression	no
JavaScript	no
Decimal128	no
Min key	no
Max key	no

^* You can't use dynamic mapping to automatically index string fields for faceting. You must index fields using stringFacet to run a facet query on string fields.

array¶

For indexing arrays, Atlas Search only requires the data type of the array elements. You don't have to specify that the data is contained in an array in the index definition.

Note

Atlas Search doesn't index documents inside an array.

Example

The following index definition for the sample_mflix.movies collection in the sample dataset indexes the genres field, which contains an array of string values.

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "genres": {
        "type": "string"
      }
    }
  }
}

Atlas Search Field Types¶

autocomplete¶

You can use the autocomplete data type to index text values for autocompletion. You can configure an autocomplete field to satisfy a variety of use cases. To learn more about the configuration options available in the autocomplete data type, such as tokenization strategy and diacritic folding, see autocomplete. The autocomplete operator can query only fields that are indexed using the autocomplete type.

Note

You can't use the autocomplete type to index fields whose value is an array of strings.

The autocomplete type takes the following options:

Option	Type	Necessity	Purpose	Default
`type`	string	Required	The type of field. Value must be `autocomplete`.
`analyzer`	string	Optional	Name of the analyzer to use with this autocomplete mapping. You can use any Atlas Search analyzer except the `lucene.kuromoji` language analyzer and the following custom analyzer tokenizers and token filters: nGram Tokenizer, edgeGram Tokenizer, daitchMokotoffSoundex Token Filter, nGram Token Filter, edgeGram Token Filter, shingle Token Filter.	`lucene.standard`
`maxGrams`	int	Optional	The maximum number of characters per indexed sequence. The value limits the character length of indexed tokens. When you search for terms longer than the `maxGrams` value, Atlas Search truncates the tokens to the `maxGrams` length.	`15`
`minGrams`	int	Optional	The minimum number of characters per indexed sequence. We recommend 4 for the minimum value. A value that is less than 4 could impact performance because the size of the index can become very large. We recommend the default value of 2 for `edgeGram` only.	`2`
`tokenization`	enum	Optional	The tokenization strategy to use when indexing the field for autocompletion. Value can be one of the following: `edgeGram` - to create indexable tokens, referred to as grams, from variable-length character sequences starting at the left side of the words as delimited by the analyzer used with this autocomplete mapping. `rightEdgeGram` - to create indexable tokens, referred to as grams, from variable-length character sequences starting at the right side of the words as delimited by the analyzer used with this autocomplete mapping. You can specify `rightEdgeGram` only in the JSON Editor. You can't select the `rightEdgeGram` tokenization strategy in the Visual Editor. `nGram` - to create indexable tokens, referred to as grams, by sliding a variable-length character window over a word. Atlas Search creates more tokens for `nGram` than `edgeGram` or `rightEdgeGram`. Therefore, `nGram` takes more space and time to index the field. `nGram` is better suited for querying languages with long, compound words or languages that don't use spaces.	`edgeGram`
`foldDiacritics`	boolean	Optional	The setting to indicate whether diacritics should be included or removed from the indexed text. Value can be one of the following: `true` - to ignore diacritic marks in the index and query text. Returns results with and without diacritic marks. For example, a search for cafè returns results with the characters cafè and cafe. `false` - to include diacritic marks in the index and query text. Returns only results that match the strings with or without diacritics in the query. For example, a search for cafè returns results only with the characters cafè. A search for cafe returns results only with the characters cafe.	`true`

To illustrate the tokenization strategies, consider the following sentence:

The quick brown fox jumps over the lazy dog.

When tokenized with minGrams value of 2 and maxGrams value of 5, Atlas Search indexes the following sequence of characters based on the tokenization value you choose:

edgeGram

Th
The
The{SPACE}
The q
qu
qui
quic
quick
...

rightEdgeGram

og
dog
{SPACE}dog
y dog
zy
azy
lazy
{SPACE}lazy
he
the
{SPACE}the
r the
er
ver
over
{SPACE}over
...

nGram

Th
The
The{SPACE}
The q
he
he{SPACE}
he q
he qu
e{SPACE}
e q
e qu
e qui
{SPACE}q
{SPACE}qu
{SPACE}qui
{SPACE}quic
qu
qui
quic
quick
...

Note

Indexing a field for autocomplete with an edgeGram, rightEdgeGram, or nGram tokenization strategy is more computationally expensive than indexing a string field. The index takes more space than an index with regular string fields.

Example

{
  "mappings": {
    "dynamic": true|false,
    "fields": {
      "<field-name>": [
        {
          "type": "autocomplete",
          "analyzer": "lucene.standard",
          "tokenization": "edgeGram|rightEdgeGram|nGram",
          "minGrams": <2>,
          "maxGrams": <15>,
          "foldDiacritics": true|false
        }
      ]
    }
  }
}
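
To make the template above concrete, the following sketch fills in the placeholders for a single field. It assumes a `title` field that holds plain string values (for example, in the sample_mflix.movies collection); the field name and the option values are chosen only for illustration and are not recommendations.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "analyzer": "lucene.standard",
          "tokenization": "edgeGram",
          "minGrams": 4,
          "maxGrams": 10,
          "foldDiacritics": true
        }
      ]
    }
  }
}
```

Here `minGrams` follows the recommended minimum of 4 described in the options above, and `maxGrams` is lowered from the default of 15 to keep the number of indexed grams smaller. Queries against this field would use the autocomplete operator.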

boolean¶

The boolean data type is used for indexing true and false values. It works in conjunction with the equals operator.

Note

Fields of type boolean cannot be dynamically indexed. You must index fields of type boolean using static mappings.

Example

The following example index definition maps a field named verified_user to the boolean data type and a field named teammates to the objectId data type.

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "verified_user": {
        "type": "boolean"
      },
      "teammates": {
        "type": "objectId"
      }
    }
  }
}
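
As a sketch of how a field indexed this way might be queried, the following aggregation pipeline uses the equals operator against the `verified_user` field. It assumes the index above was created with the default name `default`, so no index name is given; the `path` and `value` options belong to the equals operator, which is documented separately from this page.

```json
[
  {
    "$search": {
      "equals": {
        "path": "verified_user",
        "value": true
      }
    }
  }
]
```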

date¶

The date type is used for indexing date values. It takes the type option. The value of type must be date. A date can't be indexed if it is part of an array.
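
A minimal sketch of a static mapping for a date field follows. It assumes a field named `released` that stores BSON date values (not arrays of dates), as in the sample_mflix.movies collection; substitute your own field name.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "released": {
        "type": "date"
      }
    }
  }
}
```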

document¶

The document data type is used for fields with embedded documents. It takes the following parameters:

Option	Type	Necessity	Purpose	Default
`type`	string	Required	The type of field. Value must be `document`.
`dynamic`	boolean	Conditional	If set to `true`, Atlas Search recursively indexes all fields and embedded documents in the `document` except: Fields of certain data types. To learn more, see BSON Data Types. Any fields that you explicitly exclude using the `fields` parameter. If omitted or set to `false`, you must specify individual fields to index. Important Atlas indexes all fields in a `dynamic` `document` using the default settings for the detected data type. All nested documents under the dynamic `document` are treated as `dynamic`, unless explicitly overridden.	false
`fields`	document	Conditional	Maps document field names to field definitions. To learn more, see an example. This is required if `dynamic` is omitted or set to `false`.
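
The following sketch shows both ways of configuring a document field described in the table above. The field names `address`, `city`, and `contact` are hypothetical. The `address` document lists its sub-fields explicitly, while the `contact` document sets `dynamic` to `true` so that its sub-fields of supported types are indexed with default settings.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "address": {
        "type": "document",
        "fields": {
          "city": {
            "type": "string"
          }
        }
      },
      "contact": {
        "type": "document",
        "dynamic": true
      }
    }
  }
}
```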

geo¶

The geo type is used for indexing geographic point and shape coordinates. For this type, the indexed field must be a GeoJSON object.

Option	Type	Necessity	Purpose	Default
`type`	string	Required	The type of field. Value must be `geo`.
`indexShapes`	boolean	Optional	Specifies whether or not to index shapes. By default, Atlas Search: Indexes points, even when nested. Does not index shape geometries such as lines and polygons. Value can be: `true` to index shapes and points `false` to index only points	`false`

Example

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "<field-name>": {
        "type": "geo",
        "indexShapes": true|false
      }
    }
  }
}
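
As a concrete sketch of the geo type, the following definition indexes a GeoJSON field that is nested inside an embedded document. It assumes a collection shaped like sample_airbnb.listingsAndReviews, where `address.location` holds a GeoJSON point; if your data differs, adjust the field names accordingly.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "address": {
        "type": "document",
        "fields": {
          "location": {
            "type": "geo",
            "indexShapes": false
          }
        }
      }
    }
  }
}
```

Setting `indexShapes` to `false` matches the default and indexes only points. Set it to `true` if the field can also contain lines or polygons that you need to query.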

number¶

The number type is used for fields with numeric values of int32, int64, and double data types. The number type has the following options:

Option	Type	Necessity	Purpose	Default
`type`	string	Required	The type of field. Value must be `number`.
`representation`	string	Optional	The data type of the field to index. Values are: `int64` - for indexing large integers without loss of precision and for rounding double values to integers. You can't use this type to index large double values. `double` - for indexing large double values without rounding. To learn more, see example below.	`double`
`indexIntegers`	boolean	Optional	Indicates whether to index or omit indexing `int32` and `int64` type values. Value can be `true` or `false`. To learn more, see example below.	`true`
`indexDoubles`	boolean	Optional	Indicates whether to index or omit indexing `double` type values. Value can be `true` or `false`. To learn more, see example below.	`true`

`representation` Example¶

Example

The following index definition for the sample_analytics.accounts collection in the sample dataset indexes the account_id field with 64-bit integer values. The following example also:

Indexes all other integer values in the account_id field
Rounds any decimal values and indexes small double type values in the account_id field

{
  "mappings": {
      "dynamic": false,
      "fields": {
          "account_id": {
            "type": "number",
            "representation": "int64"
          }
      }
  }
}

`indexIntegers` Example¶

Example

The following index definition for the sample_airbnb.listingsAndReviews collection in the sample dataset omits 32-bit and 64-bit integer values in the bathrooms field. It indexes only the double type values in the bathrooms field.

{
  "mappings": {
      "dynamic": false,
      "fields": {
        "bathrooms": {
            "type": "number",
            "indexIntegers": false
        }
      }
  }
}

`indexDoubles` Example¶

Example

The following index definition for the sample_analytics.accounts collection in the sample dataset:

Indexes integer values in the account_id field.
Omits double values in the account_id field.

{
  "mappings": {
      "dynamic": false,
      "fields": {
        "account_id": {
            "type": "number",
            "representation": "int64",
            "indexDoubles": false
        }
      }
  }
}

objectId¶

The objectId data type is used for indexing ObjectId fields. It works in conjunction with the equals operator.

Note

Fields of type objectId can't be dynamically indexed. You must index fields of type objectId using static mappings. To learn more, see the example in the boolean section on this page.

string¶

Note

You can't use dynamic mapping to automatically index string fields for faceting. You must index the fields using stringFacet to run a facet query on string fields.

The string data type takes the following parameters:

Option	Type	Necessity	Purpose	Default
`type`	string	Required	The type of field. Value must be `string`.
`analyzer`	string	Optional	The name of a built-in or overridden analyzer to use for indexing the field.	`lucene.standard`
`searchAnalyzer`	string	Optional	The analyzer to use when querying the field.	`lucene.standard`
`indexOptions`	string	Optional	Specifies the amount of information to store for the indexed field. Value can be one of the following: `docs` - Only indexes documents. The frequency and position of the indexed term are ignored. Only a single occurrence of the term is reflected in the score. `freqs` - Only indexes documents and term frequency. The position of the indexed term is ignored. `positions` - Indexes documents, term frequency, and term positions. `offsets` - (Default) Indexes documents, term frequency, term positions, and term offsets. This option is required for Highlighting.	`offsets`
`store`	boolean	Optional	Specifies whether or not to store the exact document text as well as the analyzed values in the index. Value can be `true` or `false`. The value for this option must be `true` for Highlighting.	`true`
`ignoreAbove`	int	Optional	The maximum number of characters in the value of the field to index. Atlas Search doesn't index if the field value is greater than the specified number of characters.
`multi`	String Field Definition	Optional	The string field to index with the name of the alternate analyzer specified in the `multi` object. To learn more about specifying the `multi` object, see example below.
`norms`	string	Optional	Specifies whether to include or omit the field length in the result when scoring. The length of the field is determined by the number of tokens produced by the analyzer for the field. Value can be one of the following: `include` - to include the field length when scoring. `omit` - to omit the field length when scoring. If value is `include`, Atlas Search uses the length of the field to determine the higher score when scoring. For example, if two documents match an Atlas Search query, the document with the shorter field length scores higher than the document with the longer field length. If value is `omit`, Atlas Search ignores the field length when scoring.	`include`
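
The following sketch combines several of the options in the table above for a string field that does not need highlighting. The field name `plot` is hypothetical, and the values are illustrative rather than recommended settings.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "plot": {
        "type": "string",
        "analyzer": "lucene.english",
        "searchAnalyzer": "lucene.english",
        "indexOptions": "freqs",
        "store": false,
        "ignoreAbove": 255,
        "norms": "omit"
      }
    }
  }
}
```

Because `indexOptions` is not `offsets` and `store` is `false`, this field can't be used for Highlighting. Values longer than 255 characters are not indexed, and field length is ignored when scoring.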

`multi` Example¶

Example

The following index definition for a library.books collection indexes string values in the field text with the lucene.english and lucene.french analyzers in addition to the default lucene.standard analyzer:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "text": {
        "type": "string",
        "multi": {
          "english": {
            "type": "string",
            "analyzer": "lucene.english"
          },
          "french": {
            "type": "string",
            "analyzer": "lucene.french"
          }
        }
      }
    }
  }
}
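
To query one of the alternate analyzers, the path can name both the field and the `multi` entry. The following pipeline is a sketch that assumes the index above uses the default name `default`; the query string is arbitrary, and the `value` and `multi` path options come from the Path Construction documentation rather than from this page.

```json
[
  {
    "$search": {
      "text": {
        "query": "liberté",
        "path": {
          "value": "text",
          "multi": "french"
        }
      }
    }
  }
]
```

A query whose path is the plain string `"text"` would instead use the field's default lucene.standard analysis.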

stringFacet¶

Note

Preview

Atlas Search facet and count are in preview. The features and the corresponding documentation may change at any time in the preview stage.

The stringFacet data type is used for indexing string fields for faceting, which allows you to run a facet query on that field. Atlas Search doesn't apply the analyzer when indexing string fields for faceting. The stringFacet data type takes the following parameter:

Option	Type	Necessity	Purpose	Default
`type`	string	Required	The type of field. Value must be `stringFacet`.

Example¶

The following index definition for the sample_mflix.movies collection in the sample dataset indexes the genres field as string for faceting.

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "genres": {
        "type": "stringFacet"
      }
    }
  }
}
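
As a sketch of the kind of facet query this mapping enables, the following pipeline counts documents per value of the `genres` field. It assumes the index above uses the default name `default`. The `$searchMeta` stage, the `facet` collector, and the facet name `genresFacet` are not defined on this page; the first two come from the facet documentation, which is in preview and may change, and the facet name is arbitrary.

```json
[
  {
    "$searchMeta": {
      "facet": {
        "facets": {
          "genresFacet": {
            "type": "string",
            "path": "genres"
          }
        }
      }
    }
  }
]
```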

Examples¶

Static Mapping Example¶

The following example index definition uses static mappings.

The default index analyzer is lucene.standard.
The default search analyzer is lucene.standard. You can change the search analyzer if you want the query term to be parsed differently than how it is stored in your Atlas Search index.
The index specifies static field mappings (dynamic: false), which means fields that are not explicitly mentioned are not indexed. So, the index definition includes:
- The address field, which is of type document. It has two embedded sub-fields, city and state.
  The city sub-field uses the lucene.simple analyzer by default for queries. It uses the ignoreAbove option to ignore any string of more than 255 characters in length.
  The state sub-field uses the lucene.english analyzer by default for queries.
- The company field, which is of type string. It uses the lucene.whitespace analyzer by default for queries. It has a multi analyzer named mySecondaryAnalyzer which uses the lucene.french analyzer by default for queries.
  For more information on multi analyzers, see Path Construction.
- The employees field, which is an array of strings. It uses the lucene.standard analyzer by default for queries. For indexing arrays, Atlas Search only requires the data type of the array elements. You don't have to specify that the data is contained in an array in the index definition.

{
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "address": {
        "type": "document",
        "fields": {
          "city": {
            "type": "string",
            "analyzer": "lucene.simple",
            "ignoreAbove": 255
          },
          "state": {
            "type": "string",
            "analyzer": "lucene.english"
          }
        }
      },
      "company": {
        "type": "string",
        "analyzer": "lucene.whitespace",
        "multi": {
          "mySecondaryAnalyzer": {
            "type": "string",
            "analyzer": "lucene.french"
          }
        }
      },
      "employees": {
        "type": "string",
        "analyzer": "lucene.standard"
      }
    }
  }
}

Combined Mapping Example¶

The following example index definition uses both static and dynamic mappings.

The default index analyzer is lucene.standard.
The default search analyzer is lucene.standard. You can change the search analyzer if you want the query term to be parsed differently than how it is stored in your Atlas Search index.
The index specifies static field mappings (dynamic: false), which means fields that are not explicitly mentioned are not indexed. So, the index definition includes:
- The company field, which is of type string. It uses the lucene.whitespace analyzer by default for queries. It has a multi analyzer named mySecondaryAnalyzer which uses the lucene.french analyzer by default for queries. For more information on multi analyzers, see Path Construction.
- The employees field, which is an array of strings. It uses the lucene.standard analyzer by default for queries.
- The address field, which is of type document. It has two embedded sub-fields, city and state. Instead of explicitly mentioning each nested field in the document, the index definition enables dynamic mapping for all the sub-fields in the document. It uses the lucene.standard analyzer by default for queries.

{
  "analyzer": "lucene.standard",
  "searchAnalyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "company": {
        "type": "string",
        "analyzer": "lucene.whitespace",
        "multi": {
          "mySecondaryAnalyzer": {
            "type": "string",
            "analyzer": "lucene.french"
          }
        }
      },
      "employees": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "address": {
        "type": "document",
        "dynamic": true,
        "analyzer": "lucene.standard"
      }
    }
  }
}
