Index Performance
Resource Requirements
Index Size and Configuration
If you create an Atlas Search index for a collection that has or will soon have more than two billion documents, you must shard your cluster.
When you create an Atlas Search index, the default configuration sets field mapping to dynamic, which means that all the data in your collection is actively added to your Atlas Search index. Other options such as enabling highlights can also result in your index taking up more disk space. You can reduce the size and performance footprint of your Atlas Search index by:
- Specifying a custom index definition to narrow the amount and type of data that is indexed.
- Setting the store option to false when specifying a string type in an index definition.
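As a minimal sketch, the following Python function builds an index definition that combines both techniques: a static mapping (dynamic set to false) that indexes only the listed string fields, each with store set to false. The helper name is illustrative and not part of any MongoDB API, and the field names in the usage line (title, plot) are hypothetical.

```python
def narrow_index_definition(string_fields):
    """Build a static-mapping Atlas Search index definition (hypothetical helper).

    Only the fields in string_fields are indexed. store=False keeps Atlas Search
    from storing the original string values, which saves disk space but rules out
    features that need stored values, such as highlighting on those fields.
    """
    return {
        "mappings": {
            "dynamic": False,  # static mapping: fields not listed below are not indexed
            "fields": {
                name: {"type": "string", "store": False}
                for name in string_fields
            },
        }
    }


# Hypothetical fields from a movies-style collection.
definition = narrow_index_definition(["title", "plot"])
print(definition)
```

If you use PyMongo 4.5 or later, a definition like this can be wrapped in a SearchIndexModel and passed to Collection.create_search_index; check the driver documentation for your version before relying on that.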
Some limitations apply to Atlas Search on M0, M2, and M5 clusters only. To learn more, see Atlas Search Free and Shared Tier Limitations.
Considerations
Some index configuration options can lead to indexes that take up a significant proportion of your disk space. In some cases, your index could be many times larger than the size of your data. Although this is expected behavior, it's important to be aware of the following indexing-intensive features:
Autocomplete
The autocomplete Atlas Search field type can cause large indexes, especially in the following cases:
- Using nGram tokenization.
- Setting a wide minGrams to maxGrams range.
- Setting a minGrams value of 1 on a collection with millions of documents.
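To see why these settings matter, the sketch below counts the grams a simplified autocomplete tokenizer would emit for a single word, reusing the tokenization, minGrams, and maxGrams option names. It is an approximation for intuition only: it assumes each gram becomes one index term, and it ignores rightEdgeGram, analyzers, and any deduplication or compression the real index performs. The function name is hypothetical.

```python
def grams_per_word(word_length, min_grams, max_grams, tokenization):
    """Simplified count of autocomplete grams for one word (hypothetical model)."""
    upper = min(max_grams, word_length)  # grams cannot be longer than the word
    if upper < min_grams:
        return 0
    if tokenization == "edgeGram":
        # One gram per allowed length, all anchored at the start of the word.
        return upper - min_grams + 1
    if tokenization == "nGram":
        # A window of each allowed length slides across the whole word.
        return sum(word_length - k + 1 for k in range(min_grams, upper + 1))
    raise ValueError(f"unsupported tokenization in this sketch: {tokenization}")


word = "performance"  # 11 characters
print(grams_per_word(len(word), 2, 15, "edgeGram"))
print(grams_per_word(len(word), 2, 15, "nGram"))
print(grams_per_word(len(word), 1, 15, "nGram"))
```

Under this model, the 11-character word yields 10 grams with edgeGram, 55 with nGram over the same range, and 66 with nGram once minGrams drops to 1. Multiplied across millions of documents, that difference is what drives index growth.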
multi Analyzers
Using a multi analyzer to analyze the same field multiple different ways can cause large indexes, especially when analyzing fields with very long values.
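The sketch below shows what a multi configuration can look like for a hypothetical title field, written as a Python dictionary, together with a helper that counts how many analyzed copies of the field the index holds. The alternate analyzer names (english, keyword) are illustrative choices, and the helper is a simplification that assumes each entry under multi is indexed in addition to the base analyzer.

```python
# Hypothetical "title" field mapping: analyzed once with lucene.standard,
# plus one additional copy for each entry under "multi".
title_mapping = {
    "type": "string",
    "analyzer": "lucene.standard",
    "multi": {
        "english": {"type": "string", "analyzer": "lucene.english"},
        "keyword": {"type": "string", "analyzer": "lucene.keyword"},
    },
}


def analyzed_copies(field_mapping):
    """Count analyzed copies of a string field: the base analyzer plus each multi entry."""
    return 1 + len(field_mapping.get("multi", {}))


print(analyzed_copies(title_mapping))
```

For a field with very long values, each extra copy repeats the tokenization of the whole value, which is why adding entries under multi can grow the index quickly.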
Synonym Collections
A large synonyms source collection can cause large indexes.
Creating and Updating an Atlas Search Index
Creating an Atlas Search index is resource-intensive. The performance of your Atlas cluster may be impacted while the index builds.
Atlas replicates all writes on the collection to its Atlas Search indexes. This means that for each collection with Atlas Search indexes, writes are amplified by the number of Atlas Search indexes defined for that collection.
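As a back-of-envelope sketch, assuming each write to the collection results in one update per Atlas Search index, the indexing load scales linearly with the number of indexes. The function name and workload figures below are hypothetical.

```python
def search_index_updates_per_sec(collection_writes_per_sec, search_index_count):
    """Estimate indexing work, assuming one update per write per Atlas Search index."""
    if collection_writes_per_sec < 0 or search_index_count < 0:
        raise ValueError("write rate and index count must be non-negative")
    return collection_writes_per_sec * search_index_count


# Hypothetical workload: 500 writes/sec on a collection with 3 Atlas Search indexes.
print(search_index_updates_per_sec(500, 3))  # 1500
```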
In some instances, your Atlas Search index must be rebuilt. Rebuilding the Atlas Search index also consumes resources and may affect database performance. Atlas Search automatically rebuilds the index only in the event of:
- Changes to the index definition
- Atlas Search version updates that include breaking changes
- Hardware-related problems such as index corruption
Atlas Search supports no-downtime indexing, which means you can continue to read and write to your cluster while your index is being rebuilt. Atlas Search keeps your old index up-to-date while the new index is being built. Once Atlas Search rebuilds the index, the old index is automatically replaced without any further action on your part.
Eventual Consistency and Indexing Latency
Atlas Search supports eventual consistency and does not provide any stronger consistency guarantees. This means that data inserted into a MongoDB collection and indexed by Atlas Search will not be available immediately for $search queries.
Atlas Search reads data from MongoDB change streams and indexes that data in an asynchronous process. This process is typically very fast, but may sometimes be impacted by replication latency, system resource availability, and index definition complexity.
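Because indexing is asynchronous, code that writes a document and immediately searches for it may not find it. One common workaround, sketched below under the assumption that your application can tolerate a short wait, is to poll the search with backoff until the document appears or a timeout expires. The helper name and parameters are hypothetical; run_search stands in for whatever executes your $search pipeline, for example a function that returns list(collection.aggregate(pipeline)) in PyMongo.

```python
import time
from typing import Callable, List


def wait_until_searchable(
    run_search: Callable[[], List[dict]],
    is_visible: Callable[[List[dict]], bool],
    timeout_s: float = 5.0,
    initial_delay_s: float = 0.05,
    max_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll run_search until is_visible(results) is True or timeout_s elapses (hypothetical helper)."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        if is_visible(run_search()):
            return True
        if time.monotonic() >= deadline:
            return False  # still not indexed; the caller decides what to do
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped


# Hypothetical usage with PyMongo (collection, pipeline, and new_id are assumed to exist):
# found = wait_until_searchable(
#     lambda: list(collection.aggregate(pipeline)),
#     lambda docs: any(d["_id"] == new_id for d in docs),
# )
```

Polling is best reserved for tests or rare read-your-own-write flows; for most traffic, designing the application to tolerate brief indexing latency is simpler.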
Document Mapping Explosions
Mapping explosions occur when Atlas Search indexes documents with arbitrary keys under a dynamic mapping, or more generally when you add too many fields to an index. The mongot process might then consume increasing amounts of memory and could crash. To address this issue, you can upgrade your cluster or use a static mapping that does not index all fields in your data.
When searching over fields using a wildcard path, design your search to use a tuple-like schema. If you perform a wildcard path search that uses a key-value schema, Atlas Search indexes each key as its own field, which can cause mapping explosions.
An example of a key-value schema is as follows:
{ ruleBuilder: { ruleName1: <data>, ruleName2: <data>, ... ruleName1025: <data> } }
An example of the same data restructured to use a tuple-like schema is as follows:
{ ruleBuilder: [ {name: ruleName1, data: <data>}, {name: ruleName2, data: <data>}, ... {name: ruleName1025, data: <data>} ] }
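The following Python sketch makes the difference concrete. It converts a key-value document into the tuple-like shape and counts the distinct dotted field paths each shape would produce. The path counter is a simplified model of dynamic mapping, assuming that fields inside array elements share their parent's path; the helper names and the integer placeholder data are hypothetical.

```python
def field_paths(value, prefix=""):
    """Collect distinct dotted leaf paths (simplified: array elements share the parent path)."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            paths |= field_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for item in value:
            paths |= field_paths(item, prefix)
    else:
        paths.add(prefix)
    return paths


def to_tuple_like(doc, field):
    """Rewrite doc[field] from {key: data} into [{"name": key, "data": data}, ...]."""
    converted = dict(doc)
    converted[field] = [{"name": k, "data": v} for k, v in doc[field].items()]
    return converted


key_value = {"ruleBuilder": {f"ruleName{i}": i for i in range(1, 1026)}}
tuple_like = to_tuple_like(key_value, "ruleBuilder")
print(len(field_paths(key_value)))   # one path per rule name
print(sorted(field_paths(tuple_like)))
```

Under this model, the key-value shape yields 1025 distinct paths, one per rule name, while the tuple-like shape yields only ruleBuilder.name and ruleBuilder.data no matter how many rules a document holds.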
Storing Source Fields
The Atlas Search index storedSource option and $search returnStoredSource option are in preview, but can be used in production applications. If there are any syntax or behavior changes between the preview stage and general availability (GA), we will proactively communicate before introducing any breaking changes. The MongoDB Cloud Support team will help troubleshoot any issues related to using this feature as part of your contract.
You can configure fields to store on Atlas Search and improve performance of subsequent aggregation pipeline stages like $sort, $match, $group, and $skip. Use this optimization if your original documents and matched dataset are so large that a full data lookup is inefficient. To learn more about storing specific fields on Atlas Search and returning those stored fields only, see Define Stored Source Fields in Your Atlas Search Index and Return Stored Source Fields.
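As an illustrative sketch, the Python helpers below build an index definition that stores a few fields with storedSource, and a pipeline whose $search stage sets returnStoredSource to true so that the following $sort stage works on the stored fields rather than on full documents. The helper names, index name, query, and field names are hypothetical; confirm the exact syntax against Define Stored Source Fields in Your Atlas Search Index.

```python
def stored_source_definition(stored_fields):
    """Index definition that keeps copies of stored_fields on Atlas Search (hypothetical helper)."""
    return {
        "mappings": {"dynamic": True},
        "storedSource": {"include": list(stored_fields)},
    }


def stored_source_pipeline(index_name, query, path, sort_field):
    """$search that returns only stored fields, followed by a $sort on one of them."""
    return [
        {
            "$search": {
                "index": index_name,
                "text": {"query": query, "path": path},
                "returnStoredSource": True,  # return stored fields instead of looking up full documents
            }
        },
        {"$sort": {sort_field: -1}},
    ]


definition = stored_source_definition(["title", "year"])
pipeline = stored_source_pipeline("movies_search", "baseball", "title", "year")
```

Keeping the sort field in the include list is the point of the exercise: if $sort needed a field that is not stored, the optimization would not help.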
We recommend storing only the minimum number of fields required for subsequent stages. If necessary, you can use $lookup as the last stage of the pipeline to retrieve entire documents, as shown in the Examples. Storing unnecessary fields increases disk utilization and could negatively impact performance during indexing and querying.
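A minimal sketch of that pattern: the helper below appends a $lookup stage that joins each stored-source result back to its full document by _id. It assumes the stored-source results include _id and that the searched collection is also the $lookup target; the helper name, the collection name, and the output field name document are hypothetical.

```python
def with_full_document_lookup(pipeline, collection_name):
    """Return a copy of pipeline with a final $lookup that re-fetches full documents by _id."""
    lookup = {
        "$lookup": {
            "from": collection_name,
            "localField": "_id",
            "foreignField": "_id",
            "as": "document",  # hypothetical output field holding the matched full document
        }
    }
    return [*pipeline, lookup]  # leave the caller's pipeline list unmodified


base = [{"$search": {"index": "movies_search", "returnStoredSource": True}}, {"$limit": 10}]
full = with_full_document_lookup(base, "movies")
```

Placing $lookup after $limit means the full-document fetch runs only for the handful of results you actually return.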
Scaling Considerations
Atlas Search Upgrade
Atlas Search is deployed on your Atlas cluster. When a new version of Atlas Search is deployed, your Atlas cluster might experience brief network failures when returning query results. To mitigate issues during deployment and minimize impact to your application, consider the following:
- Implement retry logic in your application.
- Configure Atlas maintenance windows.
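For the first item, the sketch below shows generic retry logic with exponential backoff and jitter. It is not a MongoDB API: the helper name is hypothetical, and the default retryable exception (Python's built-in ConnectionError) is a placeholder. With PyMongo you would more likely pass the driver's own transient error types, such as pymongo.errors.ConnectionFailure. Retry only operations that are safe to repeat, such as $search reads.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delays of 0.1s, 0.2s, 0.4s, ... plus up to 100% random jitter.
            delay = base_delay_s * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

Jitter spreads retries from many clients over time, so they do not all reconnect at the same moment when a brief failure clears.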
To learn more about the changes in each release, see Atlas Search Changelog.
Scaling Up Indexing Performance
You can scale up initial sync and steady-state indexing for an Atlas Search index by upgrading your cluster to a higher tier with more cores. Atlas Search uses a percentage of all available cores to run both initial sync and steady-state indexing, so indexing performance improves as upgrading your cluster makes more cores available.