Query Your Archive¶
Serverless instances are in preview and do not support this feature at this time. To learn more, see Serverless Instance Limitations.
You can run queries against your archived data.
Connection String¶
To run queries, you must first connect to your Online Archive. Your cluster connection string lets you query only the data in your Atlas cluster. To query your Online Archive, you must use one of the following connection strings:
- Connect to Online Archive - this read-only connection string allows you to read data from the Online Archive only and doesn't affect resources of the cluster.
- Connect to Online Archive and Cluster - this read-only connection string allows you to read data from your Online Archive and directly from the live cluster, which consumes cluster resources such as IOPS.
Performance Considerations¶
In general, your queries against archived data will be much slower than your queries against data on the Atlas cluster. When you query your cluster and archived data through the federated connection string:
- Blocking queries, such as sorts, consume and process all input documents before returning any results. Their performance is therefore bounded by the slowest storage being queried, the archive.
- Streaming queries, such as finds, have performance characteristics associated with the highest performing storage being queried, the Atlas cluster. Atlas returns results as soon as they are available, so results from the Atlas cluster arrive first and results from the archive follow later.
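The difference between the two query types can be sketched with a toy model. This is only an illustration, not how Atlas executes queries: the arrival times, the documents, and the function names `streaming_find` and `blocking_sort` are all made up for the example.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, Tuple

# Each source is a list of (arrival_time_ms, document) pairs, ordered by
# arrival time. The timings are invented purely for illustration.
Timed = Tuple[int, Dict[str, int]]


def streaming_find(cluster: Iterable[Timed], archive: Iterable[Timed]) -> Iterator[Timed]:
    """Yield documents in arrival order, regardless of which source they come from."""
    # heapq.merge lazily interleaves two streams that are already time-ordered.
    yield from heapq.merge(cluster, archive, key=lambda pair: pair[0])


def blocking_sort(cluster: Iterable[Timed], archive: Iterable[Timed], field: str) -> Tuple[int, List[Dict[str, int]]]:
    """Return (ready_at_ms, sorted_docs); nothing is ready until the last input arrives."""
    everything = list(cluster) + list(archive)
    ready_at = max(arrival for arrival, _ in everything)
    docs = sorted((doc for _, doc in everything), key=lambda doc: doc[field])
    return ready_at, docs


cluster = [(5, {"_id": 3}), (6, {"_id": 1})]        # fast storage
archive = [(400, {"_id": 2}), (450, {"_id": 0})]    # slow storage

first_time, first_doc = next(streaming_find(cluster, archive))
ready_at, sorted_docs = blocking_sort(cluster, archive, "_id")
```

In this model the streaming query hands back its first document as soon as the fast source produces one, while the sort cannot return anything until the slowest source has delivered its last document.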
Query Price¶
For your federated and archive-only queries, you incur costs for the following items.
Data Scan¶
During data scan, Atlas processes data from both the cluster and the archive. Atlas runs as much of the query on the cluster as it can to minimize the amount of data it needs to scan. For example, for a match query that specifies a specific value, Atlas retrieves only the documents with that value from the cluster. Atlas then combines the retrieved documents with the archived data and returns the results.
For blocking queries that need to access all data stored in the underlying cluster, Atlas retrieves all data. For example, for a sort (with no match), Atlas retrieves all data from the cluster and archive to be sorted.
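The two cases above might look like the following aggregation pipelines, written as plain Python data. The field names `status` and `orderDate` are hypothetical, and `starts_with_match` is an illustrative helper for reviewing your own pipelines, not part of any MongoDB driver or of Atlas's query planning.

```python
from typing import Any, Dict, List

Pipeline = List[Dict[str, Any]]

# A match on a specific value: only the matching documents need to be retrieved
# from the cluster. The field name "status" is hypothetical.
selective: Pipeline = [{"$match": {"status": "shipped"}}]

# A sort with no match: all data from the cluster and the archive must be
# retrieved before sorting. The field name "orderDate" is hypothetical.
full_sort: Pipeline = [{"$sort": {"orderDate": 1}}]


def starts_with_match(pipeline: Pipeline) -> bool:
    """Return True when the first stage is a $match, so filtering happens first."""
    return bool(pipeline) and "$match" in pipeline[0]
```

A check like this is only a rough signal that a pipeline filters before it does anything else; it says nothing about how selective the filter actually is.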
Data Access¶
MongoDB charges a fee for each partition that you query in the archive. If your query needs specific partitions, MongoDB downloads them, and each downloaded partition counts as a single access.
Data Seek¶
To find partitions based on the query and query fields, Atlas runs operations on the archive. Each such operation that Atlas runs finds up to 1000 partitions. Atlas runs the minimum number of required operations to find the partitions required to satisfy the query. For example, if your query requires 100 partitions that are covered in your query fields, Atlas runs only one operation to satisfy the query.
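The arithmetic behind the example can be written out as a short sketch. It assumes each operation finds as many partitions as the stated limit allows, which matches the "minimum number of required operations" described above; the function name `seek_operations` is illustrative.

```python
import math

# From the text: each operation finds up to 1000 partitions.
PARTITIONS_PER_OPERATION = 1000


def seek_operations(partitions_needed: int) -> int:
    """Minimum number of operations needed to find the given number of partitions."""
    if partitions_needed < 0:
        raise ValueError("partitions_needed must be non-negative")
    return math.ceil(partitions_needed / PARTITIONS_PER_OPERATION)
```

For instance, `seek_operations(100)` evaluates to 1, matching the example above, while a query that needs 1001 partitions would require 2 operations.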
Data Transfer¶
Data that is transferred to the federated infrastructure incurs data transfer costs.