Configure Online Archive¶
Serverless instances are in preview and do not support this feature at this time. To learn more, see Serverless Instance Limitations.
Overview¶
You can configure data in a collection to be archived by specifying an archiving rule. The archiving rule for a:
- Time series collection is a combination of a time that is used to determine when to archive data and a numeric value representing the number of days that the Atlas cluster stores the data.
- Standard collection can be one of the following:
  - A combination of a date that is used to determine when to archive data and a numeric value representing the number of days that the Atlas cluster stores the data.
  - A custom query that is used to select the documents to archive.
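As an illustration of the date-based rule described above, the following minimal sketch treats a document as eligible for archiving once its date field is at least the configured number of days old. The function name, its parameters, and the inclusive boundary are assumptions made for this example; Atlas evaluates the rule on the server side and its exact boundary behavior is not stated here.

```python
from datetime import datetime, timedelta, timezone


def is_eligible_for_archive(date_value, expire_after_days, now=None):
    """Return True if date_value is at least expire_after_days old.

    Hypothetical helper: it only mirrors the "date field + number of days"
    rule for illustration and is not part of any Atlas client library.
    """
    now = now or datetime.now(timezone.utc)
    return now - date_value >= timedelta(days=expire_after_days)


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
old_doc_date = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent_doc_date = datetime(2024, 1, 30, tzinfo=timezone.utc)

print(is_eligible_for_archive(old_doc_date, 7, now))     # True
print(is_eligible_for_archive(recent_doc_date, 7, now))  # False
```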
To configure your Atlas cluster for online archive:
- Create an archiving rule by providing the collection namespace and the criteria for selecting data to archive in the collection.
- (Optional) Specify commonly queried fields to partition archived data.
Configure Online Archive Through the User Interface¶
To configure an Online Archive, in your Atlas UI:
Navigate to the Database Deployments page for your project.¶
- If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
- If it is not already displayed, select your desired project from the Projects menu in the navigation bar.
- If the Database Deployments page is not already displayed, click Databases in the sidebar.
Navigate to the Online Archive tab for your cluster.¶
- Click the name of the cluster.
- Click the Online Archive tab to view the list of online archives, if any, for the cluster.
Start configuring online archive for your collection.¶
To configure an online archive for your collection, click:
- The Configure Online Archive button, the first time you configure an online archive.
- The Add Archive button, for each subsequent online archive.
Review the Online Archive Overview and click Next to proceed.¶
Create an Archiving Rule by providing the following information.¶
Specify the collection namespace, which includes the database name, the dot (.) separator, and the collection name (that is, <database>.<collection>), in the Namespace field. You can't modify the namespace once the online archive is created.
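The namespace format can be checked before you type it into the Namespace field. The sketch below splits a namespace at the first dot, on the understanding that MongoDB database names cannot contain a dot while collection names can. The helper name `split_namespace` is hypothetical.

```python
def split_namespace(namespace):
    """Split '<database>.<collection>' into (database, collection).

    Splits on the first dot only, so a collection name that itself
    contains dots stays intact.
    """
    database, separator, collection = namespace.partition(".")
    if not separator or not database or not collection:
        raise ValueError(f"expected <database>.<collection>, got {namespace!r}")
    return database, collection


print(split_namespace("sample_mflix.movies"))  # ('sample_mflix', 'movies')
```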
Specify the criteria for selecting documents to archive for the type of collection you want to archive.
Note: Atlas runs an index sufficiency query to determine the efficiency of the archival process. If the ratio of the number of documents scanned to the number of documents returned is 10 or more, the query result triggers an Index Sufficiency Warning. This warning indicates that you have insufficient indexes for an efficient archival process. For date-based archives, you must index the date field. For custom criteria that use an expression, Atlas might first convert a value before it evaluates it against the query.
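The threshold in the note can be expressed as a simple ratio check. This is only a sketch of the arithmetic: the function name is hypothetical, and the handling of a query that returns no documents is an assumption, since the note does not cover that case.

```python
def triggers_index_sufficiency_warning(docs_scanned, docs_returned, threshold=10):
    """Return True when scanned/returned is at or above the threshold."""
    if docs_returned == 0:
        # Assumption: scanning documents but returning none counts as inefficient.
        return docs_scanned > 0
    return docs_scanned / docs_returned >= threshold


print(triggers_index_sufficiency_warning(5000, 100))  # True  (ratio 50)
print(triggers_index_sufficiency_warning(120, 100))   # False (ratio 1.2)
```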
Click Next to specify the most commonly queried fields.¶
(Optional) Specify the two most frequently queried fields in your collection to create partitions in your online archive.¶
Enter up to two most commonly queried fields from the collection in the Second most commonly queried field and Third most commonly queried field fields respectively. To specify nested fields, use the dot notation. Do not include quotes ("") around nested fields that you specify using dot notation.
The specified fields are used to partition your archived data. Partitions are similar to folders. The date field is in the first position of the partition by default. You can move another field to the first position of the partition if you frequently query by that field.
The order of fields listed in the path is important in the same way as it is in Compound Indexes. Data in the specified path is partitioned first by the value of the first field, and then by the value of the next field, and so on. Atlas supports queries on the specified fields using the partitions.
For example, suppose you are configuring the online archive for the movies collection in the sample_mflix database. If your archived field is the released date field, which you moved to the third position, your first queried field is title, and your second queried field is plot, your partition will look similar to the following:
/title/plot/released
Atlas creates partitions first for the title field, followed by the plot field, and then the released field. Atlas uses the partitions for queries on the following fields:
- the title field,
- the title field and the plot field,
- the title field and the plot field and the released field.
Atlas can also use the partitions to support a query on the title and released fields. However, in this case, Atlas would not be as efficient in supporting the query as it would be if the query were on the title and plot fields only. Partitions are parsed in order; if a query omits a particular partition, Atlas is less efficient in making use of any partitions that follow that. Since a query on title and released omits plot, Atlas uses the title partition more efficiently than the released partition to support this query.
Atlas can't use the partitions to support queries on fields not specified here. Also, Atlas can't use the partitions to support queries that include the following fields without the title field:
- the plot field,
- the released field, or
- the plot and released fields.
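The ordering rules in the example above can be modeled with a small sketch. It labels each partition field in a query as used efficiently, used less efficiently (because an earlier partition field was omitted), or not usable at all (because the first partition field is missing). The function name and the labels are hypothetical; this is a simplified model of the described behavior, not Atlas's query planner.

```python
def classify_partition_use(partition_order, query_fields):
    """Label how each queried partition field can be used.

    partition_order: partition fields in path order, e.g. title/plot/released.
    query_fields: set of field names the query filters on.
    Returns an empty dict when the first partition field is not queried.
    """
    if not partition_order or partition_order[0] not in query_fields:
        return {}
    usage = {}
    gap_seen = False
    for field in partition_order:
        if field in query_fields:
            usage[field] = "less efficient" if gap_seen else "efficient"
        else:
            # An omitted partition makes every later partition less efficient.
            gap_seen = True
    return usage


order = ["title", "plot", "released"]
print(classify_partition_use(order, {"title", "plot"}))
# {'title': 'efficient', 'plot': 'efficient'}
print(classify_partition_use(order, {"title", "released"}))
# {'title': 'efficient', 'released': 'less efficient'}
print(classify_partition_use(order, {"plot", "released"}))
# {}
```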
The value of a partition field can be up to a maximum of 700 characters. Documents with values exceeding 700 characters are not archived.
- Choose fields that do not contain polymorphic data. Atlas determines the data type of a partition field by sampling 10 documents from the collection. Atlas will not archive a document if the specified field value in a document does not match values in other documents in the same collection.
- Choose query fields that do not have a large number of possible values unless you always use those fields in your queries. Query fields, such as _id, with a possibly large number of values can cause operations such as count to open all partitions, resulting in high latency.
- Choose fields that you query frequently and order them from the most frequently queried in the first position to the least queried field in the last position. For example, if you frequently query on the date field, then leave the date field in the first position. But if you frequently query on another field, then that field should be in the first position.
For fields of type string with high cardinality, Atlas creates a large number of partitions. MongoDB doesn't recommend string type fields with high cardinality as a query field.
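Before choosing a partition field, you might want to screen a sample of documents against the guidance above. The sketch below reports whether a field holds more than one value type, how many string values exceed 700 characters, and how many distinct values appear. It works on plain Python dictionaries and top-level fields only; the function name, the report keys, and the restriction of the length check to strings are assumptions for this example.

```python
def check_partition_field(documents, field, max_chars=700):
    """Summarize a candidate partition field over sample documents."""
    values = [doc[field] for doc in documents if field in doc]
    type_names = sorted({type(value).__name__ for value in values})
    too_long = sum(
        1 for value in values if isinstance(value, str) and len(value) > max_chars
    )
    return {
        "types": type_names,
        "polymorphic": len(type_names) > 1,
        "too_long": too_long,
        # Values are assumed hashable (strings, numbers, dates).
        "distinct_values": len(set(values)),
    }


# Illustrative sample documents, not real data.
sample = [
    {"title": "A", "year": 1999},
    {"title": "B", "year": "2001"},
    {"title": "x" * 701, "year": 2005},
]
print(check_partition_field(sample, "year")["polymorphic"])  # True
print(check_partition_field(sample, "title")["too_long"])    # 1
```

A high `distinct_values` count on a string field is a hint that the field would produce a large number of partitions.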
Atlas supports the following partition attribute types:
- date
- int
- long
- objectId
- string
- uuid
Note: Partition fields of type UUID must be of binary subtype 4. Atlas skips partition fields of type UUID with subtype 3.
To learn more about the supported partition attribute types, see Partition Attribute Types.
While partitions improve query performance, queries that don't contain these fields require a full collection scan of all archived documents, which will take longer and increase your costs. To learn more about how partitions improve your query performance in Data Lake, see Data Structure in S3.
Click Next to verify and confirm the online archive settings.¶
Copy and run the displayed query in your mongosh shell to see the documents that match the criteria in the rule you defined in step 5.¶
You can run explain on the query to check whether it uses an index. Proceed to the next step to create the index if the fields are not indexed. If the fields are already indexed, skip to step 11.
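Once you have the explain output, one way to check for index use is to look for an IXSCAN stage in the winning plan. The sketch below walks a plan that has already been loaded as a Python dictionary. It assumes the classic plan shape, in which stages nest under `inputStage` or `inputStages`; newer server versions may wrap the plan differently. The sample plans and the index name `released_1` are hand-written illustrations, not captured output.

```python
def plan_uses_index(plan):
    """Return True if any stage in the plan tree is an index scan."""
    if plan.get("stage") == "IXSCAN":
        return True
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    return any(plan_uses_index(child) for child in children)


# Hypothetical winning plans for illustration only.
indexed_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "released_1"},
}
unindexed_plan = {"stage": "COLLSCAN"}

print(plan_uses_index(indexed_plan))    # True
print(plan_uses_index(unindexed_plan))  # False
```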
Verify and confirm your archiving rule.¶
- Click Begin Archiving in the Confirm an online archive tab.
- Click Confirm in the Begin Archiving window.
Once your document is queued for archiving, you can no longer edit the document. See Restore Archived Data to move archived data back into the live Atlas cluster.
Configure Online Archive Through the API¶
To configure an online archive from the API, send a POST request to the onlineArchives endpoint. If the cluster already has an Active online archive with the same archiving rule for the same database and collection, the operation will fail. However, if the existing online archive is in Paused or Deleted state, the new online archive is created and its status is set to Active.
To learn more about the API syntax and options, see Create an Online Archive.
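As a rough illustration, the sketch below assembles a JSON request body for a date-based archive on the sample_mflix.movies example used earlier. The body field names (`dbName`, `collName`, `criteria`, `dateField`, `expireAfterDays`, `partitionFields`, `fieldName`, `order`) and the zero-based `order` values are assumptions made for this example; confirm them against Create an Online Archive before use. The helper name is hypothetical, the 5-day value is arbitrary, and the code only builds the payload without sending a request.

```python
import json


def build_archive_request_body(namespace, date_field, expire_after_days, partition_fields):
    """Build a request body for a date-based online archive (sketch only)."""
    db_name, _, coll_name = namespace.partition(".")
    return {
        "dbName": db_name,
        "collName": coll_name,
        "criteria": {
            "type": "DATE",  # assumed value for a date-based rule
            "dateField": date_field,
            "expireAfterDays": expire_after_days,
        },
        # Assumption: positions are zero-based and follow the partition path order.
        "partitionFields": [
            {"fieldName": name, "order": position}
            for position, name in enumerate(partition_fields)
        ],
    }


body = build_archive_request_body(
    "sample_mflix.movies", "released", 5, ["title", "plot", "released"]
)
print(json.dumps(body, indent=2))
```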
Limitations¶
You can create up to 50 online archives per cluster and up to 20 can be active per cluster. The following limitations apply:
- You can configure multiple online archives in the same namespace, but only one can be active at any given time.
- You cannot create multiple online archives on the same fields in the same collection.