Sample Training Dataset

On this page

  • Collections
  • sample_training.companies
  • sample_training.grades
  • sample_training.inspections
  • sample_training.posts
  • sample_training.routes
  • sample_training.trips
  • sample_training.zips

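The sample_training database contains a set of realistic data used in MongoDB Private Training Offerings. This dataset is based on publicly available data sources, including Crunchbase, NYC OpenData, Open Flights, and Citibike, as noted in the collection descriptions on this page.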

These realistic datasets are used by our students to explore MongoDB's functionality across our private training labs and exercises.

To learn how to load the sample data provided by Atlas into your cluster, see Load Sample Data.

The sample_training database contains the following collections:

Collection Name | Description
companies | Contains company information from Crunchbase data.
grades | Contains student grade information for a given class, including scores on different assessments.
inspections | Contains a list of New York City business inspections, including whether the business passed or failed the inspection.
posts | Contains randomized US Senate speeches organized as blog posts with randomly generated comments.
routes | Contains information on airline routes, with source and destination airports, the service airline, and the type of airplane. This collection is used in labs that explore the $graphLookup aggregation stage.
trips | Contains New York City Citibike trip data. This data is useful to explore the $graphLookup aggregation stage and to showcase Geospatial Queries.
zips | Contains general United States city postal/zip code data.

sample_training.companies

This collection contains information on companies listed on Crunchbase, such as the company website and blog, funding rounds, and known individuals associated with the company.

This collection contains the following indexes:

Name | Index | Description
_id_ | { "_id": 1 } | Primary key index on the _id field.

Sample document:

{
  "_id": { "$oid": "52cdef7c4bab8bd675298291" },
  "acquisition": null,
  "acquisitions": [],
  "alias_list": null,
  "blog_feed_url": "http://mobiance.wordpress.com/feed/",
  "blog_url": "http://mobiance.wordpress.com/",
  "category_code": "web",
  "competitions": [],
  "created_at": "Tue Feb 12 17:31:58 UTC 2008",
  "crunchbase_url": "http://www.crunchbase.com/company/mobiance",
  "deadpooled_day": null,
  "deadpooled_month": null,
  "deadpooled_url": null,
  "deadpooled_year": null,
  "description": null,
  "email_address": "info@mobiance.com",
  "external_links": [],
  "founded_day": { "$numberInt": "1" },
  "founded_month": { "$numberInt": "10" },
  "founded_year": { "$numberInt": "2004" },
  "funding_rounds": [],
  "homepage_url": "http://www.mobiance.com",
  "image": {
    "attribution": null,
    "available_sizes": [
      [
        [{ "$numberInt": "150" }, { "$numberInt": "43" }],
        "assets/images/resized/0001/1859/11859v1-max-150x150.png"
      ],
      [
        [{ "$numberInt": "208" }, { "$numberInt": "60" }],
        "assets/images/resized/0001/1859/11859v1-max-250x250.png"
      ],
      [
        [{ "$numberInt": "208" }, { "$numberInt": "60" }],
        "assets/images/resized/0001/1859/11859v1-max-450x450.png"
      ]
    ]
  },
  "investments": [],
  "ipo": null,
  "milestones": [],
  "name": "Mobiance",
  "number_of_employees": { "$numberInt": "5" },
  "offices": [
    {
      "address1": "BC-3, Atrium Business Center,",
      "address2": "Coles Road, Frazer Town,",
      "city": "Bangalore",
      "country_code": "IND",
      "description": null,
      "latitude": null,
      "longitude": null,
      "state_code": null,
      "zip_code": "560005"
    }
  ],
  "overview": "<p>Mobiance provides the technology to track cell phones ...",
  "partners": [],
  "permalink": "mobiance",
  "phone_number": "+91-80- 41264756",
  "products": [],
  "providerships": [],
  "relationships": [
    {
      "is_past": true,
      "person": {
        "first_name": "Ritesh",
        "last_name": "Ambastha",
        "permalink": "ritesh-ambastha"
      },
      "title": "Product Manager"
    }
  ],
  "screenshots": [],
  "tag_list": null,
  "total_money_raised": "$0",
  "twitter_username": null,
  "updated_at": "Thu Dec 01 07:37:10 UTC 2011",
  "video_embeds": []
}
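
The sample documents on this page are shown in MongoDB Extended JSON, where typed values appear as wrapper objects such as { "$numberInt": "2004" } or { "$oid": "..." }. The sketch below is a minimal, standard-library-only illustration of how such wrappers map to plain Python values. The unwrap helper is hypothetical and assumes the canonical form used in these samples; in practice a driver utility (for example bson.json_util in PyMongo) handles this more completely.

```python
from datetime import datetime, timezone

# Hypothetical helper, not part of any MongoDB driver: converts the
# Extended JSON wrappers that appear in the sample documents on this page.
_NUMBER_WRAPPERS = {"$numberInt": int, "$numberLong": int, "$numberDouble": float}


def unwrap(value):
    """Recursively replace Extended JSON wrapper objects with Python values."""
    if isinstance(value, list):
        return [unwrap(item) for item in value]
    if not isinstance(value, dict):
        return value
    if len(value) == 1:
        (key, inner), = value.items()
        if key in _NUMBER_WRAPPERS:
            return _NUMBER_WRAPPERS[key](inner)
        if key == "$oid":
            return inner  # keep the ObjectId as its hex string
        if key == "$date":
            # Assumes the canonical form {"$date": {"$numberLong": "<ms>"}}.
            millis = unwrap(inner)  # milliseconds since the Unix epoch
            return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    # Ordinary subdocument: unwrap each field in turn.
    return {k: unwrap(v) for k, v in value.items()}


company = {
    "_id": {"$oid": "52cdef7c4bab8bd675298291"},
    "name": "Mobiance",
    "founded_year": {"$numberInt": "2004"},
    "number_of_employees": {"$numberInt": "5"},
}
plain = unwrap(company)
print(plain["founded_year"], plain["number_of_employees"])
```

The same helper applies to the $date and $numberDouble wrappers that appear in the other sample documents.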

sample_training.grades

This collection contains randomly generated student grades. Each document contains a class_id that identifies the class and a student_id that identifies the student. All of the student's scores for the class are stored in the scores array, whose subdocuments each have two fields: the type of assessment (type) and the student's score for that assessment (score).

This collection contains the following indexes:

Name | Index | Description
_id_ | { "_id": 1 } | Primary key index on the _id field.

Sample document:

{
  "_id": { "$oid": "56d5f7eb604eb380b0d8d8fa" },
  "class_id": { "$numberDouble": "173" },
  "scores": [
    { "score": { "$numberDouble": "19.81430597438296" }, "type": "exam" },
    { "score": { "$numberDouble": "16.851404299968642" }, "type": "quiz" },
    { "score": { "$numberDouble": "60.108751761488186" }, "type": "homework" },
    { "score": { "$numberDouble": "22.886167083915776" }, "type": "homework" }
  ],
  "student_id": { "$numberDouble": "4" }
}
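
Because each assessment is a subdocument in the scores array, per-type statistics require iterating over (or unwinding) that array. The following is a minimal sketch, assuming the document shape shown above with numeric values already converted from Extended JSON. The average_by_type function is a hypothetical helper, and the pipeline is one possible server-side equivalent built from the $unwind and $group stages, not necessarily the approach used in the training labs.

```python
from collections import defaultdict


def average_by_type(grade_doc):
    """Return the mean score for each assessment type in one grades document."""
    totals = defaultdict(lambda: [0.0, 0])  # type -> [sum of scores, count]
    for entry in grade_doc["scores"]:
        totals[entry["type"]][0] += entry["score"]
        totals[entry["type"]][1] += 1
    return {kind: total / count for kind, (total, count) in totals.items()}


# One possible aggregation pipeline computing the same averages per student
# and class on the server (an illustration, not taken from the lab material).
AVERAGE_PIPELINE = [
    {"$unwind": "$scores"},
    {"$group": {
        "_id": {
            "student_id": "$student_id",
            "class_id": "$class_id",
            "type": "$scores.type",
        },
        "average": {"$avg": "$scores.score"},
    }},
]

sample = {
    "class_id": 173,
    "student_id": 4,
    "scores": [
        {"type": "exam", "score": 19.81430597438296},
        {"type": "quiz", "score": 16.851404299968642},
        {"type": "homework", "score": 60.108751761488186},
        {"type": "homework", "score": 22.886167083915776},
    ],
}
print(average_by_type(sample))
```

The sample document has two homework entries, so only the homework average differs from the raw scores.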

sample_training.inspections

The inspections collection was taken from the NYC OpenData dataset. Each inspections document contains information about:

  • The inspected business name, sector and address,
  • Inspection id, result, date and certificate number.

This collection contains the following indexes:

Name | Index | Description
_id_ | { "_id": 1 } | Primary key index on the _id field.

Sample document:

{
  "_id": { "$oid": "56d61033a378eccde8a8357e" },
  "address": {
    "city": "LAWRENCE",
    "number": 1,
    "street": "BAY BLVD",
    "zip": 11559
  },
  "business_name": "SPRAGUE OPERATING RESOURCES LLC.",
  "certificate_number": 3019422,
  "date": "Mar 3 2015",
  "id": "11247-2015-ENFO",
  "result": "Fail",
  "sector": "Fuel Oil Dealer - 814"
}

sample_training.posts

The posts collection is a set of randomly generated blog posts created using US Senate speeches as the seed for the document body field. Each document contains:

  • Information on the blog posts like body text, author, permalink, date and title,
  • Randomly generated list of tags,
  • Randomly generated list of comment subdocuments.

This collection contains the following indexes:

Name | Index | Description
_id_ | { "_id": 1 } | Primary key index on the _id field.

Sample document:

{
  "_id": { "$oid": "50ab0f8bbcf1bfe2536dc3f9" },
  "author": "machine",
  "body": "Amendment I\n<p>Congress shall make no law respecting ... ",
  "comments": [
    {
      "author": "Santiago Dollins",
      "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...",
      "email": "HvizfYVx@pKvLaagH.com"
    },
    {
      "author": "Jaclyn Morado",
      "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...",
      "email": "WpOUCpdD@hccdxJvT.com"
    }
    ...
  ],
  "date": { "$date": { "$numberLong": "1332804016000" } },
  "permalink": "aRjNnLZkJkTyspAIoRGe",
  "tags": [
    "watchmaker",
    "santa",
    "xylophone",
    "math",
    "handsaw",
    "dream",
    "undershirt",
    "dolphin",
    "tanker",
    "action"
  ],
  "title": "Bill of Rights"
}

sample_training.routes

The routes collection data was sourced from Open Flights data. The documents in this collection contain information on airline routes between airports.

Each document contains information about:

  • Airline data in a subdocument containing the name, alias, unique identifier and the IATA airline code,
  • The source and destination airports, identified by their IATA airport codes,
  • Route codeshare and the number of stops.

This collection contains the following indexes:

Name | Index | Description
_id_ | { "_id": 1 } | Primary key index on the _id field.

Sample document:

{
  "_id": { "$oid": "56e9b39b732b6122f877fa5c" },
  "airline": {
    "alias": "2G",
    "iata": "CRG",
    "id": 1654,
    "name": "Cargoitalia"
  },
  "airplane": "A81",
  "codeshare": "",
  "dst_airport": "OVB",
  "src_airport": "BTK",
  "stops": 0
}
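
The collection overview notes that routes is used in labs exploring the $graphLookup aggregation stage. As a rough sketch of what such a query might look like, the hypothetical helper below builds a pipeline that starts from the routes leaving one airport and recursively follows dst_airport to src_airport to find onward connections. The function name, the hops and connections output fields, and the default depth are assumptions for illustration; the pipeline would be passed to a driver call such as PyMongo's Collection.aggregate.

```python
def connections_pipeline(origin, max_depth=1):
    """Build a $graphLookup pipeline for routes reachable from `origin`.

    max_depth=0 returns only the routes departing from each direct
    destination; each increment allows one more hop.
    """
    return [
        # Start from the direct routes that leave the origin airport.
        {"$match": {"src_airport": origin}},
        {"$graphLookup": {
            "from": "routes",                  # self-lookup on the same collection
            "startWith": "$dst_airport",       # begin at each route's destination
            "connectFromField": "dst_airport",
            "connectToField": "src_airport",
            "maxDepth": max_depth,
            "depthField": "hops",              # assumed name for the recursion depth
            "as": "connections",               # assumed name for the output array
        }},
    ]


pipeline = connections_pipeline("BTK")
# With a live connection this could run as, for example:
#   client["sample_training"]["routes"].aggregate(pipeline)
print(pipeline[1]["$graphLookup"]["connectToField"])
```

Keeping maxDepth small matters here, since an unbounded traversal of a route network can return very large result arrays.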

sample_training.trips

The trips collection contains bike trip data from the New York City Citibike service. The documents are composed of:

  • Bicycle unique identifier,
  • Trip start and stop time and date,
  • Trip start and end station names and geospatial locations,
  • User information such as gender, year of birth and service type (Customer or Subscriber).

This collection contains the following indexes:

Name | Index | Description
_id_ | { "_id": 1 } | Primary key index on the _id field.

Sample document:

{
  "_id": { "$oid": "572bb8222b288919b68abf82" },
  "bikeid": 14785,
  "birth year": 1977,
  "end station id": 433,
  "end station location": {
    "coordinates": [-73.98057249, 40.72955361],
    "type": "Point"
  },
  "end station name": "E 13 St & Avenue A",
  "gender": 1,
  "start station id": 518,
  "start station location": {
    "coordinates": [-73.9734419, 40.74780373],
    "type": "Point"
  },
  "start station name": "E 39 St & 2 Ave",
  "start time": { "$date": { "$numberLong": "1332804016000" } },
  "stop time": { "$date": { "$numberLong": "1352114016000" } },
  "tripduration": 812,
  "usertype": "Subscriber"
}
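
The station locations are stored as GeoJSON Point objects, whose coordinates array lists longitude first and latitude second. As a small illustration of working with these values client-side, the sketch below estimates the straight-line distance between the start and end stations of the sample trip using the haversine formula. The function name and the mean Earth radius of 6371 km are assumptions of this example; server-side geospatial queries on GeoJSON data would normally rely on a 2dsphere index instead.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(point_a, point_b):
    """Great-circle distance in km between two GeoJSON Point objects."""
    lon1, lat1 = map(radians, point_a["coordinates"])  # GeoJSON order: [lon, lat]
    lon2, lat2 = map(radians, point_b["coordinates"])
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Station locations copied from the sample trip document.
start = {"type": "Point", "coordinates": [-73.9734419, 40.74780373]}
end = {"type": "Point", "coordinates": [-73.98057249, 40.72955361]}
print(f"{haversine_km(start, end):.2f} km")
```

For the sample trip the two stations are roughly 2 km apart. Swapping longitude and latitude is a common mistake with GeoJSON data and would give a noticeably different result.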

sample_training.zips

The zips collection contains information on US cities and their postal/zip codes. Each document contains the city name, zip code, city center coordinates (latitude and longitude), state, and population.

This dataset is used to explore 2d index creation and queries.

This collection contains the following indexes:

Name | Index | Description
_id_ | { "_id": 1 } | Primary key index on the _id field.

Sample document:

{
  "_id": { "$oid": "5c8eccc1caa187d17ca6ed29" },
  "city": "CLEVELAND",
  "loc": {
    "x": 86.559355,
    "y": 33.992106
  },
  "pop": 2369,
  "state": "AL",
  "zip": "35049"
}
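
Since this collection is used to explore 2d index creation and queries, the sketch below shows, as plain Python data structures, what an index specification on the loc field and a $geoWithin / $box filter could look like. The helper name and the bounding-box values are hypothetical, and the commented driver calls assume PyMongo.

```python
# Index specification for a legacy-coordinate ("2d") index on the loc field.
LOC_2D_INDEX = [("loc", "2d")]


def within_box_query(min_x, min_y, max_x, max_y):
    """Build a filter for zips whose loc falls inside a rectangular box."""
    if min_x > max_x or min_y > max_y:
        raise ValueError("box corners must be given as bottom-left, upper-right")
    return {
        "loc": {
            "$geoWithin": {
                # $box takes the bottom-left and upper-right corners.
                "$box": [[min_x, min_y], [max_x, max_y]],
            }
        }
    }


query = within_box_query(86.0, 33.5, 87.0, 34.5)  # illustrative bounds only
# With a live connection this could be used as, for example:
#   zips = client["sample_training"]["zips"]
#   zips.create_index(LOC_2D_INDEX)
#   zips.find(query)
print(query)
```

The bounds above were chosen only so that the sample document's loc values (x 86.559355, y 33.992106) fall inside the box.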
© 2021 MongoDB, Inc.
