Sample Training Dataset¶
The sample_training database contains a set of realistic data used in MongoDB Private Training Offerings.
This dataset is based on publicly available data sources such as Crunchbase, NYC OpenData, Open Flights, and New York City Citibike.
These realistic datasets are used by our students to explore MongoDB's functionality across our private training labs and exercises.
To learn how to load the sample data provided by Atlas into your cluster, see Load Sample Data.
Collections¶
The sample_training database contains the following collections:
Collection Name | Description |
---|---|
companies | Contains a list of Crunchbase company information. |
grades | Contains student grade information for a given class, including scores on different assessments. |
inspections | Contains a list of New York City business inspections, including whether the business passed or failed the inspection. |
posts | Contains randomized US Senate speeches organized as blog posts with randomly generated comments. |
routes | Contains information on airline routes, including the source and destination airports, the operating airline, and the type of airplane. This collection is used in labs that explore the $graphLookup aggregation stage. |
trips | Contains New York City Citibike trip data. This data is useful for exploring the $graphLookup aggregation stage and for showcasing geospatial queries. |
zips | Contains United States city postal/zip code data. |
sample_training.companies¶
This collection contains information on companies listed on Crunchbase, including each company's website and blog URLs, funding rounds, and known individuals associated with the company.
Indexes¶
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document¶
{ "_id": { "$oid": "52cdef7c4bab8bd675298291" }, "acquisition": null, "acquisitions": [], "alias_list": null, "blog_feed_url": "http://mobiance.wordpress.com/feed/", "blog_url": "http://mobiance.wordpress.com/", "category_code": "web", "competitions": [], "created_at": "Tue Feb 12 17:31:58 UTC 2008", "crunchbase_url": "http://www.crunchbase.com/company/mobiance", "deadpooled_day": null, "deadpooled_month": null, "deadpooled_url": null, "deadpooled_year": null, "description": null, "email_address": "info@mobiance.com", "external_links": [], "founded_day": { "$numberInt": "1" }, "founded_month": { "$numberInt": "10" }, "founded_year": { "$numberInt": "2004" }, "funding_rounds": [], "homepage_url": "http://www.mobiance.com", "image": { "attribution": null, "available_sizes": [ [ [ { "$numberInt": "150" }, { "$numberInt": "43" } ], "assets/images/resized/0001/1859/11859v1-max-150x150.png" ], [ [ { "$numberInt": "208" }, { "$numberInt": "60" } ], "assets/images/resized/0001/1859/11859v1-max-250x250.png" ], [ [ { "$numberInt": "208" }, { "$numberInt": "60" } ], "assets/images/resized/0001/1859/11859v1-max-450x450.png" ] ] }, "investments": [], "ipo": null, "milestones": [], "name": "Mobiance", "number_of_employees": { "$numberInt": "5" }, "offices": [ { "address1": "BC-3, Atrium Business Center,", "address2": "Coles Road, Frazer Town,", "city": "Bangalore", "country_code": "IND", "description": null, "latitude": null, "longitude": null, "state_code": null, "zip_code": "560005" } ], "overview": "<p>Mobiance provides the technology to track cell phones ...", "partners": [], "permalink": "mobiance", "phone_number": "+91-80- 41264756", "products": [], "providerships": [], "relationships": [ { "is_past": true, "person": { "first_name": "Ritesh", "last_name": "Ambastha", "permalink": "ritesh-ambastha" }, "title": "Product Manager" } ], "screenshots": [], "tag_list": null, "total_money_raised": "$0", "twitter_username": null, "updated_at": "Thu Dec 01 07:37:10 UTC 2011", 
"video_embeds": [] }
sample_training.grades¶
This collection contains randomly generated student grades.
Each document contains a class_id that identifies the class and a student_id that identifies the student.
All of the student's assessment scores for the class are stored in the scores array, which contains subdocuments with two fields: the type of assessment and the student's score for that assessment.
Indexes¶
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document¶
{ "_id": { "$oid": "56d5f7eb604eb380b0d8d8fa" }, "class_id": { "$numberDouble": "173" }, "scores": [ { "score": { "$numberDouble": "19.81430597438296" }, "type": "exam" }, { "score": { "$numberDouble": "16.851404299968642" }, "type": "quiz" }, { "score": { "$numberDouble": "60.108751761488186" }, "type": "homework" }, { "score": { "$numberDouble": "22.886167083915776" }, "type": "homework" } ], "student_id": { "$numberDouble": "4" } }
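The sample document above is shown in canonical Extended JSON, where numbers are wrapped as `{ "$numberDouble": "..." }`. As a minimal sketch (the helper names `unwrap` and `average_by_type` are made up for this example and are not part of any MongoDB library), the wrappers can be stripped and the scores averaged per assessment type in plain Python:

```python
from collections import defaultdict

# Fragment of the sample grades document, in canonical Extended JSON form.
doc = {
    "class_id": {"$numberDouble": "173"},
    "student_id": {"$numberDouble": "4"},
    "scores": [
        {"score": {"$numberDouble": "19.81430597438296"}, "type": "exam"},
        {"score": {"$numberDouble": "16.851404299968642"}, "type": "quiz"},
        {"score": {"$numberDouble": "60.108751761488186"}, "type": "homework"},
        {"score": {"$numberDouble": "22.886167083915776"}, "type": "homework"},
    ],
}


def unwrap(value):
    """Recursively replace Extended JSON number wrappers with Python numbers."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in ("$numberInt", "$numberLong"):
                return int(inner)
            if key == "$numberDouble":
                return float(inner)
        return {k: unwrap(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap(v) for v in value]
    return value


def average_by_type(scores):
    """Group scores by assessment type and return the mean of each group."""
    grouped = defaultdict(list)
    for entry in scores:
        grouped[entry["type"]].append(entry["score"])
    return {kind: sum(vals) / len(vals) for kind, vals in grouped.items()}


plain = unwrap(doc)
averages = average_by_type(plain["scores"])
print(averages)
```

On the server side, a comparable grouping could likely be expressed with the `$unwind` and `$group` aggregation stages; the sketch above only illustrates the document shape.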
sample_training.inspections¶
The inspections collection was taken from the NYC OpenData dataset.
Each inspections document contains information about:
- The inspected business name, sector, and address,
- The inspection ID, result, date, and certificate number.
Indexes¶
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document¶
{ "_id": { "$oid": "56d61033a378eccde8a8357e" }, "address": { "city": "LAWRENCE", "number": 1, "street": "BAY BLVD", "zip": 11559 }, "business_name": "SPRAGUE OPERATING RESOURCES LLC.", "certificate_number": 3019422, "date": "Mar 3 2015", "id": "11247-2015-ENFO", "result": "Fail", "sector": "Fuel Oil Dealer - 814" }
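In the sample document above, the inspection date is stored as a string rather than as a date type. The following sketch shows one way to parse it in Python. It assumes that every document uses the same `"Mar 3 2015"` layout as the sample, which this page does not confirm, and the helper name `parse_inspection_date` is hypothetical.

```python
from datetime import datetime


def parse_inspection_date(text):
    """Parse a date string shaped like the sample value "Mar 3 2015"."""
    # %b matches an abbreviated month name; this assumes an English locale.
    return datetime.strptime(text, "%b %d %Y").date()


# Fields copied from the sample inspections document.
inspection = {"id": "11247-2015-ENFO", "date": "Mar 3 2015", "result": "Fail"}

parsed = parse_inspection_date(inspection["date"])
print(parsed.isoformat())
```

Converting the string to a real date makes range comparisons and sorting by date behave as expected, which string comparison on this format would not.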
sample_training.posts¶
The posts collection is a set of randomly generated blog posts created using US Senate speeches as the seed for the document body field.
Each document contains:
- Information about the blog post, such as body text, author, permalink, date, and title,
- A randomly generated list of tags,
- A randomly generated list of comment subdocuments.
Indexes¶
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document¶
{ "_id": { "$oid": "50ab0f8bbcf1bfe2536dc3f9" }, "author": "machine", "body": "Amendment I\n<p>Congress shall make no law respecting ... ", "comments": [ { "author": "Santiago Dollins", "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...", "email": "HvizfYVx@pKvLaagH.com" }, { "author": "Jaclyn Morado", "body": "Lorem ipsum dolor sit amet, consectetur adipisicing...", "email": "WpOUCpdD@hccdxJvT.com" } ... ], "date": { "$date": { "$numberLong": "1332804016000" } }, "permalink": "aRjNnLZkJkTyspAIoRGe", "tags": [ "watchmaker", "santa", "xylophone", "math", "handsaw", "dream", "undershirt", "dolphin", "tanker", "action" ], "title": "Bill of Rights" }
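The date field in the sample document above uses the canonical Extended JSON form `{ "$date": { "$numberLong": "..." } }`, which holds milliseconds since the Unix epoch. As a small sketch using only the Python standard library (the function name `decode_date` is made up for this example), the value can be converted to a timezone-aware datetime:

```python
from datetime import datetime, timezone


def decode_date(wrapper):
    """Convert {"$date": {"$numberLong": "<ms>"}} into a UTC datetime."""
    millis = int(wrapper["$date"]["$numberLong"])
    seconds, remainder_ms = divmod(millis, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=remainder_ms * 1000)


# Value copied from the "date" field of the sample posts document.
post_date = decode_date({"$date": {"$numberLong": "1332804016000"}})
print(post_date.isoformat())
```

A driver would normally perform this conversion automatically; the sketch is only meant to show what the wrapped value represents.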
sample_training.routes¶
The routes collection data was sourced from the Open Flights data.
The documents in this collection contain information on airline routes between airports.
Each document contains information about:
- Airline data in a subdocument containing the name, alias, unique identifier, and IATA airline code,
- The source and destination airports, identified by their IATA airport codes,
- The route codeshare and the number of stops.
Indexes¶
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document¶
{ "_id": { "$oid": "56e9b39b732b6122f877fa5c" }, "airline": { "alias": "2G", "iata": "CRG", "id": 1654, "name": "Cargoitalia" }, "airplane": "A81", "codeshare": "", "dst_airport": "OVB", "src_airport": "BTK", "stops": 0 }
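Because each route links a source airport to a destination airport, the collection can be read as a directed graph, which is why it suits the `$graphLookup` stage. The sketch below is illustrative only: the pipeline is not taken from the training labs, and only the first entry in `routes` corresponds to the sample document (the other routes are made up). The `reachable` helper is a rough in-memory analogue of the traversal, not a reimplementation of `$graphLookup`.

```python
from collections import deque

# A pipeline of the kind a $graphLookup exercise might use (an assumption,
# not copied from any lab). Starting from each matched route's destination,
# it follows routes whose src_airport equals the previous dst_airport.
# maxDepth 0 would return only routes departing from that destination;
# each additional level follows one more flight.
pipeline = [
    {"$match": {"src_airport": "BTK"}},
    {"$graphLookup": {
        "from": "routes",
        "startWith": "$dst_airport",
        "connectFromField": "dst_airport",
        "connectToField": "src_airport",
        "maxDepth": 1,
        "as": "connections",
    }},
]

# Hypothetical routes shaped like documents in this collection.
routes = [
    {"src_airport": "BTK", "dst_airport": "OVB", "stops": 0},  # from the sample
    {"src_airport": "OVB", "dst_airport": "SVO", "stops": 0},  # made up
    {"src_airport": "SVO", "dst_airport": "JFK", "stops": 0},  # made up
    {"src_airport": "JFK", "dst_airport": "BTK", "stops": 0},  # made up
]


def reachable(routes, origin, max_flights):
    """Breadth-first search mapping each reachable airport to its fewest flights."""
    outgoing = {}
    for route in routes:
        outgoing.setdefault(route["src_airport"], []).append(route["dst_airport"])
    flights = {origin: 0}
    queue = deque([origin])
    while queue:
        airport = queue.popleft()
        if flights[airport] == max_flights:
            continue  # do not expand beyond the flight limit
        for nxt in outgoing.get(airport, []):
            if nxt not in flights:
                flights[nxt] = flights[airport] + 1
                queue.append(nxt)
    del flights[origin]
    return flights


within_two = reachable(routes, "BTK", 2)
within_three = reachable(routes, "BTK", 3)
print(within_two, within_three)
```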
sample_training.trips¶
The trips collection contains bike trip data from the New York City Citibike service.
Each document contains:
- The bicycle's unique identifier,
- The trip start and stop date and time,
- The names and geospatial locations of the start and end stations,
- User information such as gender, year of birth, and service type (Customer or Subscriber).
Indexes¶
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document¶
{ "_id": { "$oid": "572bb8222b288919b68abf82" }, "bikeid": 14785, "birth year": 1977, "end station id": 433, "end station location": { "coordinates": [ -73.98057249, 40.72955361 ], "type": "Point" }, "end station name": "E 13 St & Avenue A", "gender": 1, "start station id": 518, "start station location": { "coordinates": [ -73.9734419, 40.74780373 ], "type": "Point" }, "start station name": "E 39 St & 2 Ave", "start time": { "$date": { "$numberLong": "1332804016000" } }, "stop time": { "$date": { "$numberLong": "1352114016000" } }, "tripduration": 812, "usertype": "Subscriber" }
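The station locations in the sample document above are GeoJSON points, whose coordinates are ordered as longitude, then latitude. As a sketch of the kind of calculation this data supports, the following standard-library Python computes the great-circle distance between the two stations with the haversine formula. The Earth radius constant is an approximation, and the helper name `haversine_km` is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(start, end):
    """Great-circle distance between two [longitude, latitude] positions."""
    lon1, lat1 = map(math.radians, start)
    lon2, lat2 = map(math.radians, end)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Station locations copied from the sample trips document.
trip = {
    "start station location": {
        "type": "Point", "coordinates": [-73.9734419, 40.74780373]},
    "end station location": {
        "type": "Point", "coordinates": [-73.98057249, 40.72955361]},
}

distance = haversine_km(
    trip["start station location"]["coordinates"],
    trip["end station location"]["coordinates"],
)
print(round(distance, 2), "km")
```

This is a straight-line distance between stations, not the distance actually ridden.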
sample_training.zips¶
The zips collection contains information about US cities and their postal/zip codes.
Each document contains the city name, zip code, city-center coordinates (latitude and longitude), state, and population.
This dataset is used to explore 2d index creation and queries.
Indexes¶
This collection contains the following indexes:
Name | Index | Description |
---|---|---|
_id_ | { "_id": 1 } | Primary key index on the _id field. |
Sample Document¶
{ "_id": { "$oid": "5c8eccc1caa187d17ca6ed29" }, "city": "CLEVELAND", "loc": { "x": 86.559355, "y": 33.992106 }, "pop": 2369, "state": "AL", "zip": "35049" }
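The sample document above stores its location as an embedded document with `x` and `y` fields, the legacy coordinate-pair form that 2d indexes work with. The sketch below shows a rectangular search in plain Python, alongside the filter document a `$geoWithin` query with `$box` would take. The second document, the box corners, and the helper name `in_box` are made up for illustration.

```python
def in_box(doc, bottom_left, top_right):
    """Return True if the document's loc falls inside the rectangle."""
    loc = doc["loc"]
    return (bottom_left[0] <= loc["x"] <= top_right[0]
            and bottom_left[1] <= loc["y"] <= top_right[1])


zips = [
    # Copied from the sample zips document.
    {"city": "CLEVELAND", "loc": {"x": 86.559355, "y": 33.992106},
     "pop": 2369, "state": "AL", "zip": "35049"},
    # Hypothetical document, made up for this example.
    {"city": "EXAMPLEVILLE", "loc": {"x": 90.0, "y": 40.0},
     "pop": 100, "state": "XX", "zip": "00000"},
]

# Hypothetical rectangle given as (bottom-left, top-right) corners.
box = ([86.0, 33.5], [87.0, 34.5])

matches = [doc["city"] for doc in zips if in_box(doc, *box)]

# Filter document with the same intent, using the $geoWithin and $box operators.
query = {"loc": {"$geoWithin": {"$box": [box[0], box[1]]}}}
print(matches)
```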