M101: MongoDB Basics

MongoDB M101 Course Notes

Terms

  • MQL: MongoDB Query Language
  • Atlas: MongoDB official clound database hosting service

Features

  • Implicit creation of db, collection, and field
  • Auto-generated unique ObjectId: _id

Connecting to a DB

via shell

mongo "mongodb+srv://cluster0.abcd01.mongodb.net/myFirstDatabase" --username m001-student

via golang


import "go.mongodb.org/mongo-driver/mongo"

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
client, err := mongo.Connect(ctx, options.Client().ApplyURI(
	"mongodb+srv://m001-student:<password>@cluster0.abcde67.mongodb.net/myFirstDatabase?retryWrites=true&w=majority",
))
if err != nil { log.Fatal(err) }

via Compass GUI
mongodb+srv://m001-student:<password>@cluster0.fabc67.mongodb.net/test

Useful commands

create a new database DB
use DB

List all dbs
show dbs

Create or switch to a db
use databaseName

Show collection
show collections

Show status
db.stats()

Delete database
db.dropDatabase()

Find a random document
db.collection.findOne()

Count
db.collectionName.find().count()

Importing, Exporting, and Querying Data

Export

Import

mongodump --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies"

mongoexport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --collection=sales --out=sales.json

mongorestore --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies"  --drop dump

mongoimport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --drop sales.json

Import a JSON Array

the sites.json is like:

[{
  "name": "京东",
  "url": "https://jd.com/"
},{
  "name": "淘宝",
  "url": "https://www.taobao.com/"
}]
mongoimport --db=db1 --collection=sites --file=./sites.json --jsonArray

Query

What is a namespace?
data_base_name.collection_name

Using Filter in MongoDB Compass We looked at the sample_training.zips collection and issued the following queries:

{"state": "NY"}
{"state": "NY", "city": "ALBANY"}
show dbs

use sample_training

show collections

db.zips.find({"state": "NY"})

it iterates through the cursor.

db.zips.find({"state": "NY"}).count()

db.zips.find({"state": "NY", "city": "ALBANY"})

db.zips.find({"state": "NY", "city": "ALBANY"}).pretty()

it iterates through a cursor

A curosr is a pointer to a result set of a query.

A pointer is a direct address of memory location.

ZIP code == postal code

Creating and Manipulating Documents

ObjectId() is the default value for the “_id” field unless otherwise specified.

Insert
db.test.insert({})

Schema validation

// Insert three test documents:
db.inspections.insert([ { "test": 1 }, { "test": 2 }, { "test": 3 } ])

// Insert three test documents but specify the _id values:
db.inspections.insert([{ "_id": 1, "test": 1 },{ "_id": 1, "test": 2 },
                       { "_id": 3, "test": 3 }])

// Insert multiple documents specifying the _id values, and using the "ordered": false option.
db.inspections.insert([{ "_id": 1, "test": 1 },{ "_id": 1, "test": 2 },
                       { "_id": 3, "test": 3 }],{ "ordered": false })

Updating

updateOne()
updateMany()

findOne()
find()
// Update all documents in the zips collection where the city field is equal to "HUDSON" by adding 10 to the current value of the "pop" field.
db.zips.updateMany({ "city": "HUDSON" }, { "$inc": { "pop": 10 } })

//Update a single document in the zips collection where the zip field is equal to "12534" by setting the value of the "pop" field to 17630.
db.zips.updateOne({ "zip": "12534" }, { "$set": { "pop": 17630 } })
// if "$set" a non-exist field, it will be created automatically, act like "$push".


//Find all documents in the grades collection where the student_id field is 151 , and the class_id field is 339.
db.grades.find({ "student_id": 151, "class_id": 339 }).pretty()

// Update one document in the grades collection where the student_id is ``250`` *, and the class_id field is 339 , by adding a document element to the "scores" array.
db.grades.updateOne({ "student_id": 250, "class_id": 339 },
                    { "$push": { "scores": { "type": "extra credit",
                                             "score": 100 }
                                }
                     })

Deleting

deleteOne("_id":11)
updateOne("_id":11)
findOne("_id":11)

deleteMany()
updateMany()
find()
db.inspections.find({ "test": 1 }).pretty()
db.inspections.deleteMany({ "test": 1 })
db.inspections.deleteOne({ "test": 3 })

db.inspection.drop()

Advance CRUD Operations

Comparison Operators


db.trips.find({ "tripduration": { "$lte" : 70 },
                "usertype": { "$ne": "Subscriber" } }).pretty()

db.trips.find({ "tripduration": { "$lte" : 70 },
                "usertype": { "$eq": "Customer" }}).pretty()

db.trips.find({ "tripduration": { "$lte" : 70 },
                "usertype": "Customer" }).pretty()

What is the difference between the number of people born in 1998 and the number of people born after 1998 in the sample_training.trips collection?

db.zips.find({"pop": {"$lt": 1000}}).count()				

What is the difference between the number of people born in 1998 and the number of people born after 1998 in the sample_training.trips collection?

MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.trips.find({"birth year": 1998}).count()
MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.trips.find({"birth year": {"$gt":1998}}).count()

Using the sample_training.routes collection find out which of the following statements will return all routes that have at least one stop in them?

MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.routes.find({"stops": {"$gt": 0}}).count()

Logic Operators

$and	Match all of the specified query clauses
$or		At least one of the query clauses is matched
$nor	Fail to match both given clauses
$not	Negates the query requirement
$and	is used as the default operator when an operator is not specified
$or		{<operator> : [{statement1} , {statement2}, ... ]}
$nor	
$not	{$not : {statement}}
db.routes.find({ "$and": [ { "$or" :[ { "dst_airport": "KZN" },
                                    { "src_airport": "KZN" }
                                  ] },
                          { "$or" :[ { "airplane": "CR2" },
                                     { "airplane": "A81" } ] }
                         ]}).pretty()

How many businesses in the sample_training.inspections dataset have the inspection result “Out of Business” and belong to the “Home Improvement Contractor - 100” sector?

MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.inspections.find({"$and": [{"sector":"Home Improvement Contractor - 100"}, {"result": "Out of Business"}]}).count()

Which is the most succinct query to return all documents from the sample_training.inspections collection where the inspection date is either “Feb 20 2015”, or “Feb 21 2015” and the company is not part of the “Cigarette Retail Dealer - 127” sector?

db.inspections.find(
  { "$or": [ { "date": "Feb 20 2015" },
             { "date": "Feb 21 2015" } ],
    "sector": { "$ne": "Cigarette Retail Dealer - 127" }}).pretty()

Find a range between 5,000 and 1,000,000

db.zips.find({"pop": {"$gt":5000, "$lt": 1000000}}).count()
How many companies in the sample_training.companies dataset were

either founded in 2004

    [and] either have the social category_code [or] web category_code,
[or] were founded in the month of October

    [and] also either have the social category_code [or] web category_code?
b.companies.find({ "$and" : [{"$or" :[ {"founded_year":2004 } , {"founded_month": 10} ]}, {"$or": [ {"category_code": "web"}, {"category_code": "social"}]} ] } ).count()

Expressive Query Operator

$expr allow the use of aggregation expressions within the query language

{$expr: { <expression> }}

$expr allows us to use variables and conditional statements

Can we compare fields within the same document to each other?

The meaning of $ sign

$ denotes the use of an oprator   (eg: "$gt", "$eq")
$ addresses the field value (eg:"$start station name")

Example

// start station equal to end station
{"$expr" : { "$eq" : ["$start station id", "$end station id"] }}
{"$expr" : { "$eq" : ["$start station name", "$end station name"] }}

Example: find all the companies that have more employees than the year in which they were founded

Example

use sample_training

// Find all documents where the trip started and ended at the same station:
db.trips.find({ "$expr": { "$eq": [ "$end station id", "$start station id"] }
              }).count()

// Find all documents where the trip lasted longer than 1200 seconds, and started and ended at the same station:
db.trips.find({ "$expr": { "$and": [ { "$gt": [ "$tripduration", 1200 ]},
                         { "$eq": [ "$end station id", "$start station id" ]}
                       ]}}).count()

Which of the following statements will find all the companies that have more employees than the year in which they were founded?  公司人数比时间大。

db.companies.find(
    { "$expr": { "$lt": [ "$founded_year", "$number_of_employees" ] } }
  ).count()

db.companies.find(
    { "$expr": { "$gt": [ "$number_of_employees", "$founded_year" ]} }
  ).count()

How many companies in the sample_training.companies collection have the same permalink as their twitter_username?

db.companies.find({"$expr": { "$eq": ["$permalink", "$twitter_username"] }}).count()

Array Operators(query)

$push
Allows us to add an element to an array.

$push
Turns a field into an array field if it was previously a different type.

Find all documents with exactly 20 amenities which include all the amenities listed in the query array:

db.listingsAndReviews.find({ "amenities": {
                                  "$size": 20,
                                  "$all": [ "Internet", "Wifi",  "Kitchen",
                                           "Heating", "Family/kid friendly",
                                           "Washer", "Dryer", "Essentials",
                                           "Shampoo", "Hangers",
                                           "Hair dryer", "Iron",
                                           "Laptop friendly workspace" ]
                                         }
                            }).pretty()

What is the name of the listing in the sample_airbnb.listingsAndReviews dataset that accommodates more than 6 people and has exactly 50 reviews?

db.listingsAndReviews.find({ "reviews": { "$size":50 },                              "accommodates": { "$gt":6 }})

Using the sample_airbnb.listingsAndReviews collection find out how many documents have the “property_type” “House”, and include “Changing table” as one of the “amenities”?

“amenities” is an array field.


db.listingsAndReviews.find({"property_type": "House", "amenities": {"$all": ["Changing table"]}})

Which of the following queries will return all listings that have “Free parking on premises”, “Air conditioning”, and “Wifi” as part of their amenities, and have at least 2 bedrooms in the sample_airbnb.listingsAndReviews collection?

db.listingsAndReviews.find(
  { "amenities":
      { "$all": [ "Free parking on premises", "Wifi", "Air
        conditioning" ] }, "bedrooms": { "$gte":  2 } } ).pretty()

Array Operators and Projection

Projection and $elemMatch

Switch to this database:

use sample_airbnb

Find all documents with exactly 20 amenities which include all the amenities listed in the query array, and display their price and address:

db.listingsAndReviews.find({ "amenities":
        { "$size": 20, "$all": [ "Internet", "Wifi",  "Kitchen", "Heating",
                                 "Family/kid friendly", "Washer", "Dryer",
                                 "Essentials", "Shampoo", "Hangers",
                                 "Hair dryer", "Iron",
                                 "Laptop friendly workspace" ] } },
                            {"price": 1, "address": 1}).pretty()

Find all documents that have Wifi as one of the amenities only include price and address in the resulting cursor:

db.listingsAndReviews.find({ "amenities": "Wifi" },
                           { "price": 1, "address": 1, "_id": 0 }).pretty()

Find all documents that have Wifi as one of the amenities only include price and address in the resulting cursor, also exclude "maximum_nights". *This will be an error:

db.listingsAndReviews.find({ "amenities": "Wifi" },
                           { "price": 1, "address": 1,
                             "_id": 0, "maximum_nights":0 }).pretty()

Switch to this database:

use sample_training

Get one document from the collection:

db.grades.findOne()

Find all documents where the student in class 431 received a grade higher than 85 for any type of assignment:

db.grades.find({ "class_id": 431 },
               { "scores": { "$elemMatch": { "score": { "$gt": 85 } } }
             }).pretty()

Find all documents where the student had an extra credit score:

db.grades.find({ "scores": { "$elemMatch": { "type": "extra credit" } }
               }).pretty()

How many companies in the sample_training.companies collection have offices in the city of Seattle?

db.companies.find({"offices": { "$elemMatch" : { "city": "Seattle" }}}).count()

Which of the following queries will return only the names of companies from the sample_training.companies collection that had exactly 8 funding rounds?

db.companies.find({ "funding_rounds": { "$size": 8 } },
                  { "name": 1, "_id": 0 })

Array Operators and Sub-Documents(dot-notation)

use sample_training

db.trips.findOne({ "start station location.type": "Point" })



db.companies.find({ "relationships.0.person.last_name": "Zuckerberg" },
                  { "name": 1 }).pretty()

db.companies.find({ "relationships.0.person.first_name": "Mark",
                    "relationships.0.title": { "$regex": "CEO" } },
                  { "name": 1 }).count()


db.companies.find({ "relationships.0.person.first_name": "Mark",
                    "relationships.0.title": {"$regex": "CEO" } },
                  { "name": 1 }).pretty()



db.companies.find({ "relationships":
                      { "$elemMatch": { "is_past": true,
                                        "person.first_name": "Mark" } } },
                  { "name": 1 }).pretty()

db.companies.find({ "relationships":
                      { "$elemMatch": { "is_past": true,
                                        "person.first_name": "Mark" } } },
                  { "name": 1 }).count()

Longitude decreases in value as you move west.

How many trips in the sample_training.trips collection started at stations that are to the west of the -74 longitude coordinate?


db.trips.find({ "start station location.coordinates.0": { "$lt": -74 }}).count()

The “start station location” has a sub-document that contains the coordinates array. To get to this coordinates array we must use use dot-notation. We can issue a range query to find all documents in this longitude. The caveat is to remember that all trips take place in NYC so the latitude value in the coordinates array will always be positive, and we don’t have to worry about it when issuing a range query like this.

How many inspections from the sample_training.inspections collection were conducted in the city of NEW YORK?

db.inspections.find({"address.city": "NEW YORK"}).count()

Which of the following queries will return the names and addresses of all listings from the sample_airbnb.listingsAndReviews collection where the first amenity in the list is “Internet”?

db.listingsAndReviews.find({ "amenities.0": "Internet" },
                           { "name": 1, "address": 1 }).pretty()

Indexing and Aggregation Pipeline

Aggregation Framework

Aggregation Course!

In its simplest form, another way to query data in MongoDB.

Switch to this database:

use sample_airbnb

Find all documents that have Wifi as one of the amenities. Only include price and address in the resulting cursor.

db.listingsAndReviews.find({ "amenities": "Wifi" },
                           { "price": 1, "address": 1, "_id": 0 }).pretty()

Using the aggregation framework find all documents that have Wifi as one of the amenities*. Only include* price and address in the resulting cursor.

db.listingsAndReviews.aggregate([
                                  { "$match": { "amenities": "Wifi" } },
                                  { "$project": { "price": 1,
                                                  "address": 1,
                                                  "_id": 0 }}]).pretty()

Find one document in the collection and only include the address field in the resulting cursor.

db.listingsAndReviews.findOne({ },{ "address": 1, "_id": 0 })

Project only the address field value for each document, then group all documents into one document per address.country value.

db.listingsAndReviews.aggregate([ { "$project": { "address": 1, "_id": 0 }}, { "$group": { "_id": "$address.country" }}])

COPY Project only the address field value for each document, then group all documents into one document per address.country value, and count one for each document in each group.

db.listingsAndReviews.aggregate([
                                  { "$project": { "address": 1, "_id": 0 }},
                                  { "$group": { "_id": "$address.country",
                                                "count": { "$sum": 1 } } }
                                ])

We can also calculate using aggregation.
aggregate-syntax

aggregate

MongoDB Enterprise > db.listingsAndReviews.aggregate([ { "$project": { "room_type": 1, "_id": 0 }},
...                                   { "$group": { "_id": "$room_type" }}])
{ "_id" : "Private room" }
{ "_id" : "Entire home/apt" }
{ "_id" : "Shared room" }

What are the differences between using aggregate() and find()?

  • aggregate() can do what find() can and more.
  • aggregate() allows us to compute and reshape data in the cursor.

sort() and limit()


use sample_training

// sort by "pop"
db.zips.find().sort({ "pop": 1 }).limit(1)

db.zips.find({ "pop": 0 }).count()

// find the most populated zipcode
db.zips.find().sort({ "pop": -1 }).limit(1)

db.zips.find().sort({ "pop": -1 }).limit(10)


// 1: increasing , -1: decreasing
db.zips.find().sort({ "pop": 1, "city": -1 })

cursor.limit().sort() means cursor.sort().limit()

Which of the following commands will return the name and founding year for the 5 oldest companies in the sample_training.companies collection?

// These two queries are the same
db.companies.find({ "founded_year": { "$ne": null }},
                  { "name": 1, "founded_year": 1 }
                 ).limit(5).sort({ "founded_year": 1 })


db.companies.find({ "founded_year": { "$ne": null }},
                  { "name": 1, "founded_year": 1 }
                 ).sort({ "founded_year": 1 }).limit(5)

In what year was the youngest bike rider from the sample_training.trips collection born?

db.trips.find({"birth year": {"$ne":""}}, {"birth year": 1}).sort({"birth year": 1}).limit(1).pretty()

Introduction to Indexes

Both finding "birth year": 1989 or sorting counld benifited from an index.

use sample_training

// Single field index is ok
db.trips.find({ "birth year": 1989 })

// need compound index
db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } )

// Single field index. 
db.trips.createIndex({ "birth year": 1 })

// not perfect for
db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } )
// then we create a compound index
db.trips.createIndex({ "start station id": 1, "birth year": 1 })

Avoid sorting!

Jameela often queries the sample_training.routes collection by the src_airport field like this:

db.routes.find({ "src_airport": "MUC" }).pretty()

Which command will create an index that will support this query?

db.routes.createIndex({ "src_airport": -1 })

Introduction to Data Modeling

Data modeling is a way to organize fields in a document to support your application performance and querying capabilities.

Rule: Data is stored in the way that it is used.

Optimized for fast and easy way to retrieval.

Data that is used together should be stored together
Evolving application implies an evolving data model

Data Modeling Course

What is data modeling?
a way to organize fields in a document to support your application performance and querying capabilities

Upsert - Update or Insert?

Everything in MQL that is used to locate a document in a collection can also be used to modify this document.

db.collection.updateOne({<query to locate>}, {<update>})

Upsert is a hybrid of udpate and insert, it should only be used when it is needed.

db.collection.updateOne({<query>}, {<update>}, {"upsert": true})

by default upsert is false. if upsert is true:

  • is there a match?(yes), update the matched document
  • is there a match?(no), insert a new document
db.iot.updateOne({ "sensor": r.sensor, "date": r.date,
                   "valcount": { "$lt": 48 } },
                         { "$push": { "readings": { "v": r.value, "t": r.time } },
                        "$inc": { "valcount": 1, "total": r.value } },
                 { "upsert": true })

update and insert

db.iot.updateOne({ "sensor": r.sensor, "date": r.date,
                   "valcount": { "$lt": 48 } },
                         { "$push": { "readings": { "v": r.value, "t": r.time } },
                        "$inc": { "valcount": 1, "total": r.value } },
                 { "upsert": true })

How does the upsert option work?

  • When upsert is set to true and the query predicate returns an empty cursor, the update operation creates a new document using the directive from the query predicate and the update predicate.
  • When upsert is set to false and the query predicate returns an empty cursor then there will be no updated documents as a result of this operation.
  • By default upsert is set to false.

Next Steps

Atlas Features - Data Explorer Features

  • What actions are available to you via the Aggregation Builder in the Atlas Data Explorer? Export pipeline to a programming language.

  • Syntax for each selected aggregation stage.

  • A preview of the data in the pipeline at each selected stage.

Atlas Products and Options

MongoDB Compass

References