Terms
- MQL: MongoDB Query Language
- Atlas: MongoDB official clound database hosting service
Features
- Implicit creation of db, collection, and field
- Auto-generated unique ObjectId:
_id
Connecting to a DB
via shell
mongo "mongodb+srv://cluster0.abcd01.mongodb.net/myFirstDatabase" --username m001-student
via golang
import "go.mongodb.org/mongo-driver/mongo"
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
client, err := mongo.Connect(ctx, options.Client().ApplyURI(
"mongodb+srv://m001-student:<password>@cluster0.abcde67.mongodb.net/myFirstDatabase?retryWrites=true&w=majority",
))
if err != nil { log.Fatal(err) }
via Compass GUI
mongodb+srv://m001-student:<password>@cluster0.fabc67.mongodb.net/test
Useful commands
create a new database DB
use DB
List all dbs
show dbs
Create or switch to a db
use databaseName
Show collection
show collections
Show status
db.stats()
Delete database
db.dropDatabase()
Find a random document
db.collection.findOne()
Count
db.collectionName.find().count()
Importing, Exporting, and Querying Data
Export
Import
mongodump --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies"
mongoexport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --collection=sales --out=sales.json
mongorestore --uri "mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --drop dump
mongoimport --uri="mongodb+srv://<your username>:<your password>@<your cluster>.mongodb.net/sample_supplies" --drop sales.json
Import a JSON Array
the sites.json
is like:
[{
"name": "京东",
"url": "https://jd.com/"
},{
"name": "淘宝",
"url": "https://www.taobao.com/"
}]
mongoimport --db=db1 --collection=sites --file=./sites.json --jsonArray
Query
What is a namespace?
data_base_name.collection_name
Using Filter in MongoDB Compass We looked at the sample_training.zips collection and issued the following queries:
{"state": "NY"}
{"state": "NY", "city": "ALBANY"}
show dbs
use sample_training
show collections
db.zips.find({"state": "NY"})
it
iterates through the cursor.
db.zips.find({"state": "NY"}).count()
db.zips.find({"state": "NY", "city": "ALBANY"})
db.zips.find({"state": "NY", "city": "ALBANY"}).pretty()
it
iterates through a cursor
A curosr
is a pointer to a result set of a query.
A pointer
is a direct address of memory location.
ZIP code == postal code
Creating and Manipulating Documents
ObjectId()
is the default value for the “_id” field unless otherwise specified.
Insert
db.test.insert({})
Schema validation
// Insert three test documents:
db.inspections.insert([ { "test": 1 }, { "test": 2 }, { "test": 3 } ])
// Insert three test documents but specify the _id values:
db.inspections.insert([{ "_id": 1, "test": 1 },{ "_id": 1, "test": 2 },
{ "_id": 3, "test": 3 }])
// Insert multiple documents specifying the _id values, and using the "ordered": false option.
db.inspections.insert([{ "_id": 1, "test": 1 },{ "_id": 1, "test": 2 },
{ "_id": 3, "test": 3 }],{ "ordered": false })
Updating
updateOne()
updateMany()
findOne()
find()
// Update all documents in the zips collection where the city field is equal to "HUDSON" by adding 10 to the current value of the "pop" field.
db.zips.updateMany({ "city": "HUDSON" }, { "$inc": { "pop": 10 } })
//Update a single document in the zips collection where the zip field is equal to "12534" by setting the value of the "pop" field to 17630.
db.zips.updateOne({ "zip": "12534" }, { "$set": { "pop": 17630 } })
// if "$set" a non-exist field, it will be created automatically, act like "$push".
//Find all documents in the grades collection where the student_id field is 151 , and the class_id field is 339.
db.grades.find({ "student_id": 151, "class_id": 339 }).pretty()
// Update one document in the grades collection where the student_id is ``250`` *, and the class_id field is 339 , by adding a document element to the "scores" array.
db.grades.updateOne({ "student_id": 250, "class_id": 339 },
{ "$push": { "scores": { "type": "extra credit",
"score": 100 }
}
})
Deleting
deleteOne("_id":11)
updateOne("_id":11)
findOne("_id":11)
deleteMany()
updateMany()
find()
db.inspections.find({ "test": 1 }).pretty()
db.inspections.deleteMany({ "test": 1 })
db.inspections.deleteOne({ "test": 3 })
db.inspection.drop()
Advance CRUD Operations
Comparison Operators
db.trips.find({ "tripduration": { "$lte" : 70 },
"usertype": { "$ne": "Subscriber" } }).pretty()
db.trips.find({ "tripduration": { "$lte" : 70 },
"usertype": { "$eq": "Customer" }}).pretty()
db.trips.find({ "tripduration": { "$lte" : 70 },
"usertype": "Customer" }).pretty()
What is the difference between the number of people born in 1998 and the number of people born after 1998 in the sample_training.trips collection?
db.zips.find({"pop": {"$lt": 1000}}).count()
What is the difference between the number of people born in 1998 and the number of people born after 1998 in the sample_training.trips collection?
MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.trips.find({"birth year": 1998}).count()
MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.trips.find({"birth year": {"$gt":1998}}).count()
Using the sample_training.routes collection find out which of the following statements will return all routes that have at least one stop in them?
MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.routes.find({"stops": {"$gt": 0}}).count()
Logic Operators
$and Match all of the specified query clauses
$or At least one of the query clauses is matched
$nor Fail to match both given clauses
$not Negates the query requirement
$and is used as the default operator when an operator is not specified
$or {<operator> : [{statement1} , {statement2}, ... ]}
$nor
$not {$not : {statement}}
db.routes.find({ "$and": [ { "$or" :[ { "dst_airport": "KZN" },
{ "src_airport": "KZN" }
] },
{ "$or" :[ { "airplane": "CR2" },
{ "airplane": "A81" } ] }
]}).pretty()
How many businesses in the sample_training.inspections dataset have the inspection result “Out of Business” and belong to the “Home Improvement Contractor - 100” sector?
MongoDB Enterprise atlas-tcmtj0-shard-0:PRIMARY> db.inspections.find({"$and": [{"sector":"Home Improvement Contractor - 100"}, {"result": "Out of Business"}]}).count()
Which is the most succinct query to return all documents from the sample_training.inspections collection where the inspection date is either “Feb 20 2015”, or “Feb 21 2015” and the company is not part of the “Cigarette Retail Dealer - 127” sector?
db.inspections.find(
{ "$or": [ { "date": "Feb 20 2015" },
{ "date": "Feb 21 2015" } ],
"sector": { "$ne": "Cigarette Retail Dealer - 127" }}).pretty()
Find a range between 5,000 and 1,000,000
db.zips.find({"pop": {"$gt":5000, "$lt": 1000000}}).count()
How many companies in the sample_training.companies dataset were
either founded in 2004
[and] either have the social category_code [or] web category_code,
[or] were founded in the month of October
[and] also either have the social category_code [or] web category_code?
b.companies.find({ "$and" : [{"$or" :[ {"founded_year":2004 } , {"founded_month": 10} ]}, {"$or": [ {"category_code": "web"}, {"category_code": "social"}]} ] } ).count()
Expressive Query Operator
$expr allow the use of aggregation expressions within the query language
{$expr: { <expression> }}
$expr allows us to use variables and conditional statements
Can we compare fields within the same document to each other?
The meaning of $
sign
$ denotes the use of an oprator (eg: "$gt", "$eq")
$ addresses the field value (eg:"$start station name")
Example
// start station equal to end station
{"$expr" : { "$eq" : ["$start station id", "$end station id"] }}
{"$expr" : { "$eq" : ["$start station name", "$end station name"] }}
Example: find all the companies that have more employees than the year in which they were founded
Example
use sample_training
// Find all documents where the trip started and ended at the same station:
db.trips.find({ "$expr": { "$eq": [ "$end station id", "$start station id"] }
}).count()
// Find all documents where the trip lasted longer than 1200 seconds, and started and ended at the same station:
db.trips.find({ "$expr": { "$and": [ { "$gt": [ "$tripduration", 1200 ]},
{ "$eq": [ "$end station id", "$start station id" ]}
]}}).count()
Which of the following statements will find all the companies that have more employees than the year in which they were founded? 公司人数比时间大。
db.companies.find(
{ "$expr": { "$lt": [ "$founded_year", "$number_of_employees" ] } }
).count()
db.companies.find(
{ "$expr": { "$gt": [ "$number_of_employees", "$founded_year" ]} }
).count()
How many companies in the sample_training.companies collection have the same permalink as their twitter_username?
db.companies.find({"$expr": { "$eq": ["$permalink", "$twitter_username"] }}).count()
Array Operators(query)
$push
Allows us to add an element to an array.
$push
Turns a field into an array field if it was previously a different type.
Find all documents with exactly 20 amenities which include all the amenities listed in the query array:
db.listingsAndReviews.find({ "amenities": {
"$size": 20,
"$all": [ "Internet", "Wifi", "Kitchen",
"Heating", "Family/kid friendly",
"Washer", "Dryer", "Essentials",
"Shampoo", "Hangers",
"Hair dryer", "Iron",
"Laptop friendly workspace" ]
}
}).pretty()
What is the name of the listing in the sample_airbnb.listingsAndReviews dataset that accommodates more than 6 people and has exactly 50 reviews?
db.listingsAndReviews.find({ "reviews": { "$size":50 }, "accommodates": { "$gt":6 }})
Using the sample_airbnb.listingsAndReviews collection find out how many documents have the “property_type” “House”, and include “Changing table” as one of the “amenities”?
“amenities” is an array field.
db.listingsAndReviews.find({"property_type": "House", "amenities": {"$all": ["Changing table"]}})
Which of the following queries will return all listings that have “Free parking on premises”, “Air conditioning”, and “Wifi” as part of their amenities, and have at least 2 bedrooms in the sample_airbnb.listingsAndReviews collection?
db.listingsAndReviews.find(
{ "amenities":
{ "$all": [ "Free parking on premises", "Wifi", "Air
conditioning" ] }, "bedrooms": { "$gte": 2 } } ).pretty()
Array Operators and Projection
Projection and $elemMatch
Switch to this database:
use sample_airbnb
Find all documents with exactly 20 amenities which include all the amenities listed in the query array, and display their price and address:
db.listingsAndReviews.find({ "amenities":
{ "$size": 20, "$all": [ "Internet", "Wifi", "Kitchen", "Heating",
"Family/kid friendly", "Washer", "Dryer",
"Essentials", "Shampoo", "Hangers",
"Hair dryer", "Iron",
"Laptop friendly workspace" ] } },
{"price": 1, "address": 1}).pretty()
Find all documents that have Wifi as one of the amenities only include price and address in the resulting cursor:
db.listingsAndReviews.find({ "amenities": "Wifi" },
{ "price": 1, "address": 1, "_id": 0 }).pretty()
Find all documents that have Wifi as one of the amenities only include price and address in the resulting cursor, also exclude "maximum_nights"
. *This will be an error:
db.listingsAndReviews.find({ "amenities": "Wifi" },
{ "price": 1, "address": 1,
"_id": 0, "maximum_nights":0 }).pretty()
Switch to this database:
use sample_training
Get one document from the collection:
db.grades.findOne()
Find all documents where the student in class 431 received a grade higher than 85 for any type of assignment:
db.grades.find({ "class_id": 431 },
{ "scores": { "$elemMatch": { "score": { "$gt": 85 } } }
}).pretty()
Find all documents where the student had an extra credit score:
db.grades.find({ "scores": { "$elemMatch": { "type": "extra credit" } }
}).pretty()
How many companies in the sample_training.companies collection have offices in the city of Seattle?
db.companies.find({"offices": { "$elemMatch" : { "city": "Seattle" }}}).count()
Which of the following queries will return only the names of companies from the sample_training.companies collection that had exactly 8 funding rounds?
db.companies.find({ "funding_rounds": { "$size": 8 } },
{ "name": 1, "_id": 0 })
Array Operators and Sub-Documents(dot-notation)
use sample_training
db.trips.findOne({ "start station location.type": "Point" })
db.companies.find({ "relationships.0.person.last_name": "Zuckerberg" },
{ "name": 1 }).pretty()
db.companies.find({ "relationships.0.person.first_name": "Mark",
"relationships.0.title": { "$regex": "CEO" } },
{ "name": 1 }).count()
db.companies.find({ "relationships.0.person.first_name": "Mark",
"relationships.0.title": {"$regex": "CEO" } },
{ "name": 1 }).pretty()
db.companies.find({ "relationships":
{ "$elemMatch": { "is_past": true,
"person.first_name": "Mark" } } },
{ "name": 1 }).pretty()
db.companies.find({ "relationships":
{ "$elemMatch": { "is_past": true,
"person.first_name": "Mark" } } },
{ "name": 1 }).count()
Longitude decreases in value as you move west.
How many trips in the sample_training.trips collection started at stations that are to the west of the -74 longitude coordinate?
db.trips.find({ "start station location.coordinates.0": { "$lt": -74 }}).count()
The “start station location” has a sub-document that contains the coordinates array. To get to this coordinates array we must use use dot-notation. We can issue a range query to find all documents in this longitude. The caveat is to remember that all trips take place in NYC so the latitude value in the coordinates array will always be positive, and we don’t have to worry about it when issuing a range query like this.
How many inspections from the sample_training.inspections collection were conducted in the city of NEW YORK?
db.inspections.find({"address.city": "NEW YORK"}).count()
Which of the following queries will return the names and addresses of all listings from the sample_airbnb.listingsAndReviews collection where the first amenity in the list is “Internet”?
db.listingsAndReviews.find({ "amenities.0": "Internet" },
{ "name": 1, "address": 1 }).pretty()
Indexing and Aggregation Pipeline
Aggregation Framework
In its simplest form, another way to query data in MongoDB.
Switch to this database:
use sample_airbnb
Find all documents that have Wifi as one of the amenities. Only include price and address in the resulting cursor.
db.listingsAndReviews.find({ "amenities": "Wifi" },
{ "price": 1, "address": 1, "_id": 0 }).pretty()
Using the aggregation framework find all documents that have Wifi as one of the amenities*. Only include*
price and address in the resulting cursor.
db.listingsAndReviews.aggregate([
{ "$match": { "amenities": "Wifi" } },
{ "$project": { "price": 1,
"address": 1,
"_id": 0 }}]).pretty()
Find one document in the collection and only include the address field in the resulting cursor.
db.listingsAndReviews.findOne({ },{ "address": 1, "_id": 0 })
Project only the address field value for each document, then group all documents into one document per address.country value.
db.listingsAndReviews.aggregate([ { "$project": { "address": 1, "_id": 0 }}, { "$group": { "_id": "$address.country" }}])
COPY Project only the address field value for each document, then group all documents into one document per address.country value, and count one for each document in each group.
db.listingsAndReviews.aggregate([
{ "$project": { "address": 1, "_id": 0 }},
{ "$group": { "_id": "$address.country",
"count": { "$sum": 1 } } }
])
We can also calculate using aggregation.
MongoDB Enterprise > db.listingsAndReviews.aggregate([ { "$project": { "room_type": 1, "_id": 0 }},
... { "$group": { "_id": "$room_type" }}])
{ "_id" : "Private room" }
{ "_id" : "Entire home/apt" }
{ "_id" : "Shared room" }
What are the differences between using aggregate() and find()?
- aggregate() can do what find() can and more.
- aggregate() allows us to compute and reshape data in the cursor.
sort() and limit()
use sample_training
// sort by "pop"
db.zips.find().sort({ "pop": 1 }).limit(1)
db.zips.find({ "pop": 0 }).count()
// find the most populated zipcode
db.zips.find().sort({ "pop": -1 }).limit(1)
db.zips.find().sort({ "pop": -1 }).limit(10)
// 1: increasing , -1: decreasing
db.zips.find().sort({ "pop": 1, "city": -1 })
cursor.limit().sort()
means cursor.sort().limit()
Which of the following commands will return the name and founding year for the 5 oldest companies in the sample_training.companies
collection?
// These two queries are the same
db.companies.find({ "founded_year": { "$ne": null }},
{ "name": 1, "founded_year": 1 }
).limit(5).sort({ "founded_year": 1 })
db.companies.find({ "founded_year": { "$ne": null }},
{ "name": 1, "founded_year": 1 }
).sort({ "founded_year": 1 }).limit(5)
In what year was the youngest bike rider from the sample_training.trips collection born?
db.trips.find({"birth year": {"$ne":""}}, {"birth year": 1}).sort({"birth year": 1}).limit(1).pretty()
Introduction to Indexes
Both finding "birth year": 1989
or sorting counld benifited from an index.
use sample_training
// Single field index is ok
db.trips.find({ "birth year": 1989 })
// need compound index
db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } )
// Single field index.
db.trips.createIndex({ "birth year": 1 })
// not perfect for
db.trips.find({ "start station id": 476 }).sort( { "birth year": 1 } )
// then we create a compound index
db.trips.createIndex({ "start station id": 1, "birth year": 1 })
Avoid sorting!
Jameela often queries the sample_training.routes collection by the src_airport field like this:
db.routes.find({ "src_airport": "MUC" }).pretty()
Which command will create an index that will support this query?
db.routes.createIndex({ "src_airport": -1 })
Introduction to Data Modeling
Data modeling is a way to organize fields in a document to support your application performance and querying capabilities.
Rule: Data is stored in the way that it is used.
Optimized for fast and easy way to retrieval.
Data that is used together should be stored together
Evolving application implies an evolving data model
What is data modeling?
a way to organize fields in a document to support your application performance and querying capabilities
Upsert - Update or Insert?
Everything in MQL that is used to locate a document in a collection can also be used to modify this document.
db.collection.updateOne({<query to locate>}, {<update>})
Upsert is a hybrid of udpate and insert, it should only be used when it is needed.
db.collection.updateOne({<query>}, {<update>}, {"upsert": true})
by default upsert
is false
. if upsert is true
:
- is there a match?(yes), update the matched document
- is there a match?(no), insert a new document
db.iot.updateOne({ "sensor": r.sensor, "date": r.date,
"valcount": { "$lt": 48 } },
{ "$push": { "readings": { "v": r.value, "t": r.time } },
"$inc": { "valcount": 1, "total": r.value } },
{ "upsert": true })
update and insert
db.iot.updateOne({ "sensor": r.sensor, "date": r.date,
"valcount": { "$lt": 48 } },
{ "$push": { "readings": { "v": r.value, "t": r.time } },
"$inc": { "valcount": 1, "total": r.value } },
{ "upsert": true })
How does the upsert option work?
- When upsert is set to true and the query predicate returns an empty cursor, the update operation creates a new document using the directive from the query predicate and the update predicate.
- When upsert is set to false and the query predicate returns an empty cursor then there will be no updated documents as a result of this operation.
- By default upsert is set to false.
Next Steps
Atlas Features - Data Explorer Features
-
What actions are available to you via the Aggregation Builder in the Atlas Data Explorer? Export pipeline to a programming language.
-
Syntax for each selected aggregation stage.
-
A preview of the data in the pipeline at each selected stage.