M201: Chapter 4: CRUD Optimization (Chapter 4 of 5)

Chapter 4

Optimizing your CRUD Operations

  • Index Selectivity
  • Equality, Sort, Range
  • Performance Tradeoffs
// use the m201 database
use m201

// create an explainable object
var exp = db.restaurants.explain("executionStats")

// run an explained query (COLLSCAN & in-memory sort)
exp.find({ "address.zipcode": { $gt: '50000' }, cuisine: 'Sushi' }).sort({ stars: -1 })

// create a naive index
db.restaurants.createIndex({"address.zipcode": 1,"cuisine": 1,"stars": 1})

// rerun the query (uses the index, but isn't very selective and still does an 
// in-memory sort)
exp.find({ "address.zipcode": { $gt: '50000' }, cuisine: 'Sushi' }).sort({ stars: -1 })

// see how many documents match 50000 for zipcode (10)
db.restaurants.find({ "address.zipcode": '50000' }).count()

// see how many documents match our range (about half)
db.restaurants.find({ "address.zipcode": { $gt: '50000' } }).count()

// see how many documents match an equality condition on cuisine (~2%)
db.restaurants.find({ cuisine: 'Sushi' }).count()

// reorder the index key pattern to be more selective
db.restaurants.createIndex({ "cuisine": 1, "address.zipcode": 1, "stars": 1 })

// and rerun the query (faster, still doing an in-memory sort)
exp.find({ "address.zipcode": { $gt: '50000' }, cuisine: 'Sushi' }).sort({ stars: -1 })

// swap stars and zipcode to prevent an in-memory sort
db.restaurants.createIndex({ "cuisine": 1, "stars": 1, "address.zipcode": 1 })

// awesome, no more in-memory sort! (uses the equality, sort, range rule)
exp.find({ "address.zipcode": { $gt: '50000' }, cuisine: 'Sushi' }).sort({ stars: -1 })

// in memory sort, we need to prevent this
"stage": "SORT"

// index scan
"stage": "IXSCAN"

When building indexes to service your queries, which of the following is the general rule of thumb you should keep when ordering your index keys?

Note, use the following definitions to for this question:

  • equality: indexed fields on which our queries will do equality matching
  • range: indexed fields on which our queries will have a range condition
  • sort: indexed fields on which our queries will sort on

The order is : equality, sort, range

Covered Queries

What are covered queries?

  • Very performant
  • Satisfied entirely by index keys
  • 0 documents need to be examined

// use the m201 database
use m201

// create an explainable object
var exp = db.restaurants.explain("executionStats")

// create a compound index on three fields
db.restaurants.createIndex({name: 1, cuisine: 1, stars: 1})

// checkout a projected query
db.restaurants.find({name: { $gt: 'L' }, cuisine: 'Sushi', stars: { $gte: 4.0 } }, { _id: 0, name: 1, cuisine: 1, stars: 1 })

// and look at it's explain output (it's covered, no docs examined)
exp.find({name: { $gt: 'L' }, cuisine: 'Sushi', stars: { $gte: 4.0 } }, { _id: 0, name: 1, cuisine: 1, stars: 1 })


// get the same output as the first query
db.restaurants.find({name: { $gt: 'L' }, cuisine: 'Sushi', stars: { $gte: 4.0 } }, { _id: 0, address: 0 })

// but when looking at the explain output we see that it's not a covered query
exp.find({name: { $gt: 'L' }, cuisine: 'Sushi', stars: { $gte: 4.0 } }, { _id: 0, address: 0 })

Problem

Given the following indexes:

{ _id: 1 }
{ name: 1, dob: 1 }
{ hair: 1, name: 1 }

Which of the following queries could be covered by one of the given indexes?

db.example.find( { name : { $in : [ "Bart", "Homer" ] } }, {_id : 0, dob : 1, name : 1} )

Regex Performance

db.users.createIndex({username: 1})
db.users.find({username: /^kirby/})

Aggregation Performance

  • Index Usage
  • Memory Constraints

“Realtime” Processing

  • Provide data for applications
  • Query performance is more important

Batch Processing

  • Provide data for analytics
  • Query performance is less important
db.orders.aggregate([
	{$<operator> : <predicate>},
	{$<operator> : <predicate>},
	...
], {explain: true})

top-k sorting algorithm

db.orders.aggregate([
	{$match:{cust_id: {$lt: 50}}},
	{$limit: 10},
	{$sort: {total: 1}}
])

Memory Constraints

  • Results are subject to 16MB document limit
    • use $limit and $project
  • 100MB of RAM per stage
    • Use indexes
    • db.orders.aggregate([], {allowDiskUse: true})

{allowDiskUse: true} doesn’t work with $graphLookup

  • Transforming data in a pipeline stage prevents us from using indexes in the stages that follow
  • When $limit and $sort are close together a very performant top-k sort can be performed

Lab 4.1: Equality, Sort, Range

In this lab you’re going to use the equality, sort, range rule to determine which index best supports a given query.

Given the following query:

db.accounts.find( { accountBalance : { $gte : NumberDecimal(100000.00) }, city: "New York" } )
           .sort( { lastName: 1, firstName: 1 } )

Which of the following indexes best supports this query with regards to the equality, sort, range rule.

{ city: 1, lastName: 1, firstName: 1, accountBalance: 1 }

Lab 4.2: Aggregation Performance

db.restaurants.dropIndexes()

// without an index, this will be an error.
db.restaurants.aggregate([
  { $match: { stars: { $gt: 2 } } },
  { $sort: { stars: 1 } },
  { $group: { _id: "$cuisine", count: { $sum: 1 } } }
])
{
  "ok": 0,
  "errmsg": "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.",
  "code": 16819,
  "codeName": "Location16819"
}

the create an index

db.restaurants.createIndex({stars: 1})

with an index, the query is success.

mongos> db.restaurants.aggregate([   { $match: { stars: { $gt: 2 } } },   { $sort: { stars: 1 } },   { $group: { _id: "$cuisine", count: { $sum: 1 } } } ])
{ "_id" : "Mediterranean", "count" : 13691 }
{ "_id" : "Indonesian", "count" : 13648 }
{ "_id" : "Thai", "count" : 13432 }
{ "_id" : "Vegetarian", "count" : 13778 }
{ "_id" : "Cajun", "count" : 13691 }
{ "_id" : "African", "count" : 13777 }
{ "_id" : "Chinese", "count" : 13564 }
{ "_id" : "Sichuan", "count" : 13772 }
{ "_id" : "Caribbean", "count" : 13614 }
{ "_id" : "Asian", "count" : 13829 }
{ "_id" : "Turkish", "count" : 13863 }
{ "_id" : "Tibetan", "count" : 13722 }
{ "_id" : "American", "count" : 13736 }
{ "_id" : "Seafood", "count" : 13744 }
{ "_id" : "Malaysian", "count" : 13514 }
{ "_id" : "Brazilian", "count" : 13771 }
{ "_id" : "Cuban", "count" : 13621 }
{ "_id" : "Korean", "count" : 13437 }
{ "_id" : "Moroccan", "count" : 13680 }
{ "_id" : "Vietnamese", "count" : 13643 }
Type "it" for more