M121: Chapter 1: Basic Aggregation

The MongoDB Aggregation Framework(Chapter 1 of 6)

Chapter 1: Basic Aggregation - $match and $project

$match: Filtering documents

db.solarSystemaggregate([{$match: {}}])

$match uses standard MongoDB query operators!

db.solarSystem.count({type: {$ne: "Star"}})

$match does not allow for projection

Which of the following is/are true of the $match stage?

It uses the familiar MongoDB query language. It should come very early in an aggregation pipeline.

Lab $match

Problem:

Help MongoDB pick a movie our next movie night! Based on employee polling, we’ve decided that potential movies must meet the following criteria.

  • imdb.rating is at least 7
  • genres does not contain “Crime” or “Horror”
  • rated is either “PG” or “G”
  • languages contains “English” and “Japanese”

Assign the aggregation to a variable named pipeline, like:

var pipeline = [ { $match: { ... } } ]

As a hint, your aggregation should return 23 documents. You can verify this by typing db.movies.aggregate(pipeline).itcount() Download the m121/chapter1.zip handout with this lab. Unzip the downloaded folder and copy its contents to the m121 directory. Load validateLab1.js into mongo shell

load('validateLab1.js')

And run the validateLab1 validation method

validateLab1(pipeline)
var pipeline = [{
    $match: 
        {
			// greter or equal than 7
         "imdb.rating": { $gte: 7 }, 
		 // none of them
         genres: {$nin: ['Crime', 'Horror']},
		 // either
         rated: {$in : ["PG", "G"]},
		 // contains all 
         languages: {$all : ["English", "Japanese"]}
       }
}]

Shaping documents with $project

// remove the _id and retain the name field
db.solarSystem.aggregate([{$project: { _id: 0, name: 1}}])

If $match is like a filter method, $project is like $map.

// project ``name`` and remove ``_id``
db.solarSystem.aggregate([{ "$project": { "_id": 0, "name": 1 } }]);

// project ``name`` and ``gravity`` fields, including default ``_id``
db.solarSystem.aggregate([{ "$project": { "name": 1, "gravity": 1 } }]);

// using dot-notation to express the projection fields
db.solarSystem.aggregate([{ "$project": { "_id": 0, "name": 1, "gravity.value": 1 } }]);

// reassing ``gravity`` field with value from ``gravity.value`` embeded field
db.solarSystem.aggregate([{"$project": { "_id": 0, "name": 1, "gravity": "$gravity.value" }}]);

// creating a document new field ``surfaceGravity``
db.solarSystem.aggregate([{"$project": { "_id": 0, "name": 1, "surfaceGravity": "$gravity.value" }}]);

// creating a new field ``myWeight`` using expressions
db.solarSystem.aggregate([{"$project": { "_id": 0, "name": 1, "myWeight": { "$multiply": [ { "$divide": [ "$gravity.value", 9.8 ] }, 86 ] } }}]);

Lab - Changing Document Shape with $project

Our first movie night was a success. Unfortunately, our ISP called to let us know we’re close to our bandwidth quota, but we need another movie recommendation!

Using the same $match stage from the previous lab, add a $project stage to only display the the title and film rating (title and rated fields).

Assign the results to a variable called pipeline.

var pipeline = [{
    $match: 
        {
         "imdb.rating": { $gte: 7 }, 
         genres: {$nin: ['Crime', 'Horror']},
         rated: {$in : ["PG", "G"]},
         languages: {$all : ["English", "Japanese"]}
       }
}, { $project: { _id: 0, title: 1, rated: 1 } }]

Load validateLab2.js which was included in the same handout as validateLab1.js and execute validateLab2(pipeline)?

load('./validateLab2.js')

And run the validateLab2 validation method

validateLab2(pipeline)
MongoDB Enterprise Cluster0-shard-0:PRIMARY> db.movies.aggregate(pipeline)
{ "title" : "Those Magnificent Men in Their Flying Machines or How I Flew from London to Paris in 25 hours 11 minutes", "rated" : "G" }
{ "title" : "Red Sun", "rated" : "PG" }
{ "title" : "Babies", "rated" : "PG" }
{ "title" : "The Karate Kid", "rated" : "PG" }
{ "title" : "Dragon Ball Z: Tree of Might", "rated" : "PG" }
{ "title" : "Cars", "rated" : "G" }
{ "title" : "Jack and the Beanstalk", "rated" : "G" }
{ "title" : "The Transformers: The Movie", "rated" : "PG" }
{ "title" : "Defending Your Life", "rated" : "PG" }
{ "title" : "The Cat Returns", "rated" : "G" }
{ "title" : "Hell in the Pacific", "rated" : "G" }
{ "title" : "The Goodbye Girl", "rated" : "PG" }
{ "title" : "Tora! Tora! Tora!", "rated" : "G" }
{ "title" : "Local Hero", "rated" : "PG" }
{ "title" : "Summer Wars", "rated" : "PG" }
{ "title" : "The Secret World of Arrietty", "rated" : "G" }
{ "title" : "Empire of the Sun", "rated" : "PG" }
{ "title" : "Dreams", "rated" : "PG" }
{ "title" : "Millennium Actress", "rated" : "PG" }
{ "title" : "Whisper of the Heart", "rated" : "G" }
Type "it" for more
MongoDB Enterprise Cluster0-shard-0:PRIMARY> load('./validateLab2.js')
true
MongoDB Enterprise Cluster0-shard-0:PRIMARY> validateLab2(pipeline)

Lab - Computing Fields

Using the Aggregation Framework, find a count of the number of movies that have a title composed of one word. To clarify, “Cinderella” and “3-25” should count, where as “Cast Away” would not.

Make sure you look into the $split String expression and the $size Array expression

To get the count, you can append itcount() to the end of your pipeline

db.movies.aggregate([
  {
    $match: {
      title: {
        $type: "string"
      }
    }
  },
  {
    $project: {
      title: { $split: ["$title", " "] },
      _id: 0
    }
  },
  {
    $match: {
      title: { $size: 1 }
    }
  }
]).itcount()

Optional Lab - Expressions with $project

Let’s find how many movies in our movies collection are a “labor of love”, where the same person appears in cast, directors, and writers

db.movies.aggregate([
  {
    $match: {
      cast: { $elemMatch: { $exists: true } },
      directors: { $elemMatch: { $exists: true } },
      writers: { $elemMatch: { $exists: true } }
    }
  },
  {
    $project: {
      _id: 0,
      cast: 1,
      directors: 1,
      writers: {
        $map: {
          input: "$writers",
          as: "writer",
          in: {
            $arrayElemAt: [
              {
                $split: ["$$writer", " ("]
              },
              0
            ]
          }
        }
      }
    }
  },
  {
    $project: {
      labor_of_love: {
        $gt: [
          { $size: { $setIntersection: ["$cast", "$directors", "$writers"] } },
          0
        ]
      }
    }
  },
  {
    $match: { labor_of_love: true }
  },
  {
    $count: "labors of love"
  }
])

With our first $match stage, we filter out documents that are not an array or have an empty array for the fields we are interested in.