M320: Chapter 5: Conclusion

Q2

Scenario

We built a very successful navigation application for cell phones. The application has been installed on many devices throughout the world.

Over a long period of time, the application is active on an average of 10 million cell phones, with a maximum peak of 50 million devices at the busiest time of the year. This average value includes all of the peaks.

Each device, when active, sends 100 bytes of data every minute, which the server writes as one operation.

We want to keep the data for one year.

Based on the above numbers, which of the following are true statements regarding the quantification of this workload and the sizing of the database?

To simplify the calculations and conversions in data units, use:

1,000,000,000 bytes is 1 gigabyte
1,000,000,000,000 bytes is 1 terabyte
1,000,000,000,000,000 bytes is 1 petabyte

Round all results to 2 significant digits.

The size of the data in the database is 530 terabytes.

This is the amount of space needed to keep the data for a one year period.

10,000,000 cellphones * 100 bytes/(minute*cellphone) * (60 * 24 * 365 minutes/year) = 525,600,000,000,000 bytes/year, which rounds to 530 terabytes

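The peak write rate is 830,000 writes/second.

Each active cellphone produces one write per minute, so:

50,000,000 writes/minute * 1 minute/60 seconds = 833,333 writes/second, which rounds to 830,000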

The average write rate is 170,000 writes/second.

10,000,000 writes/minute * 1 minute/60 seconds = 166,667 writes/second, which rounds to 170,000
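The three calculations above can be checked with a short script. This is only a sketch of the arithmetic: the constant names and the `round_sig` helper are illustrative and not part of the course material.

```python
from math import floor, log10

# Workload figures taken from the scenario above.
AVG_DEVICES = 10_000_000
PEAK_DEVICES = 50_000_000
BYTES_PER_WRITE = 100          # one write per active device per minute
MINUTES_PER_YEAR = 60 * 24 * 365


def round_sig(value, digits=2):
    """Round a positive number to the given count of significant digits."""
    magnitude = floor(log10(value))
    factor = 10 ** (magnitude - digits + 1)
    return round(value / factor) * factor


# Storage for one year of data, using 1 terabyte = 1,000,000,000,000 bytes.
bytes_per_year = AVG_DEVICES * BYTES_PER_WRITE * MINUTES_PER_YEAR
size_tb = round_sig(bytes_per_year) / 1_000_000_000_000

# Each active device writes once per minute, so divide by 60 for writes/second.
peak_writes_per_sec = round_sig(PEAK_DEVICES / 60)
avg_writes_per_sec = round_sig(AVG_DEVICES / 60)

print(size_tb, peak_writes_per_sec, avg_writes_per_sec)
```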

Q7

Your team just hired a new Data Architect, and her amazing ideas are gaining traction with the company leadership.

You already updated your application to be able to handle the new data organization in your database. Now you have been tasked with implementing her proposed new data organization approach to your database with minimum downtime for the users of the application.

Which pattern solution is best suited for this situation?

The Schema Versioning Pattern
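As a rough illustration of why this pattern avoids downtime, the sketch below shows an application that accepts both document shapes and upgrades old documents as it encounters them. The `schema_version` field name, the `email`/`phone`/`contacts` fields, and the `upgrade_to_v2` function are hypothetical and not taken from the quiz. Plain Python dicts stand in for MongoDB documents.

```python
def upgrade_to_v2(doc):
    """Return the document in the (hypothetical) version 2 shape.

    Documents without a schema_version field are treated as version 1.
    """
    if doc.get("schema_version", 1) >= 2:
        return doc  # already in the new shape, nothing to do

    # Version 1 kept contact details as top-level fields; version 2
    # groups them into a single list of {method, value} pairs.
    contacts = [
        {"method": method, "value": doc[method]}
        for method in ("email", "phone")
        if method in doc
    ]
    upgraded = {k: v for k, v in doc.items() if k not in ("email", "phone")}
    upgraded["schema_version"] = 2
    upgraded["contacts"] = contacts
    return upgraded


old_doc = {"_id": 1, "name": "Ada", "email": "ada@example.com", "phone": "555-0100"}
new_doc = upgrade_to_v2(old_doc)
```

Because every document states its own version, old and new documents can coexist in the same collection while the migration proceeds in the background.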

Q8

A new decision maker came aboard your online bookstore team. They want to be able to track which genres are most popular daily. To keep this metric up to date without running massive queries for obtaining it, which of the following schema patterns would you choose to implement?

The Computed Pattern

The Computed Pattern allows your application to calculate values at write time. In this case, the sum of the number of views would be calculated in a rolling fashion by book genre.
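A minimal in-memory sketch of that rolling calculation follows. The function names and the `(day, genre)` key are assumptions made for illustration; in a real deployment the counter would live in a MongoDB document and be incremented with an update such as `$inc`.

```python
from collections import defaultdict


def record_view(daily_genre_views, day, genre):
    """Update the running total at write time, once per recorded view."""
    daily_genre_views[(day, genre)] += 1


def most_popular(daily_genre_views, day):
    """Read the precomputed totals; no scan over individual views is needed."""
    todays = {g: n for (d, g), n in daily_genre_views.items() if d == day}
    return max(todays, key=todays.get) if todays else None


views = defaultdict(int)
for genre in ["mystery", "sci-fi", "mystery", "romance", "mystery"]:
    record_view(views, "2024-05-01", genre)
record_view(views, "2024-05-02", "romance")
```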

Q9 IoT

You work as a developer at a factory. Your factory wants to track the usage statistics of the automatic lighting that was recently installed throughout its facilities. The lights send an update to the database every 10 seconds, but the management is interested in an hourly report instead. Additionally, we are only looking to store this information for at most 5 years, so an easy way to purge old data would be beneficial to our data modeling approach.

Which pattern solution is best suited for this situation?

The Bucket Pattern
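The sketch below shows one way the 10-second updates could be grouped into one bucket per light per hour, which matches the hourly report and makes purging a matter of dropping whole buckets. The field names (`light_id`, `hour`, `count`, `on_count`, `readings`) and both functions are hypothetical, and a plain dict stands in for the collection.

```python
from datetime import datetime, timedelta


def add_reading(buckets, light_id, ts, is_on):
    """Append a reading to the bucket for this light and hour."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    bucket = buckets.setdefault(
        (light_id, hour),
        {"light_id": light_id, "hour": hour, "count": 0, "on_count": 0, "readings": []},
    )
    bucket["count"] += 1
    bucket["on_count"] += 1 if is_on else 0
    bucket["readings"].append({"ts": ts, "is_on": is_on})


def purge_older_than(buckets, cutoff):
    """Drop every bucket whose hour starts before the cutoff."""
    for key in [k for k in buckets if k[1] < cutoff]:
        del buckets[key]


buckets = {}
start = datetime(2024, 1, 1, 8, 0, 0)
for i in range(360):  # one hour of readings at 10-second intervals
    add_reading(buckets, "light-1", start + timedelta(seconds=10 * i), is_on=(i % 2 == 0))
add_reading(buckets, "light-1", start + timedelta(hours=1), is_on=True)
```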

Q10

With the digitization of every area of our lives, the famous NYC bodegas (convenience stores) are trying to keep up. Bodegas don’t just know everything that goes on in the neighborhood, they also supply all types of household goods, hardware supplies, and groceries. The New York City Bodega Association is looking to create an app that will help them keep track of their unique, versatile inventory and help customers look up whether items are in stock before visiting the bodega.

Which pattern solution is best suited for this application?

The Polymorphic Pattern

The problem states that bodegas sell a variety of items from different categories, with different purposes and properties. In this case, the Polymorphic Pattern will be the best candidate to catalog this set of goods.
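To make the idea concrete, here is a small sketch in which items of different shapes share one list, standing in for a single collection. All field names and sample items are invented for illustration. Every document carries the common fields needed for the stock lookup, plus a `category` field that tells the application which extra fields to expect.

```python
inventory = [
    {"sku": "G-1", "category": "grocery", "name": "Oat milk", "in_stock": 4,
     "expires": "2024-07-01"},
    {"sku": "H-1", "category": "hardware", "name": "Wood screws", "in_stock": 0,
     "size_mm": 40},
    {"sku": "C-1", "category": "household", "name": "Dish soap", "in_stock": 12,
     "volume_ml": 500},
]


def is_in_stock(items, name):
    """One query over the shared fields works for every category."""
    return any(item["name"] == name and item["in_stock"] > 0 for item in items)


def describe(item):
    """Branch on the category field to use the category-specific fields."""
    if item["category"] == "grocery":
        return f'{item["name"]} (expires {item["expires"]})'
    if item["category"] == "hardware":
        return f'{item["name"]} ({item["size_mm"]} mm)'
    return item["name"]
```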

Q11

You are a developer working on an e-commerce application. Each time your application retrieves an item from the inventory collection, it also needs to retrieve data about the orders in which this item is present. This leads to additional queries on the orders collection.

We know that reducing the total number of queries we perform on the system would solve the main performance issue we are seeing at peak time.

Which pattern solution is best suited for this situation?

The Extended Reference Pattern

In this case, the Extended Reference Pattern will easily take care of the additional queries our application is making. To implement the pattern we can modify the inventory item documents by adding frequently-accessed order data directly to them. This will result in lowering the number of related queries on the orders collection, since the relevant data about the orders will now be part of the inventory item documents.
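The modification described above might look like the following sketch. The document shapes, the `orders` field on the item, and the `add_order_reference` function are assumptions for illustration only. The key point is that only the frequently accessed order fields are copied, while the full order stays in the orders collection.

```python
def add_order_reference(item, order):
    """Copy the frequently accessed order fields into the inventory item."""
    item.setdefault("orders", []).append(
        {
            "order_id": order["_id"],  # still a reference to the full order
            "date": order["date"],
            "quantity": order["quantities"][item["sku"]],
        }
    )


item = {"sku": "A-42", "name": "Desk lamp", "in_stock": 7}
order = {
    "_id": 9001,
    "date": "2024-03-15",
    "customer": "J. Doe",
    "shipping_address": "12 Example Street",
    "quantities": {"A-42": 2, "B-7": 1},
}
add_order_reference(item, order)
```

Reading the item now returns the order summary in the same query, and the application only visits the orders collection when it needs fields that were not copied.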

Q12

As a chemical manufacturer, you tend to keep your factory and your data organized, since dealing with chemicals requires a lot of precision, attention, and safety mechanisms. One of the safety mechanisms in the factory is the documentation about produced chemicals. This documentation is recorded in Material Safety Data Sheets, which are large PDF documents containing safety details about a given chemical. These Data Sheets are part of the documents in the inventory collection, where other information such as the price, quantity, and warehouse location of the chemical is stored as well.

Keeping track of production, sales, and purchases requires a lot of data manipulation on an hourly basis. You notice that at especially busy times, your inventory tracking application slows down considerably.

Which pattern solution is best suited for solving this issue?

The Subset Pattern
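One plausible reading of this answer is that the large Data Sheet is moved out of the frequently updated inventory document, so the hourly workload touches only small documents. The sketch below illustrates that split. The field names and the `split_inventory_doc` function are hypothetical, and the two returned dicts stand in for documents in two separate collections.

```python
def split_inventory_doc(doc):
    """Separate the small, frequently used fields from the large Data Sheet."""
    # Fields used by the hourly inventory workload stay in the main document.
    hot = {k: v for k, v in doc.items() if k != "safety_data_sheet"}
    # The large, rarely read sheet moves to its own document, linked by id.
    cold = {"chemical_id": doc["_id"], "safety_data_sheet": doc["safety_data_sheet"]}
    return hot, cold


chemical = {
    "_id": "acetone",
    "price": 12.5,
    "quantity": 300,
    "warehouse": "B2",
    "safety_data_sheet": b"%PDF-1.7 ..." * 1000,  # placeholder for a large PDF
}
inventory_doc, data_sheet_doc = split_inventory_doc(chemical)
```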

Epilogue

Congratulations on completing M320: Data Modeling!

The knowledge you have acquired will help you create more robust data models and efficient queries using MongoDB.

There are many more advanced topics and additional subjects we did not explore in this course. If you want to begin learning more about these, consult the following list of resources.

Sharding

Sharding is an extremely important topic for large-scale systems that will impact your design decisions. Most systems do not reach sizes that require Sharding. However, if your system is already sharded or you are sure that it will need to be, you should become familiar with the main concepts of Sharding. Here are some important reads on Sharding:

Documentation on Sharding

Choosing a Shard Key

Query Effectiveness

We taught you to think early about your queries and to model based on your system’s workload.

Once you’ve implemented your schema design, how do you assess the effectiveness of your queries? Consult the following resources to validate that your queries are working as expected, using the right indexes, and are not running too slowly:

Indexing Strategy

Analyze Query Performance

Document and Schema Validation

We mentioned that although MongoDB uses a flexible schema, you can still enforce constraints on your data models. You can add many different kinds of validation, such as field type, value, and presence.

To know more about Schema Validation, please refer to the following resources:

Documentation on the Schema Validation topic.
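As an illustration of the three kinds of constraint mentioned above (field type, value, and presence), the sketch below builds a `$jsonSchema` validator as a Python dict. The field names are made up for the example. A document like this can be supplied as the `validator` option when a collection is created; consult the documentation for the exact options supported by your MongoDB version.

```python
# Hypothetical validator for an inventory collection.
inventory_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        # Presence: these fields must exist in every document.
        "required": ["name", "price", "quantity"],
        "properties": {
            # Type: name must be a string.
            "name": {"bsonType": "string"},
            # Type and value: price is a double that cannot be negative.
            "price": {"bsonType": "double", "minimum": 0},
            # Type and value: quantity is an integer that cannot be negative.
            "quantity": {"bsonType": "int", "minimum": 0},
        },
    }
}
```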

Transactions in MongoDB

We mentioned a few times that MongoDB now supports transactions. To learn more about them, please refer to the following resources:

Documentation on Transactions

Videos explaining their implementation

Schema Design Patterns

To see additional information on our Schema Design Patterns, please refer to the following resources:

Series of blogs on Schema Design Patterns

Video from MongoDB World on Schema Design Patterns