MongoDB cookbook

Familiarise yourself with basics of MongoDB as we start our journey into understanding the support for aggregations in Mongo.

MongoDB cookbook

Agenda

MongoDB can be a great choice if the schema needs to be flexible. In this post, let's revisit the queries needed to get frequently required data.

This post assumes the following:

  • MongoDB is installed
  • Access to MongoDB shell

Queries

Let's get access to a MongoDB shell by using command mongo.

$ mongo

Let's see the various queries that can be invoked.

How to show the databases

This prints a list of all databases on this server.

> show databases;

How to create a collection

A MongoDB collection is synonymous to a relational database table. A collection can be created by using method createCollection.

> db.createCollection('people');

How to insert a document into a collection

A MongoDB document is synonymous to a relational database row. It can be inserted in the following way:

> db.people.insert({'name': 'Eddard Stark', 'house': 'Stark', 'location': 'Winterfell'})

How to do bulk insertion of documents

Multiple documents can be inserted at once by using method insertMany.

> db.people.insertMany([{'name': 'Cat Stark', 'house': 'Tully', 'location': 'Winterfell'}, {'name': 'Arya Stark', 'house': 'Stark', 'location': 'Winterfell'}, {'name': 'Sansa Stark', 'house': 'Stark', 'location': 'Winterfell'}, {'name': 'Jon Snow', 'location': 'Winterfell'}])

Find all documents

We can get all documents by using method find() and not passing any argument to it.

> db.people.find()

Find a document where name is Arya Stark

> db.people.find({'name': 'Arya Stark'})
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }

Find all documents where house is Stark

> db.people.find({'house': 'Stark'})
{ "_id" : ObjectId("61c33f5afa3c6ac0076acc0f"), "name" : "Eddard Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc12"), "name" : "Sansa Stark", "house" : "Stark", "location" : "Winterfell" }

Apply a limit while finding documents

> db.people.find({'house': 'Stark'}).limit(2)
{ "_id" : ObjectId("61c33f5afa3c6ac0076acc0f"), "name" : "Eddard Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }

Find documents with multiple conditions

Multiple <field> : <value> pairs can be passed while using find.

> db.people.find({'house': 'Stark', 'name': 'Eddard Stark'})

This will act like SQL's AND. Hence only documents matching both the conditions would be returned.

Find document which satisfy any of the conditions

We want to find documents where house is either Stark or Tully.

> db.people.find({'$or': [{'house': 'Stark'}, {'house': 'Tully'}]})

The output would look like:

{ "_id" : ObjectId("61c33f5afa3c6ac0076acc0f"), "name" : "Eddard Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc10"), "name" : "Cat Stark", "house" : "Tully", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc12"), "name" : "Sansa Stark", "house" : "Stark", "location" : "Winterfell" }

Find documents where a field exists

We want to find all documents where a house has been set.

> db.people.find({'house': {'$exists': true}})

This would return all documents except the one for Jon Snow.

Similarly, we can find all documents where a house has not been set.

> db.people.find({'house': {'$exists': true}})

This would return only document for Jon Snow.

Update documents

We realise that we haven't added an age field on the characters. Let's update few people, and add an age.

> db.people.updateOne({'name': 'Eddard Stark'}, {$set: {'age': 42}})

Hence, we need to use method updateOne. We need to pass two arguments to the method. The first argument specifies the filter which finds out the documents that needs to be updated. In our case, the filter is name equal to Eddard Stark. And we want a field called age to be set on this document. Make sure to use $set and pass the key and value which needs to be set.

Make sure you don't miss $set and do something like:

> db.people.updateOne({'name': 'Eddard Stark'}, {'age': 42})

If you do the above, the existing fields of this document would be erased, i.e name, house and location would be erased. And only field age would be available.

Similarly, let's update few more documents.

> db.people.updateOne({'name': 'Cat Stark'}, {$set: {'age': 40}})
> db.people.updateOne({'name': 'Arya Stark'}, {$set: {'age': 12}})
> db.people.updateOne({'name': 'Sansa Stark'}, {$set: {'age': 15}})

List documents ordered by a column

Let's list all documents of house Stark ordered by the age in ascending order.

> db.people.find({'house': 'Stark'}).sort({'age': 1})

age can be specified as -1 if we want it in descending order.

Selecting only few fields

Say we only want to fetch the name and age of the people, then we can write the following query:

> db.people.find({house: "Stark"}, {name: 1, age:1})

Refer Mongodb projection for a detailed explanataion.

Adding indexes

When we query for a particular field, unless we have an index, Mongo would have to perform a collection scan. A collection scan is similar to a full table scan in other databases.

Hence, if we find ourselves frequently querying on a particular field, then it makes sense to add that field to the index. Let's say our application allows filtering by house of the characters, and hence we end up doing filter on house very frequently, then we should add an index on field house.

> db.people.createIndex({'house': 1})

After adding this index, statements like db.people.find({'house': 'Stark'}) should be significantly faster than executing the same statement without an index.

Also .sort() on a field can gain in execution speed if there is an index on the field on which we are performing the ordering. Hence, if we find ourselves frequently doing a sort({age: 1}), then it makes sense to add an index on age.

Add more data

Let's add some more data to keep things interesting.

> db.people.insertMany([{'name': 'Tywin Lannister', 'house': 'Lannister', 'location': 'Casterly Rock', 'age': 55}, {'name': 'Jamie Lannister', 'house': 'Lannister', 'location': 'Casterly Rock', 'age': 35}, {'name': 'Tyrion Lannister', 'house': 'Lannister', 'location': 'Casterly Rock', alias: 'Imp', age: 30}, {'name': 'Cersei Lannister', 'location': 'Casterly Rock', house: 'Lannister', age: 35}])

Find total number of characters

We can use the following statement to find total number of characters in our collection.

> db.people.aggregate([{$count: "total_characters"}])

This would return {'total_charaters': 9}

Find the oldest member in each house

> db.people.aggregate([{$group: {_id: "$house", maximum: {$max: "$age"}}}])

This should output something like:

{ "_id" : "Lannister", "maximum" : 55 }
{ "_id" : null, "maximum" : null }
{ "_id" : "Tully", "maximum" : 40 }
{ "_id" : "Stark", "maximum" : 42 }

We saw an entry for null because we have one document without a house field. We do not have a house set for Jon Snow. Hence that contributes to null.

Find the number of members in each house

> db.people.aggregate([{$group: {_id: "$house", num_members: {$count: {}}}}])

This should output:

{ "_id" : "Lannister", "num_members" : 4 }
{ "_id" : null, "num_members" : 1 }
{ "_id" : "Tully", "num_members" : 1 }
{ "_id" : "Stark", "num_members" : 3 }

Find number of members in each house ordered by highest number of members

> db.people.aggregate([{$group: {_id: "$house", num_members: {$count: {}}}}, {$sort: {num_members: -1}}])

This would output:

{ "_id" : "Lannister", "num_members" : 4 }
{ "_id" : "Stark", "num_members" : 3 }
{ "_id" : "Tully", "num_members" : 1 }
{ "_id" : null, "num_members" : 1 }

Associations in Mongodb

The application would rarely be working with a single type of entity like this. Instead the application might be working with two kind of entities, say House and Member. Each House can have it's own fields and Member can have their own fields. And a member belongs to a particular house.

Considering an example, a house could have fields like name, sigil, seat and words. A member can have fields like name, age, power, kills etc. And a member would belong to a house.

There are two ways to manage associations in Mongo:

  • Embedding child documents in a parent document
  • Keeping references between documents
Embedded documents

The approach to take depends on the the application needs and kind of queries required to drive our application. Let's assume our application would need the following pages:

  • List all the houses
  • List all members of the houses on the house detail page.

In a traditional relational database, the schema would probably have two tables. One table would be house and the other table would be member. member would have a foreign key to house. Certain requirements like Sort the houses by the number of members would need a SQL query which would perform join between the two tables.

In Mongodb, we can follow embedded documents approach for such requirements.

Let's create a document for house Stark along with it's members.

> db.houses.insertOne({'name': 'Stark', 'sigil': 'Direwolf', 'seat': 'Winterfell', 'words': 'Winter is coming', members: [{'name': 'Eddard Stark', 'age': 40, 'power': 90}, {'name': 'Catelyn Stark', 'age': 38, 'power': 80}, {'name': 'Robb Stark', 'age': 19, 'power': 70}, {'name': 'Sansa Stark', 'age':17, 'power': 60}, {'name': 'Arya Stark', 'age': 15, 'power': 65}]})

You can notice that we have set a key called members on the house document itself. It is an array. We can consider each entry of this array as a document as well. Hence there is a parent document for house and it contains embedded child documents.

The embedded child documents look like:

[
    {'name': 'Eddard Stark', 'age': 40, 'power': 90},
    {'name': 'Catelyn Stark', 'age': 38, 'power': 80},
    {'name': 'Robb Stark', 'age': 19, 'power': 70},
    {'name': 'Sansa Stark', 'age':17, 'power': 60},
    {'name': 'Arya Stark', 'age': 15, 'power': 65}
]

Let's insert a document for house Lannister with members embedded in it.

> db.houses.insertOne({'name': 'Lannister', 'sigil': 'Lion', 'seat': 'Casterly Rock', 'words': 'Hear me roar', members: [{'name': 'Tyrion Lannister', 'age': 30, 'power': 80}, {'name': 'Jamie Lannister', 'age': 32, 'power': 85}, {'name': 'Tywin Lannister', 'age': 50, 'power': 87}, {'name': 'Cersei Lannister', 'age':32, 'power': 80}]})

Let's insert a document for house Baratheon with members embedded in it.

> db.houses.insertOne({'name': 'Baratheon', 'sigil': 'Stag', 'seat': "Storm's End", 'words': 'Ours is the fury', members: [{'name': 'Robert Baratheon', 'age': 40, 'power': 95}, {'name': 'Renly Baratheon', 'age': 36, 'power': 75}, {'name': 'Stannis Baratheon', 'age': 38, 'power': 82}]})

Executing the following statement would return all the houses. This can be used on the list page which lists all the houses.

> db.houses.find({})

Say the list page doesn't show the members of the house. In such case, we can only ask the Mongo server to return the specific fields we need. The query would look like:

> db.houses.find({}, {'name': 1, 'sigil': 1, 'seat': 1, 'words': 1})

This would output:

{ "_id" : ObjectId("61ca90df0ee3a7a24c8f101a"), "name" : "Stark", "sigil" : "Direwolf", "seat" : "Winterfell", "words" : "Winter is coming" }
{ "_id" : ObjectId("61ca93840ee3a7a24c8f101b"), "name" : "Lannister", "sigil" : "Lion", "seat" : "Casterly Rock", "words" : "Hear me roar" }
{ "_id" : ObjectId("61ca94ba0ee3a7a24c8f101c"), "name" : "Baratheon", "sigil" : "Stag", "seat" : "Storm's End", "words" : "Ours is the fury" }

You would notice, we have passed two arguments to find. The first argument is the condition. As we want all documents, hence the condition is {}. The second argument specifies the fields we want. In our example it is {'name': 1, 'sigil': 1, 'seat': 1, 'words': 1}. This is called a projection, and you can read more about it here.

We only wanted to exclude the members field. There is a better way of writing such projections. We can write it as:

> db.houses.find({}, {'members': 0})

The output would look exactly like the previous query's output.

Let's say our application also supports a detail page for a house. It displays detail about the house like it's name, 'sigil' etc. and also the members. In such case, the user would probably click on a house on the list page and will then navigate to the detail page. Hence, we can filter the house by the id and get all the details.

> db.houses.find({_id: ObjectId("61ca90df0ee3a7a24c8f101a")})

The application can then loop over the members and display the member details too.

Had we used a relational database, this would have either required 2 queries or a join between the tables. Joins are expensive as the data of two different tables isn't stored together and might be far apart on the disk.

Say we want to allow ordering on the house list page. We want to order by the strength of each house. This involves doing a sum of power of members for each house. The query to get houses ordered by number of members would be:

> db.houses.aggregate([{$unwind: "$members"}, {$group: {_id: "$name", strength: {$sum: "$members.power"}}}, {$sort: {strength: -1}}])

What we have used here is an aggregation pipeline. An aggregation pipeline is invoked by using function aggregate(). It takes an array as an argument. We have passed the following three entries in the array:

  • {$unwind: "$members"}
  • {$group: {_id: "$name", strength: {$sum: "$members.power"}}}
  • {$sort: {strength: -1}}

Read more about aggregation pipeline to understand it in detail.

In a relational database, this would again need a join between the two tables and then applying a group by on the joined data.

References across documents

Embedded documents approach works well when child documents need to be shown in the context of a parent document. However, at times the application might need to show child documents independent of a parent document. Let's say your appication has following needs:

- Members list page ordered by strength
- A member detail page

If we have a need for a members list page and it can be shown independently and without the house context, then it makes sense to create a separate collection for members

Before creating members collection and inserting data into it, let's unset members field from houses.

> db.houses.update({'name': 'Stark'}, {$unset: {'members': ''}})
> db.houses.update({'name': 'Lannister'}, {$unset: {'members': ''}})
> db.houses.update({'name': 'Baratheon'}, {$unset: {'members': ''}})

We can create a collection called members.

> db.createCollection('members')

We will keep a field house_id on members. Let's find _id of house Stark.

> db.houses.find({'name': 'Stark'}, {'_id': 1})

Output:

{ "_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }

We can use this same _id as the member_id when inserting a member.

> db.members.insertOne({"name" : "Eddard Stark", "age" : 40, "power" : 90, 'house_id': ObjectId("61ca90df0ee3a7a24c8f101a")})

Let's insert more members for house Stark.

> db.members.insertMany([{ "name" : "Catelyn Stark", "age" : 38, "power" : 80, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }, { "name" : "Robb Stark", "age" : 19, "power" : 70, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }, { "name" : "Sansa Stark", "age" : 17, "power" : 60, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }, { "name" : "Arya Stark", "age" : 15, "power" : 65, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }])

You can find _id for house Lannister and similarly insert members for Lannister.

> db.members.insertMany([{ "name" : "Tyrion Lannister", "age" : 30, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }, { "name" : "Jamie Lannister", "age" : 32, "power" : 85, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }, { "name" : "Tywin Lannister", "age" : 50, "power" : 87, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }, { "name" : "Cersei Lannister", "age" : 32, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }])

Similarly, let's insert members for house Baratheon.

> db.members.insertMany([{ "name" : "Robert Baratheon", "age" : 40, "power" : 95, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }, { "name" : "Renly Baratheon", "age" : 36, "power" : 75, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }, { "name" : "Stannis Baratheon", "age" : 38, "power" : 82, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }])

We want to query members ordered by their strength:

> db.members.find({}).sort({power: -1})

This would output:

{ "_id" : ObjectId("61cb11d30ee3a7a24c8f1026"), "name" : "Robert Baratheon", "age" : 40, "power" : 95, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }
{ "_id" : ObjectId("61cabe9a0ee3a7a24c8f101d"), "name" : "Eddard Stark", "age" : 40, "power" : 90, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1024"), "name" : "Tywin Lannister", "age" : 50, "power" : 87, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1023"), "name" : "Jamie Lannister", "age" : 32, "power" : 85, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb11d30ee3a7a24c8f1028"), "name" : "Stannis Baratheon", "age" : 38, "power" : 82, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f101e"), "name" : "Catelyn Stark", "age" : 38, "power" : 80, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1022"), "name" : "Tyrion Lannister", "age" : 30, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1025"), "name" : "Cersei Lannister", "age" : 32, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb11d30ee3a7a24c8f1027"), "name" : "Renly Baratheon", "age" : 36, "power" : 75, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f101f"), "name" : "Robb Stark", "age" : 19, "power" : 70, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f1021"), "name" : "Arya Stark", "age" : 15, "power" : 65, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f1020"), "name" : "Sansa Stark", "age" : 17, "power" : 60, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }

Let's assume the application has pagination and each page shows 2 members. Hence the first page shows information about Robert Baratheon and Eddard Stark.

In case we want to show house information along with the members, say show the sigil and words along with each member, we would need to collect all the house_id which are relevant. In this case the house ids are [ObjectId("61ca94ba0ee3a7a24c8f101c"), ObjectId("61ca90df0ee3a7a24c8f101a")].

And then we would have to retreive the house information for these ids and combine them with members at application level.

> db.houses.find({_id: {$in: [ObjectId("61ca94ba0ee3a7a24c8f101c"), ObjectId("61ca90df0ee3a7a24c8f101a")]}})

This will give us information about both the houses and we can combine them with members at the application level.

Stay tuned for more posts on Mongo!