MongoDB cookbook
Familiarise yourself with basics of MongoDB as we start our journey into understanding the support for aggregations in Mongo.

Agenda
MongoDB can be a great choice if the schema needs to be flexible. In this post, let's revisit the queries needed to get frequently required data.
This post assumes the following:
- MongoDB is installed
- Access to MongoDB shell
Queries
Let's get access to a MongoDB shell by using command mongo
.
$ mongo
Let's see the various queries that can be invoked.
How to show the databases
This prints a list of all databases on this server.
> show databases;
How to create a collection
A MongoDB collection is synonymous to a relational database table. A collection can be created by using method createCollection
.
> db.createCollection('people');
How to insert a document into a collection
A MongoDB document is synonymous to a relational database row. It can be inserted in the following way:
> db.people.insert({'name': 'Eddard Stark', 'house': 'Stark', 'location': 'Winterfell'})
How to do bulk insertion of documents
Multiple documents can be inserted at once by using method insertMany
.
> db.people.insertMany([{'name': 'Cat Stark', 'house': 'Tully', 'location': 'Winterfell'}, {'name': 'Arya Stark', 'house': 'Stark', 'location': 'Winterfell'}, {'name': 'Sansa Stark', 'house': 'Stark', 'location': 'Winterfell'}, {'name': 'Jon Snow', 'location': 'Winterfell'}])
Find all documents
We can get all documents by using method find()
and not passing any argument to it.
> db.people.find()
Find a document where name is Arya Stark
> db.people.find({'name': 'Arya Stark'})
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }
Find all documents where house is Stark
> db.people.find({'house': 'Stark'})
{ "_id" : ObjectId("61c33f5afa3c6ac0076acc0f"), "name" : "Eddard Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc12"), "name" : "Sansa Stark", "house" : "Stark", "location" : "Winterfell" }
Apply a limit while finding documents
> db.people.find({'house': 'Stark'}).limit(2)
{ "_id" : ObjectId("61c33f5afa3c6ac0076acc0f"), "name" : "Eddard Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }
Find documents with multiple conditions
Multiple <field> : <value>
pairs can be passed while using find
.
> db.people.find({'house': 'Stark', 'name': 'Eddard Stark'})
This will act like SQL
's AND
. Hence only documents matching both the conditions would be returned.
Find document which satisfy any of the conditions
We want to find documents where house is either Stark
or Tully
.
> db.people.find({'$or': [{'house': 'Stark'}, {'house': 'Tully'}]})
The output would look like:
{ "_id" : ObjectId("61c33f5afa3c6ac0076acc0f"), "name" : "Eddard Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc10"), "name" : "Cat Stark", "house" : "Tully", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc11"), "name" : "Arya Stark", "house" : "Stark", "location" : "Winterfell" }
{ "_id" : ObjectId("61c34002fa3c6ac0076acc12"), "name" : "Sansa Stark", "house" : "Stark", "location" : "Winterfell" }
Find documents where a field exists
We want to find all documents where a house
has been set.
> db.people.find({'house': {'$exists': true}})
This would return all documents except the one for Jon Snow
.
Similarly, we can find all documents where a house
has not been set.
> db.people.find({'house': {'$exists': true}})
This would return only document for Jon Snow
.
Update documents
We realise that we haven't added an age
field on the characters. Let's update few people, and add an age.
> db.people.updateOne({'name': 'Eddard Stark'}, {$set: {'age': 42}})
Hence, we need to use method updateOne
. We need to pass two arguments to the method. The first argument specifies the filter which finds out the documents that needs to be updated. In our case, the filter is name
equal to Eddard Stark
. And we want a field called age
to be set on this document. Make sure to use $set
and pass the key and value which needs to be set.
Make sure you don't miss $set
and do something like:
> db.people.updateOne({'name': 'Eddard Stark'}, {'age': 42})
If you do the above, the existing fields of this document would be erased, i.e name, house and location would be erased. And only field age
would be available.
Similarly, let's update few more documents.
> db.people.updateOne({'name': 'Cat Stark'}, {$set: {'age': 40}})
> db.people.updateOne({'name': 'Arya Stark'}, {$set: {'age': 12}})
> db.people.updateOne({'name': 'Sansa Stark'}, {$set: {'age': 15}})
List documents ordered by a column
Let's list all documents of house Stark ordered by the age in ascending order.
> db.people.find({'house': 'Stark'}).sort({'age': 1})
age
can be specified as -1
if we want it in descending order.
Selecting only few fields
Say we only want to fetch the name
and age
of the people, then we can write the following query:
> db.people.find({house: "Stark"}, {name: 1, age:1})
Refer Mongodb projection for a detailed explanataion.
Adding indexes
When we query for a particular field, unless we have an index, Mongo would have to perform a collection scan. A collection scan is similar to a full table scan in other databases.
Hence, if we find ourselves frequently querying on a particular field, then it makes sense to add that field to the index. Let's say our application allows filtering by house
of the characters, and hence we end up doing filter on house
very frequently, then we should add an index on field house
.
> db.people.createIndex({'house': 1})
After adding this index, statements like db.people.find({'house': 'Stark'})
should be significantly faster than executing the same statement without an index.
Also .sort()
on a field can gain in execution speed if there is an index on the field on which we are performing the ordering. Hence, if we find ourselves frequently doing a sort({age: 1})
, then it makes sense to add an index on age
.
Add more data
Let's add some more data to keep things interesting.
> db.people.insertMany([{'name': 'Tywin Lannister', 'house': 'Lannister', 'location': 'Casterly Rock', 'age': 55}, {'name': 'Jamie Lannister', 'house': 'Lannister', 'location': 'Casterly Rock', 'age': 35}, {'name': 'Tyrion Lannister', 'house': 'Lannister', 'location': 'Casterly Rock', alias: 'Imp', age: 30}, {'name': 'Cersei Lannister', 'location': 'Casterly Rock', house: 'Lannister', age: 35}])
Find total number of characters
We can use the following statement to find total number of characters in our collection.
> db.people.aggregate([{$count: "total_characters"}])
This would return {'total_charaters': 9}
Find the oldest member in each house
> db.people.aggregate([{$group: {_id: "$house", maximum: {$max: "$age"}}}])
This should output something like:
{ "_id" : "Lannister", "maximum" : 55 }
{ "_id" : null, "maximum" : null }
{ "_id" : "Tully", "maximum" : 40 }
{ "_id" : "Stark", "maximum" : 42 }
We saw an entry for null
because we have one document without a house
field. We do not have a house
set for Jon Snow
. Hence that contributes to null
.
Find the number of members in each house
> db.people.aggregate([{$group: {_id: "$house", num_members: {$count: {}}}}])
This should output:
{ "_id" : "Lannister", "num_members" : 4 }
{ "_id" : null, "num_members" : 1 }
{ "_id" : "Tully", "num_members" : 1 }
{ "_id" : "Stark", "num_members" : 3 }
Find number of members in each house ordered by highest number of members
> db.people.aggregate([{$group: {_id: "$house", num_members: {$count: {}}}}, {$sort: {num_members: -1}}])
This would output:
{ "_id" : "Lannister", "num_members" : 4 }
{ "_id" : "Stark", "num_members" : 3 }
{ "_id" : "Tully", "num_members" : 1 }
{ "_id" : null, "num_members" : 1 }
Associations in Mongodb
The application would rarely be working with a single type of entity like this. Instead the application might be working with two kind of entities, say House
and Member
. Each House
can have it's own fields and Member
can have their own fields. And a member belongs to a particular house.
Considering an example, a house could have fields like name
, sigil
, seat
and words
. A member can have fields like name
, age
, power
, kills
etc. And a member would belong to a house.
There are two ways to manage associations in Mongo:
- Embedding child documents in a parent document
- Keeping references between documents
Embedded documents
The approach to take depends on the the application needs and kind of queries required to drive our application. Let's assume our application would need the following pages:
- List all the houses
- List all members of the houses on the house detail page.
In a traditional relational database, the schema would probably have two tables. One table would be house
and the other table would be member
. member
would have a foreign key
to house
. Certain requirements like Sort the houses by the number of members
would need a SQL query which would perform join between the two tables.
In Mongodb, we can follow embedded documents
approach for such requirements.
Let's create a document for house Stark along with it's members.
> db.houses.insertOne({'name': 'Stark', 'sigil': 'Direwolf', 'seat': 'Winterfell', 'words': 'Winter is coming', members: [{'name': 'Eddard Stark', 'age': 40, 'power': 90}, {'name': 'Catelyn Stark', 'age': 38, 'power': 80}, {'name': 'Robb Stark', 'age': 19, 'power': 70}, {'name': 'Sansa Stark', 'age':17, 'power': 60}, {'name': 'Arya Stark', 'age': 15, 'power': 65}]})
You can notice that we have set a key called members
on the house document itself. It is an array. We can consider each entry of this array as a document as well. Hence there is a parent document for house and it contains embedded child documents.
The embedded child documents look like:
[
{'name': 'Eddard Stark', 'age': 40, 'power': 90},
{'name': 'Catelyn Stark', 'age': 38, 'power': 80},
{'name': 'Robb Stark', 'age': 19, 'power': 70},
{'name': 'Sansa Stark', 'age':17, 'power': 60},
{'name': 'Arya Stark', 'age': 15, 'power': 65}
]
Let's insert a document for house Lannister with members embedded in it.
> db.houses.insertOne({'name': 'Lannister', 'sigil': 'Lion', 'seat': 'Casterly Rock', 'words': 'Hear me roar', members: [{'name': 'Tyrion Lannister', 'age': 30, 'power': 80}, {'name': 'Jamie Lannister', 'age': 32, 'power': 85}, {'name': 'Tywin Lannister', 'age': 50, 'power': 87}, {'name': 'Cersei Lannister', 'age':32, 'power': 80}]})
Let's insert a document for house Baratheon with members embedded in it.
> db.houses.insertOne({'name': 'Baratheon', 'sigil': 'Stag', 'seat': "Storm's End", 'words': 'Ours is the fury', members: [{'name': 'Robert Baratheon', 'age': 40, 'power': 95}, {'name': 'Renly Baratheon', 'age': 36, 'power': 75}, {'name': 'Stannis Baratheon', 'age': 38, 'power': 82}]})
Executing the following statement would return all the houses. This can be used on the list page which lists all the houses.
> db.houses.find({})
Say the list page doesn't show the members of the house. In such case, we can only ask the Mongo server to return the specific fields we need. The query would look like:
> db.houses.find({}, {'name': 1, 'sigil': 1, 'seat': 1, 'words': 1})
This would output:
{ "_id" : ObjectId("61ca90df0ee3a7a24c8f101a"), "name" : "Stark", "sigil" : "Direwolf", "seat" : "Winterfell", "words" : "Winter is coming" }
{ "_id" : ObjectId("61ca93840ee3a7a24c8f101b"), "name" : "Lannister", "sigil" : "Lion", "seat" : "Casterly Rock", "words" : "Hear me roar" }
{ "_id" : ObjectId("61ca94ba0ee3a7a24c8f101c"), "name" : "Baratheon", "sigil" : "Stag", "seat" : "Storm's End", "words" : "Ours is the fury" }
You would notice, we have passed two arguments to find
. The first argument is the condition. As we want all documents, hence the condition is {}
. The second argument specifies the fields we want. In our example it is {'name': 1, 'sigil': 1, 'seat': 1, 'words': 1}
. This is called a projection, and you can read more about it here.
We only wanted to exclude the members
field. There is a better way of writing such projections. We can write it as:
> db.houses.find({}, {'members': 0})
The output would look exactly like the previous query's output.
Let's say our application also supports a detail page for a house
. It displays detail about the house like it's name
, 'sigil' etc. and also the members. In such case, the user would probably click on a house
on the list page and will then navigate to the detail page. Hence, we can filter the house by the id and get all the details.
> db.houses.find({_id: ObjectId("61ca90df0ee3a7a24c8f101a")})
The application can then loop over the members
and display the member details too.
Had we used a relational database, this would have either required 2 queries or a join between the tables. Joins are expensive as the data of two different tables isn't stored together and might be far apart on the disk.
Say we want to allow ordering on the house list page. We want to order by the strength of each house. This involves doing a sum of power
of members for each house. The query to get houses ordered by number of members would be:
> db.houses.aggregate([{$unwind: "$members"}, {$group: {_id: "$name", strength: {$sum: "$members.power"}}}, {$sort: {strength: -1}}])
What we have used here is an aggregation pipeline. An aggregation pipeline is invoked by using function aggregate()
. It takes an array as an argument. We have passed the following three entries in the array:
- {$unwind: "$members"}
- {$group: {_id: "$name", strength: {$sum: "$members.power"}}}
- {$sort: {strength: -1}}
Read more about aggregation pipeline to understand it in detail.
In a relational database, this would again need a join between the two tables and then applying a group by on the joined data.
References across documents
Embedded documents approach works well when child documents need to be shown in the context of a parent document. However, at times the application might need to show child documents independent of a parent document. Let's say your appication has following needs:
- Members list page ordered by strength
- A member detail page
If we have a need for a members list page and it can be shown independently and without the house context, then it makes sense to create a separate collection for members
Before creating members collection and inserting data into it, let's unset members
field from houses.
> db.houses.update({'name': 'Stark'}, {$unset: {'members': ''}})
> db.houses.update({'name': 'Lannister'}, {$unset: {'members': ''}})
> db.houses.update({'name': 'Baratheon'}, {$unset: {'members': ''}})
We can create a collection called members
.
> db.createCollection('members')
We will keep a field house_id
on members. Let's find _id
of house Stark
.
> db.houses.find({'name': 'Stark'}, {'_id': 1})
Output:
{ "_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
We can use this same _id
as the member_id
when inserting a member.
> db.members.insertOne({"name" : "Eddard Stark", "age" : 40, "power" : 90, 'house_id': ObjectId("61ca90df0ee3a7a24c8f101a")})
Let's insert more members for house Stark.
> db.members.insertMany([{ "name" : "Catelyn Stark", "age" : 38, "power" : 80, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }, { "name" : "Robb Stark", "age" : 19, "power" : 70, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }, { "name" : "Sansa Stark", "age" : 17, "power" : 60, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }, { "name" : "Arya Stark", "age" : 15, "power" : 65, "house_id": ObjectId("61ca90df0ee3a7a24c8f101a") }])
You can find _id
for house Lannister and similarly insert members for Lannister.
> db.members.insertMany([{ "name" : "Tyrion Lannister", "age" : 30, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }, { "name" : "Jamie Lannister", "age" : 32, "power" : 85, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }, { "name" : "Tywin Lannister", "age" : 50, "power" : 87, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }, { "name" : "Cersei Lannister", "age" : 32, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }])
Similarly, let's insert members for house Baratheon.
> db.members.insertMany([{ "name" : "Robert Baratheon", "age" : 40, "power" : 95, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }, { "name" : "Renly Baratheon", "age" : 36, "power" : 75, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }, { "name" : "Stannis Baratheon", "age" : 38, "power" : 82, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }])
We want to query members ordered by their strength:
> db.members.find({}).sort({power: -1})
This would output:
{ "_id" : ObjectId("61cb11d30ee3a7a24c8f1026"), "name" : "Robert Baratheon", "age" : 40, "power" : 95, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }
{ "_id" : ObjectId("61cabe9a0ee3a7a24c8f101d"), "name" : "Eddard Stark", "age" : 40, "power" : 90, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1024"), "name" : "Tywin Lannister", "age" : 50, "power" : 87, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1023"), "name" : "Jamie Lannister", "age" : 32, "power" : 85, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb11d30ee3a7a24c8f1028"), "name" : "Stannis Baratheon", "age" : 38, "power" : 82, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f101e"), "name" : "Catelyn Stark", "age" : 38, "power" : 80, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1022"), "name" : "Tyrion Lannister", "age" : 30, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb09f00ee3a7a24c8f1025"), "name" : "Cersei Lannister", "age" : 32, "power" : 80, "house_id" : ObjectId("61ca93840ee3a7a24c8f101b") }
{ "_id" : ObjectId("61cb11d30ee3a7a24c8f1027"), "name" : "Renly Baratheon", "age" : 36, "power" : 75, "house_id" : ObjectId("61ca94ba0ee3a7a24c8f101c") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f101f"), "name" : "Robb Stark", "age" : 19, "power" : 70, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f1021"), "name" : "Arya Stark", "age" : 15, "power" : 65, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
{ "_id" : ObjectId("61cabf8f0ee3a7a24c8f1020"), "name" : "Sansa Stark", "age" : 17, "power" : 60, "house_id" : ObjectId("61ca90df0ee3a7a24c8f101a") }
Let's assume the application has pagination and each page shows 2 members. Hence the first page shows information about Robert Baratheon
and Eddard Stark
.
In case we want to show house information along with the members, say show the sigil
and words
along with each member, we would need to collect all the house_id
which are relevant. In this case the house ids are [ObjectId("61ca94ba0ee3a7a24c8f101c"), ObjectId("61ca90df0ee3a7a24c8f101a")]
.
And then we would have to retreive the house information for these ids and combine them with members at application level.
> db.houses.find({_id: {$in: [ObjectId("61ca94ba0ee3a7a24c8f101c"), ObjectId("61ca90df0ee3a7a24c8f101a")]}})
This will give us information about both the houses and we can combine them with members at the application level.
Stay tuned for more posts on Mongo!