Emulating LEFT JOIN on MongoDB using MapReduce/Aggregation - mongodb

I have a MongoDB database with a few collections, such as the users in the system (id, name, email) and a list of projects (id, name, list of users who have access):
User
{
"_id": 1,
"name": "John",
"email": "john#domain.com"
}
{
"_id": 2,
"name": "Sam",
"email": "sam#domain.com"
}
Project
{
"_id": 1,
"name": "My Project1",
"users": [1,2]
}
{
"_id": 2,
"name": "My Project2",
"users": [2]
}
In my dashboard, I display a list of projects and the names of their users. To support the names, I changed the "users" field to also include each user's name:
{
"_id": 2,
"name": "My Project2",
"users": [{"_id":2,"name":"Sam"}]
}
But on several pages I now also need to print their email addresses, and later on maybe display their images too.
Since I don't want to embed the entire User document in each project, I'm looking for a way to do a LEFT JOIN and pick the fields I need from the User collection.
Performance is NOT very important on those pages, and I would rather have an easy way to manage my data. So basically I'm looking for a way to query a list of all projects and their associated users, with different fields taken from the original User document.
I've read about MongoDB's map-reduce and aggregation options, and to be honest I'm not sure which one to use or how to achieve what I'm looking for.

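At the time this answer was written, MongoDB did not support joins in any form, not even through MapReduce or the Aggregation Framework. The only way to join collections was in your own code, so just implement the LEFT JOIN logic in your application. (Since MongoDB 3.2, the Aggregation Framework's $lookup stage performs a left outer join against another collection in the same database.)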
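A minimal sketch of doing that join in application code, in plain JavaScript (Node.js), assuming the projects and users have already been fetched into arrays (for example with find() on each collection). The function name leftJoinUsers and the dangling user id 3 are hypothetical, added only to illustrate LEFT JOIN behaviour:

```javascript
// Hypothetical in-memory data mirroring the User and Project examples above.
// User id 3 deliberately has no matching user, to show LEFT JOIN behaviour.
const users = [
  { _id: 1, name: "John", email: "john#domain.com" },
  { _id: 2, name: "Sam", email: "sam#domain.com" },
];
const projects = [
  { _id: 1, name: "My Project1", users: [1, 2] },
  { _id: 2, name: "My Project2", users: [2, 3] },
];

// Replace each user id in project.users with the requested fields of the
// matching user. Unmatched ids keep their slot with null fields, so no
// project (or user slot) is ever dropped: that is the LEFT JOIN part.
function leftJoinUsers(projects, users, fields) {
  // Index users by _id once so each lookup is a Map hit, not a scan.
  const byId = new Map(users.map((u) => [u._id, u]));
  return projects.map((project) => ({
    ...project,
    users: project.users.map((id) => {
      const user = byId.get(id);
      const picked = { _id: id };
      for (const field of fields) {
        picked[field] = user !== undefined && field in user ? user[field] : null;
      }
      return picked;
    }),
  }));
}

const joined = leftJoinUsers(projects, users, ["name", "email"]);
console.log(JSON.stringify(joined[1].users));
// [{"_id":2,"name":"Sam","email":"sam#domain.com"},{"_id":3,"name":null,"email":null}]
```

Adding another field such as an image to the fields list later would pick it up as well, without changing the stored Project documents.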

Related

How to delete child entries while getting the matched parents using the mongo shell

I am using the mongo shell, and I can get collection data using db.collection.find(findQuery).
In my example, I have Books and Authors collections with a many-to-many association between them.
Sample data from the books collection (as viewed in Postman):
{"authors": [
{
"id": "123",
"name": "Sivakumar"
},
{
"id": "124",
"name": "Ram"
}
],
"id": "456",
"title": "Believe Yourself"
}
I can delete an author entry with db.authors.remove({name:"Ram"}). This deletes the Ram entry, but I also have to delete all the author entries associated with books, using a shell command. I didn't find any documentation for populate in the shell. How can I delete author entries using find and populate in the mongo shell?
I don't fully understand your requirement, but $unset removes a field from a document.
You can also simply leave the authors field out of the projection, so in terms of output it isn't there, and no actual delete is involved.
Alternatively, you can create a new collection from the existing one that simply does not include the authors field.
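To make the difference between the first two options concrete, here is an in-memory analogy in plain JavaScript (Node.js) rather than real shell calls; the helper names unsetField and projectWithout are hypothetical. In the shell, the rough counterparts would be db.books.updateMany({}, { $unset: { authors: "" } }) and db.books.find({}, { authors: 0 }), assuming the collection is named books:

```javascript
// unsetField mimics an update with $unset: the stored document itself changes.
function unsetField(doc, field) {
  delete doc[field];
  return doc;
}

// projectWithout mimics a find() projection such as { authors: 0 }:
// the caller gets a copy without the field; the stored document is untouched.
function projectWithout(doc, field) {
  const { [field]: _omitted, ...rest } = doc;
  return rest;
}

const stored = {
  id: "456",
  title: "Believe Yourself",
  authors: [
    { id: "123", name: "Sivakumar" },
    { id: "124", name: "Ram" },
  ],
};

const view = projectWithout(stored, "authors");
console.log(view);                // { id: '456', title: 'Believe Yourself' }
console.log("authors" in stored); // true: the projection deleted nothing

unsetField(stored, "authors");
console.log("authors" in stored); // false: the $unset-style change persists
```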

What is the proper way to check if a user has liked a thread when they GET the thread? [duplicate]

Currently I am working on a mobile app. Basically, people can post their photos and their followers can like the photos, as on Instagram. I use MongoDB as the database. As on Instagram, there might be a lot of likes for a single photo, so using a separate document for each "like", with an index, does not seem reasonable because it would waste a lot of memory. However, I'd like users to be able to add a like quickly. So my question is: how should I model the "like"? The data model is very similar to Instagram's, but using MongoDB.
No matter how you structure your overall document, there are basically two things you need: a property for a "count", and a "list" of those who have already posted their "like", to ensure no duplicates are submitted. Here's a basic structure:
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"photo": "imagename.png",
"likeCount": 0,
"likes": []
}
In any case, there is a unique "_id" for your "photo post" plus whatever information you want, and then the other fields mentioned above. The "likes" property here is an array that will hold the unique "_id" values of the "user" objects in your system. Every "user" has their own unique identifier somewhere, whether in local storage, from OpenID, or something else. I'll stick with ObjectId for the example.
When someone submits a "like" to a post, you want to issue the following update statement:
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": { "$ne": ObjectId("54bb2244a3a0f26f885be2a4") }
},
{
"$inc": { "likeCount": 1 },
"$push": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
The $inc operation increases the value of "likeCount" by the number specified, in this case 1. The $push operation appends the user's unique identifier to the array in the document for future reference.
The most important thing here is to keep a record of the users who voted, and what is happening in the "query" part of the statement. Apart from selecting the document to update by its own unique "_id", the other important thing is to check the "likes" array to make sure the current voting user is not already in there.
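The guard in the query part can be modelled in a few lines of plain JavaScript (Node.js). This is a sketch on plain objects, not a driver call: ids are hypothetical strings standing in for ObjectId values, the like helper is hypothetical, and the returned counts only loosely imitate what an update reports. In MongoDB the match and the modification happen atomically within the single document, which this single-threaded sketch does not need to model:

```javascript
function like(photo, photoId, userId) {
  // Query part: { _id: photoId, likes: { $ne: userId } }
  // On an array field, $ne matches only if NO element equals userId.
  const matched = photo._id === photoId && !photo.likes.includes(userId);
  if (!matched) {
    return { matchedCount: 0, modifiedCount: 0 }; // document left untouched
  }
  // Update part: { $inc: { likeCount: 1 }, $push: { likes: userId } }
  photo.likeCount += 1;
  photo.likes.push(userId);
  return { matchedCount: 1, modifiedCount: 1 };
}

const photo = {
  _id: "54bb201aa3a0f26f885be2a3",
  photo: "imagename.png",
  likeCount: 0,
  likes: [],
};
const first = like(photo, "54bb201aa3a0f26f885be2a3", "54bb2244a3a0f26f885be2a4");
const repeat = like(photo, "54bb201aa3a0f26f885be2a3", "54bb2244a3a0f26f885be2a4");
console.log(first.modifiedCount, repeat.modifiedCount, photo.likeCount); // 1 0 1
```

The second call matches nothing, so a double-tap cannot inflate the count.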
The same is true for the reverse case of "removing" the "like":
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": ObjectId("54bb2244a3a0f26f885be2a4")
},
{
"$inc": { "likeCount": -1 },
"$pull": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
The key point here is that the query conditions make sure no document is touched unless all conditions are met. So the count does not increase if the user had already voted, and does not decrease if their vote was no longer present at the time of the update.
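The removal guard can be modelled the same way, again as a plain-object sketch in JavaScript (Node.js) with hypothetical string ids standing in for ObjectId values; the unlike helper is hypothetical:

```javascript
function unlike(photo, photoId, userId) {
  // Query part: { _id: photoId, likes: userId }
  // Equality against an array field matches if ANY element equals userId.
  const matched = photo._id === photoId && photo.likes.includes(userId);
  if (!matched) {
    return { matchedCount: 0, modifiedCount: 0 }; // document left untouched
  }
  // Update part: { $inc: { likeCount: -1 }, $pull: { likes: userId } }
  photo.likeCount -= 1;
  photo.likes = photo.likes.filter((id) => id !== userId); // $pull removes all equal values
  return { matchedCount: 1, modifiedCount: 1 };
}

const photo = {
  _id: "54bb201aa3a0f26f885be2a3",
  photo: "imagename.png",
  likeCount: 1,
  likes: ["54bb2244a3a0f26f885be2a4"],
};
const first = unlike(photo, "54bb201aa3a0f26f885be2a3", "54bb2244a3a0f26f885be2a4");
const repeat = unlike(photo, "54bb201aa3a0f26f885be2a3", "54bb2244a3a0f26f885be2a4");
console.log(first.modifiedCount, repeat.modifiedCount, photo.likeCount); // 1 0 0
```

Because the second call does not match, the count never drops below the number of stored likes.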
Of course, it is not practical to read an array with a couple of hundred entries back into other parts of your application. But MongoDB has a standard way to handle that as well:
db.photos.find(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3")
},
{
"photo": 1,
"likeCount": 1,
"likes": {
"$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") }
}
}
)
This usage of $elemMatch in a projection returns only the current user's id in "likes" if they are present, and leaves the "likes" field out of the result entirely if they are not. This lets the rest of your application logic know whether the current user has already voted.
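A plain-object sketch in JavaScript (Node.js) of what that projection returns, and how the application might turn it into a yes/no flag. projectForUser and hasLiked are hypothetical helpers, ids are strings standing in for ObjectId values, and the second liker's id is invented for the example:

```javascript
// Models { photo: 1, likeCount: 1, likes: { $elemMatch: { $eq: userId } } }.
function projectForUser(doc, userId) {
  // _id is included by default in MongoDB projections.
  const result = { _id: doc._id, photo: doc.photo, likeCount: doc.likeCount };
  const index = doc.likes.indexOf(userId);
  // $elemMatch keeps only the first matching element; with no match,
  // the field is left out of the result.
  if (index !== -1) {
    result.likes = [doc.likes[index]];
  }
  return result;
}

// What the application checks afterwards.
function hasLiked(projected) {
  return Array.isArray(projected.likes) && projected.likes.length > 0;
}

const doc = {
  _id: "54bb201aa3a0f26f885be2a3",
  photo: "imagename.png",
  likeCount: 2,
  likes: ["54bb2244a3a0f26f885be2a4", "54bb2244a3a0f26f885be2a5"],
};
console.log(hasLiked(projectForUser(doc, "54bb2244a3a0f26f885be2a4"))); // true
console.log(hasLiked(projectForUser(doc, "ffffffffffffffffffffffff"))); // false
```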
That is the basic technique, and it may work for you as is, but be aware that embedded arrays should not grow without bound, and there is also a hard 16MB limit on BSON documents. So the concept is sound, but it cannot be used on its own if you are expecting thousands of "like" votes on your content. There is a concept known as "bucketing", discussed in some detail in this example of Hybrid Schema design, that offers one way to store a high volume of "likes". You can use it together with the basic concepts here to handle likes at volume.

Best way to represent multilingual database on mongodb

I have a MySQL database to support a multilingual website where the data is represented as the following:
table1: id, is_active, created
table1_lang: table1_id, name, surname, address
What's the best way to achieve the same on mongo database?
You can design a schema that either embeds or references documents. Let's look first at the option of embedded documents. With your application above, you might store the information in a document as follows:
// db.table1 schema
{
"_id": 3, // table1_id
"is_active": true,
"created": ISODate("2015-04-07T16:00:30.798Z"),
"lang": [
{
"name": "foo",
"surname": "bar",
"address": "xxx"
},
{
"name": "abc",
"surname": "def",
"address": "xyz"
}
]
}
In the example schema above, you have essentially embedded the table1_lang information within the main table1 document. This design has its merits, one of them being data locality. Since MongoDB stores each document contiguously on disk, putting all the data you need in one document means that spinning disks spend less time seeking to a particular location. If your application frequently accesses table1 information together with the table1_lang data, you'll almost certainly want to go the embedded route. The other advantage of embedded documents is atomicity and isolation when writing data. To illustrate, say you want to remove a document that has a lang entry whose "name" is "foo"; this can be done in one single (atomic) operation:
db.table1.remove({"lang.name": "foo"});
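The way a dot-notation filter such as "lang.name" matches arrays of embedded documents can be sketched in memory with plain JavaScript (Node.js). The helper names and the second document (_id 4) are hypothetical, and the filter-based "remove" is only an analogy for the server-side operation:

```javascript
// Models the filter { "lang.name": "foo" } on the embedded schema:
// a document matches when ANY element of its "lang" array has that name.
function matchesLangName(doc, name) {
  return Array.isArray(doc.lang) && doc.lang.some((entry) => entry.name === name);
}

// Analogy for remove(): the collection afterwards holds only non-matching documents.
function removeWhereLangName(collection, name) {
  return collection.filter((doc) => !matchesLangName(doc, name));
}

const table1 = [
  {
    _id: 3,
    is_active: true,
    created: new Date("2015-04-07T16:00:30.798Z"),
    lang: [
      { name: "foo", surname: "bar", address: "xxx" },
      { name: "abc", surname: "def", address: "xyz" },
    ],
  },
  // Hypothetical second document that should survive the remove.
  { _id: 4, is_active: false, created: new Date("2015-04-08T00:00:00Z"), lang: [] },
];

const remaining = removeWhereLangName(table1, "foo");
console.log(remaining.map((d) => d._id)); // [ 4 ]
```

Removing document 3 takes all of its embedded lang entries with it, which is why no second write is needed.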
For more details on data modelling in MongoDB, please read the docs Data Modeling Introduction, specifically Model One-to-Many Relationships with Embedded Documents
The other design option is referencing documents where you follow a normalized schema. For example:
// db.table1 schema
{
"_id": 3,
"is_active": true,
"created": ISODate("2015-04-07T16:00:30.798Z")
}
// db.table1_lang schema
{
"_id": 1,
"table1_id": 3,
"name": "foo",
"surname": "bar",
"address": "xxx"
}
{
"_id": 2,
"table1_id": 3,
"name": "abc",
"surname": "def",
"address": "xyz"
}
The above approach gives increased flexibility in performing queries. For instance, retrieving all child table1_lang documents for the parent table1 entity with id 3 is straightforward; simply query the table1_lang collection:
db.table1_lang.find({"table1_id": 3});
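If a page does need the parent and its children together, the application can reassemble them after that query. A minimal sketch on in-memory arrays in plain JavaScript (Node.js); withLang and findByParent are hypothetical helper names:

```javascript
// Hypothetical data shaped like the normalized table1 / table1_lang documents above.
const table1 = { _id: 3, is_active: true, created: new Date("2015-04-07T16:00:30.798Z") };
const table1Lang = [
  { _id: 1, table1_id: 3, name: "foo", surname: "bar", address: "xxx" },
  { _id: 2, table1_id: 3, name: "abc", surname: "def", address: "xyz" },
];

// Same selection as db.table1_lang.find({ "table1_id": 3 }).
function findByParent(langDocs, parentId) {
  return langDocs.filter((doc) => doc.table1_id === parentId);
}

// Application-side assembly back into the embedded shape.
function withLang(parent, langDocs) {
  const lang = findByParent(langDocs, parent._id).map(({ name, surname, address }) => ({
    name,
    surname,
    address,
  }));
  return { ...parent, lang };
}

const assembled = withLang(table1, table1Lang);
console.log(assembled.lang.map((l) => l.name)); // [ 'foo', 'abc' ]
```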
The normalized schema above, using document references, also has an advantage when you have one-to-many relationships with very unpredictable arity. If you have hundreds or thousands of table1_lang documents per given table1 entity, embedding has many drawbacks as far as space is concerned: the larger the document, the more RAM it uses, and MongoDB documents have a hard size limit of 16MB.
The general rule of thumb is that if your application's query pattern is well known and data tends to be accessed in only one way, an embedded approach works well. If your application queries data in many ways, or you are unable to anticipate the query patterns, a more normalized document-referencing model is more appropriate.
Ref:
MongoDB Applied Design Patterns: Practical Use Cases with the Leading NoSQL Database, by Rick Copeland

Is my MongoDB data model the right choice?

I'm going to build my first project (a genealogy database) with MongoDB and Node.js, and I'm asking myself whether my data model is the right choice:
people document (simplified):
{
"_id": ObjectId("123"),
"modified": ISODate("2015-02-04T16:52:32.601Z"),
"birth": ISODate("1995-02-04T16:52:32.601Z"),
"name": "peter"
}, {
"_id": ObjectId("456"),
"modified": ISODate("2015-02-04T16:52:32.601Z"),
"birth": ISODate("1999-02-04T16:52:32.601Z"),
"name": "uschi"
}
relations document (simplified):
{
"sourceID": ObjectId("123"),
"targetID": ObjectId("456"),
"type": "Married",
"modified": ISODate("2015-02-04T16:52:32.599Z"),
"startrelation": ISODate("2001-02-04T16:52:32.601Z"),
"endrelation": ISODate("2007-02-04T16:52:32.601Z"),
"_id": ObjectId("54d24e5033bfc203aaaad590")
}
Yesterday I tried to retrieve a list of all people and their related people, and got worried about my data model because I needed a lot of code to generate the following result:
items: [
{
"_id": ObjectId("123"),
"modified": ISODate("2015-02-04T16:52:32.601Z"),
"birth": ISODate("1995-02-04T16:52:32.601Z"),
"name": "peter",
"married": [{
"_id": ObjectId("456"),
"modified": ISODate("2015-02-04T16:52:32.601Z"),
"birth": ISODate("1999-02-04T16:52:32.601Z"),
"name": "uschi"
}, ...]
}, ...]
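For a sense of scale, the assembly step described above can be sketched in memory in plain JavaScript (Node.js, matching the question's stack). The data is simplified and hypothetical (string ids instead of ObjectId values), withRelations is a hypothetical helper, and it assumes relations are symmetric, so each one is applied in both directions:

```javascript
const people = [
  { _id: "123", name: "peter", birth: new Date("1995-02-04T16:52:32.601Z") },
  { _id: "456", name: "uschi", birth: new Date("1999-02-04T16:52:32.601Z") },
];
const relations = [{ sourceID: "123", targetID: "456", type: "Married" }];

// Attach related people under a key derived from the relation type
// ("Married" -> "married"), applying each relation in both directions.
function withRelations(people, relations) {
  const original = new Map(people.map((p) => [p._id, p]));
  const result = new Map(people.map((p) => [p._id, { ...p }]));
  for (const rel of relations) {
    const key = rel.type.toLowerCase();
    for (const [fromId, toId] of [[rel.sourceID, rel.targetID], [rel.targetID, rel.sourceID]]) {
      const from = result.get(fromId);
      const to = original.get(toId);
      if (from === undefined || to === undefined) continue; // skip dangling ids
      if (!from[key]) from[key] = [];
      // Push the plain person, not the enriched copy, so there are no cycles.
      from[key].push(to);
    }
  }
  return [...result.values()];
}

const items = withRelations(people, relations);
console.log(items.map((p) => `${p.name}: ${p.married.map((m) => m.name).join(", ")}`));
// [ 'peter: uschi', 'uschi: peter' ]
```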
Is there a problem with that solution?
The main problem I see with this solution is that you are using MongoDB to store relational data. I have done this in the past and regretted it. Consider using Postgres. It's a relational database, but it also has json/jsonb column types (and the older hstore key/value type) that let you store and query loosely structured data if some areas of your schema are not well defined.
It seems that a graph database would be a perfect match for your problem domain.
This way you won't have to implement all the logic related to "relations" in your application; graph databases natively understand them.
e.g. Neo4j
Graph databases allow for:
easy handling of complex relations
very quick traversal of relations
fast searching for relationships such as "friend of a friend" or "who is Jim in relation to Janet"
In general, if you are planning to query your data in various ways based on relations, a graph database is the way to go.
