Tags in MongoDB

I am new to MongoDB.
I have products, each of which can have multiple tags. I saw tutorials where the documents looked like this:
{
_id: 1234,
tags: ["stationery","electronics"]
}
{
_id: 456,
tags: ["home","electronics"]
}
{
_id: 135,
tags: ["books","stationery"]
}
I have a fixed list of tags, and every product will use only these tags. My question is how to store this list so that when a new product is added, I can display the list and ask the user to select tags only from it.
Should I create a separate Tags collection and store references to it in each product document? If I do, then searching for products in, say, the Books category will take two queries.
Please suggest!

Store tags the way you see in the tutorials:
{
_id: 1234,
tags: ["stationery","electronics"]
}
This allows for easy queries. Now, to serve your tag-adding UI, I'd create a separate collection "tags", which would consist of very simple documents:
{ name: 'stationery' }
{ name: 'electronics' }
{ name: 'books' }
(MongoDB will add an _id field to each one, but you don't need it.)
Your UI then reads the documents from this collection and uses their name property to populate the tags property of products.
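To make the suggested split concrete, here is a minimal in-memory sketch in JavaScript (runnable with Node.js, no database needed). The array stands in for the "tags" collection, and the helper names `allowedTagNames` and `buildProduct` are hypothetical; in a real app the list would come from a query such as `db.tags.find()`.

```javascript
// In-memory stand-in for the "tags" collection (illustrative data only).
const tagsCollection = [
  { name: "stationery" },
  { name: "electronics" },
  { name: "books" },
];

// The list the UI offers: just the `name` of each tag document.
function allowedTagNames(tagDocs) {
  return tagDocs.map((doc) => doc.name);
}

// Hypothetical helper: build a product document, rejecting any tag that is
// not in the allowed list and dropping duplicates.
function buildProduct(id, selectedTags, tagDocs) {
  const allowed = new Set(allowedTagNames(tagDocs));
  const unknown = selectedTags.filter((t) => !allowed.has(t));
  if (unknown.length > 0) {
    throw new Error(`Unknown tags: ${unknown.join(", ")}`);
  }
  // Tags are stored by name, so a query like { tags: "books" } needs no
  // second lookup in the tags collection.
  return { _id: id, tags: [...new Set(selectedTags)] };
}

console.log(buildProduct(1234, ["stationery", "electronics"], tagsCollection));
```

Validating in the application keeps product documents in the simple tags-array form, so finding all products tagged "books" stays a single query.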

Related

Storing enums in MongoDB (for managing tag names)

If we have a collection of books, we can store the authors as tags in an array, as follows:
Books collection
{
...,
"authors" : ["John Michaels", "Bill Williams"]
}
This can cause problems if an author's name changes.
Instead, I was thinking of assigning an integer value to each author and creating a 'tags' collection:
Tags collection
{
"tags" : [
{"John Michaels" : 0},
{"Jane Collins" : 1},
{"Bill Williams" : 2}
]
}
Here is my books collection, where we specify that 'John Michaels' and 'Bill Williams' are the authors:
{
...,
"authors" : [0, 2]
}
If I ever needed to change the author’s name ‘Bill Williams’ to ‘Bill H. Williams’, there would be no problem because the value stored in the books collection remains unchanged.
My question is whether MongoDB has something like enums that automatically increment the integer value, or whether there is something else built into MongoDB to help with this kind of situation.
Thank you
This is a typical use case for referencing other collections. So, you should have two collections:
Authors collection:
{
_id: ObjectId,
name: String,
... // Other fields
}
Books collection:
{
_id: ObjectId,
authors: [ ObjectId ], // References to documents in the Authors collection
... // Other fields
}
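So, in the authors property of the Books collection, you store the _id values of all the book's authors. Then, when you fetch a book document, you can easily fetch up-to-date author data from the Authors collection. You also don't need an auto-incrementing integer: if you insert an author without an _id, MongoDB generates a unique ObjectId for it.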
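A minimal sketch of the application-level lookup this answer describes, using plain JavaScript arrays instead of a database. String ids stand in for ObjectId values, and `withAuthorNames` and `renameAuthor` are hypothetical helpers; in the shell the second fetch would be roughly `db.authors.find({ _id: { $in: book.authors } })`.

```javascript
// Hypothetical data shaped like the Authors and Books collections above;
// string ids stand in for ObjectId values.
const authors = [
  { _id: "a1", name: "John Michaels" },
  { _id: "a2", name: "Jane Collins" },
  { _id: "a3", name: "Bill Williams" },
];
const books = [{ _id: "b1", authors: ["a1", "a3"] }];

// Resolve a book's author ids to names, as an app would after a second
// query against the Authors collection.
function withAuthorNames(book, authorDocs) {
  const byId = new Map(authorDocs.map((a) => [a._id, a]));
  return {
    ...book,
    // null marks a dangling reference (author document deleted).
    authorNames: book.authors.map((id) => (byId.has(id) ? byId.get(id).name : null)),
  };
}

// Renaming changes only the author document; books keep the same ids.
function renameAuthor(authorDocs, id, newName) {
  return authorDocs.map((a) => (a._id === id ? { ...a, name: newName } : a));
}

const renamed = renameAuthor(authors, "a3", "Bill H. Williams");
console.log(withAuthorNames(books[0], renamed).authorNames);
```

The book document never changes when an author is renamed; only the lookup result does, which is exactly the property the question was after.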

MongoDB schema design: reference by ID vs. reference by name?

Consider this simple example
(ObjectIds are shortened to make it easier to read).
Tag documents:
{
_id: ObjectId('0001'),
name: 'JavaScript',
// other data
},
{
_id: ObjectId('0002'),
name: 'MongoDB',
// other data
},
...
Assume that we need a separate tag collection, e.g. because we need to store some extra information about each tag.
If reference by ID:
// a book document
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: [ObjectId('0001'), ObjectId('0002'), ...]
}
If reference by name:
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: ['JavaScript', 'MongoDB', ...]
}
We know that "reference by ID" works.
I'm thinking that if we use "reference by name", a query for a book's info only needs to search the books collection: we get the tag names without a join ($lookup) operation, which should be faster.
If the app checks the tags before creating or modifying a book, this should also work, and be faster.
I'm still not very sure:
Is there any drawback to "reference by name"?
Will "reference by name" be slower for "finding all books with a given tag"? Maybe ObjectId is special somehow?
Thanks.
I would say it depends on your use case for tags. As you say, it will be more expensive to do a $lookup to retrieve tag names if you reference by id. On the other hand, if you expect tag names to change frequently, then every document in the books collection containing that tag will need to be updated on every change.
An ObjectId is simply a 12-byte value, which the driver generates automatically if an inserted document has no _id. See the MongoDB docs for more info. The only "special behavior" is that _id has an index by default. An index speeds up lookups in general, but indexes can be created on any field, not just _id, so an index on the tags field makes "find all books with a given tag" efficient whether it holds names or ids.
In fact, _id does not need to be an ObjectId. It is perfectly legal to have documents with integer _id values, for instance:
{
_id: 1,
name: 'Javascript'
},
{
_id: 2,
name: 'MongoDB'
},
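To illustrate the trade-off in the answer, the sketch below counts how many documents a tag rename touches under each scheme. It is an in-memory JavaScript simulation with hypothetical data and helper names; the shell command in the comment is an approximation of the by-name update, not a tested migration.

```javascript
// Hypothetical data: tag documents plus the same books stored both ways.
const tags = [
  { _id: 1, name: "Javascript" },
  { _id: 2, name: "MongoDB" },
];
const booksById = [
  { _id: "b1", tags: [1, 2] },
  { _id: "b2", tags: [2] },
];
const booksByName = [
  { _id: "b1", tags: ["Javascript", "MongoDB"] },
  { _id: "b2", tags: ["MongoDB"] },
];

// Reference by ID: a rename rewrites only the one tag document;
// booksById needs no change at all.
function renameTagById(tagDocs, id, newName) {
  let touched = 0;
  const updated = tagDocs.map((t) => {
    if (t._id !== id) return t;
    touched += 1;
    return { ...t, name: newName };
  });
  return { updated, touched };
}

// Reference by name: every book holding the old name must be rewritten.
// In the shell this is roughly:
//   db.books.updateMany({ tags: oldName }, { $set: { "tags.$": newName } })
function renameTagByName(bookDocs, oldName, newName) {
  let touched = 0;
  const updated = bookDocs.map((b) => {
    if (!b.tags.includes(oldName)) return b;
    touched += 1;
    return { ...b, tags: b.tags.map((t) => (t === oldName ? newName : t)) };
  });
  return { updated, touched };
}

console.log(renameTagById(tags, 2, "MongoDB 101").touched); // tag documents touched
console.log(renameTagByName(booksByName, "MongoDB", "MongoDB 101").touched); // book documents touched
```

If renames are rare, the by-name cost is paid seldom while every read avoids a $lookup; if they are frequent, the by-id scheme keeps each rename to one document.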

Meteor: How do you populate a field from one collection into another collection with _id field?

In Mongo I have a document that stores pending userIds in a collaborators array of objects. It looks like this:
// researchThread document
{
_id: 4374583575745756,
pending: {
collaborators: [
{
userId: '13745845754745753'
},
{
userId: '23755845854745731'
},
{
userId: '33755845653741736'
}]
}
}
The userId is the _id field for the user from the users collection. Each user also has a name and an email field.
How can I populate the name and email fields from the users collection into this document, for each user in the researchThread.pending.collaborators array? Also, will the populated data be reactive when used in the templates?
Loop through each collaborator, find the relevant user document by searching the users collection for the id, and update the researchThread document with that information.
The data will be reactive if the researchThread collection is a Meteor.Collection which you're drawing from in your templates.
However, why do you want to copy the user data? Why not just have Meteor query the users collection based on the researchThread userId when you need the data in the template?
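A minimal sketch of the copy-in loop from the first paragraph, written as plain JavaScript over in-memory arrays. The user names and emails are hypothetical; in Meteor the lookup would be something like `Meteor.users.findOne(userId)`, and the result would be written back with an `update` on your research thread collection (its name here is hypothetical).

```javascript
// Hypothetical users; in Meteor these would come from Meteor.users.
const users = [
  { _id: "13745845754745753", name: "User One", email: "one@example.com" },
  { _id: "23755845854745731", name: "User Two", email: "two@example.com" },
];

// Shaped like the researchThread document in the question (the third user
// is intentionally missing from `users`).
const thread = {
  _id: 4374583575745756,
  pending: {
    collaborators: [
      { userId: "13745845754745753" },
      { userId: "23755845854745731" },
      { userId: "33755845653741736" },
    ],
  },
};

// Copy name and email onto each collaborator entry; entries whose user
// cannot be found are left as they are. The input is not mutated.
function populateCollaborators(doc, userDocs) {
  const byId = new Map(userDocs.map((u) => [u._id, u]));
  const collaborators = doc.pending.collaborators.map((c) => {
    const user = byId.get(c.userId);
    return user ? { ...c, name: user.name, email: user.email } : c;
  });
  return { ...doc, pending: { ...doc.pending, collaborators } };
}

const populated = populateCollaborators(thread, users);
// In Meteor you would then persist it, e.g. (collection name hypothetical):
// ResearchThreads.update(thread._id, { $set: { "pending.collaborators": populated.pending.collaborators } });
console.log(populated.pending.collaborators);
```

The copied names go stale if a user later changes them, which is the reason the answer suggests querying the users collection at render time instead.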

Modeling a user-to-item database in MongoDB

I've got two tables.
Movies, which lists all the movies in the database.
Users, which has the users.
Usually, I'd create a join table to connect a user to a movie (as in, the user likes a certain movie).
However, since you can't do that in MongoDB, what should I do? I want to be able to find all the movies a certain user likes, as well as all the users that like a certain movie, and movies that a given set of users like.
Embedded documents?
Thanks!
For a many-to-many relationship between movies and users like this, I'd probably have separate collections for each, but denormalise users who like a movie into the movies collection by embedding their _id and name fields into a likes array.
This way, you can retrieve the names of users who like a movie without having to make a separate lookup to the users collection, but still have extra user fields that won't be embedded inside movies.
The trade off is that you'd need to update both collections if a user changed their name, but I think that's a worthwhile cost.
db.movies
{
_id: <objectid>,
name:"Star Wars",
likes: [
{ userid: <user-objectid>, name: "John Smith" },
{ userid: <user-objectid>, name: "Alice Brown" }
]
}
db.users
{
_id: <objectid>,
name: "John Smith",
username: "jsmith",
passwordhash: "d131dd02c5e6eec4693d"
}
Movies a certain user likes
db.movies.find( { "likes.userid": <user-objectid> }, { "name": 1 } );
Users that like a certain movie
db.movies.find( { "_id": <movie-objectid> },
{ "likes.userid": 1, "likes.name": 1 } );
Movies that a given set of users like
db.movies.find( { "likes.userid":
{ $in: [ <user1-objectid>, <user2-objectid> ] } },
{ "name": 1 } );
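The cost mentioned above (updating both collections when a user is renamed) can be sketched in memory as follows. The data and helpers are hypothetical; the shell commands in the comment assume a user likes each movie at most once, so the positional `$` operator updates the right array element.

```javascript
// Hypothetical data shaped like db.users and db.movies above.
const users = [
  { _id: "u1", name: "John Smith", username: "jsmith" },
  { _id: "u2", name: "Alice Brown", username: "abrown" },
];
const movies = [
  {
    _id: "m1",
    name: "Star Wars",
    likes: [
      { userid: "u1", name: "John Smith" },
      { userid: "u2", name: "Alice Brown" },
    ],
  },
];

// Mirrors db.movies.find({ "likes.userid": id }, { name: 1 }).
function moviesLikedBy(movieDocs, userId) {
  return movieDocs
    .filter((m) => m.likes.some((l) => l.userid === userId))
    .map((m) => m.name);
}

// A rename must touch both collections. Roughly, in the shell:
//   db.users.updateOne({ _id: id }, { $set: { name: newName } })
//   db.movies.updateMany({ "likes.userid": id }, { $set: { "likes.$.name": newName } })
function renameUser(userDocs, movieDocs, userId, newName) {
  const updatedUsers = userDocs.map((u) => (u._id === userId ? { ...u, name: newName } : u));
  const updatedMovies = movieDocs.map((m) => ({
    ...m,
    likes: m.likes.map((l) => (l.userid === userId ? { ...l, name: newName } : l)),
  }));
  return { users: updatedUsers, movies: updatedMovies };
}

const after = renameUser(users, movies, "u1", "John A. Smith");
console.log(after.movies[0].likes);
```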
You can do that in MongoDB; you just can't do the 'join' in an ordinary query (the $lookup aggregation stage, added in MongoDB 3.2, is the exception), so you usually do it at the application level.
If you have millions of movies and millions of users, you have to use a join collection, because the likes for one movie or one user may not fit into a single document (MongoDB documents are limited to 16 MB).
Look up the user and get their _id
Look up the UserMovie documents with a matching user _id
Look up the movies as necessary
The denormalization you might do here would be to store the Movie names in the UserMovie collection so you can display the movies a user likes without having to fetch each one from the Movie collection.
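The three lookup steps, plus the denormalised movie name, can be sketched as an application-level join over in-memory arrays. The collection shapes and field names such as `movieName` are assumptions for illustration, not a fixed schema.

```javascript
// Hypothetical in-memory versions of Users, Movies and the UserMovie join collection.
const users = [{ _id: "u1", username: "jsmith" }];
const movies = [
  { _id: "m1", name: "Star Wars" },
  { _id: "m2", name: "Inception" },
  { _id: "m3", name: "Pulp Fiction" },
];
// One document per relationship, with the movie name denormalised in.
const userMovie = [
  { userId: "u1", movieId: "m1", movieName: "Star Wars" },
  { userId: "u1", movieId: "m3", movieName: "Pulp Fiction" },
];

// Steps 1 and 2 only: the denormalised movieName makes step 3 unnecessary.
function likedMovieNames(username) {
  const user = users.find((u) => u.username === username); // step 1
  if (!user) return [];
  return userMovie
    .filter((l) => l.userId === user._id) // step 2
    .map((l) => l.movieName);
}

// All three steps, for when full movie documents are needed.
function likedMovieDocs(username) {
  const user = users.find((u) => u.username === username); // step 1
  if (!user) return [];
  const ids = new Set(
    userMovie.filter((l) => l.userId === user._id).map((l) => l.movieId) // step 2
  );
  return movies.filter((m) => ids.has(m._id)); // step 3, like find({ _id: { $in: [...] } })
}

console.log(likedMovieNames("jsmith"));
```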
A possible optimization
One optimization you can try on this scheme is to create documents in the UserMovie collection which contain multiple relationships instead of using a single document for each relationship (like you would in SQL).
For example, if the most common access pattern is finding what movies a user likes, you could group them by user and put them in one or more documents indexed by that user id. Take a look at the StatementGroups in this blog post for a more complete explanation.
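A rough sketch of the grouping idea: collapse one-document-per-like rows into per-user bucket documents with a fixed maximum size. The bucket shape and the `maxPerBucket` limit are assumptions for illustration, not taken from the referenced blog post.

```javascript
// Hypothetical one-document-per-like rows, as in a plain join collection.
const links = [
  { userId: "u1", movieId: "m1" },
  { userId: "u1", movieId: "m2" },
  { userId: "u2", movieId: "m1" },
  { userId: "u1", movieId: "m3" },
];

// Group rows by user into bucket documents holding at most maxPerBucket
// movie ids each, so no single document grows without bound.
function bucketByUser(rows, maxPerBucket) {
  const byUser = new Map();
  for (const { userId, movieId } of rows) {
    if (!byUser.has(userId)) byUser.set(userId, []);
    byUser.get(userId).push(movieId);
  }
  const buckets = [];
  for (const [userId, movieIds] of byUser) {
    for (let i = 0; i < movieIds.length; i += maxPerBucket) {
      buckets.push({ userId, movieIds: movieIds.slice(i, i + maxPerBucket) });
    }
  }
  return buckets;
}

// Reading one user's likes now touches a few buckets instead of one row per
// like; with an index on userId this would be a single find({ userId }) query.
function likesOf(buckets, userId) {
  return buckets.filter((b) => b.userId === userId).flatMap((b) => b.movieIds);
}

const buckets = bucketByUser(links, 2);
console.log(buckets);
```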

MongoDB design - tags

I'm new to MongoDB. I have a design question about MongoDB performance. Let's say I have a Movies class with two properties, Name and Director. I also want to tag this Movie class. Is it better to add a new string[] property to this class, or to create a new MovieTags class? I know I will query these tags a lot, because I will use autocomplete in the UI. For this autocomplete function I only need the tags, not the Movie object.
Which option is better for performance: a string[] property, or a reference to a MovieTags collection? Of course, in both cases I will add the indexes.
Should I use MapReduce to select only the tags for the autocomplete function if I use an embedded string[]? How?
Thanks!
I'd probably go with a schema like this, which stores the tags in a string array field:
db.movies.insert({
name: "The Godfather",
director: "Francis Ford Coppola",
tags: [ "mafia", "wedding", "violence" ]
})
db.movies.insert({
name: "Pulp Fiction",
director: "Quentin Tarantino",
tags: [ "briefcase", "violence", "gangster" ]
})
db.movies.insert({
name: "Inception",
director: "Christopher Nolan",
tags: [ "dream", "thief", "subconscious" ]
})
You wouldn't need map-reduce for this type of query. By embedding the tags inside the movie document, you can take advantage of MongoDB's multikey feature and find movies with a given tag using a single find() query like this:
db.movies.find( { tags: "dream" } )
And like you said, it's also worth adding an index on the tags array (MongoDB makes it a multikey index automatically) to improve query performance:
db.movies.createIndex( { tags: 1 } )
You can always filter the fields that are returned as part of the query result.
The docs that explain how to do so are at http://docs.mongodb.org/manual/tutorial/query-documents/#Querying-FieldSelection
This lets you filter out the parts of the movie object that you're not interested in.
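For the autocomplete part of the question, no map-reduce is needed either. The sketch below simulates in memory what `db.movies.distinct("tags")` returns (each distinct array element counts as one value) and then filters by prefix. The helper names are hypothetical, and a real app might cache this list rather than recompute it per keystroke.

```javascript
// The three movie documents from the example above (name and tags only).
const movies = [
  { name: "The Godfather", tags: ["mafia", "wedding", "violence"] },
  { name: "Pulp Fiction", tags: ["briefcase", "violence", "gangster"] },
  { name: "Inception", tags: ["dream", "thief", "subconscious"] },
];

// Same idea as db.movies.distinct("tags"): every array element is a value,
// and duplicates across movies collapse to one entry.
function distinctTags(movieDocs) {
  const seen = new Set();
  for (const m of movieDocs) {
    for (const t of m.tags) seen.add(t);
  }
  return [...seen].sort();
}

// Hypothetical autocomplete helper: case-insensitive prefix match.
function autocomplete(movieDocs, prefix) {
  const p = prefix.toLowerCase();
  return distinctTags(movieDocs).filter((t) => t.toLowerCase().startsWith(p));
}

console.log(autocomplete(movies, "vi"));
```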