MongoDB design - tags - mongodb

I'm new to MongoDB. I have a design question about MongoDB performance. Let's say I have the class Movie with two properties, Name and Director. I also want to tag this Movie class. Is it better to add a new string[] property to this class, or to create a new class MovieTags? I know I will query these tags a lot because I will use an autocomplete on the UI. For this autocomplete function I only need the tags, not the Movie object.
Which option is better: adding a string[] property, or referencing a collection of MovieTags? I'm thinking about performance; of course, in both cases the fields will be indexed.
Should I use MapReduce to select only the tags for the autocomplete function if I use an embedded string[] field? How?
Thanks!

I'd probably go with a schema like this, which stores the tags in a string array field:
db.movies.insert({
    name: "The Godfather",
    director: "Francis Ford Coppola",
    tags: [ "mafia", "wedding", "violence" ]
})
db.movies.insert({
    name: "Pulp Fiction",
    director: "Quentin Tarantino",
    tags: [ "briefcase", "violence", "gangster" ]
})
db.movies.insert({
    name: "Inception",
    director: "Christopher Nolan",
    tags: [ "dream", "thief", "subconscious" ]
})
You wouldn't need map-reduce for this type of query. By embedding the tags inside the movie document you can take advantage of MongoDB's multikey feature and find movies with a given tag using a single find() query like this:
db.movies.find( { tags: "dream" } )
And like you said, it's also worth adding an index on the tags array to improve query performance (createIndex is the current name; older shells used ensureIndex, which is deprecated):
db.movies.createIndex( { tags: 1 } )
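For the autocomplete part of the question, only the distinct tag values are needed. In the shell, db.movies.distinct("tags") returns the unique tag values across all documents. The sketch below mimics the prefix filtering an autocomplete would do on top of that, in plain JavaScript over an in-memory copy of the sample documents. The helper name suggestTags is hypothetical and nothing here talks to a database.

```javascript
// In-memory stand-in for the movies collection from the answer above.
const movies = [
  { name: "The Godfather", director: "Francis Ford Coppola", tags: ["mafia", "wedding", "violence"] },
  { name: "Pulp Fiction", director: "Quentin Tarantino", tags: ["briefcase", "violence", "gangster"] },
  { name: "Inception", director: "Christopher Nolan", tags: ["dream", "thief", "subconscious"] },
];

// suggestTags (hypothetical helper): returns the distinct tags that start
// with the typed prefix, sorted alphabetically.
function suggestTags(docs, prefix) {
  const seen = new Set();
  for (const doc of docs) {
    for (const tag of doc.tags || []) {
      if (tag.startsWith(prefix)) seen.add(tag);
    }
  }
  return [...seen].sort();
}

console.log(suggestTags(movies, "vi"));
console.log(suggestTags(movies, "d"));
```

Because "violence" appears in two movies, the Set is what keeps it from being suggested twice.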

You can always filter the fields that are returned as part of the query result.
The link to the docs that details how to do so is http://docs.mongodb.org/manual/tutorial/query-documents/#Querying-FieldSelection
This will let you filter out the parts of the movie object that you're not interested in.

Related

How do I make a mongoose find request where I am trying to find a document that contains all the tags in the query

How do I make a mongoose find request that matches documents based on an inner field containing everything in the query? Example:
const query = [ "Power", "Logic" ]
const results = documents.filter(schematic => schematic.tags.filter(tag => query.includes(tag.name)).length == query.length)
I am trying to make a tag system where you can search for schematics that contain certain tags. The query is an array of tag names, and the tags on a schematic are an array of objects where element.name is the tag's name. How would I make a request that finds the schematics containing all of the tags in the query?
You have to use the $all operator - https://docs.mongodb.com/manual/reference/operator/query/all/
db.collection.find({'tags.name': {$all: [ "Power", "Logic" ]}})
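Try this one (note that $in matches documents containing any of the listed tags, not necessarily all of them):
db.collection.find({
    'tags.name': { $in: query }
})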
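The two answers use different operators, and the difference matters here: $all requires every listed value to be present, while $in requires at least one. A minimal sketch of both behaviours in plain JavaScript, using made-up schematic documents (the titles and the "Storage" tag are hypothetical, and the helper names are not part of mongoose or MongoDB):

```javascript
// Hypothetical schematics; each tag is an object with a name, as in the question.
const schematics = [
  { title: "Adder", tags: [{ name: "Power" }, { name: "Logic" }] },
  { title: "Lamp", tags: [{ name: "Power" }] },
  { title: "Chest", tags: [{ name: "Storage" }] },
];

// Mirrors $all: every queried name must appear among the document's tags.
function hasAllTags(doc, query) {
  const names = new Set(doc.tags.map(tag => tag.name));
  return query.every(name => names.has(name));
}

// Mirrors $in: at least one queried name must appear.
function hasAnyTag(doc, query) {
  return doc.tags.some(tag => query.includes(tag.name));
}

const query = ["Power", "Logic"];
const allMatches = schematics.filter(doc => hasAllTags(doc, query)).map(doc => doc.title);
const anyMatches = schematics.filter(doc => hasAnyTag(doc, query)).map(doc => doc.title);

console.log(allMatches);
console.log(anyMatches);
```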

MongoDB schema design: reference by ID vs. reference by name?

Take this simple example
(using shortened ObjectIds to make it easier to read)
Tag documents:
{
_id: ObjectId('0001'),
name: 'JavaScript',
// other data
},
{
_id: ObjectId('0002'),
name: 'MongoDB',
// other data
},
...
Assume that we need a separate tag collection, e.g. because we need to store some information on each tag.
If reference by ID:
// a book document
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: [ObjectId('0001'), ObjectId('0002'), ...]
}
If reference by name:
{
_id: ObjectId('9876'),
title: 'MEAN Web Development',
tags: ['JavaScript', 'MongoDB', ...]
}
It's known that "reference by ID" is feasible.
My thinking is that with "reference by name", a query for a book's info only needs to search the book collection; we would know the tags' names without a join ($lookup) operation, which should be faster.
If the app checks the tags before creating or modifying a book, this should also be feasible, and faster.
I'm still not sure about two things:
Is there any hidden downside to "reference by name"?
Will "reference by name" be slower for "finding all books with a given tag"? Maybe ObjectId is somehow special?
Thanks.
I would say it depends on what your use case is for tags. As you say, it will be more expensive to do a $lookup to retrieve tag names if you reference by id. On the other hand, if you expect that tag names may change frequently, all documents in the book collection containing that tag will need to be updated on every change.
The ObjectID is simply a 12-byte value, which is autogenerated by a driver if no _id is present in inserted documents. See the MongoDB docs for more info. The only "special behavior" would be the fact that _id has an index by default. An index will speed up lookups in general, but indexes can be created on any field, not just _id.
In fact, the _id does not need to be an ObjectID. It is perfectly legal to have documents with integer _id values for instance:
{
_id: 1,
name: 'Javascript'
},
{
_id: 2,
name: 'MongoDB'
},
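The rename cost mentioned above can be made concrete with a small in-memory sketch. With "reference by name", renaming a tag means touching every book that carries it; with "reference by ID", only the single tag document changes. The book titles other than the one from the question are hypothetical, and renameTag is an illustrative helper, not a MongoDB call.

```javascript
// In-memory stand-in for a book collection that references tags by name.
const books = [
  { title: "MEAN Web Development", tags: ["JavaScript", "MongoDB"] },
  { title: "Hypothetical Book A", tags: ["JavaScript"] },
  { title: "Hypothetical Book B", tags: ["MongoDB"] },
];

// renameTag (hypothetical helper): rewrites the tag in place and returns
// how many documents had to be modified.
function renameTag(docs, oldName, newName) {
  let modified = 0;
  for (const doc of docs) {
    const index = doc.tags.indexOf(oldName);
    if (index !== -1) {
      doc.tags[index] = newName;
      modified += 1;
    }
  }
  return modified;
}

const touched = renameTag(books, "JavaScript", "JS");
console.log(touched, books);
```

The number of modified documents grows with the popularity of the tag, which is the trade-off to weigh against the saved $lookup.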

Tags in MongoDB

I am new to MongoDB.
I have a product which can have multiple tags. I saw tutorials where the collection was like:
{
_id: 1234,
tags: ["stationery","electronics"]
}
{
_id: 456,
tags: ["home","electronics"]
}
{
_id: 135,
tags: ["books","stationery"]
}
I have a fixed list of tags. All my products will belong to these tags. Now my question is how to store such a list so that when a new product is added I can display this list and ask the user to select tags only from this list.
Should I make another collection called Tags and save a reference in each product? If I do this, then while searching for products belonging to, say, the Books category I will have to run 2 queries.
Please suggest!
Store tags like you see in tutorials.
{
    _id: 1234,
    tags: ["stationery","electronics"]
}
This allows for easy queries. Now, to serve your tag-adding UI, I'd create a separate collection "tags", which would consist of very simple documents
{ name: 'stationery' }
{ name: 'electronics' }
{ name: 'books' }
(mongodb will create an _id field on them, but you don't care about it).
So your UI will read documents from this collection and use the name property to populate the tags property of products.
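Since the tag list is fixed, the application can also check a new product's tags against the documents read from the tags collection before saving. A minimal sketch in plain JavaScript, assuming the tag documents have already been loaded into an array (findInvalidTags is a hypothetical helper name):

```javascript
// Documents as they might be read from the separate "tags" collection.
const tagDocs = [
  { name: "stationery" },
  { name: "electronics" },
  { name: "books" },
];

// findInvalidTags (hypothetical helper): returns the selected tags that are
// not part of the fixed list; an empty array means the selection is valid.
function findInvalidTags(selected, allowedDocs) {
  const allowed = new Set(allowedDocs.map(doc => doc.name));
  return selected.filter(tag => !allowed.has(tag));
}

const okSelection = findInvalidTags(["books", "electronics"], tagDocs);
const badSelection = findInvalidTags(["books", "toys"], tagDocs);

console.log(okSelection);
console.log(badSelection);
```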

Mongo update specific subdoc

I have the following mongo entry:
et = {
    languages: [{
        code: String,
        title: String,
        tools: [{
            description: String,
            mds: [ObjectId]
        }],
    }]
    //some more stuff
}
I now need to update this object and add a new ObjectId to the mds array. I need to select the language element via its code field and the tools entry via its description field.
So far I came up with the following update call, with which I can update a field of the correct language entry:
ETs.update({
    '_id':mdAttributes.etID,
    'languages':{'$elemMatch':{'code':mdAttributes.language}}
},{
    '$set':{
        'languages.$.title':'update2.jpg'
    }
});
However, I do not know how to add a condition that selects the correct tool.
What my $set should do is something like this:
ETs.update({
    '_id':mdAttributes.etID,
    'languages':{'$elemMatch':{'code':mdAttributes.language}}
},{
    '$set':{
        'languages.$.tools.$.mds': ["newId"]
    }
});
Is there a way to achieve this in mongo?
Short answer, no. The $ positional operator doesn't currently work with nested arrays (https://jira.mongodb.org/browse/server-831).
You can do it nonetheless by setting the whole tools array entry (like you do in your first example, but for the tools array instead of the title field).
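Newer servers offer another route: from MongoDB 3.6 on, the filtered positional operator $[<identifier>] combined with the arrayFilters option can address elements inside nested arrays. The sketch below only assembles the three arguments such an update would take, so it runs without a database. buildNestedPush and the sample values are hypothetical; the field names come from the question.

```javascript
// buildNestedPush (hypothetical helper): assembles filter, update and options
// for an updateOne call; it does not talk to a database.
function buildNestedPush(etId, languageCode, toolDescription, newId) {
  return {
    filter: { _id: etId },
    update: {
      // $[lang] and $[tool] are placeholders resolved by arrayFilters below.
      $push: { "languages.$[lang].tools.$[tool].mds": newId },
    },
    options: {
      arrayFilters: [
        { "lang.code": languageCode },
        { "tool.description": toolDescription },
      ],
    },
  };
}

// Sample values are made up for illustration.
const updateArgs = buildNestedPush("et-1", "en", "hammer", "newId");
console.log(JSON.stringify(updateArgs, null, 2));
// With a driver collection these would be passed as:
//   collection.updateOne(updateArgs.filter, updateArgs.update, updateArgs.options)
```

Using $push appends the new id; $addToSet could be swapped in if duplicates should be avoided.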

Mongodb querying document with linked id

I have a document that has an id of another document from a different collection embedded in it.
My desired result is to return (I'm using python and pymongo) all the fields of the first collection, and all of the friends from the document that was embedded.
I understand mongo doesn't do joins and I understand I'll need to make two queries. I also don't want to duplicate my data.
My question is how to piece the two queries together in python/pymongo so I have one result with all the fields from both documents in it.
Here is what my data looks like:
db.employees
{_id: ObjectId("4d85c7039ab0fd70a117d733"), name: 'Joe Smith', title: 'junior',
manager: ObjectId("4d85c7039ab0fd70a117d730") }
db.managers
{_id: ObjectId("4d85c7039ab0fd70a117d730"), name: 'Jane Doe', title: 'senior manager'}
desired result
x = {_id: ObjectId("4d85c7039ab0fd70a117d733"), name: 'Joe Smith', title: 'junior',
manager: 'Jane Doe' }
You're basically doing something that Mongo does not support out of the box and that would actually be more painful than using the two records separately.
Basically (in pseudo/JS code since I am not a Python programmer):
var emp = db.employees.findOne({ name: 'Joe Smith' });
var mang = db.managers.findOne({ _id: emp.manager });
And there you have it: you have the two records separately. You cannot chain the queries as @slownage shows and get a merged result, or any result in fact, since MongoDB will return a dictionary from a query, not the value of the field, even when you specify only one field to return.
So this is really the only solution: get the two records separately and then work on them.
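Once both records are in hand, merging them into the desired shape is a small application-side step. A minimal sketch in plain JavaScript, with the two documents written out as literals (ids shown as plain strings for brevity) in place of the two query results; withManagerName is a hypothetical helper:

```javascript
// Stand-ins for the results of the two separate queries.
const employee = {
  _id: "4d85c7039ab0fd70a117d733",
  name: "Joe Smith",
  title: "junior",
  manager: "4d85c7039ab0fd70a117d730",
};
const manager = {
  _id: "4d85c7039ab0fd70a117d730",
  name: "Jane Doe",
  title: "senior manager",
};

// withManagerName (hypothetical helper): returns a copy of the employee with
// the manager id replaced by the manager's name, or null if none was found.
function withManagerName(emp, mgr) {
  return { ...emp, manager: mgr ? mgr.name : null };
}

const merged = withManagerName(employee, manager);
console.log(merged);
```

The original employee object is left untouched, so the stored reference is still available if it is needed later.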
Edit
You could use a DBRef here, but it is pretty much the same as doing it this way; the only difference is that it is a helper that puts the record into your own model instead of you doing it yourself.
If it works it should be something like:
db.managers.find({
    _id: db.employees.findOne(
        { _id: ObjectId("4d85c7039ab0fd70a117d733") },
        { manager: 1 }
    ).manager
})