How should I store the "tags" of a document in MongoDb? - mongodb

I know this question was asked several times, but as far as my search skills go, every post is how to implement this in a SQL database, and no mention of NoSQL databases.
I have documents for which I want to implement a tagging feature.
Users will be able to tag them with whatever string they want, and then I need to be able to query the documents as fast as possible (by these generic tags)
Should I have a String array for my tags (which would allow me to support any number of tags), like this
{"_id":"aaa", "prop":"value", "tags":["tag1","tag2","tagN"]}
or limit the amount of tags to, say, 5, and have them as different properties, like this
{"_id":"aaa", "prop":"value", "tag1":"value", "tag2":"value", "tag3":"value" }
Which structure would be better for fast querying, specifically in mongodb?
Using the second structure would allow me to index the collection by this fields, but are 5 indexes recommended? Should I have less tags?

You simply want to use an array, the first example you have. That will allow you to have a consistent model and indexes.
{"_id":"aaa", "prop":"value", "tags":["tag1","tag2","tagN"]}
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a “multikey index.”[1]
[1] http://docs.mongodb.org/manual/core/indexes/#multikey-indexes

Related

How to efficiently do complex queries on MongoDB unindexed fields?

I am building filtering functionality for web application which should look like JIRA of TFS filtering query. So user should be able to filter on fields contents and use logical operators in filter query.
The data lives in MongoDB and the main challenge is that the fields on which we will filter should support not only strict equality but also full text search are difficult to index because they can vary for each user.
In a nutshell, there is a nested object, which has three other nested object that can have different amount of fields depending of user, field names are also set by user, so we don't know them.
For example document structure in collection can be:
{
_id: ObjectId()
storage: {
obj_1:{}
obj_2:{}
}
},
{
_id: ObjectId()
storage: {
obj_1:{
field_1 : val,
field_2 : val
}
obj_2:{}
}
}
I imagine queries will be something like:
find({$and:[{"storage.obj_1.field_1":{$regex: "va"}},{"storage.obj_1.var_2":"val"}]})
Unfortunately, I am not a database expert so the solutions that I see now are:
1) Use Elasticsearch as a search engine. But the question is: how do I set Elastic index if I don't know my documents structure?
2) Use sparse index in Mongo. But I will need to use regex for matching, is that solution better than Elastic?
So the question is: what is the best way to do filtering in such a DB structure?
p.s.
I have put this question in SO and not Software Engineering because SO has more active members, pls keep your downvotes for later
Elasticsearch and MongoDB (much like a relational database) behave differently for indexing: In MongoDB you need to explicitly index every field (for a non $text index). In Elasticsearch every field is automatically indexed. Don't go too crazy on the number of fields in Elasticsearch, since there is a little overhead for each field (in terms of disk space though that has improved in version 6).
As soon as you are having more than a test data set, regular expressions are often slow, since they can only use indices in specific cases and you need to define those indices explicitly. Maybe the $text index and search operator are more what you are looking for. That one can index every field in a collection as well. If you need more features and a system that is fully built for search, then Elasticsearch will be the better option though.

MongoDB database design - contest application

I'm building a contest application. Which have 4 collections so far:
contest
questions
matches
users
I want to store every user score for every match he's assigned into. But I really can't find a proper way to achieve this.
All what I've came up with, Is to replace matches in users with an array in which each element contains a reference to matches collection and score field. But I think this is not very efficient.
EDIT
I was thinking about another solution. A separate collection called scores that contains three fields user, match and score.
Here's my schema structure:
Contests:
Questions:
Matches:
Users:
Note Any recommended adjustments on the current design is welcomed too.
Since mongodb is not designed to support collections relationships you migth end up with some duplicated work, I would suggest you to find a way of storing as much data as you can in a single document.
Your scores would go in each match document, probably the users array would have this structure {'users':[{user_id:'xxx',score:xxx}{user_id:'xxx',score:xxx}]}
The other solution, would be what you say, to have in each user doccument, a matches array with a structure like this: {'matches':[{match_id:'xxx',score:xxx}{match_id:'xxx',score:xxx}]}
You can have both also, this migth be more efficient depending the kind of queries you will need to do. You can also have a field in the subdocuments that stores the user/match name/title
Note: As you can see, you have two solutions, or you optimize for doccument size(so you can store more) or you optimize for performance (so you can read faster/with less resources)
Hope this be of any help.

Any way to make Mongo filtering dynamic records faster?

Scenario
I have Mongo collection Items that have dynamic item objects in it. Currently I have over 3 million records. I'm using C# with MongoSharp but I don't think it has anything to do with my problem.
Here is an example Item (it has a lot more fields than just 3):
{
_id : "1234567890",
Code : 888596937,
RefNumber : "GHTZKL",
...
}
AFAIK there is no point in using TextSearch since it's not really words, just some codes so it won't give me anything beneficial. I also cannot index them all since then I will have to index every single field.
Problem
Right now when I filter data it takes about 1-3 seconds (on ssd). Is there any way I can make it filter my items faster or it is as fast as I can get?
You don't mention what field you want to search on, but it sounds like you want to search on any arbitrary attribute. This is a common design and borders on an antipattern for MongoDB. The only way to avoid the collection scan you're getting now is to index the fields you want to search on, but indexing every field when you don't know what the fields will be ahead of time isn't possible. The solution to this to name only the common fields (and index on them), then group the other fields into name/value pairs in an array in the the document. You can then index that array to get your fast searches.
A word of caution on NVP arrays: if you array gets very large (hundreds of attributes), your index size will blow up spectacularly. It's best to try to keep the array size fairly small.
For more information on this design pattern, see Asya's great writeup.

How to do global search in MongoDB?

What I mean as global search is searching for documents in specified collections, for example, searching for a name in both User and Organization collections and will return both user and organization documents that match the criteria.
Is it possible to simply copy the documents in User and Organization into another collection and do a search in it?
No, it is not possible to do a multi-collection search automatically. There's no reason however that you couldn't perform the same query on multiple collections and combine the results.
While you could duplicate the data into another collection for query purposes, if you need to be guaranteed that the source collection's values matches identically with the "index" collection, you'll need to implement your own multi-phase transaction (example) as MongoDb doesn't have a multi-collection atomic commit. Or, you can accept the fact that the "index" table may be out of sync. Of course, it could be periodically updated through custom code. Further, it means your working set has increased as you're double storing data. Also, if you then need to grab data from individual collections (to grab more of the source document), you've likely not gained anything and made things worse when compared to doing multiple queries in the first place.
You could store related documents in the same collection and take advantage of the built-in indexing offered. Of course, this comes with the caveat that if your documents are now typed, you may find it more challenging to build MongoDb indexes that are efficient. Every changing/new document must go through the indexing pipeline, which may introduce significant overhead.
If it's only a few collections, I'd just do multiple searches without understanding more deeply your requirements. If not, the second best would be to combine documents into a single collection. Last choice would be to copy the data.

Mongodb : multiple specific collections or one "store-it-all" collection for performance / indexing

I'm logging different actions users make on our website. Each action can be of different type : a comment, a search query, a page view, a vote etc... Each of these types has its own schema and common infos. For instance :
comment : {"_id":(mongoId), "type":"comment", "date":4/7/2012,
"user":"Franck", "text":"This is a sample comment"}
search : {"_id":(mongoId), "type":"search", "date":4/6/2012,
"user":"Franck", "query":"mongodb"} etc...
Basically, in OOP or RDBMS, I would design an Action class / table and a set of inherited classes / tables (Comment, Search, Vote).
As MongoDb is schema less, I'm inclined to set up a unique collection ("Actions") where I would store these objects instead of multiple collections (collection Actions + collection Comments with a link key to its parent Action etc...).
My question is : what about performance / response time if I try to search by specific columns ?
As I understand indexing best practices, if I want "every users searching for mongodb", I would index columns "type" + "query". But it will not concern the whole set of data, only those of type "search".
Will MongoDb engine scan the whole table or merely focus on data having this specific schema ?
If you create sparse indexes mongo will ignore any rows that don't have the key. Though there is the specific limitation of sparse indexes that they can only index one field.
However, if you are only going to query using common fields there's absolutely no reason not to use a single collection.
I.e. if an index on user+type (or date+user+type) will satisfy all your querying needs - there's no reason to create multiple collections
Tip: use date objects for dates, use object ids not names where appropriate.
Here is some useful information from MongoDB's Best Practices
Store all data for a record in a single document.
MongoDB provides atomic operations at the document level. When data
for a record is stored in a single document the entire record can be
retrieved in a single seek operation, which is very efficient. In some
cases it may not be practical to store all data in a single document,
or it may negatively impact other operations. Make the trade-offs that
are best for your application.
Avoid Large Documents.
The maximum size for documents in MongoDB is 16MB. In practice most
documents are a few kilobytes or less. Consider documents more like
rows in a table than the tables themselves. Rather than maintaining
lists of records in a single document, instead make each record a
document. For large media documents, such as video, consider using
GridFS, a convention implemented by all the drivers that stores the
binary data across many smaller documents.