MongoDB: modelling user-defined sort order

I am looking for a good way to implement a sort key that is completely user-definable. For example, the user is presented with a list and may reorder the elements by dragging them around. This order should be persisted.
One commonly used way is to just create an ascending integer sort field within each element:
{
"_id": "xxx1",
"sort": 2
},
{
"_id": "xxx2",
"sort": 3
},
{
"_id": "xxx3",
"sort": 1
}
While this will surely work, it might not be ideal: if the user moves an element from the very bottom to the very top, all the sort values in between need to be updated. We are not talking about embedded documents here, so this will cause a lot of individual documents to be updated. This might be optimised by creating initial sort values with gaps in between (e.g. 100, 200, 300, 400). However, this creates the need for additional logic and for re-sorting once the space between two elements is exhausted.
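The gap idea above can be sketched in plain JavaScript. This is only a minimal sketch: the helper names `sortValueBetween` and `renumber` and the spacing constant `GAP` are hypothetical, and it assumes integer sort values.

```javascript
// Assumed initial spacing between sort values, as in 100, 200, 300, 400.
const GAP = 100;

// Hypothetical helper: pick a sort value strictly between two neighbours.
// `before` / `after` are the neighbours' sort values, or null at either end of the list.
function sortValueBetween(before, after) {
  if (before === null && after === null) return GAP; // empty list
  if (before === null) return after - GAP;           // moved to the very top
  if (after === null) return before + GAP;           // moved to the very bottom
  const mid = Math.floor((before + after) / 2);
  // null means no integer is left between the neighbours: the caller must renumber.
  return mid > before && mid < after ? mid : null;
}

// Renumbering fallback: reassign evenly spaced values to the ids in display order.
function renumber(orderedIds) {
  return orderedIds.map((id, i) => ({ _id: id, sort: (i + 1) * GAP }));
}
```

With this, a drag-and-drop move touches a single document in the common case, and only a `null` result forces the multi-document renumbering pass.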
Another approach comes to mind: Have the parent document contain a sorted array, which defines the order of the children.
{
"_id": "parent01",
"children": ["xxx3","xxx1","xxx2"]
}
This approach would certainly make it easier to change the order, but it has its own caveats: the parent document must always keep track of a valid list of its children. As adding a child will update multiple documents, this still might not be ideal. There also needs to be thorough validation of the input received from the client, as the client must never change the length of this list or the elements it contains.
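The validation mentioned above could look roughly like the following. It is a sketch in plain JavaScript with a hypothetical function name, and it assumes the stored `children` array holds each id exactly once.

```javascript
// Returns true only if `proposed` contains exactly the same ids as `current`,
// each exactly once, in any order.
function isValidReorder(current, proposed) {
  if (!Array.isArray(proposed) || proposed.length !== current.length) return false;
  const remaining = new Set(current);
  for (const id of proposed) {
    // delete() returns false for unknown ids and for ids already consumed (duplicates).
    if (!remaining.delete(id)) return false;
  }
  return remaining.size === 0;
}
```

Only when the check passes would the server overwrite `children` with the client's list.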
Is there a better way to implement such a use case?

Hard to say which option is better without knowing:
How often the sort order is usually updated
Which queries you are going to run against the documents, and how often
How many documents can be sorted at a time
I'm sure you are going to run many more queries than updates, so personally I would go with the first option. It's easy to implement and it's simple, which means it's going to be robust. I understand your concerns about updating multiple documents, but the updates will be done in place: no document shifting will occur, because you don't actually change the documents' size. Just create a simple test. Generate 1k documents, then update each of them in a loop like this:
db.test.update({ '_id': arrIds[i] }, { $set: { 'sort' : i } })
You will see it will be a pretty instant operation.
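If you want to avoid touching documents whose position did not change, the update list can be computed up front. The following is a sketch only: `updatesForMove` is a hypothetical helper, and it assumes the sort values are simply 1..n in display order.

```javascript
// Move the element at index `from` to index `to` and return update specs
// for only those documents whose `sort` value actually changes.
function updatesForMove(orderedIds, from, to) {
  const next = orderedIds.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  const updates = [];
  next.forEach((id, i) => {
    // A document needs an update only if a different id now sits at its position.
    if (orderedIds[i] !== id) {
      updates.push({ filter: { _id: id }, update: { $set: { sort: i + 1 } } });
    }
  });
  return updates;
}
```

Each returned pair can then be passed to `db.test.update(filter, update)` in the same kind of loop as above. Swapping two neighbours yields two updates, while moving the last element to the top still yields one update per document.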
I like the second option as well. From a programming perspective it looks more elegant, but in practice you don't usually care much whether your update takes 10 milliseconds instead of 5 if you don't do it often, and I'm sure you don't: most applications are query oriented.
EDIT:
When you update multiple documents, even if it's a near-instant operation, you may run into an inconsistency issue where some documents are updated and some are not. In my case it wasn't really an issue. Let's consider an example. Assume there's a list:
{ "_id" : 1, "sort" : 1 },{ "_id" : 2, "sort" : 4 },{ "_id" : 3, "sort" : 2 },{ "_id" : 4, "sort" : 3 }
so the ordered ids should be 1,3,4,2 according to the sort fields. Let's say we have a failure when we want to move id=2 to the top. The failure occurs when we have only updated two documents, ids 2 and 1, so we end up with the following state:
{ "_id" : 1, "sort" : 2 },{ "_id" : 2, "sort" : 1 },{ "_id" : 3, "sort" : 2 },{ "_id" : 4, "sort" : 3 }
The data is in an inconsistent state, but we can still display the list so the problem can be fixed: the id order will be 2,1,3,4 if we just order by the sort field (ids 1 and 3 now share the value 2, so their relative order depends on how ties are broken). Why is it not a problem in my case? Because when a failure occurs, the user is redirected to an error page or shown an error message. It is obvious to them that something went wrong and that they should try again, so they just go back to the page and fix the order, which is only partially valid.
Just to sum up: taking into account that it's a really rare case, and the other benefits of the approach, I would go with it. Otherwise you will have to place everything in one document, both the elements and the array with their indexes. This might be a much bigger issue, especially when it comes to querying.
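The example can be replayed in plain JavaScript. This is a sketch, not part of the original test: the tie-break on `_id` is an assumption (roughly what sorting on `{ sort: 1, _id: 1 }` would give), and the helper names are made up.

```javascript
// State after the interrupted reorder from the example above.
const partiallyUpdated = [
  { _id: 1, sort: 2 }, { _id: 2, sort: 1 }, { _id: 3, sort: 2 }, { _id: 4, sort: 3 },
];

// Order by `sort`, falling back to `_id` when two documents share a value.
function displayOrder(docs) {
  return docs.slice().sort((a, b) => a.sort - b.sort || a._id - b._id).map((d) => d._id);
}

// Report sort values used by more than one document, a sign of an interrupted reorder.
function duplicatedSortValues(docs) {
  const seen = new Set();
  const dup = new Set();
  for (const d of docs) (seen.has(d.sort) ? dup : seen).add(d.sort);
  return [...dup];
}
```

Here `displayOrder(partiallyUpdated)` gives `[2, 1, 3, 4]` and `duplicatedSortValues(partiallyUpdated)` gives `[2]`, so the list stays displayable and the inconsistency is easy to detect.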
Hope it helps!

Related

Solr Increase relevance of search result based on a map of word:value

Let's say we have a structure like this per entry that goes to Solr. The document is first amended and then saved. The way it is amended at the moment means that we lose the connection between the keyword and its score. However, we could change that into something else, if necessary.
"keywords" : [
{
"score" : 1,
"content" : "great finisher"
},
{
"score" : 1,
"content" : "project"
},
{
"score" : 1,
"content" : "staying"
},
{
"score" : 1,
"content" : "staying motivated"
}
]
What we want is to boost a document in the Solr query results using the "score" value whenever the query contains the word or collocation with which that score is associated.
So each document has a different "map" of keywords to scores. The relevancy would be computed the way Solr normally does it now, but with a boost according to this map and the words present in the query.
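To make the intended behaviour concrete, here is a sketch in plain JavaScript, not Solr. The function name, the naive substring matching and the summing of scores are all assumptions for illustration, and how the result would be combined with Solr's own relevancy is left open.

```javascript
// Sum the scores of a document's keywords that occur in the query text.
// Matching is a naive, case-insensitive substring test.
function keywordBoost(keywords, query) {
  const q = query.toLowerCase();
  return keywords
    .filter((k) => q.includes(k.content.toLowerCase()))
    .reduce((sum, k) => sum + k.score, 0);
}
```

For the document above, a query containing "staying motivated" and "project" would match three of the four keywords.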
From what I saw, we can give boosts to results according to some criteria, but these criteria are very dynamic and context dependent. I'm not sure how to implement this or where to start.
At the moment there is no built-in support in Solr for anything like this. Ideally, each term in a multiValued field could be boosted separately, but this is currently not possible (progress, although there is none, is tracked in SOLR-2499).
There are, however, ways of working around this; two are suggested in the issue above. I can't say much about using payloads and a custom BoostingTermQuery, but using dynamic fields is a possibility. The drawback is managing your cache sizes if you have many different field names and query or sort by most of them. With a small index and few terms it will work, but a larger index (document counts in the high five or six digits) with many dynamic fields will eat up memory quickly, as each field you sort or query on gets its own lookup cache holding an int/long array the same size as your document count.
Another suggestion would be to look at using function queries together with a boost. If you reference the field here instead, you might avoid the cache issue. Try it!

Multiple nested arrays in MongoDB

I am having difficulties figuring out an effective way of working with a multiple nested document. It looks like the following:
{
  "_id": { "$oid": "53ce46e3f0c25036e7b0ddd8" },
  "someid": 7757099,
  "otherids": [
    {
      "id": 100,
      "line": "test",
      "otherids": [ { "id": 129 } ]
    }
  ]
}
and there will be another level of array in addition.
I cannot find a way to query this structure beyond the top-level "otherids" array; nothing deeper works. Is it possible to do this efficiently at all?
These arrays might grow a bit, but not hugely.
My thought was to use it like this since it will be effective to fetch a lot of data in one go. But this data also needs to be updated quite often. Is this a hopeless solution with mongoDB?
Regards mongoDB newb
EDIT:
I would like to do it as simply and fast as possible :-)
Like: someid.4.otherids.2.line -> somevalue
I know that probably I would have to do a query to check if values exist, but it would be nice to do it as an upsert. Now I only work with objects in java, and it takes 14 secs to insert 10 000 records. Most of these inserts are "leaf nodes", meaning I have to query, then find out what is already there, modify the document, then update the whole root. This takes too long.
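For the kind of path shown above, MongoDB's dot notation accepts array indexes, so a `$set` can target one leaf without rewriting the whole root document. The sketch below only builds the update document. The helper names are hypothetical, and it assumes the array positions are already known.

```javascript
// Build a dot-notation path such as "otherids.0.otherids.0.id" from its segments.
function dotPath(...segments) {
  return segments.join(".");
}

// Build a $set update document that touches a single leaf value.
function setLeaf(path, value) {
  return { $set: { [path]: value } };
}

const update = setLeaf(dotPath("otherids", 0, "line"), "somevalue");
// update is { $set: { "otherids.0.line": "somevalue" } }
```

The result could be sent with something like `db.collection.update({ someid: 7757099 }, update)`, where the collection name is a placeholder.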

mongodb, make increment several times in single update

Given two very simple MongoDB documents:
{_id:1, v:1}
{_id:2, v:1}
Now, based on an array of _id values, I need to increase field v once for every time an _id appears in the array. For example, [1, 2, 1] should produce
{_id:1, v:3} // increased 2 times
{_id:2, v:2} // increased 1 time
Of course, a simple update ignores the duplicate in $in:
db.r.update({_id:{$in:[1,2,1]}}, {$inc:{v:1}}, {multi:true})
Is there a way to do it without for-loop? /Thank you in advance/
No there isn't a way to do this in a single update statement.
The reason why the $in operator "removes the duplicate" is simply that the document with _id 1 was already matched, so there is no point in matching it again. You can't make the document "match twice", as it were.
Also there is no current way to batch update operations. But that feature is coming.
You could look at your "batch" and group together occurrences of the same document, then issue a single increment by the appropriate number of units. However, just like looping over the array items, the operation would be programmatic, albeit a little more efficient.
That isn't possible directly. You'll have to do that in your client, where you can at least try to minimize the number of batch updates required.
First, find the counts. This depends on your programming language, but what you want is something like [1, 2, 1] => [ { 1 : 2 }, { 2 : 1} ] (these are the counts for the respective ids, i.e. id 1 appears twice, etc.). Something like LINQ or underscore.js is helpful here.
Next, since you can't perform different updates in a single operation, group them by their count, and update all objects whose count must be incremented by a common fixed value in one batch:
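Pseudocode:
// data holds the (id, count) pairs from the previous step
var groups = data.groupBy(p => p.Value);
foreach(var group in groups)
db.update({"_id" : { $in : group.values.asArray }},
// increase by the number of times those ids were present
{$inc : { v : group.key } },
// multi is needed so that every matched document is updated
{multi : true})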
That is better than individual updates only if there are many documents that must be increased by the same value.
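The two steps above (count, then group by count) can be sketched in plain JavaScript. The function name `incrementBatches` and the `{ filter, update }` shape are assumptions for illustration.

```javascript
// Count how often each id occurs, then group ids that share the same count.
function incrementBatches(ids) {
  const counts = new Map();
  for (const id of ids) counts.set(id, (counts.get(id) || 0) + 1);

  const byCount = new Map();
  for (const [id, n] of counts) {
    if (!byCount.has(n)) byCount.set(n, []);
    byCount.get(n).push(id);
  }
  // One multi-update per distinct count.
  return [...byCount].map(([n, group]) => ({
    filter: { _id: { $in: group } },
    update: { $inc: { v: n } },
  }));
}
```

For `[1, 2, 1]` this yields two batches: `$inc` by 2 for id 1 and `$inc` by 1 for id 2. Each batch would then be issued as `db.r.update(filter, update, { multi: true })`.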

mongodb: create a top-level index for a nested document instead of having to index each individual sublevel?

This question is about how I can use indexes in MongoDB to look something up in nested documents, without having to index each individual sublevel.
I have a collection "test" in MongoDB which basically goes something like this:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : {
"1" : [1,2,3],
"2" : [4,5,6]
}}
Scenario has multiple keys, and each document can have any subset of the scenarios (from none to all of them). Also, scenario can't be an array because I need it as a dictionary in Python. I created an index on the "scenario" field.
My issue is that I want to select from the collection, filtering for documents that have a certain scenario. This works fine functionally:
db.test.find({"scenario.1": {$exists: true}})
However, it won't use any index I've put on scenario. An index is only used if I put one on "scenario.1". But I can have thousands (or more) of scenarios, and the collection itself has 100.000s of records, so I would prefer not to!
So I tried alternatives:
db.test.find({"scenario": "1"})
This will use the index on scenario, but won't return results. Making scenario an array still gives the same index issue.
Is my question clear? Can anyone give a pointer on how I could achieve the best performance here?
P.s. I have seen this: How to Create a nested index in MongoDB? but that solution is not possible in my case (due to the amount of scenarios)
Putting an index on a subobject like scenario is useless in this case as it would only be used when you're filtering on complete scenario objects rather than individual fields (think of it as a binary blob comparison).
You either need to add an index on each of your possible fields ("scenario.1", "scenario.2", etc.) or rework your schema to get rid of the dynamic keys by doing something like this:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [
{ id: "1", value: [1,2,3] },
{ id: "2", value: [4,5,6] }
]}
Then you can add a single index to scenario.id to support the queries you need to perform.
I know you said you need scenario to be a dict and not an array, but I don't see how you have much choice.
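One way to keep working with a dictionary in application code is to convert at the boundary. The following is a sketch in JavaScript with hypothetical helper names (the same idea carries over to Python), converting between the dictionary form and the array-of-subdocuments form shown above.

```javascript
// { "1": [1,2,3], "2": [4,5,6] }  ->  [ { id: "1", value: [1,2,3] }, ... ]
function scenarioToArray(dict) {
  return Object.keys(dict).map((id) => ({ id, value: dict[id] }));
}

// Inverse: rebuild the dictionary after reading a document.
function scenarioToDict(arr) {
  const dict = {};
  for (const { id, value } of arr) dict[id] = value;
  return dict;
}
```

Documents stored in the array form can then be found with a query such as `db.test.find({ "scenario.id": "1" })`, which the single index on `scenario.id` can serve.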
Johnny HK's answer is nicely explained and should be used in the general case. I will just suggest a workaround for your issue if you have to have many scenarios and don't need complex querying. Instead of keeping the values under the scenario field, hold just the ids of the scenarios under that field, and hold each scenario's values in a separate field of the document, using the scenario id as part of that field's key.
Example:
{
"_id" : ObjectId("50fdd7d71d41c82875a5b6c1"),
"othercol" : "bladiebla",
"scenario" : [ "1", "2"],
"scenario_1": [1,2,3],
"scenario_2": [4,5,6]
}
With this schema you can use an index on scenario to find specific scenarios. But if you need to query for specific scenario values, you again need an index on each scenario value field, i.e. scenario_1, scenario_2, etc. If you need indexes on each field anyway, then don't change your original schema; use sparse indexes for each nested field instead, which might help reduce the size of your indexes.
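As a rough sketch of the reshaping described here (the function name is hypothetical, and the `scenario_<id>` field naming follows the example above):

```javascript
// Turn { scenario: { "1": [...], "2": [...] } } into
// { scenario: ["1", "2"], scenario_1: [...], scenario_2: [...] }.
function scenarioToKeyedFields(doc) {
  const { scenario, ...rest } = doc;
  const out = { ...rest, scenario: Object.keys(scenario) };
  for (const id of out.scenario) out["scenario_" + id] = scenario[id];
  return out;
}
```

A query such as `db.test.find({ scenario: "1" })` then matches documents whose scenario array contains "1".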

MongoDB Table Design and Query Performance

I'm new to MongoDB. When creating a new table a question came to my mind related to how to design it and performance. My table structure looks this way:
{
"name" : string,
"data" : { "data1" : "xxx", "data2" : "yyy", "data3" : "zzz", .... }
}
The "data" field could grow until it reaches an amount of 100.000 elements ( "data100.000" : "aaaXXX"). However the number of rows in this table would be under control (between 500 and 1000).
This table will be accessed many times in my application and I'd like to maximize the performance of any queries. I would do queries like this one (I'll put an example in java):
new Query().addCriteria(Criteria.where("name").is(name).and("data.data3").is("zzz"));
I don't know if this would get slower when the amount of "dataX"... elements grows.
So the question is: Is this design correct? Should I change something?
I'll be pleased to read your advice, many thanks in advance
A document can be viewed like a table with columns, but you have to be careful: it has different usage characteristics. The maximum document size is 16 MB, and you have to keep in mind that documents are held in memory by MongoDB.
With your query the whole document will be returned. Ask yourself: do you need all the entries, or will you only use a single entry on its own?
Using MongoDB for eCommerce
MongoDB Schema Design
MongoDB and eCommerce
MongoDB Transactions
This should be a good start.
What is data? I wouldn't store a single nested document with up to 100,000 fields, as you wouldn't be able to index it easily, so you would get performance issues.
You'd be better off storing as an array of strings, then you can index the array field which would index all the values.
{
"name" : string,
"data" : [ "xxx", "yyy", "zzz" ]
}
If like in your query you then wanted the value at a particular position in the array, instead of data.data3 you could do:
db.Collection.find( { "data.2" : "zzz" } )
Or, if you don't care about the position and just want all documents where the data array contains 'zzz' you can do:
db.Collection.find( { "data" : "zzz" } )
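Migrating from the keyed form to the array form could be sketched as below. The helper names are hypothetical, and it assumes the keys are exactly `data1` … `dataN` with no gaps, so that `dataN` lands at index N - 1.

```javascript
// Convert { data1: "xxx", data2: "yyy", ... } into ["xxx", "yyy", ...],
// ordering by the numeric suffix of each key.
function dataToArray(data) {
  return Object.keys(data)
    .sort((a, b) => Number(a.slice(4)) - Number(b.slice(4)))
    .map((k) => data[k]);
}

// Dot-notation key for the array position that used to be `data<N>`.
function positionalKey(n) {
  return "data." + (n - 1);
}
```

So the old `data.data3` lookup becomes a query on `positionalKey(3)`, i.e. `"data.2"`, as in the positional example above.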
100,000 strings is not going to get anywhere near 16MB so you don't need to worry about that, but having 100,000 fields in a nested document or array indicates something is wrong with the design, but without knowing what data is I couldn't say for sure.