MongoDB: Find document given field values in an object with an unknown key - mongodb

I'm making a database on theses/arguments. They are related to other arguments, which I've placed in an object with a dynamic key, which is completely random.
{
_id : "aeokejXMwGKvWzF5L",
text : "test",
relations : {
cF6iKAkDJg5eQGsgb : {
type : "interpretation",
originId : "uFEjssN2RgcrgiTjh",
ratings: [...]
}
}
}
Can I find this document if I only know what the value of type is? That is I want to do something like this:
db.theses.find({relations['anything']: { type: "interpretation"}})
This could've been done easily with the positional operator, if relations had been an array. But then I cannot make changes to the objects in ratings, as mongo doesn't support those updates. I'm asking here to see if I can keep from having to change the database structure.

You seem to have arrived at this structure because of a problem with updates in nested arrays, but it really only trades that for another problem: the standard query operators, which are the ones that can use indexes, have no "wildcard" concept for matching unspecified keys.
The only way you can really search for such data is by running JavaScript code on the server to traverse the keys using $where (passing a function to find() in the shell is shorthand for this). This is clearly not a good idea, as it requires brute-force evaluation rather than using useful things like an index, but it can be approached as follows:
db.theses.find(function() {
var relations = this.relations;
return Object.keys(relations).some(function(rel) {
return relations[rel].type == "interpretation";
});
})
While this returns the documents that contain the required nested value, it must inspect every document in the collection to do the evaluation. For that reason, such an evaluation should really only be used alongside another query condition on a fixed field that can use an index, so that fewer documents have to be inspected.
Still, the better solution is to consider remodelling the data to take advantage of indexes when searching. Where it is necessary to update the "ratings" information, "flatten" the structure so that each "rating" element becomes an entry in the only array:
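The same key traversal can be written as a standalone predicate so it can be checked outside the database. This is only a minimal sketch: the name hasRelationOfType is hypothetical, and the guard for a missing relations field is an assumption about documents that lack it (without it, Object.keys would throw on such documents).

```javascript
// Hypothetical helper: true when any entry under doc.relations has the given type.
function hasRelationOfType(doc, type) {
  var relations = doc.relations;
  // Assumed guard: documents without a relations object never match.
  if (!relations || typeof relations !== "object") {
    return false;
  }
  return Object.keys(relations).some(function (key) {
    return Boolean(relations[key]) && relations[key].type === type;
  });
}

// Sample documents shaped like the one in the question.
var docs = [
  {
    _id: "aeokejXMwGKvWzF5L",
    text: "test",
    relations: {
      cF6iKAkDJg5eQGsgb: { type: "interpretation", originId: "uFEjssN2RgcrgiTjh" }
    }
  },
  { _id: "noRelations", text: "other" }
];

// Every document is visited, which mirrors the full scan $where performs.
var matches = docs.filter(function (doc) {
  return hasRelationOfType(doc, "interpretation");
});
console.log(matches.map(function (doc) { return doc._id; }));
```

Inside a $where function the document is available as this rather than as an argument, so the body would read this.relations instead of doc.relations.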
{
"_id": "aeokejXMwGKvWzF5L",
"text": "test",
"relationsRatings": [
{
"relationId": "cF6iKAkDJg5eQGsgb",
"type": "interpretation",
"originId": "uFEjssN2RgcrgiTjh",
"ratingId": 1,
"ratingScore": 5
},
{
"relationId": "cF6iKAkDJg5eQGsgb",
"type": "interpretation",
"originId": "uFEjssN2RgcrgiTjh",
"ratingId": 2,
"ratingScore": 6
}
]
}
Now searching is of course quite simple:
db.theses.find({ "relationsRatings.type": "interpretation" })
And of course the positional $ operator can now be used with the flatter structure:
db.theses.update(
{ "relationsRatings.ratingId": 1 },
{ "$set": { "relationsRatings.$.ratingScore": 7 } }
)
Of course this means duplicating the "related" data for each "ratings" value, but this is generally the cost of being able to update by matched position, as this is only supported with a single level of array nesting.
So you can force the logic to match with the way you have it structured, but it is not a great idea to do so and will lead to performance problems. If however your main need here is to update the "ratings" information rather than just append to the inner list, then a flatter structure will be of greater benefit and of course be a lot faster to search.
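For anyone converting existing documents, a rough sketch of how the keyed relations object could be turned into the flattened relationsRatings array shown above. The helper name is hypothetical, and the rating shape { ratingId, ratingScore } is an assumption, since the question elides the real content of ratings.

```javascript
// Hypothetical migration helper: one array element per rating.
// Assumption: each rating looks like { ratingId, ratingScore }.
function flattenRelations(doc) {
  var relations = doc.relations || {};
  var relationsRatings = [];
  Object.keys(relations).forEach(function (relationId) {
    var relation = relations[relationId];
    (relation.ratings || []).forEach(function (rating) {
      relationsRatings.push({
        relationId: relationId,
        type: relation.type,
        originId: relation.originId,
        ratingId: rating.ratingId,
        ratingScore: rating.ratingScore
      });
    });
  });
  return { _id: doc._id, text: doc.text, relationsRatings: relationsRatings };
}

var original = {
  _id: "aeokejXMwGKvWzF5L",
  text: "test",
  relations: {
    cF6iKAkDJg5eQGsgb: {
      type: "interpretation",
      originId: "uFEjssN2RgcrgiTjh",
      ratings: [
        { ratingId: 1, ratingScore: 5 },
        { ratingId: 2, ratingScore: 6 }
      ]
    }
  }
};

var flattened = flattenRelations(original);
console.log(JSON.stringify(flattened, null, 2));
```

Note that in this sketch a relation with no ratings produces no array element at all, so such relations would need separate handling before the old field is dropped.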

Related

MongoDB - how do I update a value in nested array/object?

I have a document in my Mongo collection which has a field with the following structure:
"_id" : "F7WNvjwnFZZ7HoKSF",
"process" : [
{
"process_id" : "wTGqVk5By32mpXadZ",
"stages" : [
{
"stage_id" : "D6Huk89DGFsd29ds7",
"completed" : "N"
},
{
"stage_id" : "Msd390vekn09nvL23",
"completed" : "N"
}
]
}
]
I need to update the value of completed where the stage_id is equal to 'D6Huk89DGFsd29ds7' - the update query will not know which object in the stages array this value of stage_id will be in.
How do I do this?
Since you have nested arrays in your object, this is a bit tricky, and I'm not sure this problem can be solved with just one update query.
However, if you happen to know the index of the matching object in the first array (in your case process[0]), you can write your update query like this:
db.collection.update(
{"process.stages.stage_id":"D6Huk89DGFsd29ds7"},
{$set:{"process.0.stages.$.completed":"Y"}}
);
The query above will work perfectly with your test case. Still, there may be multiple objects at the root level, and there is no guarantee that the matching object will always be at index 0.
The solution I proposed above will fail if process has multiple children and the matching object is not at index 0.
However, you can achieve your goal with client-side programming: find the matching document, modify it on the client side, and replace the whole document with the new content.
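A minimal sketch of the "modify on the client side" step, run against a plain object. The helper name markStageCompleted is hypothetical; fetching the document and writing it back are left as a comment because they depend on the driver in use.

```javascript
// Hypothetical helper: sets "completed" on every stage with the given id,
// wherever it sits inside the nested process/stages arrays.
// Mutates doc in place and returns the number of stages changed.
function markStageCompleted(doc, stageId, value) {
  var changed = 0;
  (doc.process || []).forEach(function (proc) {
    (proc.stages || []).forEach(function (stage) {
      if (stage.stage_id === stageId) {
        stage.completed = value;
        changed += 1;
      }
    });
  });
  return changed;
}

// Document shaped like the one in the question.
var doc = {
  _id: "F7WNvjwnFZZ7HoKSF",
  process: [
    {
      process_id: "wTGqVk5By32mpXadZ",
      stages: [
        { stage_id: "D6Huk89DGFsd29ds7", completed: "N" },
        { stage_id: "Msd390vekn09nvL23", completed: "N" }
      ]
    }
  ]
};

var changed = markStageCompleted(doc, "D6Huk89DGFsd29ds7", "Y");
console.log(changed, JSON.stringify(doc.process[0].stages));
// With a real connection the modified document would then be written back,
// for example with db.collection.replaceOne({ _id: doc._id }, doc).
```

Be aware that another writer could change the document between the read and the replace, so this read-modify-write cycle is not atomic.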
Since this approach is very inefficient, I suggest that you consider altering your document structure to avoid nesting. Create another collection and move the content of the process array there.
In the end, I removed the outer process block, so that the process_id and stages were in the root of the document - made the process of updating easier using:
MyColl.update(
{
_id: 'F7WNvjwnFZZ7HoKSF',
"stages.stage_id": 'D6Huk89DGFsd29ds7'
},
{
$set: {"stages.$.completed": 'Y'}
}
);
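If the nested layout has to stay, MongoDB 3.6 and later provide the all-positional $[] and filtered positional $[<identifier>] operators together with the arrayFilters option, which can reach into nested arrays in a single update. The following is a sketch only, assuming a server of that version; the helper name and the identifier stage are hypothetical.

```javascript
// Hypothetical helper returning the three arguments for an update call.
function buildStageUpdate(stageId, completed) {
  return {
    // Match documents that contain the stage anywhere under process.
    filter: { "process.stages.stage_id": stageId },
    // $[] walks every process element; $[stage] only the filtered stages.
    update: { $set: { "process.$[].stages.$[stage].completed": completed } },
    // Defines which stages the "stage" identifier refers to.
    options: { arrayFilters: [{ "stage.stage_id": stageId }] }
  };
}

var args = buildStageUpdate("D6Huk89DGFsd29ds7", "Y");
console.log(JSON.stringify(args, null, 2));
// In the shell this might be passed as:
// db.collection.updateOne(args.filter, args.update, args.options)
```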

mongo operation speed : array $addToSet/$pull vs object $set/$unset

I have an index collection containing lots of terms, and a field items containing identifiers from another collection. Currently that field stores an array of documents, and docs are added with $addToSet, but I have some performance issues. It seems an $unset operation is executed faster, so I plan to change the array of documents to a document of embedded documents.
Am I right to think that $set/$unset on fields is faster than pushing/pulling embedded documents into arrays?
EDIT:
After small tests, we see that set/unset is 4 times faster. On the other hand, if I use an object instead of an array, it's a little harder to count the number of properties (vs the length of the array), and we were counting that a lot. But we can consider using $set every time and adding a field with the number of items.
This is a document of the current index :
{
"_id": ObjectId("5594dea2b693fffd8e8b48d3"),
"term": "clock",
"nbItems": NumberLong("1"),
"items": [
{
"_id": ObjectId("55857b10b693ff18948ca216"),
"id": NumberLong("123")
},
{
"_id": ObjectId("55857b10b693ff18948ca217"),
"id": NumberLong("456")
}
]
}
Frequent update operations are :
* remove item : {$pull:{"items":{"id":123}}}
* add item : {$addToSet:{"items":{"_id":ObjectId("55857b10b693ff18948ca216"),"id":123,}}}
* I can change $addToSet to $push and check duplicates before if performances are better
And this is what I plan to do:
{
"_id": ObjectId("5594dea2b693fffd8e8b48d3"),
"term": "clock",
"nbItems": NumberLong("1"),
"items": {
"123":{
"_id": ObjectId("55857b10b693ff18948ca216")
},
"456":{
"_id": ObjectId("55857b10b693ff18948ca217")
}
}
}
* remove item : {$unset:{"items.123":true}}
* add item : {$set:{"items.123":{"_id":ObjectId("55857b10b693ff18948ca216"),"id":123,}}}
For information, these operations are made with pymongo (or can be done with php if there is a good reason to), but I don't think this is relevant.
As with any performance question, there are a number of factors which can come into play with an issue like this, such as indexes, need to hit disk, etc.
That being said, I suspect you are likely correct that adding a new field or removing an old field from a MongoDB document will be slightly faster than appending/removing from an array as the array types will be less easy to traverse when searching for duplicates.
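The idea from the EDIT of keeping a separate counter next to the object-keyed items could look roughly like this. It is a sketch under assumptions: the builder names are hypothetical, the _id is shown as a plain string instead of an ObjectId, and item ids are assumed to contain no "." and not to start with "$". The $exists condition in each filter is there so that nbItems only changes when an item is really added or removed.

```javascript
// Hypothetical builder: add an item only if it is not already present.
function buildAddItem(term, itemId, itemDoc) {
  var path = "items." + itemId;
  var filter = { term: term };
  filter[path] = { $exists: false }; // no match when the item already exists
  var set = {};
  set[path] = itemDoc;
  return { filter: filter, update: { $set: set, $inc: { nbItems: 1 } } };
}

// Hypothetical builder: remove an item only if it is present.
function buildRemoveItem(term, itemId) {
  var path = "items." + itemId;
  var filter = { term: term };
  filter[path] = { $exists: true }; // no match when the item is already gone
  var unset = {};
  unset[path] = "";
  return { filter: filter, update: { $unset: unset, $inc: { nbItems: -1 } } };
}

var addOp = buildAddItem("clock", 123, { _id: "55857b10b693ff18948ca216" });
var removeOp = buildRemoveItem("clock", 123);
console.log(JSON.stringify(addOp));
console.log(JSON.stringify(removeOp));
```

Each result is meant to be passed to a single-document update (filter first, update second), so the field change and the counter change happen in the same operation.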

Mongoose: Saving as associative array of subdocuments vs array of subdocuments

I have a set of documents I need to maintain persistence for. Due to the way MongoDB handles multi-document operations, I need to embed this set of documents inside a container document in order to ensure atomicity of my operations.
The data lends itself heavily to key-value pairing. Is there any way instead of doing this:
var container = new mongoose.Schema({
// meta information here
subdocs: [{key: String, value: String}]
})
I can instead have subdocs be an associative array (i.e. an object) that applies the subdoc validations? So a container instance would look something like:
{
// meta information
subdocs: {
<key1>: <value1>,
<key2>: <value2>,
...
<keyN>: <valueN>,
}
}
Thanks
Using Mongoose, I don't believe that there is a way to do what you are describing. To explain, let's take an example where your keys are dates and the values are high temperatures, to form pairs like { "2012-05-31" : 88 }.
Let's look at the structure you're proposing:
{
// meta information
subdocs: {
"2012-05-30" : 80,
"2012-05-31" : 88,
...
"2012-06-15": 94,
}
}
Because you must pre-define schema in Mongoose, you must know your key names ahead of time. In this use case, we would probably not know ahead of time which dates we would collect data for, so this is not a good option.
If you don't use Mongoose, you can do this without any problem at all. MongoDB by itself excels at inserting values with new key names into an existing document:
> db.coll.insert({ type : "temperatures", subdocuments : {} })
> db.coll.update( { type : "temperatures" }, { $set : { 'subdocuments.2012-05-30' : 80 } } )
> db.coll.update( { type : "temperatures" }, { $set : { 'subdocuments.2012-05-31' : 88 } } )
{
"_id" : ObjectId("5238c3ca8686cd9f0acda0cd"),
"subdocuments" : {
"2012-05-30" : 80,
"2012-05-31" : 88
},
"type" : "temperatures"
}
In this case, adding Mongoose on top of MongoDB takes away some of MongoDB's native flexibility. If your use case is well suited by this feature of MongoDB, then using Mongoose might not be the best choice.
You can achieve this behavior by using {strict: false} in your Mongoose schema, although you should check the implications for Mongoose's validation and casting mechanisms.
var flexibleSchema = new Schema( {},{strict: false})
Another way is to use the schema.add method, but I do not think this is the right solution.
The last solution I see is to fetch the whole array on the client side and process it with underscore.js or whatever library you have. But it depends on your app, the size of the docs, communication steps, etc.
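For the client-side route, the conversion between the validated array form and the associative form needs no library. A small sketch follows; both helper names are hypothetical, and the sample values are strings to match the {key: String, value: String} schema from the question.

```javascript
// Hypothetical helper: array of { key, value } pairs -> plain object.
function pairsToObject(pairs) {
  var result = {};
  pairs.forEach(function (pair) {
    result[pair.key] = pair.value; // a later duplicate key overwrites an earlier one
  });
  return result;
}

// Hypothetical helper: plain object -> array of { key, value } pairs.
function objectToPairs(obj) {
  return Object.keys(obj).map(function (key) {
    return { key: key, value: obj[key] };
  });
}

// Shape stored in the container document (validated by the array schema).
var subdocs = [
  { key: "2012-05-30", value: "80" },
  { key: "2012-05-31", value: "88" }
];

var asObject = pairsToObject(subdocs);
var backToPairs = objectToPairs(asObject);
console.log(asObject);
console.log(backToPairs);
```

This keeps the stored schema fixed while the application works with the object form, at the cost of converting on every read and write.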

Querying sub array with $where

I have a collection with following document:
{
"_id" : ObjectId("51f1fd2b8188d3117c6da352"),
"cust_id" : "abc1234",
"ord_date" : ISODate("2012-10-03T18:30:00Z"),
"status" : "A",
"price" : 27,
"items" : [{
"sku" : "mmm",
"qty" : 5,
"price" : 2.5
}, {
"sku" : "nnn",
"qty" : 5,
"price" : 2.5
}]
}
I want to use "$where" in the fields of "items", so something like this:
{$where:"this.items.sku==mmm"}
How can I do it? It works when the field is not of array type.
You don't need a $where operator to do this; just use a query object of:
{ "items.sku": "mmm" }
As for why your $where isn't working, the value of that operator is executed as JavaScript, so that's not going to check each element of the items array, it's just going to treat items as a normal object and compare its sku property (which is undefined) to mmm.
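The difference is easy to reproduce in plain JavaScript, outside MongoDB. A short illustration using a cut-down copy of the document from the question:

```javascript
// Cut-down copy of the document from the question.
var doc = {
  items: [
    { sku: "mmm", qty: 5, price: 2.5 },
    { sku: "nnn", qty: 5, price: 2.5 }
  ]
};

// The array itself has no "sku" property, so this is undefined.
console.log(doc.items.sku); // → undefined

// Checking each element is what the comparison actually needs.
var found = doc.items.some(function (item) {
  return item.sku === "mmm";
});
console.log(found); // → true
```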
You are comparing this.items.sku to a variable mmm, which isn't initialized and thus has the value undefined. What you want to do is iterate over the array and compare each entry to the string 'mmm'. This example does this by using the array method some, which returns true when the passed function returns true for at least one of the entries:
{$where:"return this.items.some(function(entry){return entry.sku =='mmm'})"}
But really, don't do this. In a comment to the answer by JohnnyHK you said "my service is just a interface between user and mongodb, totally unaware what the field client want's to store". You aren't really explaining your use-case, but I am sure you can solve this better.
* The $where operator invokes the JavaScript engine even though this trivial expression could be done with a normal query. This means unnecessary performance overhead.
* Every single document in the collection is passed to the function, so an index cannot be used even when you have one.
* When the JavaScript function is generated from something provided by the client, you must be careful to sanitize and escape it properly, or your application becomes vulnerable to code injection.
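One way a generic service could avoid the injection risk is to build a plain query object from the client's input instead of splicing that input into a $where string. This is a sketch only: the helper name, the field-name whitelist and the primitive-only rule are all assumptions about what such a service would want to allow.

```javascript
// Hypothetical helper: builds { "<path>": <value> } from client input.
function buildFieldQuery(path, value) {
  // Assumed whitelist: dot-separated names of letters, digits and underscores.
  var allowed = /^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$/;
  if (typeof path !== "string" || !allowed.test(path)) {
    throw new Error("Unsupported field path: " + path);
  }
  // Assumed restriction: primitives only, so a client cannot pass an
  // operator document such as { $ne: null } as the value.
  if (value !== null && typeof value === "object") {
    throw new Error("Only primitive values are accepted");
  }
  var query = {};
  query[path] = value;
  return query;
}

var query = buildFieldQuery("items.sku", "mmm");
console.log(JSON.stringify(query)); // → {"items.sku":"mmm"}
```

Because the result is an ordinary query document, no JavaScript runs on the server and the query can use an index on the field if one exists.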
I've been reading through your comments in addition to the question. It sounds like your users can generically add some attributes, which you are storing in an array within a document. Your client needs to be able to query an arbitrary pair from the document in a generic manner. The pattern to achieve this is typically as follows:
{
.
.
attributes:[
{k:"some user defined key",
v:"the value"},
{k: ,v:}
.
.
]
}
Note that in your case, items is attributes. Now to get the document, your query will be something like:
eg)
db.collection.find({attributes:{$elemMatch:{k:"sku",v:"mmm"}}});
(index attributes.k, attributes.v)
This allows your service to provide a way to query the data, and letting the client specify what the k,v pairs are. The one caveat with this design is always be aware that documents have a 16MB limit (unless you have a use case that makes GridFS appropriate). There are functions like $slice which may help with controlling this.
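A rough sketch of the k/v pattern end to end: turning user-defined fields into the attributes array, building the $elemMatch query, and an in-memory check that mimics what the query matches. All three helper names are hypothetical, and the in-memory matcher only illustrates the idea; it is not how the server evaluates the query.

```javascript
// Hypothetical helper: { sku: "mmm" } -> [{ k: "sku", v: "mmm" }].
function toAttributes(fields) {
  return Object.keys(fields).map(function (key) {
    return { k: key, v: fields[key] };
  });
}

// Hypothetical helper: query for documents with one matching k/v pair.
function buildAttributeQuery(key, value) {
  return { attributes: { $elemMatch: { k: key, v: value } } };
}

// In-memory illustration of $elemMatch on k and v: both conditions
// must hold for the same array element.
function matchesAttribute(doc, key, value) {
  return (doc.attributes || []).some(function (attr) {
    return attr.k === key && attr.v === value;
  });
}

var stored = { attributes: toAttributes({ sku: "mmm", qty: 5 }) };
var attributeQuery = buildAttributeQuery("sku", "mmm");
console.log(JSON.stringify(attributeQuery));
console.log(matchesAttribute(stored, "sku", "mmm"));
// The index mentioned above might be created with:
// db.collection.createIndex({ "attributes.k": 1, "attributes.v": 1 })
```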

mongodb pointer to another collection's item

Is it possible to point from one collection's item's value to another collection's item?
example:
db.col2.save( { value: 'test' } );
db.col1.save( { title: 'testing', something: [code to point to another collection's item] } );
db.col1.find().toArray()
[
{
"_id" : ObjectId([someobjectidhere]),
"title" : "testing",
"something": {
"value": "test"
}
}
]
Yes you can point to another document, however unlike SQL you can't do a join to retrieve both at the same time.
Therefore you would need to do two retrieves: one to get the first document (then extract the reference in code), and a second that uses this reference to get the second document.
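The two-step retrieval can be illustrated with in-memory arrays standing in for the two collections. This is only a sketch: the field name something_id and the string ids are hypothetical, and with a real database each findById call would be a separate query.

```javascript
// In-memory stand-ins for the two collections.
var col2 = [{ _id: "v1", value: "test" }];
var col1 = [{ _id: "t1", title: "testing", something_id: "v1" }];

// Hypothetical helper standing in for a find-by-_id query.
function findById(collection, id) {
  var found = collection.filter(function (doc) {
    return doc._id === id;
  });
  return found.length > 0 ? found[0] : null;
}

// First retrieve: the document that holds the reference.
var first = findById(col1, "t1");
// Second retrieve: the referenced document, using the id read from the first.
var second = findById(col2, first.something_id);

// Combine the two in application code to get the shape from the question.
var combined = {
  _id: first._id,
  title: first.title,
  something: { value: second.value }
};
console.log(JSON.stringify(combined, null, 2));
```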
MongoDB does not support joins. In MongoDB some data is “denormalized,” or stored with related data in documents to remove the need for joins. However, in some cases it makes sense to store related information in separate documents, typically in different collections or databases.
You can refer to the MongoDB documentation on DBRef for details.