Mongo best practice to structure nested document array

I've been struggling to find a solution to the following problem and seem to get conflicting advice from various mongodb posts. I am trying to figure out how to correctly represent an "array" of sub-objects such that:
they can be upserted (i.e. updated or new element created if needed, in a single operation)
the ids of the objects are available as values that can be searched, not just keys (which you can't really search on in mongo).
I have a structure that I can represent as an array (repr A):
{
  _id: 1,
  subdocs: [
    { sd_id: 1, title: t1 },
    { sd_id: 2, title: t2 },
    ...
  ]
}
or as a nested document (repr B)
{
  _id: 1,
  subdocs: {
    1: { title: t1 },
    2: { title: t2 },
    ...
  }
}
I would like to be able to update OR insert (i.e. upsert) new subdocs without having to use extra in-application logic.
In repr B this is straightforward, as I can simply use $set:
$set: { "subdocs.3.title": "t3" }
in an update with upsert: true.
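Spelled out in full, a minimal sketch of that upsert (collection name assumed):
db.coll.update(
  { _id: 1 },
  { $set: { "subdocs.3.title": "t3" } },
  { upsert: true }
)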
In repr A it is possible to update an existing record using arrayFilters, with something like:
db.coll.update(
  { _id: 1 },
  { $set: { "subdocs.$[i].title": "t3" } },
  { arrayFilters: [ { "i.sd_id": 3 } ], upsert: true }
)
The problem is that while the above will update an existing subobject, it will not create a new subobject (i.e. with sd_id: 3) if one does not exist (it is not an upsert). The docs claim that $[] does support upsert but this does not work for me.
While repr B does allow for update/upserts there is no way to search on the ids of the subdocuments because they are now keys rather than values.
The only solution to the above is to use a denormalized representation with e.g. the id being stored as both a key and a value:
subdocs: {
  1: { sd_id: 1, title: t1 },
  2: { sd_id: 2, title: t2 },
  ...
}
But this seems precarious (because the values might get out of sync).
So my question is whether there is a way around this? Am I perhaps missing a way to do an upsert in case A?
UPDATE: I found a workaround that lets me effectively use repr A, even though I'm not sure it's optimal. It involves using two writes rather than one:
db.coll.update({ _id: 1, "subdocs.sd_id": { $ne: 3 } }, { $push: { subdocs: { sd_id: 3 } } })
db.coll.update({ _id: 1 }, { $set: { "subdocs.$[i].title": "t3" } }, { arrayFilters: [ { "i.sd_id": 3 } ] })
The first line above ensures that we only ever insert one subdoc with sd_id 3 (and only has an effect if that id does not exist), while the second line updates the record (which should now definitely exist). I can probably put these in an ordered bulkWrite to make it all work, as sketched below.
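A minimal sketch of that ordered bulkWrite (collection name assumed; arrayFilters requires MongoDB 3.6+):
db.coll.bulkWrite([
  { updateOne: {
      filter: { _id: 1, "subdocs.sd_id": { $ne: 3 } },
      update: { $push: { subdocs: { sd_id: 3 } } }
  } },
  { updateOne: {
      filter: { _id: 1 },
      update: { $set: { "subdocs.$[i].title": "t3" } },
      arrayFilters: [ { "i.sd_id": 3 } ]
  } }
], { ordered: true })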

Related

Meteor Collection: find element in array

I have no experience with NoSQL, so if I just ask about the code my question might come out wrong. Instead, let me explain my problem.
Suppose I have an e-store. I have catalogs
Catalogs = new Mongo.Collection('catalogs');
and products in those catalogs
Products = new Mongo.Collection('products');
Then, people add their orders to a temporary collection
Order = new Mongo.Collection();
Then people submit their comments, phone, etc. and the order. I save it all to the Operations collection:
Operations.insert({
  phone: "phone",
  comment: "comment",
  etc: "etc",
  savedOrder: Order // <- Array, right? Or would an Object be better?
});
Nice, but later I want to get stats per product: which Operations a product was used in. How can I search through my Operations and find every operation with that product?
Or is this approach bad? How do real pros do this in the real world?
If I understand it well, here is a sample document as stored in your Operation collection:
{
  clientRef: "john-001",
  phone: "12345678",
  other: "etc.",
  savedOrder: {
    "someMetadataAboutOrder": "...",
    "lines": [
      { qty: 1, itemRef: "XYZ001", unitPriceInCts: 1050, desc: "USB Pen Drive 8G" },
      { qty: 1, itemRef: "ABC002", unitPriceInCts: 19995, desc: "Entry level motherboard" },
    ]
  }
},
{
  clientRef: "paul-002",
  phone: null,
  other: "etc.",
  savedOrder: {
    "someMetadataAboutOrder": "...",
    "lines": [
      { qty: 3, itemRef: "XYZ001", unitPriceInCts: 950, desc: "USB Pen Drive 8G" },
    ]
  }
},
Given that, to find all operations having item reference XYZ001 you simply have to query:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"})
This will return the whole document. If instead you are only interested in the client reference (and operation _id), you will use a projection as an extra argument to find:
> db.operations.find({"savedOrder.lines.itemRef":"XYZ001"}, {"clientRef": 1})
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "clientRef" : "john-001" }
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c17a"), "clientRef" : "paul-002" }
If you need to perform operations across multiple documents (including multiple embedded documents), you should take a look at the aggregation framework.
For example, to calculate the total of an order:
> db.operations.aggregate([
    { $match: { "_id": ObjectId("556f07b5d5f2fb3f94b8c179") } },
    { $unwind: "$savedOrder.lines" },
    { $group: {
        _id: "$_id",
        total: { $sum: { $multiply: [ "$savedOrder.lines.qty",
                                      "$savedOrder.lines.unitPriceInCts" ] } }
    } }
  ])
{ "_id" : ObjectId("556f07b5d5f2fb3f94b8c179"), "total" : 21045 }
I'm an eternal newbie, but since no answer is posted, I'll give it a try.
First, start by installing Robomongo or similar software; it will let you look at your collections directly in mongoDB (btw, the default port is 3001).
The way I deal with your kind of problem is by using the _id field. It is a field automatically generated by mongoDB, and you can safely use it as an ID for any item in your collections.
Your catalogs collection should have a string-array field, say products, where you keep the _ids of your products collection items. Same thing for the operations: if an order is an array of product _ids, you can do the same and store that array of _ids in your savedOrder field. Feel free to add more fields to savedOrder if necessary, e.g. make it an array of product objects with additional fields such as discount (see the sketch below).
Concerning the query code, I assume you will find all you need on the web as soon as you figure out what your structure is.
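For illustration, a hedged sketch of that shape (all field names here are just examples, not from the original question):
Catalogs.insert({
  name: "summer catalog",
  products: ["productId1", "productId2"] // _id values of items in the Products collection
});

Operations.insert({
  phone: "phone",
  comment: "comment",
  savedOrder: {
    products: [
      { productId: "productId1", discount: 10 }, // extra fields as needed
      { productId: "productId2" }
    ]
  }
});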
For example, if you have a products array in your savedOrder, you can pull it out like this:
Operations.find({ _id: "your_operation_ID" }, { "savedOrder.products": 1 })
Basically, you ask for all the product _ids in a specific operation. If you have several savedOrders in a single operation, you can also specify the savedOrder _id, if you used the one you had in your local collection:
Operations.find({ _id: "your_operation_ID", "savedOrder._id": "your_savedOrder_ID" }, { "savedOrder.products": 1 })
ps: to bad-ass coders here, if I'm doing it wrong, please tell me.
I found an answer :) Of course, this is no revelation for real professionals, but it is a big step for me. Maybe someone will find my experience useful. All the magic is in using the correct mongo operators. Let's solve this problem in pseudocode.
We have a structure like this:
Operations:
1. Operation: {
  _id: <- Mongo creates this unique id for us
  phone: "phone1",
  comment: "comment1",
  savedOrder: [
    {
      _id: <- and again
      productId: <- we should save our product ID from 'products'
      name: "Banana",
      quantity: 100
    },
    {
      _id: <- ...
      productId: <- another ID that we should save when ordering
      name: "apple",
      quantity: 50
    }
  ]
}
And if we want to know in which Operation a user took a "banana", we should use the mongoDB operator $elemMatch (see the Mongo docs):
db.getCollection('operations').find({}, {savedOrder: {$elemMatch:{productId: "f5mhs8c2pLnNNiC5v"}}});
Put simply, we get the documents whose saved order has a product with the id we want to find. I don't know if it is the best way, but it works for me :) Thank you!
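One caveat worth noting: with the empty {} query, the projection above returns every operation, matching or not. A hedged refinement that filters in the query and keeps only the matching array element in the projection:
db.getCollection('operations').find(
  { "savedOrder.productId": "f5mhs8c2pLnNNiC5v" },
  { savedOrder: { $elemMatch: { productId: "f5mhs8c2pLnNNiC5v" } } }
)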

How do I implement this mongodb query & update operation (CSharp driver)?

I have this collection:
Books
[
  {
    title: "Book1",
    References: [ ObjectId(1), ObjectId(3), ObjectId(5) ], // <- these are object ids of another collection
    "Main-Reference": ObjectId(5)
  },
  {
    title: "Book2",
    References: [ ObjectId(2), ObjectId(5), ObjectId(7) ],
    "Main-Reference": ObjectId(5)
  },
  {
    title: "Book3",
    References: [ ObjectId(5), ObjectId(7), ObjectId(9) ],
    "Main-Reference": ObjectId(7)
  }
]
I have an operation where I delete a Reference from the book collection.
Example: Assume I have to delete reference ObjectId(5) from my collection.
So my new collection becomes this:
Books
[
  {
    title: "Book1",
    References: [ ObjectId(1), ObjectId(3) ], // <- ObjectId(5) is pulled
    "Main-Reference": ObjectId(1) // <- ObjectId(1) is the new value, as ObjectId(5) is deleted
  },
  {
    title: "Book2",
    References: [ ObjectId(2), ObjectId(7) ], // <- ObjectId(5) is pulled
    "Main-Reference": ObjectId(2) // <- ObjectId(2) is now the main reference
  },
  {
    title: "Book3",
    References: [ ObjectId(7), ObjectId(9) ], // <- ObjectId(5) is pulled
    "Main-Reference": ObjectId(7) // <- no change here, as ObjectId(7) still exists in References
  }
]
]
Currently this is how I am doing it:
Step 1: Pull ObjectId(5) from all Books whose References[] contains ObjectId(5)
Step 2: Query the Books collection where Main-Reference = ObjectId(5) and use References: {$slice: 1} to get the first element of the References array
Step 3: Update all of the books found in Step 2, replacing Main-Reference with the first array element obtained from the slice
This seems clumsy to me and trying to see if there is a better way to do this.
If I essentially get your gist, you basically want to:
Pull the item that is not required from your references array
Set the value of your main-reference field to the first element of the altered array
And get that done all in one update without moving documents across the wire.
But this sadly cannot be done. The main problem with this is that there is no way to refer to the value of another field within the document being updated. Even so, to do this without iterating you would also need to access the changed array in order to get the new first element.
Perhaps one approach is to re-think your schema in order to accomplish what you want. My option here would be expanding your references documents a little and removing the need for the main-reference field.
It seems that the assumption you are willing to live with on the updates is that if the removed reference was the main-reference then you can just set the new main-reference to the first element in the array. With that in mind consider the following structure:
refs: [ { oid: "object1" }, { oid: "object2" }, { oid: "object5", main: true } ]
By changing these to documents with an oid property set to the ObjectId, you get the option of an additional property on the document that specifies which one is the default. This can easily be queried to determine which id is the main reference.
Now also consider what would happen if the document matching "object5" in the oid field was pulled from the array:
refs: [ { oid: "object1" }, { oid: "object2" } ]
So when you query for the main-reference, as per the earlier logic, you accept the first document in the array. Now of course, per your application requirements, if you want to set a different main-reference you just alter the document:
refs: [ { oid: "object1" }, { oid: "object2", main: true } ]
And now the logic remains: the array element whose main property is true is chosen in preference, and, as shown above, if that property does not exist on any element you fall back to the first element.
With all of that digested, your operation to pull all references to an object out of that array in all documents becomes quite simple, as done in the shell ( same format should basically apply to whatever driver ):
db.books.update(
  { "refs.oid": "object5" },
  { $pull: { refs: { oid: "object5" } } },
  false,
  true
)
The two extra arguments to the query and update operation being upsert and multi respectively. In this case, upsert does not make much sense as we only want to modify documents that exist, and multi means that we want to update everything that matched. The default is to change just the first document.
Naturally I shortened all the notation, but of course the values can be actual ObjectIds as per your intent. It also seemed reasonable to presume that your main usage of the main-reference is once you have retrieved the document. Defining a query that returns the main-reference by following the logic outlined should be possible (one possible shape is sketched below), but as it stands I have typed a lot out here and need to break for dinner :)
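For what it's worth, one possible shape of that query on a modern server (3.2+ for $filter and $arrayElemAt; an assumption beyond the original answer):
db.books.aggregate([
  { $project: {
      title: 1,
      mainRef: {
        $ifNull: [
          // prefer the element flagged main: true...
          { $arrayElemAt: [
              { $filter: { input: "$refs", as: "r", cond: { $eq: [ "$$r.main", true ] } } },
              0
          ] },
          // ...falling back to the first element otherwise
          { $arrayElemAt: [ "$refs", 0 ] }
        ]
      }
  } }
])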
I think this presents a worthwhile case for re-thinking your schema to avoid over the wire iterations for what you want to achieve.

$unwind an object in aggregation framework

In the MongoDB aggregation framework, I was hoping to use the $unwind operator on an object (i.e. a JSON map). It doesn't look like this is possible; is there a workaround? Are there plans to implement this?
For example, take the article collection from the aggregation documentation. Suppose there is an additional field "ratings" that is a map from user -> rating. Could you calculate the average rating for each user?
Other than this, I'm quite pleased with the aggregation framework.
Update: here's a simplified version of my JSON collection per request. I'm storing genomic data. I can't really make genotypes an array, because the most common lookup is to get the genotype for a random person.
variants: [
  {
    name: 'variant1',
    genotypes: {
      person1: 2,
      person2: 5,
      person3: 7,
    }
  },
  {
    name: 'variant2',
    genotypes: {
      person1: 3,
      person2: 3,
      person3: 2,
    }
  }
]
It is not possible to do the type of computation you are describing with the aggregation framework - and it's not because there is no $unwind method for non-arrays. Even if the person:value objects were documents in an array, $unwind would not help.
The "group by" functionality (whether in MongoDB or in any relational database) is done on the value of a field or column. We group by value of field and sum/average/etc based on the value of another field.
A simple example is a variant of what you suggest: a ratings field added to the example article collection, not as a map from user to rating but as an array like this:
{ title: "title of article", ...
  ratings: [
    { voter: "user1", score: 5 },
    { voter: "user2", score: 8 },
    { voter: "user3", score: 7 }
  ]
}
Now you can aggregate this with:
[ { $unwind: "$ratings" },
  { $group: { _id: "$ratings.voter", averageScore: { $avg: "$ratings.score" } } }
]
But this example structured as you describe it would look like this:
{ title: "title of article", ...
  ratings: {
    user1: 5,
    user2: 8,
    user3: 7
  }
}
or even this:
{ title: "title of article", ...
  ratings: [
    { user1: 5 },
    { user2: 8 },
    { user3: 7 }
  ]
}
Even if you could $unwind this, there is nothing to aggregate on here. Unless you know the complete list of all possible keys (users) you cannot do much with this. [*]
An analogous relational DB schema to what you have would be:
CREATE TABLE T (
  user1 INTEGER,
  user2 INTEGER,
  user3 INTEGER
  ...
);
That's not what would be done; instead we would do this:
CREATE TABLE T (
  username VARCHAR(32),
  score INTEGER
);
and now we aggregate using SQL:
select username, avg(score) from T group by username;
There is an enhancement request for MongoDB that may allow you to do this in the aggregation framework in the future: the ability to project values to keys and vice versa. Meanwhile, there is always map/reduce.
[*] There is a complicated way to do this if you know all the unique keys (you can find all unique keys with a method similar to this), but if you know all the keys you may as well just run a sequence of queries of the form db.articles.find({"ratings.user1": {$exists: true}}, {_id: 0, "ratings.user1": 1}) for each userX, which will return all their ratings, and you can sum and average them simply enough rather than do the very complex projection the aggregation framework would require.
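As for the map/reduce route, a minimal sketch for the ratings example (emitting partial sums so that re-reduce stays correct):
db.articles.mapReduce(
  function () {
    // one emit per user key in the ratings map
    for (var user in this.ratings) {
      emit(user, { sum: this.ratings[user], count: 1 });
    }
  },
  function (key, values) {
    var acc = { sum: 0, count: 0 };
    values.forEach(function (v) { acc.sum += v.sum; acc.count += v.count; });
    return acc;
  },
  {
    finalize: function (key, v) { return v.sum / v.count; },
    out: { inline: 1 }
  }
)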
Since 3.4.4, you can transform an object into an array using $objectToArray.
See:
https://docs.mongodb.com/manual/reference/operator/aggregation/objectToArray/
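Applied to the ratings example from the question, a sketch (assuming the collection is named articles):
db.articles.aggregate([
  { $project: { ratings: { $objectToArray: "$ratings" } } },
  { $unwind: "$ratings" },
  { $group: { _id: "$ratings.k", averageScore: { $avg: "$ratings.v" } } }
])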
This is an old question, but I've run across a tidbit of information through trial and error that people may find useful.
It's actually possible to unwind on a dummy value by fooling the parser this way:
db.Opportunity.aggregate(
  { $project: {
      Field1: 1, Field2: 1, Field3: 1,
      DummyUnwindField: { $ifNull: [ null, [1.0] ] }
  } },
  { $unwind: "$DummyUnwindField" }
);
This will produce 1 row per document, regardless of whether or not the value exists. You may be able to tinker with this to generate the results you want. I had hoped to combine this with multiple $unwinds (sort of like emit() in map/reduce), but alas, the last $unwind wins, or they combine as an intersection rather than a union, which makes it impossible to achieve the results I was looking for. I am sadly disappointed with the aggregation framework's functionality, as it doesn't fit the one use case I was hoping to use it for (and, strangely, what a lot of the questions on StackOverflow in this area seem to be asking): ordering results based on match rate. Improving the poor map reduce performance would have made this entire feature unnecessary.
This is what I found & extended.
Let's create an experimental database in mongo:
db.copyDatabase('livedb' , 'experimentdb')
Now use experimentdb and convert the object into an array in your experimentcollection:
db.getCollection('experimentcollection').find({}).forEach(function (e) {
  if (e.ratings) { // guard on the field being converted
    e.ratings = [e.ratings]; // object to be wrapped in an array, e.g. ratings
    db.experimentcollection.save(e);
  }
})
Some nerdy js code to flatten the json into flat objects:
var flatArray = [];
var data = db.experimentcollection.find().toArray();
for (var index = 0; index < data.length; index++) {
  var flatObject = {};
  for (var prop in data[index]) {
    var value = data[index][prop];
    if (Array.isArray(value) && prop === 'ratings') {
      for (var i = 0; i < value.length; i++) {
        for (var inProp in value[i]) {
          flatObject[inProp] = value[i][inProp];
        }
      }
    } else {
      flatObject[prop] = value;
    }
  }
  flatArray.push(flatObject);
}
printjson(flatArray);

Mongo - most efficient way to $inc values in this schema

I'm a Mongo newbie, trying to make this schema work.
It's intended for logging events as they happen on a high-traffic website: just incrementing the number of times a certain action happens.
{
  _id: "20110924",
  Vals: [
    { Key: "SomeStat1", Value: 1 },
    { Key: "SomeStat2", Value: 2 },
    { Key: "SomeStat3", Value: 3 }
  ]
},
{
  _id: "20110925",
  Vals: [
    { Key: "SomeStat1", Value: 3 },
    { Key: "SomeStat8", Value: 13 },
    { Key: "SomeStat134", Value: 63 }
  ]
}, etc.
So _id here is the date, then an array of the different stats, and the number of times they've occurred. There may be no stats for that day, and the stat keys can be dynamic.
I'm looking for the most efficient way to achieve these updates, avoiding race conditions...so ideally all atomic.
I've gotten stuck trying to do the $inc. When I specify that it should upsert, it tries to match the whole document as the conditional, and it fails on duplicate keys. Similarly for $addToSet: if I $addToSet with { Key: "SomeStat1" }, it won't be treated as a duplicate, because matching is against the entire element, and hence it gets inserted alongside the existing SomeStat1 value.
What's the best approach here? Is there a way to control how $addToSet matches? Or do I need a different schema?
Thanks in advance.
You're using a bad schema, it's impossible to do atomic updates on it. Do it like this:
{
  _id: "20110924",
  Vals: {
    SomeStat1: 1,
    SomeStat2: 2,
    SomeStat3: 3
  }
}
Or you can skip the Vals subdocument and embed the stats in the main document. Either way, each event becomes a single atomic $inc, as sketched below.
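A minimal sketch of that update (collection name assumed):
db.stats.update(
  { _id: "20110924" },
  { $inc: { "Vals.SomeStat1": 1 } },
  { upsert: true }
)
$inc creates the field if it is missing, and upsert creates the day's document if it does not exist yet, so there is no race condition.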
Take a look at this article which explains the issues around serializing a Dictionary using the MongoDB C# driver. See also this work item on the C# driver.

mongoDB: unique index on a repeated value

So I'm pretty new to mongoDB, so I figure this could be a misunderstanding of general usage; bear with me.
Here is the document schema I'm working with:
{
  name: "bob",
  email: "bob@gmail.com",
  logins: [
    { u: 'a', p: 'b', public_id: '123' },
    { u: 'x', p: 'y', public_id: 'abc' }
  ]
}
My problem is that I need to ensure that the public ids are unique within a document and across the collection.
Furthermore, there are some existing records being migrated from a MySQL DB that don't have these ids, which will therefore all end up as null values in mongo.
I figure it's either an index
db.users.ensureIndex({ "logins.public_id": 1 }, { unique: true });
which isn't working because of the missing keys and is throwing an E11000 duplicate key error,
or a more fundamental schema problem, in that I shouldn't be nesting objects in an array structure like that. In which case, what? A separate collection for the user_logins??? That seems to go against the idea of an embedded document.
If you expect u and p to always have the same values on each insert (as in your example snippet), you might want to use the $addToSet operator on inserts to ensure the uniqueness of your public_id field (a sketch follows below). Otherwise I think it's quite difficult to make them unique across a whole collection without resorting to external maintenance or js functions.
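A minimal sketch of that insert path (collection name and filter are assumptions):
db.users.update(
  { name: "bob" },
  { $addToSet: { logins: { u: 'a', p: 'b', public_id: '123' } } }
)
Note that $addToSet compares the whole subdocument, so this only prevents a duplicate when u and p match as well.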
If not, I would probably store them in their own collection and use the public_id as the _id field to ensure their cross-document uniqueness inside a collection. Maybe that contradicts the idea of embedded docs in a document database, but depending on your requirements I think that's negligible.
Furthermore, there are some existing records being migrated from a MySQL DB that don't have these ids, which will therefore all end up as null values in mongo.
So you want to apply a unique index to a data set that's not truly unique. I think this is just a modeling problem.
If logins.public_id is null, that's going to violate your uniqueness constraint, so just don't write it at all:
{
  logins: [
    { u: 'a', p: 'b' },
    { u: 'x', p: 'y' }
  ]
}
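A hedged companion sketch: making the unique index sparse skips documents that lack the field entirely, so the migrated records no longer collide on an implicit null entry (note that a unique index only constrains values across documents, not within a single document's array):
db.users.ensureIndex({ "logins.public_id": 1 }, { unique: true, sparse: true })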
Thanks all.
In the end I opted to separate this into 2 collections, one for users and one for logins.
For users this looked a little like:
userDocument = {
  ...
  logins: [
    DBRef('loginsCollection', loginDocument._id),
    DBRef('loginsCollection', loginDocument2._id),
  ]
}
loginDocument = {
  ...
  user: new DBRef('userCollection', userDocument._id)
}
Although not what I was originally after (a single collection), it is working nicely, and by utilising ObjectId uniqueness there is a constraint now built in at the database level rather than implemented at the application level.