MongoDB: insert or add into an embedded document

This is my final document structure. I am on mongo 3.0.
{
    _id: "1",
    stockA: [ { size: "S", qty: 25 }, { size: "M", qty: 50 } ],
    stockB: [ { size: "S", qty: 27 }, { size: "M", qty: 58 } ]
}
I am randomly adding elements to the stockA or stockB arrays.
I would like to come up with a query which satisfies the following rules:
If the document does not exist, create the whole document and insert the item into the stockA or stockB array.
If the document exists but the stockA or stockB array is missing, create the array and add the element to it.
If the document and the array both exist, just append the element to the array.
Question: is it possible to achieve all that in one query? My main requirement is that the insert has to be extremely fast and scalable.
If yes, could you please help me come up with the required query?
If no, could you please guide me on how to achieve these requirements in the fastest and cleanest way?
Thanks for any help!

The operator you are looking for is $addToSet or $push in combination with upsert:true.
Both operators can take multiple key/value pairs and will then operate on each key separately. Both will create the array when it doesn't exist yet, and in either case will add the value.
The difference is that $addToSet first checks whether an identical element already exists in the array. When performance is an issue, keep in mind that MongoDB must perform a linear search to do this, so for very large arrays $addToSet can become quite slow.
The upsert:true option to the update function will result in the document being created when it isn't found.
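For illustration, a minimal sketch of $addToSet combined with upsert (the size/qty values here are invented):
db.collection.update(
    { "_id": 1 },
    // creates stockA if it is missing; skips the push if an identical element already exists
    { "$addToSet": { "stockA": { "size": "L", "qty": 10 } } },
    { "upsert": true }
)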

You can use the update method with the upsert option, which will create a new document when no document matches the query criteria.
db.collection.update(
    { "_id": 1 },
    {
        "$push": {
            "stockB": { "size": "M", "qty": 51 },
            "stockA": { "size": "M", "qty": 21 }
        }
    },
    { "upsert": true }
)

Related

Projecting nested documents in Mongo

I am trying to find and project from a nested structure. For example, I have the following document where each unit might have an embedded sub-unit:
{
    "_id" : 1,
    "unit" : {
        "_id" : 2,
        "unit" : {
            "_id" : 3,
            "unit" : {
                "_id" : 4
            }
        }
    }
}
And I want to get the _ids of all the sub-units under unit 1:
[{_id:2}, {_id:3}, {_id:4}]
$graphLookup does not seem to handle this kind of nested structure. As far as I understand, it works when the units are stored at a single level, without nesting, and each keeps a reference to its parent unit.
What is the correct way to retrieve the desired result?
Firstly, $graphLookup is not the operator for your problem, because it searches recursively across a collection, not recursively within a document:
$graphLookup Performs a recursive search on a collection, with options
for restricting the search by recursion depth and query filter.
Therefore, it does not search recursively inside your document; it only searches recursively across a collection (i.e. multiple documents), so it cannot handle your problem.
For your problem, I think this is not MongoDB's responsibility, because you have already retrieved the document you wanted. You want to parse the retrieved document into an array of sub-documents, and you can do that in your own language.
For example, if you use JavaScript (Node.js on the backend), you can parse this document into an array:
const a = {
    "_id": 1,
    "unit": {
        "_id": 2,
        "unit": {
            "_id": 3,
            "unit": {
                "_id": 4
            }
        }
    }
};
// Recursively collect the _id of each nested unit.
const parse = o => {
    const { _id } = o;
    if (!o.unit) return [{ _id }];
    return [{ _id }, ...parse(o.unit)];
};
console.log(parse(a.unit)); // [ { _id: 2 }, { _id: 3 }, { _id: 4 } ]
You cannot do that with a MongoDB query. MongoDB will return the document with _id: 1, but it will not search recursively inside the document.
What you can do is retrieve the document from MongoDB, then parse it into a map-like object and pull the information out of it recursively.
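A minimal sketch of that approach in the shell (the collection name units is an assumption), walking the nested unit fields after the fetch:
// Fetch the whole document, then follow the chain of nested "unit" fields.
const doc = db.units.findOne({ _id: 1 });
const ids = [];
let node = doc.unit;
while (node) {
    ids.push({ _id: node._id });
    node = node.unit;
}
// ids is now [ { _id: 2 }, { _id: 3 }, { _id: 4 } ]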

What is the best way to use a collection as a round robin in MongoDB

I have a collection named items with three documents.
{
    _id: 1,
    item: "Pencil"
}
{
    _id: 1,
    item: "Pen"
}
{
    _id: 1,
    item: "Sharpener"
}
How could I query to get the documents in round-robin fashion?
Consider that I get multiple user requests at the same time,
so one request should get Pencil, another Pen, and the next Sharpener,
and then it should start again from the first one.
If changing the schema is an option, I am ready for that as well.
I think I found a way to do this without changing the schema. It is based on skip() and limit(). You can additionally ask for the internal ($natural) ordering of the returned documents, but as the documentation says you should not rely on it, especially because you lose performance since indexes are bypassed:
The $natural parameter returns items according to their natural order
within the database. This ordering is an internal implementation
feature, and you should not rely on any particular structure within
it.
Anyway, this is the query:
db.getCollection('YourCollection').find().skip(counter).limit(1)
Where counter stores the current index for your documents.
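A minimal sketch of how that counter might be advanced (illustrative only; with concurrent requests the counter would have to live in shared state, e.g. a document of its own):
// Skip `counter` documents, take one, then advance the counter
// modulo the collection size so the rotation wraps around.
var total = db.getCollection('YourCollection').count();
var doc = db.getCollection('YourCollection').find().skip(counter % total).limit(1).next();
counter = (counter + 1) % total;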
A few things to start with:
_id has to be unique across a collection, so the three example documents could not all share _id: 1.
This is a very stateful requirement and would not work well with a distributed set of services, for example.
With that said, assuming you really just want to iterate over the collection, I would use a cursor to accomplish this. For the record, this does a collection scan and is very inefficient.
var myCursor = db.items.find().sort({ _id: 1 });
while (myCursor.hasNext()) {
    printjson(myCursor.next());
}
My suggestion is that you should pull all results from the database at once and do your iteration in the application tier.
var myCursor = db.inventory.find().sort({ _id: 1 });
var documentArray = myCursor.toArray();
documentArray.forEach(doSomething); // note: forEach, not foreach
If this is about load distribution, you may consider fetching random documents instead of round-robin via an aggregation with $sample:
db.collection.aggregate([
    { "$sample": { "size": 1 } }
])
Alternatively, there is the option to randomize via the $rand operator.
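A minimal sketch of the $rand route (assumes MongoDB 4.4.2+, where $rand became available):
db.items.aggregate([
    { $addFields: { r: { $rand: {} } } }, // attach a random number to every document
    { $sort: { r: 1 } },                  // order by it
    { $limit: 1 },                        // keep one random document
    { $unset: "r" }                       // drop the helper field
])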
Use findOneAndUpdate after restructuring the data objects:
db.counter.findOneAndUpdate({}, pipeline)
{
    "_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
    "values" : [ "Pencil", "Pen", "Sharpener" ],
    "selected" : "Pencil",
    "counter" : 1
}
db.counter.findOneAndUpdate({}, pipeline)
{
    "_id" : ObjectId("624317a681e72a1cfd7f2b7e"),
    "values" : [ "Pencil", "Pen", "Sharpener" ],
    "selected" : "Pen",
    "counter" : 2
}
where the data object is now:
{
    "_id" : ObjectId("6242fe3bc1551d0f3562bcb2"),
    "values" : [ "Pencil", "Pen", "Sharpener" ],
    "selected" : "Pencil",
    "counter" : 1
}
and the pipeline is:
[{
    $project: {
        values: 1,
        selected: { $arrayElemAt: ["$values", "$counter"] },
        counter: { $mod: [{ $add: ["$counter", 1] }, { $size: "$values" }] }
    }
}]
This has some merits:
Firstly, using findOneAndUpdate means that moving the pointer to the next item in the list and reading the object happen in one atomic step.
Secondly, by using {$size: "$values"}, adding a value to the list doesn't change the logic.
And, instead of a string, an object could be used as each entry.
Problems:
This method would become unwieldy with more than tens of entries.
It is hard to prove that this method works as advertised, so there is an accompanying Kotlin project on GitHub. The project uses coroutines, so it calls find/update asynchronously.
The alternative (assuming 50K items and not 3):
Set up a simple counter {counter: 0} and update it as follows:
db.counter.findOneAndUpdate({},
    [{
        $project: {
            counter: { $mod: [{ $add: ['$counter', 1] }, 50000] }
        }
    }]
)
Then use a simple find query to fetch the matching document. I've updated the GitHub project to include this example.
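A minimal sketch of that follow-up (the items collection and its sequential idx field are assumed names):
// findOneAndUpdate returns the counter document; use its value to select the item.
var c = db.counter.findOneAndUpdate({},
    [{ $project: { counter: { $mod: [{ $add: ['$counter', 1] }, 50000] } } }],
    { returnNewDocument: true });
db.items.findOne({ idx: c.counter });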

MongoDB: how to aggregate data from different collections based on id

I have documents whose data looks like this:
collection A
{
    "_id" : 69.0,
    "values" : [
        {
            "date_data" : "2016-12-16 10:00:00",
            "valueA" : 8,
            "valueB" : 9
        },
        {
            "date_data" : "2016-12-16 11:00:00",
            "valueA" : 8,
            "valueB" : 9
        }, ...
    ]
}
collection B
{
    "_id" : 69.0,
    "values" : [
        {
            "date_data" : "2017-12-16 10:00:00",
            "valueA" : 8,
            "valueB" : 9
        },
        {
            "date_data" : "2017-12-16 11:00:00",
            "valueA" : 8,
            "valueB" : 9
        }, ...
    ]
}
Data is stored every hour, and since it all goes into one document, the document may at some point reach the 16 MB limit. That's why I'm thinking of spreading the data across years, meaning that within one collection each _id holds data on a yearly basis. But when we want to show the data combined, how can we use an aggregation?
For example, collectionA has data from 7 Dec '16 to 7 Dec '17 and collectionB has data from 6 Dec '15 to 6 Dec '16. How can I show the data between 1 Dec '16 and 1 Jan '17, which lives in different collections?
Very simple: use MongoDB's $lookup, which is the equivalent of a left outer join. All the documents on the left are scanned for a value inside a field, and each document from the right (the foreign document) is matched against that value. For your case, collection A is the parent and collection B is the child.
Now all we have to do is query from collection A. With a very simple $lookup aggregation, you will see the following result:
db.getCollection('uniques').aggregate([
    {
        "$lookup": {
            "from": "values",            // get data from the values collection
            "localField": "_id",         // the _id field of the current collection, uniques
            "foreignField": "parent_id", // the foreign field containing a matching value
            "as": "related"              // an array containing all items under 69
        }
    },
    {
        "$unwind": "$related"            // unwind that array
    },
    {
        "$project": {                    // project only what you need
            "value_id": "$related._id",
            "date": "$related.date_data",
            "a": "$related.valueA",
            "b": "$related.valueB"
        }
    }
], { "allowDiskUse": true })
Remember a few things:
The local field of the lookup doesn't care whether you have indexed it or not, so run the aggregation over the collection with the fewest documents.
The foreign field works best when it is indexed or is directly an _id.
There is an option to specify a pipeline and do some custom filtering work while matching; I won't recommend it, as such pipelines are ridiculously slow.
Don't forget allowDiskUse if you are going to aggregate large amounts of data.

Inline/combine other collection into one collection

I want to combine two MongoDB collections.
Basically, I have a collection containing documents that reference one document from another collection. Now I want to have this as an inline/nested field instead of a separate document.
So just to provide an example:
Collection A:
[{
    "_id": "90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
    "someValue": "foo"
},
{
    "_id": "5F0BB248-E628-4B8F-A2F6-FECD79B78354",
    "someValue": "bar"
}]
Collection B:
[{
    "_id": "169099A4-5EB9-4D55-8118-53D30B8A2E1A",
    "collectionAID": "90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
    "some": "foo",
    "andOther": "stuff"
},
{
    "_id": "83B14A8B-86A8-49FF-8394-0A7F9E709C13",
    "collectionAID": "90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
    "some": "bar",
    "andOther": "random"
}]
This should result in Collection A looking like this:
[{
    "_id": "90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
    "someValue": "foo",
    "collectionB": [{
        "some": "foo",
        "andOther": "stuff"
    },
    {
        "some": "bar",
        "andOther": "random"
    }]
},
{
    "_id": "5F0BB248-E628-4B8F-A2F6-FECD79B78354",
    "someValue": "bar"
}]
I'd suggest something simple like this from the console:
db.collB.find().forEach(function(doc) {
    var aid = doc.collectionAID;
    if (typeof aid === 'undefined') { return; } // nothing to do
    delete doc["_id"];                          // remove property
    delete doc["collectionAID"];                // remove property
    db.collA.update({_id: aid},                 /* match the ID from B */
                    { $push: { collectionB: doc } });
});
It loops through each document in collectionB and if there is a field collectionAID defined, it removes the unnecessary properties (_id and collectionAID). Finally, it updates a matching document in collectionA by using the $push operator to add the document from B to the field collectionB. If the field doesn't exist, it is automatically created as an array with the newly inserted document. If it does exist as an array, it will be appended. (If it exists, but isn't an array, it will fail). Because the update call isn't using upsert, if the _id in the collectionB document doesn't exist, nothing will happen.
You can extend it to delete other fields as necessary or possibly add more robust error handling if for example a document from B doesn't match anything in A.
Running the code above on your data produces this:
{ "_id" : "5F0BB248-E628-4B8F-A2F6-FECD79B78354", "someValue" : "bar" }
{ "_id" : "90A26C2A-4976-4EDD-850D-2ED8BEA46F9E",
"collectionB" : [
{
"some" : "foo",
"andOther" : "stuff"
},
{
"some" : "bar",
"andOther" : "random"
}
],
"someValue" : "foo"
}
Sadly, mapReduce can't produce full documents:
https://jira.mongodb.org/browse/SERVER-2517
No idea why, despite all the attention, whining and upvotes, they haven't changed it. So you'll have to do this manually in the language of your choice.
Hopefully you've indexed collectionAID, which should improve the speed of your queries. Just write something that goes through your A collection one document at a time, loading the _id and then adding the array from collection B.
There is a much faster way than https://stackoverflow.com/a/22676205/1578508
You can do it the other way round and iterate over the collection you want to insert your documents into. (Far fewer executions!)
db.collA.find().forEach(function (x) {
    // exclude _id and the back-reference field from the embedded copies
    var collBs = db.collB.find({"collectionAID": x._id}, {"_id": 0, "collectionAID": 0});
    x.collectionB = collBs.toArray();
    db.collA.save(x);
})

Query multiple date ranges, return only specific key in MongoDB

In Mongo, I have documents that look like the following:
dateRange: [{
    "price": "200",
    "dateStart": "2014-01-01",
    "dateEnd": "2014-01-30"
},
{
    "price": "220",
    "dateStart": "2014-02-01",
    "dateEnd": "2014-02-15"
}]
Nice and simple, right? Just dates and prices. Now, the tricky part is: how would I go about writing a query that finds the dateRange entry containing 2014-01-12, and then returns JUST the price once it's found, instead of the entire array of dateRanges?
These dateRanges can get quite large, and I'm trying to minimize the amount of data returned (if this is possible at all with Mongo). Note that I can change the date format if required; I was just using the above for example purposes.
Any help is appreciated, thanks!
You want to use the $elemMatch operator, which is only valid in versions 2.2 and upward. You will also need to make sure you use multikey indexes.
Edit: to be clear, you will also have to use the $elemMatch find operator, as pointed out in the comment below.
This being said, I agree with the gist of the comment by mnemosyn. It would be better to have each element of the array represented as a single document.
Here is a quick example of $elemMatch to demonstrate the projection; simply add $elemMatch to the find part as well.
> db.test.save({
    _id: 1,
    zipcode: 63109,
    students: [
        { name: "john", school: 102, age: 10 },
        { name: "jess", school: 102, age: 11 },
        { name: "jeff", school: 108, age: 15 }
    ]
});
> db.test.find({ zipcode: 63109 }, { students: { $elemMatch: { school: 102 } } }).pretty();
{
    "_id" : 1,
    "students" : [
        {
            "name" : "john",
            "school" : 102,
            "age" : 10
        }
    ]
}
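Applied to the question's schema, a minimal sketch might look like this (the collection name prices is an assumption; note it returns the whole matching range element, i.e. the price together with its dates):
// Match documents whose dateRange array has an entry covering the date,
// then project only that one matching entry. String comparison works here
// because the dates are zero-padded and sort lexicographically.
db.prices.find(
    { dateRange: { $elemMatch: { dateStart: { $lte: "2014-01-12" }, dateEnd: { $gte: "2014-01-12" } } } },
    { _id: 0, dateRange: { $elemMatch: { dateStart: { $lte: "2014-01-12" }, dateEnd: { $gte: "2014-01-12" } } } }
)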
Well, the problem with that schema is that it uses large embedded arrays. This can be quite inefficient, because a MongoDB query always finds a document, not a subset of an embedded object. Even if you use a projection, MongoDB has to read the entire object internally, so if the array becomes huge, say 100k entries, things slow to a halt.
Why not simply separate the array elements into documents, e.g.
{
    price: 200,
    productId: ObjectId("foo"), // or whatever the price refers to
    dateStart: "2014-01-01",
    dateEnd: "2014-01-30"
}
This way, MongoDB doesn't need to pull the entire object with all its prices, only the prices that match your date range. This minimizes the amount of data transferred. You can then also use a query projection to return only the price, i.e. db.collection.find({ criteria }, {"price": 1, "_id": 0}).
Of course, the number of objects will increase dramatically, but efficient indexing solves that problem. The only inefficiency introduced is the duplication of productId, which is cheaper than dealing with huge embedded arrays.
P.S.: I'd suggest using actual dates (ISODate) instead of strings, even if the string format is sortable.
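For completeness, a minimal sketch of the query against that flattened schema (the collection name prices is assumed; a compound index on dateStart/dateEnd would help):
// One document per date range: a plain find with a projection
// returns just the price for the range covering the date.
db.prices.find(
    { dateStart: { $lte: "2014-01-12" }, dateEnd: { $gte: "2014-01-12" } },
    { price: 1, _id: 0 }
)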