MongoDB field constraint for minimum value

I have a field in MongoDB and it is of integer type. I want to clip the minimum value to zero (0, it shouldn't go negative) on update actions like $inc.
Any idea how to achieve this constraint? Thanks

Since MongoDB itself is "schemaless" it cannot really enforce any "schema logic" by itself for such a constraint.
You can possibly look into Object Document Mapper (ODM) libraries available for your chosen language, but all of these generally require that the data be "loaded into the client" in order to enforce such constraints upon modifications.
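As a rough illustration of the ODM route, here is a minimal sketch assuming Mongoose with a made-up Counter model, where idValue and incValue are placeholders; as noted above, the document still has to be loaded into the client for the constraint to be enforced:
var mongoose = require('mongoose');

var counterSchema = new mongoose.Schema({
  counter: { type: Number, min: 0 } // validation fails on save() if this drops below 0
});
var Counter = mongoose.model('Counter', counterSchema);

// The document must be loaded into the client, modified, then saved for the validator to run
Counter.findById(idValue).then(function(doc) {
  doc.counter += incValue;
  return doc.save(); // rejects with a ValidationError when counter < 0
});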
Additionally, there is no current "operator syntax" that allows such a constraint, such as for $inc to not fall below ( or go above ) a certain value.
What you "can" do however is use something like the "Bulk" operations API to send "two" requests at once, respectively doing:
Increment/Decrement the field by the specified value
Check if the field fell out of constraints and set the value accordingly.
So for a simple example of a Min of "0" and a Max of "100" you would do:
var bulk = db.collection.initializeOrderedBulkOp();
// Increment document field
bulk.find({ "_id": idValue }).updateOne({
"$inc": { "counter": incValue }
});
// Set to Max if greater than Max
bulk.find({ "_id": idValue, "counter": { "$gt": 100 } }).updateOne({
"$set": { "counter": 100 }
});
// Set to Min if less than Min
bulk.find({ "_id": idValue, "counter": { "$lt": 0 } }).updateOne({
"$set": { "counter": 0 }
});
// Only sends and returns from server now
bulk.execute();
That means that though there are actually "three" operations here to enforce the constraint, there is actually only "one" request to the server and "one" response.
Whilst it still really is "three" operations, the likelihood of the "interim" value being picked up is very small. So this is generally preferred to loading the data from the server, then modifying and saving back the contents in all respects.
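For reference, the same three-operation pattern can also be written with the newer db.collection.bulkWrite() syntax. This is only a sketch, reusing the hypothetical idValue and incValue and the Min/Max of "0" and "100" from above:
db.collection.bulkWrite([
  // Increment document field
  { "updateOne": {
    "filter": { "_id": idValue },
    "update": { "$inc": { "counter": incValue } }
  }},
  // Set to Max if greater than Max
  { "updateOne": {
    "filter": { "_id": idValue, "counter": { "$gt": 100 } },
    "update": { "$set": { "counter": 100 } }
  }},
  // Set to Min if less than Min
  { "updateOne": {
    "filter": { "_id": idValue, "counter": { "$lt": 0 } },
    "update": { "$set": { "counter": 0 } }
  }}
], { "ordered": true })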

This option is not a very efficient one, but it may be one of the choices in your case:
http://docs.mongodb.org/manual/tutorial/store-javascript-function-on-server/
You can write a JavaScript function which performs the increment together with the validation. But I am still fairly sure that concurrency is a problem you need to consider.
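A very rough sketch of that idea is below, assuming an older MongoDB release where db.system.js.save() and db.eval() are still available ( both are deprecated in later versions ) and a made-up counters collection; the read-modify-write inside the function is exactly where the concurrency concern comes in:
// Store a helper function on the server (old, deprecated mechanism)
db.system.js.save({
  "_id": "incClamped",
  "value": function(id, incValue) {
    var doc = db.counters.findOne({ "_id": id });
    var next = (doc ? doc.counter : 0) + incValue;
    if (next < 0) next = 0; // clamp at the minimum of zero
    db.counters.update({ "_id": id }, { "$set": { "counter": next } }, { "upsert": true });
    return next;
  }
});

// Invoke it via db.eval() -- deprecated, and it takes a global lock
db.eval("incClamped('myCounter', -5)");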

Related

MapReduce function to return two outputs. MongoDB

I am currently using doing some basic mapReduce using MongoDB.
I currently have data that looks like this:
db.football_team.insert({name: "Tane Shane", weight: 93, gender: "m"});
db.football_team.insert({name: "Lily Jones", weight: 45, gender: "f"});
...
I want to create a mapReduce function to group data by gender and show
Total number of each gender, Male & Female
Average weight of each gender
I can create a map / reduce function to carry out each function separately, I just can't get my head around how to show output for both. I am guessing since the grouping is based on Gender, the Map function should stay the same and I just need to alter something on the reduce section...
Work so far
var map1 = function()
{var key = this.gender;
emit(key, {count:1});}
var reduce1 = function(key, values)
{var sum=0;
values.forEach(function(value){sum+=value["count"];});
return{count: sum};};
db.football_team.mapReduce(map1, reduce1, {out: "gender_stats"});
Output
db.football_team.find()
{"_id" : "f", "value" : {"count": 12} }
{"_id" : "m", "value" : {"count": 18} }
Thanks
The key rule to "map/reduce" in any implementation is basically that the same shape of data needs to be emitted by the mapper as is returned by the reducer. The key reason for this is part of how "map/reduce" conceptually works, by quite possibly calling the reducer multiple times, which basically means your reducer function can be called on output that was already emitted from a previous pass through the reducer, along with other data from the mapper.
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
That said, your best approach to "average" is therefore to total the data along with a count, and then simply divide the two. This actually adds another step to a "map/reduce" operation as a finalize function.
db.football_team.mapReduce(
// mapper
function() {
emit(this.gender, { count: 1, weight: this.weight });
},
// reducer
function(key,values) {
var output = { count: 0, weight: 0 };
values.forEach(value => {
output.count += value.count;
output.weight += value.weight;
});
return output;
},
// options and finalize
{
"out": "gender_stats", // or { "inline": 1 } if you don't need another collection
"finalize": function(key,value) {
value.avg_weight = value.weight / value.count; // take an average
delete value.weight; // optionally remove the unwanted key
return value;
}
}
)
All fine because both mapper and reducer are emitting data with the same shape and also expecting input in that shape within the reducer itself. The finalize method of course is just invoked after all "reducing" is finally done and just processes each result.
As noted though, the aggregate() method actually does this far more effectively and in native coded methods which do not incur the overhead ( and potential security risks ) of server side JavaScript interpretation and execution:
db.football_team.aggregate([
{ "$group": {
"_id": "$gender",
"count": { "$sum": 1 },
"avg_weight": { "$avg": "$weight" }
}}
])
And that's basically it. Moreover you can actually continue and do other things after a $group pipeline stage ( or any stage for that matter ) in ways that you cannot do with a MongoDB mapReduce implementation. Notably something like applying a $sort to the results:
db.football_team.aggregate([
{ "$group": {
"_id": "$gender",
"count": { "$sum": 1 },
"avg_weight": { "$avg": "$weight" }
}},
{ "$sort": { "avg_weight": -1 } }
])
The only sorting allowed by mapReduce is that the key used with emit is always sorted in ascending order. You cannot sort the aggregated output in any other way, without of course performing queries on the output collection, or by working "in memory" with the returned results from the server.
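For example, if the mapReduce output above were written to the gender_stats collection, any alternative ordering would have to come from a separate query on that collection ( sketch only ):
// Sort the materialized mapReduce output by the finalized average, descending
db.gender_stats.find().sort({ "value.avg_weight": -1 })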
As a "side note" ( though an important one ), you probably should also consider in "learning" that the reality is the "server-side JavaScript" functionality of MongoDB is really a work-around more than being a feature. When MongoDB was first introduced, it applied a JavaScript engine for server execution mostly to make up for features which had not yet been implemented.
Thus to make up for the lack of the complete implementation of many query operators and aggregation functions which would come later, adding a JavaScript engine was a "quick fix" to allow certain things to be done with minimal implementation.
The result over the years is those JavaScript engine features are gradually being removed. The group() function of the API is removed. The eval() function of the API is deprecated and scheduled for removal at the next major version. The writing is basically "on the wall" for the limited future of these JavaScript on the server features, as the clear pattern is where the native features provide support for something, then the need to continue support for the JavaScript engine basically goes away.
The core wisdom here is that focusing on learning these JavaScript-on-the-server features is probably not really worth the time invested unless you have a pressing use case that currently cannot be solved by any other means.

Push Item to Array and Delete in the Same Request

I have a document that stores sensor data where the sensor readings are objects stored in an array. Example:
{
"readings": [
{
"timestamp": 1499475320,
"temperature": 121
},
{
"timestamp": 1499475326,
"temperature": 93
},
{
"timestamp": 1499475340,
"temperature": 142
}
]
}
I know how to push/add an item to the "readings" array. But what I need is when I add an item to the array, I also want to "clean" the array by removing items that have "timestamp" value older than a cutoff time.
Is this possible in mongodb?
The way I see this you basically have two options here that have varying approaches.
Restrict Arrays to Capped Size
The first option here is "not exactly" what you are asking for, but it is the option with the least implementation and execution overhead. The variance from your question is that instead of "removing past a certain age", we instead simply place a "limit/cap" on the total number of entries in the array.
This is actually done using the $slice modifier to $push:
Model.update(
{ "_id": docId },
{ "$push": {
"readings": {
"$each": [{ "timestamp": 1499478496679, "temperature": 100 }],
"$slice": -10
}
}}
)
In this case the -10 argument restricts the array to only have the "last ten" entries from the end of the array since we are "appending" with $push. If you wanted instead the "latest" as the first entry then you would modify with $position and instead provide the "positive" value to $slice, which means "first ten" in contrast.
So it's not the same thing you asked for, but it is practical since the arrays do not have "unlimited growth" and you can simply "cap" them as each update is made and the "oldest" item will be removed once at the maximum length. This means the overall document never actually grows beyond a set size, and this is a very good thing for MongoDB.
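If you instead wanted the "latest" reading as the first entry, a sketch of that variation ( same hypothetical docId ) would be:
Model.update(
  { "_id": docId },
  { "$push": {
    "readings": {
      "$each": [{ "timestamp": 1499478496679, "temperature": 100 }],
      "$position": 0,  // prepend at the start of the array
      "$slice": 10     // keep only the "first ten" entries
    }
  }}
)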
Issue with Bulk Operations
The next case which actually does exactly what you ask uses "Bulk Operations" to issue "two" update operations in a "single" request to the server. The reason why it is "two" is because there is a rule that you cannot have different update operators "assigned to the same path" in a single update operation.
Therefore what you want actually involves a $push AND a $pull operation, and on the "same array path" we need to issue those as "separate" operations. This is where the Bulk API can help:
Model.collection.bulkWrite([
{ "updateOne": {
"filter": { "_id": docId },
"update": {
"$pull": {
"readings": { "timestamp": { "$lt": cutOff } }
}
}
}},
{ "updateOne": {
"filter": { "_id": docId },
"update": {
"$push": { "timestamp": 1499478496679, "temperature": 100 }
}
}}
])
This uses the .bulkWrite() method from the underlying driver which you access from the model via .collection as shown. This will actually return a BulkWriteOpResult within the callback or Promise which contains information about the actual operations performed within the "batch". In this case it will be the "matched" and "modified" numbers which will be appropriate to the operations that were actually performed.
Hence if the $pull did not actually "remove" anything since the timestamp values were actually newer than the given constraint, then the modified count would only reflect the $push operation. But most of the time this need not concern you, where instead you would just accept that the operations completed without error and did something according to what you actually asked.
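As a sketch of inspecting that result, assuming the two updateOne operations above are collected in an array named ops ( a name used here only for illustration ) and the Promise form of the call is used:
Model.collection.bulkWrite(ops).then(function(result) {
  // matchedCount/modifiedCount reflect what the $pull and $push actually did
  console.log(result.matchedCount, result.modifiedCount);
});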
Conclude
So the general case of "both" is that it's really all done in one request and one response. The differences come in that "under the hood" the second approach which matches your request actually does do "two" operations per request and therefore takes microseconds longer.
The general idea here is that the first implementation, though not exactly the same thing as asked, will actually do a "good enough" job of "housekeeping" with little to no additional overhead on the request, or indeed the implementation of the actual code. There is also actually no reason why you could not "combine" the logic of "both", and remove past your "cutoff" as well as keeping a "cap" on the overall array size.
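A sketch of that combination, again with the hypothetical docId and cutOff values, would be:
Model.collection.bulkWrite([
  // Remove readings older than the cutoff
  { "updateOne": {
    "filter": { "_id": docId },
    "update": {
      "$pull": { "readings": { "timestamp": { "$lt": cutOff } } }
    }
  }},
  // Append the new reading and cap the array at the last ten entries
  { "updateOne": {
    "filter": { "_id": docId },
    "update": {
      "$push": {
        "readings": {
          "$each": [{ "timestamp": 1499478496679, "temperature": 100 }],
          "$slice": -10
        }
      }
    }
  }}
])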
Also, whilst you can always "read the data" -> "modify" -> "save", that is not a really great pattern. For best performance as well as "consistency" without conflict, you should be using the atomic operations to modify in just the same way as is outlined here.

Update join in mongoDB. Is it possible?

Given these three documents:
db.test.save({"_id":1, "foo":"bar1", "xKey": "xVal1"});
db.test.save({"_id":2, "foo":"bar2", "xKey": "xVal2"});
db.test.save({"_id":3, "foo":"bar3", "xKey": "xVal3"});
And a separate array of information that references those documents:
[{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}]
Is it possible to update "foo" on the two referenced documents (1 and 2) in a single operation?
I know I can loop through the array and do them one by one but I have thousands of documents, which means too many round trips to the server.
Many thanks for your thoughts.
It's not possible to update "foo" on the two referenced documents (1 and 2) in a single atomic operation as MongoDB has no such mechanism. However, seeing that you have a large collection, one option is to take advantage of the Bulk API which allows you to send your updates in batches instead of every update request to the server.
The process involves looping all matched documents within the array and processing Bulk updates, which will at least allow many operations to be sent in a single request with a singular response.
This gives you much better performance since you won't be sending each update to the server individually, but rather batching them so a request goes to the server only once in every 500 operations, thus making your updates more efficient and quicker.
-EDIT-
The reason for choosing a lower value is generally a controlled choice. As noted in the documentation there, MongoDB by default will send operations to the server in batches of at most 1000 at a time, and there is no guarantee that those default 1000-operation requests actually fit under the 16MB BSON limit. So you would still need to be on the "safe" side and impose a lower batch size that you can effectively manage, so that it totals less than the data limit in size when sending to the server.
Let's use an example to demonstrate the approaches above:
a) If using MongoDB v3.0 or below:
var bulk = db.test.initializeOrderedBulkOp(),
largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
counter = 0;
largeArray.forEach(function(doc) {
bulk.find({ "_id": doc._id }).updateOne({ "$set": { "foo": doc.foo } });
counter++;
if (counter % 500 == 0) {
bulk.execute();
bulk = db.test.initializeOrderedBulkOp();
}
});
if (counter % 500 != 0 ) bulk.execute();
b) If using MongoDB v3.2.X or above (MongoDB version 3.2 has since deprecated the Bulk() API and provided a newer set of APIs using bulkWrite()):
var largeArray = [{"_id":1, "foo":"bar1Upd"},{"_id":2, "foo":"bar2Upd"}],
bulkUpdateOps = [];
largeArray.forEach(function(doc){
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "foo": doc.foo } }
}
});
if (bulkUpdateOps.length === 500) {
db.test.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) db.test.bulkWrite(bulkUpdateOps);

MongoDB MapReduce strange results

When I perform a MapReduce operation over a MongoDB collection with a small number of documents everything goes OK.
But when I run it with a collection of about 140,000 documents, I get some strange results:
Map function:
function() { emit(this.featureType, this._id); }
Reduce function:
function(key, values) { return { count: values.length, ids: values }; }
As a result, I would expect something like (for each mapping key):
{
"_id": "FEATURE_TYPE_A",
"value": { "count": 140000,
"ids": [ "9b2066c0-811b-47e3-ad4d-e8fb6a8a14e7",
"db364b3f-045f-4cb8-a52e-2267df40066c",
"d2152826-6777-4cc0-b701-3028a5ea4395",
"7ba366ae-264a-412e-b653-ce2fb7c10b52",
"513e37b8-94d4-4eb9-b414-6e45f6e39bb5", .......}
But instead I get this strange document structure:
{
"_id": "FEATURE_TYPE_A",
"value": {
"count": 706,
"ids": [
{
"count": 101,
"ids": [
{
"count": 100,
"ids": [
"9b2066c0-811b-47e3-ad4d-e8fb6a8a14e7",
"db364b3f-045f-4cb8-a52e-2267df40066c",
"d2152826-6777-4cc0-b701-3028a5ea4395",
"7ba366ae-264a-412e-b653-ce2fb7c10b52",
"513e37b8-94d4-4eb9-b414-6e45f6e39bb5".....}
Could someone explain to me whether this is the expected behavior, or am I doing something wrong?
Thanks in advance!
The case here is unusual and I'm not sure if this is what you really want given the large arrays being generated. But there is one point in the documentation that has been missed in the presumption of how mapReduce works.
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
What that basically says here is that your current operation is only expecting that "reduce" function to be called once, but this is not the case. The input will in fact be "broken up" and passed in here as manageable sizes. The multiple calling of "reduce" now makes another point very important.
Because it is possible to invoke the reduce function more than once for the same key, the following properties need to be true:
the type of the return object must be identical to the type of the value emitted by the map function.
Essentially this means that both your "mapper" and "reducer" have to take on a little more complexity in order to produce your desired result, making sure that the output of the "mapper" is sent in the same form as it will appear to the "reducer", and that the reduce process itself is mindful of this.
So first the mapper revised:
function () { emit(this.featureType, { count: 1, ids: [this._id] }); }
Which is now consistent with the final output form. This is important when considering the reducer which you now know will be invoked multiple times:
function (key, values) {
var ids = [];
var count = 0;
values.forEach(function(value) {
count += value.count;
value.ids.forEach(function(id) {
ids.push( id );
});
});
return { count: count, ids: ids };
}
What this means is that each invocation of the reduce function expects the same inputs as it is outputting, being a count field and an array of ids. This gets to the final result by essentially
Reduce one chunk of results #chunk1
Reduce another chunk of results #chunk2
Combine the reduce on the reduced chunks, #chunk1 and #chunk2
That may not seem immediately apparent, but the behavior is by design where the reducer gets called many times in this way to process large sets of emitted data, so it gradually "aggregates" rather than in one big step.
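To make that concrete, if the revised reducer above were assigned to a variable named reducer ( a name assumed here purely for illustration ), it can safely be fed its own output:
// Reducing two chunks, then reducing the reduced chunks, yields the same shape and totals
var chunk1 = reducer("FEATURE_TYPE_A", [{ count: 1, ids: ["a"] }, { count: 1, ids: ["b"] }]);
var chunk2 = reducer("FEATURE_TYPE_A", [{ count: 1, ids: ["c"] }]);
var combined = reducer("FEATURE_TYPE_A", [chunk1, chunk2]);
// combined is { count: 3, ids: ["a", "b", "c"] }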
The aggregation framework makes this a lot more straightforward, where from MongoDB 2.6 and upwards the results can even be output to a collection, so if you had more than one result and the combined output was greater than 16MB then this would not be a problem.
db.collection.aggregate([
{ "$group": {
"_id": "$featureType",
"count": { "$sum": 1 },
"ids": { "$push": "$_id" }
}},
{ "$out": "ouputCollection" }
])
So that will not break and will actually return as expected, with the complexity greatly reduced as the operation is indeed very straightforward.
But as I have already said, your purpose in returning the array of "_id" values here seems unclear given the sheer size. So if all you really wanted was a count by the "featureType" then you would use basically the same approach rather than trying to force mapReduce to find the length of an array that is very large:
db.collection.aggregate([
{ "$group": {
"_id": "$featureType",
"count": { "$sum": 1 },
}}
])
In either form though, the results will be correct as well as running in a fraction of the time that the mapReduce operation as constructed will take.

How do I manage a sublist in Mongodb?

I have different types of data that would be difficult to model and scale with a relational database (e.g., a product type)
I'm interested in using Mongodb to solve this problem.
I am referencing the documentation at mongodb's website:
http://docs.mongodb.org/manual/tutorial/model-referenced-one-to-many-relationships-between-documents/
For the data type that I am storing, I need to also maintain a relational list of id's where this particular product is available (e.g., store location id's).
In their example regarding "one-to-many relationships with embedded documents", they have the following:
{
name: "O'Reilly Media",
founded: 1980,
location: "CA",
books: [12346789, 234567890, ...]
}
I am currently importing the data with a spreadsheet, and want to use a batchInsert.
To avoid duplicates, I assume that:
1) I need to do an ensure index on the ID, and ignore errors on the insert?
2) Do I then need to loop through all the ID's to insert a new related ID to the books?
Your question could possibly be defined a little better, but let's consider the case that you have rows in a spreadsheet or other source that are all de-normalized in some way. So in a JSON representation the rows would be something like this:
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 12346789
},
{
"publisher": "O'Reilly Media",
"founded": 1980,
"location": "CA",
"book": 234567890
}
So in order to get those sorts of row results into the structure you wanted, one way to do this would be to use the "upsert" functionality of the .update() method.
Assuming you have some way of looping the input values and they are identified with some structure, an analog to this would be something like:
books.forEach(function(book) {
db.publishers.update(
{
"name": book.publisher
},
{
"$setOnInsert": {
"founded": book.founded,
"location": book.location,
},
"$addToSet": { "books": book.book }
},
{ "upsert": true }
);
})
This essentially simplifies the code so that MongoDB is doing all of the data collection work for you. So where the "name" of the publisher is considered to be unique, what the statement does is first search for a document in the collection that matches the query condition given, as the "name".
In the case where that document is not found, a new document is inserted. Either the database or the driver will take care of creating the new _id value for this document, and your "condition" is also automatically inserted into the new document since it is an implied value that should exist.
The usage of the $setOnInsert operator is to say that those fields will only be set when a new document is created. The final part uses $addToSet in order to "push" the book values that have not already been found into the "books" array (or set).
The reason for the separation is for when a document is actually found to exist with the specified "publisher" name. In this case, all of the fields under the $setOnInsert will be ignored as they should already be in the document. So only the $addToSet operation is processed and sent to the server in order to add the new entry to the "books" array (set) and where it does not already exist.
So that would be simplified logic compared to aggregating the new records in code before sending a new insert operation. However it is not very "batch" like as you are still performing some operation to the server for each row.
This is fixed in MongoDB version 2.6 and above as there is now the ability to do "batch" updates. So with a similar analog:
var batch = [];
books.forEach(function(book) {
batch.push({
"q": { "name": book.publisher },
"u": {
"$setOnInsert": {
"founded": book.founded,
"location": book.location,
},
"$addToSet": { "books": book.book }
},
"upsert": true
});
if ( ( batch.length % 500 ) == 0 ) {
db.runCommand( "update", "updates": batch );
batch = [];
}
});
db.runCommand( "update", "updates": batch );
So what this is doing is setting up all of the constructed update statements and sending them to the server in a single call, with a sensible size of operations sent in the batch, in this case once every 500 items processed. The actual limit is the BSON document maximum of 16MB, so this can be altered as appropriate to your data.
If your MongoDB version is lower than 2.6 then you either use the first form or do something similar to the second form using the existing batch insert functionality. But if you choose to insert then you need to do all the pre-aggregation work within your code.
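As a rough sketch of that pre-aggregation approach, assuming the same de-normalized books input as above and that none of the publishers already exist in the collection:
// Group the rows by publisher in application code, then batch insert the aggregated documents
var byPublisher = {};
books.forEach(function(book) {
    if (!byPublisher[book.publisher]) {
        byPublisher[book.publisher] = {
            "name": book.publisher,
            "founded": book.founded,
            "location": book.location,
            "books": []
        };
    }
    byPublisher[book.publisher].books.push(book.book);
});

var docs = Object.keys(byPublisher).map(function(name) { return byPublisher[name]; });
db.publishers.insert(docs); // batch insert of the pre-aggregated documents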
All of the methods are of course supported with the PHP driver, so it is just a matter of adapting this to your actual code and which course you want to take.