MongoDB - replace item in nested array

MongoDB does not allow replacing an item in an array in a single operation. Instead it takes a pull followed by a push operation.
Unfortunately we have a case where parallel requests (in a distributed environment) race on the same item in the array, i.e.
both pulls run first, then both pushes. This results in duplicate entries, e.g.
{
  "_id": ...,
  "nestedArray": [
    { "subId": "1" },
    { "subId": "1" },
    { "subId": "2" }
  ]
}
Are there any workarounds?

I usually use an optimistic lock for this situation.
To prepare for this, you need to add a version field to your model, which you will increment each time you modify that model. Then you use this method:
Model.findOneAndUpdate(
  { _id: <current_id>, version: <current_version> },
  { $set: { nestedArray: <new_nested_array> }, $inc: { version: 1 } })
  .exec(function (err, result) {
    if (err) {
      // handle error
    }
    if (!result) {
      // the model has been updated in the meantime
    }
    // all is good
  });
This means that you first need to fetch the model and compute the new array <new_nested_array>. This way you can be sure that only one modification will take place for a given version.
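For completeness, a minimal sketch of the full read-modify-write cycle with a bounded retry might look like this (computeNewArray is a hypothetical helper that builds the replacement array):

// Sketch: optimistic-lock retry loop around the update shown above.
// computeNewArray is a hypothetical helper; adapt it to your own logic.
function replaceItemWithRetry(id, attemptsLeft, done) {
  if (attemptsLeft === 0) return done(new Error('too much write contention'));
  Model.findById(id).exec(function (err, doc) {
    if (err) return done(err);
    var newArray = computeNewArray(doc.nestedArray);
    Model.findOneAndUpdate(
      { _id: id, version: doc.version },
      { $set: { nestedArray: newArray }, $inc: { version: 1 } })
      .exec(function (err, result) {
        if (err) return done(err);
        if (!result) return replaceItemWithRetry(id, attemptsLeft - 1, done); // lost the race; retry
        done(null, result);
      });
  });
}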
Hope I explained myself.

Related

How to check if a mongodb document has been modified after reading but before writing to it?

Consider the use case where I am embedding Address document IDs into a User document. Threads 1 and 2 run at the same time followed by the Check results thread.
User:
  addresses: [1, 2, 3]

Async Thread 1:
  user = userCollection.find(userId)
  user.addresses.push(4)
  userCollection.update(user)

Async Thread 2:
  user = userCollection.find(userId)
  user.addresses.push(5)
  userCollection.update(user)

Check results:
  user = userCollection.find(userId)
  user.addresses.includes(4) // result is non-deterministic
  user.addresses.includes(5) // result is non-deterministic
Do I need to implement my own document level locks at the application level to prevent multi-threaded applications from overwriting data that other threads are currently writing?
Maybe I'm missing a built-in atomic function to append to an array? But what about the case of find / replace? I don't just want to 'push' a new value into the array but find an old ID that's been deleted and then remove it. And at the same time another thread wants to add to the array. I'm honestly not sure what the simplest solution is. I've written the problem in pseudo-JavaScript, however I'm using Go for the project.
According to the official docs,
In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.
So you won't need to worry much about atomicity for a single document. For your given example, you can simply do an update with an aggregation pipeline, which is available in MongoDB v4.2+.
db.collection.update(
  { "userId": 1 },
  [
    {
      "$set": {
        "addresses": {
          "$setUnion": [
            {
              "$filter": {
                "input": "$addresses",
                "as": "a",
                "cond": {
                  "$ne": [ "$$a", 3 ] // element you want to remove
                }
              }
            },
            [ 4 ] // element you want to add
          ]
        }
      }
    }
  ]
)
Here is the Mongo playground for your reference.
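Note also that, for the simple append case the question raises, the single atomic update operators may already be enough: $push appends and $pull removes, each atomic within one document. A sketch against the question's document:

// Atomic append within a single document.
db.collection.update({ "userId": 1 }, { "$push": { "addresses": 4 } })

// Atomic removal within a single document.
db.collection.update({ "userId": 1 }, { "$pull": { "addresses": 3 } })

Done as two separate statements, though, a remove/add pair can still interleave under concurrency, which is exactly why the single pipeline update above is attractive for find-and-replace.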
If you need to deal with multi-document atomicity, you can opt for transactions, which are available in MongoDB v4.0+.
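A minimal sketch of such a transaction with the Node.js driver (assumes MongoDB 4.0+ on a replica set; the collection names and the audit write are illustrative):

// Sketch: two writes that commit or abort together.
const session = client.startSession();
try {
  await session.withTransaction(async () => {
    await db.collection("users").updateOne(
      { userId: 1 },
      { $addToSet: { addresses: 4 } },
      { session }
    );
    await db.collection("audit").insertOne(
      { userId: 1, change: "added address 4" },
      { session }
    );
  });
} finally {
  await session.endSession();
}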

MapReduce function to return two outputs. MongoDB

I am currently doing some basic mapReduce using MongoDB.
I currently have data that looks like this:
db.football_team.insert({name: "Tane Shane", weight: 93, gender: "m"});
db.football_team.insert({name: "Lily Jones", weight: 45, gender: "f"});
...
I want to create a mapReduce function to group data by gender and show
Total number of each gender, Male & Female
Average weight of each gender
I can create a map / reduce function to carry out each task separately, I just can't get my head around how to show the output for both. I am guessing that since the grouping is based on gender, the map function should stay the same and I just need to alter something in the reduce section...
Work so far
var map1 = function () {
  var key = this.gender;
  emit(key, { count: 1 });
};

var reduce1 = function (key, values) {
  var sum = 0;
  values.forEach(function (value) { sum += value["count"]; });
  return { count: sum };
};

db.football_team.mapReduce(map1, reduce1, { out: "gender_stats" });
Output
db.gender_stats.find()
{"_id" : "f", "value" : {"count": 12} }
{"_id" : "m", "value" : {"count": 18} }
Thanks
The key rule of "map/reduce" in any implementation is basically that the same shape of data needs to be emitted by the mapper as is returned by the reducer. The key reason for this is that "map/reduce" may quite possibly call the reducer multiple times, which means the reducer can be invoked on output that was already emitted from a previous pass through the reducer, along with other data from the mapper.
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
That said, your best approach to "average" is therefore to total the data along with a count, and then simply divide the two. This actually adds another step to a "map/reduce" operation as a finalize function.
db.football_team.mapReduce(
  // mapper
  function () {
    emit(this.gender, { count: 1, weight: this.weight });
  },
  // reducer
  function (key, values) {
    var output = { count: 0, weight: 0 };
    values.forEach(function (value) {
      output.count += value.count;
      output.weight += value.weight;
    });
    return output;
  },
  // options and finalize
  {
    "out": "gender_stats", // or { "inline": 1 } if you don't need another collection
    "finalize": function (key, value) {
      value.avg_weight = value.weight / value.count; // take an average
      delete value.weight;                           // optionally remove the unwanted key
      return value;
    }
  }
)
All fine because both mapper and reducer are emitting data with the same shape and also expecting input in that shape within the reducer itself. The finalize method of course is just invoked after all "reducing" is finally done and just processes each result.
As noted though, the aggregate() method actually does this far more effectively, in native code that does not incur the overhead (and potential security risks) of server-side JavaScript interpretation and execution:
db.football_team.aggregate([
  { "$group": {
    "_id": "$gender",
    "count": { "$sum": 1 },
    "avg_weight": { "$avg": "$weight" }
  }}
])
And that's basically it. Moreover you can actually continue and do other things after a $group pipeline stage ( or any stage for that matter ) in ways that you cannot do with a MongoDB mapReduce implementation. Notably something like applying a $sort to the results:
db.football_team.aggregate([
  { "$group": {
    "_id": "$gender",
    "count": { "$sum": 1 },
    "avg_weight": { "$avg": "$weight" }
  }},
  { "$sort": { "avg_weight": -1 } }
])
The only ordering mapReduce provides is that the key used with emit is always sorted in ascending order. You cannot sort the aggregated output in any other way without either querying the output collection afterwards or sorting the returned results "in memory" on the client.
As a "side note" ( though an important one ), you probably should also consider in "learning" that the reality is the "server-side JavaScript" functionality of MongoDB is really a work-around more than being a feature. When MongoDB was first introduced, it applied a JavaScript engine for server execution mostly to make up for features which had not yet been implemented.
Thus to make up for the lack of the complete implementation of many query operators and aggregation functions which would come later, adding a JavaScript engine was a "quick fix" to allow certain things to be done with minimal implementation.
The result over the years is those JavaScript engine features are gradually being removed. The group() function of the API is removed. The eval() function of the API is deprecated and scheduled for removal at the next major version. The writing is basically "on the wall" for the limited future of these JavaScript on the server features, as the clear pattern is where the native features provide support for something, then the need to continue support for the JavaScript engine basically goes away.
The core wisdom here being that focusing on learning these JavaScript on the server features, is probably not really worth the time invested unless you have a pressing use case that currently cannot be solved by any other means.

Insert multiple documents referenced by another Schema

I have the following two schemas:
var SchemaOne = new mongoose.Schema({
  id_headline: { type: String, required: true },
  tags: [{ type: mongoose.Schema.Types.ObjectId, ref: 'Tag' }]
});

var tagSchema = new mongoose.Schema({
  _id: { type: String, required: true, index: { unique: true } }, // value
  name: { type: String, required: true }
});
As you can see, in the first schema there is an array of references to the second schema.
My problem is:
Suppose that, in my backend server, I receive an array of tags (just the ids) and, before creating the SchemaOne document, I need to verify whether the received tags already exist in the database and, if not, create them. Only after having all the tags stored in the database may I assign this received array to the tags array of the SchemaOne document to be created.
I'm not sure how to implement this. Can you give me a helping hand?
So let's assume you have input being sent to your server that essentially resolves to this:
var input = {
  "id_headline": "title",
  "tags": [
    { "name": "one" },
    { "name": "two" }
  ]
};
And as you state, you are not sure whether any of the "tags" entries already exists, but of course the "name" is unique and can be used to look up the associated object.
What you are basically going to have to do here is "look up" each of the elements within "tags" and return the document with the reference to the matching object in the "Tag" model. The ideal method here is .findOneAndUpdate(), with the "upsert" option set to true. This will create the document in the collection where it is not found, and in any case will return the document content with the reference that was created.
Note that, naturally, you want to ensure you have those array items resolved "first", before proceeding to saving the main "SchemaOne" object. The async library has some methods that help structure this:
async.waterfall(
  [
    function (callback) {
      async.map(input.tags, function (tag, callback) {
        Tag.findOneAndUpdate(
          { "name": tag.name },
          { "$setOnInsert": { "name": tag.name } },
          { "upsert": true, "new": true },
          callback
        );
      }, callback);
    },
    function (tags, callback) {
      Model.findOneAndUpdate(
        { "id_headline": input.id_headline },
        { "$addToSet": {
          "tags": { "$each": tags.map(function (tag) { return tag._id; }) }
        }},
        { "upsert": true, "new": true },
        callback
      );
    }
  ],
  function (err, result) {
    // if err then do something to report it, otherwise it's done.
  }
)
So async.waterfall is a special flow-control method that passes the result returned from each of the functions in the array of arguments on to the next one, right until the end of execution, where you can optionally pass in the result of the final function in the list. It basically "cascades" or "waterfalls" results down each step. This is exactly what we want here, to pass the results of the "tags" creation into the main model creation/modification.
The async.map within the first executed stage looks at each of the elements within the array of the input. So for each item contained in "tags", the .findOneAndUpdate() method is called to look for and possibly create if not found, the specified "tag" entry in the collection.
Since the output of .map() is going to be an array of those documents, it is simply passed through to the next stage. Each iteration returns a document, so when the iteration is complete you have all of the documents.
The next usage of .findOneAndUpdate() with "upsert" is optional, and of course considers that the document with the matching "id_headline" may or may not exist. The same case is true that if it is there then the "update" is processed, if not then it is simply created. You could optionally .insert() or .create() if the document was known not to be there, but the "update" action gives some interesting options.
Namely here is the usage of $addToSet, where if the document already existed then the specified items would be "added" to any content that was already there, and of course as a "set", any items already present would not be new additions. Note that only the _id fields are required here when adding to the array with an atomic operator, hence the .map() function employed.
An alternate case on "updating" could be to simply "replace" the array content using the $set atomic operation if it was the intent to only store those items that were mentioned in the input and no others.
In a similar manner the $setOnInsert shown when "creating"/"looking for" items in "Tags" makes sure that there is only actual "modification" when the object is "created/inserted", and that removes some write overhead on the server.
So the basic principle of using .findOneAndUpdate(), at least for the "Tags" entries, is the optimal way of handling this. It avoids double handling such as:
Querying to see if the document exists by name
If no result is returned, then send an additional statement to create one
That means two operations to the database with communication back and forth, whereas the "upserts" used here simplify this into a single request per item.
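As a side note, if you would rather avoid the async library, the same two-stage flow can be written with Promises; a sketch, assuming a recent Mongoose where .exec() returns a promise, and where Model is the model compiled from SchemaOne as above:

// Sketch: the same upsert flow using Promises instead of async.waterfall.
function saveHeadlineWithTags(input) {
  // Upsert every tag in parallel; each call resolves to the tag document.
  return Promise.all(input.tags.map(function (tag) {
    return Tag.findOneAndUpdate(
      { "name": tag.name },
      { "$setOnInsert": { "name": tag.name } },
      { "upsert": true, "new": true }
    ).exec();
  })).then(function (tags) {
    // Then add the resolved _id values to the main document as a set.
    return Model.findOneAndUpdate(
      { "id_headline": input.id_headline },
      { "$addToSet": { "tags": { "$each": tags.map(function (t) { return t._id; }) } } },
      { "upsert": true, "new": true }
    ).exec();
  });
}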

MongoDB Grouping Query

I'm working on a Meteor application and I have data for a weekly timetable of the format -
events:
  event:
    day: "Monday"
    time: "9am"
    location: "A"
  event:
    day: "Monday"
    time: "10am"
    location: "B"
There are numerous entries for each day. Can I run a query that will return an object of the format -
day: "Monday"
events:
event:
time: "9am"
location: "A"
event:
time: "10am"
location: "B"
I could store the object in the second format but prefer the first for ease of deleting and updating individual events.
I also want to return them ordered by day of week if there's a nice way to do that.
Several options:
You can use an aggregation command but be warned, you will lose reactivity: it means that except when you reload your template, you will not get external updates. You would also need to use a package to add the aggregation command to Meteor in order to achieve that.
My personal favorite: you don't need to aggregate (and lose reactivity) to achieve your data transformation. You can use a simple Collection.find() query and extend/reduce/modify it using a clever mix of cursor.observe() and conditional modifications. Have a look at this answer, it did the trick for me (I needed a sum with blacklisting of some fields, but you can easily adapt it to your group/sorting case): https://stackoverflow.com/a/30813050/3793161. Comment if you need more details on this.
If you plan to have several servers, be warned that each server will have to observe, so it may lead to unnecessary load. So my third solution is to use either collection hooks or methods to update an additional field containing every event for each day/user (whatever you need). See David Weldon's answer about that here: https://stackoverflow.com/a/31190896/3793161. In your case, it would probably mean re-thinking your database structure to fit your needs (i.e. adding more fields so you can update them on insert).
EDIT Here are some thoughts on your comment:
If you stick to what you described in the question, you would need seven documents, one per day, with an events field where you put all the events. My second solution is good if you need to rework a collection structure before sending it. However, in your case, you just need an object week with 7 sub-objects matching the days of the week.
I advise two possible approaches:
Use the aggregation in a method, as described by @chridam. Be warned that you will not be able to directly get a sorted array, as stated in the mongo $group documentation:
$group does not order its output documents
So you need to sort them (by day, and by hour within each day) using, for example, _.sortBy() before you return your method result. By the way, if you want to know what is going on in your method call, client-side, here is how you should write the call:
Meteor.call("getGroupedDailyEvents", userId, function(error, result){
if(error){
console.log("error", error);
}
if(result){
//do whatever you need
}
});
Make the data sorting client-side. You are looking at an overkill solution otherwise because, afaik, you don't need to filter any data to keep it from the user, and you are going to send the data anyway (just with another structure). It is much easier to make a simple helper in your template like this:
Template.displaySchedule.helpers({
  "monday_events": function () {
    return _.sortBy(Events.find({ day: "Monday" }).fetch(), "time");
  }
  // add other days
});
It assumes the format of your time field is sortable this way. If not, you just need to create a function that sorts them according to their format, or change the original format into something better suited, as in the sketch below.
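For example, with times like "9am" or "10:30pm" (an assumed format), a small converter makes them numerically sortable:

// Sketch: sort helper for "9am" / "10:30pm" style strings (assumed format).
function timeToMinutes(t) {
  var m = /^(\d{1,2})(?::(\d{2}))?\s*(am|pm)$/i.exec(t);
  var hours = (parseInt(m[1], 10) % 12) + (m[3].toLowerCase() === 'pm' ? 12 : 0);
  return hours * 60 + (m[2] ? parseInt(m[2], 10) : 0);
}

Template.displaySchedule.helpers({
  "monday_events": function () {
    return _.sortBy(Events.find({ day: "Monday" }).fetch(), function (e) {
      return timeToMinutes(e.time);
    });
  }
});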
The rest (HTML) would just be to iterate over Monday events using {{#each monday_events}}.
To achieve the desired result, use the aggregation framework, where the $group pipeline operator groups all the input documents and applies the accumulator expression $push to each group to get the events array.
Your pipeline would look like this:
var Events = new Mongo.Collection('events');

var pipeline = [
  {
    "$group": {
      "_id": "$day",
      "events": {
        "$push": {
          "time": "$time",
          "location": "$location"
        }
      }
    }
  },
  {
    "$project": {
      "_id": 0, "day": "$_id", "events": 1
    }
  }
];

var result = Events.aggregate(pipeline);
console.log(result);
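To also get the days back in weekday order (the question's last point), one option is to rank each group against a fixed list of day names before sorting. A sketch, assuming MongoDB 3.4+ for $addFields and $indexOfArray:

// Sketch: group as above, then sort groups by weekday position.
var pipeline = [
  { "$group": {
    "_id": "$day",
    "events": { "$push": { "time": "$time", "location": "$location" } }
  }},
  // Rank each day against a fixed weekday list, then sort by that rank.
  { "$addFields": {
    "dayIndex": {
      "$indexOfArray": [
        ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"],
        "$_id"
      ]
    }
  }},
  { "$sort": { "dayIndex": 1 } },
  { "$project": { "_id": 0, "day": "$_id", "events": 1 } }
];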
You can add the meteorhacks:aggregate package to implement the aggregation in Meteor:
Add to your app with
meteor add meteorhacks:aggregate
Since this package exposes an .aggregate method on Mongo.Collection instances, you can define a method that returns the aggregated result array. For example:
if (Meteor.isServer) {
  var Events = new Mongo.Collection('events');
  Meteor.methods({
    getGroupedDailyEvents: function () {
      var pipeline = [
        {
          "$group": {
            "_id": "$day",
            "events": {
              "$push": {
                "time": "$time",
                "location": "$location"
              }
            }
          }
        },
        {
          "$project": {
            "_id": 0, "day": "$_id", "events": 1
          }
        }
      ];
      var result = Events.aggregate(pipeline);
      return result;
    }
  });
}

if (Meteor.isClient) {
  Meteor.call('getGroupedDailyEvents', logResult);
  function logResult(err, res) {
    console.log("Result: ", res);
  }
}

JSON Schema with dynamic key field in MongoDB

We want to have i18n support for objects stored in a MongoDB collection.
Currently our schema is like:
{
  _id: "id",
  name: "name",
  localization: [{
    lan: "en-US",
    name: "name_in_english"
  }, {
    lan: "zh-TW",
    name: "name_in_traditional_chinese"
  }]
}
but my thought is that the field "lan" is unique, so can I just use this field as a key? The structure would then be:
{
  _id: "id",
  name: "name",
  localization: {
    "en-US": "name_in_english",
    "zh-TW": "name_in_traditional_chinese"
  }
}
which would be neater and easier to parse (just localization[language] would get the value I want for a specific language).
But then the question is: Is this a good practice in storing data in MongoDB? And how to pass the json-schema check?
It is not good practice to have values as keys. The language codes are values, and as you say you cannot validate them against a schema. It also makes querying against them impossible: for example, you can't figure out whether you have a translation for "nl-NL", as you can't compare against keys, and neither is it possible to easily index this. You should always have descriptive keys.
However, as you say, having the languages as keys makes it a lot easier to pull the data out as you can just access it by ['nl-NL'] (or whatever your language's syntax is).
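For example, reading a translation then becomes a single property access rather than an array scan (illustrative):

// Direct lookup with the object form.
var name = doc.localization['nl-NL'];

// Versus scanning with the array form.
var entry = doc.localization.filter(function (l) { return l.lan === 'nl-NL'; })[0];
var name2 = entry && entry.name;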
I would suggest an alternative schema:
{
  your_id: "id_for_name",
  lan: "en-US",
  name: "name_in_english"
}

{
  your_id: "id_for_name",
  lan: "zh-TW",
  name: "name_in_traditional_chinese"
}
Now you can:
set an index on { your_id: 1, lan: 1 } for speedy lookups
query for each translation individually and just get that translation:
db.so.find( { your_id: "id_for_name", lan: 'en-US' } )
query for all the versions for each id using this same index:
db.so.find( { your_id: "id_for_name" } )
and also much more easily update the translation for a specific language:
db.so.update(
  { your_id: "id_for_name", lan: 'en-US' },
  { $set: { name: "ooga" } }
)
Neither of those points are possible with your suggested schemas.
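For reference, the compound index from the first point is created like this:

db.so.createIndex({ your_id: 1, lan: 1 })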
Obviously the second schema example is much better suited to this task (of course, if the lan field is unique as you mentioned, which seems true to me as well).
Getting an element from a dictionary/associative array/mapping/whatever_it_is_called_in_your_language is much cheaper than scanning a whole array of values. In the current case it is also more efficient from the storage-size point of view: remember that all field names are stored in MongoDB as-is, so every record holds the whole key name of a JSON field, not some compressed representation or index of it.
My experience shows that MongoDB is mature enough to be used as the main storage for your application, even under high load (whatever that means ;) ). The main problem is how you fight database-level locks (well, we'll wait for the promised collection-level locks, which I hope will speed MongoDB up a lot), though data loss is possible if your MongoDB cluster is built badly (dig into the docs and articles on the Internet for more information).
As for the schema check, you must do it by means of your programming language on the application side before inserting records; that's why Mongo is called schemaless.
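That said, on newer servers (MongoDB 3.6+) you can attach a $jsonSchema validator to the collection. A sketch for the key-per-language layout, where the collection name and the language-tag pattern are assumptions to adapt:

// Sketch: $jsonSchema validation with language tags as keys (MongoDB 3.6+).
db.createCollection("items", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name"],
      properties: {
        name: { bsonType: "string" },
        localization: {
          bsonType: "object",
          // Accept only keys shaped like "en-US", "zh-TW", etc.
          patternProperties: { "^[a-z]{2}-[A-Z]{2}$": { bsonType: "string" } },
          additionalProperties: false
        }
      }
    }
  }
})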
There is a case where an object is necessarily better than an array: supporting upserts into a set. For example, if you want to update an item having name 'item1' to have val 100, or insert such an item if one doesn't exist, all in one atomic operation. With an array, you'd have to do one of two operations. Given a schema like
{ _id: 'some-id', itemSet: [ { name: 'an-item', val: 123 } ] }
you'd have commands
// Update:
db.coll.update(
  { _id: id, 'itemSet.name': 'item1' },
  { $set: { 'itemSet.$.val': 100 } }
);

// Insert:
db.coll.update(
  { _id: id, 'itemSet.name': { $ne: 'item1' } },
  { $addToSet: { 'itemSet': { name: 'item1', val: 100 } } }
);
You'd have to query first to know which is needed in advance, which can exacerbate race conditions unless you implement some versioning. With an object, you can simply do
// With itemSet stored as an object keyed by name, e.g. { item1: { val: ... } }:
db.coll.update(
  { _id: id },
  { $set: { 'itemSet.item1': { val: 100 } } }
);
If this is a use case you have, then you should go with the object approach. One drawback is that querying for a specific name requires scanning. If that is also needed, you can add a separate array specifically for indexing. This is a trade-off with MongoDB. Upserts would become
db.coll.update(
  { _id: id },
  {
    $set: { 'itemSet.item1': { val: 100 } },
    $addToSet: { itemNames: 'item1' }
  }
);
and the query would then simply be
db.coll.find({ itemNames: 'item1' })
(Note: the $ positional operator does not support array upserts.)