How to restrict $push in MongoDB?

I am learning MongoDB and wondering if I can restrict $push by matching values.
For example:
field1 = {
  id: 123,
  title: 123,
  likes: [{ by: 1, type: 'like' }, { by: 2, type: 'like' }]
}
Can I restrict push by id in likes?

What you may have already tried is the $addToSet operator, only to find it does not suit the case here, as the combination of "id" and "type" can vary. For instance, what you don't want is the same "id" value with both types "like" and "dislike".
This is, however, a typical "voting" model, and the current structure is not the best one. A better model looks like this, with just the basic fields for example:
{
  "_id": 123,
  "likeCount": 2,
  "dislikeCount": 0,
  "likes": [456, 789],
  "dislikes": []
}
Having separate arrays is important to the atomic update process, since you cannot both $pull and $push on the same array in a single update. More than that, it reinforces the logic behind keeping the "count" values, which are useful for simple queries such as sorting, as opposed to calculating the array length every time.
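With the counts maintained, sorting by popularity becomes a plain, indexable query; a minimal sketch:

// sort by the maintained counter instead of computing array sizes
db.collection.find().sort({ "likeCount": -1 })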
In order to post a "like" for a user who you don't want duplicated in the array, the $addToSet operator is still not the best one, despite the values now being truly unique. You want to constrain the "count" as well, so add the conditions to the query part of the update instead:
db.collection.update(
  { "_id": 123, "likes": { "$ne": 456 } },
  {
    "$push": { "likes": 456 },
    "$inc": { "likeCount": 1 }
  }
)
That way, if the user has already voted their "like" then not only is nothing added, but the "count" is kept at the correct total as well. The query condition on the update is simply not met, since an element matching that value is already in the array; the document does not match, and nothing is updated.
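For example, issuing the same statement a second time would just report that nothing matched (legacy shell result shape):

db.collection.update(
  { "_id": 123, "likes": { "$ne": 456 } },
  { "$push": { "likes": 456 }, "$inc": { "likeCount": 1 } }
)
// second run: WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })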
That is a good approach, but we can do better still. What if the user already posted a "dislike" and now changes their mind to "like" instead? What you really need here are "two" update statements to cover the possible conditions, and this is where the Bulk Operations API comes in, handling that logic in a single request:
var bulk = db.collection.initializeOrderedBulkOp();
// match and update where a dislike is present
bulk.find({
  "_id": 123,
  "likes": { "$ne": 456 },
  "dislikes": 456
}).updateOne({
  "$push": { "likes": 456 },
  "$pull": { "dislikes": 456 },
  "$inc": {
    "likeCount": 1,
    "dislikeCount": -1
  }
});
// match and update where no dislike exists
bulk.find({
  "_id": 123,
  "likes": { "$ne": 456 },
  "dislikes": { "$ne": 456 }
}).updateOne({
  "$push": { "likes": 456 },
  "$inc": { "likeCount": 1 }
});
// Send requests to server and respond
bulk.execute();
In this case if the first statement did not match because there was no dislike then nothing would be updated, but if there was a dislike then the correct adjustments would be made.
With the second request, the update applies only when there is nothing in the dislikes array to match and no matching item in the likes array either. So it covers a fresh vote without conflicting with the previous statement. Despite the two statements, the update is only ever applied once, or not at all, depending on the state conditions.
That is the basic pattern for handling this kind of voting properly, as you keep lists of each vote type as well as maintaining the counts for ease of access. The "dislikes" process is pretty much just the reverse of that logic, as sketched below, and removing votes has similar conditions as well.
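As a sketch only, the mirrored "dislike" statements for the same hypothetical user 456 would be:

var bulk = db.collection.initializeOrderedBulkOp();
// match and update where a like is present
bulk.find({
  "_id": 123,
  "dislikes": { "$ne": 456 },
  "likes": 456
}).updateOne({
  "$push": { "dislikes": 456 },
  "$pull": { "likes": 456 },
  "$inc": {
    "dislikeCount": 1,
    "likeCount": -1
  }
});
// match and update where no like exists
bulk.find({
  "_id": 123,
  "dislikes": { "$ne": 456 },
  "likes": { "$ne": 456 }
}).updateOne({
  "$push": { "dislikes": 456 },
  "$inc": { "dislikeCount": 1 }
});
bulk.execute();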

Related

MongoDB: returning documents in order until a condition match

In a MongoDB collection, I have documents with a "position" field for ordering and an optional "date" field, e.g.
[
  { "_id": "doc1", "position": 1 },
  { "_id": "doc2", "position": 2, "date": "2021-05-20T08:00:00.000Z" },
  { "_id": "doc3", "position": 3 },
  { "_id": "doc4", "position": 4, "date": "2021-05-20T08:00:00.000Z" }
]
I would like to query this collection to get the documents "before" a specified date, in position order. The algorithm would be:
find the first element whose date is "after" the specified date
return all the documents whose position is less than the position of the element found, sorted by "position"
I have implemented this algorithm naïvely with 2 independent queries. However, I suspect it can be done with a single call to the database, but I have no idea how to proceed. Maybe with an aggregation pipeline?
Can someone give me a clue how this can be done?
EDIT: Here are the current queries I use (roughly):
limit_element = db.getCollection('collection').find({
  "date": { "$gte": ISODate("2021-05-20T08:00:00.000Z") }
}).sort({
  "position": 1
}).limit(1)

position = limit_element['position']

elements = db.getCollection('collection').find({
  "position": { "$lt": position }
}).sort({
  "position": 1
})
You can use an aggregation pipeline with two match stages. Essentially it's the same thing as you do now, but within one database access, so a bit faster. With aggregation you can access results from a previous stage in the next stage. Whether that is worth it, you have to decide; I think your naive approach is sensible. In any case this is a conditional problem, so you will have to first find one element and then fetch the others. The difference is just where you do the steps.
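A sketch of that single round trip using $facet (this stage layout is one possible arrangement, not the only one; note that $facet sub-pipelines buffer results in memory and cannot use indexes, which is part of why the two-query approach remains sensible):

db.getCollection('collection').aggregate([
  { "$facet": {
      // the first element dated on/after the cut-off
      "limit": [
        { "$match": { "date": { "$gte": ISODate("2021-05-20T08:00:00.000Z") } } },
        { "$sort": { "position": 1 } },
        { "$limit": 1 }
      ],
      // every document, in position order
      "docs": [
        { "$sort": { "position": 1 } }
      ]
  }},
  { "$unwind": "$limit" }, // note: yields nothing if no cut-off element exists
  // keep only the documents positioned before the cut-off
  { "$project": {
      "docs": {
        "$filter": {
          "input": "$docs",
          "cond": { "$lt": [ "$$this.position", "$limit.position" ] }
        }
      }
  }},
  { "$unwind": "$docs" },
  { "$replaceRoot": { "newRoot": "$docs" } }
])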

How to join two collections in Mongo without $lookup

I have two collections; their names are post and comment.
The model structure is in the following.
I want to use aggregation to query posts and sort them by the total like count across each post's comments. Currently I can compute that like-count sum for one post's comments with the query statement below.
My question is how I can query post and join the comment collection in Mongo version 2.6. I know Mongo 3.2 and later have a $lookup stage.
I want to query the post collection and sort by the foreign comments' total likes. Is there a good way to do this in Mongo 2.6?
post
{
  "_id": ObjectId("5a39e22c27308912334b4567"),
  "uid": "0",
  "content": "what is hello world mean?"
}
comment
/* 1 */
{
  "_id": ObjectId("5a595d8c2703892c3d8b4567"),
  "uid": "1",
  "post_id": "5a39e22c27308912334b4567",
  "comment": "hello world",
  "like": ["2"]
}
/* 2 */
{
  "_id": ObjectId("5a595d8c2703892c3d8b4512"),
  "uid": "2",
  "post_id": "5a39e22c27308912334b4567",
  "comment": "hello stackoverflow",
  "like": ["1", "2"]
}
Query that sums the likes across one post's comments:
db.getCollection('comment').aggregate([
  {
    "$match": { "post_id": "5a39e22c27308912334b4567" }
  },
  {
    "$project": {
      "likeLength": { "$size": "$like" },
      "post_id": "$post_id"
    }
  },
  {
    "$group": {
      "_id": "$post_id",
      "likeLengthSum": { "$sum": "$likeLength" }
    }
  }
])
There is no "best" way to query, as it'll really depend on your specific needs, but... you cannot perform a single query across multiple collections (aside from the $lookup aggregation pipeline function in later versions, as you already are aware).
You'll need to make multiple queries: one to your post collection, and one to your comment collection.
If you must perform a single query, then consider storing both types of documents in a single collection (with some identifier property to let you filter on either posts or comments, within your query).
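For instance, a hypothetical combined content collection (the name and "type" discriminator field are assumptions for illustration) could be filtered in one query:

// posts and comments stored together, distinguished by a "type" field
db.content.find({ "type": "post", "_id": ObjectId("5a39e22c27308912334b4567") })
db.content.find({ "type": "comment", "post_id": "5a39e22c27308912334b4567" })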
There is no other way to join collections without $lookup, even in current MongoDB v6.
I can see two concerns that may be causing you issues:
The $lookup is slow and expensive - How to improve performance?
$lookup optimization:
Follow the guideline provided in the documentation
Use indexes:
You can create indexes on the referenced collection's fields. As per your sample data, create an index on the post_id field, an index on the uid field, or a compound index on both, depending on your use cases.
You can read more about How to Improve Performance with Indexes and Document Filters
db.comment.createIndex({ "post_id": -1 });
db.comment.createIndex({ "uid": -1 });
// or
db.comment.createIndex({ "post_id": -1, "uid": -1 });
Document Filters:
Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline
You can refer to the documentation for more detailed examples
{ $skip: 0 },
{ $limit: 10 } // as per your use case
Limit the $lookup result:
Try to limit the result of the lookup with a $limit stage,
and try to balance that against your improved query and the UI/use cases
You want to avoid $lookup - How to improve the collection schema to avoid $lookup?
Store the analytics/metrics:
If you are trying to get the total count of comments on a particular post, then you should store the total count in the post collection and update it whenever the post gets a new comment
{
  "_id": ObjectId("5a39e22c27308912334b4567"),
  "uid": "0",
  "content": "what is hello world mean?",
  // new field
  "total_comments": 10
}
Store minimum reference data:
If you want to show the comments of a particular post, you can limit the result for ex: show 5 comments per post
You can also store at most the 5 latest comments in the post collection to avoid the $lookup; whenever a new comment arrives, add it and remove the oldest of the 5
{
  "_id": ObjectId("5a39e22c27308912334b4567"),
  "uid": "0",
  "content": "what is hello world mean?",
  // new fields
  "total_comments": 10,
  "comments": [
    {
      "_id": ObjectId("5a595d8c2703892c3d8b4567"),
      "uid": "1",
      "comment": "hello world"
    },
    {
      "_id": ObjectId("5a595d8c2703892c3d8b4512"),
      "uid": "2",
      "comment": "hello stackoverflow"
    }
  ]
}
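A sketch of keeping both new fields in sync when a comment is added (newComment is a placeholder for the comment subdocument):

db.post.update(
  { "_id": ObjectId("5a39e22c27308912334b4567") },
  {
    "$inc": { "total_comments": 1 },
    "$push": {
      "comments": {
        "$each": [newComment],
        "$slice": -5 // keep only the 5 most recent embedded comments
      }
    }
  }
)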
Must read about Reduce $lookup Operations
Must read about Improve Your Schema

Update Array Children Sorted Order

I have a collection containing objects with the following structure
{
"dep_id": "some_id",
"departament": "dep name",
"employees": [{
"name": "emp1",
"age": 31
},{
"name": "emp2",
"age": 35
}]
}
I would like to sort and save the array of employees for the object with id "some_id", by employees.age, descending. The best outcome would be to do this atomically using mongodb's query language. Is this possible?
If not, how can I rearrange the subdocuments without affecting the parent's other data or the data of the subdocuments? In case I have to download the data from the database and save back the sorted array of children, what would happen if something else performs an update to one of the children or children are added or removed in the meantime?
In the end, the data should be persisted to the database like this:
{
"dep_id": "some_id",
"departament": "dep name",
"employees": [{
"name": "emp2",
"age": 35
},{
"name": "emp1",
"age": 31
}]
}
The best way to do this is to actually apply the $sort modifier as you add items to the array. As you say in your comment, your actual objects have a "rank" and "created_at", which really should have been in the question itself rather than a contrived case.
So for "sorting" by multiple properties, the following reference would adjust like this:
db.collection.update(
{ },
{ "$push": { "employees": { "$each": [], "$sort": { "rank": -1, "created_at": -1 } } } },
{ "multi": true }
)
But to update all the data you presently have, as shown in the question, you would sort on "age" with:
db.collection.update(
{ },
{ "$push": { "employees": { "$each": [], "$sort": { "age": -1 } } } },
{ "multi": true }
)
Which oddly uses $push to actually "modify" an array? Yes, it's true: since the $each modifier adds nothing new, the $sort modifier applies to the array in place and "re-orders" it.
Of course this would then explain how "new" updates to the array should be written in order to apply that $sort and ensure that the "largest age" is always "first" in the array:
db.collection.update(
  { "dep_id": "some_id" },
  { "$push": {
      "employees": {
        "$each": [{ "name": "emp3", "age": 32 }],
        "$sort": { "age": -1 }
      }
  }}
)
So what happens here is as you add the new entry to the array on update, the $sort modifier is applied and re-positions the new element between the two existing ones since that is where it would sort to.
This is a common pattern with MongoDB and is typically used in combination with the $slice modifier in order to keep arrays at a "maximum" length as new items are added, yet retain "ordered" results. And quite often "ranking" is the exact usage.
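For example, a sketch with both modifiers combined (the cap of 10 is an arbitrary assumption here):

db.collection.update(
  { "dep_id": "some_id" },
  { "$push": {
      "employees": {
        "$each": [{ "name": "emp4", "age": 40 }],
        "$sort": { "age": -1 },
        "$slice": 10 // keep at most the first 10 entries after sorting
      }
  }}
)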
So overall, you can actually "update" your existing data and re-order it with "one simple atomic statement". No looping or collection renaming required. Furthermore, you now have a simple atomic method to "update" the data and maintain that order as you add new array items, or remove them.
In order to get what you want you can use the following query:
db.collection.aggregate({
  $unwind: "$employees" // flatten employees array
}, {
  $sort: {
    "employees.age": -1 // sort all unwound documents by employee age (descending)
  }
}, {
  $group: { // restore the previous structure
    _id: "$_id",
    "dep_id": { $first: "$dep_id" },
    "departament": { $first: "$departament" },
    "employees": { $push: "$employees" }
  }
}, {
  $out: "output" // write everything out to a separate collection
})
After this step you would want to drop your source collection and rename the "output" collection to match the source collection's name.
This solution will, however, not deal with the concurrency issue. So you should remove write access from the collection first so nobody modifies it during the process and then restore it once you're done with the migration.
You could alternatively query all the data first, sort the employees array on the client side, and then use either single update queries or, faster but more complicated, a bulk write operation with all the individual update calls, to update the existing documents. Here you could use the entire document that you initially read as the filter for the update operation. So if an individual update does not modify any document, you know straight away that some other change must have modified the document you read before; those cases you would need to retry (either later, or straight away until the update does actually modify a document).
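A minimal sketch of that optimistic pattern in the shell (the previously read employees array doubles as the update filter):

var original = db.collection.findOne({ "dep_id": "some_id" });
// sort a copy of the array client-side, descending by age
var sorted = original.employees.slice().sort(function (a, b) {
  return b.age - a.age;
});
var res = db.collection.updateOne(
  { "_id": original._id, "employees": original.employees }, // exact-match the array we read
  { "$set": { "employees": sorted } }
);
if (res.modifiedCount === 0) {
  // something else changed the document since the read; re-read and retry
}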

Upsert Document and/or add a Sub-Document

I've been wrestling with the asynchronous nature of MongoDB, Mongoose and JavaScript and how to best make multiple updates to a collection.
I have an Excel sheet of client and contact data. There are some clients with multiple contacts, one per line, and the client data is the same (so the client name can be used as a unique key - in fact in the schema it's defined with unique: true).
The logic I want to achieve is:
Search the Client collection for the client with clientName as the key
If a matching clientName isn't found then create a new document for that client (not an upsert, I don't want to change anything if the client document is already in the database)
Check to see if the contact is already present in the array of contacts within the client document using firstName and lastName as the keys
If the contact isn't found then $push that contact onto the array
Of course, we could easily have a situation where the client doesn't exist (and so is created) and then immediately, on the very next row of the sheet, there is another contact for the same client; then I'd want to find that existing (just created) client and $push that second new contact into the array.
I've tried this but it's not working:
Client.findOneAndUpdate(
  { clientName: obj.client.clientname },
  { $set: obj.client, $push: { contacts: obj.contact } },
  { upsert: true, new: true },
  function (err, client) {
    console.log(client)
  }
)
and I've had a good look at other questions, e.g.:
create mongodb document with subdocuments atomically?
https://stackoverflow.com/questions/28026197/upserting-complex-collections-through-mongoose-via-express
but can't get a solution... I'm coming to the conclusion that maybe I have to use some app logic to do the find, then make decisions in my code, then write, rather than use a single Mongoose/Mongo statement. But then the issues of asynchronicity rear their ugly head.
Any suggestions?
The approach to handling this is not a simple one, as mixing "upserts" with adding items to "arrays" can easily lead to undesired results. It also depends on if you want logic to set other fields such as a "counter" indicating how many contacts there are within an array, which you only want to increment/decrement as items are added or removed respectively.
In the most simple case, however, if the "contacts" only contained a singular value such as an ObjectId linking to another collection, then the $addToSet modifier works well, as long as there are no "counters" involved:
Client.findOneAndUpdate(
  { "clientName": clientName },
  { "$addToSet": { "contacts": contact } },
  { "upsert": true, "new": true },
  function (err, client) {
    // handle here
  }
);
And that is all fine, since you are only testing whether a document matches on the "clientName" and upserting if not. Whether there is a match or not, the $addToSet operator will take care of unique "singular" values, meaning any "object" that is truly unique.
The difficulties come in where you have something like:
{ "firstName": "John", "lastName": "Smith", "age": 37 }
Already in the contacts array, and then you want to do something like this:
{ "firstName": "John", "lastName": "Smith", "age": 38 }
Where your actual intention is that this is the "same" John Smith and merely the "age" differs. Ideally you want to just "update" that array entry, and neither create a new array element nor a new document.
Working this with .findOneAndUpdate() where you want the updated document to return can be difficult. So if you don't really want the modified document in response, then the Bulk Operations API of MongoDB and the core driver are of most help here.
Considering the statements:
var bulk = Client.collection.initializeOrderedBulkOp();
// First try the upsert and set the array
bulk.find({ "clientName": clientName }).upsert().updateOne({
  "$setOnInsert": {
    // other valid client info in here
    "contacts": [contact]
  }
});
// Try to set the array element where it exists
bulk.find({
  "clientName": clientName,
  "contacts": {
    "$elemMatch": {
      "firstName": contact.firstName,
      "lastName": contact.lastName
    }
  }
}).updateOne({
  "$set": { "contacts.$": contact }
});
// Try to "push" the element where it does not exist
bulk.find({
  "clientName": clientName,
  "contacts": {
    "$not": { "$elemMatch": {
      "firstName": contact.firstName,
      "lastName": contact.lastName
    }}
  }
}).updateOne({
  "$push": { "contacts": contact }
});
bulk.execute(function (err, response) {
  // handle in here
});
This is nice since the Bulk Operations here mean that all statements are sent to the server at once and there is only one response. Also note that the logic means at most two operations will actually modify anything.
In the first instance, the $setOnInsert modifier makes sure that nothing is changed when the document is just a match. As the only modifications here are within that block, this only affects a document where an "upsert" occurs.
Also note on the next two statements you do not try to "upsert" again. This considers that the first statement was possibly successful where it had to be, or otherwise did not matter.
The other reason for no "upsert" there is that the conditions needed to test the presence of the element in the array would lead to the "upsert" of a new document when they were not met. That is not desired, therefore no "upsert".
What they do in fact is respectively check whether the array element is present or not, and either update the existing element or create a new one. Therefore in total, all operations mean you either modify "once" or at most "twice" in the case where an upsert occurred. The possible "twice" creates very little overhead and no real problem.
Also in the third statement the $not operator reverses the logic of the $elemMatch to determine that no array element with the query condition exists.
Translating this with .findOneAndUpdate() becomes a bit more of an issue. Not only is it the "success" that matters now, it also determines how the eventual content is returned.
So the best idea here is to run the events in "series", and then work a little magic with the result in order to return the end "updated" form.
The help we will use here is both with async.waterfall and the lodash library:
var async = require('async'); // for the waterfall below
var _ = require('lodash');    // letting you know where _ is coming from
async.waterfall(
  [
    function (callback) {
      Client.findOneAndUpdate(
        { "clientName": clientName },
        {
          "$setOnInsert": {
            // other valid client info in here
            "contacts": [contact]
          }
        },
        { "upsert": true, "new": true },
        callback
      );
    },
    function (client, callback) {
      Client.findOneAndUpdate(
        {
          "clientName": clientName,
          "contacts": {
            "$elemMatch": {
              "firstName": contact.firstName,
              "lastName": contact.lastName
            }
          }
        },
        { "$set": { "contacts.$": contact } },
        { "new": true },
        function (err, newClient) {
          client = client || {};
          newClient = newClient || {};
          client = _.merge(client, newClient);
          callback(err, client);
        }
      );
    },
    function (client, callback) {
      Client.findOneAndUpdate(
        {
          "clientName": clientName,
          "contacts": {
            "$not": { "$elemMatch": {
              "firstName": contact.firstName,
              "lastName": contact.lastName
            }}
          }
        },
        { "$push": { "contacts": contact } },
        { "new": true },
        function (err, newClient) {
          newClient = newClient || {};
          client = _.merge(client, newClient);
          callback(err, client);
        }
      );
    }
  ],
  function (err, client) {
    if (err) throw err;
    console.log(client);
  }
);
That follows the same logic as before, in that only one or two of those statements will actually do anything, and the "new" document returned may be null. The "waterfall" here passes the result from each stage on to the next, including the end, to which any error will branch immediately.
In this case the null is swapped for an empty object {}, and the _.merge() method combines the two objects into one at each later stage. This gives you the final result, the modified object, no matter which preceding operations actually did anything.
Of course, a different manipulation would be required for $pull, and your question also has the input data in object form itself. But those are really answers in themselves.
This should at least get you started on how to approach your update pattern.
I don't use Mongoose, so I'll post a mongo shell update; sorry for that. I think the following would do:
db.clients.update(
  { $and: [
      { 'clientName': 'apple' },
      { 'contacts.firstName': { $ne: 'nick' } },
      { 'contacts.lastName': { $ne: 'white' } }
  ]},
  { $set: { 'clientName': 'apple' }, $push: { contacts: { 'firstName': 'nick', 'lastName': 'white' } } },
  { upsert: true }
);
So:
if the client "apple" does not exists, it is created, with a contact with given first and last name. If it exists, and does not have the given contact, it has it pushed. If it exists, and already has the given contact, nothing happens.

How to multi-sort MongoDB entry with dynamic keys, on two suboptions?

I'm trying to sort this in MongoDB with mongojs on a find():
{
  "songs": {
    "bNppHOYIgRE": {
      "id": "bNppHOYIgRE",
      "title": "Kygo - ID (Ultra Music Festival Anthem)",
      "votes": 1,
      "added": 1428514707,
      "guids": ["MzM3NTUx"]
    },
    "izJzdDPH9yw": {
      "id": "izJzdDPH9yw",
      "title": "Benjamin Francis Leftwich - Atlas Hands (Samuraii Edit)",
      "votes": 1,
      "added": 1428514740,
      "guids": ["MzM3NTUx"]
    },
    "Yifz3X_i-F8": {
      "id": "Yifz3X_i-F8",
      "title": "M83 - Wait (Kygo Remix)",
      "votes": 0,
      "added": 1428494338,
      "guids": []
    },
    "nDopn_p2wk4": {
      "id": "nDopn_p2wk4",
      "title": "Syn Cole - Miami 82 (Kygo Remix)",
      "votes": 0,
      "added": 1428494993,
      "guids": []
    }
  }
}
and I want to sort the keys in the songs on votes ascending and added descending.
I have tried
db.collection(coll).find().sort({votes:1}, function(err, docs) {});
but that doesn't work.
If this is an operation that you're going to be doing often, I would strongly consider changing your schema. If you make songs an array instead of a map, then you can perform this query using aggregation.
db.coll.aggregate([{ "$unwind": "$songs" }, { "$sort": { "songs.votes": 1, "songs.added": -1 }}]);
And if you put each of these songs in a separate songs collection, then you could perform the query with a simple find() and sort().
db.songs.find().sort({ "votes": 1, "added": -1 });
With your current schema, however, all of that logic would need to live in your application, and it would get messy. A possible approach is to fetch the documents and, while iterating the cursor, iterate through the keys of each document's songs map, collecting the subdocuments into an array; once you have them all, sort that array by votes and added, as sketched below.
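As an illustration only, that client-side iteration might look like this with mongojs (callback style as in the question):

db.collection(coll).findOne({}, function (err, doc) {
  if (err) throw err;
  // flatten the "songs" map into an array of its values
  var songs = Object.keys(doc.songs).map(function (key) {
    return doc.songs[key];
  });
  // votes ascending, then added descending
  songs.sort(function (a, b) {
    return a.votes - b.votes || b.added - a.added;
  });
  console.log(songs);
});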
It is possible, but unnecessarily complex. And, of course, you wouldn't be able to take advantage of indexes, which would have an impact on your performance.
You already include the key inside the subdocument, so I would really recommend you reconsider your schema.