Cursor-based pagination without `skip()` based on frequently dynamically updated field without skipping documents - mongodb

The context
I have a MongoDB collection, items, that looks like this:
{
"_id": ObjectId(...),
"score": 42,
"data": "some text"
},
{
"_id": ObjectId(...),
"score": 95,
"data": "some text"
},
{
"_id": ObjectId(...),
"score": 1841,
"data": "some text"
},
{
"_id": ObjectId(...),
"score": 11,
"data": "some text"
},
It potentially holds 50,000+ documents, and the score field changes very frequently (it's a vote tally that records users' upvotes and downvotes).
What I need to do
I'm trying to infinitely paginate through this collection, sorting documents by the highest score, loading them sequentially, highest score to lowest, likely in bunches of ~25 at a time.
The only current way I know how
Use skip to provide an offset based on the last document I've loaded each call to the database, and only load new documents that have a score less than the last document's. The downside to this is that if I have multiple documents with the same score as the last seen one, I'd skip them when I only load new ones with a score less than the last seen one.
Additionally, I've read using skip() is extremely inefficient.
Conclusion
Do I have to use this inefficient solution, that would also result in me skipping documents?
Is there a better way?
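One commonly used alternative to skip() is range-based (keyset) pagination: sort by score with _id as a tie-breaker, and have each request ask only for documents that come strictly after the last one already loaded. The sketch below is not from the original post. `buildNextPageFilter` and `lastSeen` are hypothetical names, and it assumes a compound index on { score: -1, _id: -1 }. It addresses the problem of ties, but because scores keep changing between requests, a document whose score moves across the cursor position can still be missed or returned twice.

```javascript
// Builds the filter for the page that follows `lastSeen` when sorting by
// { score: -1, _id: -1 }. `lastSeen` is the last document of the previous
// page, or null for the first page. Hypothetical helper, not a driver API.
function buildNextPageFilter(lastSeen) {
  if (!lastSeen) return {};
  return {
    $or: [
      // Everything with a strictly lower score...
      { score: { $lt: lastSeen.score } },
      // ...plus documents with the same score that sort after the last one seen.
      { score: lastSeen.score, _id: { $lt: lastSeen._id } },
    ],
  };
}

// Intended use in the mongo shell (assumes the `items` collection from the question):
// db.items.find(buildNextPageFilter(lastSeen)).sort({ score: -1, _id: -1 }).limit(25)

// Plain strings stand in for ObjectId values in this demo.
const filter = buildNextPageFilter({ _id: "b", score: 42 });
console.log(JSON.stringify(filter));
```

The _id tie-breaker makes the sort order total, so documents sharing the last seen score are neither skipped nor repeated as long as their scores stay put between requests.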

Related

Is updating Embedded Documents in MongoDB a Manual process?

I am not overly familiar with MongoDB yet, but I have a question about embedded documents.
I have seen a number of posts which show you how to update embedded documents through some update query.
My question is this: if I have a collection with embedded documents, denormalised for performance, and one of the embedded documents changes, do I need to manually update all the embedded copies, or is there some way of specifying the link in MongoDB so they auto-update?
For Example:
An Order record might look like the structure below. Note there is a Product item in one of the rows.
Let's say the itemName field changed to "Product1a" in the product from a different collection and I want to update the product in every single order where it appears. Is that a manual process, or is there a way of setting it up in MongoDB to auto-update embedded documents?
{
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"type": "order",
"orderNumber": "ORD-100209857x",
"orderDate": "2019-09-26T17:42:31.000+12:00",
"orderItems": [
{
"discount": 0,
"price": 24.4944,
"product": {
"id": "ccc1beb1-e022-11e9-97f0-e7e789106ab2",
"itemNumber": "prd1",
"itemName": "Product1"
},
"qty": 4,
"rowTotal": 97.96,
"taxAmount": 9.8
},
{
"discount": 0,
"price": 3.21,
"itemName": "Shipping",
"qty": 1,
"rowTotal": 3.21,
"taxAmount": 0
}
]
}
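Not sure what you mean by a manual process, but MongoDB does not keep embedded copies in sync for you, so you have to issue the update yourself. Because orderItems is an array, the path needs a positional operator. Here is some sample code that updates the matching product in all the documents (arrayFilters requires MongoDB 3.6 or later):
db.collection.updateMany({"orderItems.product.itemNumber": "prd1"}, {$set: {"orderItems.$[item].product.itemName": "updatedProductName"}}, {arrayFilters: [{"item.product.itemNumber": "prd1"}]})
Let me know if this is not what you are looking for.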

MongoDB document setup and aggregation

I'm pretty new to MongoDB and while preparing data to be consumed I got into Aggregation... what a powerful little thing this database has! I got really excited and started to test some things :)
I'm saving time entries for a companyId and employeeId ... that can have many entries... those are normally sorted by date, but one date can have several entries (multiple registrations in the same day)
I'm trying to come up with a good schema so I can easily get my data exactly how I need it, and as a newbie I would rather ask for guidance and check whether I'm on the right path.
My output should look like this:
[{
"company": "474A5D39-C87F-440C-BE99-D441371BF88C",
"employee": "BA75621E-5D46-4487-8C9F-C0CE0B2A7DE2",
"name": "Bruno Alexandre",
"registrations": [{
"id": 1448364,
"spanned": false,
"spannedDay": 0,
"date": "2019-01-17",
"timeStart": "09:00:00",
"timeEnd": "12:00:00",
"amount": {
"days": 0.4,
"hours": 2,
"km": null,
"unit": "days and hours",
"normHours": 5
},
"dateDetails": {
"week": 3,
"weekDay": 4,
"weekDayEnglish": "Thursday",
"holiday": false
},
"jobCode": {
"id": null,
"isPayroll": true,
"isFlex": false
},
"payroll": {
"guid": null
},
"type": "Sick",
"subType": "Sick",
"status": "APP",
"reason": "IS",
"group": "LeaveAndAbsence",
"note": null,
"createdTimeStamp": "2019-01-17T15:53:55.423Z"
}, /* more date entries */ ]
}, /* other employees */ ]
What is the best way to add the data into a collection?
Is it more efficient if I create a document per company/employee and add all registration entries inside that document (it could get really big as time passes)... or is it better to have one document per company/employee/date and add all daily events in that document instead?
Regarding aggregation, I'm still new to all this, but I'm imagining I could simply call
RegistrationsModel.aggregate([
{
$match: {
date: { $gte: new Date('2019-01-01'), $lte: new Date('2019-01-31') },
company: '474A5D39-C87F-440C-BE99-D441371BF88C'
}
},
{
$group: {
_id: '$employee',
name: { '$first': '$name' }
}
},
{
// ... get all registrations as an Array ...
},
{
$sort: {
'registrations.date': -1
}
}
]);
P.S. I'm taking the Aggregation course to get familiar with all of it.
Is it more efficient if I create a document per company/employee and
add all registration entries inside that document (it could get really
big as time passes)... or is it better to have one document per
company/employee/date and add all daily events in that document
instead?
From what I understand of document-oriented databases, I would say the aim is to have all the data you need, in a specific context, grouped inside one document.
So what you need to do is identify what data you're going to need (staying close to the features you want to implement) and build your data structure accordingly. Be sure to identify future features too, because the better your data structure anticipates them, the easier it will be to scale your database to your needs.
Your aggregation query looks OK!
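As a rough illustration of how the placeholder stage in the question could be filled in, the sketch below assumes one document per registration with `company`, `employee`, `name`, and a `date` stored as a real Date (all of this is an assumption about the schema, not something the question settles). It sorts before grouping and collects each employee's registrations with $push. On a single server the pushed array generally follows the order in which documents arrive from the preceding $sort, but treat that ordering as something to verify rather than a guarantee.

```javascript
// Sketch of a pipeline for an assumed "one document per registration" schema.
const companyId = '474A5D39-C87F-440C-BE99-D441371BF88C';

const pipeline = [
  // Restrict to one company and one month first so later stages see less data.
  {
    $match: {
      company: companyId,
      date: { $gte: new Date('2019-01-01'), $lte: new Date('2019-01-31') },
    },
  },
  // Sort before grouping so registrations are pushed newest first.
  { $sort: { date: -1 } },
  // One output document per employee, with all of their registrations in an array.
  {
    $group: {
      _id: '$employee',
      name: { $first: '$name' },
      registrations: { $push: '$$ROOT' },
    },
  },
];

// With Mongoose this would be passed as RegistrationsModel.aggregate(pipeline).
console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(' -> '));
```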

Querying the most recent posts in a MongoDB collection

Rather new to Mongodb/Mongoose/Node. Trying to make a query to retrieve the most recent posts (example being the 10 most recent posts) across all documents in a collection.
I tried querying this a few different ways.
MessageboardModel.find({"posts": {"time": {"$gte": ISODate("2014-07-02T00:00:00Z")}}} ...
I tried doing the above just to try getting to the proper nested time property, but everything I was trying throws an error. I'm definitely missing something here...
Here is an example document in the collection:
{
"_id": {
"$oid": "5c435d493dcf9281500cd177"
},
"movie": 433249,
"posts": [
{
"replies": [],
"_id": {
"$oid": "5c435d493dcf9281500cd142"
},
"username": "Username1",
"time": {
"$date": "2019-01-19T17:24:25.204Z"
},
"post": "This is a post title",
"content": "Content here."
},
{
"replies": [],
"_id": {
"$oid": "5c435d493dcf9281500cd123"
},
"username": "Username2",
"time": {
"$date": "2019-01-12T17:24:25.204Z"
},
"post": "This is another post made earlier",
"content": "Content here."
}
],
"__v": 0
}
There are many documents in the collection. I want to get, say the most recent 10 posts, across all of the documents in the entire collection.
Any help?
You can try an aggregation query.
Steps:
1> Match specific documents (optional).
2> Flatten the posts array with $unwind so that each post becomes its own document.
3> Sort by the time field of the posts.
4> Select fields, if only specific fields need to be shown.
5> Add a limit for how many posts you want.
<YOUR_MODEL>.aggregate([
{$match:{
"movie": 433249 // optional filter; remove this $match stage to search across all documents
}},
{$unwind:"$posts"}, // emits one document per element of the posts array
{$sort:{"posts.time":-1}}, // newest first; use 1 for oldest first
{$project:{posts:1}}, // keep only the posts field; remove or adjust if you need other fields
{$limit:10} // keep only the 10 most recent posts
]).exec();
Let me know if you still have any issues or get any errors.
Would love to help.

mongoDB - find first x documents, where rolling sum of their fields exceeds certain value

I have a mongoDB collection of documents like this:
{
"_id": 1,
"size": 10,
"name": "ABCD"
}
I would like to:
Sort them by "name" in ascending order
Return however many first documents from the result, where their cumulative "size" will be greater or equal to 100
I have briefly looked into $redact stage of aggregation framework, but I can't figure out whether I can store the cumulative sum outside the document. What would be the best approach to solve this problem?
EDIT:
An example collection:
{ "name": "AAAA", "size": 2}
{ "name": "BBBB", "size": 4}
{ "name": "CCCC", "size": 3}
So the query would be designed to return the first X documents, in order of their appearance, when their cumulative size reaches 6.
So output will be (because 2+4 is 6):
{ "name": "AAAA", "size": 2}
{ "name": "BBBB", "size": 4}
The only thing I can think of is to use the Cursor on the application level, and keep adding documents to result set, incrementing the "size" counter by value in the document. But is there a way to do that using Aggregation framework, for example?
EDIT2:
I also came across the 'rolling sum' terminology and using map-reduce. Sadly, in my case I would want the map-reduce operation to terminate when a global scope variable gets to or over a certain value, and I don't think it's possible (mapReduce will go over all documents fed to it at the outset).
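Two possible approaches, sketched under assumptions rather than taken from the post: on MongoDB 5.0 or later, $setWindowFields can compute a running total inside the pipeline, and the application-level cursor idea mentioned above can be written as a small loop. `runningSize`, `target`, and `takeUntilSum` are hypothetical names. Note that the pipeline version still computes the running sum over every document it is given, whereas the loop can stop reading as soon as the target is reached.

```javascript
const target = 6;

// Aggregation sketch (requires MongoDB 5.0+ for $setWindowFields).
const pipeline = [
  {
    $setWindowFields: {
      sortBy: { name: 1 },
      output: {
        // Sum of `size` from the first document up to and including the current one.
        runningSize: { $sum: '$size', window: { documents: ['unbounded', 'current'] } },
      },
    },
  },
  // Keep a document only if the total *before* it was still below the target.
  { $match: { $expr: { $lt: [{ $subtract: ['$runningSize', '$size'] }, target] } } },
];

// Application-level equivalent: accepts any iterable already sorted by name
// and stops as soon as the cumulative size reaches the limit.
function takeUntilSum(docs, limit) {
  const result = [];
  let total = 0;
  for (const doc of docs) {
    if (total >= limit) break;
    result.push(doc);
    total += doc.size;
  }
  return result;
}

const sample = [
  { name: 'AAAA', size: 2 },
  { name: 'BBBB', size: 4 },
  { name: 'CCCC', size: 3 },
];
console.log(takeUntilSum(sample, target).map((d) => d.name));
```

With the example collection from the question and a target of 6, the loop returns AAAA and BBBB and never reads CCCC.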

How to join two collection in mongo without lookup

I have two collections, named post and comment.
Their structure is shown below.
I want to query posts with the aggregation framework and sort them by the total number of likes on their comments. Currently I can get the like total for a single post's comments with the query shown below.
My question is how I can query post and join the comment collection in Mongo version 2.6. I know that Mongo 3.2 and later have $lookup.
I want to query the post collection and sort by the like counts of the related comments. Is there a good way to do this in Mongo 2.6?
post
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
}
comment
/* 1 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello world",
"like": [
"2"
]
}
/* 2 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello stackoverflow",
"like": [
"1",
"2"
]
}
Query for the sum of likes on a post's comments
db.getCollection('comment').aggregate([
{
"$match": {
post_id: "5a39e22c27308912334b4567"
}
},
{
"$project": {
"likeLength": {
"$size": "$like"
},
"post_id": "$post_id"
}
},
{
"$group": {
_id: "$post_id",
"likeLengthSum": {
"$sum": "$likeLength"
}
}
}
])
There is no "best" way to query, as it'll really depend on your specific needs, but... you cannot perform a single query across multiple collections (aside from the $lookup aggregation pipeline function in later versions, as you already are aware).
You'll need to make multiple queries: one to your post collection, and one to your comment collection.
If you must perform a single query, then consider storing both types of documents in a single collection (with some identifier property to let you filter on either posts or comments, within your query).
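A minimal sketch of the two-query approach, with the merge done in application code: run the aggregation from the question without its $match stage to get a like total per post_id, fetch the posts separately, then combine and sort. `sortPostsByLikes` is a hypothetical helper, and the sketch assumes a Node.js application where String(post._id) yields the same hex string that is stored in comment.post_id.

```javascript
// `posts` stands in for the result of a find() on the post collection and
// `likeSums` for the output of the $group stage on the comment collection
// (documents shaped like { _id: <post_id string>, likeLengthSum: <number> }).
function sortPostsByLikes(posts, likeSums) {
  const sumByPostId = new Map(likeSums.map((row) => [row._id, row.likeLengthSum]));
  return posts
    // Posts without any comments get a like total of 0.
    .map((post) => ({ ...post, likeLengthSum: sumByPostId.get(String(post._id)) || 0 }))
    // Highest like total first.
    .sort((a, b) => b.likeLengthSum - a.likeLengthSum);
}

// Demo with made-up data; plain strings stand in for ObjectId values.
const demoPosts = [
  { _id: 'p1', content: 'first' },
  { _id: 'p2', content: 'second' },
  { _id: 'p3', content: 'third' },
];
const demoLikeSums = [
  { _id: 'p2', likeLengthSum: 3 },
  { _id: 'p1', likeLengthSum: 1 },
];
console.log(sortPostsByLikes(demoPosts, demoLikeSums).map((p) => p._id));
```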
There is no other way to join collections in the current MongoDB v6 without $lookup.
I can think of two reasons why you might be running into issues with it:
$lookup is slow and expensive - how to improve performance?
$lookup optimization:
Follow the guideline provided in the documentation
Use indexes:
You can index the fields of the referenced collection. Based on your sample data, you can create an index on the post_id field, an index on the uid field, or a compound index on both fields, depending on your use cases
You can read more about How to Improve Performance with Indexes and Document Filters
db.comment.createIndex({ "post_id": -1 });
db.comment.createIndex({ "uid": -1 });
// or
db.comment.createIndex({ "post_id": -1, "uid": -1 });
Document Filters:
Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline
You can refer to the documentation for more detailed examples
{ $skip: 0 },
{ $limit: 10 } // as per your use case
Limit the $lookup result:
Limit the result of the lookup with a $limit stage.
Balance the query against what the UI or use case actually needs to display.
You want to avoid $lookup - How to improve the collection schema to avoid $lookup?
Store the analytics/metrics:
If you are trying to get the total count of the comments on a particular post, you can store the total count in the post collection and update it whenever the post gets a new comment
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10
}
Store minimum reference data:
If you want to show the comments of a particular post, you can limit the result, for example showing 5 comments per post.
You can also store up to 5 of the latest comments in the post collection to avoid the $lookup: whenever a new comment arrives, add it and remove the oldest of the 5 stored comments
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10,
"comments": [
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"comment": "hello world"
},
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"comment": "hello stackoverflow"
}
]
}
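As a sketch of how both stored fields could be maintained in one write, the update below increments total_comments and uses $push with $each and $slice so that only the 5 most recent comments remain embedded. `buildNewCommentUpdate` is a hypothetical helper name, and the comment shape follows the sample document above.

```javascript
// Builds the update document to apply to a post when it receives a new comment.
// Hypothetical helper; the update operators themselves are standard MongoDB.
function buildNewCommentUpdate(comment) {
  return {
    // Keep the running comment count on the post itself.
    $inc: { total_comments: 1 },
    // Append the new comment, then trim the array to its last 5 elements.
    $push: { comments: { $each: [comment], $slice: -5 } },
  };
}

// Intended use in the mongo shell (postId is the _id of the commented post):
// db.post.updateOne({ _id: postId }, buildNewCommentUpdate(newComment))

const update = buildNewCommentUpdate({ uid: '3', comment: 'hello again' });
console.log(JSON.stringify(update));
```

The full comment still has to be inserted into the comment collection separately; the embedded array is only a cache of the latest few.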
Must read about Reduce $lookup Operations
Must read about Improve Your Schema