Next.js/MongoDB - Query Optimization - mongodb

I am building a website using Next.js and MongoDB. On one of the site's pages, I have implemented filters to help search for products. To retrieve and update the filters (the item counts are updated each time a filter changes), I have an API endpoint which queries my MongoDB collection. This collection contains ~200,000 items. Each item has several fields such as brand, model, place, etc.
I have 9 fields which I use to filter, and these must be fetched through my API each time there is a change. Therefore I have 9 queries running through my API, one for each field/filter, and the query on MongoDB looks like:
var models = await db_collection
.aggregate([
{
$match: {
$and: [filter],
},
},
{
$group: { _id: '$model', count: { $sum: 1 } },
},
{ $sort: { _id: 1 } },
])
.toArray();
The problem is that, as 9 queries are running, updating the page takes ~4 seconds (mainly due to the queries), which is too long. I would like to get below 1 second. I would like to know if there is a good practice I am missing, such as running one query instead of one per filter, or an optimization on the database side.
Thank you,
I have tried adding a $project stage before $group in the aggregation pipeline to reduce the number of fields returned, and using distinct followed by a sort instead of aggregate, but none of these seems to improve efficiency.
EDIT :
As suggested by R2D2, I am posting the structure of a document on MongoDB in my collection :
{
_id : ObjectId('example_id')
source : string
date : date
brand : string
family : string
model : string
size : string
color : string
condition : string
contact : string
SKU : string
}
Depending on the page, I query the unique values of each field of interest (source, date, brand, family, model, size, color, condition, contact) and their counts depending on the filters (e.g. the count for each unique value of model for the selected brands). I also query documents based on specific values of these fields.

As mentioned, your indexes are important, and if you are querying by those fields I recommend creating compound indexes. See here for index optimisation: https://learnmongodbthehardway.com/schema/indexes/
As far as the aggregation pipeline goes, nothing is out of the ordinary, but this specific aggregation only returns the number of items per model matching the criteria, not the matching documents. If that is all the data you need, you might find it useful to create a new collection in which you pre-calculate the counts for common searches once a day (how many items have the color black, ...). This way, when the page loads, you don't have to scan your 200k+ items, only your pre-calculated statistics collection. Schedule a cron task or use a lambda function to invoke a route on your API that calculates all your stats once a day and upserts them into the new collection.
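The daily pre-calculation could look roughly like the sketch below. It only builds the pipeline; the helper name buildStatsPipeline, the target collection prefix filter_stats_ and the use of the $merge stage (which needs MongoDB 4.2 or later) are assumptions for illustration, not part of the original setup.

```javascript
// Hypothetical helper: builds a pipeline that counts the items per value of
// one field and writes the result into a small statistics collection.
function buildStatsPipeline(field, targetPrefix) {
  return [
    // One output document per distinct value, e.g. { _id: "BLACK", count: 1234 }
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    // Upsert the counts into e.g. "filter_stats_color" (assumes MongoDB 4.2+).
    {
      $merge: {
        into: targetPrefix + field,
        whenMatched: "replace",
        whenNotMatched: "insert",
      },
    },
  ];
}

const statsPipeline = buildStatsPipeline("color", "filter_stats_");

// Intended use from the scheduled job (assumes a connected collection handle):
//   await db_collection.aggregate(statsPipeline).toArray();
```

The page would then read from the small statistics collection instead of grouping over the full item collection. Note that counts stored this way are per single field; counts for arbitrary combinations of filters would still need a live query.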
Also, I believe the explicit $and is unnecessary, since you can rely on the implicit $and. You can query with an object like:
{
color : {$in : ['BLACK', 'BLUE']},
size : 3
}
rather than :
[{color : 'BLACK'}, {color : 'BLUE'}, {size : 3}]
Reserve the explicit $and for when you really need it.
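On the question of running one query instead of nine: one option is a single aggregation with a $facet stage, so that $match runs once and each field gets its own $group/$sort sub-pipeline. This is a minimal sketch, not a measured improvement. The helper name buildFacetPipeline is hypothetical, $facet needs MongoDB 3.4 or later, and whether it actually beats nine indexed queries should be checked with explain on real data.

```javascript
// Hypothetical helper: one aggregation that returns the counts for every
// filter field at once, instead of one aggregation per field.
function buildFacetPipeline(filter, fields) {
  const facets = {};
  for (const field of fields) {
    // Each facet is the same $group/$sort pair used in the question.
    facets[field] = [
      { $group: { _id: "$" + field, count: { $sum: 1 } } },
      { $sort: { _id: 1 } },
    ];
  }
  // $match runs once, using the implicit $and; $facet then fans out.
  return [{ $match: filter }, { $facet: facets }];
}

const filter = { color: { $in: ["BLACK", "BLUE"] }, size: 3 };
const facetPipeline = buildFacetPipeline(filter, ["brand", "model", "color"]);

// Intended use (assumes the db_collection handle from the question):
//   const [counts] = await db_collection.aggregate(facetPipeline).toArray();
//   counts.model is then an array of { _id: <model>, count: <n> } documents.
```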

Related

Inner query using mongo template

I am new to MongoDB and Spring mongotemplate. I would like to build a query using mongotemplate whose equivalent in Postgres would be
select * from feedback
where feedback.outletId in (
select outletId from feedback
where feedback.createdOn >= '2013-05-03'::date
)
Is this even possible in MongoDB?
Well, there is no concept of inner queries in MongoDB, so basically it can be achieved with 2 queries, but you probably already know that and want a 'better' solution. Since you asked if it is possible, I think it can be achieved with aggregation, although that can be tricky.
db.feedback.aggregate([
{$project : {
'outletId' : 1,
'feedback._id' : '$_id',
'feedback.createdOn' : '$createdOn',
'feedback.a' : '$a'
}},
{$group : {
_id : '$outletId',
feedbacks : {$addToSet : '$feedback'}
}},
{$match : {
'feedbacks.createdOn' : {
$gte : ISODate('2013-05-03')}
}},
{$unwind : '$feedbacks'}]);
You can add one more $project stage at the end to turn the child object back into top-level fields, as in the original document. I know it doesn't look pretty, so let me explain it stage by stage:
first, we project a document that puts all the needed fields inside a child field called feedback,
in the second stage, we group by outletId and put all the child feedback objects into an array named feedbacks (so for each outletId we get all of its feedbacks),
in the third stage, we use $match to drop the outletIds that do not have a single feedback in the array whose createdOn is on or after the given date,
once those outletIds are filtered out, we call $unwind to get each child in the feedbacks array as a separate document.
As for mongoTemplate: yes, it accepts all of these aggregation stages, including the nesting of fields under feedback in the first stage. Just look at some examples of TypedAggregation.
If you are saving the createdOn field as a string instead of a timestamp or ISODate, even normal Mongo range queries won't work on it the way they do in your Postgres example.
Hope it helps.
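For comparison, the two-query approach mentioned at the start of this answer can be sketched in memory. The sample documents and the function name feedbackForRecentOutlets are made up for illustration; the comments show the shell calls each step would correspond to.

```javascript
// In-memory stand-in for the feedback collection (sample data is made up).
const feedback = [
  { _id: 1, outletId: "A", createdOn: new Date("2013-04-01") },
  { _id: 2, outletId: "A", createdOn: new Date("2013-05-10") },
  { _id: 3, outletId: "B", createdOn: new Date("2013-01-15") },
];

function feedbackForRecentOutlets(docs, since) {
  // Step 1 (the inner query): outlets with at least one feedback on or after `since`.
  // Shell equivalent: db.feedback.distinct("outletId", { createdOn: { $gte: since } })
  const outletIds = new Set(
    docs.filter((d) => d.createdOn >= since).map((d) => d.outletId)
  );
  // Step 2 (the outer query): every feedback that belongs to those outlets.
  // Shell equivalent: db.feedback.find({ outletId: { $in: [...outletIds] } })
  return docs.filter((d) => outletIds.has(d.outletId));
}

const matched = feedbackForRecentOutlets(feedback, new Date("2013-05-03"));
```

Outlet A has one feedback after the cut-off date, so both of its feedbacks are returned, including the older one; outlet B is dropped entirely. That is the same semantics as the SQL subquery in the question.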

How to store an ordered set of documents in MongoDB without using a capped collection

What's a good way to store a set of documents in MongoDB where order is important? I need to easily insert documents at an arbitrary position and possibly reorder them later.
I could assign each item an increasing number and sort by that, or I could sort by _id, but I don't know how I could then insert another document in between other documents. Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
My first guess would be to increment the sequence of all of the following elements so that there would be space for the new element using a query something like db.items.update({"sequence":{$gte:6}}, {$inc:{"sequence":1}}). My limited understanding of Database Administration tells me that a query like that would be slow and generally a bad idea, but I'm happy to be corrected.
I guess I could set the new element's sequence to 5.5, but I think that would get messy rather quickly. (Again, correct me if I'm wrong.)
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet again, I might be wrong about that one too.)
I could have each document contain a reference to the next document, but that would require a query for each item in the list. (You'd get an item, push it onto the results array, and get another item based on the next field of the current item.) Aside from the obvious performance issues, I would also not be able to pass a sorted mongo cursor to my {#each} spacebars block expression and let it live update as the database changed. (I'm using the Meteor full-stack javascript framework.)
I know that everything has its advantages and disadvantages, and I might just have to use one of the options listed above, but I'd like to know if there is a better way to do things.
Based on your requirement, one of the approaches could be to design your schema, in such a way that each document has the capability to hold more than one document and in itself act as a capped container.
{
"_id":Number,
"doc":Array
}
Each document in the collection will act as a capped container, and the documents will be stored as array in the doc field. The doc field being an array, will maintain the order of insertion.
You can limit the number of documents per container to n. The _id field of each container document then increases in steps of n, where n is the number of documents a container document can hold.
By doing this you avoid adding extra fields to the document, extra indices, and unnecessary sorts.
Inserting the very first record
i.e when the collection is empty.
var record = {"name" : "first"};
db.col.insert({"_id":0,"doc":[record]});
Inserting subsequent records
Identify the last container document's _id, and the number of
documents it holds.
If the number of documents it holds is less than n, then update the
container document with the new document, else create a new container
document.
Say that each container document can hold at most 5 documents, and we want to insert a new document.
var record = {"name" : "newlyAdded"};
// using aggregation, get the _id of the last inserted container, and the
// number of record it currently holds.
db.col.aggregate( [ {
// sort by _id so that $last below refers to the newest container
$sort : { "_id" : 1 }
}, {
$group : {
"_id" : null,
"max" : {
$max : "$_id"
},
"lastDocSize" : {
$last : "$doc"
}
}
}, {
$project : {
"currentMaxId" : "$max",
"capSize" : {
$size : "$lastDocSize"
},
"_id" : 0
}
// once obtained, check if you need to update the last container or
// create a new container and insert the document in it.
} ]).forEach( function(check) {
if (check.capSize < 5) {
print("updating");
// UPDATE
db.col.update( {
"_id" : check.currentMaxId
}, {
$push : {
"doc" : record
}
});
} else {
print("inserting");
//insert
db.col.insert( {
"_id" : check.currentMaxId + 5,
"doc" : [ record ]
});
}
})
Note that the aggregation runs on the server side and is very efficient. Also note that in versions before 2.6 the aggregation returns a single document rather than a cursor, so there you would need to modify the above code to read from that document instead of iterating a cursor.
Inserting a new document in between documents
Now, if you would like to insert a new document between documents 1 and 2, we know that the document should fall inside the container with _id=0 and should be placed in the second position in the doc array of that container.
So we make use of the $each and $position operators to insert at a specific position.
var record = {"name" : "insertInMiddle"};
db.col.update(
{
"_id" : 0
}, {
$push : {
"doc" : {
$each : [record],
$position : 1
}
}
}
);
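Working out which container and which array position a document belongs to can be sketched as below. The helper names locateSlot and buildPositionalInsert are hypothetical, and the arithmetic assumes that every container before the target one is completely full.

```javascript
// Hypothetical helper: maps a zero-based position in the overall ordering to
// a container _id and an index inside that container's doc array.
// Assumes all containers before the target are full (hold `capacity` items).
function locateSlot(globalIndex, capacity) {
  return {
    containerId: Math.floor(globalIndex / capacity) * capacity,
    position: globalIndex % capacity,
  };
}

// Builds the filter and update documents for db.col.update(filter, update).
function buildPositionalInsert(record, globalIndex, capacity) {
  const slot = locateSlot(globalIndex, capacity);
  return {
    filter: { _id: slot.containerId },
    update: { $push: { doc: { $each: [record], $position: slot.position } } },
  };
}

// Inserting between the first and second documents, with 5 documents per container:
const op = buildPositionalInsert({ name: "insertInMiddle" }, 1, 5);
// Intended use in the shell: db.col.update(op.filter, op.update);
```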
Handling Over Flow
Now, we need to take care of documents overflowing in each container, say we insert a new document in between, in container with _id=0. If the container already has 5 documents, we need to move the last document to the next container and do so till all the containers hold documents within their capacity, if required at last we need to create a container to hold the overflowing documents.
This complex operation should be done on the server side. To handle this, we can create a script such as the one below and register it with mongodb.
db.system.js.save( {
"_id" : "handleOverFlow",
"value" : function handleOverFlow(id) {
var currDocArr = db.col.find( {
"_id" : id
})[0].doc;
print(currDocArr);
var count = currDocArr.length;
var nextColId = id + 5;
// check if the collection size has exceeded
if (count <= 5)
return;
else {
// need to take the last doc and push it to the next capped
// container's array
print("updating collection: " + id);
var record = currDocArr.splice(currDocArr.length - 1, 1);
// update the next collection
db.col.update( {
"_id" : nextColId
}, {
$push : {
"doc" : {
$each : record,
$position : 0
}
}
}, {
// create the next container if it does not exist yet
"upsert" : true
});
// remove from original collection
db.col.update( {
"_id" : id
}, {
"doc" : currDocArr
});
// check overflow for the subsequent containers, recursively.
handleOverFlow(nextColId);
}
}
});
After every insertion in between, we can then invoke this function by passing the container id: handleOverFlow(containerId).
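To see what the overflow handling does without a database, here is an in-memory model of the same cascade. It is a sketch under stated assumptions: containers are kept in a Map keyed by container _id, the function name handleOverflowInMemory is made up, and it loops instead of recursing.

```javascript
// In-memory model of the overflow cascade: while a container holds more than
// `capacity` items, move its last item to the front of the next container.
function handleOverflowInMemory(containers, id, capacity) {
  let currentId = id;
  while (containers.has(currentId) && containers.get(currentId).length > capacity) {
    const nextId = currentId + capacity;
    // Take the last document out of the overflowing container...
    const moved = containers.get(currentId).pop();
    // ...create the next container if it does not exist yet...
    if (!containers.has(nextId)) containers.set(nextId, []);
    // ...and put the document at the front of it ($position: 0).
    containers.get(nextId).unshift(moved);
    currentId = nextId;
  }
  return containers;
}

// Container 0 holds 6 items after an insert in the middle; container 5 is full.
const containers = new Map([
  [0, ["a", "x", "b", "c", "d", "e"]],
  [5, ["f", "g", "h", "i", "j"]],
]);
handleOverflowInMemory(containers, 0, 5);
```

In this example "e" moves to the front of container 5, which in turn pushes "j" into a newly created container 10.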
Fetching all the records in order
Just use the $unwind operator in the aggregate pipeline.
db.col.aggregate([{$unwind:"$doc"},{$project:{"_id":0,"doc":1}}]);
Re-Ordering Documents
You can store each document in a capped container with an "_id" field:
{ ..., "doc" : [ { "_id" : 0, "name" : "xyz", ... }, ... ], ... }
Get hold of the "doc" array of the capped container of which you want
to reorder items.
var docArray = db.col.find({"_id":0})[0].doc;
Update their ids so that after sorting the order of the item will change.
Sort the array based on their _ids.
docArray.sort( function(a, b) {
return a._id - b._id;
});
Write the new doc array back to the capped container.
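The steps above renumber the embedded _id values and then sort. A different way to get the same reordering, swapped in here because it is shorter, is to move the element directly with splice and write the array back. The helper name moveItem is hypothetical.

```javascript
// Returns a copy of the array with the element at fromIndex moved to toIndex.
function moveItem(docArray, fromIndex, toIndex) {
  const copy = docArray.slice();
  const [item] = copy.splice(fromIndex, 1); // take the element out
  copy.splice(toIndex, 0, item);            // put it back at its new place
  return copy;
}

const reordered = moveItem([{ name: "a" }, { name: "b" }, { name: "c" }], 2, 0);

// Intended use in the shell, writing the reordered array back to container 0:
//   db.col.update({ "_id" : 0 }, { $set : { "doc" : reordered } });
```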
But then again, everything boils down to which approach is feasible and suits your requirement best.
Coming to your questions:
What's a good way to store a set of documents in MongoDB where order is important?I need to easily insert documents at an arbitrary
position and possibly reorder them later.
Documents as Arrays.
Say I want to insert something between an element with a sequence of 5 and an element with a sequence of 6?
use the $each and $position operators in the db.collection.update() function as depicted in my answer.
My limited understanding of Database Administration tells me that a
query like that would be slow and generally a bad idea, but I'm happy
to be corrected.
Yes. It would impact performance unless the collection holds very little data.
I could use a capped collection, which has a guaranteed order, but then I'd run into issues if I needed to grow the collection. (Yet
again, I might be wrong about that one too.)
Yes. With Capped Collections, you may lose data.
An _id field in MongoDB is a unique, indexed key similar to a primary key in relational databases. If there is an inherent order in your documents, ideally you should be able to associate a unique key to each document, with the key value reflecting the order. So while preparing your document for insertion, explicitly add an _id field as this key (if you do not, mongo creates it automatically with a BSON objectid).
As far as retrieving the results is concerned, MongoDB does not guarantee the order of returned documents unless you explicitly use .sort(). If you do not use .sort(), the results are usually returned in natural order (order of insertion). Again, there is no guarantee of this behavior.
I'd advise you to override _id with your order while inserting, and use a sort while retrieving. Since _id is a necessary and auto-indexed entity, you will not be wasting any space defining a sort key, and storing the index for it.
For arbitrary sorting of any collection, you'll need a field to sort on. I call mine "sequence".
schema:
{
_id: ObjectID,
sequence: Number,
...
}
db.items.createIndex({sequence:1});
db.items.find().sort({sequence:1})
Here is a link to some general sorting database answers that may be relevant:
https://softwareengineering.stackexchange.com/questions/195308/storing-a-re-orderable-list-in-a-database/369754
I suggest going with Floating point solution - adding a position column:
Use a floating-point number for the position column.
You can then reorder the list changing only the position column in the "moved" row.
If your user wants to position "red" after "blue" but before "yellow", then you just need to calculate
red.position = ((yellow.position - blue.position) / 2) + blue.position
After a few re-positions in the same place (cutting the gap in half every time) you might hit the limit of floating-point precision, so once the gap falls below a certain threshold it is better to renumber the whole list.
When retrieving the list you can simply sort by the position column to get it in order; no client-side code is needed (unlike with the linked-list solution).
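A minimal sketch of the floating-point approach follows. The function names and the gap threshold of 1e-9 are arbitrary choices for illustration, not values from the linked answer.

```javascript
// Position halfway between two neighbours (the formula above).
function positionBetween(before, after) {
  return before + (after - before) / 2;
}

// True when two neighbours are so close that the list should be renumbered.
function needsRenumbering(before, after, minGap = 1e-9) {
  return after - before < minGap;
}

// Reassigns whole-number positions 1, 2, 3, ... to items already sorted by position.
function renumber(sortedItems) {
  return sortedItems.map((item, index) => ({ ...item, position: index + 1 }));
}

const blue = { name: "blue", position: 5 };
const yellow = { name: "yellow", position: 6 };
const red = { name: "red", position: positionBetween(blue.position, yellow.position) };
const renumbered = renumber([blue, red, yellow]);
```

Only the moved row is updated on a normal reorder; renumber touches every row, so it should run only when needsRenumbering reports that the gap has become too small.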

Return only matching subdocs with Mongo aggregation

I have a schema with subdocs.
// Schema
var company = {
_id: ObjectId,
publish: Boolean,
divisions: {
employees: [ObjectId]
}
};
I need to find all the subdocs (divisions) that match my query. It appears that I have to use 2 matches - one to filter out initial docs and a second one to filter out the matching subdocs from the resulting $unwind operation. Is there a more efficient way?
// Query
this.aggregate({
$match: {
'publish': 1,
'divisions.employees': new ObjectId(userid)
}
}, {
$unwind: '$divisions'
}, {
$match: {
'divisions.employees': new ObjectId(userid)
}
});
I found this ticket but I am unsure this does what I need.
Doing both matches is the right thing here. You could eliminate the first match stage and just unwind, but having an initial $match lets you narrow the pipeline down to exactly those documents that will produce at least one output document (i.e. those documents for which publish : true and some employees ObjectId matches the given ObjectId). It can also use an index, such as one on { "publish" : 1, "divisions.employees" : 1 }, so the first match stage runs quickly.
However, you should ask yourself why you are using the aggregation pipeline here and if it is appropriate. Will you commonly be querying for a given employee that's part of a company with publish : 1? Is this one of the main queries for your use case? If it's infrequent or not critical then the aggregation is a good way to do it. Otherwise, you should reconsider the schema. You could make this query easy with a schema like
{
"_id" : ObjectId,
"publish" : Boolean,
"company" : (unique identifier, possibly a String or ObjectId)
}
that models employees as documents and denormalizes company information into the employee document. Then your query is as easy as
db.employees.find({ "_id" : ObjectId(userid), "publish" : true })
which will be much quicker than the aggregation (not that the aggregation will be slow; this is just relatively quicker). I'm not telling you to do it this way - use your own knowledge of your use case to make the right call.

Reference an _id in a Subdocument in another Collection Mongodb

I am developing an application with MongoDB and Node.js.
I should also mention that I am new to both, so please bear with me.
My database has a categories collection, and in each category I store the products as subdocuments,
like this:
{
_id : ObjectId(),
name: String,
type: String,
products : [{
_id : ObjectId(),
name : String,
description : String,
price : String
}]
}
When it comes to storing the orders, the orders collection looks like this:
{
receiver : String,
status : String,
subOrders : [
{
products :[{
productId : String,
name : String,
price : String,
status : String
}],
tax : String,
total : String,
status : String,
orderNote : String
}
]
}
As you can see, in orders we store the _id of each product, which is a subdocument of categories.
Storing is obviously not a problem, and fetching is fine too as long as we only need a few fields such as name or price. But if we later need extra fields from products, such as description, they are not stored in orders.
My question is this:
Is there an easy way to access the other fields of a product without looping through the whole categories collection? In other words, I need sample code for querying the description of a product when I only have its _id.
Or is our design wrong, so that I have to redesign it from scratch and move the products out of categories into their own collection?
Please don't post links to websites or blogs that talk about MongoDB and its collections in general, unless they focus on an issue very similar to mine.
Thanks in advance.
I'd assume that you want to return as many product descriptions as match the current list of products. First, there isn't a find query that returns only the matching array elements. Using $elemMatch you can return a specific element or the first match, but not all of the matching elements and nothing else. Still, $elemMatch can be used as a projection operator:
// $elemMatch in the projection returns only the first product whose _id matches
db.categories.find({ "products._id" : "PID1" },
{ products : { $elemMatch : { _id : "PID1" } } })
You'd definitely want to index the "products._id" field to achieve reasonable performance.
You might consider instead creating a products collection where each document contains a category identifier, much like you would in a relational database. This is a common pattern in MongoDB when embedding doesn't make sense, or complicates queries and aggregations.
Assuming that is true:
You'll need to load the data from the second collection yourself, since find does not join collections (the $lookup aggregation stage, added in MongoDB 3.2, is the exception). You might consider using $in, which takes a list of values for a field and loads all matching documents.
Depending on the driver you're using to access MongoDB, you should be able to use the projection feature of find, which can limit the fields returned for a document to just those you've specified.
As product descriptions aren't likely to change frequently, you might also consider caching the values for a period on the client (a web server, for example).
db.products.find({ _id: { $in : [ 'PID1', 'PID2'] } }, { description : 1 })
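Loading the data manually then comes down to collecting the product ids from the order, fetching them with one $in query, and merging the results in application code. The sketch below shows the merge step in memory; the function name attachDescriptions and the sample data are made up for illustration.

```javascript
// Merges descriptions fetched from a separate products collection into the
// product entries of an order, matching on productId.
function attachDescriptions(orderProducts, productDocs) {
  // Index the fetched products by _id for constant-time lookups.
  const byId = new Map(productDocs.map((p) => [String(p._id), p]));
  return orderProducts.map((item) => {
    const product = byId.get(String(item.productId));
    return { ...item, description: product ? product.description : null };
  });
}

const orderProducts = [
  { productId: "PID1", name: "Chair", price: "10" },
  { productId: "PID2", name: "Table", price: "40" },
];
// The ids to put into the $in query shown above.
const productIds = orderProducts.map((item) => item.productId);
// Stand-in for the result of that query (PID2 is deliberately missing).
const fetched = [{ _id: "PID1", description: "A wooden chair" }];
const enriched = attachDescriptions(orderProducts, fetched);
```

Products that are no longer in the collection end up with a null description instead of causing an error.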

MongoDB find repeated values in an array

Say I have a collection with documents like—
{
'name': 'Hawaiian',
'toppings': ['ham', 'cheese', 'pineapple'],
}
Or—
{
'name': 'Peperonni',
'toppings': ['cheese', 'pepperoni'],
}
How can I get a list of all toppings that appear in more than one document? So, for the two documents above, it'd be cheese.
Ideally as "close" to the database as possible—I know I can get a list of all toppings with distinct, then loop through all documents at the application level, but that'd be too expensive.
Thanks!
It is a long query, but take a look.
This uses the aggregation framework in MongoDB 2.2:
db.test2.aggregate({$project:{"toppings":1, "_id":0}}, {$unwind:"$toppings"}, {$group:{"_id":"$toppings", count:{$sum:1}}}, {$match:{count:{$gt:1}}}, {$project:{"_id":1}})
{ "result" : [ { "_id" : "cheese" } ], "ok" : 1 }
The query, step by step:
Keep only the toppings field
Expand (unwind) all the values in toppings
Group by topping value and count the occurrences
Keep only the values whose count is greater than 1
Output only the value (the topping); the count is not needed.
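The same steps can be traced in plain JavaScript on the two sample documents, which may help when checking what each stage contributes. This is an in-memory illustration only, and the function name repeatedToppings is made up.

```javascript
// In-memory equivalent of the $unwind / $group / $match / $project steps.
function repeatedToppings(pizzas) {
  const counts = new Map();
  for (const pizza of pizzas) {
    // $unwind: visit each topping of each document separately.
    for (const topping of pizza.toppings) {
      // $group with $sum: count how often each topping occurs.
      counts.set(topping, (counts.get(topping) || 0) + 1);
    }
  }
  // $match on count > 1, then $project to keep only the topping name.
  return [...counts].filter(([, count]) => count > 1).map(([topping]) => topping);
}

const repeated = repeatedToppings([
  { name: "Hawaiian", toppings: ["ham", "cheese", "pineapple"] },
  { name: "Peperonni", toppings: ["cheese", "pepperoni"] },
]);
```

Like the pipeline, this counts occurrences rather than documents, so a topping listed twice inside a single document would also be reported.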
I would get the list of all toppings, and then check each one with
db.coll.find({"toppings": topping}).count() > 1
Note that I tried this in the mongo shell, and while the pymongo syntax would be exactly the same, I'm not sure where the count is implemented - in pymongo or in the database.
[EDIT]
pymongo seems to delegate the count() to mongodb, so that instead of a full query, the count operation is performed by the database.