I am new to MongoDB and Spring's MongoTemplate. I would like to build a query using MongoTemplate whose equivalent in Postgres would be:
select * from feedback
where feedback.outletId in (
select outletId from feedback
where feedback.createdOn >= '2013-05-03'::date
)
Is this even possible in MongoDB?
Well, there is no concept of inner queries (subqueries) in MongoDB, so basically this can be achieved with two queries, but you probably already know that and want a 'better' solution. Since you asked whether it is possible: I think it can be achieved with aggregation, although that can be tricky.
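For comparison, the two-query approach is sketched below. This is a minimal sketch that assumes the Node.js driver and the field names from the question. The helper names are hypothetical, and the helpers only build the filter documents, so the two round trips appear only in the usage comment.

```javascript
// Hypothetical helpers for the two-query approach; they only build the filters.

// Query 1: feedback created on or after `since`. Passed to
// distinct("outletId", ...), it yields the outlet ids of the SQL subquery.
function recentOutletFilter(since) {
  return { createdOn: { $gte: since } };
}

// Query 2: every feedback whose outletId is in the list returned by query 1,
// the equivalent of the SQL "outletId IN (...)".
function feedbackForOutletsFilter(outletIds) {
  return { outletId: { $in: outletIds } };
}

// Usage with the Node.js driver, inside an async function (assumes a connected `db`):
//   const feedback = db.collection("feedback");
//   const ids = await feedback.distinct("outletId", recentOutletFilter(new Date("2013-05-03")));
//   const docs = await feedback.find(feedbackForOutletsFilter(ids)).toArray();
```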
db.feedback.aggregate([
{$project : {
'outletId' : 1,
'feedback._id' : '$_id',
'feedback.createdOn' : '$createdOn',
'feedback.a' : '$a'   // add one line like this for every other field you need
}},
{$group : {
_id : '$outletId',
feedbacks : {$addToSet : '$feedback'}
}},
{$match : {
'feedbacks.createdOn' : {
$gte : ISODate('2013-05-03')}
}},
{$unwind : '$feedbacks'}]);
You can add one more $project stage at the end to turn the child object back into top-level fields, as they were in the original document. I know it doesn't look pretty, so let me explain it stage by stage:
First, we project each document, putting all the needed fields inside a child field called feedback.
In the second stage, we group by outletId and collect all the child feedback objects into an array named feedbacks (so for each outletId we get all of its feedback).
In the third stage, we use $match to filter out every outletId whose feedbacks array does not contain even a single feedback with a createdOn greater than or equal to the given date.
After those outletIds are filtered out, we call $unwind to turn each element of the feedbacks array back into a single document.
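The builder below puts the whole pipeline together, including the extra $project stage mentioned above that lifts the child fields back to the top level. It is a sketch under some assumptions. The function name and the fields parameter (the feedback fields to carry through) are hypothetical. fields must include createdOn, because the $match stage filters on it.

```javascript
// Builds the pipeline from the answer for an arbitrary list of feedback fields.
// `since` is a Date; `fields` lists the fields to keep besides _id and outletId.
function buildOutletFeedbackPipeline(since, fields) {
  if (!fields.includes("createdOn")) {
    throw new Error("fields must include createdOn, which the $match stage uses");
  }

  // Stage 1: nest every needed field under `feedback`.
  const nest = { outletId: 1, "feedback._id": "$_id" };
  for (const f of fields) nest["feedback." + f] = "$" + f;

  // Last stage: turn the unwound child object back into top-level fields.
  const flatten = { _id: "$feedbacks._id", outletId: "$_id" };
  for (const f of fields) flatten[f] = "$feedbacks." + f;

  return [
    { $project: nest },
    // Stage 2: one document per outlet holding all of its feedback.
    { $group: { _id: "$outletId", feedbacks: { $addToSet: "$feedback" } } },
    // Stage 3: keep outlets with at least one feedback on or after `since`.
    { $match: { "feedbacks.createdOn": { $gte: since } } },
    // Stage 4: one document per feedback again.
    { $unwind: "$feedbacks" },
    { $project: flatten },
  ];
}

// Usage in the mongo shell:
//   db.feedback.aggregate(buildOutletFeedbackPipeline(ISODate("2013-05-03"), ["createdOn", "a"]))
```

The same list of stages is what you would pass to MongoTemplate's aggregation, expressed there with its own stage builders.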
Now, regarding MongoTemplate: yes, it accepts all of these stages for aggregation, including nesting the fields under feedback in the first stage. Just look at some examples of TypedAggregation.
If you are saving the createdOn field as a string instead of a Date (ISODate), even a normal find() range query like the one in your Postgres example won't work: a $gte against an ISODate only matches values stored as dates, not strings.
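If existing documents already store createdOn as a string, one way to convert them in place is an update with an aggregation pipeline. This is a minimal sketch. It assumes MongoDB 4.2 or newer (pipeline updates) and date strings that $toDate can parse, such as '2013-05-03'. The helper name is hypothetical.

```javascript
// Filter and pipeline update that turn string createdOn values into Dates.
function createdOnToDateUpdate() {
  return {
    // Only documents that still hold a string.
    filter: { createdOn: { $type: "string" } },
    // Pipeline form: "$createdOn" is read from the current document.
    update: [{ $set: { createdOn: { $toDate: "$createdOn" } } }],
  };
}

// Usage in the mongo shell:
//   const q = createdOnToDateUpdate();
//   db.feedback.updateMany(q.filter, q.update);
```

If some string cannot be parsed, $toDate raises an error and the update fails. $convert with an onError value is the more forgiving variant.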
Hope it helps.
Related
I am building a website using Next.js and MongoDB. On one of the pages, I have implemented filters to help search for products. To retrieve and update the filters (updating the item counts each time a filter changes), I have an API endpoint which queries my MongoDB collection. This specific collection contains ~200,000 items. Each item has several fields such as brand, model, place, etc.
I have 9 fields which I use to filter, and thus must fetch through my API each time there's a change. Therefore I have 9 queries running through my API, one for each field/filter, and the query on MongoDB looks like:
var models = await db_collection
.aggregate([
{
$match: {
$and: [filter],
},
},
{
$group: { _id: '$model', count: { $sum: 1 } },
},
{ $sort: { _id: 1 } },
])
.toArray();
The problem is that, as 9 queries are running, the update of the page (mainly due to the queries) takes ~4 seconds, which is too long. I would like to get it under 1 second. I would like to know if there is a good practice I am missing, such as doing one query instead of one for each filter, or maybe an optimization on my database.
Thank you,
I have tried adding a $project stage before $group in the aggregation pipeline to reduce the number of fields returned, and using distinct and then sorting instead of aggregate, but none of these solutions seem to improve efficiency.
EDIT :
As suggested by R2D2, I am posting the structure of a document on MongoDB in my collection :
{
_id : ObjectId('example_id')
source : string
date : date
brand : string
family : string
model : string
size : string
color : string
condition : string
contact : string
SKU : string
}
Depending on the page, I query the unique values of each field of interest (source, date, brand, family, model, size, color, condition, contact) and their counts depending on the filters (e.g. the number of items for each unique value of model among the selected brands). I also query documents based on specific values of these fields.
As mentioned, your indexes are important, and if you are querying by those fields I recommend creating compound indexes. See here for index optimisation: https://learnmongodbthehardway.com/schema/indexes/
As far as the aggregation pipeline goes, nothing is out of the ordinary, but this specific aggregation just returns the number of items per model matching the criteria, not the matching documents. If that is all the data you need, you might find it useful to create a new collection where you store pre-calculated results for common searches once a day (how many items have the color black, ...). This way, when the page loads, you don't have to scan your 200k+ items, just your pre-calculated statistics collection. Schedule a cron task or use a lambda function to invoke a route on your API that calculates all your stats once a day and upserts them into the new collection.
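A minimal sketch of the daily pre-calculation is shown below. It assumes MongoDB 4.2 or newer, which provides the $merge stage that performs the upsert. The stats collection name filter_stats, the _id shape and the items collection name in the usage comment are all hypothetical. The field list is taken from the document structure in the question.

```javascript
// Fields the page filters on (from the document structure in the question).
const FILTER_FIELDS = ["source", "date", "brand", "family", "model",
                       "size", "color", "condition", "contact"];

// One pipeline per field: count the documents for each unique value and
// upsert the counts into the (hypothetical) "filter_stats" collection.
function buildStatsPipeline(field) {
  return [
    { $group: { _id: { field: field, value: "$" + field }, count: { $sum: 1 } } },
    { $merge: { into: "filter_stats", on: "_id",
                whenMatched: "replace", whenNotMatched: "insert" } },
  ];
}

// Usage in the scheduled job (assumes a connected Node.js driver `db`):
//   for (const f of FILTER_FIELDS) {
//     // toArray() forces the pipeline, and with it the $merge, to run.
//     await db.collection("items").aggregate(buildStatsPipeline(f)).toArray();
//   }
```

With whenMatched set to "replace", a value that disappears from the main collection keeps its old count. Clear filter_stats before the run if that matters.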
Also, I believe the $and is useless, since you can use the implicit $and. You can look for an object like:
{
color : {$in : ['BLACK', 'BLUE']},
size : 3
}
rather than :
{$and : [{color : {$in : ['BLACK', 'BLUE']}}, {size : 3}]}
Reserve the explicit $and for when you really need it.
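The function below builds the implicit-$and filter from the page's selections. It is a minimal sketch. The shape of the selected argument (a field name mapped to the array of selected values) is an assumption about how the page stores its filter state, and the function name is hypothetical.

```javascript
// Builds one filter object whose top-level keys MongoDB ANDs implicitly.
// `selected` maps a field name to the array of values chosen on the page.
function buildFilter(selected) {
  const filter = {};
  for (const [field, values] of Object.entries(selected)) {
    if (!values || values.length === 0) continue;   // nothing selected: no condition
    // A single value is an equality match; several values become an $in.
    filter[field] = values.length === 1 ? values[0] : { $in: values };
  }
  return filter;
}

// Usage: the result goes straight into the $match stage.
//   { $match: buildFilter({ color: ["BLACK", "BLUE"], size: [3] }) }
```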
I have a collection that contains the following fields: agentId, postBalance, preBalance, etc. I want to fetch the last unique record for an agent, containing the fields stated earlier, based on a date filter.
db.transaction.find(
{
"createdAt" : {
"$gte": ISODate("2022-09-01T00:00:00Z"),
"$lt": ISODate("2022-09-02T00:00:00Z")
}
},
{
"agentId": 1,
"walletBalance": 1
}
)
The query above returns duplicate values rather than only the latest one. How can I best optimise this query? I am using MongoDB Compass, so any query that works there is fine. I have read up on $last and $natural, but they don't seem to solve my issue.
Have you tried adding a sort on "createdAt" (descending, so the newest comes first) with a limit of 1, or just using the findOne method with the same sort?
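Sorting plus limit 1 returns a single document overall. If you need the latest record for each agent, a different technique is needed: sort newest first, then $group by agentId and use $first. A minimal sketch of both is shown below, assuming the field names from the question. $replaceRoot requires MongoDB 3.4 or newer, and the helper names are hypothetical. The same stages should also work when entered one by one in Compass's aggregation tab.

```javascript
// Latest transaction for every agent within [from, to).
function latestPerAgentPipeline(from, to) {
  return [
    { $match: { createdAt: { $gte: from, $lt: to } } },
    { $sort: { createdAt: -1 } },                                   // newest first
    // One output document per agent; after the sort, $first is the newest one.
    { $group: { _id: "$agentId", latest: { $first: "$$ROOT" } } },
    // Return the stored document itself instead of { _id, latest }.
    { $replaceRoot: { newRoot: "$latest" } },
  ];
}

// Single-agent version of the suggestion above: sort descending, limit 1.
function latestForAgentQuery(agentId, from, to) {
  return {
    filter: { agentId: agentId, createdAt: { $gte: from, $lt: to } },
    sort: { createdAt: -1 },
    limit: 1,
  };
}

// Usage in the mongo shell (someAgentId is a placeholder):
//   db.transaction.aggregate(latestPerAgentPipeline(
//     ISODate("2022-09-01T00:00:00Z"), ISODate("2022-09-02T00:00:00Z")))
//   const q = latestForAgentQuery(someAgentId, from, to);
//   db.transaction.find(q.filter).sort(q.sort).limit(q.limit)
```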
Can I create a query like the one below?
db.getCollection('walkins.businesses').update(
{$and:[{"loyaltyModule.isPublished": true},{"loyaltyModule.publishAt": {"$eq":null}}]},
{$set : {"loyaltyModule.publishAt":"this.loyaltyModule.creationAt"}}, {multi:true}
)
I want to set publishAt to the value of creationAt directly in the update query, where creationAt already exists in the collection.
Can I set the value of publishAt using another field creationAt in the same document?
With Aggregate
The best way to do this is to use the aggregation framework to compute the new field, using the $addFields and $out aggregation pipeline stages.
db.getCollection('walkins.businesses').aggregate(
[
{$match : {$and:[{"loyaltyModule.isPublished": true},{"loyaltyModule.publishAt": {"$eq":null}}]}},
{ "$addFields": {
"loyaltyModule.publishAt":"$loyaltyModule.creationAt"
}},
{ "$out": "collection" }
]
)
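Note that this does not update your collection but instead replaces the existing collection or creates a new one. Because of the $match stage, only the matching documents are written to the output, so running $out into the original collection would drop every document that did not match. Also, for update operations that require "type casting" you will need client-side processing, and depending on the operation, you may need to use the find() method instead of the .aggregate() method.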
With Iteration of cursor
You can iterate the cursor and update each document:
db.getCollection('walkins.businesses').find({$and:[{"loyaltyModule.isPublished": true},{"loyaltyModule.publishAt": {"$eq":null}}]}).forEach(function(x){
// copy this document's creationAt into its publishAt
db.getCollection('walkins.businesses').updateOne({_id : x._id },
{$set : {"loyaltyModule.publishAt": x.loyaltyModule.creationAt}}
)
})
Here, each update matches exactly one document by its _id, so you can't update multiple records in one update query; every document gets its own update call.
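On MongoDB 4.2 or newer there is a third option, which the answer above does not cover. updateMany accepts an aggregation pipeline as its update, so a field can be set from another field of the same document in one server-side update. This is a minimal sketch that assumes 4.2 or newer. The helper name is hypothetical.

```javascript
// Filter and pipeline update that copy creationAt into publishAt (MongoDB 4.2+).
function publishAtUpdate() {
  const filter = {
    "loyaltyModule.isPublished": true,
    "loyaltyModule.publishAt": null,   // matches null or missing, like {$eq: null}
  };
  // Because the update is an array (a pipeline), "$loyaltyModule.creationAt" is
  // evaluated as a field path of each document, not stored as a literal string.
  const update = [
    { $set: { "loyaltyModule.publishAt": "$loyaltyModule.creationAt" } },
  ];
  return { filter, update };
}

// Usage in the mongo shell:
//   const q = publishAtUpdate();
//   db.getCollection("walkins.businesses").updateMany(q.filter, q.update);
```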
I'm trying to get the last 20 records of the user collection with Mongoose:
User.find({'owner': req.params.id}).
sort({date: -1}).
limit(20).
exec(.....)
This works well and shows the last 20 items.
But the items in the resulting array are sorted from the most recent to the oldest. Is there any way to reverse this with Mongoose?
Thanks
You can certainly do this with an aggregation, such as this:
db.user.aggregate([
{ $match : {"owner" : req.params.id}},
{ $sort : {"date" : -1}},
{ $limit : 20},
{ $sort : {"date" : 1}}
])
Notes on this aggregation:
The first three stages do the same job as the find() in your question
The fourth part applies a further sort, which re-orders the returned 20 records from oldest to most recent
I have written it in native MongoDB aggregation syntax; you will need to adjust the code to generate the same aggregation from Mongoose.
Update: I think this is not possible with find() and cursor methods, because you would need two different sort() operations. MongoDB does not treat cursor methods as a sequence of independent operations: the docs give an example of methods written in one order, sort().limit(), being equivalent to the opposite order, limit().sort(), which shows that the order in which they are chained is not meaningful.
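A minimal sketch of the same pipeline, built for Mongoose's Model.aggregate(), is shown below. It also includes a plain-JavaScript alternative that keeps the original find() and simply reverses the returned array. The helper names are hypothetical. Mongoose does not cast values inside aggregation pipelines, so if owner is stored as an ObjectId, pass an ObjectId (for example new mongoose.Types.ObjectId(req.params.id)) rather than the raw string.

```javascript
// Newest `n` documents for an owner, returned oldest -> newest.
function lastNAscendingPipeline(owner, n) {
  return [
    { $match: { owner: owner } },
    { $sort: { date: -1 } },   // newest first
    { $limit: n },             // keep the newest n
    { $sort: { date: 1 } },    // re-order those n from oldest to newest
  ];
}

// Client-side alternative: reverse what find().sort({date: -1}).limit(n)
// returned, without mutating the original array.
function oldestFirst(docs) {
  return docs.slice().reverse();
}

// Usage inside an async route handler (assumes a Mongoose model `User`):
//   const docs = await User.aggregate(lastNAscendingPipeline(ownerId, 20));
//   // or
//   const recent = await User.find({ owner: req.params.id })
//     .sort({ date: -1 }).limit(20).exec();
//   const ordered = oldestFirst(recent);
```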
Find the total count and then select only the latest 20. This may not be the most efficient way, but it will solve your problem.
User.count({'owner': req.params.id}, function(err, count){
if (count) {
// skip all but the last 20 (never skip a negative number)
var skipItem = Math.max(count - 20, 0);
User.find({'owner': req.params.id})
.sort({date: 1})
.skip(skipItem)
.limit(20)
.exec(.....)
}
});
db.users.aggregate([
{ $match: {
'owner': req.params.id
}},
{ $unwind: '$[arrayFieldName]' },   // replace [arrayFieldName] with your array field
{ $sort: {
'[arrayFieldName]': -1,   // or 1 for ascending
'date': -1
}}
])
I have a mongo collection 'books'. Here's a typical book:
BOOK
name: 'Test Book'
author: 'Joe Bloggs'
print_runs: [
{publisher: 'OUP', year: 1981},
{publisher: 'Penguin', year: 1987},
{publisher: 'Harper-Collins', year: 1992}
]
I'd like to be able to filter books to return only books whose last print run was after a given date, and/or before a given date, and I've been struggling to find a feasible query. Any suggestions appreciated.
There are a few options, as getting access to the "last" element in the array and filtering only on that is difficult/impossible with the normal find options in MongoDB queries. (Unfortunately, $slice is only a projection operator in find, so you can't filter on the element it returns.)
1. Store the most recent publisher and year both in the print_runs array and in a special denormalized copy directly on the book document, for example Book.last_published_by and Book.last_published_date. Queries would be simple and very fast.
2. MapReduce. It would be simple enough to emit the last element of the array and then "reduce" it to just that. You'd need to do incremental updates on the MapReduce output to keep it accurate.
3. Write a relatively complex aggregation framework expression.
The aggregation might look like:
db.so.aggregate([
{ $project : { _id: 1, "print_run_year" : "$print_runs.year" }},
{ $unwind: "$print_run_year" },
{ $group : { _id : "$_id", "newest" : { $max : "$print_run_year" }}},
{ $match : { "newest" : { $gt : 1991, $lt: 2000 } }}
])
As it may require a bit of explanation:
It projects and unwinds the years of the print runs for each book.
Then it groups on the _id (of the book) and creates a new computed field called newest, which contains the highest print run year (from the projection).
Finally, it filters on newest using $gt and $lt.
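On MongoDB 3.6 or newer, find() can also read the last array element directly through $expr. This is a different technique from the three options above. It is closer to the literal "last print run" than the $max in the pipeline, although the two differ only if print_runs is not stored in year order. Like option 3, it cannot use an index on the year. This is a minimal sketch, and the helper name is hypothetical.

```javascript
// Filter on the year of the LAST element of print_runs (MongoDB 3.6+).
// Either bound may be omitted for the "after only" / "before only" cases.
function lastRunYearFilter(after, before) {
  // "$print_runs.year" resolves to the array of years; index -1 is the last one.
  const lastYear = { $arrayElemAt: ["$print_runs.year", -1] };
  const conditions = [];
  if (after !== undefined) conditions.push({ $gt: [lastYear, after] });
  if (before !== undefined) conditions.push({ $lt: [lastYear, before] });

  // Skip books without print runs, whose missing "last year" would
  // otherwise compare as lower than any number.
  const filter = { "print_runs.0": { $exists: true } };
  if (conditions.length > 0) filter.$expr = { $and: conditions };
  return filter;
}

// Usage in the mongo shell:
//   db.books.find(lastRunYearFilter(1991, 2000))
```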
I'd suggest option #1 above would be the best from an efficiency perspective, followed by the MapReduce, and then a distant third, option #3.
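Option 1 can be sketched as one update that pushes the new print run and refreshes the copied fields in the same operation, plus the resulting range query. This sketch assumes print runs are appended in chronological order. last_published_by and last_published_date are the example names from option 1, and the helper names are hypothetical.

```javascript
// Update for adding a print run while keeping the denormalized copy in sync.
// Assumes runs are added in chronological order, so the new run is the latest.
function addPrintRunUpdate(run) {
  return {
    $push: { print_runs: run },
    $set: { last_published_by: run.publisher, last_published_date: run.year },
  };
}

// Range filter on the copied year; either bound may be omitted.
function lastPrintRunFilter(after, before) {
  const range = {};
  if (after !== undefined) range.$gt = after;
  if (before !== undefined) range.$lt = before;
  // With no bounds, match everything rather than { last_published_date: {} },
  // which would only match an empty embedded document.
  return Object.keys(range).length > 0 ? { last_published_date: range } : {};
}

// Usage in the mongo shell (bookId is a placeholder):
//   db.books.updateOne({ _id: bookId }, addPrintRunUpdate({ publisher: "OUP", year: 2001 }));
//   db.books.find(lastPrintRunFilter(1991, 2000));
```

An ordinary index on last_published_date can then serve these range queries.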