MongoDB aggregation: select a specific document in a group

I need a bit of help with a MongoDB aggregation.
First I have a $match to filter for some specific documents.
Then I group by the field I need them grouped on.
Within each group I need to select the document where a field has a particular value and use that document as the main data.
{"$match": {"$and": [
{chain: chain},
{dex: dex}
]}},
{$group: {
_id: "$pairAddress",
allChange: {"$push": "$$ROOT"},
baseToken: {$last: '$baseToken'},
txCount: /* pseudocode: the txCount of the document in this group whose timeframe is 86400 */
}},
{$sort: {txCount: -1}},
{$skip: 0},
{$limit: 100}
Each group consists of documents with different timeframes. I need to select one specific timeframe and add fields from that document to the group. For example, each timeframe has a different txCount; after grouping I want to sort by txCount, limit the number of results, and use skip for pagination.
The problem is selecting the document with the specific timeframe from within the group.
If anyone could point me in the right direction, that would be awesome.
Here is an example of how the data is stored in the database and what I would like the result to be.
const document = {
_id: '3567356735672467',
pairAddress: '0x45jk6v34jy5634jkh5v6kj4h5v62j4h56', // group by pair address
baseToken: '0x456jn345k6hb4k5h6b3khb65k3hb56k3h4b6',
resolution: 86400, // a pair address has 6 documents, each with its own timeframe: 300, 900, 1800, 3600, 43200, 86400
base0: true,
txCount: 26,
buyCount: 10,
sellCount:16,
buyVolume: '2342354.345',
sellVolume: '1234.34',
volume: '1232352.345',
change: '12.34',
positive: true,
time: 1676865981,
chain: 'ETH',
dex: 'SUS',
price: '12.45',
};
const result = [
{
_id: "0x45jk6v34jy5634jkh5v6kj4h5v62j4h56",
allChange: {"$push": "$$ROOT"}, // array of all documents/timeframes for a pairAddress
selectedTxAmount: 26, // the txCount of the document with the selected timeframe (e.g. 86400), taken from the group for this pairAddress
}
];
Maybe it's possible to change the aggregation to make it work and be faster:
match all timeframes, dex and chain,
sort by txCount,
skip X documents,
limit to 100,
and return all documents with a field containing all timeframes for each pairAddress left after the aggregation.
Currently, thanks to @1sina1, I have this and it works:
{"$match": {"$and": [
{"chain": chain},
{"dex": dex}
]}},
{$group: {
_id: "$pairAddress",
allChange: {"$push": "$$ROOT"},
baseToken: {$last: '$baseToken'},
txCount: {
"$push": {
"$cond": {
"if": {
"$eq": [
"$resolution",
43200
]
},
"then": "$txCount",
"else": "$$REMOVE"
}
}
}
}},
{$sort: {txCount: -1}},
{$skip: parseInt(page) * 100},
{$limit: 100},
But I think there might be a better way. Right now we first group everything (about 20k documents), while I am only interested in 100 results. So maybe first match on the timeframe/resolution, then sort, skip and limit, and only then fetch, for those 100 pairAddress values, all the corresponding timeframes/resolutions as a field allChange.
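The reordering described above could be sketched roughly as follows; this is an untested idea, not a confirmed answer. It matches on a single resolution first, so the sort, skip and limit run on one document per pair, and then uses $lookup to pull in every timeframe for the pairs that remain. The helper name buildPairPipeline, the collection name "pairs" and the PAGE_SIZE constant are assumptions for illustration; the field names come from the sample document.

```javascript
// Hypothetical helper: builds the pipeline as a plain array that could be
// passed to db.<collection>.aggregate(...). Nothing here talks to MongoDB.
const PAGE_SIZE = 100; // assumed page size, taken from the $limit in the question

function buildPairPipeline(chain, dex, resolution, page, collectionName) {
  return [
    // Keep only the one timeframe that is used for ranking.
    { $match: { chain: chain, dex: dex, resolution: resolution } },
    { $sort: { txCount: -1 } },
    { $skip: page * PAGE_SIZE },
    { $limit: PAGE_SIZE },
    // Self-join on the same collection (name is an assumption) to collect
    // all timeframes for each of the remaining pair addresses.
    {
      $lookup: {
        from: collectionName,
        localField: "pairAddress",
        foreignField: "pairAddress",
        as: "allChange",
      },
    },
    { $project: { _id: 0, pairAddress: 1, baseToken: 1, txCount: 1, allChange: 1 } },
  ];
}

const pipeline = buildPairPipeline("ETH", "SUS", 86400, 0, "pairs");
console.log(JSON.stringify(pipeline, null, 2));
```

Note that this $lookup joins on pairAddress only. If the same address can appear under several chains or dexes, the join would need extra conditions. Whether this is actually faster than grouping first depends on the data and the available indexes.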

Related

How to addfields in MongoDB retrospectively

I have a schema that has a like array. This array stores all the people who have liked my post.
I just added a likeCount field as well, but the likeCount field's default value is 0.
How can I use $addFields in MongoDB so that I can update likeCount with the length of the like array?
I am on a MERN stack.
I am assuming you have a data structure like this:
{
postId: "post1",
likes: [ "ID1", "ID2", "ID3" ]
}
There is almost no reason to add a likeCount field. You should take the length of the likes array itself. Some examples:
db.foo.insert([
{'post':"P1", likes: ["ID1","ID2","ID3"]},
{'post':"P2", likes: ["ID1","ID2","ID3"]},
{'post':"P3", likes: ["ID4","ID2","ID6","ID7"]}
]);
// Which post has the most likes?
db.foo.aggregate([
{$addFields: {N: {$size: "$likes"}}},
{$sort: {"N":-1}}
//, {$limit: 2} // optionally limit to whatever
]);
// Is ID6 in likes?
// $match of a scalar to an input field ('likes') acts like
// $in for convenience:
db.foo.aggregate([ {$match: {'likes':'ID6'}} ]);
// Is ID6 OR ID3 in likes?
db.foo.aggregate([ {$match: {'likes':{$in:['ID6','ID3']}}} ]);
// Is ID2 AND ID7 in likes?
// This is a fancier way of doing set-to-set compares instead
// of a bunch of expression passed to $and:
var targets = ['ID7','ID2'];
db.foo.aggregate([
{$project: {X: {$eq:[2, {$size:{$setIntersection: ['$likes', targets]}} ]} }}
]);
// Who likes the most across all posts?
db.foo.aggregate([
{$unwind: '$likes'},
{$group: {_id: '$likes', N:{$sum:1}} },
{$sort: {'N':-1}}
]);
This is how to update all the documents with the respective likeCount values the first time:
db.collection.update({},
[
{
$addFields: {
likeCount: {
$size: "$like"
}
}
}
],
{
multi: true
})
Each time one or more people are added to the like array afterwards, you can either set likeCount again with $size or increase the count with $inc.
Of course, as @Buzz pointed out, it is best to leave the array count in the read code: updating likeCount on every like is an expensive operation that can hurt performance under heavy load.
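If the stored counter is kept anyway, the $inc option mentioned above could look roughly like this. It is a sketch under assumptions: the helper name buildLikeUpdate and the $ne guard in the filter are my additions, not part of the answer; the field names like and likeCount come from the question.

```javascript
// Hypothetical helper: returns the filter and update documents that could be
// passed to db.collection.updateOne(filter, update). It does not contact MongoDB.
function buildLikeUpdate(postId, userId) {
  return {
    // Match the post only if this user is not already in the like array,
    // so the counter is not incremented twice for the same user.
    filter: { _id: postId, like: { $ne: userId } },
    update: {
      $push: { like: userId },   // add the user to the array
      $inc: { likeCount: 1 },    // keep the stored counter in step
    },
  };
}

const likeOp = buildLikeUpdate("post1", "ID4");
console.log(JSON.stringify(likeOp.filter), JSON.stringify(likeOp.update));
```

Because the array change and the counter change are in the same update document, they are applied to the document together, so likeCount cannot drift from the array through this code path.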

Specific Field Wont Display In Mongo DB Aggregation Pipeline

I have a collection of book reviews in which I am trying to find users who have created multiple reviews (let's say 5). I also want to return the number of reviews, the user's unique ID and their name.
So far I have managed to do this through aggregation, but for the life of me I can't seem to return the name field. I assumed a simple $project would be fine, but I only see the ID and the number of reviews someone has made. What am I missing?
Current Code:
db.bookreviews.aggregate([
{"$group": {"_id": "$reviewerID","NumberOfReviews": { "$sum": 1 }}},
{"$match": {NumberOfReviews: {"$gte": 5}}},
{"$project":{_id:1,NumberOfReviews:1, reviewerName:1}},
])
Returned Values:
{IDXYZ, NumberofReviews 5},
{IDABC, NumberofReviews 5},
{ID123, NumberofReviews 5}
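A $group stage only outputs _id and the accumulator fields you define, so reviewerName no longer exists by the time $project runs. You can use $first in the $group stage to keep the reviewerName of the first document in each group, and then drop the $project: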
db.bookreviews.aggregate([
{"$group": {"_id": "$reviewerID","NumberOfReviews": { "$sum": 1 }, "reviewerName": { "$first": "$reviewerName" } } },
{"$match": {"NumberOfReviews": {"$gte": 5}}},
])

MongoDB: calculate average value for the document & then do the same thing across entire collection

I have a collection of documents (Offers) with subdocuments (Salary) like this:
{
_id: ObjectId("zzz"),
sphere: ObjectId("xxx"),
region: ObjectId("yyy"),
salary: {
start: 10000,
end: 50000
}
}
I want to calculate the average salary for a given region and sphere across the entire collection. I created a query for this and it works, but it only takes the salary start value into account.
db.offer.aggregate(
[
{$match:
{$and: [
{"salary.start": {$gt: 0}},
{region: ObjectId("xxx")},
{sphere: ObjectId("yyy")}
]}
},
{$group: {_id: null, avg: {$avg: "$salary.start"}}}
]
)
But first I want to calculate the average salary (of start and end) for each offer. How can I do this?
Update.
If "salary.end" may be missing in your data, you need one additional "$project" stage that replaces a missing "salary.end" with the existing "salary.start". Otherwise the result will be wrong, because the average ignores documents that lack a "salary.end" value.
db.offer.aggregate([
{$match:
{$and: [
{"salary.start": {$gt: 0}},
{"region": ObjectId("xxx")},
{"sphere": ObjectId("yyy")}
]}
},
{$project:{"_id":1,
"sphere":1,
"region":1,
"salary.start":1,
"salary.end": {$ifNull: ["$salary.end", "$salary.start"]}
}
},
{$project:{"_id":1,
"sphere":1,
"region":1,
"avg_salary":{$divide:[
{$add:["$salary.start","$salary.end"]}
,2
]}}},
{$group:{"_id":{"sphere":"$sphere","region":"$region"},
"avg":{$avg:"$avg_salary"}}}
])
The way you aggregate has to be modified:
1. Match the required region and sphere, and only offers where salary.start > 0.
2. Project an extra field for each offer that holds the average of start and end.
3. Group together the records with the same region and sphere, and apply the $avg accumulator to each offer's avg_salary in that group to get the average salary.
The Code:
db.offer.aggregate([
{$match:
{$and: [
{"salary.start": {$gt: 0}},
{"region": ObjectId("xxx")},
{"sphere": ObjectId("yyy")}
]}
},
{$project:{"_id":1,
"sphere":1,
"region":1,
"avg_salary":{$divide:[
{$add:["$salary.start","$salary.end"]}
,2
]}}},
{$group:{"_id":{"sphere":"$sphere","region":"$region"},
"avg":{$avg:"$avg_salary"}}}
])

MongoDB - sort by subdocument match

Say I have a users collection in MongoDB. A typical user document contains a name field, and an array of subdocuments, representing the user's characteristics. Say something like this:
{
"name": "Joey",
"characteristics": [
{
"name": "shy",
"score": 0.8
},
{
"name": "funny",
"score": 0.6
},
{
"name": "loving",
"score": 0.01
}
]
}
How can I find the top X funniest users, sorted by how funny they are?
The only way I've found so far, was to use the aggregation framework, in a query similar to this:
db.users.aggregate([
{$project: {"_id": 1, "name": 1, "characteristics": 1, "_characteristics": '$characteristics'}},
{$unwind: "$_characteristics"},
{$match: {"_characteristics.name": "funny"}},
{$sort: {"_characteristics.score": -1}},
{$limit: 10}
]);
Which seems to be exactly what I want, except for the fact that according to MongoDB's documentation on using indexes in pipelines, once I call $project or $unwind in an aggregation pipeline, I can no longer utilize indexes to match or sort the collection, which renders this solution somewhat unfeasible for a very large collection.
I think you are halfway there. I would do:
db.users.aggregate([
{$match: { 'characteristics.name': 'funny' }},
{$unwind: '$characteristics'},
{$match: {'characteristics.name': 'funny'}},
{$project: {_id: 0, name: 1, 'characteristics.score': 1}},
{$sort: { 'characteristics.score': -1 }},
{$limit: 10}
])
I add a first $match stage to get rid of users without the funny characteristic (which can easily be indexed),
then $unwind and $match again to keep only the relevant part of the data,
keep only the necessary fields with $project,
sort by the correct score,
and limit the results.
That way you can use an index for the first $match.
The way I see it, if there are not too many characteristics you are interested in, it would be better to structure your documents as
{
"name": "Joey",
"shy": 0.8,
"funny": 0.6,
"loving": 0.01
}
That way you can use an index (sparse or not) to make your life easier!

Mongodb aggregate query help - grouping with multiple fields and converting to an array

I have the following documents in a MongoDB collection:
[{quarter:'Q1',project:'project1',user:'u1',cost:'100'},
{quarter:'Q2',project:'project1',user:'u2',cost:'100'},
{quarter:'Q3',project:'project1',user:'u1',cost:'200'},
{quarter:'Q1',project:'project2',user:'u2',cost:'200'},
{quarter:'Q2',project:'project2',user:'u1',cost:'300'},
{quarter:'Q3',project:'project2',user:'u2',cost:'300'}]
I need to generate output that sums the cost by quarter and project, in a format that can be rendered in an Ext JS chart:
[{quarter:'Q1','project1':100,'project2':200,'project3':300},
{quarter:'Q2','project1':100,'project2':200,'project3':300},
{quarter:'Q3','project1':100,'project2':200,'project3':300}]
I have tried various permutations and combinations of aggregation stages but couldn't come up with a working pipeline. Any help or direction is greatly appreciated.
Your cost data appears to be strings, which isn't helping, but assuming you can work around that:
The main component is the $cond operator in the document projection, and assuming your data is larger and you want to group the results:
db.mstats.aggregate([
// Optionally match first depending on what you are doing
// Sum up cost for each quarter and project
{$group: {_id: { quarter: "$quarter", project: "$project" }, cost: {$sum: "$cost" }}},
// Change the "projection" in $group, using $cond to add a key per "project" value
// We use $sum and the false case of 0 to fill in values not in the row.
// These will then group on the key adding the real cost and 0 together.
{$group: {
_id: "$_id.quarter",
project1: {$sum: {$cond:[ {$eq: [ "$_id.project", "project1" ]}, "$cost", 0 ]}},
project2: {$sum: {$cond:[ {$eq: [ "$_id.project", "project2" ]}, "$cost", 0 ]}}
}},
// Change the document to have the "quarter" key
{$project: { _id:0, quarter: "$_id", project1: 1, project2: 1}},
// Optionally sort by quarter
{$sort: {quarter: 1 }}
])
So after doing the initial grouping the document is altered with use of $cond to determine if the value of a key is going to go into the new key name. Essentially this asks if the current value of project is "project1" then put the cost value into this project1 key. And so on.
As we put a 0 value into this new document key when there was no match, we need to group the results again in order to merge the documents. Sorting is optional, but probably what you want for a chart.
Naturally you will have to build this up dynamically and probably query for the project keys that you want. But otherwise this should be what you are looking for.
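Building the pipeline dynamically, as suggested above, might look something like this sketch. The helper name buildQuarterPipeline is hypothetical, and the $toDouble conversion is my assumption for handling the string costs (it needs MongoDB 4.0 or later); drop it if cost is already numeric.

```javascript
// Hypothetical helper: builds the pipeline from a list of project names.
// It only constructs plain objects and does not contact MongoDB.
function buildQuarterPipeline(projectNames) {
  const groupByQuarter = { _id: "$_id.quarter" };
  const projection = { _id: 0, quarter: "$_id" };

  for (const name of projectNames) {
    // One key per project: the real cost when the row belongs to this
    // project, otherwise 0, summed per quarter.
    groupByQuarter[name] = {
      $sum: { $cond: [{ $eq: ["$_id.project", name] }, "$cost", 0] },
    };
    projection[name] = 1;
  }

  return [
    {
      $group: {
        _id: { quarter: "$quarter", project: "$project" },
        // Assumption: cost is stored as a numeric string, so convert first.
        cost: { $sum: { $toDouble: "$cost" } },
      },
    },
    { $group: groupByQuarter },
    { $project: projection },
    { $sort: { quarter: 1 } },
  ];
}

const quarterPipeline = buildQuarterPipeline(["project1", "project2"]);
console.log(JSON.stringify(quarterPipeline, null, 2));
```

In the shell, the list of names could come from something like db.mstats.distinct("project"), and the result would be passed to db.mstats.aggregate(...).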