Query object with max field on MongoDB

I am new to MongoDB and I use Atlas & Charts in order to query and visualize the results.
I want to create a graph that shows the max amount of money every day, and indicate the person with the max amount of money.
For example, if my collection contains the following documents:
{"date": "15-12-2020", "name": "alice", "money": 7}
{"date": "15-12-2020", "name": "bob", "money": 9}
{"date": "16-12-2020", "name": "alice", "money": 39}
{"date": "16-12-2020", "name": "bob", "money": 25}
What should be the query I put in the query box (in "Charts") in order to create a graph with the following result?
date       | max_money | the_person_with_max_money
15-12-2020 | 9         | bob
16-12-2020 | 39        | alice

You have to use an aggregation, and I think this should work.
First of all, $sort the values by money (I'll explain why later).
Then use $group to group the values by date.
The query looks like this:
db.collection.aggregate([
  {
    "$sort": { "money": -1 }
  },
  {
    "$group": {
      "_id": "$date",
      "max_money": { "$max": "$money" },
      "the_person_with_max_money": { "$first": "$name" }
    }
  }
])
Example here
How does this work? Well, there is a "problem" with $group: you can't keep values for the next stage unless you use an accumulator, so the best option here is to use $first to get the first name.
And this is why the documents are sorted by money in descending order: so that the name whose money value is the greatest ends up in the first position.
Sorting ensures that the first value in each group is the one you want.
Then $group groups the documents with the same date and creates the fields max_money and the_person_with_max_money.
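For the sample documents above, that pipeline should return documents along these lines (note that the grouped date ends up in _id; you can rename it with a $project stage if you need a field literally called date):

{ "_id": "15-12-2020", "max_money": 9, "the_person_with_max_money": "bob" }
{ "_id": "16-12-2020", "max_money": 39, "the_person_with_max_money": "alice" }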

Related

Nested grouping in MongoDB aggregation

Context:
I have a MongoDB collection full of documents like this, which I want to dump into one grouped JSON object:
[
{
"_id": "615dc97907f597330c510279",
"code": "SDFSDFSDF",
"location": "ABC1",
"week_number": 40,
"year": 2021,
"region": "NA"
},
....
{
"_id": "615dc97907f597330c51027a",
"code": "SDFSGSGR",
"location": "ABC1",
"week_number": 40,
"year": 2021,
"region": "EU"
},
....
{
"_id": "615dc97607f597330c50ff50",
"code": "GGSFHSFS",
"location": "DEF2",
"week_number": 40,
"year": 2021,
"region": "EU",
"audit_result": {
"issues_found": true,
"comment": "comment."
}
}
]
I am trying to write an aggregation which should return an object like this:
{
[
"EU": {
2021: {
40: {
"ABC1": {
(All documents for location ABC1 and week 40, year 2021 and region EU)
}
},
39: {
....
}
},
2020: {
....
}
},
"NA": {
....
}
]
}
Problem:
I am not 100% sure how.
I started grouping them by region, but I am not sure how to proceed after the first group.
I tried grouping them by location first and grouping my way up to region, but that also does not work as I expected.
The docs don't cover a case like this, and the examples I find only group by one or two things, not four.
Any insights highly appreciated :)
Using dynamic values as field names is generally considered an anti-pattern and you should avoid it. You are likely to introduce unnecessary difficulty in composing and maintaining your queries.
Nevertheless, you can do the following in an aggregation pipeline:
1. $group at the finest level: region, year, week_number, location; use $addToSet to collect all the $$ROOT documents into an array named v.
2. $group at one coarser level: region, year, week_number; create k-v tuples where k is the location and v is the v from step 1. Use $addToSet to collect the k-v tuples into an array named v.
3. Use $arrayToObject to convert the k-v tuples into fields with dynamic names, e.g.
"ABC1" : [
{
"_id": "615dc97907f597330c510279",
...
},
...
]
4. Repeat steps 2 & 3 at one coarser level: region, year; create k-v tuples where k is the week_number and v is the v from step 3. Use $addToSet to collect the k-v tuples into an array named v.
5. Repeat step 4 at one coarser level: region.
6. $group unconditionally (i.e. $group by _id: null); as in the previous steps, put the results into a single array named v and use $arrayToObject to convert it again.
7. $replaceRoot to obtain your expected result.
One small note: when using $arrayToObject with numeric k values like year and week_number, the k value needs to be converted to a string beforehand. You can use $toString to achieve this.
Here is the Mongo playground for your reference.
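A rough sketch of such a pipeline, written against the field names shown above (treat it as a starting point rather than a drop-in solution):

db.collection.aggregate([
  // step 1: group at the finest level and collect the raw documents
  { "$group": {
      "_id": { "region": "$region", "year": "$year", "week": "$week_number", "location": "$location" },
      "v": { "$addToSet": "$$ROOT" }
  }},
  // steps 2 & 3: roll locations up into an object keyed by location
  { "$group": {
      "_id": { "region": "$_id.region", "year": "$_id.year", "week": "$_id.week" },
      "v": { "$addToSet": { "k": "$_id.location", "v": "$v" } }
  }},
  { "$addFields": { "v": { "$arrayToObject": "$v" } } },
  // step 4: roll weeks up (numeric key, hence $toString)
  { "$group": {
      "_id": { "region": "$_id.region", "year": "$_id.year" },
      "v": { "$addToSet": { "k": { "$toString": "$_id.week" }, "v": "$v" } }
  }},
  { "$addFields": { "v": { "$arrayToObject": "$v" } } },
  // step 5: roll years up
  { "$group": {
      "_id": "$_id.region",
      "v": { "$addToSet": { "k": { "$toString": "$_id.year" }, "v": "$v" } }
  }},
  { "$addFields": { "v": { "$arrayToObject": "$v" } } },
  // steps 6 & 7: one final unconditional group, then promote the object to the root
  { "$group": {
      "_id": null,
      "v": { "$addToSet": { "k": "$_id", "v": "$v" } }
  }},
  { "$replaceRoot": { "newRoot": { "$arrayToObject": "$v" } } }
])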

MongoDB: returning documents in order until a condition match

In a MongoDB collection, I have documents with a "position" field for ordering and an optional "date" field, e.g.
[
{
"_id": "doc1",
"position": 1
},
{
"_id": "doc2",
"position": 2,
"date": "2021-05-20T08:00:00.000Z"
},
{
"_id": "doc3",
"position": 3
},
{
"_id": "doc4",
"position": 4,
"date": "2021-05-20T08:00:00.000Z"
}
]
I would like to query this collection to get the documents "before" a specified date, in position order. The algorithm would be:
find the first element whose date is "after" the specified date
return all the documents whose position is less than the position of the element found, sorted by "position"
I have implemented this algorithm naïvely with 2 independent queries. However, I suspect it can be done with a single call to the database, but I have no idea how to proceed. Maybe with an aggregation pipeline?
Can someone give me a clue how this can be done?
EDIT: Here are the current queries I use (roughly):
limit_element = db.getCollection('collection').find({
"date": { "$gte": ISODate("2021-05-20T08:00:00.000Z") }
}).sort({
"position": 1
}).limit(1)
position = limit_element['position']
elements = db.getCollection('collection').find({
"position": { "$lt": position }
}).sort({
"position": 1
})
You can use an aggregation pipeline with two match stages. Essentially it's the same thing you do now, but within one database access, so a bit faster. With an aggregation you can access the results of the previous stage in the next stage. Whether that is worth it, you have to decide; I think your naive approach is sensible. In any case this is a conditional problem, so you will have to first find one element and then do the other query. The difference is just where you do the steps.
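As an illustration only (a sketch assuming MongoDB 3.6+ for $lookup with a pipeline, and a collection literally named "collection"), the two steps could be folded into a single round trip like this:

db.collection.aggregate([
  // step 1: find the cut-off element, i.e. the first document at or after the given date
  { "$match": { "date": { "$gte": ISODate("2021-05-20T08:00:00.000Z") } } },
  { "$sort": { "position": 1 } },
  { "$limit": 1 },
  // step 2: join back to the same collection to fetch everything before that position
  { "$lookup": {
      "from": "collection",
      "let": { "cutoff": "$position" },
      "pipeline": [
        { "$match": { "$expr": { "$lt": [ "$position", "$$cutoff" ] } } },
        { "$sort": { "position": 1 } }
      ],
      "as": "before"
  }},
  // flatten the joined documents back into individual results
  { "$unwind": "$before" },
  { "$replaceRoot": { "newRoot": "$before" } }
])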

Trying to fetch data from Nested MongoDB Database?

I am a beginner in MongoDB and stuck at a point. I am trying to fetch data from a nested array, but it is taking a long time because there are around 50K documents, and the results are not accurate either. Below is the schema structure, please see it once -
{
"_id": {
"$oid": "6001df3312ac8b33c9d26b86"
},
"City": "Los Angeles",
"State":"California",
"Details": [
{
"Name": "Shawn",
"age": "55",
"Gender": "Male",
"profession": " A science teacher with STEM",
"inDate": "2021-01-15 23:12:17",
"Cars": [
"BMW","Ford","Opel"
],
"language": "English"
},
{
"Name": "Nicole",
"age": "21",
"Gender": "Female",
"profession": "Law student",
"inDate": "2021-01-16 13:45:00",
"Cars": [
"Opel"
],
"language": "English"
}
],
"date": "2021-01-16"
}
Here I am trying to filter on date and Details.Cars, like:
db.getCollection('news').find({"Details.Cars":"BMW","date":"2021-01-16"})
It is returning the details of other people too who do not have the car "BMW". I am only trying to display the details of people like Shawn, who have "BMW" in their Cars array and match the date, not Nicole; the rest should not appear, but that is not happening.
Any help is appreciated. :)
A combination of $match on the top-level fields and $filter on the array elements will do what you seek.
db.foo.aggregate([
{$match: {"date":"2021-01-16"}}
,{$addFields: {"Details": {$filter: {
input: "$Details",
as: "zz",
cond: { $in: ['BMW','$$zz.Cars'] }
}}
}}
,{$match: {$expr: { $gt:[{$size:"$Details"},0] } }}
]);
Notes:
$unwind is overly expensive for what is needed here and it likely means "reassembling" the data shape later.
We use $addFields where the new field to add (Details) already exists. This effectively means "overwrite in place" and is a common idiom when filtering an array.
The second $match will eliminate docs where the date matches but not a single entry in Details.Cars is a BMW i.e. the array has been filtered down to zero length. Sometimes you want to know this info so if this is the case, do not add the final $match.
I recommend you look into using real dates i.e. ISODate instead of strings so that you can easily take advantage of MongoDB date math and date formatting functions.
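As a hypothetical illustration of that last note (assuming MongoDB 4.2+ for pipeline-style updates, a collection named news, and that the top-level date strings all parse cleanly), the string dates could be converted once with something like:

db.news.updateMany(
  {},
  // pipeline-style update: overwrite the string date with a real Date value
  [ { "$set": { "date": { "$toDate": "$date" } } } ]
)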
It is a common mistake to think that find({"nested.array": value}) will return only the matching nested object; actually, this query returns the whole document which has a nested object with the desired value.
The query is returning the whole document where the value BMW exists in the array Details.Cars. So, Nicole is returned too.
To solve this problem:
To get multiple elements that match the criteria, you can use an aggregation pipeline with $unwind to separate the array objects into individual documents and then match by the criteria you want.
db.collection.aggregate([
{
"$match": { "Details.Cars": "BMW", "date": "2021-01-26" }
},
{
"$unwind": "$Details"
},
{
"$match": { "Details.Cars": "BMW" }
}
])
This query first matches by the criteria to avoid running $unwind over the whole collection.
Then it uses $unwind to get one document per array element and $match again to keep only the documents you want.
Example here
To get only one element (for example, if you match by _id and it's unique) you can use $elemMatch in this way:
db.collection.find({
"Details.Cars": "BMW",
"date": "2021-01-16"
},
{
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
}
})
Example here
You can use $elemMatch in the query or in the projection stage. Docs here and here.
Using $elemMatch in the query, it looks like this:
db.collection.find({
"Details": {
"$elemMatch": {
"Cars": "BMW"
}
},
"date": "2021-01-16"
},
{
"Details.$": 1
})
Example here
The result is the same. In the second case you are using the positional operator, which, as the docs say, returns:
The first element that matches the query condition on the array.
That is, the first element where "Cars": "BMW".
You can choose the way you want.

How to process mongo documents and get field wise data in array

Currently I'm hitting a problem processing MongoDB documents to get field-wise values. For example, say the collection contains these documents:
[
{ "name": "test1", "age": 20, "gender": "male" },
{ "name": "test2", "age": 21, "gender": "female" },
{ "name": "test3", "age": 30, "gender": "male"}
]
Expected Output:
{
"name": ["test1","test2","test3"],
"age": [20,21,30],
"gender": ["male","female", "male"]
}
Is it possible to retrieve data from Mongo in the above format? I don't want to write JavaScript functions to process this; I'm looking to retrieve the data using Mongo functions along with the find query.
You'd need to use the aggregation framework to get the desired result. Run the following pipeline, which uses the $match operator to filter the documents entering the grouping stage. This is similar to a find() query filter.
db.collection.aggregate([
{ "$match": { "age": { "$gte": 20 } } }, // filter on users with age >= 20
{
"$group": {
"_id": null,
"name": { "$push": "$name" },
"age": { "$push": "$age" },
"gender": { "$push": "$gender" }
}
},
{
"$project": {
"_id": 0,
"name": 1,
"age": 1,
"gender": 1
}
}
])
Sample Output
{
"name": ["test1", "test2", "test3"],
"age": [20, 21, 30],
"gender": ["male", "female", "male"]
}
In the above pipeline, the first pipeline step is the $match operator which is similar to SQL's WHERE clause. The above example filters incoming documents on the age field (age greater than or equal to 20).
One thing to note here is that when executing a pipeline, MongoDB pipes operators into each other. "Pipe" here takes the Linux meaning: the output of an operator becomes the input of the following operator. The result of each operator is a new set of documents. So Mongo executes the previous pipeline as follows:
collection | $match | $group | $project => result
The next pipeline stage is the $group operator. Inside the $group pipeline, you are now grouping all the filtered documents where you can specify an _id value of null to calculate accumulated values for all the input documents as a whole. Use the available accumulators to return the desired aggregation on the grouped documents. The accumulator operator $push is used in this grouping operation because it returns an array of expression values for each group.
Accumulators used in the $group stage maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.
To get the documents with the desired field, the $project operator which is similar to SELECT in SQL is used to rename the field names and select/deselect the fields to be returned, out of the grouped fields. If you specify 0 for a field, it will NOT be sent in the pipeline to the next operator.
You cannot do this with the find command.
Try using mongodb's aggregation pipeline.
Specifically use $group in combination with $push
See here: https://docs.mongodb.com/manual/reference/operator/aggregation/group/#pipe._S_group

Can I use populate before aggregate in mongoose?

I have two models, one is user
userSchema = new Schema({
userID: String,
age: Number
});
and the other is the score, recorded several times every day for all users
ScoreSchema = new Schema({
userID: {type: String, ref: 'User'},
score: Number,
created_date: Date,
....
})
I would like to do some queries/calculations on the scores for users meeting a specific requirement; say, I would like to calculate the average score, day by day, for all users older than 20.
My thought is to first call populate on Score to populate the users' ages and then do the aggregation after that.
Something like
Score.
populate('userID','age').
aggregate([
{$match: {'userID.age': {$gt: 20}}},
{$group: ...},
{$group: ...}
], function(err, data){});
Is it OK to use populate before aggregate? Or should I first find all the userIDs meeting the requirement, save them in an array, and then use $in to match the score documents?
No, you cannot call .populate() before .aggregate(), and there is a very good reason why not. But there are different approaches you can take.
The .populate() method works "client side" where the underlying code actually performs additional queries ( or more accurately an $in query ) to "lookup" the specified element(s) from the referenced collection.
In contrast .aggregate() is a "server side" operation, so you basically cannot manipulate content "client side", and then have that data available to the aggregation pipeline stages later. It all needs to be present in the collection you are operating on.
A better approach is available with MongoDB 3.2 and later, via the $lookup aggregation pipeline stage. It is also probably best to handle this from the User collection in this case, in order to narrow down the selection:
User.aggregate(
[
// Filter first
{ "$match": {
"age": { "$gt": 20 }
}},
// Then join
{ "$lookup": {
"from": "scores",
"localField": "userID",
"foriegnField": "userID",
"as": "score"
}},
// More stages
],
function(err,results) {
}
)
This is basically going to include a new field "score" within the User object as an "array" of items that matched on "lookup" to the other collection:
{
"userID": "abc",
"age": 21,
"score": [{
"userID": "abc",
"score": 42,
// other fields
}]
}
The result is always an array, as the general expected usage is a "left join" of a possible "one to many" relationship. If no result is matched then it is just an empty array.
To use the content, just work with an array in any way. For instance, you can use the $arrayElemAt operator in order to just get the single first element of the array in any future operations. And then you can just use the content like any normal embedded field:
{ "$project": {
"userID": 1,
"age": 1,
"score": { "$arrayElemAt": [ "$score", 0 ] }
}}
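To tie this back to the original goal (the day-by-day average for users older than 20), the "// More stages" placeholder above could be filled in with something like the following sketch. This is an assumption-laden illustration: it unwinds the joined score array instead of using $arrayElemAt, and it assumes created_date is stored as a real Date; field names follow the schemas in the question:

// continue the pipeline after $lookup
{ "$unwind": "$score" },
{ "$group": {
    // group by calendar day of the score's created_date
    "_id": { "$dateToString": { "format": "%Y-%m-%d", "date": "$score.created_date" } },
    "avgScore": { "$avg": "$score.score" }
}},
{ "$sort": { "_id": 1 } }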
If you don't have MongoDB 3.2 available, then your other option to process a query limited by the relations of another collection is to first get the results from that collection and then use $in to filter on the second:
// Match the user collection
User.find({ "age": { "$gt": 20 } },function(err,users) {
// Get id list
userList = users.map(function(user) {
return user.userID;
});
Score.aggregate(
[
// use the id list to select items
{ "$match": {
"userId": { "$in": userList }
}},
// more stages
],
function(err,results) {
}
);
});
So getting the list of valid users from one collection to the client, and then feeding that list to the other collection in a query, is the only way to get this to happen in earlier releases.