MongoDB: get the subdocument that has the max value for each document

I have some data looking like this:
{'Type':'A',
'Attributes':[
{'Date':'2021-10-02', 'Value':5},
{'Date':'2021-09-30', 'Value':1},
{'Date':'2021-09-25', 'Value':13}
]
},
{'Type':'B',
'Attributes':[
{'Date':'2021-10-01', 'Value':36},
{'Date':'2021-09-15', 'Value':14},
{'Date':'2021-09-10', 'Value':18}
]
}
For each document, I would like to get the subdocument with the newest date. With the data above, the desired result would be:
{'Type':'A', 'Date':'2021-10-02', 'Value':5}
{'Type':'B', 'Date':'2021-10-01', 'Value':36}
I managed to find some queries that return only the global max across all subdocuments, but I could not find one that returns the max for each document.
Thanks a lot for your help

Storing dates as strings is generally considered bad practice. I suggest changing your date field to the date type. Fortunately, in your case the dates use the ISO format, so string comparison still orders them correctly and some effort can be saved.
You can do this in an aggregation pipeline:
Use $max to find the max date.
Use $filter to reduce the Attributes array so it contains only the latest element.
$unwind the array.
$project to your expected output.
Here is the Mongo playground for your reference.
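The four steps listed above could be written roughly as follows. This is a sketch rather than the exact playground code: the helper field name maxDate, the variable name latestAttributePipeline and the collection name in the comment are assumptions, and $addFields needs MongoDB 3.4 or later.

```javascript
// Sketch of the $max -> $filter -> $unwind -> $project steps.
// "maxDate" is a hypothetical helper field, not part of the original data.
const latestAttributePipeline = [
  // 1. newest date in each document (ISO strings compare correctly as text)
  { $addFields: { maxDate: { $max: "$Attributes.Date" } } },
  // 2. keep only the array elements that carry that date
  {
    $addFields: {
      Attributes: {
        $filter: {
          input: "$Attributes",
          as: "attr",
          cond: { $eq: ["$$attr.Date", "$maxDate"] }
        }
      }
    }
  },
  // 3. one output document per remaining element
  { $unwind: "$Attributes" },
  // 4. flatten to the requested shape
  { $project: { _id: 0, Type: 1, Date: "$Attributes.Date", Value: "$Attributes.Value" } }
];

// In mongosh (collection name assumed): db.collection.aggregate(latestAttributePipeline)
```

If two elements share the newest date, both survive step 2, so you get one output document for each of them.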

This keeps only one member of Attributes, the one with the max date.
If you want to keep multiple ones, use @ray's solution, which keeps all members that have the max date.
*mongoplayground can lose the order of fields in a document.
If you see a wrong result, test it with your driver; it is a bug of the mongoplayground tool.
Query1 (local-way)
Test code here
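db.collection.aggregate([
  {
    // $max over embedded documents compares them field by field, so Date decides first
    "$project": {
      "maxDateValue": {
        "$max": {
          "$map": {
            "input": "$Attributes",
            "in": { "Date": "$$this.Date", "Value": "$$this.Value" }
          }
        }
      },
      "Type": 1
    }
  },
  {
    // Type must be listed again here, otherwise this $project drops it
    "$project": {
      "_id": 0,
      "Type": 1,
      "Date": "$maxDateValue.Date",
      "Value": "$maxDateValue.Value"
    }
  }
])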
Query2 (unwind-way)
Test code here
db.collection.aggregate([
  {
    "$unwind": { "path": "$Attributes" }
  },
  {
    // group the unwound elements back per Type, keeping the largest { Date, Value }
    "$group": {
      "_id": "$Type",
      "maxDate": {
        "$max": {
          "Date": "$Attributes.Date",
          "Value": "$Attributes.Value"
        }
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "Type": "$_id",
      "Date": "$maxDate.Date",
      "Value": "$maxDate.Value"
    }
  }
])

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format shown above, you can split the _id into two parts on the dot character and use aggregation to find the maximum ts_utc for each first part (the numeric prefix).
That way you can do it in one shot instead of iterating over each group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As @danh mentioned in the comments, the best approach is probably to add an auxiliary field that indicates the group. You can then index that field to improve performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
Here is the Mongo playground for your reference.
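If you go with a stored auxiliary field instead of deriving it on every query, a one-time backfill plus a compound index might look like this. It is only a sketch: the field name group, the variable and function names, and the collection name in the comments are assumptions, and an update that takes a pipeline needs MongoDB 4.2 or later.

```javascript
// One-time backfill: store the part of _id before the first dot in "group".
// "group" is a hypothetical field name.
const backfillGroup = [
  { $set: { group: { $arrayElemAt: [{ $split: ["$_id", "."] }, 0] } } }
];

// Compound index so that, within each group, the newest ts_utc comes first.
const groupIndex = { group: 1, ts_utc: -1 };

// Latest document for each of the requested groups, using the stored field.
function latestPerGroup(groups) {
  return [
    { $match: { group: { $in: groups } } },
    { $sort: { group: 1, ts_utc: -1 } },
    { $group: { _id: "$group", doc: { $first: "$$ROOT" } } },
    { $replaceRoot: { newRoot: "$doc" } }
  ];
}

// In mongosh (collection name assumed):
//   db.collection.updateMany({}, backfillGroup)
//   db.collection.createIndex(groupIndex)
//   db.collection.aggregate(latestPerGroup(["42", "69"]))
```

New documents would also need group set when they are inserted. Whether the index is actually used for your data is worth checking with explain() rather than assuming.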

Getting concrete elements by element field in mongoDB

I know that by the title it is not very clear what is my problem, so let me explain it with an example.
Let's suppose I have a collection in a mongo database called tweets whose elements look like this:
{
"id": "tweet_id",
"text": "this is the tweet's text",
"user": {
"id": "user_id",
"name": "user_name",
}
}
Let's suppose that we have 100 documents that look like that, and 10 users with different ids and names.
What would a query look like that lists the distinct user ids in the collection together with their names?
The result I want would look like this:
{"user_id1": "user_name1"}
{"user_id2": "user_name2"}
...
{"user_id10": "user_name10"}
Thank you for your help
You can use this aggregation query:
First $group by user.id to get all the distinct user ids with their names.
Then use $replaceRoot with $arrayToObject to get the desired output format.
db.collection.aggregate([
{
"$group": {
"_id": "$user.id",
"name": {
"$first": "$user.name"
}
}
},
{
"$replaceRoot": {
"newRoot": {
"$arrayToObject": [
[
{
"k": "$_id",
"v": "$name"
}
]
]
}
}
}
])
Example here

$elemMatch doesn't work on nested documents in MongoDB

I have a very strange problem with using $elemMatch in MongoDB. I added multiple documents to a collection. Some of these documents were added using the import feature in MongoDB Compass (Add Data -> Import File -> JSON) and some of them were added using insertMany().
Here is an example structure of a single document:
{
"id": "1234567890",
"date": "YYYY-MM-DD",
"contents": {
"0": {
"content": {
"id": "1111111111",
"name": "Name 1"
}
},
"1": {
"content": {
"id": "2222222222",
"name": "Name 2"
}
},
"2": {
"content": {
"id": "3333333333",
"name": "Name 3"
}
}
}
}
The thing is, when I use the following find query using this filter:
{date: "<some_date_here>", "contents": {
$elemMatch: {
"content.id": <some_id_here>
}
}}
ONLY documents that were imported with MongoDB Compass show up. Documents that were added with mongosh or the Node.js driver (it doesn't matter which) do NOT show up.
Am I missing something obvious here? What should I do to make all documents in the collection that match the filter show up?
Simple filters that do not include $elemMatch work well and all documents that match the filtering rules show up. The problem seems to be with $elemMatch.
I tried adding the same batch of documents using different methods, but only directly importing a JSON file in MongoDB Compass makes them appear with the filter mentioned above.
Thank you for your help!
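$elemMatch is for matching array elements, and in this case contents is not an array: it is an object with the keys "0", "1" and "2".
First convert the contents object to an array with $objectToArray, then $filter that array by the id you are looking for, take the $size of the filtered array, and finally $match the documents where that size is greater than 0 (together with any other conditions, such as the date):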
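db.collection.aggregate([
  {
    // turn the "contents" object into an array of { k, v } pairs
    "$addFields": {
      "newField": {
        "$objectToArray": "$contents"
      }
    }
  },
  {
    // keep only the pairs whose content.id matches
    "$addFields": {
      "newField": {
        "$filter": {
          "input": "$newField",
          "as": "z",
          "cond": {
            "$eq": [
              "$$z.v.content.id",
              "1111111111"
            ]
          }
        }
      }
    }
  },
  {
    "$addFields": {
      "newField": {
        "$size": "$newField"
      }
    }
  },
  {
    // "<some_date_here>" is the placeholder from the question
    "$match": {
      "$and": [
        { "newField": { "$gt": 0 } },
        { "date": { "$gt": "<some_date_here>" } }
      ]
    }
  },
  {
    "$project": {
      "contents": 1,
      "date": 1,
      "id": 1
    }
  }
])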
https://mongoplayground.net/p/pue4QPp1dYR
The playground example does not include the date filter.
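To make the difference between the two shapes concrete, here is a small plain JavaScript sketch that runs outside MongoDB. The variable names and the helper hasContentId are made up for illustration; the helper is only an in-memory analogue of the $objectToArray, $filter and $size steps, not something the server runs.

```javascript
// Shape that $elemMatch can match: "contents" is a real array.
const arrayShaped = {
  contents: [{ content: { id: "1111111111", name: "Name 1" } }]
};

// Shape from the question: "contents" is an object keyed by "0", "1", ...
const objectShaped = {
  contents: { "0": { content: { id: "1111111111", name: "Name 1" } } }
};

console.log(Array.isArray(arrayShaped.contents));  // true
console.log(Array.isArray(objectShaped.contents)); // false

// In-memory analogue of $objectToArray + $filter + ($size > 0):
// look at every value under "contents" and test its content.id.
function hasContentId(doc, id) {
  return Object.values(doc.contents).some((entry) => entry.content.id === id);
}

console.log(hasContentId(objectShaped, "1111111111")); // true
console.log(hasContentId(objectShaped, "9999999999")); // false
```

Both sample documents look almost the same when printed, which is why the difference is easy to miss when comparing them by eye.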

MongoDB $push aggregation won't keep the right order

I tried to make a $group aggregation with MongoDB, like the following example:
"$group": {
"_id": "$test_id",
"feeling": {
"$push": "$feeling"
},
"reference_id": {
"$push": "$_id"
},
"training_start": {
"$push": "$training_start"
},
"training_duration": {
"$push": "$duration_ms"
}
}
The aggregation works, but the resulting arrays are ordered differently from one another. That is, when I compare reference_id[x] and training_start[x] in the result, training_start[x] is not the training_start of the source document whose _id is reference_id[x].
Maybe an example shows my problem more precisely:
One document after the $group aggregation:
{
_id: "string_1",
reference_id: [1, 2, 3],
training_start: [01:00:00, 02:00:00, 03:00:00] (date times)
}
Documents from source collection:
{
_id:1,
training_start: 01:00:00,
test_id: "string_1"
},
{
_id:2,
training_start: 03:00:00,
test_id: "string_1"
},
{
_id:3,
training_start: 02:00:00,
test_id: "string_1"
}
The first elements of these arrays are always in the right order. So I checked whether each grouped field has the same number of entries, using the code below. Annoyingly, the number of entries in each array is equal, so the arrays are not shifted by missing values.
"$group": {
  "_id": "$test_id",
  "sum": {
    "$sum": {
      "$cond": {
        "if": { "$lte": ["$training_start", null] },
        "then": 0,
        "else": 1
      }
    }
  }
}
Does anybody know if there is another way to create arrays (I already tried $addToSet) that keeps the order in which the elements were pushed? Or am I the problem?
Greetings Max

Embedded List ordering

I have the following object with an embedded list of items and I would like to write a query and return all or specific items ordered by date. Is it possible to do it or should I have a different collection for items and keep here their references?
I know that you can match specific element using $elemMatch.
{
"_id": "51cb12857124a215940cf2d4",
"level1":
[
{
"name":"item00",
"description":"item01",
"date": 1238492103
},
{
"name":"item10",
"description":"item11",
"date": 1238492104
}
]
}
If you want these items ordered by date more often than not, then your best option is to keep the list ordered in the first place. The $push operator has an additional $sort modifier explicitly for this purpose.
db.collection.update(
  { "_id": "51cb12857124a215940cf2d4" },
  { "$push": {
    "level1": {
      "$each": [{
        "name": "item11",
        "description": "item11",
        "date": 1238492104
      }],
      "$sort": { "date": 1 }
    }
  }}
)
This even works with an empty $each, so you could sort the arrays in your whole collection in one statement:
db.collection.update(
  {},
  { "$push": {
    "level1": {
      "$each": [], "$sort": { "date": 1 }
    }
  }},
  { "multi": true }
)
Without that, your only alternative is to order the results via the .aggregate() method. This really should not be your first choice, as it requires an $unwind of the array contents and then a $sort on the elements within each document. Naturally this comes with significant overhead on larger selections:
db.collection.aggregate([
{ "$unwind": "$level1" },
{ "$sort": { "_id": 1, "level1.date": 1 } },
{ "$group": {
"_id": "$_id",
"level1": { "$push": "$level1" }
}}
])
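On newer servers there is a lighter option than $unwind, $sort and $group. The sketch below assumes MongoDB 5.2 or later, where the $sortArray expression is available; the variable name and the collection name in the comment are assumptions.

```javascript
// Sort the embedded array inside each document, without $unwind / $group.
// Assumes MongoDB 5.2+ ($sortArray); "sortLevel1Pipeline" is a made-up name.
const sortLevel1Pipeline = [
  {
    $set: {
      level1: { $sortArray: { input: "$level1", sortBy: { date: 1 } } }
    }
  }
];

// In mongosh (collection name assumed): db.collection.aggregate(sortLevel1Pipeline)
```

This only changes the order in the query result. The stored documents stay as they are, so keeping the array sorted at write time with $push and $sort is still worth considering.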