In the MongoDB collection I'm querying, each document represents some data for a parcel at a specific time. Every time I receive an update for a parcel, some fields may be updated (non-null value) and some others are not (null values).
To illustrate, consider this example. We received 3 data sets for a parcel:
/* 1 */
{
"parcelNum" : "CC123456789FR",
"datetime" : ISODate("2018-09-05T10:48:38.584Z"),
"field1" : "value1_1",
"field2" : "value2_1"
}
/* 2 */
{
"parcelNum" : "CC123456789FR",
"datetime" : ISODate("2018-09-05T10:48:40.566Z"),
"field1" : "value1_2",
"field2" : null
}
/* 3 */
{
"parcelNum" : "CC123456789FR",
"datetime" : ISODate("2018-09-05T10:48:42.777Z"),
"field1" : null,
"field2" : "value2_2"
}
How can I extract the latest non-null value, for all fields, considering the timestamp of the document they belong to?
Using the previous example, this is what I try to get:
{
"parcelNum" : "CC123456789FR",
"field1" : "value1_2",
"field2" : "value2_2"
}
I tried that kind of query but I can't find how to mix field values from multiple documents:
db.testDB.aggregate([
{$sort: { datetime: -1 }},
{$group: { _id: "$parcelNum",
field1: {$first: "$field1" },
field2: {$first: "$field2" }
}}
])
gives me:
{
"_id" : "CC123456789FR",
"field1" : null,
"field2" : "value2_2"
}
which is wrong because it only uses values from the most recent document and doesn't mix all the documents.
I tried another approach suggested by Rishi in another topic. Instead of creating a new document for each revision, he suggested pushing revision sub-documents onto an array and maintaining the latest revision at the parent document.
Something like this:
{
parcelNum: "CC123456789FR",
lastUpdated: ISODate("2018-09-05T10:48:42.777Z"),
field1: "value1_2",
field2: "value2_2",
revisions: [
{
datetime: ISODate("2018-09-05T10:48:38.584Z"),
field1: "value1_1",
field2: "value2_1"
},
{
datetime: ISODate("2018-09-05T10:48:40.566Z"),
field1: "value1_2",
field2: null
},
{
datetime: ISODate("2018-09-05T10:48:42.777Z"),
field1: null,
field2: "value2_2"
}
]
}
However, maintaining the latest revision is not that easy, because updates are not received in chronological order: I can receive a "new" document with an older "datetime" value, in which case I must not overwrite the parent fields unless they are currently null. To do this correctly, I would have to record the last update timestamp for every field!
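The per-field timestamp bookkeeping described above can be sketched in plain JavaScript. This is only an illustration of the idea, not a MongoDB update: `applyUpdate` and the shape of `state` (one `{ value, datetime }` pair per field) are hypothetical names introduced here.

```javascript
// Hypothetical helper: merge one update into the running state of a parcel.
// `state` stores, per field, the value and the datetime of the update that set it.
function applyUpdate(state, update, fields) {
  for (const field of fields) {
    const value = update[field];
    // A null (or missing) value means "this field was not updated".
    if (value === null || value === undefined) continue;
    const current = state[field];
    // Accept the incoming value only if it is newer than the one we hold.
    if (!current || update.datetime > current.datetime) {
      state[field] = { value, datetime: update.datetime };
    }
  }
  return state;
}

// The three updates from the example, deliberately out of chronological order.
const updates = [
  { datetime: new Date("2018-09-05T10:48:42.777Z"), field1: null, field2: "value2_2" },
  { datetime: new Date("2018-09-05T10:48:38.584Z"), field1: "value1_1", field2: "value2_1" },
  { datetime: new Date("2018-09-05T10:48:40.566Z"), field1: "value1_2", field2: null },
];

const state = updates.reduce((s, u) => applyUpdate(s, u, ["field1", "field2"]), {});
console.log(state.field1.value, state.field2.value);
```

Because each field carries its own timestamp, the result does not depend on the order in which the updates arrive.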
You can try this:
db.getCollection('test').aggregate([
//Sort
{$sort: { datetime: -1 }},
//Add fields to an array
{$group: {
"_id": null,
"field1": { $push: "$field1" },
"field2": { $push: "$field2" },
}},
//Filter and do not include null values
{$project: {
"field1notNull" : {
$filter: {
input: "$field1",
as: "f",
cond: { $ne: [ "$$f", null ] }
}
},
"field2notNull" : {
$filter: {
input: "$field2",
as: "f",
cond: { $ne: [ "$$f", null ] }
}
}
}
},
//Get the first values of each
{$project: {
"_id": null,
"field1": {$arrayElemAt: ["$field1notNull", 0]},
"field2": {$arrayElemAt: ["$field2notNull", 0]}
}}
])
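For checking the logic without a database, here is a rough plain-JavaScript equivalent of the sort / push / filter / first-element steps. `latestNonNull` is a hypothetical helper name. Unlike the pipeline above, which collapses everything into one result with `_id: null`, this sketch keeps one result per `parcelNum`, as the question's expected output suggests.

```javascript
// Hypothetical helper: newest document first, then keep the first non-null
// value seen for each field, separately for each parcelNum.
function latestNonNull(docs, fields) {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...docs].sort((a, b) => b.datetime - a.datetime);
  const byParcel = new Map();
  for (const doc of sorted) {
    if (!byParcel.has(doc.parcelNum)) {
      byParcel.set(doc.parcelNum, { parcelNum: doc.parcelNum });
    }
    const merged = byParcel.get(doc.parcelNum);
    for (const field of fields) {
      // `!= null` rejects both null and undefined.
      if (merged[field] === undefined && doc[field] != null) {
        merged[field] = doc[field];
      }
    }
  }
  return [...byParcel.values()];
}

const docs = [
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:38.584Z"), field1: "value1_1", field2: "value2_1" },
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:40.566Z"), field1: "value1_2", field2: null },
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:42.777Z"), field1: null, field2: "value2_2" },
];
console.log(latestNonNull(docs, ["field1", "field2"]));
```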
You can try the $facet stage, to treat field1 and field2 separately:
db['01'].aggregate(
[
// Stage 1
{
$sort: {
"datetime":-1
}
},
// Stage 2
{
$facet: {parcelNum:[{$group:{_id:"$parcelNum"}}],
field1: [ {
$match: {
field1:{$ne:null}
}
},
{
$limit: 1
},
{
$project: {
_id:0,
field1:1
}
}, ],
field2: [ {
$match: {
field2:{$ne:null}
}
},
{
$limit: 1
},
{
$project: {
_id:0,
field2:1
}
}, ],
}
},
// Stage 3
{
$project: {
parcelNum:"$parcelNum._id" ,
field1:"$field1.field1",
field2:"$field2.field2",
}
},
// Stage 4
{
$project: {
parcelNum:{$arrayElemAt:["$parcelNum" ,0]},
field1:{$arrayElemAt:["$field1" ,0]},
field2:{$arrayElemAt:["$field2" ,0]},
}
},
],
);
Note that stages 3 and 4 are only 'decorative': the needed result is already present at the end of stage 2.
Hope it helps
Related
I have a test collection:
{
"_id" : ObjectId("5exxxxxx03"),
"username" : "abc",
"col1" : [
{
"colId" : 1,
"col2" : [
{
"name" : "a",
"value" : 10
},
{
"name" : "b",
"value" : 20
},
{
"name" : "c",
"value" : 30
}
],
"col3" : [
{
"name" : "d",
"value" : 15
},
{
"name" : "e",
"value" : 25
},
{
"name" : "f",
"value" : 35
}
]
}
]
}
col1 holds a list of sub-documents, each with two arrays, col2 and col3, which are similar in shape but convey different meanings. Their elements have name and value as fields.
Now, I need to find the max value from col2 or col3 and its corresponding name.
I tried the below query:
db.test.aggregate([
{$unwind: '$col1'},
{$unwind: '$col1.col2'},
{$unwind: '$col1.col3'},
{$group:
{_id: '$col1.colId',
maxCol2: {$max: '$col1.col2.value'},
maxCol3: {$max: '$col1.col3.value'}}},
{$project:
{maxValue: {$max: ['$maxCol2', '$maxCol3']},
name: {$cond: [
{$eq: ['$maxValue', '$maxCol2']},
'$col1.col2.name',
'$col1.col3.name']}}}]).pretty()
But, it resulted in the following, without name field in it:
{ "_id" : 1, "maxValue" : 35 }
So, just to check whether my condition is correct, I tried the following query ($col1.col2.name and $col1.col3.name replaced with the strings 111 and 222):
db.test.aggregate([
{$unwind: '$col1'},
{$unwind: '$col1.col2'},
{$unwind: '$col1.col3'},
{$group:
{_id: '$col1.colId',
maxCol2: {$max: '$col1.col2.value'},
maxCol3: {$max: '$col1.col3.value'}}},
{$project:
{maxValue: {$max: ['$maxCol2', '$maxCol3']},
name: {$cond: [
{$eq: ['$maxValue', '$maxCol2']},
'111',
'222']}}}]).pretty()
Which gives me the expected output:
{ "_id" : 1, "maxValue" : 35, "name" : "222" }
Could anyone explain why I am not getting the correct answer, and how I should query this to get the correct output?
The correct output should be:
{ "_id" : 1, "maxValue" : 35, "name" : "f" }
P.S. - I'm a beginner.
You can use the aggregation below:
db.collection.aggregate([
{ "$project": {
"col1": {
"$max": {
"$reduce": {
"input": "$col1",
"initialValue": [],
"in": {
"$concatArrays": [
"$$this.col2",
"$$value",
"$$this.col3"
]
}
}
}
}
}}
])
MongoPlayground
Try this one:
Explanation
We need to carry the col2 and col3 entries through the $group stage as extra fields. Once the max value is calculated, we retrieve the name of the entry that holds it.
db.collection.aggregate([
{
$unwind: "$col1"
},
{
$unwind: "$col1.col2"
},
{
$unwind: "$col1.col3"
},
{
$group: {
_id: "$col1.colId",
maxCol2: {
$max: "$col1.col2.value"
},
maxCol3: {
$max: "$col1.col3.value"
},
col2: {
$addToSet: "$col1.col2"
},
col3: {
$addToSet: "$col1.col3"
}
}
},
{
$project: {
maxValue: {
$filter: {
input: {
$cond: [
{
$gt: [
"$maxCol2",
"$maxCol3"
]
},
"$col2",
"$col3"
]
},
cond: {
$eq: [
"$$this.value",
{
$cond: [
{
$gt: [
"$maxCol2",
"$maxCol3"
]
},
"$maxCol2",
"$maxCol3"
]
}
]
}
}
}
}
},
{
$unwind: "$maxValue"
},
{
$project: {
_id: 1,
maxValue: "$maxValue.value",
name: "$maxValue.name"
}
}
])
MongoPlayground | Merging col2 / col3 | Per document
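As a cross-check of what the pipeline is meant to return, the same "max value plus its name" lookup can be sketched in plain JavaScript on a single document. `maxEntryPerCol1` is a hypothetical helper name, and treating a missing col2 or col3 as an empty array is an assumption made here.

```javascript
// Hypothetical helper: for each element of col1, merge col2 and col3 and
// pick the entry with the largest `value`, keeping its name alongside.
function maxEntryPerCol1(doc) {
  return doc.col1.map((c) => {
    const entries = [...(c.col2 || []), ...(c.col3 || [])];
    const best = entries.reduce(
      (max, e) => (max === null || e.value > max.value ? e : max),
      null
    );
    return {
      _id: c.colId,
      maxValue: best ? best.value : null,
      name: best ? best.name : null,
    };
  });
}

const doc = {
  username: "abc",
  col1: [
    {
      colId: 1,
      col2: [{ name: "a", value: 10 }, { name: "b", value: 20 }, { name: "c", value: 30 }],
      col3: [{ name: "d", value: 15 }, { name: "e", value: 25 }, { name: "f", value: 35 }],
    },
  ],
};
console.log(maxEntryPerCol1(doc));
```

Keeping the whole entry until the end, rather than only its value, is what lets the name travel with the maximum.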
New to Mongo, have found lots of examples of removing dupes from arrays of strings using the aggregation framework, but am wondering if possible to remove dupes from array of objects based on a field in the object. Eg
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
},
{
"id" : 1,
"val" : "xxxxxx"
}
]
}
Here I'd like to remove dupes based on the id field. So would end up with
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
}
]
}
Picking the first/any object with given id works. Just want to end up with one per id. Is this doable in aggregation framework? Or even outside aggregation framework, just looking for a clean way to do this. Need to do this type of thing across many documents in collection, which seems like a good use case for aggregation framework, but as I mentioned, newbie here...thanks.
You can get the desired result in two ways.
Classic
Flatten - Remove duplicates (pick first occurrence) - Group by
db.collection.aggregate([
{
$unwind: "$values"
},
{
$group: {
_id: "$values.id",
values: {
$first: "$values"
},
id: {
$first: "$_id"
},
name: {
$first: "$name"
}
}
},
{
$group: {
_id: "$id",
name: {
$first: "$name"
},
values: {
$push: "$values"
}
}
}
])
MongoPlayground
Modern
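We need to use the $reduce operator.
Equivalent logic in plain JavaScript:
// values is the original array; tmp accumulates the de-duplicated result
var tmp = [];
for (var value of values) {
    // keep the value only if no entry with the same id has been kept yet
    if (!tmp.some(function (t) { return t.id === value.id; }))
        tmp.push(value);
}
values = tmp;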
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$concatArrays: [
"$$value",
{
$cond: [
{
$in: [
"$$this.id",
"$$value.id"
]
},
[],
[
"$$this"
]
]
}
]
}
}
}
}
}
])
MongoPlayground
You can use $reduce. Try the query below:
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$cond: [
{ $in: ["$$this.id", "$$value.id"] }, /** If this 'id' is already in the holding array, return the holding array unchanged; otherwise append the new object to it */
"$$value",
{ $concatArrays: ["$$value", ["$$this"]] }
]
}
}
}
}
}
]);
Test : MongoDB-Playground
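Since the question also allows a solution outside the aggregation framework, here is a minimal client-side sketch. `dedupeById` is a hypothetical helper name. It relies on `Map` preserving insertion order, so the first object seen for each `id` is the one that survives.

```javascript
// Hypothetical helper: keep the first object seen for each id.
function dedupeById(values) {
  const seen = new Map();
  for (const v of values) {
    if (!seen.has(v.id)) seen.set(v.id, v); // later duplicates are ignored
  }
  return [...seen.values()];
}

const values = [
  { id: 1, val: "x" },
  { id: 1, val: "x" },
  { id: 2, val: "y" },
  { id: 1, val: "xxxxxx" },
];
console.log(dedupeById(values));
```

The `$in` check inside `$reduce` rescans the accumulated array for every element, whereas the `Map` lookup does not, which may matter for long arrays.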
I have a dataset in a MongoDB collection named visitorsSession, like this:
{ip : '192.2.1.1',country : 'US', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.3.1.8',country : 'UK', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.5.1.4',country : 'UK', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.8.1.7',country : 'US', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'},
{ip : '192.1.1.3',country : 'US', type : 'Visitors',date : '2019-12-15T00:00:00.359Z'}
I am using this mongodb aggregation
[{$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}}, {$group: {
_id : "$country",
totalSessions : {
$sum: 1
}
}}, {$project: {
_id : 0,
country : "$_id",
totalSessions : 1
}}, {$sort: {
country: -1
}}]
Using the above aggregation, I get results like this:
[{country : 'US',totalSessions : 3},{country : 'UK',totalSessions : 2}]
But I also want the total number of visitors along with the result, like totalVisitors : 5
How can I do this in a MongoDB aggregation?
You can use $facet aggregation stage to calculate total visitors as well as visitors by country in a single pass:
db.visitorsSession.aggregate( [
{
$match: {
nsp : "/hrm.sbtjapan.com",
creationDate : {
$gte: "2019-12-15T00:00:00.359Z",
$lte: "2019-12-20T23:00:00.359Z"
},
type : "Visitors"
}
},
{
$facet: {
totalVisitors: [
{
$count: "count"
}
],
countrySessions: [
{
$group: {
_id : "$country",
sessions : { $sum: 1 }
}
},
{
$project: {
country: "$_id",
_id: 0,
sessions: 1
}
}
],
}
},
{
$addFields: {
totalVisitors: { $arrayElemAt: [ "$totalVisitors.count" , 0 ] },
}
}
] )
The output:
{
"totalVisitors" : 5,
"countrySessions" : [
{
"sessions" : 2,
"country" : "UK"
},
{
"sessions" : 3,
"country" : "US"
}
]
}
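To make the shape of that output concrete, here is a plain-JavaScript sketch of what the two facets compute over the documents that survive the $match stage. `summarizeSessions` is a hypothetical helper name, and the input is assumed to be an in-memory array of already-filtered session documents.

```javascript
// Hypothetical helper: one loop yields both the overall count and the
// per-country counts, mirroring the totalVisitors and countrySessions facets.
function summarizeSessions(sessions) {
  const byCountry = new Map();
  for (const s of sessions) {
    byCountry.set(s.country, (byCountry.get(s.country) || 0) + 1);
  }
  return {
    totalVisitors: sessions.length,
    countrySessions: [...byCountry].map(([country, count]) => ({
      sessions: count,
      country,
    })),
  };
}

const sessions = [
  { ip: "192.2.1.1", country: "US", type: "Visitors" },
  { ip: "192.3.1.8", country: "UK", type: "Visitors" },
  { ip: "192.5.1.4", country: "UK", type: "Visitors" },
  { ip: "192.8.1.7", country: "US", type: "Visitors" },
  { ip: "192.1.1.3", country: "US", type: "Visitors" },
];
console.log(summarizeSessions(sessions));
```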
You could be better off with two queries to do this.
To save the two DB round trips, the following aggregation can be used, although IMO it is rather verbose (and may be somewhat expensive if the documents are very large) just to count the documents.
Idea: put a $group at the top to count the documents and preserve the originals using $push and $$ROOT, then $unwind the resulting array of original docs before the other match/filter stages.
db.collection.aggregate([
{
$group: {
_id: null,
docsCount: {
$sum: 1
},
originals: {
$push: "$$ROOT"
}
}
},
{
$unwind: "$originals"
},
{ $match: "..." }, //and other stages on `originals` which contains the source documents
{
$group: {
_id: "$originals.country",
totalSessions: {
$sum: 1
},
totalVisitors: {
$first: "$docsCount"
}
}
}
]);
Sample O/P: Playground Link
[
{
"_id": "UK",
"totalSessions": 2,
"totalVisitors": 5
},
{
"_id": "US",
"totalSessions": 3,
"totalVisitors": 5
}
]
I have two documents with the following structure:
{
"CollegeName" : "Hi-Tech College",
"StudentName" : "John",
"Age" : 25
},
{
"CollegeName" : "Hi-Tech College",
"StudentName" : "Tom",
"Age" : 24
}
In those two documents, CollegeName is the common field. Using it, I want to generate a single document in the following format:
{
"CollegeName" : "Hi-Tech College",
"JohnAge" : 25,
"TomAge" : 24
}
You can try below aggregation:
db.col.aggregate([
{
$group: {
_id: null,
CollegeName: { $first: "$CollegeName" },
Students: { $push: { k: { $concat: [ "$StudentName", "Age" ] }, v: "$Age" } }
}
},
{
$replaceRoot: {
newRoot: { $mergeObjects: [ { CollegeName: "$CollegeName" }, { $arrayToObject: "$Students" } ] }
}
}
])
Basically, to create key names dynamically you can use the $arrayToObject operator, which takes an array of key-value pairs (k and v properties) and returns an object. To build your custom keys you can use $concat. Then you have to merge the dynamically created object with CollegeName, which is what the $mergeObjects and $replaceRoot operators are for.
Since this groups by null, which returns a single document for the entire collection, keep in mind that MongoDB has a BSON document size limit, so your result can't exceed 16MB.
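For readers more familiar with JavaScript than with aggregation operators, `Object.fromEntries` plays a role similar to $arrayToObject, and object spread stands in for $mergeObjects. The sketch below is only an analogy: `mergeStudents` is a hypothetical helper name, and it assumes a non-empty array whose documents all share the same CollegeName.

```javascript
// Hypothetical helper: build "<StudentName>Age" keys dynamically and merge
// them with the shared CollegeName into a single object.
function mergeStudents(docs) {
  // Each pair is [key, value], the form Object.fromEntries expects.
  const pairs = docs.map((d) => [d.StudentName + "Age", d.Age]);
  return { CollegeName: docs[0].CollegeName, ...Object.fromEntries(pairs) };
}

const students = [
  { CollegeName: "Hi-Tech College", StudentName: "John", Age: 25 },
  { CollegeName: "Hi-Tech College", StudentName: "Tom", Age: 24 },
];
console.log(mergeStudents(students));
```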
db.temp.aggregate([
{"$group":{"_id":{"CollegeName":"$CollegeName"},
"Students":{"$push":{"StudentName":"$StudentName","Age":"$Age"}}}}
,{"$unwind":"$Students"}
,{"$group":{"_id":"$_id",
'JohnAge': { $max: {$cond: [ {$or: [
{$eq:['$Students.StudentName', 'John']}
]}
, '$Students.Age', null] } },
'TomAge': { $max: {$cond: [ {$or: [
{$eq:['$Students.StudentName', 'Tom']}
]}
, '$Students.Age', null] } }
}}
])
I have documents that have a few fields; in particular, they have a field called attrs that is an array. I am using the aggregation pipeline.
In my query I am interested in the attrs (attributes) field if there are any elements in it. Otherwise I still want to get the result. In this case I am after the field type of the document.
The problem is that if a document does not contain any element in the attrs field it will be filtered away and I won't get its _id.type field, which is what I really want from this query.
{
aggregate: "entities",
pipeline: [
{
$match: {
_id.servicePath: {
$in: [
/^/.*/,
null
]
}
}
},
{
$project: {
_id: 1,
"attrs.name": 1,
"attrs.type": 1
}
},
{
$unwind: "$attrs"
},
{
$group: {
_id: "$_id.type",
attrs: {
$addToSet: "$attrs"
}
}
},
{
$sort: {
_id: 1
}
}
]
}
So the question is: how can I get a result containing all document types, whether or not they have attrs, while including the attributes for those that do?
I hope it makes sense.
You can use the $cond operator in a $project stage to replace an empty attrs array with one that contains a placeholder like null, which can be used as a marker to indicate that the doc doesn't contain any attrs elements.
So you'd insert an additional $project stage like this right before the $unwind:
{
$project: {
attrs: {$cond: {
if: {$eq: ['$attrs', [] ]},
then: [null],
else: '$attrs'
}}
}
},
The only caveat is that you'll end up with a null value in the final attrs array for those groups that contain at least one doc without any attrs elements, so you need to ignore those client-side.
Example
The example uses an altered $match stage because the one in your example isn't valid.
Input Docs
[
{_id: {type: 1, id: 2}, attrs: []},
{_id: {type: 2, id: 1}, attrs: []},
{_id: {type: 2, id: 2}, attrs: [{name: 'john', type: 22}, {name: 'bob', type: 44}]}
]
Output
{
"result" : [
{
"_id" : 1,
"attrs" : [
null
]
},
{
"_id" : 2,
"attrs" : [
{
"name" : "bob",
"type" : 44
},
{
"name" : "john",
"type" : 22
},
null
]
}
],
"ok" : 1
}
Aggregate Command
db.test.aggregate([
{
$match: {
'_id.servicePath': {
$in: [
null
]
}
}
},
{
$project: {
_id: 1,
"attrs.name": 1,
"attrs.type": 1
}
},
{
$project: {
attrs: {$cond: {
if: {$eq: ['$attrs', [] ]},
then: [null],
else: '$attrs'
}}
}
},
{
$unwind: "$attrs"
},
{
$group: {
_id: "$_id.type",
attrs: {
$addToSet: "$attrs"
}
}
},
{
$sort: {
_id: 1
}
}
])
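The client-side cleanup mentioned in the caveat can be as small as the sketch below. `stripPlaceholders` is a hypothetical helper name, and the input is assumed to be the array of groups returned by the pipeline (the `result` array in the output shown earlier).

```javascript
// Hypothetical helper: drop the null placeholder from every group's attrs,
// leaving an empty array for types that never had any attributes.
function stripPlaceholders(groups) {
  return groups.map((g) => ({
    ...g,
    attrs: g.attrs.filter((a) => a !== null),
  }));
}

const result = [
  { _id: 1, attrs: [null] },
  { _id: 2, attrs: [{ name: "bob", type: 44 }, { name: "john", type: 22 }, null] },
];
console.log(JSON.stringify(stripPlaceholders(result)));
```

Every type still appears in the cleaned result, which is the point of inserting the placeholder before $unwind.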
Use some if statements and loops.
First and foremost, your query should select all documents.
Loop through all of them.
Then, if the number of attributes is greater than 0, loop through the attributes and collect them into whatever array or output you find useful.
Use if statements to sanitize your results if you like.
You should use the '$or' operator and two separate queries: one to select the documents with an attr value equal to the required value, and the other to match documents where attr is null or the attr key does not exist (using the $exists operator).