Customize existing document and add new fields in mongo aggregation - mongodb

I have two documents with the following structure:
{
"CollegeName" : "Hi-Tech College",
"StudentName" : "John",
"Age" : 25
},
{
"CollegeName" : "Hi-Tech College",
"StudentName" : "Tom",
"Age" : 24
}
In those two documents, CollegeName is the common field. Using it, I want to generate a single document in the following format:
{
"CollegeName" : "Hi-Tech College",
"JohnAge" : 25,
"TomAge" : 24
}

You can try below aggregation:
db.col.aggregate([
{
$group: {
_id: null,
CollegeName: { $first: "$CollegeName" },
Students: { $push: { k: { $concat: [ "$StudentName", "Age" ] }, v: "$Age" } }
}
},
{
$replaceRoot: {
newRoot: { $mergeObjects: [ { CollegeName: "$CollegeName" }, { $arrayToObject: "$Students" } ] }
}
}
])
Basically to create key names dynamically you can use $arrayToObject operator which takes an array of key-value pairs (k and v properties) and returns an object. To create your custom keys you can use $concat. Then you have to "merge" new dynamically created object with CollegeName so you can use $mergeObjects and $replaceRoot operators for that.
Since it groups by null, which returns one document for the entire collection, keep in mind that MongoDB has a BSON document size limit, so your result can't exceed 16MB.
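To make the data flow easier to follow, here is a minimal sketch of the same logic in plain JavaScript, outside MongoDB. The function name mergeStudentAges is hypothetical, Object.fromEntries stands in for $arrayToObject, and the sketch assumes all input documents belong to the same college, as grouping by null does.

```javascript
// Plain JavaScript sketch of the pipeline above (not MongoDB code).
// mergeStudentAges is a hypothetical helper name used for illustration only.
function mergeStudentAges(docs) {
  // $group with _id: null: one result for the whole input, $first for the name
  const collegeName = docs.length > 0 ? docs[0].CollegeName : null;
  // $push of { k, v } pairs, with $concat building each key
  const pairs = docs.map((doc) => [doc.StudentName + "Age", doc.Age]);
  // $arrayToObject, then $mergeObjects with the CollegeName field
  return Object.assign({ CollegeName: collegeName }, Object.fromEntries(pairs));
}

const students = [
  { CollegeName: "Hi-Tech College", StudentName: "John", Age: 25 },
  { CollegeName: "Hi-Tech College", StudentName: "Tom", Age: 24 },
];

console.log(mergeStudentAges(students));
// { CollegeName: 'Hi-Tech College', JohnAge: 25, TomAge: 24 }
```

As in the pipeline, two students with the same name would produce the same key, and the later value would overwrite the earlier one.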

db.temp.aggregate([
{"$group":{"_id":{"CollegeName":"$CollegeName"},
"Students":{"$push":{"StudentName":"$StudentName","Age":"$Age"}}}}
,{"$unwind":"$Students"}
,{"$group":{"_id":"$_id",
'JohnAge': { $max: {$cond: [ {$or: [
{$eq:['$Students.StudentName', 'John']}
]}
, '$Students.Age', null] } },
'TomAge': { $max: {$cond: [ {$or: [
{$eq:['$Students.StudentName', 'Tom']}
]}
, '$Students.Age', null] } }
}}
])

Related

Mongo remove duplicates in array of objects based on field

New to Mongo, have found lots of examples of removing dupes from arrays of strings using the aggregation framework, but am wondering if possible to remove dupes from array of objects based on a field in the object. Eg
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
},
{
"id" : 1,
"val" : "xxxxxx"
}
]
}
Here I'd like to remove dupes based on the id field. So would end up with
{
"_id" : ObjectId("5e82661d164941779c2380ca"),
"name" : "something",
"values" : [
{
"id" : 1,
"val" : "x"
},
{
"id" : 2,
"val" : "y"
}
]
}
Picking the first/any object with given id works. Just want to end up with one per id. Is this doable in aggregation framework? Or even outside aggregation framework, just looking for a clean way to do this. Need to do this type of thing across many documents in collection, which seems like a good use case for aggregation framework, but as I mentioned, newbie here...thanks.
You can get the desired result in two ways.
Classic
Flatten - Remove duplicates (pick first occurrence) - Group by
db.collection.aggregate([
{
$unwind: "$values"
},
{
$group: {
_id: "$values.id",
values: {
$first: "$values"
},
id: {
$first: "$_id"
},
name: {
$first: "$name"
}
}
},
{
$group: {
_id: "$id",
name: {
$first: "$name"
},
values: {
$push: "$values"
}
}
}
])
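For readers who prefer to trace the three stages by hand, here is a rough plain JavaScript sketch of the same unwind, group, regroup idea applied to a single document. The name dedupeValuesClassic is hypothetical. Two caveats, stated as assumptions: the sketch works per document, whereas the pipeline's first $group keys only on values.id, so equal ids in different documents would fall into the same group; and a Map keeps insertion order, whereas $group does not guarantee output order.

```javascript
// Plain JavaScript sketch of unwind -> group (first per id) -> regroup.
// dedupeValuesClassic is a hypothetical name; this is not MongoDB code.
function dedupeValuesClassic(doc) {
  // $unwind: one row per array element, carrying the parent fields along
  const rows = doc.values.map((value) => ({ id: doc._id, name: doc.name, values: value }));
  // first $group: keep the first row seen for each values.id
  const firstById = new Map();
  for (const row of rows) {
    if (!firstById.has(row.values.id)) firstById.set(row.values.id, row);
  }
  // second $group: push the surviving values back under the original document
  const kept = [...firstById.values()].map((row) => row.values);
  return { _id: doc._id, name: doc.name, values: kept };
}

// The _id is a plain string here because ObjectId is not available outside the shell.
const sampleDoc = {
  _id: "5e82661d164941779c2380ca",
  name: "something",
  values: [
    { id: 1, val: "x" },
    { id: 1, val: "x" },
    { id: 2, val: "y" },
    { id: 1, val: "xxxxxx" },
  ],
};

console.log(dedupeValuesClassic(sampleDoc).values);
// [ { id: 1, val: 'x' }, { id: 2, val: 'y' } ]
```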
MongoPlayground
Modern
We need to use $reduce operator.
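Equivalent logic in plain JavaScript:
function dedupeById(values) {
var tmp = [];
for (var value of values) {
// keep the value only if no element with the same id was kept already
if (!tmp.some(function (kept) { return kept.id === value.id; }))
tmp.push(value);
}
return tmp;
}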
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$concatArrays: [
"$$value",
{
$cond: [
{
$in: [
"$$this.id",
"$$value.id"
]
},
[],
[
"$$this"
]
]
}
]
}
}
}
}
}
])
MongoPlayground
You can use $reduce, Try below query :
db.collection.aggregate([
{
$addFields: {
values: {
$reduce: {
input: "$values",
initialValue: [],
in: {
$cond: [
{ $in: ["$$this.id", "$$value.id"] }, /** If the id already exists in the holding array, keep that array as is; otherwise append the new object to it */
"$$value",
{ $concatArrays: ["$$value", ["$$this"]] }
]
}
}
}
}
}
]);
Test : MongoDB-Playground

MongoDB - Get latest non-null field value from documents with timestamp

In the MongoDB collection I'm querying, each document represents some data for a parcel at a specific time. Every time I receive an update for a parcel, some fields may be updated (non-null value) and some others are not (null values).
To illustrate, consider this example. We received 3 data sets for a parcel:
/* 1 */
{
"parcelNum" : "CC123456789FR",
"datetime" : ISODate("2018-09-05T10:48:38.584Z"),
"field1" : "value1_1",
"field2" : "value2_1"
}
/* 2 */
{
"parcelNum" : "CC123456789FR",
"datetime" : ISODate("2018-09-05T10:48:40.566Z"),
"field1" : "value1_2",
"field2" : null
}
/* 3 */
{
"parcelNum" : "CC123456789FR",
"datetime" : ISODate("2018-09-05T10:48:42.777Z"),
"field1" : null,
"field2" : "value2_2"
}
How can I extract the latest non-null value, for all fields, considering the timestamp of the document they belong to?
Using the previous example, this is what I try to get:
{
"parcelNum" : "CC123456789FR",
"field1" : "value1_2",
"field2" : "value2_2"
}
I tried that kind of query but I can't find how to mix field values from multiple documents:
db.testDB.aggregate([
{$sort: { datetime: -1 }},
{$group: { _id: "$parcelNum",
field1: {$first: "$field1" },
field2: {$first: "$field2" }
}}
])
gives me:
{
"_id" : "CC123456789FR",
"field1" : null,
"field2" : "value2_2"
}
which is wrong because it only uses values from the most recent document and doesn't mix all the documents.
I tried another approach suggested by Rishi in another topic. Instead of creating a new document for each revision, he suggested pushing revision sub-documents onto an array and maintaining the latest revision at the parent document.
Something like this:
{
parcelNum: "CC123456789FR",
lastUpdated: ISODate("2018-09-05T10:48:42.777Z"),
field1: "value1_2",
field2: "value2_2",
revisions: [
{
datetime: ISODate("2018-09-05T10:48:38.584Z"),
field1: "value1_1",
field2: "value2_1"
},
{
datetime: ISODate("2018-09-05T10:48:40.566Z"),
field1: "value1_2",
field2: null
},
{
datetime: ISODate("2018-09-05T10:48:42.777Z"),
field1: null,
field2: "value2_2"
}
]
}
However, maintaining the latest revision is not that easy, because updates are not received in chronological order: I can receive a "new" document that has an older "datetime" value, in which case I must not update the fields unless they are null. To do that, I would have to record the last update timestamp for every field!
You can try this:
db.getCollection('test').aggregate([
//Sort
{$sort: { datetime: -1 }},
//Add fields to an array
{$group: {
"_id": null,
"field1": { $push: "$field1" },
"field2": { $push: "$field2" },
}},
//Filter and do not include null values
{$project: {
"field1notNull" : {
$filter: {
input: "$field1",
as: "f",
cond: { $ne: [ "$$f", null ] }
}
},
"field2notNull" : {
$filter: {
input: "$field2",
as: "f",
cond: { $ne: [ "$$f", null ] }
}
}
}
},
//Get the first values of each
{$project: {
"_id": null,
"field1": {$arrayElemAt: ["$field1notNull", 0]},
"field2": {$arrayElemAt: ["$field2notNull", 0]}
}}
])
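The idea behind this pipeline (sort newest first, collect each field's values, drop the nulls, take the first remaining value) can be sketched in plain JavaScript as follows. The function latestNonNull and its fieldNames parameter are hypothetical names, and the sketch assumes all documents belong to one parcel, just as grouping by _id: null does.

```javascript
// Plain JavaScript sketch: latest non-null value per field (not MongoDB code).
// latestNonNull and fieldNames are hypothetical names used for illustration.
function latestNonNull(docs, fieldNames) {
  // $sort: { datetime: -1 }, done on a copy so the input is left untouched
  const newestFirst = [...docs].sort((a, b) => b.datetime - a.datetime);
  const result = {};
  for (const field of fieldNames) {
    // $push + $filter + $arrayElemAt 0: first non-null value in sorted order
    const hit = newestFirst.find((doc) => doc[field] !== null && doc[field] !== undefined);
    // Assumption: report null when a field never had a value
    result[field] = hit ? hit[field] : null;
  }
  return result;
}

const updates = [
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:38.584Z"), field1: "value1_1", field2: "value2_1" },
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:40.566Z"), field1: "value1_2", field2: null },
  { parcelNum: "CC123456789FR", datetime: new Date("2018-09-05T10:48:42.777Z"), field1: null, field2: "value2_2" },
];

console.log(latestNonNull(updates, ["field1", "field2"]));
// { field1: 'value1_2', field2: 'value2_2' }
```

To handle several parcels at once, the pipeline would presumably group by "$parcelNum" instead of null; in the sketch that corresponds to calling latestNonNull once per parcel.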
You can try the $facet stage to treat field1 and field2 separately:
db['01'].aggregate(
[
// Stage 1
{
$sort: {
"datetime":-1
}
},
// Stage 2
{
$facet: {parcelNum:[{$group:{_id:"$parcelNum"}}],
field1: [ {
$match: {
field1:{$ne:null}
}
},
{
$limit: 1
},
{
$project: {
_id:0,
field1:1
}
}, ],
field2: [ {
$match: {
field2:{$ne:null}
}
},
{
$limit: 1
},
{
$project: {
_id:0,
field2:1
}
}, ],
}
},
// Stage 3
{
$project: {
parcelNum:"$parcelNum._id" ,
field1:"$field1.field1",
field2:"$field2.field2",
}
},
// Stage 4
{
$project: {
parcelNum:{$arrayElemAt:["$parcelNum" ,0]},
field1:{$arrayElemAt:["$field1" ,0]},
field2:{$arrayElemAt:["$field2" ,0]},
}
},
],
);
Note that stages 3 and 4 are only 'decorative'; the needed result is already present at the end of stage 2.
Hope it helps

Count and apply condition to slice the mongodb array document

My document structure looks like this:
{
"_id" : ObjectId("5aeeda07f3a664c55e830a08"),
"profileId" : ObjectId("5ad84c8c0e71892058b6a543"),
"list" : [
{
"content" : "answered your post",
"createdBy" : ObjectId("5ad84c8c0e71892058b6a540")
},
{
"content" : "answered your post",
"createdBy" : ObjectId("5ad84c8c0e71892058b6a540")
},
{
"content" : "answered your post",
"createdBy" : ObjectId("5ad84c8c0e71892058b6a540")
},
],
}
I want to count the elements of the list field and apply a condition before slicing: if the list has 10 or fewer elements, return all of them; otherwise return only the first 10.
P.S. I used this query, but it returns null.
db.getCollection('post').aggregate([
{
$match:{
profileId:ObjectId("5ad84c8c0e71892058b6a543")}
},
{$project:{notifs:{$size:"$list"}}},
{$project:{notifications:
{$cond:[
{$gte:["$notifs",10]},
{$slice:["$list",10]},
{$slice:["$list","$notifs"]}
]}
}}
])
Your first $project stage effectively wipes out all result fields but the one(s) that it explicitly projects (only notifs in your case). That's why the second $project stage cannot $slice the list field anymore (it has been removed by the first $project stage).
Also, I think your $cond/$slice combination can be more elegantly expressed using the $min operator. So there's at least the following two fixes for your problem:
Using $addFields:
db.getCollection('post').aggregate([
{ $match: { profileId: ObjectId("5ad84c8c0e71892058b6a543") } },
{ $addFields: { notifs: { $size: "$list" } } },
{ $project: {
notifications: {
$slice: [ "$list", { $min: [ "$notifs", 10 ] } ]
}
}}
])
Using a calculation inside the $project - this avoids a stage so should be preferable.
db.getCollection('post').aggregate([
{ $match: { profileId: ObjectId("5ad84c8c0e71892058b6a543") } },
{ $project: {
notifications: {
$slice: [ "$list", { $min: [ { $size: "$list" }, 10 ] } ]
}
}}
])
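As a quick illustration of what the $slice and $min combination computes, here is a small plain JavaScript sketch. The name firstNotifications and its limit parameter are hypothetical, and Array.prototype.slice stands in for $slice.

```javascript
// Plain JavaScript sketch of $slice combined with $min (not MongoDB code).
// firstNotifications and limit are hypothetical names used for illustration.
function firstNotifications(list, limit = 10) {
  // { $min: [ { $size: "$list" }, 10 ] }
  const count = Math.min(list.length, limit);
  // { $slice: [ "$list", count ] }
  return list.slice(0, count);
}

const shortList = ["a", "b", "c"];
const longList = Array.from({ length: 25 }, (_, i) => i);

console.log(firstNotifications(shortList).length); // 3
console.log(firstNotifications(longList).length); // 10
```

In JavaScript, slice(0, 10) on a shorter array already returns the whole array, so the Math.min step is kept only to mirror the pipeline.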

How to write a custom function to split a document into multiple documents of same Id

I am trying to split a document which has the following fields of string type:
{
"_id" : "17121",
"firstName": "Jello",
"lastName" : "New",
"bio" :"He is a nice person."
}
I want to split the above document into three new documents, for example:
{
"_id": "17121-1",
"firstName": "Jello"
}
{
"_id": "17121-2",
"lastName": "New"
}
{
"_id": "17121-3",
"bio": "He is a nice person."
}
Can anyone suggest how to proceed?
db.coll1.find().forEach(function(obj){
// I want to extract every single field. How to iterate on the field within this Bson object(obj) to collect every field.?
});
or any suggestion to do with aggregation pipeline in MongoDB.
You can use the below aggregation query.
It converts each document's fields into an array of key-value documents, applies $unwind while keeping the array index, and then uses $replaceRoot to produce the desired output. The stages are:
$objectToArray produces an array (keyvalarr) of key-value pairs, where k is the field name and v is the field value.
$match removes the key-value document for the _id field.
$arrayToObject builds each output document from the unwound key-value pair plus a new _id key-value pair.
db.coll.aggregate([
{
"$project": {
"keyvalarr": {
"$objectToArray": "$$ROOT"
}
}
},
{
"$unwind": {
"path": "$keyvalarr",
"includeArrayIndex": "index"
}
},
{
"$match": {
"keyvalarr.k": {
"$ne": "_id"
}
}
},
{
"$replaceRoot": {
"newRoot": {
// $arrayToObject takes a single argument, so the array of pairs is wrapped in an outer array
"$arrayToObject": [
[
{
"k": "_id",
"v": {
"$concat": [
{
"$substr": [
"$_id",
0,
-1
]
},
"-",
{
"$substr": [
"$index",
0,
-1
]
}
]
}
},
"$keyvalarr"
]
]
}
}
}
])
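The same split can be traced in plain JavaScript, which may help when checking what each stage contributes. This is only a sketch: splitDocument is a hypothetical name, Object.entries stands in for $objectToArray, and it assumes _id is the first field of the document so that the remaining fields get indexes 1, 2, 3.

```javascript
// Plain JavaScript sketch of the split (not MongoDB code).
// splitDocument is a hypothetical name used for illustration only.
function splitDocument(doc) {
  return Object.entries(doc) // $objectToArray on $$ROOT
    // $unwind with includeArrayIndex: remember each pair's position
    .map(([k, v], index) => ({ k, v, index }))
    // $match: drop the pair that holds _id itself
    .filter((pair) => pair.k !== "_id")
    // $arrayToObject: new _id built with $concat, plus the original field
    .map((pair) => ({ _id: `${doc._id}-${pair.index}`, [pair.k]: pair.v }));
}

const person = {
  _id: "17121",
  firstName: "Jello",
  lastName: "New",
  bio: "He is a nice person.",
};

console.log(splitDocument(person));
```

For the sample document this yields three objects with _id values 17121-1, 17121-2 and 17121-3, each carrying exactly one of the original fields.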
Anu. Here are two options you can use.
The first option is pretty straightforward, but it requires you to hardcode the _id indexes yourself.
db.users.aggregate([
{
$project: {
pairs : [
{ firstName: '$firstName', _id : { $concat : [ { $substr : [ '$_id', 0, 50 ] }, '-1' ] } },
{ lastName: '$lastName', _id : { $concat : [ '$_id', '-2' ] } },
{ bio: '$bio', _id : { $concat : [ { $substr : [ '$_id', 0, 50 ] }, '-3' ] } }
]
}
},
{
$unwind : '$pairs'
},
{
$replaceRoot: { newRoot: '$pairs' }
}
])
The second option does a little more work and is somewhat trickier, but it is probably easier to extend if you ever need to add another field.
db.users.aggregate([
{
$project: {
pairs : [
{ firstName: '$firstName' },
{ lastName: '$lastName' },
{ bio: '$bio' }
]
}
},
{
$addFields: {
pairsReference : '$pairs'
}
},
{
$unwind: '$pairs'
},
{
$addFields: {
'pairs._id' : { $concat: [ { $substr : [ '$_id', 0, 50 ] }, '-', { $substr: [ { $indexOfArray : [ '$pairsReference', '$pairs' ] }, 0, 2 ] } ] }
}
},
{
$replaceRoot: { newRoot: '$pairs' }
}
])
You can redirect results of both queries into another collection by using $out stage.
UPD:
The only reason you get the error is that one of the _ids is not a string.
Replace the first parameter of $concat ($_id) with the following expression:
{ $substr : [ '$_id', 0, 50 ] }

How can I get max value in nested documents?

I have a collection(named menucategories) in MongoDB 3.2.11:
{
"_id" : ...
"menus" : [
{
"code":0
},
{
"code":1
},
{
"code":2
},
{
"code":3
}
]
},
{
"_id" : ...
"menus" : [
{
"code":4
},
{
"code":5
},
{
"code":6
},
{
"code":7
}
]
},
{
"_id" : ...
"menus" : [
{
"code":8
},
{
"code":9
},
{
"code":10
},
{
"code":11
}
]
}
Every menucategory has an array named menus, and every menu (element of the array) has a code. The code is unique across all menus. I want to get the maximum value of the menus' code (in this case, 11). How can I achieve this?
If you want to find maximum value of code from all menus code then probable query will be as follows:
db.menucategories.aggregate([
{ $unwind: '$menus' },
{ $group: { _id: null, max: { $max: '$menus.code' } } },
{ $project: { max: 1, _id:0 } }
])
Click below links for more information regarding different operators:
$unwind, $group, $project
You don't need to use the $unwind aggregation pipeline operator here because starting from MongoDB 3.2, some accumulator expressions are available in the $project stage.
db.collection.aggregate([
{"$project": {"maxPerDoc": {"$max": "$menus.code"}}},
{"$group": {"_id": null, "maxValue": {"$max": "$maxPerDoc"}}}
])
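The two-step shape of this pipeline (a maximum per document, then a maximum over those) looks like this in plain JavaScript. It is a sketch only: maxMenuCode is a hypothetical name and Math.max stands in for $max.

```javascript
// Plain JavaScript sketch of the $project + $group pipeline (not MongoDB code).
// maxMenuCode is a hypothetical name used for illustration only.
function maxMenuCode(categories) {
  // $project: maximum of menus.code inside each document
  const maxPerDoc = categories.map((category) =>
    Math.max(...category.menus.map((menu) => menu.code))
  );
  // $group with _id: null: maximum across all documents
  // Note: Math.max() of an empty list is -Infinity, unlike $max in MongoDB
  return Math.max(...maxPerDoc);
}

const menuCategories = [
  { menus: [{ code: 0 }, { code: 1 }, { code: 2 }, { code: 3 }] },
  { menus: [{ code: 4 }, { code: 5 }, { code: 6 }, { code: 7 }] },
  { menus: [{ code: 8 }, { code: 9 }, { code: 10 }, { code: 11 }] },
];

console.log(maxMenuCode(menuCategories)); // 11
```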
In response to a previous, now-deleted comment: you don't need to put your pipeline in an array, so the following query will work as well.
db.collection.aggregate(
{"$project": {"maxPerDoc": {"$max": "$menus.code"}}},
{"$group": {"_id": null, "maxValue": {"$max": "$maxPerDoc"}}}
)
Try with aggregation:
db.collection.aggregate({ $group : { _id: 1, max: { $max: {$max : "$menus.code"}}}});
No $unwind is needed if you only need the maximum value.