I'm hoping that someone might be able to answer whether what I'm trying to accomplish below can be done with the MongoDB Aggregation Framework.
I have a user data structure that resembles the following with close to 1 million documents.
{
"firstName" : "John",
"lastName" : "Doe",
"state" : "NJ",
"email" : "JOHNDOE#XYZ.COM",
"source" : [
{
"type" : "SOURCE-A",
"data" : {
"info" : "abc",
"info2" : "xyz"
}
},
{
"type" : "SOURCE-B",
"data" : {
"info3" : "abc"
}
}
]
}
For the purposes of feeding data to another system, I need to generate a flat file structure with limited information from the previous dataset. The columns need to represent:
firstname, lastname, email, is_source-a, is_source-b
The part that I'm having difficulty with is the conditional code that attempts to populate "is_source-a" and "is_source-b". I have tried the following aggregation query, but I can't figure out how to get it working, since the $eq operator used along with $cond doesn't seem to evaluate data inside an array (it is always false).
db.collection.aggregate([
{
$project : {
_id : 0,
firstName : 1,
lastName: 1,
"is_source-a" : {
$cond : [
{ $eq: [ "$source.type", "source-a" ] },
1,
0
]
},
"is_source-b" : {
$cond : [
{ $eq: [ "$source.type", "source-b" ] },
1,
0
]
}
}
}
]);
I could $unwind the array first, but then I wind up with multiple records for each user document and don't understand how to consolidate them back.
Is there something I'm missing about how to use $eq (or some other operator) along with $cond when dealing with arrays of objects?
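The "always false" result follows from how field paths resolve: when "source" is an array of objects, "$source.type" evaluates to an array of every element's type, so $eq ends up comparing an array with a string. Below is a plain-JavaScript sketch of that resolution under a simplified model of MongoDB's path traversal; resolvePath is a hypothetical helper for illustration, not a MongoDB API.

```javascript
// Simplified model (assumption) of how an aggregation field path such as
// "source.type" resolves; resolvePath is hypothetical, not a MongoDB API.
function resolvePath(doc, path) {
  return path.split(".").reduce((value, key) => {
    if (Array.isArray(value)) {
      // Traversing an array collects the field from every element and
      // omits elements where the field is missing.
      return value
        .map((el) => (el == null ? undefined : el[key]))
        .filter((v) => v !== undefined);
    }
    return value == null ? undefined : value[key];
  }, doc);
}

const user = {
  firstName: "John",
  source: [
    { type: "SOURCE-A", data: { info: "abc", info2: "xyz" } },
    { type: "SOURCE-B", data: { info3: "abc" } },
  ],
};

const types = resolvePath(user, "source.type");
console.log(types); // [ 'SOURCE-A', 'SOURCE-B' ]

// $eq compares the whole array with the string, so the check is false
// (note also that "source-a" in the question differs in case from "SOURCE-A").
console.log(types === "SOURCE-A"); // false
```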
You're definitely on the right track, and using $unwind can get you there if you follow it up with a $group to put things back together:
db.collection.aggregate([
{$unwind: '$source'},
{$project: {
_id: 1,
firstName: 1,
lastName: 1,
email: 1,
'is_source-a': {$eq: ['$source.type', 'SOURCE-A']},
'is_source-b': {$eq: ['$source.type', 'SOURCE-B']}
}},
// group the docs that were duplicated in the $unwind back together by _id,
// taking the values for most fields from the $first occurrence of the _id,
// but the $max of the is_source fields so that if it's true in any of the
// docs for that _id it will be true in the output for that _id.
{$group: {
_id: '$_id',
firstName: {$first: '$firstName'},
lastName: {$first: '$lastName'},
email: {$first: '$email'},
'is_source-a': {$max: '$is_source-a'},
'is_source-b': {$max: '$is_source-b'}
}},
// project again to remove _id
{$project: {
_id: 0,
firstName: 1,
lastName: 1,
email: 1,
'is_source-a': '$is_source-a',
'is_source-b': '$is_source-b'
}}
])
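To make the $unwind → $group round trip concrete, here is a plain-JavaScript sketch that mimics the pipeline above on in-memory objects. It is an illustration under assumptions, not how the server executes the pipeline; flagSources is a hypothetical name, and the second user ("Jane Roe") is made-up sample data.

```javascript
// Mimics $unwind -> $project -> $group -> $project on in-memory documents
// (assumption: illustration only; flagSources is a hypothetical helper).
function flagSources(users) {
  // $unwind: one row per element of "source" (users with no elements are dropped).
  const rows = users.flatMap((u) =>
    (u.source || []).map((s) => ({
      _id: u._id,
      firstName: u.firstName,
      lastName: u.lastName,
      email: u.email,
      "is_source-a": s.type === "SOURCE-A",
      "is_source-b": s.type === "SOURCE-B",
    }))
  );

  // $group by _id: keep the first value of plain fields, and take the "max"
  // of the flags (true sorts above false, so this acts like a logical OR).
  const groups = new Map();
  for (const row of rows) {
    const g = groups.get(row._id);
    if (!g) {
      groups.set(row._id, { ...row });
    } else {
      g["is_source-a"] = g["is_source-a"] || row["is_source-a"];
      g["is_source-b"] = g["is_source-b"] || row["is_source-b"];
    }
  }

  // Final $project: drop _id.
  return [...groups.values()].map(({ _id, ...rest }) => rest);
}

const users = [
  { _id: 1, firstName: "John", lastName: "Doe", email: "JOHNDOE#XYZ.COM",
    source: [{ type: "SOURCE-A" }, { type: "SOURCE-B" }] },
  // Hypothetical second user with only one source.
  { _id: 2, firstName: "Jane", lastName: "Roe", email: "JANEROE#XYZ.COM",
    source: [{ type: "SOURCE-B" }] },
];

console.log(flagSources(users));
```

One thing the sketch makes visible: a user whose "source" array is empty produces no rows after $unwind and therefore vanishes from the output.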
If you do not want to $unwind and $group, this can also be achieved with $cond and $in.
I found this originally here: https://www.mongodb.com/community/forums/t/cond-inside-an-array-is-not-working/156468
I was a bit surprised this works, but as the MongoDB docs state, $in has the following operator expression syntax:
{ $in: [ <expression>, <array expression> ] }
so the second argument can be any expression that resolves to an array, such as "$source.type".
For the original question (I'm sure you're still waiting on this 8 years later), it could be done like this:
db.collection.aggregate([
{
$project : {
_id : 0,
firstName : 1,
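lastName: 1,
email: 1,
// $in is case-sensitive, so match the stored upper-case values;
// $ifNull guards against documents without a "source" array,
// because $in errors if its second argument is not an array
"is_source-a" : {
$cond : [
{ $in: [ "SOURCE-A", { $ifNull: [ "$source.type", [] ] } ] },
1,
0
]
},
"is_source-b" : {
$cond : [
{ $in: [ "SOURCE-B", { $ifNull: [ "$source.type", [] ] } ] },
1,
0
]
}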
}
}
]);
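For checking the expected flat-file rows without a database, here is a plain-JavaScript sketch that mirrors, per document, the $cond/$in/$ifNull expressions above. It is an illustration under assumptions: sourceTypes and toRow are hypothetical helpers, and the document without a "source" field is made-up sample data.

```javascript
// Mirrors, per document (assumption: in-memory illustration only):
//   { $cond: [ { $in: [ "SOURCE-A", { $ifNull: [ "$source.type", [] ] } ] }, 1, 0 ] }

// "$source.type" resolves to the array of every element's "type";
// the $ifNull fallback turns a missing or null "source" into an empty array.
function sourceTypes(doc) {
  if (doc.source == null) return [];
  return doc.source
    .map((s) => (s == null ? undefined : s.type))
    .filter((t) => t !== undefined);
}

function toRow(doc) {
  const types = sourceTypes(doc);
  return {
    firstName: doc.firstName,
    lastName: doc.lastName,
    email: doc.email,
    // includes() is case-sensitive, like $in
    "is_source-a": types.includes("SOURCE-A") ? 1 : 0,
    "is_source-b": types.includes("SOURCE-B") ? 1 : 0,
  };
}

const john = {
  firstName: "John",
  lastName: "Doe",
  email: "JOHNDOE#XYZ.COM",
  source: [{ type: "SOURCE-A" }, { type: "SOURCE-B" }],
};
// Hypothetical document with no "source" field at all.
const noSource = { firstName: "Ann", lastName: "Smith", email: "ANN#XYZ.COM" };

console.log(toRow(john));
console.log(toRow(noSource));
```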
I want to show products by ids (56e641d4864e5b780bb992c6 and 56e65504a323ee0812e511f2) and show each price after the discount, if any, has been subtracted.
I can compute the final price using aggregate, but this returns every document in the collection. How do I make it return only the documents with matching ids?
{
"_id" : ObjectId("56e641d4864e5b780bb992c6"),
"title" : "Keyboard",
"discount" : NumberInt(10),
"price" : NumberInt(1000)
}
{
"_id" : ObjectId("56e65504a323ee0812e511f2"),
"title" : "Mouse",
"discount" : NumberInt(0),
"price" : NumberInt(1000)
}
{
"_id" : ObjectId("56d90714a48d2eb40cc601a5"),
"title" : "Speaker",
"discount" : NumberInt(10),
"price" : NumberInt(1000)
}
This is my query:
productModel.aggregate([
{
$project: {
title : 1,
price: {
$cond: {
if: {$gt: ["$discount", 0]}, then: {$subtract: ["$price", {$divide: [{$multiply: ["$price", "$discount"]}, 100]}]}, else: "$price"
}
}
}
}
], function(err, docs){
if (err){
console.log(err)
}else{
console.log(docs)
}
})
If I add this $in match, it returns an empty array:
productModel.aggregate([
{
$match: {_id: {$in: ids}}
},
{
$project: {
title : 1,
price: {
$cond: {
if: {$gt: ["$discount", 0]}, then: {$subtract: ["$price", {$divide: [{$multiply: ["$price", "$discount"]}, 100]}]}, else: "$price"
}
}
}
}
], function(err, docs){
if (err){
console.log(err)
}else{
console.log(docs)
}
})
Your ids variable will be constructed of "strings", and not ObjectId values.
Mongoose "autocasts" string values for ObjectId into their correct type in regular queries, but this does not happen in the aggregation pipeline, as described in issue #1399.
Instead you must do the correct casting to type manually:
ids = ids.map(function(el) { return new mongoose.Types.ObjectId(el) })
Then you can use them in your pipeline stage:
{ "$match": { "_id": { "$in": ids } } }
The reason is that aggregation pipelines "typically" alter the document structure, so Mongoose makes no presumption that the "schema" applies to the document at any given pipeline stage.
It is arguable that the "first" pipeline stage, when it is a $match stage, should do this, since the document has not yet been altered. But right now this is not how it works.
Any values that may possibly be "strings" or at least not the correct BSON type need to be manually cast in order to match.
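To see why the string ids match nothing, here is a plain-JavaScript sketch that mimics the $match and the discount $project on in-memory data. FakeObjectId is an assumed stand-in for the BSON ObjectId type, NOT the real mongoose/bson class; matchIn and withFinalPrice are hypothetical helpers.

```javascript
// FakeObjectId is a stand-in (assumption) for the BSON ObjectId type,
// not the real mongoose/bson class.
class FakeObjectId {
  constructor(hex) {
    this.hex = hex;
  }
  equals(other) {
    return other instanceof FakeObjectId && other.hex === this.hex;
  }
}

const products = [
  { _id: new FakeObjectId("56e641d4864e5b780bb992c6"), title: "Keyboard", discount: 10, price: 1000 },
  { _id: new FakeObjectId("56e65504a323ee0812e511f2"), title: "Mouse", discount: 0, price: 1000 },
  { _id: new FakeObjectId("56d90714a48d2eb40cc601a5"), title: "Speaker", discount: 10, price: 1000 },
];

// Mimics { $match: { _id: { $in: ids } } }: a string never equals an ObjectId.
function matchIn(docs, ids) {
  return docs.filter((doc) =>
    ids.some((id) => (id instanceof FakeObjectId ? id.equals(doc._id) : id === doc._id))
  );
}

// Mimics the $project/$cond stage: subtract the discount percentage when it is > 0.
function withFinalPrice(doc) {
  const price = doc.discount > 0 ? doc.price - (doc.price * doc.discount) / 100 : doc.price;
  return { _id: doc._id, title: doc.title, price };
}

const idStrings = ["56e641d4864e5b780bb992c6", "56e65504a323ee0812e511f2"];
console.log(matchIn(products, idStrings).length); // 0

const objectIds = idStrings.map((hex) => new FakeObjectId(hex));
console.log(matchIn(products, objectIds).map(withFinalPrice).map((p) => [p.title, p.price]));
```

With the ids cast, Keyboard comes out at 1000 - (1000 * 10) / 100 = 900 and Mouse stays at 1000.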
In Mongoose, find({_id: '606c1ceb362b366a841171dc'}) works fine,
but with the aggregate function you have to convert the _id to an ObjectId yourself, e.g.:
$match: { "_id": new mongoose.Types.ObjectId("606c1ceb362b366a841171dc") }
This will work fine.
You can simply convert your id first:
let id = new mongoose.Types.ObjectId(req.query.id);
and then match
{ $match: { _id: id } },
Instead of:
$match: { _id: "6230415bf48824667a417d56" }
use:
$match: { _id: ObjectId("6230415bf48824667a417d56") }
Use this, with $in placed under the _id field and no stray whitespace inside the id strings:
$match: { _id: { $in: [ new mongoose.Types.ObjectId("56e641d4864e5b780bb992c6"), new mongoose.Types.ObjectId("56e65504a323ee0812e511f2") ] } }
This is needed because Mongoose autocasts string values for ObjectId into their correct type in regular queries, but not in the aggregation pipeline, so the ObjectId cast must be done explicitly in pipeline queries.
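Because a stray space or a malformed id only surfaces as an empty result or a cast error, it can help to validate the strings before building the stage. Here is a minimal sketch with hypothetical helpers castIds and matchByIds. The ObjectId class is passed in as a parameter only so the sketch runs without mongoose; in real code you would pass mongoose.Types.ObjectId. Trimming whitespace is a design choice of this sketch.

```javascript
// Hypothetical helper: validate id strings, then cast them.
// ObjectIdClass would be mongoose.Types.ObjectId in real code (assumption).
function castIds(rawIds, ObjectIdClass) {
  return rawIds.map((raw) => {
    const hex = String(raw).trim(); // e.g. "56e641d4864e5b780bb992c6 " has a trailing space
    if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
      throw new TypeError(`Not a 24-character hex ObjectId: ${JSON.stringify(raw)}`);
    }
    return new ObjectIdClass(hex);
  });
}

// Builds the $match stage with $in placed under the _id field.
function matchByIds(rawIds, ObjectIdClass) {
  return { $match: { _id: { $in: castIds(rawIds, ObjectIdClass) } } };
}

// Stand-in class so the sketch runs without mongoose.
class StandInObjectId {
  constructor(hex) {
    this.hex = hex.toLowerCase();
  }
}

const stage = matchByIds(["56e641d4864e5b780bb992c6 ", "56e65504a323ee0812e511f2"], StandInObjectId);
console.log(stage.$match._id.$in.map((id) => id.hex));
```

If silently trimming input is undesirable, drop the trim() so that such ids are rejected instead.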
I have documents with a few fields; in particular, they have a field called attrs that is an array. I am using the aggregation pipeline.
In my query I am interested in the attrs (attributes) field if there are any elements in it. If there are none, I still want the document to show up in the result; in that case, I am after its _id.type field.
The problem is that if a document does not contain any elements in the attrs field, it is filtered out and I don't get its _id.type field, which is what I really want from this query.
{
aggregate: "entities",
pipeline: [
{
$match: {
_id.servicePath: {
$in: [
/^/.*/,
null
]
}
}
},
{
$project: {
_id: 1,
"attrs.name": 1,
"attrs.type": 1
}
},
{
$unwind: "$attrs"
},
{
$group: {
_id: "$_id.type",
attrs: {
$addToSet: "$attrs"
}
}
},
{
$sort: {
_id: 1
}
}
]
}
So the question is: how can I get a result containing all document types, regardless of whether the documents have attrs, but including the attributes when they do?
I hope it makes sense.
You can use the $cond operator in a $project stage to replace the empty attrs array with one that contains a placeholder like null, which serves as a marker indicating that this doc doesn't contain any attrs elements. Without it, $unwind emits nothing for an empty array, so the doc disappears from the pipeline.
So you'd insert an additional $project stage like this right before the $unwind:
{
$project: {
attrs: {$cond: {
if: {$eq: ['$attrs', [] ]},
then: [null],
else: '$attrs'
}}
}
},
The only caveat is that you'll end up with a null value in the final attrs array for those groups that contain at least one doc without any attrs elements, so you need to ignore those client-side.
Example
The example uses an altered $match stage because the one in your example isn't valid.
Input Docs
[
{_id: {type: 1, id: 2}, attrs: []},
{_id: {type: 2, id: 1}, attrs: []},
{_id: {type: 2, id: 2}, attrs: [{name: 'john', type: 22}, {name: 'bob', type: 44}]}
]
Output
{
"result" : [
{
"_id" : 1,
"attrs" : [
null
]
},
{
"_id" : 2,
"attrs" : [
{
"name" : "bob",
"type" : 44
},
{
"name" : "john",
"type" : 22
},
null
]
}
],
"ok" : 1
}
Aggregate Command
db.test.aggregate([
{
$match: {
'_id.servicePath': {
$in: [
null
]
}
}
},
{
$project: {
_id: 1,
"attrs.name": 1,
"attrs.type": 1
}
},
{
$project: {
attrs: {$cond: {
if: {$eq: ['$attrs', [] ]},
then: [null],
else: '$attrs'
}}
}
},
{
$unwind: "$attrs"
},
{
$group: {
_id: "$_id.type",
attrs: {
$addToSet: "$attrs"
}
}
},
{
$sort: {
_id: 1
}
}
])
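The caveat about null placeholders can be handled client-side once the result comes back. Here is a minimal sketch, assuming the documents from the "result" array shown in the example output above; stripPlaceholders is a hypothetical helper.

```javascript
// Hypothetical client-side post-processing: drop the null placeholders that
// the $cond stage inserted for docs without attrs.
function stripPlaceholders(groups) {
  return groups.map((g) => ({ ...g, attrs: g.attrs.filter((a) => a !== null) }));
}

// Same shape as the "result" array in the example output.
const result = [
  { _id: 1, attrs: [null] },
  { _id: 2, attrs: [{ name: "bob", type: 44 }, { name: "john", type: 22 }, null] },
];

console.log(JSON.stringify(stripPlaceholders(result)));
```

Types whose docs had no attrs keep their _id and end up with an empty attrs array, which is the outcome the question asked for.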
Use some if statements and loops.
First and foremost, your query should select all documents.
Loop through all of them.
Then, if the number of attributes is greater than 0, loop through the attributes and collect them into whatever array or output you find useful.
Use if statements to sanitize your results if you like.
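The loop-based approach above can be sketched in plain JavaScript as follows. This assumes `docs` is the array returned by a query that selects every document, and uses the input docs from the earlier example. typesWithAttrs is a hypothetical helper; duplicates are skipped by comparing name and type, which is meant to resemble $addToSet on the projected fields.

```javascript
// Hypothetical helper: group all docs by _id.type, collecting attrs only
// when a doc actually has some.
function typesWithAttrs(docs) {
  const byType = new Map();
  for (const doc of docs) {
    const type = doc._id.type;
    if (!byType.has(type)) byType.set(type, []); // every type appears, even without attrs
    const attrs = byType.get(type);
    if (Array.isArray(doc.attrs) && doc.attrs.length > 0) {
      for (const attr of doc.attrs) {
        // Skip duplicates, similar to $addToSet on { name, type }.
        if (!attrs.some((a) => a.name === attr.name && a.type === attr.type)) {
          attrs.push({ name: attr.name, type: attr.type });
        }
      }
    }
  }
  // Sort by type, like the { $sort: { _id: 1 } } stage.
  return [...byType.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([type, attrs]) => ({ _id: type, attrs }));
}

const docs = [
  { _id: { type: 1, id: 2 }, attrs: [] },
  { _id: { type: 2, id: 1 }, attrs: [] },
  { _id: { type: 2, id: 2 }, attrs: [{ name: "john", type: 22 }, { name: "bob", type: 44 }] },
];

console.log(JSON.stringify(typesWithAttrs(docs)));
```

The trade-off is that every matching document is transferred to the client, whereas the pipeline version does the grouping on the server.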
You should use the $or operator with two separate clauses: one to select the documents whose attrs value equals the required value, and the other to match documents where attrs is null or the attrs key does not exist (using the $exists operator).
How flexible is the aggregate function for output formatting in MongoDB?
Data format:
{
"_id" : ObjectId("506ddd1900a47d802702a904"),
"port_name" : "CL1-A",
"metric" : "772.0",
"port_number" : "0",
"datetime" : ISODate("2012-10-03T14:03:00Z"),
"array_serial" : "12345"
}
Right now I'm using this aggregate function to return an array of datetimes, an array of metrics, and a count:
{$match : { 'array_serial' : array,
'port_name' : { $in : ports},
'datetime' : { $gte : from, $lte : to}
}
},
{$project : { port_name : 1, metric : 1, datetime: 1}},
{$group : { _id : "$port_name",
datetime : { $push : "$datetime"},
metric : { $push : "$metric"},
count : { $sum : 1}}}
Which is nice, and very fast, but is there a way to format the output so there's one array per datetime/metric pair? Like this:
[
{
"_id" : "portname",
"data" : [
["2012-10-01T00:00:00.000Z", 1421.01],
["2012-10-01T00:01:00.000Z", 1361.01],
["2012-10-01T00:02:00.000Z", 1221.01]
]
}
]
This would greatly simplify the front-end as that's the format the chart code expects.
Combining two fields into an array of values with the Aggregation Framework is possible, but definitely isn't as straightforward as it could be (at least as of MongoDB 2.2.0).
Here is an example:
db.metrics.aggregate(
// Find matching documents first (can take advantage of index)
{ $match : {
'array_serial' : array,
'port_name' : { $in : ports},
'datetime' : { $gte : from, $lte : to}
}},
// Project desired fields and add an extra $index for # of array elements
{ $project: {
port_name: 1,
datetime: 1,
metric: 1,
index: { $const:[0,1] }
}},
// Split into document stream based on $index
{ $unwind: '$index' },
// Re-group data using conditional to create array [$datetime, $metric]
{ $group: {
_id: { id: '$_id', port_name: '$port_name' },
data: {
$push: { $cond:[ {$eq:['$index', 0]}, '$datetime', '$metric'] }
},
}},
// Sort results
{ $sort: { _id:1 } },
// Final group by port_name with data array and count
{ $group: {
_id: '$_id.port_name',
data: { $push: '$data' },
count: { $sum: 1 }
}}
)
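To show how the [0, 1] index trick builds each pair, here is a plain-JavaScript sketch that mimics the 2.2-era pipeline above on in-memory documents. It is an illustration under assumptions: pairsByPort is a hypothetical helper, the $sort stage is omitted (input order is kept), and the sample documents are made up, borrowing the values from the desired output in the question.

```javascript
// Mimics the $project/$unwind/$group/$group pipeline (assumption: illustration only).
function pairsByPort(docs) {
  // $project + $unwind: each document becomes two rows, index 0 and index 1.
  const rows = docs.flatMap((d) => [0, 1].map((index) => ({ ...d, index })));

  // First $group (per source document): $cond picks datetime for 0, metric for 1.
  const pairs = new Map();
  for (const r of rows) {
    const key = String(r._id);
    if (!pairs.has(key)) pairs.set(key, { port_name: r.port_name, data: [] });
    pairs.get(key).data.push(r.index === 0 ? r.datetime : r.metric);
  }

  // Second $group (per port_name): push each pair and count them.
  const ports = new Map();
  for (const { port_name, data } of pairs.values()) {
    if (!ports.has(port_name)) ports.set(port_name, { _id: port_name, data: [], count: 0 });
    const p = ports.get(port_name);
    p.data.push(data);
    p.count += 1;
  }
  return [...ports.values()];
}

// Hypothetical sample documents.
const sample = [
  { _id: "a1", port_name: "CL1-A", datetime: "2012-10-01T00:00:00.000Z", metric: 1421.01 },
  { _id: "a2", port_name: "CL1-A", datetime: "2012-10-01T00:01:00.000Z", metric: 1361.01 },
  { _id: "b1", port_name: "CL1-B", datetime: "2012-10-01T00:00:00.000Z", metric: 1221.01 },
];

console.log(JSON.stringify(pairsByPort(sample)));
```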
MongoDB 2.6 made this a lot easier by introducing $map, which allows a simpler form of array transposition:
db.metrics.aggregate([
{ "$match": {
"array_serial": array,
"port_name": { "$in": ports},
"datetime": { "$gte": from, "$lte": to }
}},
{ "$group": {
"_id": "$port_name",
"data": {
"$push": {
"$map": {
"input": [0,1],
"as": "index",
"in": {
"$cond": [
{ "$eq": [ "$$index", 0 ] },
"$datetime",
"$metric"
]
}
}
}
},
"count": { "$sum": 1 }
}}
])
Much like the $unwind approach, you supply a two-element array as the "input" to the $map operation, then replace each of those values with the field value you want via the $cond operation.
This removes all the pipeline juggling that previous releases required to transform the document, and leaves $group to do the actual job at hand: accumulating per "port_name" value. The transformation to an array is no longer a problem area.
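In plain JavaScript the single-$group version looks like the following sketch. This is an illustration under assumptions: Array.prototype.map over [0, 1] plays the role of $map, the ternary plays the role of $cond, groupPairs is a hypothetical helper, and the sample documents are made up.

```javascript
// Mimics the 2.6+ pipeline (assumption: illustration only): one pass,
// building each [datetime, metric] pair inside the grouping step.
function groupPairs(docs) {
  const out = new Map();
  for (const doc of docs) {
    // $map over [0, 1] with $cond: 0 -> datetime, 1 -> metric.
    const pair = [0, 1].map((index) => (index === 0 ? doc.datetime : doc.metric));
    if (!out.has(doc.port_name)) out.set(doc.port_name, { _id: doc.port_name, data: [], count: 0 });
    const group = out.get(doc.port_name);
    group.data.push(pair); // $push
    group.count += 1; // $sum: 1
  }
  return [...out.values()];
}

// Hypothetical sample documents.
const sample = [
  { port_name: "CL1-A", datetime: "2012-10-01T00:00:00.000Z", metric: 1421.01 },
  { port_name: "CL1-A", datetime: "2012-10-01T00:01:00.000Z", metric: 1361.01 },
];

console.log(JSON.stringify(groupPairs(sample)));
```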
Building arrays in the aggregation framework without $push and $addToSet is something that seems to be lacking. I've tried to get this to work before, and failed. It would be awesome if you could just do:
data : {$push: [$datetime, $metric]}
in the $group, but that doesn't work.
Also, building "literal" objects like this doesn't work:
data : {$push: {literal:[$datetime, $metric]}}
or even data : {$push: {literal:$datetime}}
I hope they eventually come up with some better ways of massaging this sort of data.