Using $cond to specify _id fields for $group in MongoDB aggregation

I'm new to Mongo and trying to group on different subfields of a document depending on a condition. The condition is a regex on a field value. It looks like this:
db.collection.aggregate([{
{
"$group": {
"$cond": [{
"upper.leaf": {
$not: {
$regex: /flower/
}
}
},
{
"_id": {
"leaf": "$upper.leaf",
"stem": "$upper.stem"
}
},
{
"_id": {
"stem": "$upper.stem",
"petal": "$upper.petal"
}
}
]
}
}])
Using api v4.0: cond in the docs shows - { $cond: [ <boolean-expression>, <true-case>, <false-case> ] }
The error I get with the above code is - "Syntax error: dotted field name 'upper.leaf' can not used in a sub object."
Reading up on that, I tried $let to re-assign the dotted field name, but started to hit various syntax errors with no obvious issue in the query.
Also tried using $project to rename the fields, but got - Field names may not start with '$'
Thoughts on the best approach here? I can always address this at the application level and split my query into two but it's attractive potentially to solve it natively in mongo.

$group syntax is wrong
{
$group:
{
_id: <expression>, // Group By Expression
<field1>: { <accumulator1> : <expression1> },
...
}
}
You tried to do
{
$group:
<expression>
}
Even if your expression evaluated to the same document, it would be invalid syntax for $group: the stage must be a literal document with an _id key, and expressions are only allowed as values inside it (check the documentation for where expressions are allowed).
Another problem is that you use the query operator $regex rather than the aggregation regex operators. Inside an aggregation expression only aggregation operators are allowed; $match is the exception, since it takes query operators and, via $expr, aggregation expressions as well.
I think you need this:
[{
  "$group": {
    "_id": {
      "$cond": [
        // regex literal; the string "/flower/" would match the slashes literally
        { "$not": [ { "$regexMatch": { "input": "$upper.leaf", "regex": /flower/ } } ] },
        { "leaf": "$upper.leaf", "stem": "$upper.stem" },
        { "stem": "$upper.stem", "petal": "$upper.petal" }
      ]
    }
  }
}]
It's similar to your code, but the $cond expression is now the value of "_id", and $regexMatch, which is an aggregation operator, replaces $regex.
Note that $regexMatch requires MongoDB 4.2 or later.
I haven't tested the code.
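To make the effect of the conditional _id concrete, here is a minimal pure-JavaScript sketch that emulates it without a server. The sample documents, the groupKey helper and the per-key count are assumptions made up for illustration; the pipeline above has no accumulator.

```javascript
// Pure-JavaScript emulation of the conditional _id (no MongoDB needed).
// The sample documents are hypothetical.
const docs = [
  { upper: { leaf: "oak", stem: "thick", petal: "none" } },
  { upper: { leaf: "oak", stem: "thick", petal: "red" } },
  { upper: { leaf: "sunflower", stem: "thin", petal: "yellow" } },
];

// Mirrors $cond: [ { $not: [ $regexMatch ] }, <leaf/stem key>, <stem/petal key> ].
function groupKey(doc) {
  const { leaf, stem, petal } = doc.upper;
  const matchesFlower = typeof leaf === "string" && /flower/.test(leaf);
  return matchesFlower ? { stem, petal } : { leaf, stem };
}

// Count documents per key, as a { $sum: 1 } accumulator would.
const counts = new Map();
for (const doc of docs) {
  const key = JSON.stringify(groupKey(doc));
  counts.set(key, (counts.get(key) || 0) + 1);
}
console.log([...counts.entries()]);
```

Documents whose leaf does not contain "flower" are grouped by leaf and stem; the rest are grouped by stem and petal, so one result set mixes two key shapes.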

Related

How to extract grouped results from array in $group stage and return as separate fields?

I'm running an aggregation query, and the $group stage is as follows
$group:
{
_id:
{
year_month: { $dateToString: { "date": "$updated_at", "format": "%Y-%m" } }
,client_name: "$clients_docs.client_name"
,client_label: "$clients_docs.client_label"
,client_code: "$clients_docs.client_code"
,client_country: "$clients_docs.client_country"
,base_curr: "$clients_docs.client_base_currency"
,inv_curr: "$clients_docs.client_invoice_currency"
,dest_curr: "$store.destination_currency"
}
,total_vol: { $sum: "$USD_Value" }
,total_tran: { $sum: 1 }
}
It returns the correct results, and returns all the grouped results in the _id:{} array.
I now want to extract all those fields from the array and return them not within the array so I can more easily export the output to a spreadsheet.
I tried using this stage:
{
$project:
{
year_month: 1
,client_name: 1
,client_label: 1
,client_code: 1
,client_country: 1
,base_curr: 1
,inv_curr: 1
,dest_curr: 1
,total_vol: 1
,total_tran : 1
}
},
But that returned the same results as the $group stage:
{
"_id" : {
"year_month" : "2022-01",
"client_name" : "client A",
"client_label" : "client A",
"client_code" : NumberInt(0000),
"client_country" : "TH",
"base_curr" : "USD",
"inv_curr" : "USD",
"dest_curr" : "HKD"
},
"total_vol" : 100000,
"total_tran" : 100.0
}
I want the "year_month" through "dest_curr" fields at the same level as the "total_vol" and "total_tran", so that when the data is exported they all appear as separate columns (now it's all captured as one "_id" column, and a "total_vol" and "total_tran" column). What's the best way to do this?
From a terminology perspective, you currently have an embedded document (or nested fields) rather than an array.
The straightforward way to do this is to simply enumerate each field, eg:
"year_month": "$_id.year_month",
There are fancier ways to do this, but as you only have a handful of fields this should suffice.
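As a sketch of the enumerated approach, the full $project stage could be built as below. The field names come from the $group stage in the question; the idFields and projectStage names, and the choice to drop _id with `_id: 0`, are assumptions for illustration.

```javascript
// Field names copied from the $group _id in the question.
const idFields = [
  "year_month", "client_name", "client_label", "client_code",
  "client_country", "base_curr", "inv_curr", "dest_curr",
];

// Build { year_month: "$_id.year_month", ... } instead of typing each line.
const projectStage = {
  $project: {
    _id: 0, // drop the nested key once its fields are copied out
    ...Object.fromEntries(idFields.map((f) => [f, `$_id.${f}`])),
    total_vol: 1,
    total_tran: 1,
  },
};
console.log(JSON.stringify(projectStage, null, 2));
```

The resulting object would go right after the $group stage in the pipeline array.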
Edit
An alternative ("fancier") approach is to leverage the $replaceWith stage using the $mergeObjects operator inside of it. Then you can $unset the previous _id field afterwards. It would look like this:
db.collection.aggregate([
{
"$replaceWith": {
"$mergeObjects": [
"$$ROOT",
"$_id"
]
}
},
{
$unset: "_id"
}
])
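What these two stages do to a single result can be emulated in plain JavaScript. This is only an illustration of the document shapes, using an abbreviated copy of the sample output from the question, not server code.

```javascript
// Sample $group output, abbreviated from the question.
const grouped = {
  _id: { year_month: "2022-01", client_name: "client A", dest_curr: "HKD" },
  total_vol: 100000,
  total_tran: 100,
};

// $mergeObjects: ["$$ROOT", "$_id"] -> later objects overwrite earlier keys.
const merged = { ...grouped, ...grouped._id };

// $unset: "_id" -> drop the nested key document.
const { _id, ...flat } = merged;
console.log(flat);
```

Every key of the former _id document ends up at the same level as total_vol and total_tran.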

How to correctly perform join but with math operations in MongoDB?

Here I have a collection, say test, storing data with a field named timestamp (in ms). Documents in this collection are inserted at a fixed timestamp interval of 60000. That is to say, I can always find one and only one document whose timestamp is 1 minute before that of a given one (except for the very first one, of course). Now I want to perform a join to correlate each document with the one whose timestamp is 1 minute earlier. I've tried this aggregation:
...
$lookup : {
from: 'test',
let : { lastTimestamp: '$timestamp'-60000 },
pipeline : [
{$match : {timestamp:'$timestamp'}}
],
as: 'lastObjArr'
},
...
This is meant to find the matching document and store it, as an array, under the key lastObjArr. But in fact lastObjArr is always empty. What happened?
Your $lookup pipeline is missing the necessary operators, and lastObjArr is empty for two reasons. First, the expression
let : { lastTimestamp: '$timestamp'-60000 },
is evaluated by JavaScript in the shell rather than by the server, and a string minus a number is NaN. It needs to use the $subtract operator:
let : { lastTimestamp: { $subtract: ['$timestamp', 60000] } },
Second, the $match step compares timestamp with the literal string '$timestamp'. To compare against the variable it needs the $expr operator together with $eq, i.e.
$lookup : {
from: 'test',
let : { lastTimestamp: { $subtract: ['$timestamp', 60000] } },
pipeline : [
{ $match : {
$expr: { $eq: ['$timestamp', '$$lastTimestamp'] }
} }
],
as: 'lastObjArr'
}
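Both points can be checked without a server. The sketch below first shows what the shell actually computes for the original let, then emulates the corrected join on a few hypothetical documents (the testDocs data is made up for illustration).

```javascript
// 1) The subtraction runs in JavaScript before the query is sent.
const broken = { lastTimestamp: "$timestamp" - 60000 };
console.log(broken.lastTimestamp); // NaN

// 2) What the corrected $lookup computes, emulated on hypothetical documents.
const testDocs = [
  { _id: 1, timestamp: 1000000 },
  { _id: 2, timestamp: 1060000 },
  { _id: 3, timestamp: 1120000 },
];
const joined = testDocs.map((doc) => ({
  ...doc,
  // $match with $expr: { $eq: ["$timestamp", "$$lastTimestamp"] }
  lastObjArr: testDocs.filter((prev) => prev.timestamp === doc.timestamp - 60000),
}));
console.log(JSON.stringify(joined));
```

The first document gets an empty lastObjArr, and every later one gets exactly its predecessor.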
You defined a variable called "lastTimestamp" and assigned it
'$timestamp'-60000
but you never use it. Reference it as $$lastTimestamp inside $expr (a plain $match cannot read let variables) and compute it with $subtract; then it should work:
$lookup : {
from: 'test',
let : { lastTimestamp: { $subtract: ['$timestamp', 60000] } },
pipeline : [
{$match : { $expr: { $eq: ['$timestamp', '$$lastTimestamp'] } }}
],
as: 'lastObjArr'
},

Mongodb weird behaviour of $exists

I don't understand the behaviour of the command $exists.
I have two simple documents in the collection 'users':
/* 1 */
{
"_id" : ObjectId("59788c2f6be212c210c73233"),
"user" : "google"
}
/* 2 */
{
"_id" : ObjectId("597899a80915995e50528a99"),
"user" : "werty",
"extra" : "very important"
}
I want to retrieve documents which contain the field "extra" and the value is not equal to 'unimportant':
The query:
db.getCollection('users').find(
{"extra":{$exists:true},"extra": {$ne:"unimportant"}}
)
returns both documents.
Also the query
db.getCollection('users').find(
{"extra":{$exists:false},"extra": {$ne:"unimportant"}}
)
returns both documents.
It seems that $exists (when used with another condition on the same field) works like an 'OR'.
What am I doing wrong? Any help appreciated.
I used MongoDB 3.2.6 and 3.4.9.
I have seen "Mongo $exists query does not return correct documents", but I don't have sparse indexes.
Per MongoDB documentation (https://docs.mongodb.com/manual/reference/operator/query/and/):
Using an explicit AND with the $and operator is necessary when the same field or operator has to be specified in multiple expressions.
Therefore, to enforce both clauses, you should use the $and operator as follows:
db.getCollection('users').find({ $and : [ { "extra": { $exists : true } }, { "extra" : { $ne : "unimportant" } } ] });
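The way you constructed your query is wrong; it has nothing to do with how $exists works. The filter repeats the key "extra", and in a JavaScript object literal the later value replaces the earlier one, so only the $ne condition is sent to the server (and $ne also matches documents that lack the field). Because you are checking two conditions, you need a query that applies a logical AND to both.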
The correct syntax for the query
I want to retrieve documents which contain the field "extra" and the
value is not equal to 'unimportant'
should follow:
db.getCollection('users').find(
{
"extra": {
"$exists": true,
"$ne": "unimportant"
}
}
)
or using the $and operator as:
db.getCollection('users').find(
{
"$and": [
{ "extra": { "$exists": true } },
{ "extra": { "$ne": "unimportant" } }
]
}
)
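The behaviour in the question can be reproduced without MongoDB, because the repeated key is collapsed by JavaScript itself. The sketch below uses the two documents from the question (without their _id values); the filter functions are only a rough emulation of $ne and $exists for illustration.

```javascript
// The filter exactly as written in the question: the second "extra" wins.
const filter = { extra: { $exists: true }, extra: { $ne: "unimportant" } };
console.log(Object.keys(filter).length); // 1
console.log(JSON.stringify(filter)); // {"extra":{"$ne":"unimportant"}}

// Rough emulation of what each filter selects.
const users = [{ user: "google" }, { user: "werty", extra: "very important" }];

// $ne alone also matches documents where the field is missing.
const neOnly = users.filter((u) => u.extra !== "unimportant");

// $exists: true combined with $ne.
const existsAndNe = users.filter((u) => "extra" in u && u.extra !== "unimportant");

console.log(neOnly.length, existsAndNe.length); // 2 1
```

This is why both queries in the question return both documents: in each case only the $ne condition survives.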

MongoDB Aggregation with DBRef

Is it possible to aggregate on data that is stored via DBRef?
Mongo 2.6
Let's say I have transaction data like:
{
_id : ObjectId(...),
user : DBRef("user", ObjectId(...)),
product : DBRef("product", ObjectId(...)),
source : DBRef("website", ObjectId(...)),
quantity : 3,
price : 40.95,
total_price : 122.85,
sold_at : ISODate("2015-07-08T09:09:40.262-0700")
}
The trick is "source" is polymorphic in nature - it could be different $ref values such as "webpage", "call_center", etc that also have different ObjectIds. For example DBRef("webpage", ObjectId("1")) and DBRef("webpage",ObjectId("2")) would be two different webpages where a transaction originated.
I would like to ultimately aggregate by source over a period of time (like a month):
db.coll.aggregate( { $match : { sold_at : { $gte : start, $lt : end } } },
{ $project : { source : 1, total_price : 1 } },
{ $group : {
_id : { "source.$ref" : "$source.$ref" },
count : { $sum : "$total_price" }
} } );
The trick is you get a path error trying to use a variable starting with $ either by trying to group by it or by trying to transform using expressions via project.
Any way to do this? Actually trying to push this data via aggregation to a subcollection to operate on it there. Trying to avoid a large cursor operation over millions of records to transform the data so I can aggregate it.
Mongo 4. Solved this issue in the following way:
Having this structure:
{
"_id" : LUUID("144e690f-9613-897c-9eab-913933bed9a7"),
"owner" : {
"$ref" : "person",
"$id" : NumberLong(10)
},
...
...
}
I needed to use "owner.$id" field. But because of "$" in the name of field, I was unable to use aggregation.
I transformed "owner.$id" -> "owner" using following snippet:
db.activities.aggregate([
{
$addFields: {
"owner": {
$arrayElemAt: [{ $objectToArray: "$owner" }, 1]
}
}
},
{
$addFields: {
"owner": "$owner.v"
}
},
{"$group" : {_id:"$owner", count:{$sum:1}}},
{$sort:{"count":-1}}
])
Detailed explanations here - https://dev.to/saurabh73/mongodb-using-aggregation-pipeline-to-extract-dbref-using-lookup-operator-4ekl
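The two $addFields stages above can be pictured with a plain JavaScript emulation. The activity document below is hypothetical, and the sketch assumes the DBRef stores "$ref" before "$id", so that index 1 is the "$id" pair.

```javascript
// Hypothetical document with a DBRef stored as { $ref, $id }.
const activity = { _id: "a1", owner: { $ref: "person", $id: 10 } };

// $objectToArray turns the subdocument into [{ k, v }, ...] in field order.
const asArray = Object.entries(activity.owner).map(([k, v]) => ({ k, v }));

// $arrayElemAt: [..., 1] picks the second pair, assumed here to be "$id".
const second = asArray[1];

// The second $addFields keeps only the value.
const flattened = { ...activity, owner: second.v };
console.log(flattened);
```

After this step owner is a plain value, so the later $group can refer to it as "$owner" without a "$" inside the field path.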
You cannot use DBRef values with the aggregation framework in that version. Instead you need the JavaScript processing of mapReduce to access the property names that DBRefs use:
db.coll.mapReduce(
function() {
emit( this.source.$ref, this["total_price"] )
},
function(key,values) {
return Array.sum( values );
},
{
"query": { "sold_at": { "$gte": start, "$lt": end } },
"out": { "inline": 1 }
}
)
You really should not be using DBRef at all. The usage is basically deprecated now and if you feel you need some external referencing then you should be "manually referencing" this with your own code or implemented by some other library, with which you can do so in a much more supported way.

Getting first and last element of array in MongoDB

Mongo DB: I'm looking to make one query to return both the first and last element of an array. I realize that I can do this with multiple queries, but I would really like to do it with one.
Assume a collection "test" where each object has an array "arr" of numbers:
db.test.find({},{arr:{$slice: -1},arr:{$slice: 1}});
This will result in the following:
{ "_id" : ObjectId("xxx"), "arr" : [ 1 ] } <-- 1 is the first element
Is there a way to maybe alias the results? Similar to what the mysql AS keyword would allow in a query?
This is not possible at the moment but will be with the Aggregation Framework that's in development now if I understand your functional requirement correctly.
You have to wonder about your schema if you have this requirement in the first place though. Are you sure there isn't a more elegant way to get this to work by changing your schema accordingly?
This can be done with the aggregation framework using the array operators $first and $last (available from MongoDB 4.4) as follows:
db.test.aggregate([
{ '$addFields': {
'firstElem': { '$first': '$arr' },
'lastElem': { '$last': '$arr' }
} }
])
or using $slice, which returns each element wrapped in a one-element array:
db.test.aggregate([
{ '$addFields': {
'firstElem': { '$slice': [ '$arr', 1 ] },
'lastElem': { '$slice': [ '$arr', -1 ] }
} }
])
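The two variants do not produce the same shape, which matters when the result is consumed by application code. A small plain-JavaScript emulation, with a hypothetical array standing in for a document's "arr" field:

```javascript
const arr = [1, 2, 3]; // hypothetical value of a document's "arr" field

// $first / $last return the elements themselves.
const firstElem = arr[0];
const lastElem = arr[arr.length - 1];

// $slice returns arrays, even when only one element is requested.
const firstSlice = arr.slice(0, 1);
const lastSlice = arr.slice(-1);

console.log({ firstElem, lastElem, firstSlice, lastSlice });
```

If bare values are needed on a server without $first and $last, $arrayElemAt with indexes 0 and -1 is a common alternative.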