MongoDB Aggregate $unwind on sub-Documents

I'm struggling when I $unwind more than one field from a Sub-Document.
Here's what the data looks like:
{
resp: {
field1: 'yes',
field2: ''
}
},
{
resp: {
field1: 'yes',
field2: ''
}
},
etc., etc.
If I run the aggregation pipeline for ONE field it works OK, so this works:
{ $unwind: "$resp" },
{ $unwind: "$resp.field1" },
{ $project: { field1: "$resp.field1" } },
{ $group: {
_id: 1,
field1: { $sum: { $cond: [{ $eq: ["$field1","yes"] },1,0] } }
}
}
But if I now also want to return field2 in the same aggregation, using the following, it returns a count of zero for both fields, whereas previously field1 had a count greater than zero.
{ $unwind: "$resp" },
{ $unwind: "$resp.field1" },
{ $unwind: "$resp.field2" },
{
$project: {
field1: "$resp.field1",
field2: "$resp.field2"
}
},
{ $group: {
_id: 1,
field1: { $sum: { $cond: [{ $eq: ["$field1","yes"] },1,0] } },
field2: { $sum: { $cond: [{ $eq: ["$field2","yes"] },1,0] } }
}
}
Any suggestions would be much appreciated.

It seems the above is the correct way to do this, but I'd happily take alternative suggestions. The error was in my mapping of the fields in the $project stage; while typing the question into SO I realised where the problem was!
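As a possible alternative: because resp is an embedded document and not an array, the $unwind stages should not be needed at all, since $group can read the dotted paths directly. (On MongoDB 3.2+ $unwind treats a non-array value as a single-element array; older versions reject it.) The sketch below assumes documents shaped like the example above. The pipeline constant is meant to be passed to db.collection.aggregate(pipeline), where `collection` is a placeholder name, and countYes is a hypothetical in-memory model of the same computation, included only so the counting logic can be checked without a server.

```javascript
// Assumed document shape: { resp: { field1: "yes", field2: "" } }.
// No $unwind: resp is an embedded document, so $group reads the dotted paths directly.
const pipeline = [
  {
    $group: {
      _id: 1,
      field1: { $sum: { $cond: [{ $eq: ["$resp.field1", "yes"] }, 1, 0] } },
      field2: { $sum: { $cond: [{ $eq: ["$resp.field2", "yes"] }, 1, 0] } }
    }
  }
];

// Hypothetical in-memory model of the $group stage above.
// A missing resp or field compares unequal to "yes" and contributes 0, as in $eq.
function countYes(docs) {
  const result = { _id: 1, field1: 0, field2: 0 };
  for (const doc of docs) {
    const resp = doc.resp || {};
    if (resp.field1 === "yes") result.field1 += 1;
    if (resp.field2 === "yes") result.field2 += 1;
  }
  return result;
}

console.log(countYes([
  { resp: { field1: "yes", field2: "" } },
  { resp: { field1: "yes", field2: "yes" } }
])); // { _id: 1, field1: 2, field2: 1 }
```

Dropping the $project stage as well means there is no intermediate field mapping to get wrong, which was the source of the original error.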

Related

MongoDB aggregation: count appearances of each value of a field per id

Data example:
{ id: 1, field: "a", .. }
{ id: 1, field: "a", .. }
{ id: 1, field: "b", .. }
{ id: 2, field: "b", .. }
Desired result:
{ id: 1, countA: 2, countB: 1 }
{ id: 2, countA: 0, countB: 1 }
'field' is an enum, so I know all the values in advance and can give names to the counters.
I have a solution but it seems that there is a better one. My solution:
db.collection.aggregate([
{ $group: { _id: { id: "$id", field: "$field"}, count: { $sum : 1}}},
{ $project: {
_id: 1,
countA: { $cond: { if: { $eq: ["$_id.field", "a"] }, then: "$count", else: 0 }},
countB: { $cond: { if: { $eq: ["$_id.field", "b"] }, then: "$count", else: 0 }}
}
},
{ $group:
{_id: "$_id.id", countA: { $max: "$countA"}, countB: { $max :"$countB"}}
}
])
upd: I have a better solution: placing the $project before the $group, so only one grouping is needed, but it uses the same principle. It still seems that there should be something more built-in for this purpose.
Thanks!
I believe you need just one $group stage to achieve what you want.
Use $sum with $cond to count the documents whose field is "a" and those whose field is "b".
Try this:
db.collection.aggregate([
{
$group: {
_id: "$id",
id: {
$first: "$id"
},
countA: {
$sum: {
$cond: {
if: { $eq: [ "$field","a"] },
then: 1,
else: 0
}
}
},
countB: {
$sum: {
$cond: {
if: { $eq: [ "$field", "b"] },
then: 1,
else: 0
}
}
}
}
}
])
Have a look at this Mongo Playground for a working demo of the query.
I hope this is what you are looking for!
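Since 'field' is an enum whose values are known in advance, the single $group stage can also be generated from the list of values, so adding a value means extending a list rather than copying a $cond block. The sketch below is a hypothetical helper (buildCountStage) plus an in-memory model (countPerId) of what the stage returns; it assumes the values are simple strings and that counter names are formed as "count" plus the upper-cased value, matching countA/countB in the question.

```javascript
// Hypothetical helper: builds the one $group stage from the known enum values.
function buildCountStage(values) {
  const group = { _id: "$id" };
  for (const v of values) {
    // "a" -> countA; assumes values are plain strings.
    group["count" + v.toUpperCase()] = { $sum: { $cond: [{ $eq: ["$field", v] }, 1, 0] } };
  }
  return { $group: group };
}

// In-memory model of that stage: one row per id, every counter starting at 0.
// Rows come out in first-seen order here; MongoDB does not guarantee $group order.
function countPerId(docs, values) {
  const byId = new Map();
  for (const doc of docs) {
    if (!byId.has(doc.id)) {
      const row = { _id: doc.id };
      for (const v of values) row["count" + v.toUpperCase()] = 0;
      byId.set(doc.id, row);
    }
    if (values.includes(doc.field)) {
      byId.get(doc.id)["count" + doc.field.toUpperCase()] += 1;
    }
  }
  return [...byId.values()];
}

console.log(countPerId(
  [{ id: 1, field: "a" }, { id: 1, field: "a" }, { id: 1, field: "b" }, { id: 2, field: "b" }],
  ["a", "b"]
));
```

In the shell, the stage would be used as db.collection.aggregate([buildCountStage(["a", "b"])]), with `collection` as a placeholder; a trailing $project can rename _id to id if that exact output shape is needed.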

MongoDB multiple levels embedded array query

I have a document like this:
{
_id: 1,
data: [
{
_id: 2,
rows: [
{
myFormat: [1,2,3,4]
},
{
myFormat: [1,1,1,1]
}
]
},
{
_id: 3,
rows: [
{
myFormat: [1,2,7,8]
},
{
myFormat: [1,1,1,1]
}
]
}
]
}
I want to get distinct myFormat values as a complete array.
For example: I need the result as: [1,2,3,4], [1,1,1,1], [1,2,7,8]
How can I write a MongoDB query for this?
Thanks for the help.
Please try this if every object in rows has only the one field myFormat:
db.getCollection('yourCollection').distinct('data.rows')
Ref : mongoDB Distinct Values for a field
Or, if you need the result as an array, or the objects in rows also have other fields, try this:
db.yourCollection.aggregate([
{ $project: { 'data.rows.myFormat': 1 } },
{ $unwind: '$data' },
{ $unwind: '$data.rows' },
{ $group: { _id: '$data.rows.myFormat' } },
{ $group: { _id: '', distinctValues: { $push: '$_id' } } },
{ $project: { distinctValues: 1, _id: 0 } }
])
Or else:
db.yourCollection.aggregate([
{ $project: { values: '$data.rows.myFormat' } },
{ $unwind: '$values' },
{ $unwind: '$values' },
{ $group: { _id: '', distinctValues: { $addToSet: '$values' } } },
{ $project: { distinctValues: 1, _id: 0 } }
])
The above aggregation queries should give you what you want, but they can be slow on large datasets, so run them and check for any slowness. If you are running them as a one-off, you can pass {allowDiskUse: true} if needed; whether one-off or not, check whether you need preserveNullAndEmptyArrays: true on the $unwind stages.
Ref: allowDiskUse, $unwind preserveNullAndEmptyArrays
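To see what the second pipeline computes, here is a hypothetical in-memory model of it: "$data.rows.myFormat" yields an array of arrays of myFormat values, the two $unwind stages flatten it down to individual myFormat arrays, and $addToSet keeps each distinct array once, comparing arrays by value and element order. The model uses JSON.stringify as the equality key, which assumes myFormat holds plain numbers or strings. Rows without myFormat are skipped, and MongoDB does not guarantee the order of $addToSet results, whereas this model keeps first-seen order.

```javascript
// Hypothetical in-memory model of: $project values: "$data.rows.myFormat",
// $unwind "$values" twice, then $group with $addToSet.
function distinctFormats(docs) {
  const seen = new Map(); // JSON key -> original array, in first-seen order
  for (const doc of docs) {
    for (const block of doc.data || []) {
      for (const row of block.rows || []) {
        if (row.myFormat === undefined) continue; // missing values are dropped
        const key = JSON.stringify(row.myFormat); // arrays equal by value and order
        if (!seen.has(key)) seen.set(key, row.myFormat);
      }
    }
  }
  return [...seen.values()];
}

console.log(distinctFormats([{ _id: 9 }])); // []
```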

How to find almost similar records in mongodb?

This is the search record:
A = {
field1: value1,
field2: value2,
...
fieldN: valueN
}
I have many such records in the database.
Another record (B) almost matches record A if at least N-M fields in the two records are equal. Here is an example with M=2:
B = {
field1: OTHER_value1,
field2: OTHER_value2,
field3: value3,
...
fieldN: valueN
}
The differing fields can be any fields, not only the first ones.
P.S.: I have asked the same question for PostgreSQL (How to find almost similar records in sql?) and now I want to do this with MongoDB.
My solution:
db.col.aggregate(
[
{
$addFields:
{
nonMatchCount: 0
}
},
{
$addFields: {
nonMatchCount:
{
$cond: [{$eq: ['$field1', 'OTHER_value1']}, '$nonMatchCount', {$sum: ['$nonMatchCount', 1]}]
}
}
},
{
$addFields: {
nonMatchCount:
{
$cond: [{$eq: ['$field2', 'OTHER_value2']}, '$nonMatchCount', {$sum: ['$nonMatchCount', 1]}]
}
}
},
{
$addFields: {
nonMatchCount:
{
$cond: [{$eq: ['$field3', 'value3']}, '$nonMatchCount', {$sum: ['$nonMatchCount', 1]}]
}
}
},
...
{
$addFields: {
nonMatchCount:
{
$cond: [{$eq: ['$fieldN', 'valueN']}, '$nonMatchCount', {$sum: ['$nonMatchCount', 1]}]
}
}
},
{$match: { nonMatchCount: {$lte: 2}}}
]
);
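The N chained $addFields stages can be collapsed into one: add a 0-or-1 $cond term per field with $add, then $match on the total. Since writing N terms by hand is tedious, the sketch below generates them from the search record. buildAlmostMatchPipeline and almostMatches are hypothetical names; the values are wrapped in $literal so that a value starting with "$" is not read as a field path, and almostMatches is an in-memory model that mirrors $eq only for scalar values (strings, numbers, booleans). Like the original, the $addFields stage needs MongoDB 3.4+.

```javascript
// Hypothetical helper: one $addFields stage that adds 1 for every field of
// `target` whose value differs, followed by the $match on the total.
function buildAlmostMatchPipeline(target, maxMismatches) {
  const terms = Object.keys(target).map((field) => ({
    // $literal keeps a value such as "$x" from being treated as a field path.
    $cond: [{ $eq: ["$" + field, { $literal: target[field] }] }, 0, 1]
  }));
  return [
    { $addFields: { nonMatchCount: { $add: terms } } },
    { $match: { nonMatchCount: { $lte: maxMismatches } } }
  ];
}

// In-memory model of the same rule for scalar values.
// A field missing from doc counts as a mismatch, as $eq would report.
function almostMatches(doc, target, maxMismatches) {
  let nonMatchCount = 0;
  for (const field of Object.keys(target)) {
    if (doc[field] !== target[field]) nonMatchCount += 1;
  }
  return nonMatchCount <= maxMismatches;
}

console.log(almostMatches(
  { field1: "OTHER_value1", field2: "OTHER_value2", field3: "value3" },
  { field1: "value1", field2: "value2", field3: "value3" },
  2
)); // true
```

In the shell this would be run as db.col.aggregate(buildAlmostMatchPipeline(A, 2)), with A being the search record.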

Retrieving a count that matches specified criteria in a $group aggregation

So I am looking to group documents in my collection on a specific field, and for the output results of each group, I am looking to include the following:
A count of all documents in the group that match a specific query (i.e. a count of documents that satisfy some expression { "$Property": "Value" })
The total number of documents in the group
(Bonus, as I suspect that this is not easily accomplished) Properties of a document that correspond to a $min/$max accumulator
I am very new to the syntax used to query in mongo and don't quite understand how it all works, but after some research, I've managed to get it down to the following query (please note, I am currently using version 3.0.12 of MongoDB, but I believe we will upgrade in a couple of months' time):
db.getCollection('myCollection').aggregate(
[
{
$group: {
_id: {
GroupID: "$GroupID",
Status: "$Status"
},
total: { $sum: 1 },
GroupName: { $first: "$GroupName" },
EarliestCreatedDate: { $min: "$DateCreated" },
LastModifiedDate: { $max: "$LastModifiedDate" }
}
},
{
$group: {
_id: "$_id.GroupID",
Statuses: {
$push: {
Status: "$_id.Status",
Count: "$total"
}
},
TotalCount: { $sum: "$total" },
GroupName: { $first: "$GroupName" },
EarliestCreatedDate: { $min: "$EarliestCreatedDate" },
LastModifiedDate: { $max: "$LastModifiedDate" }
}
}
]
)
Essentially what I am looking to retrieve is the Count for specific Status values, and project them into one final result document that looks like the following:
{
GroupName,
EarliestCreatedDate,
EarliestCreatedBy,
LastModifiedDate,
LastModifiedBy,
TotalCount,
PendingCount,
ClosedCount
}
Where PendingCount and ClosedCount are the total number of documents in each group that have a status Pending/Closed. I suspect I need to use $project with some other expression to extract this value, but I don't really understand the aggregation pipeline well enough to figure this out.
Also the EarliestCreatedBy and LastModifiedBy are the users who created/modified the document(s) corresponding to the EarliestCreatedDate and LastModifiedDate respectively. As I mentioned, I think retrieving these values will add another layer of complexity, so if there is no practical solution, I am willing to forgo this requirement.
Any suggestions/tips would be very much appreciated.
You can try the aggregation stages below.
$group
Calculate the counts TotalCount, PendingCount and ClosedCount for each GroupID.
Calculate the $min of DateCreated as EarliestCreatedDate and the $max of LastModifiedDate as LastModifiedDate, and $push the CreatedBy, LastModifiedBy, DateCreated and LastModifiedDate of every document into CreatedByLastModifiedBy, to be compared later when fetching EarliestCreatedBy and LastModifiedBy for each GroupID.
$project
Project all the fields for the response.
$filter CreatedByLastModifiedBy down to the entries whose DateCreated equals EarliestCreatedDate, $map each match to its CreatedBy, and use $arrayElemAt to take the first element of the resulting array as EarliestCreatedBy.
Follow similar steps to calculate LastModifiedBy.
db.getCollection('myCollection').aggregate(
[{
$group: {
_id: "$GroupID",
TotalCount: {
$sum: 1
},
PendingCount: {
$sum: {
$cond: {
if: {
$eq: ["$Status", "Pending"]
},
then: 1,
else: 0
}
}
},
ClosedCount: {
$sum: {
$cond: {
if: {
$eq: ["$Status", "Closed"]
},
then: 1,
else: 0
}
}
},
GroupName: {
$first: "$GroupName"
},
EarliestCreatedDate: {
$min: "$DateCreated"
},
LastModifiedDate: {
$max: "$LastModifiedDate"
},
CreatedByLastModifiedBy: {
$push: {
CreatedBy: "$CreatedBy",
LastModifiedBy: "$LastModifiedBy",
DateCreated: "$DateCreated",
LastModifiedDate: "$LastModifiedDate"
}
}
}
}, {
$project: {
_id: 0,
GroupName: 1,
EarliestCreatedDate: 1,
EarliestCreatedBy: {
$arrayElemAt: [{
$map: {
input: {
$filter: {
input: "$CreatedByLastModifiedBy",
as: "CrBy",
cond: {
"$eq": ["$EarliestCreatedDate", "$$CrBy.DateCreated"]
}
}
},
as: "EaCrBy",
in: "$$EaCrBy.CreatedBy"
}
}, 0]
},
LastModifiedDate: 1,
LastModifiedBy: {
$arrayElemAt: [{
$map: {
input: {
$filter: {
input: "$CreatedByLastModifiedBy",
as: "MoBy",
cond: {
"$eq": ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
}
}
},
as: "LaMoBy",
in: "$$LaMoBy.LastModifiedBy"
}
}, 0]
},
TotalCount: 1,
PendingCount: 1,
ClosedCount: 1
}
}]
)
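To check the intended per-group result without a server, here is a hypothetical in-memory model of the $group + $project above (summarizeGroup and summarizeByGroup are illustrative names; field names follow the question). Note that inside $eq the field must be written "$Status" with the dollar sign; plain "Status" is just a string literal and never equals "Pending", so that count would always be 0. On ties the model keeps the first document seen, which matches taking element 0 of the filtered array, and GroupName is taken from the first document, like $first.

```javascript
// Hypothetical model of one group's output document.
function summarizeGroup(docs) {
  let earliest = null;
  let latest = null;
  for (const doc of docs) {
    // Strict comparisons keep the first document on ties, like $arrayElemAt [..., 0].
    if (earliest === null || doc.DateCreated < earliest.DateCreated) earliest = doc;
    if (latest === null || doc.LastModifiedDate > latest.LastModifiedDate) latest = doc;
  }
  return {
    GroupName: docs[0].GroupName, // like $first
    EarliestCreatedDate: earliest.DateCreated,
    EarliestCreatedBy: earliest.CreatedBy,
    LastModifiedDate: latest.LastModifiedDate,
    LastModifiedBy: latest.LastModifiedBy,
    TotalCount: docs.length,
    PendingCount: docs.filter((d) => d.Status === "Pending").length,
    ClosedCount: docs.filter((d) => d.Status === "Closed").length
  };
}

// Groups documents by GroupID (first-seen order), then summarizes each group.
function summarizeByGroup(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.GroupID)) groups.set(doc.GroupID, []);
    groups.get(doc.GroupID).push(doc);
  }
  return [...groups.values()].map((group) => summarizeGroup(group));
}
```

Dates can be Date objects or numbers here, since < and > compare both.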
Update for Version < 3.2
$filter and $arrayElemAt are not available in your version (both were added in 3.2). Below is an equivalent.
The comparison logic is the same: $map creates an array holding false for every non-matching entry and the LastModifiedBy value for every matching one.
The next step uses $setDifference to compare that array with the array [false], which returns the elements that exist only in the first set.
LastModifiedBy: {
$setDifference: [{
$map: {
input: "$CreatedByLastModifiedBy",
as: "MoBy",
in: {
$cond: [{
$eq: ["$LastModifiedDate", "$$MoBy.LastModifiedDate"]
},
"$$MoBy.LastModifiedBy",
false
]
}
}
},
[false]
]
}
Add an $unwind stage after the $project stage to turn the single-element array into a plain value:
{$unwind:"$LastModifiedBy"}
Similar steps for calculating EarliestCreatedBy
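The pre-3.2 trick can be modelled in plain JavaScript to see why it works: $map turns every entry into either its LastModifiedBy value or false, and $setDifference against [false] drops the false markers. Because $setDifference is a set operation it also removes duplicates, so several documents modified by the same user at the same time collapse to one value. lastModifiedByPre32 is a hypothetical name, and dates are compared with === here, which assumes plain numbers or strings rather than Date objects.

```javascript
// Hypothetical model of: $setDifference: [ { $map: ... $cond: [match, LastModifiedBy, false] }, [false] ]
function lastModifiedByPre32(entries, lastModifiedDate) {
  const mapped = entries.map((e) =>
    e.LastModifiedDate === lastModifiedDate ? e.LastModifiedBy : false
  );
  // Set semantics: duplicates removed, then the false markers dropped.
  return [...new Set(mapped)].filter((v) => v !== false);
}

console.log(lastModifiedByPre32(
  [{ LastModifiedDate: 5, LastModifiedBy: "ann" }, { LastModifiedDate: 3, LastModifiedBy: "bob" }],
  5
)); // [ 'ann' ]
```

The result is still an array, which is why the extra $unwind stage is needed to get a plain value.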

How to count the documents duplicated in mongodb?

I searched for how to count duplicated documents in MongoDB and found this query, which returns the groups of duplicated documents:
db.job_crawler_models_jobs_crawlings.aggregate([
{ $group: {
_id: { field1: "$field1", field2: "$field2" },
count: { $sum: 1 }
}},
{ $match: {
count: { $gt : 1 }
}}
])
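But I want to get the number of duplicated documents. How can I do that?
You could try adding another $group stage to the pipeline. I'm not sure this is exactly what you are looking for, though: it counts the duplicated (field1, field2) combinations, not the total number of documents involved.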
db.job_crawler_models_jobs_crawlings.aggregate([
{ $group: {
_id: { field1: "$field1", field2: "$field2" },
count: { $sum: 1 }
}},
{ $match: {
count: { $gt : 1 }
}},
{ $group: { _id: null, duplicatedCounts: { $sum:1 } } }
])
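Which number "duplicated documents" means matters here: { $sum: 1 } in the last stage counts the duplicated key combinations, while { $sum: "$count" } would add up every document that belongs to such a combination. The sketch below shows a final stage returning both, plus a hypothetical in-memory model (duplicateStats) of the whole pipeline; it assumes field1 and field2 hold scalar values and uses JSON.stringify as the grouping key.

```javascript
// Possible replacement for the last $group stage, reporting both numbers.
const finalStage = {
  $group: {
    _id: null,
    duplicatedGroups: { $sum: 1 },          // (field1, field2) combinations with count > 1
    duplicatedDocuments: { $sum: "$count" } // all documents in those combinations
  }
};

// Hypothetical in-memory model: $group by (field1, field2), $match count > 1, then finalStage.
function duplicateStats(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = JSON.stringify([doc.field1, doc.field2]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const duplicated = [...counts.values()].filter((c) => c > 1);
  return {
    duplicatedGroups: duplicated.length,
    duplicatedDocuments: duplicated.reduce((sum, c) => sum + c, 0)
  };
}

console.log(duplicateStats([{ field1: "a", field2: 1 }, { field1: "a", field2: 1 }]));
```

If the goal is the number of surplus copies that would have to be deleted, that is duplicatedDocuments minus duplicatedGroups.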