MongoDb Exists per column - mongodb

Running an aggregation such as the following:
[
{
"$match":{
"datasourceName":"Startup Failures",
"sheetName":"Data",
"Cost":{
"$exists":true
},
"Status":{
"$exists":true
}
}
},
{
"$group":{
"Count of Cost":{
"$sum":1
},
"Count of Status":{
"$sum":1
},
"_id":null
}
},
{
"$project":{
"Count of Cost":1,
"Count of Status":1
}
}
]
The $exists filters in the $match stage remove every document where "Cost" or "Status" is missing, so both counts in the projection are always the same. I don't want to filter out whole documents, only count each column separately: "Count of Cost" should be the number of documents where Cost exists, and "Count of Status" the number of documents where Status exists. With my data these would be two different numbers.

Here is an aggregation using $facet, which runs several sub-pipelines over the same set of input documents within a single stage. Cost and Status are counted as two facets of the same query.
db.test.aggregate( [
{
$match: { fld1: "Data" }
},
{
$facet: {
cost: [
{ $match: { cost: { $exists: true } } },
{ $count: "count" }
],
status: [
{ $match: { status: { $exists: true } } },
{ $count: "count" }
],
}
},
{
$project: {
costCount: { $arrayElemAt: [ "$cost.count" , 0 ] },
statusCount: { $arrayElemAt: [ "$status.count" , 0 ] }
}
}
] )
I get a result of { "costCount" : 4, "statusCount" : 3 }, using the following documents:
{ _id: 1, fld1: "Data", cost: 12, status: "Y" },
{ _id: 2, fld1: "Data", status: "N" },
{ _id: 3, fld1: "Data" },
{ _id: 4, fld1: "Data", cost: 90 },
{ _id: 5, fld1: "Data", cost: 44 },
{ _id: 6, fld1: "Data", cost: 235, status: "N" },
{ _id: 9, fld1: "Stuff", cost: 0, status: "Y" }
NOTE: Here is a similar query using the facets: MongoDB Custom sorting on two fields.
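To check the expected numbers without a database, the per-field counting can be mirrored in plain JavaScript. This is only a sketch of the idea, not the aggregation itself; `countExisting` is a hypothetical helper name, and the documents are the sample ones listed above.

```javascript
// Sample documents from the answer above.
const docs = [
  { _id: 1, fld1: "Data", cost: 12, status: "Y" },
  { _id: 2, fld1: "Data", status: "N" },
  { _id: 3, fld1: "Data" },
  { _id: 4, fld1: "Data", cost: 90 },
  { _id: 5, fld1: "Data", cost: 44 },
  { _id: 6, fld1: "Data", cost: 235, status: "N" },
  { _id: 9, fld1: "Stuff", cost: 0, status: "Y" },
];

// Count, for each field name, the documents that contain that field.
// Like { $exists: true }, a field that is present but null still counts.
function countExisting(documents, fields) {
  const counts = {};
  for (const field of fields) {
    counts[field] = documents.filter((doc) =>
      Object.prototype.hasOwnProperty.call(doc, field)
    ).length;
  }
  return counts;
}

// Equivalent of the leading $match on fld1, then one count per "facet".
const dataDocs = docs.filter((doc) => doc.fld1 === "Data");
const result = countExisting(dataDocs, ["cost", "status"]);
console.log(result); // { cost: 4, status: 3 }
```

Each field is counted independently of the others, which is why the two numbers can differ.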

Related

MongoDB get count of field per season from MM/DD/YYYY date field

I am facing a problem in MongoDB. Suppose, I have the following collection.
{ id: 1, issueDate: "07/05/2021", code: "31" },
{ id: 2, issueDate: "12/11/2020", code: "14" },
{ id: 3, issueDate: "02/11/2021", code: "98" },
{ id: 4, issueDate: "01/02/2021", code: "14" },
{ id: 5, issueDate: "06/23/2020", code: "14" },
{ id: 6, issueDate: "07/01/2020", code: "31" },
{ id: 7, issueDate: "07/05/2022", code: "14" },
{ id: 8, issueDate: "07/02/2022", code: "20" },
{ id: 9, issueDate: "07/02/2022", code: "14" }
The date field is in the format MM/DD/YYYY. My goal is to get the count of items for each season: spring (March-May), summer (June-August), autumn (September-November) and winter (December-February).
The result I'm expecting is:
count of fields for each season:
{ "_id" : "Summer", "count" : 6 }
{ "_id" : "Winter", "count" : 3 }
top 2 codes (first and second most recurring) per season:
{ "_id" : "Summer", "codes" : {14, 31} }
{ "_id" : "Winter", "codes" : {14, 98} }
How can this be done?
You should never store date/time values as strings; always store proper Date objects.
You can use the $setWindowFields operator for that:
db.collection.aggregate([
// Convert string into Date
{ $set: { issueDate: { $dateFromString: { dateString: "$issueDate", format: "%m/%d/%Y" } } } },
// Determine the season (0..3)
{
$set: {
season: { $mod: [{ $toInt: { $divide: [{ $add: [{ $subtract: [{ $month: "$issueDate" }, 1] }, 1] }, 3] } }, 4] }
}
},
// Count codes per season
{
$group: {
_id: { season: "$season", code: "$code" },
count: { $count: {} },
}
},
// Rank occurrence of codes per season
{
$setWindowFields: {
partitionBy: "$_id.season",
sortBy: { count: -1 },
output: {
rank: { $denseRank: {} },
count: { $sum: "$count" }
}
}
},
// Get only top 2 ranks
{ $match: { rank: { $lte: 2 } } },
// Final grouping
{
$group: {
_id: "$_id.season",
count: { $first: "$count" },
codes: { $push: "$_id.code" }
}
},
// Some cosmetic for output
{
$set: {
season: {
$switch: {
branches: [
{ case: { $eq: ["$_id", 0] }, then: 'Winter' },
{ case: { $eq: ["$_id", 1] }, then: 'Spring' },
{ case: { $eq: ["$_id", 2] }, then: 'Summer' },
{ case: { $eq: ["$_id", 3] }, then: 'Autumn' },
]
}
}
}
}
])
I will give you some clues:
Use $group with the $month of issueDate (converted to a date) as _id, and the $sum accumulator to get a count per month.
To map months to seasons, divide the month by 3 with $divide, truncate the result with $toInt, and assign the category with $cond.
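The month-to-season arithmetic described in the answers above can be tried in plain JavaScript. This is a minimal sketch, assuming the index convention 0 = Winter, 1 = Spring, 2 = Summer, 3 = Autumn; `seasonIndex` and `SEASONS` are illustrative names, and the dates are the ones from the question.

```javascript
// Map a month number (1-12) to a season index:
// 0 = Winter, 1 = Spring, 2 = Summer, 3 = Autumn.
// December gives 12 / 3 = 4, and the modulo wraps it back to 0 (Winter).
function seasonIndex(month) {
  return Math.trunc(month / 3) % 4;
}

const SEASONS = ["Winter", "Spring", "Summer", "Autumn"];

// issueDate values from the question, in MM/DD/YYYY format.
const issueDates = [
  "07/05/2021", "12/11/2020", "02/11/2021",
  "01/02/2021", "06/23/2020", "07/01/2020",
  "07/05/2022", "07/02/2022", "07/02/2022",
];

// Count documents per season name.
const countsBySeason = {};
for (const date of issueDates) {
  const month = Number(date.slice(0, 2)); // first two characters are MM
  const season = SEASONS[seasonIndex(month)];
  countsBySeason[season] = (countsBySeason[season] || 0) + 1;
}

console.log(countsBySeason); // { Summer: 6, Winter: 3 }
```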
Another option:
db.collection.aggregate([
{
$addFields: {
"season": {
$switch: {
branches: [
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"06",
"07",
"08"
]
]
},
then: "Summer"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"03",
"04",
"05"
]
]
},
then: "Spring"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"12",
"01",
"02"
]
]
},
then: "Winter"
},
{
case: {
$in: [
{
$substr: [
"$issueDate",
0,
2
]
},
[
"09",
"10",
"11"
]
]
},
then: "Autumn"
}
],
default: "No date found."
}
}
}
},
{
$group: {
_id: {
s: "$season",
c: "$code"
},
cnt1: {
$sum: 1
}
}
},
{
$sort: {
cnt1: -1
}
},
{
$group: {
_id: "$_id.s",
codes: {
$push: "$_id.c"
},
cnt: {
$sum: "$cnt1"
}
}
},
{
$project: {
_id: 0,
season: "$_id",
count: "$cnt",
codes: {
"$slice": [
"$codes",
2
]
}
}
}
])
Explained:
Add a season field using $switch on the month extracted from the issueDate string.
Group by season and code to count each code per season.
$sort by that count, descending.
Group by season to build an array of codes ordered from most to least frequent.
Project the fields into the desired output and $slice the codes array to keep only the first two.
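The steps above can be sketched in plain JavaScript to verify the expected output against the sample data. The helper names `seasonOf` and `topCodesPerSeason` are hypothetical, and the Autumn case is taken from the season definitions in the question.

```javascript
// Sample documents from the question (only the fields that matter here).
const rows = [
  { issueDate: "07/05/2021", code: "31" },
  { issueDate: "12/11/2020", code: "14" },
  { issueDate: "02/11/2021", code: "98" },
  { issueDate: "01/02/2021", code: "14" },
  { issueDate: "06/23/2020", code: "14" },
  { issueDate: "07/01/2020", code: "31" },
  { issueDate: "07/05/2022", code: "14" },
  { issueDate: "07/02/2022", code: "20" },
  { issueDate: "07/02/2022", code: "14" },
];

// Season from the MM prefix of the date string (the $switch step).
function seasonOf(issueDate) {
  const month = issueDate.slice(0, 2);
  if (["03", "04", "05"].includes(month)) return "Spring";
  if (["06", "07", "08"].includes(month)) return "Summer";
  if (["09", "10", "11"].includes(month)) return "Autumn";
  if (["12", "01", "02"].includes(month)) return "Winter";
  return "No date found.";
}

function topCodesPerSeason(documents, limit) {
  // Count each code per season (the first $group step).
  const perSeason = new Map(); // season -> Map(code -> count)
  for (const doc of documents) {
    const season = seasonOf(doc.issueDate);
    if (!perSeason.has(season)) perSeason.set(season, new Map());
    const codes = perSeason.get(season);
    codes.set(doc.code, (codes.get(doc.code) || 0) + 1);
  }
  // Sort by count descending, total the counts, keep the top codes.
  const result = [];
  for (const [season, codes] of perSeason) {
    const sorted = [...codes.entries()].sort((a, b) => b[1] - a[1]);
    const count = sorted.reduce((sum, [, n]) => sum + n, 0);
    result.push({ season, count, codes: sorted.slice(0, limit).map(([code]) => code) });
  }
  return result;
}

const top2 = topCodesPerSeason(rows, 2);
console.log(JSON.stringify(top2));
// [{"season":"Summer","count":6,"codes":["14","31"]},{"season":"Winter","count":3,"codes":["14","98"]}]
```

Codes with equal counts keep their first-seen order here, so ties at the cut-off are resolved arbitrarily, as in the pipeline.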
Comment:
Indeed keeping dates in string is not a good idea in general ...

MongoDB aggregating multiple arrays of objects based on shared key

I'm writing a query to calculate multiple metrics for each user in my DB.
I've calculated all of the metrics, and have a structure like this
{
"metric1": [{"user_id": 1, "val": 13},{"user_id": 2, "val": 100}],
"metric2": [{"user_id": 2, "val": 29},{"user_id": 1, "val": 123}],
"metric3": [{"user_id": 1, "val": 46},{"user_id": 2, "val": 111}]
}
I'm trying to convert the above into this structure
{
"user_id": [1,2],
"metric1": [13, 100],
"metric2": [29,123],
"metric3": [46,111]
}
So that I can display a table showing each user and the three metrics (one metric per column, and one user per row).
Assuming your data is structured like this:
{
"metric1": [
{"id1": 1}, {"id2": 2}
],
"metric2": [
{"id2": 22}, {"id1": 11}
],
"metric3": [
{"id2": 222}, {"id1": 111}
]
}
all you have to do is use $unwind to break up the arrays and then $objectToArray to get access to the keys:
db.blah.aggregate([
{ $unwind: '$metric1' },
{ $unwind: '$metric2' },
{ $unwind: '$metric3' },
{ $project: {'metric1': { $objectToArray: '$metric1' }, 'metric2': { $objectToArray: '$metric2' }, 'metric3': { $objectToArray: '$metric3' }} },
{ $sort: { 'metric1.k' : -1} },
{ $sort: { 'metric2.k' : -1} },
{ $sort: { 'metric3.k' : -1} },
{ $unwind: '$metric1' },
{ $unwind: '$metric2' },
{ $unwind: '$metric3' },
{ $group: {
_id: null,
user_id: { $addToSet: '$metric1.k' },
metric1: { $addToSet: '$metric1.v' },
metric2: { $addToSet: '$metric2.v' },
metric3: { $addToSet: '$metric3.v' },
} },
{ $project: { _id: 0 } }
]).pretty()
which results
{
"user_id" : [
"id1",
"id2"
],
"metric1" : [
1,
2
],
"metric2" : [
11,
22
],
"metric3" : [
111,
222
]
}
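For the structure shown in the question itself (objects with user_id and val), the same reshaping can be sketched in plain JavaScript, for example on the client after the query returns. This is an illustration under assumptions, not the answer's pipeline: `pivotByUser` is a hypothetical helper, values are aligned to a sorted user_id array, and users missing from a metric get null.

```javascript
// Input in the shape given in the question.
const metrics = {
  metric1: [{ user_id: 1, val: 13 }, { user_id: 2, val: 100 }],
  metric2: [{ user_id: 2, val: 29 }, { user_id: 1, val: 123 }],
  metric3: [{ user_id: 1, val: 46 }, { user_id: 2, val: 111 }],
};

// Turn { metric: [{user_id, val}, ...] } into column arrays that all
// follow the same user order, so row i always belongs to user_id[i].
function pivotByUser(input) {
  const userIds = new Set();
  for (const entries of Object.values(input)) {
    for (const entry of entries) userIds.add(entry.user_id);
  }
  const sortedIds = [...userIds].sort((a, b) => a - b);

  const output = { user_id: sortedIds };
  for (const [name, entries] of Object.entries(input)) {
    const byUser = new Map(entries.map((e) => [e.user_id, e.val]));
    // null marks a user that has no value for this metric.
    output[name] = sortedIds.map((id) => (byUser.has(id) ? byUser.get(id) : null));
  }
  return output;
}

const table = pivotByUser(metrics);
console.log(JSON.stringify(table));
// {"user_id":[1,2],"metric1":[13,100],"metric2":[123,29],"metric3":[46,111]}
```

Because the values are aligned by user, metric2 comes out as [123, 29]: user 1 has 123 and user 2 has 29 in the input.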

Compare 2 count aggregations

I have a collection in MongoDB that looks something like the following:
{ "_id" : 1, "type" : "start", userid: "101", placementid: 1 }
{ "_id" : 2, "type" : "start", userid: "101", placementid: 2 }
{ "_id" : 3, "type" : "start", userid: "101", placementid: 3 }
{ "_id" : 4, "type" : "end", userid: "101", placementid: 1 }
{ "_id" : 5, "type" : "end", userid: "101", placementid: 2 }
and I want to group results by userid then placementid and then count the types of "start" and "end", but only when the two counts are different. In this particular example I would want to get placementid: 3 because when grouped and counted this is the only case where the counts don't match.
I've written a query that gets the 2 counts and the grouping but I can't do the filtering when counts don't match. This is my query:
db.getCollection('mycollection').aggregate([
{
$project: {
userid: 1,
placementid: 1,
isStart: {
$cond: [ { $eq: ["$type", "start"] }, 1, 0]
},
isEnd: {
$cond: [ { $eq: ["$type", "end"] }, 1, 0]
}
}
},
{
$group: {
_id: { userid:"$userid", placementid:"$placementid" },
countStart:{ $sum: "$isStart" },
countEnd: { $sum: "$isEnd" }
}
},
{
$match: {
countStart: {$ne: "$countEnd"}
}
}
])
It seems like I'm using the match aggregation incorrectly because I'm seeing results where countStart and countEnd are the same.
{ "_id" : {"userid" : "101", "placementid" : "1"}, "countStart" : 1.0, "countEnd" : 1.0 }
{ "_id" : {"userid" : "101", "placementid" : "2"}, "countStart" : 1.0, "countEnd" : 1.0 }
{ "_id" : {"userid" : "101", "placementid" : "3"}, "countStart" : 1.0, "countEnd" : 0 }
Can anybody point me in the right direction, please?
To compare two fields inside a $match stage you need $expr, which is available since MongoDB 3.6:
db.myCollection.aggregate([
{
$project: {
userid: 1,
placementid: 1,
isStart: {
$cond: [ { $eq: ["$type", "start"] }, 1, 0]
},
isEnd: {
$cond: [ { $eq: ["$type", "end"] }, 1, 0]
}
}
},
{
$group: {
_id: { userid:"$userid", placementid:"$placementid" },
countStart:{ $sum: "$isStart" },
countEnd: { $sum: "$isEnd" }
}
},
{
$match: {
$expr: { $ne: [ "$countStart", "$countEnd" ] }
}
}
])
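The reason the original $match returned every group is that, in query syntax, "$countEnd" is treated as a literal string rather than a field reference, so each numeric countStart is "not equal" to it. The difference can be illustrated in plain JavaScript; the `grouped` array below is a hand-written stand-in for the $group output shown in the question.

```javascript
// Stand-in for the output of the $group stage.
const grouped = [
  { _id: { userid: "101", placementid: 1 }, countStart: 1, countEnd: 1 },
  { _id: { userid: "101", placementid: 2 }, countStart: 1, countEnd: 1 },
  { _id: { userid: "101", placementid: 3 }, countStart: 1, countEnd: 0 },
];

// What { countStart: { $ne: "$countEnd" } } amounts to:
// a number compared with the literal string "$countEnd".
const literalCompare = grouped.filter((doc) => doc.countStart !== "$countEnd");

// What $expr: { $ne: ["$countStart", "$countEnd"] } amounts to:
// two fields of the same document compared with each other.
const fieldCompare = grouped.filter((doc) => doc.countStart !== doc.countEnd);

console.log(literalCompare.length); // 3
console.log(fieldCompare.map((doc) => doc._id.placementid)); // [ 3 ]
```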
If you're using an older version of MongoDB, you can use $redact:
db.myCollection.aggregate([
{
$project: {
userid: 1,
placementid: 1,
isStart: {
$cond: [ { $eq: ["$type", "start"] }, 1, 0]
},
isEnd: {
$cond: [ { $eq: ["$type", "end"] }, 1, 0]
}
}
},
{
$group: {
_id: { userid:"$userid", placementid:"$placementid" },
countStart:{ $sum: "$isStart" },
countEnd: { $sum: "$isEnd" }
}
},
{
$redact: {
$cond: { if: { $ne: [ "$countStart", "$countEnd" ] }, then: "$$KEEP", else: "$$PRUNE" }
}
}
])
You can run the following pipeline to get this; there is no need for $expr, $redact or anything special:
db.mycollection.aggregate({
$group: {
_id: {
"userid": "$userid",
"placementid": "$placementid"
},
"sum": {
$sum: {
$cond: {
if: { $eq: [ "$type", "start" ] },
then: 1, // +1 for start
else: -1 // -1 for anything else
}
}
}
}
}, {
$match: {
"sum": { $ne: 0 } // only return the non matching-up ones
}
})
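The signed-sum idea can be checked in plain JavaScript against the sample collection. This is a sketch that, like the pipeline above, assumes "type" only ever holds "start" or "end"; `unbalancedPlacements` is a hypothetical name.

```javascript
// Sample documents from the question.
const events = [
  { _id: 1, type: "start", userid: "101", placementid: 1 },
  { _id: 2, type: "start", userid: "101", placementid: 2 },
  { _id: 3, type: "start", userid: "101", placementid: 3 },
  { _id: 4, type: "end", userid: "101", placementid: 1 },
  { _id: 5, type: "end", userid: "101", placementid: 2 },
];

// Add +1 for every "start" and -1 for anything else, per (userid, placementid).
// A non-zero sum means the start and end counts differ.
function unbalancedPlacements(documents) {
  const sums = new Map();
  for (const doc of documents) {
    const key = JSON.stringify([doc.userid, doc.placementid]);
    const delta = doc.type === "start" ? 1 : -1;
    sums.set(key, (sums.get(key) || 0) + delta);
  }
  return [...sums]
    .filter(([, sum]) => sum !== 0)
    .map(([key, sum]) => {
      const [userid, placementid] = JSON.parse(key);
      return { userid, placementid, sum };
    });
}

const unbalanced = unbalancedPlacements(events);
console.log(JSON.stringify(unbalanced));
// [{"userid":"101","placementid":3,"sum":1}]
```

The sign of the sum also tells which side is in excess: positive means more starts than ends.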

mongoDB aggregate with two percent by $group

My dataset :
{
"codepostal": 84000,
"siren": 520010234,
"type": "home"
},
{
"codepostal": 84000,
"siren": 0,
"type": "home"
},
{
"codepostal": 84000,
"siren": 450123003,
"type": "appt"
} ...
My pipeline (total is an integer) :
var pipeline = [
{
$match: { codepostal: 84000 }
},
{
$group: {
_id: { type: "$type" },
count: { $sum: 1 }
}
},
{
$project: {
percentage: { $multiply: ["$count", 100 / total] }
}
},
{
$sort: { _id: 1 }
}
];
Results :
[ { _id: { type: 'appt' }, percentage: 66 },
{ _id: { type: 'home' }, percentage: 34 } ]
The expected result should also count, per type, the documents where "siren" is 0 and those where it is any other number:
Count siren=0 => part
Count siren!=0 => pro
[ { _id: { type: 'appt' }, totalPercent: 66, proPercent: 20, partPercent: 80},
{ _id: { type: 'home' }, totalPercent: 34, proPercent: 45, partPercent: 55 } ]
Thanks a lot for your help !!
You can use $cond to get 0 or 1 for pro/part documents depending on the value of the siren field. Then it's easy to calculate totals for each type of document:
[
{
$match: { codepostal: 84000 }
},
{
$group: {
_id: { type: "$type" },
count: { $sum: 1 },
countPro: { $sum: {$cond: [{$eq:["$siren",0]}, 0, 1]} },
countPart: {$sum: {$cond: [{$eq:["$siren",0]}, 1, 0]} }
}
},
{
$project: {
totalPercent: { $multiply: ["$count", 100 / total] },
proPercent: { $multiply: ["$countPro", {$divide: [100, "$count"]}] },
partPercent: { $multiply: ["$countPart", {$divide: [100, "$count"]}] }
}
},
{
$sort: { _id: 1 }
}
]
Note that I used $divide to calculate pro/part percentage relative to the count of document within type group.
For your sample documents (total = 3) output will be:
[
{
"_id" : { "type" : "appt" },
"totalPercent" : 33.3333333333333,
"proPercent" : 100,
"partPercent" : 0
},
{
"_id" : { "type" : "home" },
"totalPercent" : 66.6666666666667,
"proPercent" : 50,
"partPercent" : 50
}
]
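The arithmetic behind these numbers can be reproduced in plain JavaScript. This is a minimal sketch using the three sample documents; `percentagesByType` is a hypothetical helper, and `total` is passed in as the overall document count (3 here), as in the pipeline.

```javascript
// The three sample documents from the question.
const docs = [
  { codepostal: 84000, siren: 520010234, type: "home" },
  { codepostal: 84000, siren: 0, type: "home" },
  { codepostal: 84000, siren: 450123003, type: "appt" },
];

function percentagesByType(documents, total) {
  // Per type: overall count, "pro" count (siren != 0), "part" count (siren == 0).
  const groups = new Map();
  for (const doc of documents) {
    const g = groups.get(doc.type) || { count: 0, countPro: 0, countPart: 0 };
    g.count += 1;
    if (doc.siren === 0) g.countPart += 1;
    else g.countPro += 1;
    groups.set(doc.type, g);
  }
  // totalPercent is relative to all documents,
  // proPercent and partPercent are relative to the group's own count.
  return [...groups]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([type, g]) => ({
      type,
      totalPercent: g.count * (100 / total),
      proPercent: g.countPro * (100 / g.count),
      partPercent: g.countPart * (100 / g.count),
    }));
}

const percentages = percentagesByType(docs, docs.length);
console.log(percentages);
```

Within each type, proPercent and partPercent add up to 100, while the totalPercent values add up to 100 across types.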

Intersection of several arrays

I have some documents with an array property items.
I want to get the intersection of these arrays across n documents.
db.things.insert({name:"A", items:[1,2,3,4,5]})
db.things.insert({name:"B", items:[2,4,6,8]})
db.things.insert({name:"C", items:[1,2]})
db.things.insert({name:"D", items:[5,6]})
db.things.insert({name:"E", items:[9,10]})
db.things.insert({name:"F", items:[1,5]})
Data:
{ "_id" : ObjectId("57974a0d356baff265710a1c"), "name" : "A", "items" : [ 1, 2, 3, 4, 5 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1d"), "name" : "B", "items" : [ 2, 4, 6, 8 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1e"), "name" : "C", "items" : [ 1, 2 ] },
{ "_id" : ObjectId("57974a0d356baff265710a1f"), "name" : "D", "items" : [ 5, 6 ] },
{ "_id" : ObjectId("57974a0d356baff265710a20"), "name" : "E", "items" : [ 9, 10 ] },
{ "_id" : ObjectId("57974a1a356baff265710a21"), "name" : "F", "items" : [ 1, 5 ] }
For example:
things.name.A intersect things.name.C intersect things.name.F:
[ 1, 2, 3, 4, 5 ] intersect [ 1, 2 ] intersect [ 1, 5 ]
Must be: [1]
I think it's doable using $setIntersection, but I can't find the way.
I can do it with two documents, but how do I do it with more?
db.things.aggregate({$match:{"name":{$in:["A", "F"]}}},
{$group:{_id:null, "setA":{$first:"$items"}, "setF":{$last:"$items"} } },
{
"$project": {
"set1": 1,
"set2": 1,
"commonToBoth": { "$setIntersection": [ "$setA", "$setF" ] },
"_id": 0
}
}
)
{ "commonToBoth" : [ 5, 1 ] }
A solution which is not specific to the number of input items could look like so:
db.things.aggregate(
{
$match: {
"name": {
$in: ["A", "F"]
}
}
},
{
$group: {
_id: "$items",
count: {
$sum: 1
}
}
},
{
$group: {
_id: null,
totalCount: {
$sum: "$count"
},
items: {
$push: "$_id"
}
}
},
{
$unwind: {
path: "$items"
}
},
{
$unwind: {
path: "$items"
}
},
{
$group: {
_id: "$items",
totalCount: {
$first: "$totalCount"
},
count: {
$sum: 1
}
}
},
{
$project: {
_id: 1,
presentInAllDocs: {
$eq: ["$totalCount", "$count"]
}
}
},
{
$match: {
presentInAllDocs: true
}
},
{
$group: {
_id: null,
items: {
$push: "$_id"
}
}
}
)
which will output this
{
"_id" : null,
"items" : [
5,
1
]
}
Of course you can add a last $project stage to bring the result into the desired shape.
Explanation
The basic idea behind this is that when we count the number of documents and we count the number of occurrences of each item, then the items with a count equal to the total document count appeared in each document and are therefore in the intersection result.
This idea has one important assumption: your items arrays have no duplicates in it (i.e. they are sets). If this assumption is wrong, then you would have to insert an additional stage at the beginning of the pipeline to turn the arrays into sets.
One could also build this pipeline in a different and probably shorter way but I tried to keep the resource usage as low as possible and therefore added possibly unnecessary (from the functional point of view) stages. For example, the second stage groups by the items array as my assumption is that there are far fewer different values/arrays than documents so the rest of the pipeline has to work with a fraction of the initial document count. However, from the functional point of view, we just need the total count of documents and therefore we could skip that stage and just make a $group stage counting all documents and pushing them into an array for later usage - which of course is a big hit for memory consumption as we have now an array of all possible documents.
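The counting idea from the explanation can be sketched in plain JavaScript: count in how many arrays each item occurs and keep the items whose count equals the number of arrays. `intersectByCounting` is a hypothetical helper, and each array is first reduced to a set, which covers the no-duplicates assumption mentioned above.

```javascript
// Keep the items that occur in every input array.
function intersectByCounting(arrays) {
  const occurrences = new Map(); // item -> number of arrays containing it
  for (const items of arrays) {
    // new Set(...) ensures an item counts at most once per array.
    for (const item of new Set(items)) {
      occurrences.set(item, (occurrences.get(item) || 0) + 1);
    }
  }
  return [...occurrences]
    .filter(([, count]) => count === arrays.length)
    .map(([item]) => item);
}

// items arrays of documents A, C and F from the question.
const A = [1, 2, 3, 4, 5];
const C = [1, 2];
const F = [1, 5];

console.log(intersectByCounting([A, C, F])); // [ 1 ]
console.log(intersectByCounting([A, F]));    // [ 1, 5 ]
```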
If you are using MongoDB 3.2, you could use $arrayElemAt to specify all the arguments of $setIntersection:
db.things.aggregate([{
$match: {
"name": {
$in: ["A", "B", "C"]
}
}
}, {
$group: {
_id: 0,
elements: {
$push: "$items"
}
}
}, {
$project: {
intersect: {
$setIntersection: [{
"$arrayElemAt": ["$elements", 0]
}, {
"$arrayElemAt": ["$elements", 1]
}, {
"$arrayElemAt": ["$elements", 2]
}]
},
}
}]);
You would have to dynamically add the required number of objects, one per index, such as:
{
"$arrayElemAt": ["$elements", <index>]
}
The number of these objects should match the number of names in your input, here ["A", "B", "C"].
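Generating those $arrayElemAt objects could look like the sketch below, which builds the pipeline as a plain JavaScript array for any list of names. `buildIntersectionPipeline` is a hypothetical helper, and the sketch assumes every name matches exactly one document, so that the indexes line up with the pushed arrays.

```javascript
// Build the pipeline with one $arrayElemAt operand per requested name.
function buildIntersectionPipeline(names) {
  const operands = names.map((_, index) => ({
    $arrayElemAt: ["$elements", index],
  }));
  return [
    { $match: { name: { $in: names } } },
    { $group: { _id: 0, elements: { $push: "$items" } } },
    { $project: { intersect: { $setIntersection: operands } } },
  ];
}

const pipeline = buildIntersectionPipeline(["A", "C", "F"]);
console.log(JSON.stringify(pipeline[2]));
// {"$project":{"intersect":{"$setIntersection":[{"$arrayElemAt":["$elements",0]},{"$arrayElemAt":["$elements",1]},{"$arrayElemAt":["$elements",2]}]}}}
```

In the shell, the result would then be passed as db.things.aggregate(pipeline).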
If you want to deal with duplicates (some names are present multiple times), regroup all your items by name, $unwind twice and use $addToSet to merge all the arrays for a given name before executing the previous aggregation:
db.things.aggregate([{
$match: {
"name": {
$in: ["A", "B", "C"]
}
}
}, {
$group: {
_id: "$name",
"items": {
"$push": "$items"
}
}
}, {
"$unwind": "$items"
}, {
"$unwind": "$items"
}, {
$group: {
_id: "$_id",
items: {
$addToSet: "$items"
}
}
}, {
$group: {
_id: 0,
elements: {
$push: "$items"
}
}
}, {
$project: {
intersect: {
$setIntersection: [{
"$arrayElemAt": ["$elements", 0]
}, {
"$arrayElemAt": ["$elements", 1]
}, {
"$arrayElemAt": ["$elements", 2]
}]
},
}
}]);
It isn't a clean solution, but it works.