How do I keep documents in aggregation with $unwind - mongodb

Let's say I have three students.
Alice is always there on Fridays:
{
"name" : "Alice",
"goes" : {
"mondays" : {
"fr" : 900,
"to" : 1400
},
"fridays" : {
"fr" : 700,
"to" : 1600
},
}
}
And Bob, who should be there on the first of January:
{
"_id" : ObjectId("5284a7085d60338b40b8f17d"),
"name" : "Bob",
"goes" : {
"mondays" : {
"fr" : 800,
"to" : 1200
},
"special" : [
{
"date" : "2010-01-01",
"fr" : 1000,
"to" : 1500
}
]
}
}
And Clair, who will not be attending on Mondays or at 10:00:
{
"_id" : ObjectId("5284c2785d60338b40b8f17f"),
"name" : "Clair",
"goes" : {
"wednesdays" : {
"fr" : 1100,
"to" : 1500
},
"special" : [
{
"date" : "2010-01-01",
"fr" : 1600,
"to" : 1900
},
{
"date" : "2010-01-02",
"fr" : 1000,
"to" : 1300
}
]
}
}
I want to find all students who should attend at 7 on Fridays or at 10 on the first of January 2010.
So I tried this with the aggregation framework:
db.students.aggregate(
[
{
$unwind: "$goes.special"
},
{
$match: {
$or : [
{
'goes.fridays.fr': 700,
},
{
'goes.special.date' : '2010-01-01',
'goes.special.fr': 1000
}
]
}
}
]
)
But Alice does not show up. The MongoDB docs state why at the very bottom of http://docs.mongodb.org/manual/reference/operator/aggregation/unwind/:
"If you specify a target field for $unwind that holds an empty array
([]) in an input document, the pipeline ignores the input document,
and will generates no result documents."
I could solve it by adding an array with a null value in it, but that does not seem like a nice solution.
Is there a way to get $unwind NOT to ignore documents that do not have data in the unwound array?

You don't need $unwind at all. A simple $match in the pipeline is enough:
pipeline = [
    {
        "$match" : {
            "$or" : [
                {
                    "goes.fridays.fr" : 700
                },
                {
                    "goes.special" : {
                        "$elemMatch" : {
                            "date" : "2010-01-01",
                            "fr" : 1000
                        }
                    }
                }
            ]
        }
    }
]
db.students.aggregate(pipeline)
It can even be done easily without the aggregation framework:
query = {
    "$or" : [
        {
            "goes.fridays.fr" : 700
        },
        {
            "goes.special" : {
                "$elemMatch" : {
                    "date" : "2010-01-01",
                    "fr" : 1000
                }
            }
        }
    ]
}
db.students.find(query)
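Why $elemMatch is used here, rather than two dot-notation conditions on goes.special, can be illustrated with a small sketch in plain JavaScript (no MongoDB involved). The helpers elemMatch and anyElementPerCondition are hypothetical names that only mimic the two matching rules on in-memory copies of the three sample documents; they are not part of any MongoDB API.

```javascript
// In-memory copies of the three sample documents (only the fields the query touches).
const students = [
  { name: "Alice", goes: { mondays: { fr: 900, to: 1400 }, fridays: { fr: 700, to: 1600 } } },
  { name: "Bob", goes: { mondays: { fr: 800, to: 1200 },
      special: [ { date: "2010-01-01", fr: 1000, to: 1500 } ] } },
  { name: "Clair", goes: { wednesdays: { fr: 1100, to: 1500 },
      special: [ { date: "2010-01-01", fr: 1600, to: 1900 },
                 { date: "2010-01-02", fr: 1000, to: 1300 } ] } }
];

// Hypothetical helper: true when ONE array element satisfies every condition
// (the guarantee $elemMatch gives). A missing array never matches.
function elemMatch(array, conditions) {
  return (array || []).some(function (el) {
    return Object.keys(conditions).every(function (k) { return el[k] === conditions[k]; });
  });
}

// Hypothetical helper: each condition may be satisfied by a DIFFERENT element
// (how separate dot-notation conditions on an array field behave).
function anyElementPerCondition(array, conditions) {
  return Object.keys(conditions).every(function (k) {
    return (array || []).some(function (el) { return el[k] === conditions[k]; });
  });
}

const wanted = { date: "2010-01-01", fr: 1000 };
function isFridayAtSeven(s) {
  return Boolean(s.goes.fridays) && s.goes.fridays.fr === 700;
}
function names(list) {
  return list.map(function (s) { return s.name; });
}

const withElemMatch = names(students.filter(function (s) {
  return isFridayAtSeven(s) || elemMatch(s.goes.special, wanted);
}));
const withDotNotation = names(students.filter(function (s) {
  return isFridayAtSeven(s) || anyElementPerCondition(s.goes.special, wanted);
}));

console.log(withElemMatch);   // [ 'Alice', 'Bob' ]
console.log(withDotNotation); // [ 'Alice', 'Bob', 'Clair' ]
```

Clair slips through the second variant because one of her special entries has the right date and a different one has the right start time.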

Related

Group based on discrete date ranges

I am new to MongoDB and have been struggling to get a specific query to work.
I have a collection with millions of documents, each having a date and an amount, and I want to aggregate over specific periods of time.
For example, I want the count and the sum of amounts for the periods 1/1/2015 - 15/1/2015 and 1/2/2015 - 15/2/2015.
A sample collection is:
{ "_id" : "148404972864202083547392254", "account" : "3600", "amount" : 50, "date" : ISODate("2017-01-01T12:02:08.642Z")}
{ "_id" : "148404972864202085437392254", "account" : "3600", "amount" : 50, "date" : ISODate("2017-01-03T12:02:08.642Z")}
{ "_id" : "148404372864202083547392254", "account" : "3600", "amount" : 70, "date" : ISODate("2017-01-09T12:02:08.642Z")}
{ "_id" : "148404972864202083547342254", "account" : "3600", "amount" : 150, "date" : ISODate("2017-01-22T12:02:08.642Z")}
{ "_id" : "148404922864202083547392254", "account" : "3600", "amount" : 200, "date" : ISODate("2017-02-02T12:02:08.642Z")}
{ "_id" : "148404972155502083547392254", "account" : "3600", "amount" : 30, "date" : ISODate("2017-02-07T12:02:08.642Z")}
{ "_id" : "148404972864202122254732254", "account" : "3600", "amount" : 10, "date" : ISODate("2017-02-10T12:02:08.642Z")}
for date ranges between 1/1/2017 - 10/10/2017 and 1/2/2017 - 10/2/2017 the output would be like this:
1/1/2017 - 10/1/2017 - count =3, amount summation: 170
10/2/2017 - 15/2/2017 - count =2, amount summation: 40
Is it possible to work with such different date ranges? The code will be in Java, but could someone please help me with an example in the mongo shell?
There may be a more elegant solution than this, but it works, and you can wrap it in a function that takes the date ranges as arguments.
First, project each item together with the range it falls into (note the large $switch expression). An item that matches no branch gets the default range, null.
Then filter out the items that fell into no range (i.e. keep only range != null).
Finally, group the items by range and compute the count and the sum.
db.items.aggregate([
    { $project : {
        amount : true,
        account : true,
        date : true,
        range : {
            $switch : {
                branches : [
                    {
                        case : {
                            $and : [
                                { $gte : [ "$date", ISODate("2017-01-01T00:00:00.000Z") ] },
                                { $lt : [ "$date", ISODate("2017-01-10T00:00:00.000Z") ] }
                            ]
                        },
                        then : { $concat : [
                            { $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-01-01T00:00:00.000Z") } },
                            { $literal : " - " },
                            { $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-01-10T00:00:00.000Z") } }
                        ] }
                    },
                    {
                        case : {
                            $and : [
                                { $gte : [ "$date", ISODate("2017-02-01T00:00:00.000Z") ] },
                                { $lt : [ "$date", ISODate("2017-02-10T00:00:00.000Z") ] }
                            ]
                        },
                        then : { $concat : [
                            { $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-02-01T00:00:00.000Z") } },
                            { $literal : " - " },
                            { $dateToString: { format: "%d/%m/%Y", date: ISODate("2017-02-10T00:00:00.000Z") } }
                        ] }
                    }
                ],
                default : null
            }
        }
    } },
    { $match : { range : { $ne : null } } },
    { $group : {
        _id : "$range",
        count : { $sum : 1 },
        "amount summation" : { $sum : "$amount" }
    } }
])
Based on your data it will give the following results*:
{ "_id" : "01/02/2017 - 10/02/2017", "count" : 2, "amount summation" : 230 }
{ "_id" : "01/01/2017 - 10/01/2017", "count" : 3, "amount summation" : 170 }
*I believe there are a few typos in your question, which is why the results look different from your expected output.
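The suggestion to wrap the pipeline in a function could look roughly like the sketch below. The names buildRangePipeline and formatDay are hypothetical, and the sketch swaps one detail: it builds each range label in JavaScript instead of with $concat and $dateToString, which keeps the branches short. Each range is assumed to be an object with from (inclusive) and to (exclusive) Date values, matching the $gte/$lt comparison in the answer.

```javascript
// Hypothetical helper: format a Date as dd/mm/yyyy using its UTC fields.
function formatDay(d) {
  function pad(n) { return ("0" + n).slice(-2); }
  return pad(d.getUTCDate()) + "/" + pad(d.getUTCMonth() + 1) + "/" + d.getUTCFullYear();
}

// Hypothetical helper: build the $project / $match / $group pipeline
// for any list of { from: Date, to: Date } ranges.
function buildRangePipeline(ranges) {
  var branches = ranges.map(function (r) {
    return {
      case: { $and: [ { $gte: [ "$date", r.from ] }, { $lt: [ "$date", r.to ] } ] },
      // A plain string (not starting with "$") is treated as a literal value.
      then: formatDay(r.from) + " - " + formatDay(r.to)
    };
  });
  return [
    { $project: { amount: true, account: true, date: true,
                  range: { $switch: { branches: branches, default: null } } } },
    { $match: { range: { $ne: null } } },
    { $group: { _id: "$range", count: { $sum: 1 },
                "amount summation": { $sum: "$amount" } } }
  ];
}

// The two ranges used in the answer above.
var pipeline = buildRangePipeline([
  { from: new Date("2017-01-01T00:00:00.000Z"), to: new Date("2017-01-10T00:00:00.000Z") },
  { from: new Date("2017-02-01T00:00:00.000Z"), to: new Date("2017-02-10T00:00:00.000Z") }
]);
console.log(JSON.stringify(pipeline[0].$project.range.$switch.branches[0].then));
// "01/01/2017 - 10/01/2017"
```

In the mongo shell the result would be passed straight to db.items.aggregate(pipeline); the ranges can equally be written with ISODate(...), which yields the same Date values.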

Mongodb embedded document - aggregation query

I have the following documents in a MongoDB database:
db.totaldemands.insert({ "data" : "UKToChina", "demandPerCountry" :
{ "from" : "UK" , to: "China" ,
"demandPerItem" : [ { "item" : "apples" , "demand" : 200 },
{ "item" : "plums" , "demand" : 100 }
] } });
db.totaldemands.insert({ "data" : "UKToSingapore",
"demandPerCountry" : { "from" : "UK" , to: "Singapore" ,
"demandPerItem" : [ { "item" : "apples" , "demand" : 100 },
{ "item" : "plums" , "demand" : 50 }
] } });
I need to write a query to find the count of apples exported from UK to any country.
I have tried the following query:
db.totaldemands.aggregate(
{ $match : { "demandPerCountry.from" : "UK" ,
"demandPerCountry.demandPerItem.item" : "apples" } },
{ $unwind : "$demandPerCountry.demandPerItem" },
{ $group : { "_id" : "$demandPerCountry.demandPerItem.item",
"total" : { $sum : "$demandPerCountry.demandPerItem.demand"
} } }
);
But it gives me the output with both apples and plums like below:
{ "_id" : "apples", "total" : 300 }
{ "_id" : "plums", "total" : 150 }
But, my expected output is:
{ "_id" : "apples", "total" : 300 }
So, how can I modify the above query to return only the count of apples exported from the UK?
Also, is there a better way to achieve this output without unwinding?
You can add another $match to keep only apples.
Because the items are embedded in an array and you are aggregating over them, $unwind is the natural fit here. Map-reduce would be an alternative, but $unwind is the more suitable option.
As for performance, $unwind shouldn't cause an issue here.
db.totaldemands.aggregate([
    { $match : { "demandPerCountry.from" : "UK",
                 "demandPerCountry.demandPerItem.item" : "apples" } },
    { $unwind : "$demandPerCountry.demandPerItem" },
    { $group : { "_id" : "$demandPerCountry.demandPerItem.item",
                 "total" : { $sum : "$demandPerCountry.demandPerItem.demand" } } },
    { $match : { "_id" : "apples" } }
]);
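The arithmetic behind the expected total can be checked without a database. The following is only a plain JavaScript sketch over in-memory copies of the two sample documents; totalDemand is a hypothetical helper, not a MongoDB call. It filters on the item right after flattening the array, which gives the same total as filtering after grouping.

```javascript
// In-memory copies of the two inserted documents.
const totaldemands = [
  { data: "UKToChina", demandPerCountry: { from: "UK", to: "China",
      demandPerItem: [ { item: "apples", demand: 200 }, { item: "plums", demand: 100 } ] } },
  { data: "UKToSingapore", demandPerCountry: { from: "UK", to: "Singapore",
      demandPerItem: [ { item: "apples", demand: 100 }, { item: "plums", demand: 50 } ] } }
];

// Hypothetical helper: sum the demand for one item exported from one country.
function totalDemand(docs, from, item) {
  let total = 0;
  docs.forEach(function (doc) {
    if (doc.demandPerCountry.from !== from) return;                 // like the first $match
    doc.demandPerCountry.demandPerItem.forEach(function (entry) {   // like $unwind
      if (entry.item === item) total += entry.demand;               // item filter, then $sum
    });
  });
  return total;
}

console.log(totalDemand(totaldemands, "UK", "apples")); // 300
console.log(totalDemand(totaldemands, "UK", "plums"));  // 150
```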

Get lowest per date from multiple arrays in mongodb

I have documents with the following structure:
{
"_id" : ObjectId("5786458371d24d924d8b4575"),
"uniqueNumber" : "3899822714",
"lastUpdatedAt" : ISODate("2016-07-13T20:11:11.000Z"),
"new" : [
{
"price" : 8.4,
"created" : ISODate("2016-07-13T13:11:28.000Z")
},
{
"price" : 10.0,
"created" : ISODate("2016-07-13T14:50:56.000Z")
}
],
"used" : [
{
"price" : 10.99,
"created" : ISODate("2016-07-08T13:46:31.000Z")
},
{
"price" : 8.59,
"created" : ISODate("2016-07-13T13:11:28.000Z")
}
]
}
Now I need to get a list that gives me the lowest price of each array per date.
For example:
{
"uniqueNumber" : 1234,
"prices" : {
"created" : 2016-07-08,
"minNew" : 123,
"minUsed" : 22
}
}
So far I've built the following query:
db.getCollection('col').aggregate([
{
$match : {
"uniqueNumber" : "3899822714"
}
},
{
$unwind : "$used"
},
{
$project : {
"uniqueNumber" : "$uniqueNumber",
"price" : "$used.price",
"ts" : "$used.created"
}
},
{
$sort : { "ts" : 1 }
},
{
$group : {_id: "$uniqueNumber", priceOfMaxTS : { $min: "$price" }, ts : { $last: "$ts" }}
}
]);
But this only gives me the lowest price together with the latest date. I couldn't find anything that points me in the right direction to get the desired result.
UPDATE
I've found a way to get the lowest price of the used array grouped by day with this query:
db.getCollection('col').aggregate([
{
$match : {
"uniqueNumber" : "3899822714"
}
},
{
$unwind : "$used"
},
{
$project : {
"asin" : "$uniqueNumber",
"price" : "$used.price",
"ts" : "$used.created",
"y" : { "$year" : "$used.created" },
"m" : { "$month" : "$used.created" },
"d" : { "$dayOfMonth" : "$used.created" }
}
},
{
$group : { _id : { "year" : "$y", "month" : "$m", "day" : "$d" }, minPriceOfDay : { $min: "$price" }}
}
]);
Now I only need to find a way to do the same for the new array in the same query.
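To pin down what the combined result should look like, the desired output can be computed in plain JavaScript on the sample document. This is only a sketch of the target, not an aggregation pipeline: minPricesPerDay is a hypothetical helper, days are assumed to be UTC calendar days, and a day with no entry in one of the arrays is assumed to get null for that side.

```javascript
// In-memory copy of the sample document (dates as JavaScript Date objects).
const doc = {
  uniqueNumber: "3899822714",
  new: [
    { price: 8.4, created: new Date("2016-07-13T13:11:28.000Z") },
    { price: 10.0, created: new Date("2016-07-13T14:50:56.000Z") }
  ],
  used: [
    { price: 10.99, created: new Date("2016-07-08T13:46:31.000Z") },
    { price: 8.59, created: new Date("2016-07-13T13:11:28.000Z") }
  ]
};

// Hypothetical helper: lowest "new" and "used" price per UTC day.
function minPricesPerDay(d) {
  const byDay = {};
  [["new", "minNew"], ["used", "minUsed"]].forEach(function (pair) {
    const field = pair[0];
    const outKey = pair[1];
    (d[field] || []).forEach(function (entry) {
      const day = entry.created.toISOString().slice(0, 10); // "YYYY-MM-DD"
      if (!byDay[day]) byDay[day] = { created: day, minNew: null, minUsed: null };
      const current = byDay[day][outKey];
      byDay[day][outKey] = current === null ? entry.price : Math.min(current, entry.price);
    });
  });
  // Return the days in ascending order.
  return Object.keys(byDay).sort().map(function (day) { return byDay[day]; });
}

const prices = minPricesPerDay(doc);
console.log(JSON.stringify(prices));
// [{"created":"2016-07-08","minNew":null,"minUsed":10.99},{"created":"2016-07-13","minNew":8.4,"minUsed":8.59}]
```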

Mongodb difference in time is returning the current time

{
"_id" : ObjectId("57693a852956d5301b348a99"),
"First_Name" : "Sri Ram",
"Last_Name" : "Bandi",
"Email" : "chinni001sriram#gmail.com",
"Sessions" : [
{
"Class" : "facebook",
"ID" : "1778142655749042",
"Login_Time" : ISODate("2016-06-21T13:00:53.867Z"),
"Logout_Time" : ISODate("2016-06-21T13:01:04.640Z"),
"Duration" : null
}
],
"Count" : 1
}
This is my MongoDB data. I want to set Duration to the difference between the login and logout times, so I executed the following update:
db.sessionData.update(
{ "Sessions.ID": "1778142655749042"},
{ $set: {
"Sessions.$.Duration": ISODate("Sessions.$.Logout_Time" - "Sessions.$.Login_Time")
}
}
)
But the result I'm getting is:
{
"_id" : ObjectId("57693a852956d5301b348a99"),
"First_Name" : "Sri Ram",
"Last_Name" : "Bandi",
"Email" : "chinni001sriram#gmail.com",
"Sessions" : [
{
"Class" : "facebook",
"ID" : "1778142655749042",
"Login_Time" : ISODate("2016-06-21T13:00:53.867Z"),
"Logout_Time" : ISODate("2016-06-21T13:01:04.640Z"),
"Duration" : ISODate("2016-06-21T13:02:58.010Z")
}
],
"Count" : 1
}
The duration was set to the current date/time instead of the difference.
You could use the aggregation framework to do the arithmetic, using the $subtract and $divide operators to get the duration in seconds. Subtracting one date from another yields milliseconds, so the formula is
Duration (sec) = (Logout_Time - Login_Time)/1000
The aggregation pipeline produces a new field holding this computed value. You can then call the forEach() cursor method on the aggregate() result to iterate over the documents and update the collection.
The following example shows this:
db.sessionData.aggregate([
    { "$match": { "Sessions.ID" : "1778142655749042" } },
    { "$unwind": "$Sessions" },
    { "$match": { "Sessions.ID" : "1778142655749042" } },
    {
        "$project": {
            "Duration": {
                "$divide": [
                    { "$subtract": [ "$Sessions.Logout_Time", "$Sessions.Login_Time" ] },
                    1000
                ]
            }
        }
    }
]).forEach(function (doc) {
    db.sessionData.update(
        { "Sessions.ID": "1778142655749042", "_id": doc._id },
        {
            "$set": { "Sessions.$.Duration": doc.Duration }
        }
    );
});
Query results
{
"_id" : ObjectId("57693a852956d5301b348a99"),
"First_Name" : "Sri Ram",
"Last_Name" : "Bandi",
"Email" : "chinni001sriram#gmail.com",
"Sessions" : [
{
"Class" : "facebook",
"ID" : "1778142655749042",
"Login_Time" : ISODate("2016-06-21T13:00:53.867Z"),
"Logout_Time" : ISODate("2016-06-21T13:01:04.640Z"),
"Duration" : 10.773
}
],
"Count" : 1
}
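The Duration value can be double-checked in plain JavaScript, since the shell's ISODate values are ordinary Date objects. The sketch below also shows what the expression in the original update evaluates to: the two quoted paths are just strings, so the subtraction happens in the shell and produces NaN before anything is sent to the server, which would explain why the stored value had nothing to do with the two timestamps.

```javascript
// The two timestamps from the sample session.
const login = new Date("2016-06-21T13:00:53.867Z");
const logout = new Date("2016-06-21T13:01:04.640Z");

// Subtracting Dates yields milliseconds; divide by 1000 for seconds.
const durationSeconds = (logout - login) / 1000;
console.log(durationSeconds); // 10.773

// What the original update actually computed: string minus string.
const fromStrings = "Sessions.$.Logout_Time" - "Sessions.$.Login_Time";
console.log(Number.isNaN(fromStrings)); // true
```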

$avg in mongodb aggregation

Document looks like this:
{
"_id" : ObjectId("361de42f1938e89b179dda42"),
"user_id" : "u1",
"evaluator_id" : "e1",
"candidate_id" : ObjectId("54f65356294160421ead3ca1"),
"OVERALL_SCORE" : 150,
"SCORES" : [
{ "NAME" : "asd", "OBTAINED_SCORE" : 30}, { "NAME" : "acd", "OBTAINED_SCORE" : 36}
]
}
Aggregation function:
db.coll.aggregate([ {$unwind:"$SCORES"}, {$group : { _id : { user_id : "$user_id", evaluator_id : "$evaluator_id"}, AVG_SCORE : { $avg : "$SCORES.OBTAINED_SCORE" }}} ])
Suppose there are two documents with the same "user_id" (say u1) and different "evaluator_id" values (say e1 and e2).
For example:
1) The average works like this: (30 + 20) / 2 = 25. This is working for me.
2) But suppose the score for { "NAME" : "asd" } is 30 in the { evaluator_id : "e1" } document and 0 in the { evaluator_id : "e2" } document. In this case, I want AVG_SCORE to be 30 (not (30 + 0) / 2 = 15).
Is this possible through aggregation?
Could anyone help me out?
It's possible by placing a $match between the $unwind and $group stages. The $match drops the unwound array elements whose obtained score is 0, using "SCORES.OBTAINED_SCORE" : { $ne : 0 }, so only non-zero scores enter the average:
db.coll.aggregate([
    {
        $unwind: "$SCORES"
    },
    {
        $match : {
            "SCORES.OBTAINED_SCORE" : { $ne : 0 }
        }
    },
    {
        $group : {
            _id : {
                user_id : "$user_id",
                evaluator_id : "$evaluator_id"
            },
            AVG_SCORE : {
                $avg : "$SCORES.OBTAINED_SCORE"
            }
        }
    }
])
For example, the aggregation result for this document:
{
"_id" : ObjectId("5500aaeaa7ef65c7460fa3d9"),
"user_id" : "u1",
"evaluator_id" : "e1",
"candidate_id" : ObjectId("54f65356294160421ead3ca1"),
"OVERALL_SCORE" : 150,
"SCORES" : [
{
"NAME" : "asd",
"OBTAINED_SCORE" : 0
},
{
"NAME" : "acd",
"OBTAINED_SCORE" : 36
}
]
}
will yield:
{
"result" : [
{
"_id" : {
"user_id" : "u1",
"evaluator_id" : "e1"
},
"AVG_SCORE" : 36
}
],
"ok" : 1
}
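The same unwind, filter, group sequence can be traced in plain JavaScript to confirm the figure above. This is only an in-memory sketch on the example document; averageNonZeroScores is a hypothetical helper, and the "user_id|evaluator_id" string key merely stands in for the compound _id of the $group stage.

```javascript
// In-memory copy of the example document (only the fields the pipeline uses).
const docs = [
  { user_id: "u1", evaluator_id: "e1",
    SCORES: [ { NAME: "asd", OBTAINED_SCORE: 0 }, { NAME: "acd", OBTAINED_SCORE: 36 } ] }
];

// Hypothetical helper: average of the non-zero scores per (user_id, evaluator_id).
function averageNonZeroScores(documents) {
  const groups = {};
  documents.forEach(function (doc) {
    const key = doc.user_id + "|" + doc.evaluator_id;   // stands in for the $group _id
    doc.SCORES.forEach(function (score) {               // like $unwind
      if (score.OBTAINED_SCORE === 0) return;           // like $match with $ne : 0
      if (!groups[key]) groups[key] = { sum: 0, count: 0 };
      groups[key].sum += score.OBTAINED_SCORE;
      groups[key].count += 1;
    });
  });
  const result = {};
  Object.keys(groups).forEach(function (key) {
    result[key] = groups[key].sum / groups[key].count;  // like $avg
  });
  return result;
}

const averages = averageNonZeroScores(docs);
console.log(JSON.stringify(averages)); // {"u1|e1":36}
```

A group whose scores are all 0 produces no entry at all here, which is also what filtering before grouping implies for the pipeline.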