MongoDB Grouping Query - mongodb

I'm working on a Meteor application and I have data for a weekly timetable of the format:
events:
  event:
    day: "Monday"
    time: "9am"
    location: "A"
  event:
    day: "Monday"
    time: "10am"
    location: "B"
There are numerous entries for each day. Can I run a query that will return an object of the format:
day: "Monday"
events:
  event:
    time: "9am"
    location: "A"
  event:
    time: "10am"
    location: "B"
I could store the object in the second format but prefer the first for ease of deleting and updating individual events.
I also want to return them ordered by day of week if there's a nice way to do that.

Several options:
1. You can use an aggregation command, but be warned: you will lose reactivity. This means that, except when you reload your template, you will not get external updates. You would also need a package that adds the aggregation command to Meteor.
2. My personal favorite: you don't need to aggregate (and lose reactivity) to achieve your data transformation. You can use a simple Collection.find() query and extend, reduce or modify it using a clever mix of cursor.observe() and conditional modifications. Have a look at this answer, it did the trick for me (I needed a sum with blacklisting of some fields, but you can easily adapt it to your grouping/sorting case): https://stackoverflow.com/a/30813050/3793161. Comment if you need more details on this.
3. If you plan to have several servers, be warned that each server will have to observe, which may lead to unnecessary load. So my third solution is to use either collection hooks or methods to update an additional field containing every event for each day/user (whatever you need). See @David Weldon's answer about that here: https://stackoverflow.com/a/31190896/3793161. In your case, it would probably mean rethinking your database structure to fit your needs (i.e. adding more fields so you can update them on insert).
EDIT Here are some thoughts on your comment:
If you stick to what you described in the question, you would need seven documents, one per day, with an events field where you put all the events. My second solution is good if you need to rework a collection structure before sending it. However, in your case, you just need an object week with 7 sub-objects matching the days of the week.
I suggest two possible approaches:
Use the aggregation in a method, as described by @chridam. Be warned that you will not directly get a sorted array, as stated in the MongoDB $group documentation:
$group does not order its output documents
So you need to sort them (by day, and by hour within each day) using, for example, _.sortBy() before you return your method result. By the way, if you want to know what is going on in your method call on the client side, here is how you should write the call:
Meteor.call("getGroupedDailyEvents", userId, function(error, result){
if(error){
console.log("error", error);
}
if(result){
//do whatever you need
}
});
Sort the data client-side. Aggregating on the server is overkill here because, as far as I know, you don't need to filter any data to keep it from the user, and you are going to send the data anyway (just with another structure). It is much easier to write a simple helper in your template like this:
Template.displaySchedule.helpers({
"monday_events": function() {
return _.sortBy(events.find({day: "Monday"}).fetch(), "time");
},
// add other days
});
This assumes your time field sorts correctly as a string. With values like "9am" and "10am" it does not, because "10am" sorts before "9am" as a string. In that case you need a sort function that understands the format, or you should change the stored format to something better suited (for example, minutes since midnight).
The rest (HTML) would just be to iterate over the Monday events using {{#each monday_events}}.
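With times stored as strings like "9am" and "10am", a plain string sort puts "10am" before "9am". Below is a minimal sketch of a sort key that orders events by weekday and then by clock time. It assumes times always look like "9am" or "12:30pm" and days are full English names; `DAY_ORDER`, `toMinutes` and `sortEvents` are hypothetical helpers, not part of Meteor or Underscore.

```javascript
// Hypothetical helpers: order events by weekday, then by clock time.
const DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday"];

// Convert "9am", "10am" or "12:30pm" into minutes since midnight.
// Assumes the format <hour>[:<minutes>]<am|pm>; throws otherwise.
function toMinutes(time) {
  const match = /^(\d{1,2})(?::(\d{2}))?\s*(am|pm)$/i.exec(time.trim());
  if (!match) throw new Error("Unrecognised time: " + time);
  let hours = Number(match[1]) % 12;   // "12am" -> 0; "12pm" becomes 12 after the pm offset
  const minutes = match[2] ? Number(match[2]) : 0;
  if (match[3].toLowerCase() === "pm") hours += 12;
  return hours * 60 + minutes;
}

// Sort a copy of the events by day of week, then by time within the day.
function sortEvents(events) {
  return events.slice().sort((a, b) =>
    DAY_ORDER.indexOf(a.day) - DAY_ORDER.indexOf(b.day) ||
    toMinutes(a.time) - toMinutes(b.time));
}

const sample = [
  { day: "Tuesday", time: "9am", location: "C" },
  { day: "Monday", time: "10am", location: "B" },
  { day: "Monday", time: "9am", location: "A" },
];
console.log(sortEvents(sample).map(e => e.day + " " + e.time).join(", "));
// Monday 9am, Monday 10am, Tuesday 9am
```

Inside a template helper, `_.sortBy(list, function (e) { return toMinutes(e.time); })` would give the same within-day ordering.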

To achieve the desired result, use the aggregation framework: the $group pipeline operator groups the input documents by day and applies the accumulator expression $push to each group to build the events array.
Your pipeline would look like this:
var Events = new Mongo.Collection('events');
var pipeline = [
{
"$group": {
"_id": "$day",
"events": {
"$push": {
"time": "$time",
"location": "$location"
}
}
}
},
{
"$project": {
"_id": 0, "day": "$_id", "events": 1
}
}
];
var result = Events.aggregate(pipeline);
console.log(result)
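To see the shape this pipeline produces, here is a rough in-memory equivalent in plain JavaScript. It is only an illustration (`groupByDay` is a hypothetical name, not part of MongoDB or Meteor). The sketch happens to return groups in first-seen order because it uses a Map, but $group itself does not guarantee any order of its output documents, so sort the result if you need the days in weekday order.

```javascript
// In-memory imitation of:
//   { $group: { _id: "$day", events: { $push: { time: "$time", location: "$location" } } } },
//   { $project: { _id: 0, day: "$_id", events: 1 } }
function groupByDay(docs) {
  const groups = new Map(); // group key ("$day") -> pushed sub-documents
  for (const doc of docs) {
    if (!groups.has(doc.day)) groups.set(doc.day, []);
    // $push appends one { time, location } object per input document
    groups.get(doc.day).push({ time: doc.time, location: doc.location });
  }
  // $project: expose the group key as "day" and keep the events array
  return Array.from(groups, ([day, events]) => ({ day, events }));
}

const docs = [
  { day: "Monday", time: "9am", location: "A" },
  { day: "Tuesday", time: "11am", location: "C" },
  { day: "Monday", time: "10am", location: "B" },
];
console.log(JSON.stringify(groupByDay(docs)));
```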
You can add the meteorhacks:aggregate package to implement the aggregation in Meteor:
Add to your app with
meteor add meteorhacks:aggregate
Since this package exposes an .aggregate method on Mongo.Collection instances, you can define a method that returns the aggregated result array. For example:
if (Meteor.isServer) {
var Events = new Mongo.Collection('events');
Meteor.methods({
getGroupedDailyEvents: function () {
var pipeline = [
{
"$group": {
"_id": "$day",
"events": {
"$push": {
"time": "$time",
"location": "$location"
}
}
}
},
{
"$project": {
"_id": 0, "day": "$_id", "events": 1
}
}
];
var result = Events.aggregate(pipeline);
return result;
}
});
}
if (Meteor.isClient) {
Meteor.call('getGroupedDailyEvents', logResult);
function logResult(err, res) {
console.log("Result: ", res)
}
}

Related

MongoDB query over several collections with one sort stage

I have some data with identical layout divided over several collections, say we have collections named Jobs.Current, Jobs.Finished, Jobs.ByJames.
I have implemented a complex query using some aggregation stages on one of these collections, where the last stage is the sorting. It's something like this (but in real it's implemented in C# and additionally doing a projection):
db.ArchivedJobs.aggregate([
  { $match: { Name: { $gte: "A" } } },
  { $addFields: { "UpdatedTime": { $max: "$Transitions.TimeStamp" } } },
  { $sort: { "UpdatedTime": 1 } }
])
My new requirement is to include all these collections in my query. I could simply run the same query on all collections in sequence, but then I still need to sort the results together. As this sort isn't trivial (it uses an additional field created by a $max on a sub-array) and I'm using skip and limit options, I hope it's possible to do it like this:
Doing the query I already implemented on all relevant collections by defining appropriate aggregation steps
Sorting the whole result afterwards inside the same aggregation request
I found something with a $lookup stage, but couldn't apply it to my request as it needs to do some field-oriented matching (?). I need to access the complete objects.
The data is something like
{
"_id":"4242",
"name":"Stream recording Test - BBC 60 secs switch",
"transitions":[
{
"_id":"123",
"timeStamp":"2020-02-13T14:59:40.449Z",
"currentProcState":"Waiting"
},
{
"_id":"124",
"timeStamp":"2020-02-13T14:59:40.55Z",
"currentProcState":"Running"
},
{
"_id":"125",
"timeStamp":"2020-02-13T15:00:23.216Z",
"currentProcState":"Error"
} ],
"currentState":"Error"
}
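The logic the combined query needs (a union of the collections, an UpdatedTime computed as the maximum transition timestamp, one global sort, then skip/limit) can be sketched in memory as below. Plain arrays stand in for the collections and the lowercase field names follow the sample document; `pageJobs` is a hypothetical helper, not a driver call. For the database side, MongoDB 4.4 and later provide a `$unionWith` aggregation stage that can pull documents from other collections into one pipeline, so a single `$sort`, `$skip` and `$limit` can apply to all of them; this sketch only mirrors that logic.

```javascript
// Hypothetical helper: query several same-shaped "collections" (plain arrays here)
// as if they were one, with a single sort and skip/limit over the merged result.
function pageJobs(collections, { match = () => true, skip = 0, limit = Infinity } = {}) {
  const all = [];
  for (const docs of collections) {
    for (const job of docs) {
      if (!match(job)) continue; // like $match
      // like $addFields with $max over the sub-array: latest transition time
      const times = (job.transitions || []).map(t => Date.parse(t.timeStamp));
      const updatedTime = times.length ? Math.max(...times) : null;
      all.push(Object.assign({}, job, { updatedTime }));
    }
  }
  // One global sort over all sources; jobs without transitions sort first.
  const key = job => (job.updatedTime === null ? -Infinity : job.updatedTime);
  all.sort((a, b) => (key(a) > key(b)) - (key(a) < key(b)));
  // skip/limit are only meaningful after the merged sort.
  return all.slice(skip, skip + limit);
}

// Made-up sample data for two "collections".
const current = [
  { _id: "1", name: "Job A", transitions: [{ timeStamp: "2020-02-13T15:00:23.216Z" }] },
];
const finished = [
  { _id: "2", name: "Job B", transitions: [
    { timeStamp: "2020-02-13T14:59:40.449Z" },
    { timeStamp: "2020-02-13T16:00:00.000Z" },
  ] },
  { _id: "3", name: "Job C", transitions: [] },
];
console.log(pageJobs([current, finished], { skip: 0, limit: 2 }).map(j => j._id));
```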

MapReduce function to return two outputs. MongoDB

I am currently doing some basic mapReduce using MongoDB.
I currently have data that looks like this:
db.football_team.insert({name: "Tane Shane", weight: 93, gender: "m"});
db.football_team.insert({name: "Lily Jones", weight: 45, gender: "f"});
...
I want to create a mapReduce function to group data by gender and show:
Total number of each gender, Male & Female
Average weight of each gender
I can create a map/reduce function to carry out each one separately; I just can't get my head around how to show output for both. I am guessing that since the grouping is based on gender, the map function should stay the same and I just need to alter something in the reduce section...
Work so far
var map1 = function()
{var key = this.gender;
emit(key, {count:1});}
var reduce1 = function(key, values)
{var sum=0;
values.forEach(function(value){sum+=value["count"];});
return{count: sum};};
db.football_team.mapReduce(map1, reduce1, {out: "gender_stats"});
Output
db.gender_stats.find()
{"_id" : "f", "value" : {"count": 12} }
{"_id" : "m", "value" : {"count": 18} }
Thanks
The key rule of "map/reduce" in any implementation is that the mapper must emit data in the same shape that the reducer returns. The reason is that "map/reduce" may call the reducer multiple times: the reducer can be called on output that was already produced by a previous pass through the reducer, along with other data from the mapper.
MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.
That said, your best approach to an "average" is to total the data along with a count and then divide the two. This adds another step to the "map/reduce" operation: a finalize function.
db.football_team.mapReduce(
// mapper
function() {
emit(this.gender, { count: 1, weight: this.weight });
},
// reducer
function(key,values) {
var output = { count: 0, weight: 0 };
values.forEach(value => {
output.count += value.count;
output.weight += value.weight;
});
return output;
},
// options and finalize
{
"out": "gender_stats", // or { "inline": 1 } if you don't need another collection
"finalize": function(key,value) {
value.avg_weight = value.weight / value.count; // take an average
delete value.weight; // optionally remove the unwanted key
return value;
}
}
)
This works because both the mapper and the reducer emit data with the same shape, and the reducer expects input in that shape. The finalize function is only invoked after all "reducing" is done, and it processes each result once.
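To see why the shape rule matters, the sketch below runs the same mapper, reducer and finalize logic in plain JavaScript and deliberately reduces each key in two batches before re-reducing the partial results, which is what the documentation quote says MongoDB may do. It is a local simulation, not MongoDB's engine: `emit` is passed in as a parameter (in MongoDB it is a server-provided global) and the sample weights are made up. Because the reducer returns the same { count, weight } shape it receives, re-reducing gives the same totals as a single pass; a reducer that returned, say, a bare average would not.

```javascript
// Mapper, reducer and finalize as in the answer, with emit passed in explicitly.
function mapper(doc, emit) {
  emit(doc.gender, { count: 1, weight: doc.weight });
}

function reducer(key, values) {
  const output = { count: 0, weight: 0 };
  values.forEach(value => {
    output.count += value.count;
    output.weight += value.weight;
  });
  return output; // same shape as what the mapper emits
}

function finalize(key, value) {
  value.avg_weight = value.weight / value.count;
  delete value.weight;
  return value;
}

// Hypothetical sample data.
const players = [
  { gender: "m", weight: 93 }, { gender: "f", weight: 45 },
  { gender: "m", weight: 81 }, { gender: "f", weight: 55 },
  { gender: "m", weight: 72 },
];

// Map phase: collect emitted values per key.
const emitted = {};
players.forEach(doc =>
  mapper(doc, (key, value) => (emitted[key] = emitted[key] || []).push(value)));

// Reduce each key in two batches, then re-reduce the partial outputs.
const results = {};
for (const key of Object.keys(emitted)) {
  const values = emitted[key];
  const half = Math.ceil(values.length / 2);
  const partials = [reducer(key, values.slice(0, half)), reducer(key, values.slice(half))];
  results[key] = finalize(key, reducer(key, partials));
}
console.log(JSON.stringify(results));
// {"m":{"count":3,"avg_weight":82},"f":{"count":2,"avg_weight":50}}
```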
As noted though, the aggregate() method does this far more effectively, using natively implemented operators that do not incur the overhead (and potential security risks) of interpreting and executing server-side JavaScript:
db.football_team.aggregate([
{ "$group": {
"_id": "$gender",
"count": { "$sum": 1 },
"avg_weight": { "$avg": "$weight" }
}}
])
And that's basically it. Moreover, you can continue and do other things after a $group pipeline stage (or any stage, for that matter) in ways that you cannot with MongoDB's mapReduce implementation, notably applying a $sort to the results:
db.football_team.aggregate([
{ "$group": {
"_id": "$gender",
"count": { "$sum": 1 },
"avg_weight": { "$avg": "$weight" }
}},
{ "$sort": { "avg_weight": -1 } }
])
The only ordering mapReduce gives you is that output is sorted by the key used with emit, in ascending order. You cannot sort the aggregated output in any other way without, of course, querying the output collection afterwards or sorting the returned results "in memory".
As a "side note" (though an important one), when learning, you should also consider that the "server-side JavaScript" functionality of MongoDB is really more of a workaround than a feature. When MongoDB was first introduced, it applied a JavaScript engine for server execution mostly to make up for features that had not yet been implemented.
Adding a JavaScript engine was a "quick fix" for the incomplete implementation of many query operators and aggregation functions that would come later, allowing certain things to be done with minimal implementation effort.
The result over the years is that those JavaScript engine features are gradually being removed. The group() function of the API has been removed. The eval() function of the API is deprecated and scheduled for removal in the next major version. The writing is basically "on the wall" for these server-side JavaScript features: the clear pattern is that once native features support something, the need to keep supporting it through the JavaScript engine goes away.
The core lesson is that learning these server-side JavaScript features is probably not worth the time invested unless you have a pressing use case that currently cannot be solved by any other means.

MongoDB - replace item in nested array

MongoDB does not allow replacing an item in an array in a single operation. Instead, it's a pull followed by a push.
Unfortunately, we have a case where parallel requests (distributed environment) end up in a race condition on the same array item, i.e.
both pulls run first, then both pushes. This results in duplicate entries, e.g.
{
"_id": ...,
"nestedArray": [
{
"subId": "1"
},
{
"subId": "1"
},
{
"subId": "2"
}
]
}
Are there any workarounds?
I usually use an optimistic lock for this situation.
To prepare for this, you need to add a version field to your model, which you will increment each time you modify that model. Then you use this method:
Model.findOneAndUpdate(
  {_id: <current_id>, version: <current_version>},
  // replace the array and bump the version in the same atomic update
  {$set: {nestedArray: <new_nested_array>}, $inc: {version: 1}})
.exec(function(err, result) {
  if (err) {
    // handle error
    return;
  }
  if (!result) {
    // the model has been updated in the meantime: re-read it and retry
    return;
  }
  // all is good
});
This means that you first need to get the model and compute the new array <new_nested_array>. This way you can be sure that only one modification will take place for a certain version.
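A minimal in-memory sketch of the same idea, assuming a Map stands in for the collection; `updateIfVersion`, `withReplaced` and `replaceItem` are hypothetical names. It plays out the race from the question: two writers read the same version, only the first compare-and-set succeeds, and the loser re-reads and retries instead of producing a duplicate. Against a real database the retry would re-fetch the document and re-run findOneAndUpdate.

```javascript
// Hypothetical in-memory "collection": id -> { version, nestedArray }.
const store = new Map([
  ["doc1", { version: 0, nestedArray: [{ subId: "1" }, { subId: "2" }] }],
]);

// Compare-and-set, mirroring findOneAndUpdate({ _id, version }, { $set, $inc }):
// the write only happens if nobody changed the document since we read it.
function updateIfVersion(id, expectedVersion, newArray) {
  const doc = store.get(id);
  if (!doc || doc.version !== expectedVersion) return null; // lost the race
  const updated = { version: doc.version + 1, nestedArray: newArray };
  store.set(id, updated);
  return updated;
}

// Build a new array with the item of the same subId replaced.
function withReplaced(nestedArray, newItem) {
  return nestedArray.filter(item => item.subId !== newItem.subId).concat([newItem]);
}

// Read, compute, try to write; on conflict re-read and retry.
function replaceItem(id, newItem, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const { version, nestedArray } = store.get(id);
    if (updateIfVersion(id, version, withReplaced(nestedArray, newItem))) return true;
  }
  return false;
}

// The race from the question: both writers read the same version first...
const readA = store.get("doc1");
const readB = store.get("doc1");
const writeA = updateIfVersion("doc1", readA.version, withReplaced(readA.nestedArray, { subId: "1", by: "A" }));
const writeB = updateIfVersion("doc1", readB.version, withReplaced(readB.nestedArray, { subId: "1", by: "B" }));
// ...A succeeds, B is rejected (null), so B retries on fresh data.
const retriedB = writeB === null && replaceItem("doc1", { subId: "1", by: "B" });
console.log(store.get("doc1"));
```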
Hope I explained myself.

Meteor collection get last document of each selection

Currently I use the following find query to get the latest document for a certain ID:
Conditions.find({
caveId: caveId
},
{
sort: {diveDate:-1},
limit: 1,
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
How can I do the same for multiple IDs, for example with $in?
I tried it with the following query. The problem is that it will limit the documents to 1 for all the found caveIds. But it should set the limit for each different caveId.
Conditions.find({
caveId: {$in: caveIds}
},
{
sort: {diveDate:-1},
limit: 1,
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
One solution I came up with is using the aggregate functionality.
var conditionIds = Conditions.aggregate(
[
{"$match": { caveId: {"$in": caveIds}}},
{"$sort": { diveDate: 1 }}, // $last in the $group below only means "latest" on sorted input
{
$group:
{
_id: "$caveId",
conditionId: {$last: "$_id"},
diveDate: { $last: "$diveDate" }
}
}
]
).map(function(child) { return child.conditionId});
var conditions = Conditions.find({
_id: {$in: conditionIds}
},
{
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
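$last only means "latest" if the documents reaching $group are already sorted by diveDate. The in-memory sketch below does the equivalent (filter like $match with $in, sort ascending by diveDate, keep the last document seen per caveId). It is only an illustration: `latestPerCave` is a hypothetical helper and the sample data is made up.

```javascript
// Keep the most recent condition per caveId, like a $match with $in,
// then { $sort: { diveDate: 1 } }, then $group with $last.
function latestPerCave(conditions, caveIds) {
  const wanted = new Set(caveIds);                  // like $match with $in
  const sorted = conditions
    .filter(c => wanted.has(c.caveId))
    .sort((a, b) => a.diveDate - b.diveDate);       // oldest first
  const latest = new Map();
  for (const c of sorted) latest.set(c.caveId, c);  // later dives overwrite earlier ones ($last)
  return Array.from(latest.values());
}

const conditions = [
  { _id: "c1", caveId: "A", diveDate: new Date("2015-06-01"), visibility: { visibility: 10 } },
  { _id: "c2", caveId: "A", diveDate: new Date("2015-07-01"), visibility: { visibility: 8 } },
  { _id: "c3", caveId: "B", diveDate: new Date("2015-05-01"), visibility: { visibility: 15 } },
  { _id: "c4", caveId: "C", diveDate: new Date("2015-08-01"), visibility: { visibility: 5 } },
];
console.log(latestPerCave(conditions, ["A", "B"]).map(c => c._id));
```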
You don't want to use $in here, as noted. You could solve this problem by looping through the caveIds and running the query for each caveId individually.
You're basically looking at a join query here: you need all caveIds and then the last condition for each.
In my opinion (and this is only an opinion!), this is a problem of database schema and denormalization:
You could, as mentioned here, look up all caveIds and then run the single query for each one, every time you need to look up the last dives.
However, I think you are much better off recording/updating the last dive inside your cave document, and then looking up all caveIds of interest, pulling only the lastDive field.
That will give you immediately what you need, rather than going through expensive search/sort queries. This is at the expense of maintaining that field in the document, but it sounds like it should be fairly trivial as you only need to update the one field when a new event occurs.
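A rough sketch of that denormalization, with plain objects standing in for the caves and conditions collections. `recordCondition` and `lastDives` are hypothetical functions: the first is what the method (or collection hook) that inserts a condition would also do, namely refresh the cave's `lastDive` only when the new dive is more recent, so a late-logged older dive does not overwrite it.

```javascript
// Hypothetical in-memory stand-ins for the caves and conditions collections.
const caves = new Map([
  ["A", { _id: "A" }],
  ["B", { _id: "B" }],
]);
const conditionsLog = [];

// Called wherever a condition is inserted (e.g. a method or collection hook):
// store the condition and refresh the cave's lastDive only if this dive is newer.
function recordCondition(condition) {
  conditionsLog.push(condition);
  const cave = caves.get(condition.caveId);
  if (!cave) throw new Error("Unknown cave: " + condition.caveId);
  if (!cave.lastDive || condition.diveDate > cave.lastDive.diveDate) {
    cave.lastDive = { diveDate: condition.diveDate, visibility: condition.visibility };
  }
}

// Reading the latest dive for many caves is now a plain lookup: no sort, no per-id limit.
function lastDives(caveIds) {
  return caveIds.map(id => ({ caveId: id, lastDive: caves.has(id) ? caves.get(id).lastDive : undefined }));
}

recordCondition({ caveId: "A", diveDate: new Date("2015-07-01"), visibility: { visibility: 8 } });
recordCondition({ caveId: "A", diveDate: new Date("2015-06-01"), visibility: { visibility: 10 } }); // older dive logged late
recordCondition({ caveId: "B", diveDate: new Date("2015-05-01"), visibility: { visibility: 15 } });
console.log(lastDives(["A", "B"]));
```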

Publish all fields in document but just part of an array in the document

I have a mongo collection in which the documents have a field that is an array. I want to be able to publish everything in the documents except for the elements in the array that were created more than a day ago. I suspect the answer will be somewhat similar to this question.
Meteor publication: Hiding certain fields in an array document field?
Instead of limiting fields in the array, I just want to limit the elements in the array being published.
Thanks in advance for any responses!
EDIT
Here is an example document:
{
_id: 123456,
name: "Unit 1",
createdAt: (datetime object),
settings: *some stuff*,
packets: [
{
_id: 32412312,
temperature: 70,
createdAt: *datetime object from today*
},
{
_id: 32412312,
temperature: 70,
createdAt: *datetime from yesterday*
}
]
}
I want to get everything in this document except for the part of the array that was created more than 24 hours ago. I know I can accomplish this by moving the packets into their own collection and tying them together with keys as in a relational database but if what I am asking were possible, this would be simpler with less code.
You could do something like this in your publish method:
Meteor.publish("pubName", function() {
  var self = this;
  var deadline = new Date(Date.now() - 86400000); // 24 hours ago
  // change this query to select your data
  Collection.find().forEach(function(doc) {
    // keep only the packets created within the last 24 hours
    doc.packets = _.filter(doc.packets, function(packet) {
      return packet.createdAt >= deadline;
    });
    // a publish function must return a cursor, so publish the modified document
    // manually; "collectionName" is a placeholder for your client-side collection name
    // (not reactive: later changes are not pushed to the client)
    self.added("collectionName", doc._id, _.omit(doc, "_id"));
  });
  self.ready();
});
You might be better off storing the last 24 hours' worth of packets as a separate array in your document, though. This would probably be less taxing on the server, but I'm not sure.
Also, the code above is untested. Good luck.
You can use the $elemMatch projection:
http://docs.mongodb.org/manual/reference/operator/projection/elemMatch/
So in your case, it would be:
var today = new Date();
var yesterday = new Date(today);
yesterday.setDate(today.getDate() - 1);
collection.find({}, // find anything or something specific
{
  fields: {
    'packets': {
      $elemMatch: {createdAt: {$gt: yesterday}} // or some other new Date()
    }
  }
});
However, $elemMatch only returns the FIRST element matching your condition. To return more than one element, you need to use the aggregation framework, which will be more efficient than _.each or forEach, particularly if you have a large array to loop through.
collection.rawCollection().aggregate([
{
$match: {}
},
{
$redact: {
$cond: {
if : {$or: [{$gt: ["$createdAt",yesterday]},"$packets"]},
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}], function (error, result ){
});
You specify the $match in a way similar to find({}). All the documents that match your conditions then get piped into the $redact, which is specified by the $cond.
$redact scans the document from the top level down. At the top level, you have _id, name, createdAt, settings and packets; hence the "$packets" term in {$or: [***, "$packets"]}.
The presence of "$packets" in the $or lets $redact descend to the second level, which contains _id, temperature and createdAt; hence {$gt: ["$createdAt", yesterday]}.
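To make the top-down walk concrete, here is a small in-memory imitation of $redact in plain JavaScript. It is an illustration, not MongoDB's implementation; `redact`, `redactValue`, `keep` and the sample `settings` content are hypothetical. Running it on a document shaped like the one in the question also surfaces a caveat: if `settings` is an embedded document, it has neither `createdAt` nor `packets`, so the same condition prunes it too, and the $cond would need to account for it.

```javascript
// In-memory imitation of $redact with $$DESCEND / $$PRUNE, for illustration only.
function isSubDocument(value) {
  return value !== null && typeof value === "object" &&
         !Array.isArray(value) && !(value instanceof Date);
}

// keep(doc) plays the role of the $cond: true -> $$DESCEND, false -> $$PRUNE.
function redact(doc, keep) {
  if (!keep(doc)) return undefined;            // $$PRUNE drops this level entirely
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const kept = redactValue(value, keep);
    if (kept !== undefined) out[key] = kept;
  }
  return out;                                  // $$DESCEND: fields kept, sub-documents re-checked
}

// Arrays are walked element by element; embedded documents are re-evaluated.
function redactValue(value, keep) {
  if (Array.isArray(value)) {
    return value.map(v => redactValue(v, keep)).filter(v => v !== undefined);
  }
  return isSubDocument(value) ? redact(value, keep) : value;
}

// Same test as the pipeline: { $or: [ { $gt: ["$createdAt", yesterday] }, "$packets" ] }
const DAY_MS = 24 * 60 * 60 * 1000;
const yesterday = new Date(Date.now() - DAY_MS);
const keep = doc => (doc.createdAt instanceof Date && doc.createdAt > yesterday) || Boolean(doc.packets);

const unit = {
  _id: 123456, name: "Unit 1", createdAt: new Date(Date.now() - 10 * DAY_MS),
  settings: { interval: 60 },                  // hypothetical embedded settings document
  packets: [
    { _id: 1, temperature: 70, createdAt: new Date() },
    { _id: 2, temperature: 68, createdAt: new Date(Date.now() - 2 * DAY_MS) },
  ],
};
const result = redact(unit, keep);
console.log(result.packets.length); // 1
console.log("settings" in result);  // false: no createdAt and no packets, so it is pruned
```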
This is asynchronous; you can use Meteor.wrapAsync to wrap the function.
Hope this helps.