I want to do exactly what this SO question gets at but with Meteor on the server side:
How do I retrieve all of the documents which HAVE a unique value of a
field?
> db.foo.insert([{age: 21, name: 'bob'}, {age: 21, name: 'sally'}, {age: 30, name: 'Jim'}])
> db.foo.count()
3
> db.foo.aggregate({ $group: { _id: '$age', name: { $max: '$name' } } }).result
[
{
"_id" : 30,
"name" : "Jim"
},
{
"_id" : 21,
"name" : "sally"
}
]
My understanding is that aggregate is not available for Meteor. If that is correct, how can I achieve the above? Performing post-filtering on a query after-the-fact is not an ideal solution, as I want to use limit. I'm also happy to get documents with a unique field some other way as long as I can use limit.
There is a general setup you can use to access the underlying driver collection object and therefore .aggregate() without installing any other plugins.
The basic process goes like this:
FooAges = new Meteor.Collection("fooAges");
Meteor.publish("fooAgeQuery", function(args) {
var sub = this;
var db = MongoInternals.defaultRemoteCollectionDriver().mongo.db;
var pipeline = [
{ "$group": {
"_id": "$age",
"name": { "$max": "$name" }
}}
];
db.collection("foo").aggregate(
pipeline,
// Need to wrap the callback so it gets called in a Fiber.
Meteor.bindEnvironment(
function(err, result) {
// Add each of the results to the subscription.
_.each(result, function(e) {
// Generate a random disposable id for aggregated documents
sub.added("fooAges", Random.id(), {
"age": e._id,
"name": e.name
});
});
sub.ready();
},
function(error) {
Meteor._debug( "Error doing aggregation: " + error);
}
)
);
});
So you define a collection for the output of the aggregation and within a routine like this you then publish the service that you are also going to subscribe to in your client.
Inside this, the aggregation is run and populated into the the other collection ( logically as it doesn't actually write anything ). So you then use that collection on the client with the same definition and all the aggregated results are just returned.
I actually have a full working example application of a similar processs within this question, as well as usage of the meteor hacks aggregate package on this question here as well, if you need further reference.
Related
My question is almost a duplicate of this question. The difference is that I am using minimongo within the Meteor framework/platform. Given this document:
{
"_id" : ObjectId("4d2d8deff4e6c1d71fc29a07"),
"user_id" : "714638ba-2e08-2168-2b99-00002f3d43c0",
"events" : [
{
"handled" : {
"name": "Mike",
"visible": false
},
"profile" : 10,
"data" : "....."
}
{
"handled" : {
"name": "Shaun",
"visible": false
},
"profile" : 10,
"data" : "....."
}
{
"handled" : {
"name": "Glen",
"visible": true
},
"profile" : 20,
"data" : "....."
}
]}
How do I query for a particular user and update all the objects in the events array ONLY where 'handled.visible':false to 'handled.visible':true? As much as possible, I would like to have it in a single query. My objective is really to improve the performance of my app. Instead of having to fetch the entire array of objects, process them on the client side (change the object properties) then re-update on the server, it would be great to directly update via Mongo. Changing the data directly on the server is also natively reactive and is advantageous, though is not really necessary for my app.
I am not exactly sure as how to formulate the query in minimongo.
I have tried:
Meteor.users.update({_id: 's8Ppj4ZFSeq9X6xC4', 'events.handled.visible': false }, { $set:{'events.$.handled.visible':true} });
This worked for only the first object found in the array. However I would want to update all objects in the array where the handled.visible is false.
Meteor currently does not allow to update multiple elements in an array. One has to loop through the array to update for each document returned.
You could create a server method that uses the aggregation framework to return a count that you can use in your client as a loop to update each matching array element. The following untested example may have some pointers on
how you can go about it (I haven't tested it so feedback on your implementation is much welcome to make this a better solution):
Server:
Add the meteorhacks:aggregate package to your app with
meteor add meteorhacks:aggregate
and then use as
Meteor.methods({
getInvisibleEventsCount: function(userId){
var pipeline = [
{ "$match": { "_id": userId, "events.handled.visible": false } },
{ "$unwind": "$events.handled" },
{ "$match": { "events.handled.visible": false } },
{ "$group": {
"_id": null,
"count": { "$sum": 1 }
}}
];
var result = Meteor.users.aggregate(pipeline);
return result;
}
});
Client:
Meteor.call("getInvisibleEventsCount", userId, function(err, response) {
var max = response[0].count;
while(max--) {
Meteor.users.update(
{ "_id": userId, "events.handled.visible": false },
{ "$set": { "events.$.handled.visible": true} }
);
}
});
I have tried 3 methods so far, one as posted by chridam which works.
Chridam's Method
Method 1 just processes it in the client but selectively chooses only false entries:
testLocalAggregate : function(){
var findQuery = Meteor.users.findOne(Meteor.userId(), {fields:{'events':1}}).events;
var tempArr = [];
_.each(findQuery, function(event, index){
if(events.handled.visible === false){
tempArr.push(index);
}
});
if(tempArr.length){
var count = tempArr.length;
while(count --){
Meteor.users.update(
{ "_id": Meteor.userId(), "events.handled.visible": false },
{$set:{'events.$.handled.visible':true}}
);
}
}
}
Method 2 is by replacing all entries regardless whether true or false, which is the simplest and most straightforward (I know I have specified in my question to only update false entries but for the sake of performance, this solution is also presented).
testUpdate : function(){
var events = Meteor.users.findOne(Meteor.userId(), {fields:{'events':1}}).events;
_.each(events, function(event, index){
if(events.handled.visible === false){
events.handled.visible = true;
}
});
Meteor.users.update({_id : Meteor.userId()}, {$set: {'events': events}});
}
As per testing in Kadira with 5 method calls, the average throughputs and response times are as follows:
Method 1:
Average throughput - .08/min
Average response time - 219ms
Method 2:
Average throughput - .12/min
Average response time - 109ms
Chridam's Method:
Average throughput - .08/min
Average response time - 221ms
Method 1 and Chridam's are almost the same. But I noticed that the UI updates are faster using method 2, with still comparable throughput and response time. I'm just not sure if method 2 will be better as more method calls are done.
Any comments to improve the posted methods or which may be faster in other circumstances will be very much appreciated.
I have two models, one is user
userSchema = new Schema({
userID: String,
age: Number
});
and the other is the score recorded several times everyday for all users
ScoreSchema = new Schema({
userID: {type: String, ref: 'User'},
score: Number,
created_date = Date,
....
})
I would like to do some query/calculation on the score for some users meeting specific requirement, say I would like to calculate the average of score for all users greater than 20 day by day.
My thought is that firstly do the populate on Scores to populate user's ages and then do the aggregate after that.
Something like
Score.
populate('userID','age').
aggregate([
{$match: {'userID.age': {$gt: 20}}},
{$group: ...},
{$group: ...}
], function(err, data){});
Is it Ok to use populate before aggregate? Or I first find all the userID meeting the requirement and save them in a array and then use $in to match the score document?
No you cannot call .populate() before .aggregate(), and there is a very good reason why you cannot. But there are different approaches you can take.
The .populate() method works "client side" where the underlying code actually performs additional queries ( or more accurately an $in query ) to "lookup" the specified element(s) from the referenced collection.
In contrast .aggregate() is a "server side" operation, so you basically cannot manipulate content "client side", and then have that data available to the aggregation pipeline stages later. It all needs to be present in the collection you are operating on.
A better approach here is available with MongoDB 3.2 and later, via the $lookup aggregation pipeline operation. Also probably best to handle from the User collection in this case in order to narrow down the selection:
User.aggregate(
[
// Filter first
{ "$match": {
"age": { "$gt": 20 }
}},
// Then join
{ "$lookup": {
"from": "scores",
"localField": "userID",
"foriegnField": "userID",
"as": "score"
}},
// More stages
],
function(err,results) {
}
)
This is basically going to include a new field "score" within the User object as an "array" of items that matched on "lookup" to the other collection:
{
"userID": "abc",
"age": 21,
"score": [{
"userID": "abc",
"score": 42,
// other fields
}]
}
The result is always an array, as the general expected usage is a "left join" of a possible "one to many" relationship. If no result is matched then it is just an empty array.
To use the content, just work with an array in any way. For instance, you can use the $arrayElemAt operator in order to just get the single first element of the array in any future operations. And then you can just use the content like any normal embedded field:
{ "$project": {
"userID": 1,
"age": 1,
"score": { "$arrayElemAt": [ "$score", 0 ] }
}}
If you don't have MongoDB 3.2 available, then your other option to process a query limited by the relations of another collection is to first get the results from that collection and then use $in to filter on the second:
// Match the user collection
User.find({ "age": { "$gt": 20 } },function(err,users) {
// Get id list
userList = users.map(function(user) {
return user.userID;
});
Score.aggregate(
[
// use the id list to select items
{ "$match": {
"userId": { "$in": userList }
}},
// more stages
],
function(err,results) {
}
);
});
So by getting the list of valid users from the other collection to the client and then feeding that to the other collection in a query is the onyl way to get this to happen in earlier releases.
Considering the document below how can I rename 'techId1' to 'techId'. I've tried different ways and can't get it to work.
{
"_id" : ObjectId("55840f49e0b"),
"__v" : 0,
"accessCard" : "123456789",
"checkouts" : [
{
"user" : ObjectId("5571e7619f"),
"_id" : ObjectId("55840f49e0bf"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"techId1" : ObjectId("553d9cbcaf")
},
{
"user" : ObjectId("5571e7619f15"),
"_id" : ObjectId("55880e8ee0bf"),
"date" : ISODate("2015-06-22T13:01:51.672Z"),
"techId1" : ObjectId("55b7db39989")
}
],
"created" : ISODate("2015-06-19T12:47:05.422Z"),
"date" : ISODate("2015-06-19T12:45:52.339Z"),
"location" : ObjectId("55743c8ddbda"),
"model" : "model1",
"order" : ObjectId("55840f49e0bf"),
"rid" : "987654321",
"serialNumber" : "AHSJSHSKSK",
"user" : ObjectId("5571e7619f1"),
"techId" : ObjectId("55b7db399")
}
In mongo console I tried which gives me ok but nothing is actually updated.
collection.update({"checkouts._id":ObjectId("55840f49e0b")},{ $rename: { "techId1": "techId" } });
I also tried this which gives me an error. "cannot use the part (checkouts of checkouts.techId1) to traverse the element"
collection.update({"checkouts._id":ObjectId("55856609e0b")},{ $rename: { "checkouts.techId1": "checkouts.techId" } })
In mongoose I have tried the following.
collection.findByIdAndUpdate(id, { $rename: { "checkouts.techId1": "checkouts.techId" } }, function (err, data) {});
and
collection.update({'checkouts._id': n1._id}, { $rename: { "checkouts.$.techId1": "checkouts.$.techId" } }, function (err, data) {});
Thanks in advance.
You were close at the end, but there are a few things missing. You cannot $rename when using the positional operator, instead you need to $set the new name and $unset the old one. But there is another restriction here as they will both belong to "checkouts" as a parent path in that you cannot do both at the same time.
The other core line in your question is "traverse the element" and that is the one thing you cannot do in updating "all" of the array elements at once. Well, not safely and without possibly overwriting new data coming in anyway.
What you need to do is "iterate" each document and similarly iterate each array member in order to "safely" update. You cannot really iterate just the document and "save" the whole array back with alterations. Certainly not in the case where anything else is actively using the data.
I personally would run this sort of operation in the MongoDB shell if you can, as it is a "one off" ( hopefully ) thing and this saves the overhead of writing other API code. Also we're using the Bulk Operations API here to make this as efficient as possible. With mongoose it takes a bit more digging to implement, but still can be done. But here is the shell listing:
var bulk = db.collection.initializeOrderedBulkOp(),
count = 0;
db.collection.find({ "checkouts.techId1": { "$exists": true } }).forEach(function(doc) {
doc.checkouts.forEach(function(checkout) {
if ( checkout.hasOwnProperty("techId1") ) {
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$set": { "checkouts.$.techId": checkout.techId1 }
});
bulk.find({ "_id": doc._id, "checkouts._id": checkout._id }).updateOne({
"$unset": { "checkouts.$.techId1": 1 }
});
count += 2;
if ( count % 500 == 0 ) {
bulk.execute();
bulk = db.collection.initializeOrderedBulkOp();
}
}
});
});
if ( count % 500 !== 0 )
bulk.execute();
Since the $set and $unset operations are happening in pairs, we are keeping the total batch size to 1000 operations per execution just to keep memory usage on the client down.
The loop simply looks for documents where the field to be renamed "exists" and then iterates each array element of each document and commits the two changes. As Bulk Operations, these are not sent to the server until the .execute() is called, where also a single response is returned for each call. This saves a lot of traffic.
If you insist on coding with mongoose. Be aware that a .collection acessor is required to get to the Bulk API methods from the core driver, like this:
var bulk = Model.collection.inititializeOrderedBulkOp();
And the only thing that sends to the server is the .execute() method, so this is your only execution callback:
bulk.exectute(function(err,response) {
// code body and async iterator callback here
});
And use async flow control instead of .forEach() such as async.each.
Also, if you do that, then be aware that as a raw driver method not governed by mongoose, you do not get the same database connection awareness as you do with mongoose methods. Unless you know for sure the database connection is already established, it is safter to put this code within an event callback for the server connection:
mongoose.connection.on("connect",function(err) {
// body of code
});
But otherwise those are the only real ( apart from call syntax ) alterations you really need.
This worked for me, I created this query to perform this procedure and I share it, (although I know it is not the most optimized way):
First, make an aggregate that (1) $match the documents that have the checkouts array field with techId1 as one of the keys of each sub-document. (2) $unwind the checkouts field (that deconstructs the array field from the input documents to output a document for each element), (3) adds the techId field (with $addFields), (4) $unset the old techId1 field, (5) $group the documents by _id to have again the checkout sub-documents grouped by its _id, and (6) write the result of these aggregation in a temporal collection (with $out).
const collection = 'yourCollection'
db[collection].aggregate([
{
$match: {
'checkouts.techId1': { '$exists': true }
}
},
{
$unwind: {
path: '$checkouts'
}
},
{
$addFields: {
'checkouts.techId': '$checkouts.techId1'
}
},
{
$project: {
'checkouts.techId1': 0
}
},
{
$group: {
'_id': '$_id',
'checkouts': { $push: { 'techId': '$checkouts.techId' } }
}
},
{
$out: 'temporal'
}
])
Then, you can make another aggregate from this temporal collection to $merge the documents with the modified checkouts field to your original collection.
db.temporal.aggregate([
{
$merge: {
into: collection,
on: "_id",
whenMatched:"merge",
whenNotMatched: "insert"
}
}
])
I have a collection which elements can be simplified to this:
{tags : [1, 5, 8]}
where there would be at least one element in array and all of them should be different. I want to substitute one tag for another and I thought that there would not be a problem. So I came up with the following query:
db.colll.update({
tags : 1
},{
$pull: { tags: 1 },
$addToSet: { tags: 2 }
}, {
multi: true
})
Cool, so it will find all elements which has a tag that I do not need (1), remove it and add another (2) if it is not there already. The problem is that I get an error:
"Cannot update 'tags' and 'tags' at the same time"
Which basically means that I can not do pull and addtoset at the same time. Is there any other way I can do this?
Of course I can memorize all the IDs of the elements and then remove tag and add in separate queries, but this does not sound nice.
The error is pretty much what it means as you cannot act on two things of the same "path" in the same update operation. The two operators you are using do not process sequentially as you might think they do.
You can do this with as "sequential" as you can possibly get with the "bulk" operations API or other form of "bulk" update though. Within reason of course, and also in reverse:
var bulk = db.coll.initializeOrderedBulkOp();
bulk.find({ "tags": 1 }).updateOne({ "$addToSet": { "tags": 2 } });
bulk.find({ "tags": 1 }).updateOne({ "$pull": { "tags": 1 } });
bulk.execute();
Not a guarantee that nothing else will try to modify,but it is as close as you will currently get.
Also see the raw "update" command with multiple documents.
If you're removing and adding at the same time, you may be modeling a 'map', instead of a 'set'. If so, an object may be less work than an array.
Instead of data as an array:
{ _id: 'myobjectwithdata',
data: [{ id: 'data1', important: 'stuff'},
{ id: 'data2', important: 'more'}]
}
Use data as an object:
{ _id: 'myobjectwithdata',
data: { data1: { important: 'stuff'},
data2: { important: 'more'} }
}
The one-command update is then:
db.coll.update(
'myobjectwithdata',
{ $set: { 'data.data1': { important: 'treasure' } }
);
Hard brain working for this answer done here and here.
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom javascript function to implement behaviour not supported by the MongoDB Query Language.
And coupled with improvements made to db.collection.update() in Mongo 4.2 that can accept an aggregation pipeline, allowing the update of a field based on its own value,
We can manipulate and update an array in ways the language doesn't easily permit:
// { "tags" : [ 1, 5, 8 ] }
db.collection.updateMany(
{ tags: 1 },
[{ $set:
{ "tags":
{ $function: {
body: function(tags) { tags.push(2); return tags.filter(x => x != 1); },
args: ["$tags"],
lang: "js"
}}
}
}]
)
// { "tags" : [ 5, 8, 2 ] }
$function takes 3 parameters:
body, which is the function to apply, whose parameter is the array to modify. The function here simply consists in pushing 2 to the array and filtering out 1.
args, which contains the fields from the record that the body function takes as parameter. In our case, "$tag".
lang, which is the language in which the body function is written. Only js is currently available.
In case you need replace one value in an array to another check this answer:
Replace array value using arrayFilters
I have the following structure of a mongo document:
{
"_id": ObjectId("4fba2558a0787e53320027eb"),
"replies": {
"0": {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
}
"1": {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
}
"2" ....
}
}
How do I count all the replies from all the documents in the collection?
Thank you!
In the following answer, I'm working with a simple data set with five replies across the collection:
> db.foo.find()
{ "_id" : ObjectId("4fba6b0c7c32e336fc6fd7d2"), "replies" : [ 1, 2, 3 ] }
{ "_id" : ObjectId("4fba6b157c32e336fc6fd7d3"), "replies" : [ 1, 2 ] }
Since we're not simply counting documents, db.collection.count() won't help us here. We'll need to resort to MapReduce to scan each document and aggregate the reply array lengths. Consider the following:
db.foo.mapReduce(
function() { emit('totalReplies', { count: this.replies.length }); },
function(key, values) {
var result = { count: 0 };
values.forEach(function(value) {
result.count += value.count;
});
return result;
},
{ out: { inline: 1 }}
);
The map function (first argument) runs across the entire collection and emits the number of replies in each document under a constant key. Mongo will then consider all emitted values and run the reduce function (second argument) a number of times to consolidate (literally reduce) the result. Hopefully the code here is straightforward. If you're new to map/reduce, one caveat is that the reduce method must be capable of processing its own output. This is explained in detail in the MapReduce docs linked above.
Note: if your collection is quite large, you may have to use another output mode (e.g. collection output); however, inline works well for small data sets.
Lastly, if you're using MongoDB 2.1+, we can take advantage of the Aggregation Framework to avoid writing JS functions and make this even easier:
db.foo.aggregate(
{ $project: { replies: 1 }},
{ $unwind: "$replies" },
{ $group: {
_id: "result",
totalReplies: { $sum: 1 }
}}
);
Three things are happening here. First, we tell Mongo that we're interested in the replies field. Secondly, we want to unwind the array so that we can iterate over all elements across the fields in our projection. Lastly, we'll tally up results under a "result" bucket (any constant will do), adding 1 to the totalReplies result for each iteration. Executing this query will yield the following result:
{
"result" : [{
"_id" : "result",
"totalReplies" : 5
}],
"ok" : 1
}
Although I wrote the above answers with respect to the Mongo client, you should have no trouble translating them to PHP. You'll need to use MongoDB::command() to run either MapReduce or aggregation queries, as the PHP driver currently has no helper methods for either. There's currently a MapReduce example in the PHP docs, and you can reference this Google group post for executing an aggregation query through the same method.
I haven't checked your code, might work as well. I've did the following and it just works:
$replies = $db->command(
array(
"distinct" => "foo",
"key" => "replies"
)
);
$all = count($replies['values']);
I've did it again using the group command of the PHP Mongo Driver. It's similar to a MapReduce command.
$keys = array("replies.type" => 1); //keys for group by
$initial = array("count" => 0); //initial value of the counter
$reduce = "function (obj, prev) { prev.count += obj.replies.length; }";
$condition = array('replies' => array('$exists' => true), 'replies.type' => 'follow');
$g = $db->foo->group($keys, $initial, $reduce, $condition);
echo $g['count'];
Thanks jmikola for giving links to Mongo.
JSON should be
{
"_id": ObjectId("4fba2558a0787e53320027eb"),
"replies":[
{
0: {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
},
1: {
"email": ObjectId("4fb89a181b3129fe2d000000"),
"sentDate": "2012-05-21T11: 22: 01.418Z"
},
2: {....}
]
}