Generating a Structure for Aggregation - mongodb

So here's a question. What I want to do is generate a data structure given a set of input values.
Since this is a multiple-language submission, let's consider the input list to be an array of key/value pairs, and therefore an array of Hash, Map, Dictionary or whatever term floats your boat. I'll keep all the notation here as JSON, hoping that's universal enough to translate / decode.
So for input, let's say we have this:
[ { "4": 10 }, { "7": 9 }, { "90": 7 }, { "1": 8 } ]
Maybe a little redundant, but let's stick with that.
So from that input, I want to get to this structure. I'm giving a whole structure, but the important part is what gets returned for the value under "weight":
[
{ "$project": {
"user_id": 1,
"content": 1,
"date": 1,
"weight": { "$cond": [
{ "$eq": ["$user_id", 4] },
10,
{ "$cond": [
{ "$eq": ["$user_id", 7] },
9,
{ "$cond": [
{ "$eq": ["$user_id", 90] },
7,
{ "$cond": [
{ "$eq": ["$user_id", 1] },
8,
0
]}
]}
]}
]}
}}
]
So the solution I'm looking for populates the structure content for "weight" as shown in the structure by using the input as shown.
Yes, the values that look like numbers in the structure must be numbers and not strings, so whatever the language implementation, the JSON-encoded version must look exactly the same.
Alternatively, give me a better approach to get the same result of assigning the weight values based on the matching user_id.
Does anyone have an approach to this?
I'd be happy with any language implementation, as I think it is fair to just see how the structure can be created.
I'll try to add my own, but kudos go to the good implementations.
Happy coding.

When I had a moment to think about this, I ran back home to Perl and worked this out:
use Modern::Perl;
use Moose::Autobox;
use JSON;
my $encoder = JSON->new->pretty;
my $input = [ { 4 => 10 }, { 7 => 9 }, { 90 => 7 }, { 1 => 8 } ];
my $stack = [];
foreach my $item ( reverse @{$input} ) {
while ( my ( $key, $value ) = each %{$item} ) {
my $rec = {
'$cond' => [
{ '$eq' => [ '$user_id', int($key) ] },
$value
]
};
if ( $stack->length == 0 ) {
$rec->{'$cond'}->push( 0 );
} else {
my $last = $stack->pop;
$rec->{'$cond'}->push( $last );
}
$stack->push( $rec );
}
}
say $encoder->encode( $stack->[0] );
So the process was blindingly simple:
Go through each item in the array and get the key and value for the entry.
Create a new "document" whose array argument to the "$cond" key holds just two of the three required entries: the value used to test "$user_id" and the returned "weight" value.
Test the length of the outside "stack" variable; if it is empty (the first time through), push the value 0, as seen in the last nested element, onto the end of the "$cond" array in the document.
If there is something already there (length > 0), pop that value and push it as the third entry of the "$cond" array for the document.
Push that document back onto the stack and repeat for the next item.
So there are a few things in the listing such as reversing the order of the input, which isn't required but produces a natural order in the nested output. Also, my choice for that outside "stack" was an array because the test operators seemed simple. But it really is just a singular value that keeps getting re-used, augmented and replaced.
Also the JSON printing is just there to show the output. All that is really wanted is the resulting value of stack to be merged into the structure.
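For comparison, here is a minimal sketch of the same idea in JavaScript (the mongo shell's language). Since only the most recently built document is ever kept, a right fold with Array.prototype.reduceRight can stand in for the explicit stack; folding from the right also gives the "natural order" that reversing the input gives in the Perl version. It assumes the input shape shown above (objects of key/value pairs whose keys are numeric strings); the helper name buildWeightCond is hypothetical.

```javascript
// Hypothetical helper: build the nested "$cond" chain for "weight".
function buildWeightCond(input, fallback = 0) {
  // Flatten [{ "4": 10 }, { "7": 9 }] into [["4", 10], ["7", 9]].
  const pairs = input.flatMap((item) => Object.entries(item));
  // Fold from the right: the last pair wraps the fallback value, and each
  // earlier pair wraps the document built so far (the Perl "stack").
  return pairs.reduceRight(
    (inner, [key, value]) => ({
      // Object keys are strings in JavaScript, so convert back to a number.
      $cond: [{ $eq: ["$user_id", Number(key)] }, value, inner],
    }),
    fallback
  );
}

const input = [{ "4": 10 }, { "7": 9 }, { "90": 7 }, { "1": 8 }];
console.log(JSON.stringify(buildWeightCond(input), null, 2));
```

Placing the returned object under "weight" in the $project stage gives the structure from the question.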
Then I converted the logic to Ruby, the language used by the OP of the question that inspired this way of generating the nested structure:
require 'json'
input = [ { 4 => 10 }, { 7 => 9 }, { 90 => 7 }, { 1 => 8 } ]
stack = []
input.reverse_each {|item|
item.each {|key,value|
rec = {
'$cond' => [
{ '$eq' => [ '$user_id', key ] },
value
]
}
if ( stack.length == 0 )
rec['$cond'].push( 0 )
else
last = stack.pop
rec['$cond'].push( last )
end
stack.push( rec )
}
}
puts JSON.pretty_generate(stack[0])
And then eventually into the final form to generate the pipeline that the OP wanted:
require 'json'
userWeights = [ { 4 => 10 }, { 7 => 9 }, { 90 => 7}, { 1 => 8 } ]
stack = []
userWeights.reverse_each {|item|
item.each {|key,value|
rec = {
'$cond' => [
{ '$eq' => [ '$user_id', key ] },
value
]
}
if ( stack.length == 0 )
rec['$cond'].push( 0 )
else
last = stack.pop
rec['$cond'].push( last )
end
stack.push( rec )
}
}
pipeline = [
{ '$project' => {
'user_id' => 1,
'content' => 1,
'date' => 1,
'weight' => stack[0]
}},
{ '$sort' => { 'weight' => -1, 'date' => -1 } }
]
puts JSON.pretty_generate( pipeline )
So that was a way to generate a structure to be passed into aggregate in order to apply "weights" that are specific to a user_id and sort the results in the collection.

First, thank you Neil for your help with this; it worked out great for me and it's really fast. For those who use Mongoid, this is what I used to create the weight parameter, where recommended_user_ids is an array:
def self.project_recommended_weight recommended_user_ids
return {} unless recommended_user_ids.present?
{:weight => create_weight_statement(recommended_user_ids.reverse)}
end
def self.create_weight_statement recommended_user_ids, index=0
return 0 if index == recommended_user_ids.count
{"$cond" => [{ "$eq" => ["$user_id", recommended_user_ids[index]] },index+1,create_weight_statement(recommended_user_ids,index+1)]}
end
To add this to the pipeline, simply merge the hash like this:
{"$project" => {:id => 1,:posted_at => 1}.merge(project_recommended_weight(options[:recommended_user_ids]))}
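The Mongoid helper recurses instead of keeping a stack, and derives each weight from the id's position in the list. A rough JavaScript transcription, assuming a plain array of user ids, might look like this (the function names are hypothetical, chosen to mirror the Ruby methods):

```javascript
// Transcription of create_weight_statement: the id at position `index`
// gets weight index + 1, and the chain ends with a default of 0.
function createWeightStatement(userIds, index = 0) {
  if (index === userIds.length) return 0;
  return {
    $cond: [
      { $eq: ["$user_id", userIds[index]] },
      index + 1,
      createWeightStatement(userIds, index + 1),
    ],
  };
}

// Mirrors project_recommended_weight: reversing the list means the first
// recommended id ends up with the highest weight.
function projectRecommendedWeight(recommendedUserIds) {
  if (!recommendedUserIds || recommendedUserIds.length === 0) return {};
  return { weight: createWeightStatement([...recommendedUserIds].reverse()) };
}

// Merged into a $project stage the same way as the Ruby example.
const stage = {
  $project: { id: 1, posted_at: 1, ...projectRecommendedWeight([4, 7, 90]) },
};
console.log(JSON.stringify(stage));
```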

Related

Laravel MongoDB - aggregation, ordering query

I am wondering how I could achieve a specific result.
Starting off
I am using https://github.com/jenssegers/laravel-mongodb
The code sample below is used to get an array of documents that contains my specific slug in the rewards node. And till that point, everything works as intended.
$array = [
'rewards.slug' => ['$eq' => 'example_slug'],
'expired' => ['$gte' => \Carbon\Carbon::now()->toDateTimeString()]
];
$models = Master::raw(function ($collection) use (&$array) {
return $collection->find(
$array, ["typeMap" => ['root' => 'array', 'document' => 'array']])
->toArray();
});
My example document
{
"_id": {
"$oid": "5be4464eafad20007245543f"
},
"some_int_value": 100,
"some_string_value": "String",
"rewards": [
{
"slug": "example_slug",
"name": "Example slug",
"quantity": 4,
"estimated": {
"value": 18750
}
},
{
"slug": "example_slug",
"name": "Example slug",
"quantity": 1,
"estimated": {
"value": 100
}
},
{
"slug": "other_example",
"name": "Other slug example",
"quantity": 1,
"estimated": {
"value": 100
}
}
],
"expires": "2018-11-08 20:20:45"
}
Desired result
I would like to implement a more complex query, which would do the following.
Query 1 (pseudo): select all documents that contain the reward "slug": "example_slug", sum the quantity of those rewards, return documents whose summed quantity is greater than X, and order by the summed quantity descending.
Query 2, very similar to the one above: select all documents that contain the reward "slug": "example_slug", sum estimated.value, return documents whose summed estimated.value is greater than X, and order by the summed estimated.value descending.
If you need more explanation, feel free to ask; I don't even know where to start with this one.
All help is greatly appreciated.
You can use the aggregation below in MongoDB 3.6.
$addFields to create an extra slugcount field to hold the result.
$filter the rewards with slug matching example_slug, followed by $sum to sum the quantity field.
$match with $gt to keep only documents where the sum of all matching quantities is greater than X.
$sort by slugcount descending, and $project with exclusion to remove slugcount from the final response.
db.colname.aggregate([
{"$addFields":{
"slugcount":
{"$let":{
"vars":{
"mslug":{
"$filter":{
"input":"$rewards",
"cond":{"$eq":["$$this.slug","example_slug"]}
}
}
},
"in":{"$sum":"$$mslug.quantity"}
}}
}},
{"$match":{"slugcount":{"$gt":X}}},
{"$sort":{"slugcount":-1}},
{"$project":{"slugcount":0}}
])
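To see what these stages compute, here is a plain JavaScript sketch (no database involved) that applies the same logic in memory to the rewards from the example document in the question. The helper names getPath, sumForSlug and rankBySlugTotal are hypothetical; this only illustrates the semantics and is not a substitute for running the pipeline.

```javascript
// Read a possibly dotted path such as "estimated.value" from an object.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((cur, key) => (cur == null ? undefined : cur[key]), obj);
}

// Like $filter (keep rewards with the slug) followed by $sum over one field.
// As with $sum, missing or non-numeric values are skipped.
function sumForSlug(doc, slug, field) {
  return (doc.rewards || [])
    .filter((r) => r.slug === slug)
    .map((r) => getPath(r, field))
    .filter((v) => typeof v === "number")
    .reduce((a, b) => a + b, 0);
}

// Like $addFields + $match { $gt: minTotal } + $sort { slugcount: -1 },
// returning the original documents (the final $project drops slugcount).
function rankBySlugTotal(docs, slug, field, minTotal) {
  return docs
    .map((d) => ({ d, total: sumForSlug(d, slug, field) }))
    .filter((x) => x.total > minTotal)
    .sort((a, b) => b.total - a.total)
    .map((x) => x.d);
}

// The rewards array from the example document in the question.
const exampleDoc = {
  rewards: [
    { slug: "example_slug", quantity: 4, estimated: { value: 18750 } },
    { slug: "example_slug", quantity: 1, estimated: { value: 100 } },
    { slug: "other_example", quantity: 1, estimated: { value: 100 } },
  ],
};

console.log(sumForSlug(exampleDoc, "example_slug", "quantity")); // 5
console.log(sumForSlug(exampleDoc, "example_slug", "estimated.value")); // 18850
```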
Something like this in Laravel:
ModelName::raw(function ($collection) {
return $collection->aggregate([
['$match' => ['expired' => ['$gte' => \Carbon\Carbon::now()->toDateTimeString()]]],
['$addFields' => [
'slugcount' =>
['$let' => [
'vars' => [
'mslug' => [
'$filter' => [
'input' => '$rewards',
'cond' => ['$eq' => ['$$this.slug','example_slug']]
]
]
],
'in' => ['$sum' => '$$mslug.quantity']
]]
]],
['$match' => ['slugcount'=> ['$gt' => X]]],
['$sort' => ['slugcount' => -1]],
['$project' => ['slugcount' => 0]]]);
});
You can replace quantity with estimated.value for the second aggregation.
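Since the two queries differ only in the field being summed, the pipeline can be generated from a few parameters. A minimal sketch in JavaScript, assuming you pass the resulting plain array to aggregate(); buildSlugPipeline is a hypothetical name:

```javascript
// Build the $addFields/$match/$sort/$project pipeline for a given slug,
// field to sum (e.g. "quantity" or "estimated.value") and threshold X.
function buildSlugPipeline(slug, field, minTotal) {
  return [
    {
      $addFields: {
        slugcount: {
          $let: {
            vars: {
              mslug: {
                $filter: { input: "$rewards", cond: { $eq: ["$$this.slug", slug] } },
              },
            },
            // "$$mslug.<field>" resolves to the array of that field's values.
            in: { $sum: `$$mslug.${field}` },
          },
        },
      },
    },
    { $match: { slugcount: { $gt: minTotal } } },
    { $sort: { slugcount: -1 } },
    { $project: { slugcount: 0 } },
  ];
}

console.log(JSON.stringify(buildSlugPipeline("example_slug", "estimated.value", 100)));
```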

Get Mongo _id as string and not ObjectId in find query without aggregation [duplicate]

Here I have created a collection with a single document
db.getCollection('example').insert({"example":1});
I tried using a projection, and I get back the _id:
db.getCollection('example').find({"example":1},{"_id":1});
{
"_id" : ObjectId("562a6300bbc948a4315f3abc")
}
However, I need the output shown below:
1. id and not _id
2. "562a6300bbc948a4315f3abc" instead of ObjectId("562a6300bbc948a4315f3abc")
{
"id" : "562a6300bbc948a4315f3abc"
}
Although I can process #1 and #2 on my app server (PHP based) to get the desired output, I am looking for a way to get the expected result directly from the query in MongoDB itself.
MongoDB 4.0 adds the $convert aggregation operator and the $toString alias which allows you to do exactly that:
db.getCollection('example').aggregate([
{ "$match": { "example":1 } },
{ "$project": { "_id": { "$toString": "$_id" } } }
])
The main use for this, though, would most likely be to use the _id value as a "key" in a document:
db.getCollection('example').insertOne({ "a": 1, "b": 2 })
db.getCollection('example').aggregate([
{ "$replaceRoot": {
"newRoot": {
"$arrayToObject": [
[{
"k": { "$toString": "$_id" },
"v": {
"$arrayToObject": {
"$filter": {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$ne": ["$$this.k", "_id"] }
}
}
}
}]
]
}
}}
])
Which would return:
{
"5b06973e7f859c325db150fd" : { "a" : 1, "b" : 2 }
}
Which clearly shows the string, as does the other example.
Generally, though, there is a way to do "transforms" on the cursor as documents are returned from the server. This is usually a good thing, since an ObjectId is a 12-byte binary value, whereas its 24-character hex "string" takes a lot more space.
The shell has a .map() method:
db.getCollection('example').find().map(d => Object.assign(d, { _id: d._id.valueOf() }) )
And NodeJS has a Cursor.map() which can do much the same thing:
let cursor = db.collection('example').find()
.map(({ _id, ...d }) => ({ _id: _id.toString(), ...d }));
while ( await cursor.hasNext() ) {
let doc = await cursor.next();
// do something
}
The same method exists in other drivers as well (just not PHP), or you can simply iterate the cursor and transform the content, which is most likely the best thing to do anyway.
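As a sketch of a transform you might pass to the Node.js driver's cursor map(), the function below renames _id to id and converts it to a string, which is exactly the output the question asks for. It assumes the Node.js driver's ObjectId, whose toString() returns the 24-character hex string (in the legacy mongo shell use .str or valueOf() instead). The name idAsString is hypothetical, and the test uses a stand-in object rather than a real ObjectId.

```javascript
// Rename _id to id and turn it into a string. With the Node.js driver's
// ObjectId, toString() returns the 24-character hex representation.
function idAsString({ _id, ...rest }) {
  return { id: _id.toString(), ...rest };
}

// Usage with the Node.js driver (not run here):
//   const cursor = db.collection('example').find({ example: 1 }).map(idAsString);
//   for await (const doc of cursor) { /* doc.id is a plain string */ }
```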
In fact, when working in the shell, whole cursor results can easily be reduced into a single object by simply appending this to any statement that returns a cursor:
.toArray().reduce((o,e) => {
var _id = e._id;
delete e._id;
return Object.assign(o, { [_id]: e })
},{ })
Or, for full ES6 JavaScript environments like Node.js (where toArray() returns a Promise):
(await cursor.toArray()).reduce((o, { _id, ...e }) => ({ ...o, [_id]: e }), { })
This is really simple stuff, without the complexity of processing in the aggregation framework, and it is possible in any language by much the same means.
You need to use the .aggregate() method.
db.getCollection('example').aggregate([ { "$project": { "_id": 0, "id": "$_id" } } ]);
Which yields:
{ "id" : ObjectId("562a67745488a8d831ce2e35") }
Or use the .str property:
db.getCollection('example').find({"example":1},{"_id":1}).map(function(doc) {
return {'id': doc._id.str }
})
Which returns:
[ { "id" : "562a67745488a8d831ce2e35" } ]
Well, if you are using the legacy PHP driver (MongoClient), you can do something like this:
$connection = new MongoClient();
$db = $connection->test;
$col = $db->example;
$cursor = $col->find([], ["_id" => 1]);
foreach($cursor as $doc) { print_r(array("id" => $doc["_id"])); }
Which yields:
Array
(
[id] => MongoId Object
(
[$id] => 562a6c60f850734c0c8b4567
)
)
Or, again, use the MongoCollection::aggregate method (note the single quotes, so PHP does not interpolate $project and $_id as variables):
$result = $col->aggregate(array(['$project' => ['id' => '$_id', '_id' => 0]]));
Then using the foreach loop:
Array
(
[_id] => MongoId Object
(
[$id] => 562a6c60f850734c0c8b4567
)
)
One simple solution for traversing a MongoCursor on the PHP side is to use generators with foreach, or array_map($function, iterator_to_array($cursor)).
Example:
function map_traversable(callable $mapper, \Traversable $iterator) {
foreach($iterator as $val) {
yield $mapper($val);
}
}
You can read more about generator syntax in the PHP documentation.
Now you can use/reuse it (or a similar implementation) for any purpose of "projecting" your data on the PHP side, with any number of mappings (just like a pipeline does in aggregate) but with fewer iterations. This solution is also convenient for OOP when reusing your map functions.
UPDATE:
Here is an example for your case:
$cursor = $db->getCollection('example')->find(["example" => 1], ["_id" => 1]);
$mapper = function($record) {
return array('id' => (string) $record['_id']); //see \MongoId::__toString()
};
$traversableWithIdAsStringApplied = map_traversable($mapper, $cursor);
//...
Now you can proceed with more mappings applied to $traversableWithIdAsStringApplied, or just use iterator_to_array for simple array retrieval.
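The same lazy-mapping idea carries over to JavaScript generators. A small sketch, assuming any iterable source (an array stands in for a cursor here, with plain string ids); mapIterable is a hypothetical name mirroring map_traversable:

```javascript
// Lazily apply `mapper` to each item of any iterable, one item at a time,
// much like map_traversable does with a PHP Traversable.
function* mapIterable(mapper, iterable) {
  for (const item of iterable) {
    yield mapper(item);
  }
}

// An array stands in for a cursor here; the ids are plain strings.
const records = [
  { _id: "562a6300bbc948a4315f3abc", example: 1 },
  { _id: "562a67745488a8d831ce2e35", example: 1 },
];

// Mappers can be chained; nothing runs until the result is consumed.
const withStringIds = mapIterable((r) => ({ id: String(r._id) }), records);
const withLength = mapIterable((r) => ({ ...r, idLength: r.id.length }), withStringIds);
console.log([...withLength]); // materialize, like iterator_to_array
```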

Convert ObjectID to String in mongo Aggregation

I'm in this scenario right now:
I have a collection X:
{
_id:ObjectId('56edbb4d5f084a51131dd4c6'),
userRef:ObjectId('56edbb4d5f084a51131dd4c6'),
serialNumber:'A123123',
...
}
I need to aggregate all documents, grouping them by userRef + serialNumber, so I'm trying to use $concat like this:
$group: {
_id: {
'$concat': ['$userRef','-','$serialNumber']
},
...
So basically, in my MongoDB aggregation I need to group documents by the concatenation of an ObjectId and a string. However, it seems that $concat only accepts strings as parameters:
uncaught exception: aggregate failed: {
"errmsg" : "exception: $concat only supports strings, not OID",
"code" : 16702,
"ok" : 0
}
Is there a way to convert an ObjectId to a String within an aggregation expression?
EDIT:
This question is related, but the solution doesn't fit my problem (especially because I can't use ObjectId.toString() during the aggregation).
Indeed, I couldn't find any ObjectId().toString() operation in Mongo's documentation, but I wonder if there's some trick that can be used in this case.
In MongoDB 4.0 and later you can use the $toString aggregation operator, which simply converts an ObjectId to a string:
db.collection.aggregate([
{ "$addFields": {
"userRef": { "$toString": "$userRef" }
}},
{ "$group": {
"_id": { "$concat": ["$userRef", "-", "$serialNumber"] }
}}
])
I couldn't find a way to do what I wanted, so instead I created a MapReduce function that generated the keys the way I wanted (by concatenating other keys).
In the end, it looked something like this:
db.collection('myCollection').mapReduce(
function() {
emit(
this.userRef.str + '-' + this.serialNumber , {
count: 1,
whateverValue1:this.value1,
whateverValue2:this.value2,
...
}
)
},
function(key, values) {
var reduce = {}
.... my reduce function....
return reduce
}, {
query: {
...filters_here....
},
out: 'name_of_output_collection'
}
);
You can simply use $toString to make $concat work on ObjectIds in an aggregation, like this:
$group: {
'_id': {
'$concat': [
{ '$toString' : '$userRef' },
'-',
{ '$toString' : '$serialNumber'}
]
},
}
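To make the effect of the $toString/$concat key concrete, here is a plain JavaScript sketch that groups documents in memory by the same "userRef-serialNumber" string and counts them. It runs without MongoDB; the sample ids are strings standing in for ObjectIds, and groupByUserSerial is a hypothetical name.

```javascript
// Group documents by `${userRef}-${serialNumber}`, mirroring
// { $concat: [ { $toString: "$userRef" }, "-", "$serialNumber" ] } as the $group _id.
function groupByUserSerial(docs) {
  const groups = new Map(); // a Map keeps first-seen order of keys
  for (const doc of docs) {
    const key = `${String(doc.userRef)}-${doc.serialNumber}`;
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  // Shape the result like $group output with a { $sum: 1 } counter.
  return [...groups].map(([_id, count]) => ({ _id, count }));
}

console.log(groupByUserSerial([
  { userRef: "56edbb4d5f084a51131dd4c6", serialNumber: "A123123" },
  { userRef: "56edbb4d5f084a51131dd4c6", serialNumber: "A123123" },
]));
```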
You might try to solve it by using an array that contains both fields:
{$project:{newkey:['$userRef','$serialNumber']}}, {$match:{newkey:{$in:filterArray}}}
This matches the data on both fields against the filter. Note that the values in the newkey array must have the same data types as the filterArray elements.
You can use $substr (https://docs.mongodb.com/manual/reference/operator/aggregation/substr/#exp._S_substr) to cast a value to a string before $concat.
This is a sample of code that's working for me.
group_id_i['_id'] = {
'$concat' => [
{ '$substr' => [ {'$year' => '$t'}, 0, -1] }, '-',
{ '$substr' => [ {'$month' => '$t'}, 0, -1] }, '-',
{ '$substr' => [ {'$dayOfMonth' => '$t'}, 0, -1] }
]
}
Where t is a DateTime field. This aggregation returns data like so:
{
"_id" => "2016-9-28",
"i" => 2
}

Map Reduce Mongo DB: Sum of ODD and EVEN numbers with elements

I am trying to process a number series (a collection) to get the sums of odd and even numbers separately, along with the elements considered in each calculation.
The numberseries document structure is as follows:
{
_id: <Autogenerated>,
number: <any number, it can repeat. Even if it repeats, it should be added each time. >
}
The output should be something like the following (not exact, but in general):
{
..
{
"odd":<result>, elements:{n1,n3,n5}
},
{
"even":<result>, elements:{n2,n4,n6}
}
..
}
Map Function:
mapf = function(){
var value = { sum : 0, elements :[] };
value.sum = this.number;
value.elements.push(this.number);
print(tojson(value));
if( this.number % 2 != 0 ){
emit( "odd", value );
}
if( this.number % 2 == 0 ){
emit( "even", value );
}
}
Reduce values argument:
values is an array of the JSON objects emitted from map:
[{
"sum": 1,
"elements": [1]
}, {
"sum": 3,
"elements": [3]
} ... ]
Reduce Function:
reducef = function(key, values){
var result = { sum : 0 , elements:[] };
print("K " + key +"Values array " + tojson(values) );
for(var i = 0; i<values.length;i++ ){
v = values[i];
print("Key "+key+"V.JSON"+tojson(v)+" V.SUM -> "+v.sum);
result.sum += v.sum;
result.elements.push(v.elements[0]);
print(tojson(result));
}
return result;
}
I am getting sum correctly, but the elements array is not properly getting populated. It is containing only some of the elements considered for calculations.
UPDATE
As per the answer given by Neil, I further verified my code. I found that my code, without any modification, works for small datasets but does not work for large datasets.
Below are the points I verified, as suggested; I found my code to be correct.
print("K " + key +"Values array " + tojson(values) );
Above line in reduce function results in following values object printed.
[{
"sum": 1,
"elements": [1]
}, {
"sum": 3,
"elements": [3]
}, {
"sum": 5,
"elements": [5]
}, {
"sum": 7,
"elements": [7]
}, {
"sum": 9,
"elements": [9]
}, {
"sum": 11,
"elements": [11]
}, {
"sum": 13,
"elements": [13]
}, {
"sum": 15,
"elements": [15]
}, {
"sum": 17,
"elements": [17]
}, {
"sum": 19,
"elements": [19]
}]
Hence the line to push elements to array in final results result.elements.push(v.elements[0]); should be correct.
In map function, before emitting, I am modifying value.sum as follows
value.sum = this.number;
This ensures that sum is not zero and numbers are properly getting added due to this.
When I test this code with 20 records, 40 records, 100 records, it works perfectly.
When I test this code with 20,000 records, the sum value is correct, but the elements arrays do not contain 10,000 elements each (odd and even numbers are equally distributed in the collection).
In the latter case, I get the message below:
query not recording (too large)
Okay, there is a clear reason and you do appear to have read some of the documentation and at least applied this rule:
"the type of the return object must be identical to the type of the value emitted by the map function ..."
This means that the map function and the reduce function must essentially produce output of the same shape, which you did:
{ sum : 0, elements :[] };
But there was a piece of documentation that has not been understood:
"MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key."
So where the whole thing goes wrong is that you have assumed that since your "map" function only emits one element, that then there will be only one element in the "elements" array. A careful re-read of the above says that this is not true. And in fact the output from "reduce" will very likely be fed back into the "reduce" function again. This is indeed how mapReduce deals with a large number of values for the "values" array.
To fix it, change this in the "reduce" function:
result.elements.push(v.elements[0]);
To this:
v.elements.forEach(function(element) {
result.elements.push(element);
});
And in that way, when the "reduce" function returns a result that has summed up a few "elements" already and pushed them to the list, then that "input" will be processed correctly and merged with any other "values" that come in with it.
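The effect can be reproduced without a server. The sketch below is plain JavaScript: it runs the original reducer and the corrected one over emitted values, feeding a partial reduce result back in together with the remaining values, as the documentation says MongoDB may do. How the server actually batches values is not modelled; the split into the first three values and the rest is an arbitrary assumption for illustration.

```javascript
// Original reducer: keeps only the first element of each incoming value.
function reduceOriginal(key, values) {
  const result = { sum: 0, elements: [] };
  for (const v of values) {
    result.sum += v.sum;
    result.elements.push(v.elements[0]);
  }
  return result;
}

// Corrected reducer: copies every element, so re-reduced input is not lost.
function reduceFixed(key, values) {
  const result = { sum: 0, elements: [] };
  values.forEach((v) => {
    result.sum += v.sum;
    v.elements.forEach((element) => result.elements.push(element));
  });
  return result;
}

// Simulate a re-reduce: reduce part of the values first, then reduce that
// partial result together with the remaining emitted values.
function simulate(reducer, emitted) {
  const partial = reducer("odd", emitted.slice(0, 3));
  return reducer("odd", [partial, ...emitted.slice(3)]);
}

// Values as emitted by the mapper for the odd numbers 1..9.
const emitted = [1, 3, 5, 7, 9].map((n) => ({ sum: n, elements: [n] }));
console.log(simulate(reduceOriginal, emitted)); // elements lose 3 and 5
console.log(simulate(reduceFixed, emitted)); // all five elements survive
```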
BTW, I think you actually meant this in your mapper:
var value = { sum : 1, elements :[] };
Otherwise this code down here would just be summing 0's:
result.sum += v.sum;
But aggregate does this better
All of that said the following aggregation framework statement does the same thing but better and faster with an implementation in native code:
db.collection.aggregate([
{ "$project": {
"type": { "$cond": [
{ "$eq": [ { "$mod": [ "$number", 2 ] }, 0 ] },
"even",
"odd"
]},
"number": 1
}},
{ "$group": {
"_id": "$type",
"sum": { "$sum": 1 },
"elements": { "$push": "$number" }
}}
])
Also note that in both cases you are not really "summing the elements", but rather "counting" them. So if you want the sum, then the mapReduce part becomes:
//result.sum += v.sum;
v.elements.forEach(function(element) {
result.sum += element;
result.elements.push(element);
});
And the aggregate part becomes:
{ "$group": {
"_id": "$type",
"sum": { "$sum": "$number" },
"elements": { "$push": "$number" }
}}
Which truly sums the "odd" or "even" numbers as found in your collection.
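For checking the pipeline's output against a small sample, here is a plain JavaScript sketch that computes the same "odd"/"even" grouping in memory, with both the count that { "$sum": 1 } gives and the true sum that { "$sum": "$number" } gives. It only illustrates the semantics; oddEvenSummary is a hypothetical name.

```javascript
function oddEvenSummary(numbers) {
  const out = {};
  for (const n of numbers) {
    // Same test as { $eq: [ { $mod: [ "$number", 2 ] }, 0 ] } for integers.
    const type = n % 2 === 0 ? "even" : "odd";
    out[type] = out[type] || { _id: type, count: 0, sum: 0, elements: [] };
    out[type].count += 1; // what { $sum: 1 } produces
    out[type].sum += n; // what { $sum: "$number" } produces
    out[type].elements.push(n); // what { $push: "$number" } produces
  }
  return Object.values(out);
}

console.log(oddEvenSummary([1, 2, 3, 4, 5]));
```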

Conditional $inc in a nested MongoDB array

My database looks like this:
{
_id: 1,
values: [ 1, 2, 3, 4, 5 ]
},
{
_id: 2,
values: [ 2, 4, 6, 8, 10 ]
}, ...
I'd like to update every value in every document's nested array ("values") that meets some criterion. For instance, I'd like to increment every value that's >= 4 by one, which ought to yield:
{
_id: 1,
values: [ 1, 2, 3, 5, 6 ]
},
{
_id: 2,
values: [ 2, 5, 7, 9, 11 ]
}, ...
I'm used to working with SQL, where the nested array would be a separate table connected with a unique ID. I'm a little lost in this new NoSQL world.
Thank you kindly,
This sort of update is not really possible using nested arrays. The reason is given in the positional $ operator documentation, which states that you can only match the first array element for a given condition in the query.
So a statement like this:
db.collection.update(
{ "values": { "$gte": 4 } },
{ "$inc": { "values.$": 1 } }
)
Will not work in the sense that only the "first" array element that was matched would be incremented. So on your first document you would get this:
{ "_id" : 1, "values" : [ 1, 2, 3, 5, 5 ] }
In order to update the values as you are suggesting you would need to iterate the documents and the array elements to produce the result:
db.collection.find({ "values": { "$gte": 4 } }).forEach(function(doc) {
for ( var i=0; i < doc.values.length; i++ ) {
if ( doc.values[i] >= 4 ) {
doc.values[i]++;
}
}
db.collection.update(
{ "_id": doc._id },
{ "$set": { "values": doc.values } }
);
})
Or whatever the code equivalent of that basic concept is in your language.
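One way to avoid a round trip per document, sketched in JavaScript for the Node.js driver's bulkWrite(), is to compute the new arrays first and send them as updateOne operations in a single batch. The helper name buildIncrementOps and the collection name in the usage comment are hypothetical; the { updateOne: { filter, update } } shape is the driver's bulk-write operation format. Like the loop above, this replaces whole arrays, so concurrent writes to the same documents between the read and the write would be overwritten.

```javascript
// Compute the incremented array for each matching document and turn it into
// a bulk-write operation that replaces the whole "values" array.
function buildIncrementOps(docs, threshold = 4) {
  return docs
    .filter((doc) => doc.values.some((v) => v >= threshold))
    .map((doc) => ({
      updateOne: {
        filter: { _id: doc._id },
        update: {
          $set: { values: doc.values.map((v) => (v >= threshold ? v + 1 : v)) },
        },
      },
    }));
}

// Usage with the Node.js driver (not run here):
//   const coll = db.collection('collection');
//   const docs = await coll.find({ values: { $gte: 4 } }).toArray();
//   const ops = buildIncrementOps(docs);
//   if (ops.length > 0) await coll.bulkWrite(ops); // skip empty batches
```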
Generally speaking, this sort of update does not lend itself well to a structure that contains elements in an array. If that is really your need, then the elements are better off listed within a separate collection.
Then again, this question is presented as more of a "hypothetical" situation, without explaining your actual use case for performing this sort of update. So if you describe what you actually need to do and how your data really looks in another question, that might get a more meaningful response in terms of the best approach for you to use.