Aggregation of data in MongoDB collections - mongodb

In my current project I have a structure like this:
-
  event_id: 1
  attr1: val1
  attr2: val2
  topics:
    -
      topic_id: 1
      attr1: val1
    -
      topic_id: 2
      attr1: val2
-
  event_id: 2
  attr1: val3
  attr2: val4
  topics:
    -
      topic_id: 1
      attr1: val7
    -
      topic_id: 3
      attr1: val6
    -
      topic_id: 4
      attr1: val8
I need to be able to present data in two alternative views:
{ event_id: 1, attr1: val1, attr2: val2, topic_count: 2 },
{ event_id: 2, attr1: val3, attr2: val4, topic_count: 3 }
And:
{ topic_id: 1, event_id: 2, event_attr1: val3, e: [ { event_id: 1, e1_attr... }, { event_id: 2, e_attr... } ], t_attr... },
{ topic_id: 2, event_id: 1, e_attr1: val1, e: [ { event_id: 1, e1_attr... } ], t_attr... },
{ topic_id: 3, event_id: 2, e_attr1: val3, e: [ { event_id: 2, e2_attr... } ], t_attr... },
{ topic_id: 4, event_id: 2, e_attr1: val3, e: [ { event_id: 2, e2_attr... } ], t_attr... }
I am using the aggregation pipeline to produce these views, and it works fine, but MongoDB gets slow and memory-hungry when sorting, filtering and paginating over bigger data sets. One idea was to create a second, redundant structure like this (attributes omitted below):
-
  t_id: 1
  e:
    -
      e_id: 1
    -
      e_id: 2
-
  t_id: 2
  e:
    -
      e_id: 1
-
  t_id: 3
  e:
    -
      e_id: 2
-
  t_id: 4
  e:
    -
      e_id: 2
and to update both collections simultaneously on every write. Another option was to filter the collection both before and after $project and $unwind, but that makes writing filters more difficult.
Another approach I've been considering is a flat list, like:
{ event_id: 1, topic_id: 1, event1_attrs: ... , topic1_attrs: ... },
{ event_id: 1, topic_id: 2, event1_attrs: ... , topic2_attrs: ... },
{ event_id: 2, topic_id: 1, event2_attrs: ... , topic1_attrs: ... },
{ event_id: 2, topic_id: 3, event2_attrs: ... , topic3_attrs: ... },
{ event_id: 2, topic_id: 4, event2_attrs: ... , topic4_attrs: ... }
But I'm a bit afraid of the grouping performance.
Do any of these approaches make sense, or is emulating a searchable, sortable and paginable many-to-many relationship in MongoDB utterly stupid and unachievable on larger data sets? I've been browsing the web for an answer to that question, and I found lots of voices on both sides.
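For reference, the two views described above could be written as aggregation pipelines roughly as follows. This is a minimal sketch under stated assumptions, not a tuned solution: the filter value `topicFilter`, the page size, and the collection name `events` used in the note below are all hypothetical, and only `attr1`/`attr2` from the sample documents are carried through. The `$match` before `$unwind` discards whole events early (and can use an index on `topics.topic_id` if one exists); the second `$match` removes the other topics of the events that were kept.

```javascript
// Hypothetical filter value; in practice it would come from the request.
const topicFilter = { "topics.topic_id": 1 };

// View 1: one row per event with the number of embedded topics.
// Note: $size raises an error if `topics` is missing or not an array.
const eventView = [
  {
    $project: {
      _id: 0,
      event_id: 1,
      attr1: 1,
      attr2: 1,
      topic_count: { $size: "$topics" },
    },
  },
];

// View 2: one row per topic with the events that reference it.
const topicView = [
  { $match: topicFilter }, // before $unwind: drops whole events early
  { $unwind: "$topics" },
  { $match: topicFilter }, // after $unwind: drops non-matching topics
  {
    $group: {
      _id: "$topics.topic_id",
      e: { $push: { event_id: "$event_id", attr1: "$attr1", attr2: "$attr2" } },
    },
  },
  { $sort: { _id: 1 } },
  { $skip: 0 },   // hypothetical page offset
  { $limit: 20 }, // hypothetical page size
];
```

In the mongo shell these would be run as `db.events.aggregate(eventView)` and `db.events.aggregate(topicView)`, assuming the collection is called `events`.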

Related

How can I have array filter count from nested array

Data:
_id: ObjectId(''),
restaurantId: ObjectId(''),
orderId: ObjectId(''),
reviews: [
{
type: "food",
tags: ["good food", "nice food"]
},
{
type: "pricing",
tags: ["best price", "good price"]
}
]
Group by restaurant id
Get total reviews by type: Ex: food: 3, pricing: 2, ambience: 1
Group by type and tags and get counts:
Ex: food-superb: 1, food-loudt:2, pricing-superb: 2, ambience-superb: 1
Expected result:
{
_id: restaurantId,
types: {
food: 4,
....
},
tags: {
nicePrice: 20
}
}
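The counting rules listed above can be sketched in plain JavaScript over in-memory documents. This is not an aggregation pipeline, only an illustration of the intended grouping; the function name `summarizeReviews`, the sample values, and the `type-tag` key format (taken from the "food-superb" style in the example) are assumptions.

```javascript
// Count reviews per type and per "type-tag" pair for each restaurant.
// `docs` stands in for the collection's documents (shape as shown above).
function summarizeReviews(docs) {
  const byRestaurant = new Map();
  for (const doc of docs) {
    const key = String(doc.restaurantId);
    if (!byRestaurant.has(key)) {
      byRestaurant.set(key, { _id: doc.restaurantId, types: {}, tags: {} });
    }
    const summary = byRestaurant.get(key);
    for (const review of doc.reviews || []) {
      // One count per review entry of this type.
      summary.types[review.type] = (summary.types[review.type] || 0) + 1;
      // One count per tag, keyed by type and tag together.
      for (const tag of review.tags || []) {
        const tagKey = `${review.type}-${tag}`;
        summary.tags[tagKey] = (summary.tags[tagKey] || 0) + 1;
      }
    }
  }
  return Array.from(byRestaurant.values());
}

// Hypothetical sample data with string ids instead of ObjectIds.
const sample = [
  {
    restaurantId: "r1",
    reviews: [
      { type: "food", tags: ["good food", "nice food"] },
      { type: "pricing", tags: ["best price"] },
    ],
  },
  { restaurantId: "r1", reviews: [{ type: "food", tags: ["good food"] }] },
];

console.log(JSON.stringify(summarizeReviews(sample), null, 2));
```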

get 2 field count for further calculation using mongo query

I have the two collections below:
**aggregate collection**
{
  _id: '1',
  test_name: 'test1',
  status: "flaked",
  test_ids: [1, 2]
}
{
  _id: '2',
  test_name: 'test1',
  status: "Flaked",
  test_ids: [3, 4]
}
**test collection**
{
  _id: 1,
  run_id: 1,
  test_name: 'test1',
  status: "fail",
  pipeline: "A"
}
{
  _id: 2,
  run_id: 2,
  test_name: 'test1',
  status: "pass",
  pipeline: "A"
}
{
  _id: 3,
  run_id: 3,
  test_name: 'test1',
  status: "Infra",
  pipeline: "B"
}
{
  _id: 4,
  run_id: 4,
  status: "pass",
  test_name: 'test1',
  pipeline: "B"
}
I am trying to compute the percentage of times the test flaked, i.e. number of flakes / number of runs. The data above should give me 1/4.
A test is considered a flake only when the aggregate result is flaked and one of its tests failed. Can this be built with one query?
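To make the rule concrete, here is a minimal in-memory sketch of the calculation in plain JavaScript; it is not a MongoDB query. It assumes that status comparison is case-insensitive (the sample has both "flaked" and "Flaked"), that "failed" means a test status of "fail", and that the number of runs is the number of documents in the test collection. The function name `flakeRatio` is hypothetical.

```javascript
const aggregates = [
  { _id: "1", test_name: "test1", status: "flaked", test_ids: [1, 2] },
  { _id: "2", test_name: "test1", status: "Flaked", test_ids: [3, 4] },
];

const tests = [
  { _id: 1, run_id: 1, test_name: "test1", status: "fail", pipeline: "A" },
  { _id: 2, run_id: 2, test_name: "test1", status: "pass", pipeline: "A" },
  { _id: 3, run_id: 3, test_name: "test1", status: "Infra", pipeline: "B" },
  { _id: 4, run_id: 4, test_name: "test1", status: "pass", pipeline: "B" },
];

// An aggregate counts as a flake when its status is "flaked" (any case)
// and at least one of the tests it references has status "fail".
function flakeRatio(aggregates, tests) {
  const testsById = new Map(tests.map((t) => [t._id, t]));
  const flakes = aggregates.filter(
    (agg) =>
      agg.status.toLowerCase() === "flaked" &&
      agg.test_ids.some((id) => {
        const t = testsById.get(id);
        return t !== undefined && t.status.toLowerCase() === "fail";
      })
  ).length;
  const runs = tests.length;
  return { flakes, runs, ratio: runs === 0 ? 0 : flakes / runs };
}

console.log(flakeRatio(aggregates, tests)); // { flakes: 1, runs: 4, ratio: 0.25 }
```

Only the first aggregate qualifies: the second is flaked too, but neither of its tests (3 and 4) has failed.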

Advanced MongoDB Query needed

I am new to MongoDB queries, and I understand the basics of find. However I haven't yet come to grips with $lookup, $project, $aggregate, $match, etc, which is pretty much needed to do anything fancy. And I need a fancy query 😃 See tricky issue below:
Collections:
schema:
{ _id: 1, name: "Schema1" }
{ _id: 2, name: "Schema2" }
device:
{ _id: 1, schema: 1, organisation: 2 }
{ _id: 2, schema: 1, organisation: 2 }
{ _id: 3, schema: 2, organisation: 2 }
field:
{ _id: 1, organisation: 2, name: "cost", displayType: "number" }
{ _id: 2, organisation: 2, name: "retail", displayType: "number" }
{ _id: 3, organisation: 2, name: "project", displayType: "string" }
fieldvalue:
{ _id: 1, device: 1, field: 1, organisation: 2, value: 2000 }
{ _id: 2, device: 1, field: 2, organisation: 2, value: 3000 }
{ _id: 3, device: 2, field: 1, organisation: 2, value: 1000 }
{ _id: 4, device: 2, field: 2, organisation: 2, value: 1000 }
{ _id: 5, device: 3, field: 1, organisation: 2, value: 500 }
{ _id: 6, device: 1, field: 3, organisation: 2, value: "Project1" }
{ _id: 7, device: 2, field: 3, organisation: 2, value: "Project2" }
{ _id: 8, device: 3, field: 3, organisation: 2, value: "Project2" }
I want to query FieldValue.
I want to $sum all "value" where field is 1,
BUT only those which share a device with field 3 having the value "Project2".
As parameters I have:
The id of the field I want to sum, e.g. 1 (cost)
The id of the "project" field, e.g. 3 (project)
The value of the "project" field (id 3) which must also be met in order to qualify for summation.
So my query should only $sum the "value" of ids 3 and 5.
How can I do that in a Mongo query?
And would it be possible to add more constraints? E.g. the "schema" of the "device" must be 2, which should then result in a $sum of just id 5.
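The selection logic described above can be sketched in plain JavaScript over the sample documents. This illustrates the intended result, not a MongoDB query; the function name `sumForProject` and its option names are hypothetical. The optional schema constraint is shown with schema 2, which is the schema of device 3 in the sample data.

```javascript
const devices = [
  { _id: 1, schema: 1, organisation: 2 },
  { _id: 2, schema: 1, organisation: 2 },
  { _id: 3, schema: 2, organisation: 2 },
];

const fieldValues = [
  { _id: 1, device: 1, field: 1, organisation: 2, value: 2000 },
  { _id: 2, device: 1, field: 2, organisation: 2, value: 3000 },
  { _id: 3, device: 2, field: 1, organisation: 2, value: 1000 },
  { _id: 4, device: 2, field: 2, organisation: 2, value: 1000 },
  { _id: 5, device: 3, field: 1, organisation: 2, value: 500 },
  { _id: 6, device: 1, field: 3, organisation: 2, value: "Project1" },
  { _id: 7, device: 2, field: 3, organisation: 2, value: "Project2" },
  { _id: 8, device: 3, field: 3, organisation: 2, value: "Project2" },
];

function sumForProject(fieldValues, devices, { sumField, projectField, projectValue, schema }) {
  const schemaByDevice = new Map(devices.map((d) => [d._id, d.schema]));
  // Step 1: devices whose "project" field holds the requested value.
  const matchingDevices = new Set(
    fieldValues
      .filter((fv) => fv.field === projectField && fv.value === projectValue)
      .map((fv) => fv.device)
  );
  // Step 2: sum the requested field over those devices,
  // optionally restricted to devices with the given schema.
  return fieldValues
    .filter((fv) => fv.field === sumField && matchingDevices.has(fv.device))
    .filter((fv) => schema === undefined || schemaByDevice.get(fv.device) === schema)
    .reduce((total, fv) => total + fv.value, 0);
}

const params = { sumField: 1, projectField: 3, projectValue: "Project2" };
console.log(sumForProject(fieldValues, devices, params)); // 1500 (ids 3 and 5)
console.log(sumForProject(fieldValues, devices, { ...params, schema: 2 })); // 500 (id 5)
```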

Sort 3 columns in different sort directions using Mongo Middleware

When using Mongo Middleware, you can configure multiple columns for ascending or descending order, as shown in this example:
You may also specify ascending and descending sorts together:
var options = {
  sort : {
    asc : 'name',
    desc : ['birthday', 'home']
  }
};
The problem I wish to solve is how to have each column with its own configured sort direction.
The Middleware sort configuration only seems to support two configuration nodes (asc & desc).
There is no way to do something like the following.
var options = {
  sort : {
    name: 'asc',
    birthday: 'desc',
    home: 'asc'
  }
};
Well, it would seem that all you really need to do is call the normal .sort() method from Mongoose instead of the .order() method of the plugin:
var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;
require('mongoose-middleware').initialize(mongoose);
mongoose.connect('mongodb://localhost/test');
var testSchema = new Schema({
  a: Number,
  b: Number,
  c: Number
});
var Test = mongoose.model( 'Test', testSchema );
async.series(
  [
    function(callback) {
      Test.remove({},callback);
    },
    function(callback) {
      async.each(
        [
          { a: 1, b: 2, c: 3 },
          { a: 2, b: 1, c: 4 },
          { a: 4, b: 3, c: 1 },
          { a: 2, b: 2, c: 1 }
        ],
        function(item,callback) {
          Test.create(item,callback);
        },
        callback
      );
    },
    function(callback) {
      Test.find().sort("a -b c").page({
        start: 0,
        count: 3
      },function(err,docs) {
        console.log(docs);
        callback(err);
      });
    }
  ],
  function(err) {
    if (err) throw err;
    mongoose.disconnect();
  }
);
Which returns the results affected by the middleware itself as per normal:
{ options: { start: 0, count: 3 },
results:
[ { _id: 55da9c9593d2c4ed0d26cf79, a: 1, b: 2, c: 3, __v: 0 },
{ _id: 55da9c9593d2c4ed0d26cf7c, a: 2, b: 2, c: 1, __v: 0 },
{ _id: 55da9c9593d2c4ed0d26cf7a, a: 2, b: 1, c: 4, __v: 0 } ],
total: 4 }
So the standard method for affecting the cursor seems to be the most appropriate to use in this case.
If you really want, then just put in something that reflects the pagination options into the configuration. It apparently does not matter unless the specific .order() method is actually called:
var async = require('async'),
    mongoose = require('mongoose'),
    Schema = mongoose.Schema;
require('mongoose-middleware').initialize(mongoose);
mongoose.connect('mongodb://localhost/test');
var testSchema = new Schema({
  a: Number,
  b: Number,
  c: Number
});
var Test = mongoose.model( 'Test', testSchema );
async.series(
  [
    function(callback) {
      Test.remove({},callback);
    },
    function(callback) {
      async.each(
        [
          { a: 1, b: 2, c: 3 },
          { a: 2, b: 1, c: 4 },
          { a: 4, b: 3, c: 1 },
          { a: 2, b: 2, c: 1 }
        ],
        function(item,callback) {
          Test.create(item,callback);
        },
        callback
      );
    },
    function(callback) {
      Test.find().sort({ a: 1, b: -1, c: 1 }).page({
        start: 0,
        count: 3,
        sort: {
          a: 1,
          b: -1,
          c: 1,
        }
      },function(err,docs) {
        console.log(docs);
        callback(err);
      });
    }
  ],
  function(err) {
    if (err) throw err;
    mongoose.disconnect();
  }
);
And the "options" are really just "dumped out":
{ options: { start: 0, count: 3, sort: { a: 1, b: -1, c: 1 } },
results:
[ { _id: 55da9e12c74b0e0b0eeec3cf, a: 1, b: 2, c: 3, __v: 0 },
{ _id: 55da9e12c74b0e0b0eeec3d2, a: 2, b: 2, c: 1, __v: 0 },
{ _id: 55da9e12c74b0e0b0eeec3d0, a: 2, b: 1, c: 4, __v: 0 } ],
total: 4 }
Where { a: 1, b: -1, c: 1 } is a standard MongoDB form for a supported sort operation despite the extended syntax made available by mongoose. So it is still valid.
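If the per-column configuration from the question is still wanted as input, a small helper can translate it into the sort forms used above. This is a minimal sketch; `toSortSpec` and `toSortString` are hypothetical helper names and are not part of Mongoose or the middleware plugin. The sort priority follows the key order of the input object.

```javascript
// Convert { name: 'asc', birthday: 'desc' } into { name: 1, birthday: -1 }.
function toSortSpec(directions) {
  const spec = {};
  for (const [field, direction] of Object.entries(directions)) {
    if (direction !== "asc" && direction !== "desc") {
      throw new Error(`Unknown sort direction for ${field}: ${direction}`);
    }
    spec[field] = direction === "asc" ? 1 : -1;
  }
  return spec;
}

// Same information as a space-separated sort string, e.g. "name -birthday home".
function toSortString(directions) {
  return Object.entries(toSortSpec(directions))
    .map(([field, dir]) => (dir === 1 ? field : `-${field}`))
    .join(" ");
}

const sortOptions = { name: "asc", birthday: "desc", home: "asc" };
console.log(toSortSpec(sortOptions));   // { name: 1, birthday: -1, home: 1 }
console.log(toSortString(sortOptions)); // name -birthday home
```

Either result could then be passed to the standard cursor method, for example `Test.find().sort(toSortSpec(sortOptions))`.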

In an array of objects, how can I aggregate based on object property?

Say I have the following array of objects:
dataArray = [
{ id: "a", score: 1 },
{ id: "b", score: 2 },
{ id: "c", score: 5 },
...
{ id: "a", score: 3 },
...
{ id: "c", score: 2},
...
]
How can I obtain a resultArray like the following:
resultArray = [
{ id: "a", score: sum of all the scores when id is a },
{ id: "b", score: sum of all the scores when id is b },
...
...
]
If you use the underscore library:
_.map _.groupBy(dataArray, 'id'), (v, k) ->
  {id: k, score: _.reduce(v, ((m, i) -> m + i['score']), 0) }
The Underscore version is probably the most succinct. This is a plain CoffeeScript version that only creates one auxiliary object to have fast access by id and make the whole thing O(n):
aggregateScores = (dataArr) ->
  scores = {}
  for {id, score} in dataArr
    scores[id] = (scores[id] or 0) + score
  {id, score} for id, score of scores
console.log aggregateScores [
  { id: "a", score: 1 }
  { id: "b", score: 2 }
  { id: "c", score: 5 }
  { id: "a", score: 3 }
  { id: "c", score: 2 }
]
# Output:
# [{id:"a", score:4}, {id:"b", score:2}, {id:"c", score:7}]
This is just plain JavaScript, but here is the long answer to your question:
function aggregate(values, init, keyGetter, valueGetter, aggregator) {
  var results = {}
  for (var index = 0; index != values.length; ++index) {
    var value = values[index]
    var key = keyGetter(value)
    var soFar;
    if (key in results) {
      soFar = results[key]
    } else {
      soFar = init
    }
    value = valueGetter(value)
    results[key] = aggregator(soFar, value)
  }
  return results
}
var array = [
  { id: 'a', score: 1 },
  { id: 'b', score: 2 },
  { id: 'c', score: 5 },
  { id: 'a', score: 3 },
  { id: 'c', score: 2 }
]
function keyGetter(value) {
  return value.id
}
function valueGetter(value) {
  return value.score
}
function aggregator(sum, value) {
  return sum + value
}
function ready() {
  var results = aggregate(array, 0, keyGetter, valueGetter, aggregator)
  console.info(results)
}
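Note that the function above returns an object keyed by id rather than the array the question asks for. A compact sketch of the same single-pass idea that returns the requested array shape could look like this; the function name `sumScoresById` is hypothetical, and it assumes every item has a numeric `score`.

```javascript
// Sum scores per id in one pass, then convert the { id: total } map
// into an array of { id, score } objects.
function sumScoresById(items) {
  const totals = {};
  for (const { id, score } of items) {
    totals[id] = (totals[id] || 0) + score;
  }
  return Object.keys(totals).map((id) => ({ id, score: totals[id] }));
}

const scoreData = [
  { id: "a", score: 1 },
  { id: "b", score: 2 },
  { id: "c", score: 5 },
  { id: "a", score: 3 },
  { id: "c", score: 2 },
];

console.log(sumScoresById(scoreData));
// [ { id: 'a', score: 4 }, { id: 'b', score: 2 }, { id: 'c', score: 7 } ]
```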
Here's a straightforward CoffeeScript version:
data = [
  { id: "a", score: 1 }
  { id: "b", score: 2 }
  { id: "a", score: 5 }
  { id: "c", score: 2 }
  { id: "b", score: 3 }
]
# Aggregate scores in a map.
resultSet = {}
for obj in data
  resultSet[obj.id] ?= 0
  resultSet[obj.id] += obj.score
console.log resultSet
# Create array from map.
resultArr = for key, val of resultSet
  { id: key, score: val}
console.log resultArr
The output is:
{ a: 6, b: 5, c: 2 }
[ { id: 'a', score: 6 },
{ id: 'b', score: 5 },
{ id: 'c', score: 2 } ]
I'm sure it's possible to create a fancier solution using the functions in Underscore, but the CoffeeScript solution isn't bad, so I went for something simple to understand.
It's a bit of overkill if this is the only aggregation you want to do, but there is a nicely documented aggregation library called Lumenize that does simple group-by operations like this, in addition to more advanced pivot tables, n-dimensional cubes, hierarchical roll-ups, and timezone-precise time-series aggregations.
Here is the jsFiddle for a Lumenize solution.
If you want to try it in node.js:
npm install Lumenize --save
then put this into a file named lumenizeGroupBy.coffee:
lumenize = require('Lumenize')
dataArray = [
  { id: "a", score: 1 },
  { id: "b", score: 2 },
  { id: "c", score: 5 },
  { id: "a", score: 3 },
  { id: "c", score: 2}
]
dimensions = [{field:'id'}]
metrics = [{field: 'score', f: 'sum', as: 'sum'}]
config = {dimensions, metrics}
cube = new lumenize.OLAPCube(config, dataArray)
console.log(cube.toString(null, null, 'sum'))
and run
coffee lumenizeGroupBy.coffee