MongoDB $lookup performance issues with PHP

I am trying to fetch data from MongoDB using $lookup.
Collection companies:
{
    "id" : 1,
    "company_id" : 2,
    "user_group" : 1,
    "company_name" : "xyz",
    "created_on" : "00-00-0000"
}
Collection users:
{
    "id" : 1,
    "company_id" : 1,
    "user_group" : 1,
    "name" : "abcd",
    "email" : "abcd#abcd.abcd"
}
{
    "id" : 1,
    "company_id" : 2,
    "active": 1,
    "user_group" : 1,
    "name" : "efgh",
    "email" : "efgh#efgh.efgh"
}
Query used to fetch the data from PHP:
$collection->aggregate([
    ['$match' => ['company_id' => 2]],
    ['$lookup' => [
        'from' => 'users',
        'localField' => 'user_group',
        'foreignField' => 'user_group',
        'as' => 'company_users',
    ]],
    ['$unwind' => ['path' => '$company_users', 'preserveNullAndEmptyArrays' => true]],
    ['$match' => ['$and' => [['company_users.company_id' => 2], ['company_users.active' => 1]]]],
    ['$project' => [
        '_id' => false,
        'company_id' => true,
        'user_group' => true,
        'company_users.name' => true,
        'company_users.email' => true
    ]]
]);
The query works correctly, but it takes noticeably longer to return data once there are more than 1,000 documents.
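To see where the work goes, it may help to model what the pipeline computes. Below is a plain JavaScript sketch over the sample documents (not the PHP driver, and not how the server actually executes the query; `runPipeline` and `unwoundRows` are hypothetical names). It shows that the $lookup on user_group joins every user in the group before the later $match filters on company_id and active, so the intermediate row count grows with the size of the user group rather than with the final result.

```javascript
// In-memory copies of the sample documents from the question.
const companies = [
  { id: 1, company_id: 2, user_group: 1, company_name: "xyz", created_on: "00-00-0000" },
];
const users = [
  { id: 1, company_id: 1, user_group: 1, name: "abcd", email: "abcd#abcd.abcd" },
  { id: 1, company_id: 2, active: 1, user_group: 1, name: "efgh", email: "efgh#efgh.efgh" },
];

// Models $match -> $lookup -> $unwind -> $match -> $project and counts
// how many rows the $lookup + $unwind stages hand to the second $match.
function runPipeline(companyId) {
  let unwoundRows = 0;
  const result = [];
  for (const company of companies.filter((c) => c.company_id === companyId)) {
    // $lookup on user_group joins every user in the group, whatever their company.
    const joined = users.filter((u) => u.user_group === company.user_group);
    // $unwind with preserveNullAndEmptyArrays: one row per user, or one row with no user.
    const rows = joined.length > 0 ? joined : [undefined];
    unwoundRows += rows.length;
    for (const u of rows) {
      // Only here are users of other companies and inactive users discarded.
      if (u && u.company_id === companyId && u.active === 1) {
        result.push({
          company_id: company.company_id,
          user_group: company.user_group,
          company_users: { name: u.name, email: u.email },
        });
      }
    }
  }
  return { unwoundRows, result };
}

const { unwoundRows, result } = runPipeline(2);
console.log(unwoundRows, JSON.stringify(result));
```

With the two sample users, two rows reach the second $match and only one survives; with thousands of users per group, that intermediate set is what grows.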

Related

Laravel Eloquent SUM with relation

Hi guys, I want to calculate the sum of rate for comments, grouped by type, like this:
Post::with(['comments' => function ($q) {
    $q->selectRaw('type, SUM(rate) as total_rate')
        ->groupBy('type');
}])->get();
I'm expecting a result like this:
0 => array:4 [
    "id" => 5
    "start_date" => "2022-01-01"
    "end_date" => "2022-01-31"
    "comments" => array:2 [
        0 => array:3 [
            "type" => "personal"
            "total_rate" => 44244.0
        ]
        1 => array:3 [
            "type" => "business"
            "total_rate" => 22358.0
        ]
    ]
]
but the result is:
0 => array:4 [
    "id" => 5
    "start_date" => "2022-01-01"
    "end_date" => "2022-01-31"
    "comments" => []
]
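The expected output is a per-post, per-type sum of rate. A plain JavaScript sketch of that grouping, using hypothetical comment rows and an assumed post_id foreign key (neither appears in the question), shows the shape being asked for; note that each grouped total has to carry the post key, otherwise there is nothing to attach it back to its post with.

```javascript
// Hypothetical comment rows; post_id is assumed to be the foreign key to posts.
const comments = [
  { post_id: 5, type: "personal", rate: 10 },
  { post_id: 5, type: "business", rate: 4 },
  { post_id: 5, type: "personal", rate: 2.5 },
  { post_id: 6, type: "business", rate: 7 },
];

// Like SELECT post_id, type, SUM(rate) ... GROUP BY post_id, type,
// then attaching each post's groups to that post.
function totalsByPostAndType(rows) {
  const byPost = new Map();
  for (const { post_id, type, rate } of rows) {
    if (!byPost.has(post_id)) byPost.set(post_id, new Map());
    const byType = byPost.get(post_id);
    byType.set(type, (byType.get(type) ?? 0) + rate);
  }
  // Reshape into the "comments" arrays from the expected output.
  const out = {};
  for (const [postId, byType] of byPost) {
    out[postId] = [...byType].map(([type, total_rate]) => ({ type, total_rate }));
  }
  return out;
}

const totals = totalsByPostAndType(comments);
console.log(JSON.stringify(totals));
```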

MongoDB aggregation: accessing $lookup fields in $project

I access a $lookup field in $project after an $unwind, but this makes the other nested fields from the main collection inaccessible. Is there any way to access fields from both collections in $project? I thought of merging the arrays, but I'm not sure that's the right approach.
Users collection:
{
    "_id" : ObjectId("5a54f739fe0a00373e7ef1e8"),
    "team" : {
        "name" : "test"
    },
    "updated_at" : ISODate("2018-05-22T04:28:00Z"),
    "created_at" : ISODate("2018-01-09T17:09:13Z"),
    "users" : [
        {
            "updated_at" : ISODate("2018-11-22T11:55:22Z"),
            "created_at" : ISODate("2018-01-09T17:09:13Z"),
            "_id" : ObjectId("5a54f739fe0a00373e7ef1e9"),
            "name" : "test",
            "status" : "active",
            "title" : "Engineer"
        },
        {
            "updated_at" : ISODate("2018-11-22T11:55:22Z"),
            "created_at" : ISODate("2018-01-09T17:09:13Z"),
            "_id" : ObjectId("5a54f739fe0a00373e7ef1e9"),
            "name" : "test1",
            "status" : "passive",
            "title" : "Tester"
        }
    ]
}
Comments collection:
{
    "_id" : ObjectId("6062178fc73fe806e45c9b69"),
    "userId" : "5a54f739fe0a00373e7ef1e9",
    "text" : "this is a test",
    "status" : "1",
    "timestamp" : ISODate("2021-03-29T18:08:14.317Z")
}
Pipeline:
$pipeline = [
    ['$match' => [
        'users' => [
            '$elemMatch' => [
                'field1' => $field1,
            ],
        ],
    ]],
    ['$unwind' => '$users'],
    ['$match' => [
        'users.field1' => $field1,
    ]],
    ['$addFields' => ['userId' => ['$toString' => '$userId']]],
    ['$lookup' => [
        'from' => 'comments',
        'localField' => 'userId',
        'foreignField' => 'userId',
        'as' => 'userComments',
    ]],
    ['$unwind' => '$userComments'],
    ['$project' => [
        'comments' => [
            '$switch' => [
                'branches' => [
                    ['case' => ['$eq' => ['$userComments.status', 'verified']], 'then' => 1],
                    ['case' => ['$lte' => ['$userComments.status', '']], 'then' => 1],
                ],
                'default' => 0,
            ],
        ],
        'status' => '$users.status',
        'total' => [false],
    ]],
    ['$group' => [
        '_id' => $groupBy,
        'text' => ['$sum' => '$comments'],
        'total' => ['$sum' => '$total'],
        'completed' => ['$sum' => '$status'],
    ]],
];
Result:
{"_id" :"categories","text": 21,"total": 100,"completed":50}
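For what it's worth, MongoDB documents $unwind as emitting one document per array element while copying every other field of the input document unchanged; fields only disappear at a $project that does not list them. A plain JavaScript model of that behaviour (the `unwind` helper is hypothetical, and ObjectIds are simplified to strings) shows team.name still sitting next to users.status after unwinding. It also makes visible that, in the sample document, the user's id lives at users._id rather than in a top-level userId field.

```javascript
// Simplified copy of the sample Users document (ObjectIds as plain strings).
const teams = [
  {
    _id: "5a54f739fe0a00373e7ef1e8",
    team: { name: "test" },
    users: [
      { _id: "5a54f739fe0a00373e7ef1e9", name: "test", status: "active", title: "Engineer" },
      { _id: "5a54f739fe0a00373e7ef1e9", name: "test1", status: "passive", title: "Tester" },
    ],
  },
];

// Models $unwind without preserveNullAndEmptyArrays: one output document per
// array element, with every other field of the input copied unchanged.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing, null or empty arrays produce no output documents.
    if (value === undefined || value === null || (Array.isArray(value) && value.length === 0)) continue;
    // A non-array value is treated as a single-element array.
    for (const element of Array.isArray(value) ? value : [value]) {
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const rows = unwind(teams, "users");
// A $project keeps only what it lists; both sides are still reachable here.
const projected = rows.map((r) => ({
  teamName: r.team.name,
  userId: r.users._id,
  status: r.users.status,
}));
console.log(projected);
```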

Aggregation over multiple arrays

Hey, I'm having trouble getting my aggregation right.
I have this dataset, and the collection contains a few million other documents like it:
{
    "_id": ObjectId("5757c73344ce54ae1d8b456c"),
    "hostname": "Baklap4",
    "timestamp": NumberLong(1465370500),
    "networkList": [
        {
            "name": "46.243.152.13",
            "openConnections": NumberLong(3)
        },
        {
            "name": "46.243.152.50",
            "openConnections": NumberLong(4)
        }
    ],
    "webserver": "nginx",
    "deviceList": [
        {
            "deviceName": "eth0",
            "receive": NumberLong(183263),
            "transmit": NumberLong(781595)
        },
        {
            "deviceName": "wlan0",
            "receive": NumberLong(0),
            "transmit": NumberLong(0)
        }
    ]
}
What I want:
I'd like to get a result set where I average every numeric value across all documents within each 300-second timespan.
[
    ['$match' => [
        'timestamp' => ['$gte' => $todayMidnight],
        'hostname' => $serverName,
    ]],
    ['$unwind' => '$networkList'],
    ['$unwind' => '$deviceList'],
    ['$group' => [
        '_id' => [
            'interval' => [
                '$subtract' => ['$timestamp', ['$mod' => ['$timestamp', 300]]],
            ],
            'network' => '$networkList.name',
            'device' => '$deviceList.name',
        ],
        'openConnections' => ['$sum' => '$networkList.openConnections'],
        'cpuLoad' => ['$avg' => '$cpuLoad'],
        'bytesPerSecond' => ['$avg' => '$bytesPerSecond'],
        'requestsPerSecond' => ['$avg' => '$requestsPerSecond'],
        'webserver' => ['$last' => '$webserver'],
        'timestamp' => ['$max' => '$timestamp'],
    ]],
    ['$project' => [
        '_id' => 0,
        'timestamp' => 1,
        'cpuLoad' => 1,
        'bytesPerSecond' => 1,
        'requestsPerSecond' => 1,
        'webserver' => 1,
        'openConnections' => 1,
        'networkList' => '$networkList',
        'deviceList' => '$_id.device',
    ]],
    ['$sort' => ['timestamp' => -1]],
];
Yet this doesn't give me a list of all devices with, per device, an average of received and transmitted bytes.
How would one get those?
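The result being asked for, one average per device and per 300-second interval, can be sketched in plain JavaScript to check expected numbers against. Assumptions: the first document is the sample from the question, the second is a hypothetical reading 60 seconds later added only so there is something to average, and `bucketOf`/`deviceAverages` are illustrative names. The sketch keys on deviceName, which is the field name the sample documents use (the pipeline above groups on $deviceList.name), and it unwinds only deviceList, so device values are not repeated once per network entry.

```javascript
// The sample document from the question, plus one hypothetical reading
// 60 seconds later in the same 300-second interval.
const docs = [
  {
    hostname: "Baklap4",
    timestamp: 1465370500,
    deviceList: [
      { deviceName: "eth0", receive: 183263, transmit: 781595 },
      { deviceName: "wlan0", receive: 0, transmit: 0 },
    ],
  },
  {
    hostname: "Baklap4",
    timestamp: 1465370560, // hypothetical
    deviceList: [
      { deviceName: "eth0", receive: 200000, transmit: 800000 }, // hypothetical values
      { deviceName: "wlan0", receive: 0, transmit: 0 },
    ],
  },
];

// Same arithmetic as the pipeline's interval key: timestamp - (timestamp mod 300).
const bucketOf = (ts) => ts - (ts % 300);

// Groups by (interval, deviceName) and averages receive/transmit.
// Only deviceList is iterated, so no device value is counted twice.
function deviceAverages(input) {
  const groups = new Map();
  for (const doc of input) {
    const interval = bucketOf(doc.timestamp);
    for (const dev of doc.deviceList) {
      const key = `${interval}|${dev.deviceName}`;
      const g = groups.get(key) ?? { interval, device: dev.deviceName, n: 0, receive: 0, transmit: 0 };
      g.n += 1;
      g.receive += dev.receive;
      g.transmit += dev.transmit;
      groups.set(key, g);
    }
  }
  return [...groups.values()].map((g) => ({
    interval: g.interval,
    device: g.device,
    avgReceive: g.receive / g.n,
    avgTransmit: g.transmit / g.n,
  }));
}

console.log(deviceAverages(docs));
```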
For the given example, I was able to get a result using this mongo shell query:
var projectTime = {
    $project : {
        _id : 1,
        hostname : 1,
        timestamp : 1,
        networkList : 1,
        webserver : 1,
        deviceList : 1,
        isoDate : {
            $add : [new Date(0), { $multiply : ["$timestamp", 1000] }]
        }
    }
}
var group = {
    $group : {
        "_id" : {
            time : {
                "$add" : [
                    {
                        "$subtract" : [
                            { "$subtract" : ["$isoDate", new Date(0)] },
                            {
                                "$mod" : [
                                    { "$subtract" : ["$isoDate", new Date(0)] },
                                    1000 * 60 * 5 // 1000 milliseconds * 60 seconds * 5 minutes
                                ]
                            }
                        ]
                    },
                    new Date(0)
                ]
            },
            "hostname" : "$hostname",
            "deviceList_deviceName" : "$deviceList.deviceName",
            "networkList_name" : "$networkList.name"
        },
        xreceive : { $sum : "$deviceList.receive" },
        xtransmit : { $sum : "$deviceList.transmit" },
        xopenConnections : { $avg : "$networkList.openConnections" }
    }
}
var unwindNetworkList = {
    $unwind : "$networkList"
}
var unwindDeviceList = {
    $unwind : "$deviceList"
}
var match = {
    $match : {
        "_id.time" : ISODate("2016-06-09T08:05:00.000Z")
    }
}
var finalProject = {
    $project : {
        _id : 0,
        timestamp : "$_id.time",
        hostname : "$_id.hostname",
        deviceList_deviceName : "$_id.deviceList_deviceName",
        networkList_name : "$_id.networkList_name",
        xreceive : 1,
        xtransmit : 1,
        xopenConnections : 1
    }
}
db.baklap.aggregate([projectTime, unwindNetworkList,
    unwindDeviceList,
    group,
    match,
    finalProject
])
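The projectTime and group stages floor each document's date to a five-minute boundary in milliseconds since the epoch. Here is a plain JavaScript sketch of the same arithmetic (helper names are hypothetical), applied to the timestamp of the sample document from the question; it lands on the same boundary as the question's timestamp - (timestamp mod 300) in seconds.

```javascript
const FIVE_MINUTES_MS = 1000 * 60 * 5;
const EPOCH = new Date(0);

// projectTime: isoDate = new Date(0) + timestamp * 1000
function toIsoDate(timestampSeconds) {
  return new Date(EPOCH.getTime() + timestampSeconds * 1000);
}

// group._id.time: (isoDate - epoch) - ((isoDate - epoch) mod 5 minutes) + epoch
function fiveMinuteBucket(isoDate) {
  const sinceEpoch = isoDate.getTime() - EPOCH.getTime();
  return new Date(sinceEpoch - (sinceEpoch % FIVE_MINUTES_MS) + EPOCH.getTime());
}

const bucket = fiveMinuteBucket(toIsoDate(1465370500));
console.log(bucket.toISOString()); // 2016-06-08T07:20:00.000Z

// The question's integer version, in seconds, gives the same boundary.
const secondsBucket = 1465370500 - (1465370500 % 300);
console.log(bucket.getTime() === secondsBucket * 1000); // true
```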
db.baklap.findOne()
Output:
{
    "xreceive" : NumberLong(0),
    "xtransmit" : NumberLong(0),
    "xopenConnections" : 4.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "wlan0",
    "networkList_name" : "46.243.152.50"
}
{
    "xreceive" : NumberLong(183263),
    "xtransmit" : NumberLong(781595),
    "xopenConnections" : 4.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "eth0",
    "networkList_name" : "46.243.152.50"
}
{
    "xreceive" : NumberLong(183263),
    "xtransmit" : NumberLong(781595),
    "xopenConnections" : 3.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "eth0",
    "networkList_name" : "46.243.152.13"
}
{
    "xreceive" : NumberLong(0),
    "xtransmit" : NumberLong(0),
    "xopenConnections" : 3.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "wlan0",
    "networkList_name" : "46.243.152.13"
}
The main point is to be aware that every $unwind multiplies the documents: after unwinding two arrays, each element of one array is paired with every element of the other. This can distort sums, although averages are unaffected when every value is duplicated the same number of times ((2+2+3+3)/4 is the same as (2+3)/2).
To check this, you could add x: {$push: "$$ROOT"} to the $group stage and inspect the values after the pipeline runs, since you will then have all source documents for the given data period.
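A small plain JavaScript sketch of that effect, using the networkList and deviceList of the sample document (trimmed to the fields used; `sum` and `avg` are illustrative helpers): two consecutive $unwind stages yield the cross product of both arrays, so each openConnections value appears once per device. The sum doubles while the average stays the same.

```javascript
// Sample document from the question, trimmed to the fields used here.
const doc = {
  networkList: [
    { name: "46.243.152.13", openConnections: 3 },
    { name: "46.243.152.50", openConnections: 4 },
  ],
  deviceList: [
    { deviceName: "eth0", receive: 183263 },
    { deviceName: "wlan0", receive: 0 },
  ],
};

// $unwind networkList followed by $unwind deviceList: the cross product.
const rows = [];
for (const network of doc.networkList) {
  for (const device of doc.deviceList) {
    rows.push({ network, device });
  }
}

const sum = (xs) => xs.reduce((a, b) => a + b, 0);
const avg = (xs) => sum(xs) / xs.length;

const conns = rows.map((r) => r.network.openConnections); // [3, 3, 4, 4]
const original = doc.networkList.map((n) => n.openConnections); // [3, 4]
console.log("sum", sum(conns), "vs", sum(original)); // sum 14 vs 7
console.log("avg", avg(conns), "vs", avg(original)); // avg 3.5 vs 3.5
```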

MongoDB aggregate by whether fields exist

I need to compute a sum over a collection with the following schema:
{
    "_id" : "20160530/108107/31",
    "metadata" : {
        "date" : "2016-05-30",
        "offer" : "108107",
        "adv" : 31,
        "update" : ISODate("2016-05-30T15:27:20.240Z")
    },
    "daily_unique" : 4,
    "daily_gross" : 4,
    "hourly" : {
        "17" : {
            "unique" : 4,
            "gross" : 4
        }
    },
    "publisher" : {
        "738" : {
            "daily_unique" : 3,
            "daily_gross" : 3,
            "hourly" : {
                "17" : {
                    "unique" : 3,
                    "gross" : 3
                }
            }
        },
        "43" : {
            "daily_unique" : 1,
            "daily_gross" : 1,
            "hourly" : {
                "17" : {
                    "unique" : 1,
                    "gross" : 1
                }
            }
        }
    }
},
{
    "_id" : "20160530/78220/59",
    "metadata" : {
        "date" : "2016-05-30",
        "offer" : "78220",
        "adv" : 59,
        "update" : ISODate("2016-05-30T15:24:49.900Z")
    },
    "daily_unique" : 2,
    "daily_gross" : 2,
    "hourly" : {
        "17" : {
            "unique" : 2,
            "gross" : 2
        }
    },
    "publisher" : {
        "43" : {
            "daily_unique" : 2,
            "daily_gross" : 2,
            "hourly" : {
                "17" : {
                    "unique" : 2,
                    "gross" : 2
                }
            }
        }
    }
}
The first document has data from publishers 738 and 43, but the second has data only from 43.
So, when I want to sum all data for publisher 738, I need to sum daily_gross or daily_unique only where that publisher is present, as in the first document.
I have tried several approaches with $exists and $cond, but I'm not getting results:
aggregate(
    ['$match' => ['metadata.date' => date('Y-m-d')]],
    ['$group' => [
        '_id' => '$metadata.offer',
        'daily_u' => ['$sum' => '$daily_unique'],
    ]]
)
which gives me:
[
0 => [
'_id' => '108107'
'daily_u' => 4
]
1 => [
'_id' => '78220'
'daily_u' => 2
]
]
When I try to dig into publisher, I cannot get the results I want:
aggregate(
    ['$match' => ['metadata.date' => date('Y-m-d')]],
    ['$group' => [
        '_id' => '$metadata.offer',
        'daily_u' => [
            '$sum' => [
                '$cond' => [
                    'if' => [
                        'publisher.738' => ['$exists' => true],
                        'then' => 1,
                        'else' => 0
                    ]
                ]
            ]
        ],
    ]]
)
But I cannot get the daily totals per publisher.
It gets even more complicated when I try to get hourly data.
Can anybody point me in the right direction?
Thanks in advance.
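The computation being asked for is: per offer, add up a publisher's metric where that publisher is present, and count it as 0 otherwise. A plain JavaScript sketch over the two sample documents (trimmed; `sumForPublisher` and the `pick` callback are hypothetical names) covers both the daily and the hourly case. On the MongoDB side, the $sum accumulator in $group ignores missing and non-numeric values, so summing a path such as $publisher.738.daily_unique should behave like the missing-publisher branch here; that is worth verifying against the real data.

```javascript
// The two sample documents, trimmed to the fields used here.
const docs = [
  {
    metadata: { offer: "108107" },
    publisher: {
      738: { daily_unique: 3, daily_gross: 3, hourly: { 17: { unique: 3, gross: 3 } } },
      43: { daily_unique: 1, daily_gross: 1, hourly: { 17: { unique: 1, gross: 1 } } },
    },
  },
  {
    metadata: { offer: "78220" },
    publisher: {
      43: { daily_unique: 2, daily_gross: 2, hourly: { 17: { unique: 2, gross: 2 } } },
    },
  },
];

// Sums one metric of one publisher per offer; a missing publisher
// (or a missing metric) contributes 0.
function sumForPublisher(input, publisherId, pick) {
  const totals = {};
  for (const doc of input) {
    const pub = doc.publisher[publisherId];
    const value = pub === undefined ? undefined : pick(pub);
    const offer = doc.metadata.offer;
    totals[offer] = (totals[offer] ?? 0) + (typeof value === "number" ? value : 0);
  }
  return totals;
}

// Daily unique for publisher 738, and hourly gross (hour 17) for publisher 43.
console.log(sumForPublisher(docs, "738", (p) => p.daily_unique));
console.log(sumForPublisher(docs, "43", (p) => p.hourly?.["17"]?.gross));
```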

MongoDB aggregation function for sum of embedded documents not working

I get an error like this:
Array
(
    [errmsg] => exception: the $unwind field path must be specified as a string
    [code] => 15981
    [ok] => 0
)
when I use the following query on the embedded document below (I want the sum of rate_number for the given record):
global $DB, $mongo;
$theObjId = new MongoId($post_id);
$collection = $mongo->getCollection('mongo_hw_posts');
$rt_sum = $collection->aggregate(
    array('$unwind'=>$rate),
    array('$group'=>
        array(
            '_id' => $theObjId
        ),
        array(
            'rate_number'=>array('$sum' =>$rate.'rate_number')
        ))
);
Document structure:
{
    "_id": ObjectId("51ff3b38636e3b9803000001"),
    "class_id": NumberInt(2986),
    "created_by": NumberInt(1758),
    "created_datetime": NumberInt(1375681336),
    "deleted": NumberInt(0),
    "learn": {
        "0": {
            "user_id": NumberInt(0),
            "learn_date": NumberInt(0)
        }
    },
    "parent_id": "0",
    "post_text": "2%20C",
    "post_type": "text_comment",
    "rate": {
        "0": {
            "user_id": NumberInt(0),
            "rate_date": NumberInt(0),
            "rate_number": NumberInt(0)
        },
        "1": {
            "user_id": NumberInt(1457),
            "rate_date": NumberInt(1375764137),
            "rate_number": NumberInt(3)
        },
        "2": {
            "user_id": NumberInt(1619),
            "rate_date": NumberInt(1375764694),
            "rate_number": NumberInt(8)
        }
    },
    "serialized_data": "",
    "unique_key": "8bdddfe8137d14702b4517f7e8e88ee3",
    "user_role": "student"
}
There are a few things wrong with your aggregation command.
$unwind
Instead of:
array( '$unwind' => $rate ),
You need to use:
array( '$unwind' => '$rate'),
Here '$rate' is not a PHP variable but a MongoDB field path expression, so it has to be passed as a single-quoted string literal.
But you can't use $unwind like this either, because it fails with:
"errmsg" : "exception: Value at end of $unwind field path '$rate' must be an Array, but is a Object",
That's because rate is an embedded document, not an array:
"rate": {
    "0": {
        "user_id": NumberInt(0),
        "rate_date": NumberInt(0),
        "rate_number": NumberInt(0)
    },
    "1": {
        "user_id": NumberInt(1457),
        "rate_date": NumberInt(1375764137),
        "rate_number": NumberInt(3)
    },
    "2": {
        "user_id": NumberInt(1619),
        "rate_date": NumberInt(1375764694),
        "rate_number": NumberInt(8)
    }
}
But it needs to be an array:
"rate": [
    {
        "user_id": NumberInt(0),
        "rate_date": NumberInt(0),
        "rate_number": NumberInt(0)
    },
    {
        "user_id": NumberInt(1457),
        "rate_date": NumberInt(1375764137),
        "rate_number": NumberInt(3)
    },
    {
        "user_id": NumberInt(1619),
        "rate_date": NumberInt(1375764694),
        "rate_number": NumberInt(8)
    }
]
Otherwise $unwind will not work, so you need to change your documents accordingly.
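The document change itself is mechanical: turn the keyed object into an array. A plain JavaScript sketch of that transformation (not a migration script; `rateObjectToArray` is a hypothetical helper and ObjectIds are simplified to strings), followed by the sum the corrected pipeline computes, gives the same total of 11 that the full example at the end of this answer reports.

```javascript
// The rate field as stored: an object keyed "0", "1", "2" instead of an array.
const post = {
  _id: "51ff3b38636e3b9803000001",
  rate: {
    0: { user_id: 0, rate_date: 0, rate_number: 0 },
    1: { user_id: 1457, rate_date: 1375764137, rate_number: 3 },
    2: { user_id: 1619, rate_date: 1375764694, rate_number: 8 },
  },
};

// Converts the keyed object into an array ordered by its numeric keys,
// which is the shape $unwind needs.
function rateObjectToArray(rate) {
  return Object.keys(rate)
    .sort((a, b) => Number(a) - Number(b))
    .map((key) => rate[key]);
}

const rateArray = rateObjectToArray(post.rate);
// Same result as $unwind + $group { _id: null, rate_number: { $sum: "$rate.rate_number" } }.
const rateSum = rateArray.reduce((total, r) => total + r.rate_number, 0);
console.log(rateSum); // 11
```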
$group
Your $group is also wrong. Instead of:
array('$group'=>
    array(
        '_id' => $theObjId
    ),
    array(
        'rate_number'=>array('$sum' =>$rate.'rate_number')
    )
)
Syntactically, it needs to be:
array('$group'=>
    array(
        '_id' => $theObjId,
        'rate_number'=>array('$sum' => '$rate.rate_number')
    )
)
I don't understand why you have:
'_id' => $theObjId
Are you trying to sum the rates for only one post? If so, you need to add a $match stage and change the $group _id from $theObjId to null, something like this:
$rt_sum = $collection->aggregate(
    array( '$match' => array( '_id' => $theObjId ) ),
    array( '$unwind' => '$rate' ),
    array( '$group'=>
        array(
            '_id' => null,
            'rate_number' => array('$sum' => '$rate.rate_number')
        )
    )
);
Here is the full example:
<?php
$m = new MongoClient;
$c = $m->test->so;
$c->drop();
$post_id = "51ff3b38636e3b9803000001";
$theObjId = new MongoId($post_id);
$c->insert( array(
    "_id" => new MongoId("51ff3b38636e3b9803000001"),
    "class_id" => 2986,
    "created_by" => 1758,
    "created_datetime" => 1375681336,
    "deleted" => 0,
    "learn" => array(
        array(
            "user_id" => 0,
            "learn_date" => 0
        )
    ),
    "parent_id" => "0",
    "post_text" => "2%20C",
    "post_type" => "text_comment",
    "rate" => array(
        array(
            "user_id" => 0,
            "rate_date" => 0,
            "rate_number" => 0
        ),
        array(
            "user_id" => 1457,
            "rate_date" => 1375764137,
            "rate_number" => 3
        ),
        array(
            "user_id" => 1619,
            "rate_date" => 1375764694,
            "rate_number" => 8
        )
    ),
    "serialized_data" => "",
    "unique_key" => "8bdddfe8137d14702b4517f7e8e88ee3",
    "user_role" => "student"
) );
$rt_sum = $c->aggregate(
    array( '$match' => array( '_id' => $theObjId ) ),
    array( '$unwind' => '$rate' ),
    array( '$group'=>
        array(
            '_id' => null,
            'rate_number' => array('$sum' => '$rate.rate_number')
        )
    )
);
var_dump ($rt_sum);
And the output is:
array(2) {
  'result' =>
  array(1) {
    [0] =>
    array(2) {
      '_id' => NULL
      'rate_number' => int(11)
    }
  }
  'ok' =>
  double(1)
}