MongoDB aggregate by whether a field exists

I need to perform a sum over a collection with the following schema:
{
"_id" : "20160530/108107/31",
"metadata" : {
"date" : "2016-05-30",
"offer" : "108107",
"adv" : 31,
"update" : ISODate("2016-05-30T15:27:20.240Z")
},
"daily_unique" : 4,
"daily_gross" : 4,
"hourly" : {
"17" : {
"unique" : 4,
"gross" : 4
}
},
"publisher" : {
"738" : {
"daily_unique" : 3,
"daily_gross" : 3,
"hourly" : {
"17" : {
"unique" : 3,
"gross" : 3
}
}
},
"43" : {
"daily_unique" : 1,
"daily_gross" : 1,
"hourly" : {
"17" : {
"unique" : 1,
"gross" : 1
}
}
}
}
},
{
"_id" : "20160530/78220/59",
"metadata" : {
"date" : "2016-05-30",
"offer" : "78220",
"adv" : 59,
"update" : ISODate("2016-05-30T15:24:49.900Z")
},
"daily_unique" : 2,
"daily_gross" : 2,
"hourly" : {
"17" : {
"unique" : 2,
"gross" : 2
}
},
"publisher" : {
"43" : {
"daily_unique" : 2,
"daily_gross" : 2,
"hourly" : {
"17" : {
"unique" : 2,
"gross" : 2
}
}
}
}
}
The first document has data from publishers 738 and 43, but the second has data only from publisher 43.
So, when I want to sum all data for publisher 738, I need to sum daily_gross or daily_unique only where that publisher is present in the document, as in the first document.
I have tried different approaches with $exists and $cond, but I am not getting results. Summing the top-level field works:
aggregate(
['$match' => ['metadata.date' => date('Y-m-d')]],
['$group' => [
'_id' => '$metadata.offer',
'daily_u' => ['$sum' => '$daily_unique']
],
])
which gives me
[
0 => [
'_id' => '108107'
'daily_u' => 4
]
1 => [
'_id' => '78220'
'daily_u' => 2
]
]
When I try to go deeper into publisher, I cannot get the results I want:
aggregate(
['$match' => ['metadata.date' => date('Y-m-d')]],
['$group' => [
'_id' => '$metadata.offer',
'daily_u' => [
'$sum' => [
'$cond' => [
'if' => [
'publisher.738' => ['$exists' => true],
'then' => 1,
'else' => 0
]
]
]
],
]]
)
But I cannot get the daily totals by publisher.
It gets even more complicated when I try to get hourly data.
Can anybody point me in the right direction?
Thanks in advance.
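One possible direction, as a rough sketch only: as far as I know, $sum in a $group stage skips documents where the referenced path is missing, so pointing it at the nested publisher path may already be enough, and $ifNull just makes the fallback to 0 explicit. The pipeline below is written in mongo shell syntax and is not executed here; the variable publisherId, the extra daily_g field and the helper sumByOffer are made up for illustration. It also assumes the aggregation field path treats the numeric key "738" as a field name of the embedded document. The plain JavaScript function emulates the same grouping on trimmed copies of the two sample documents so the totals can be checked without a database.

```javascript
// Hypothetical pipeline for db.<collection>.aggregate(); not executed in this snippet.
const publisherId = "738"; // assumption: the publisher to report on
const pipeline = [
  { $match: { "metadata.date": "2016-05-30" } },
  { $group: {
      _id: "$metadata.offer",
      // $ifNull turns a missing publisher entry into 0 before summing
      daily_u: { $sum: { $ifNull: ["$publisher." + publisherId + ".daily_unique", 0] } },
      daily_g: { $sum: { $ifNull: ["$publisher." + publisherId + ".daily_gross", 0] } }
  } }
];

// Plain-JavaScript emulation of that $group on in-memory documents.
function sumByOffer(docs, pubId) {
  const totals = {};
  for (const doc of docs) {
    const pub = (doc.publisher || {})[pubId] || {}; // empty object if publisher is absent
    const key = doc.metadata.offer;
    totals[key] = totals[key] || { daily_u: 0, daily_g: 0 };
    totals[key].daily_u += pub.daily_unique || 0;
    totals[key].daily_g += pub.daily_gross || 0;
  }
  return totals;
}

// Trimmed copies of the two sample documents from the question.
const sampleDocs = [
  { metadata: { offer: "108107" },
    publisher: { "738": { daily_unique: 3, daily_gross: 3 },
                 "43": { daily_unique: 1, daily_gross: 1 } } },
  { metadata: { offer: "78220" },
    publisher: { "43": { daily_unique: 2, daily_gross: 2 } } }
];

const totals738 = sumByOffer(sampleDocs, publisherId);
console.log(totals738);
```

With the sample data, offer 108107 ends up with 3 for publisher 738 and offer 78220 with 0. The hourly values could presumably be reached the same way with a longer path such as publisher.738.hourly.17.unique, but I have not tried that.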

Related

Mongodb update on nested document with sort and slice

The document represents one user who has images. Each image can have N images related to it. I would like to be able to update the matches list only if:
The match does not exist yet.
There are fewer than N elements in the matches array.
If there are already N elements, only push if the "c" parameter is higher than the lowest one present.
{
"user_id" : 1,
"imgs" : [
{
"img_id" : 1,
"matches" : [
{
"c" : 0.3,
"img_id" : 2
},
{
"c" : 0.2,
"img_id" : 3
}
]
},
{
"img_id" : 5,
"matches" : [
{
"c" : 0.4,
"img_id" : 6
}
]
}
]
}
Basically, "matches" is a set, but $addToSet does not provide $slice and $sort, so I am trying to use $push instead.
db.stack.updateOne(
{ "user_id" : 1, "imgs.img_id" : 1, "imgs.matches.img_id" : { "$ne" : 2 } },
{ "$push" : { "imgs.$.matches" : { "$each" : [ { "c" : 0.7, "img_id" : 2} ], "$sort" : { "c" : -1 }, "$slice" : 3 } } }
);
This does not work, since my document gets inserted several times.
Your issue is with the filter part of the updateOne. You should use $elemMatch to make sure that both conditions are applied to the same element of the "imgs" array.
db.stack.updateOne(
{"user_id": 1, "imgs": {"$elemMatch": {"img_id" : 1, "matches.img_id": {"$ne": 2}}}},
{ "$push" : { "imgs.$.matches" : { "$each" : [ { "c" : 0.7, "img_id" : 2} ], "$sort" : { "c" : -1 }, "$slice" : 3 } } }
);
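To make the effect of the $ne guard and the $push modifiers easier to see, here is a small plain JavaScript sketch that emulates them on in-memory arrays. It does not talk to MongoDB; the function name pushMatch and the fullImg sample are made up for illustration.

```javascript
// Emulates, for a single "imgs" element:
//   filter: { "matches.img_id": { $ne: candidate.img_id } }
//   update: { $push: { matches: { $each: [candidate], $sort: { c: -1 }, $slice: limit } } }
function pushMatch(img, candidate, limit) {
  // $ne guard: do nothing if the match already exists
  if (img.matches.some(m => m.img_id === candidate.img_id)) {
    return img.matches;
  }
  return img.matches
    .concat([candidate])          // $each: append the new element
    .sort((a, b) => b.c - a.c)    // $sort: { c: -1 }, highest c first
    .slice(0, limit);             // $slice: keep only the first `limit` elements
}

const img1 = { img_id: 1, matches: [{ c: 0.3, img_id: 2 }, { c: 0.2, img_id: 3 }] };
const img5 = { img_id: 5, matches: [{ c: 0.4, img_id: 6 }] };
// Hypothetical image whose matches array is already full (N = 3).
const fullImg = { img_id: 7, matches: [{ c: 0.9, img_id: 10 }, { c: 0.8, img_id: 11 }, { c: 0.7, img_id: 12 }] };

const unchanged = pushMatch(img1, { c: 0.7, img_id: 2 }, 3); // img_id 2 already present
const updated = pushMatch(img5, { c: 0.7, img_id: 2 }, 3);   // new match goes to the front
const stillFull = pushMatch(fullImg, { c: 0.1, img_id: 9 }, 3); // too low, sliced off again

console.log(unchanged, updated, stillFull);
```

The last case is why $sort plus $slice covers the "only push if c is higher than the lowest one present" rule: a candidate with a lower c is appended, sorted to the end and then cut off.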

Get data with not,and

How do I get documents in MongoDB with a combined not/and condition? I want the documents where INID is not equal to 1 and SESSION is not equal to 1 (INID and SESSION need to be matched in the same document).
Ex:
{
"_id" : ObjectId("5946800b962d74070729407a"),
"INID" : 2,
"SESSION" : 1,
"TD" : ISODate("2017-06-18T13:28:43.409Z"),
"ID" : 2,
"OUT" : [
{
"score" : 50,
"id" : "0",
"out" : {
"status" : "unreachable"
}
}
]
}
{
"_id" : ObjectId("5946800b962d74070729407a"),
"INID" : 3,
"SESSION" : 1,
"TD" : ISODate("2017-06-18T13:28:43.409Z"),
"ID" : 2,
"OUT" : [
{
"score" : 50,
"id" : "0",
"out" : {
"status" : "unreachable"
}
}
]
}
{
"_id" : ObjectId("5946800b962d74070729407a"),
"INID" : 1,
"SESSION" : 1,
"TD" : ISODate("2017-06-18T13:28:43.409Z"),
"ID" : 2,
"OUT" : [
{
"score" : 50,
"id" : "0",
"out" : {
"status" : "unreachable"
}
}
]
}
I want the first two documents.
Well, this worked for me:
db.yourCollectionName.find(
{ $or : [ { INID : {$gt: 1} }, { SESSION : {$gt: 1} } ] }
)
With this query you can have INID larger than 1 or SESSION larger than 1 or both larger than one. Why would you need to negate?
I guess you can also do this:
db.yourCollectionName.find(
{ $or : [ { INID : {$ne: 1} }, { SESSION : {$ne: 1} } ] }
)
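Another way to read the requirement is "exclude documents where both INID and SESSION equal 1", which can be written with $nor. The filter object below is only a sketch in shell syntax and is not executed here; the plain JavaScript predicates emulate it on trimmed copies of the three sample documents and show that it selects the same documents as the $or / $ne version. The names norFilter, keepNor and keepOr are made up for illustration.

```javascript
// Hypothetical filter for db.yourCollectionName.find(); not executed in this snippet.
const norFilter = { $nor: [{ INID: 1, SESSION: 1 }] };

// Plain-JavaScript equivalents of the two filters.
const keepNor = doc => !(doc.INID === 1 && doc.SESSION === 1); // NOT (A AND B)
const keepOr = doc => doc.INID !== 1 || doc.SESSION !== 1;     // (NOT A) OR (NOT B)

// Trimmed copies of the three sample documents.
const sessionDocs = [
  { INID: 2, SESSION: 1 },
  { INID: 3, SESSION: 1 },
  { INID: 1, SESSION: 1 }
];

const keptNor = sessionDocs.filter(keepNor);
const keptOr = sessionDocs.filter(keepOr);
console.log(keptNor.map(d => d.INID), keptOr.map(d => d.INID));
```

Both predicates keep the first two documents, which is just De Morgan's law applied to the condition.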

Use $ and $elemMatch to group entities

Consider the following document in my MongoDB instance:
{
"_id": 1,
"people": [
{"id": 1, "name": "foo"},
{"id": 2, "name": "bar"},
/.../
],
"stats": [
{"peopleId": 1, "workHours": 24},
{"peopleId": 2, "workHours": 36},
/.../
]
}
Each document in my collection represents the work of every employee in my company for one week. As an important note, peopleId may change from one week to another!
I would like to get all weeks where foo worked more than 24 hours. As you can see, the format is kind of annoying, since the people names and the work hours are separated in my database. A simple $and is not enough.
I wonder whether I can achieve this query using $ and $elemMatch.
Can I use them to group the "people" entities with the "stats" entities?
Query to get the weeks where foo worked more than 24 hours:
db.collection.aggregate([
{$unwind: { path : "$people"}},
{$unwind: { path : "$stats"}},
{$match: { "people.name" : "foo"}},
{$group: {
_id: "$_id",
peopleIdMoreThan24: { $addToSet: {
$cond : { if : { $and : [ {"$eq" : ["$people.id", "$stats.peopleId" ] },
{"$gt" : ["$stats.workHours", 24] }]} , then : "$people.id", else: "Not satisfying the condition"}}}
}
},
{$unwind: { path : "$peopleIdMoreThan24" }},
{$match: { "peopleIdMoreThan24" : {$nin : [ "Not satisfying the condition"]}}},
]);
Data in collection:-
/* 1 */
{
"_id" : 1,
"people" : [
{
"id" : 1,
"name" : "foo"
},
{
"id" : 2,
"name" : "bar"
}
],
"stats" : [
{
"peopleId" : 1,
"workHours" : 24
},
{
"peopleId" : 2,
"workHours" : 36
}
]
}
/* 2 */
{
"_id" : 2,
"people" : [
{
"id" : 1,
"name" : "foo"
},
{
"id" : 2,
"name" : "bar"
}
],
"stats" : [
{
"peopleId" : 1,
"workHours" : 25
},
{
"peopleId" : 2,
"workHours" : 36
}
]
}
/* 3 */
{
"_id" : 3,
"people" : [
{
"id" : 1,
"name" : "foo"
},
{
"id" : 2,
"name" : "bar"
}
],
"stats" : [
{
"peopleId" : 1,
"workHours" : 25
},
{
"peopleId" : 2,
"workHours" : 36
}
]
}
Output:-
The output contains the document id and the people id of foo for each week in which foo worked more than 24 hours.
/* 1 */
{
"_id" : 3,
"peopleIdMoreThan24" : 1
}
/* 2 */
{
"_id" : 2,
"peopleIdMoreThan24" : 1
}
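For anyone who wants to check the pairing logic without a database, here is a plain JavaScript sketch of what the pipeline above is doing: look up foo's id inside each week, then test that week's stats for the same id. The function name weeksWherePersonWorkedMoreThan is made up for illustration, and the data is a copy of the three sample documents.

```javascript
// Returns the _id of every week in which the named person worked more than `hours`.
function weeksWherePersonWorkedMoreThan(weeks, name, hours) {
  const result = [];
  for (const week of weeks) {
    // Resolve the id per week, because peopleId may change from one week to another.
    const ids = week.people.filter(p => p.name === name).map(p => p.id);
    const hit = week.stats.some(s => ids.includes(s.peopleId) && s.workHours > hours);
    if (hit) {
      result.push(week._id);
    }
  }
  return result;
}

const people = [{ id: 1, name: "foo" }, { id: 2, name: "bar" }];
const weeks = [
  { _id: 1, people, stats: [{ peopleId: 1, workHours: 24 }, { peopleId: 2, workHours: 36 }] },
  { _id: 2, people, stats: [{ peopleId: 1, workHours: 25 }, { peopleId: 2, workHours: 36 }] },
  { _id: 3, people, stats: [{ peopleId: 1, workHours: 25 }, { peopleId: 2, workHours: 36 }] }
];

const fooWeeks = weeksWherePersonWorkedMoreThan(weeks, "foo", 24);
console.log(fooWeeks);
```

Week 1 is left out because 24 is not more than 24, which agrees with the aggregation output above.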

Aggregation Multiple arrays

Hey, I'm having trouble getting my aggregation right.
I have this dataset, and the collection contains a few million other documents like it:
{
"_id": ObjectId("5757c73344ce54ae1d8b456c"),
"hostname": "Baklap4",
"timestamp": NumberLong(1465370500),
"networkList": [
{
"name": "46.243.152.13",
"openConnections": NumberLong(3)
},
{
"name": "46.243.152.50",
"openConnections": NumberLong(4)
}
],
"webserver": "nginx",
"deviceList": [
{
"deviceName": "eth0",
"receive": NumberLong(183263),
"transmit": NumberLong(781595)
},
{
"deviceName": "wlan0",
"receive": NumberLong(0),
"transmit": NumberLong(0)
}
]
}
What I want:
I'd like to get a result set containing an average (of every numeric value) over all documents within each 300-second timespan.
[
[
'$match' => [
'timestamp' => ['$gte' => $todayMidnight],
'hostname' => $serverName
]
],
[
'$unwind' => '$networkList'
],
[
'$unwind' => '$deviceList'
],
[
'$group' => [
'_id' => [
'interval' => [
'$subtract' => [
'$timestamp',
[
'$mod' => ['$timestamp', 300]
]
]
],
'network' => '$networkList.name',
'device' => '$deviceList.name',
],
'openConnections' => [
'$sum' => '$networkList.openConnections'
],
'cpuLoad' => [
'$avg' => '$cpuLoad'
],
'bytesPerSecond' => [
'$avg' => '$bytesPerSecond'
],
'requestsPerSecond' => [
'$avg' => '$requestsPerSecond'
],
'webserver' => [
'$last' => '$webserver'
],
'timestamp' => [
'$max' => '$timestamp'
]
]
],
[
'$project' => [
'_id' => 0,
'timestamp' => 1,
'cpuLoad' => 1,
'bytesPerSecond' => 1,
'requestsPerSecond' => 1,
'webserver' => 1,
'openConnections' => 1,
'networkList' => '$networkList',
'deviceList' => '$_id.device',
]
],
[
'$sort' => [
'timestamp' => -1
]
]
];
Yet this doesn't give me a list with all devices and, per device, an average of received and transmitted bytes.
How would one get those?
For the given example, I was able to get a result using this mongo shell query:
var projectTime = {
$project : {
_id : 1,
hostname : 1,
timestamp : 1,
networkList : 1,
webserver : 1,
deviceList : 1,
isoDate : {
$add : [new Date(0), {
$multiply : ["$timestamp", 1000]
}
]
}
}
}
var group = {
$group : {
"_id" : {
time : {
"$add" : [{
"$subtract" : [{
"$subtract" : ["$isoDate", new Date(0)]
}, {
"$mod" : [{
"$subtract" : ["$isoDate", new Date(0)]
},
1000 * 60 * 5 // 1000 milliseconds * 60 seconds * 5 minutes
]
}
]
},
new Date(0)
]
},
"hostname" : "$hostname",
"deviceList_deviceName" : "$deviceList.deviceName",
"networkList_name" : "$networkList.name",
},
xreceive : {
$sum : "$deviceList.receive"
},
xtransmit : {
$sum : "$deviceList.transmit"
},
xopenConnections : {
$avg : "$networkList.openConnections"
},
}
}
var unwindNetworkList = {
$unwind : "$networkList"
}
var unwindSeviceList = {
$unwind : "$deviceList"
}
var match = {
$match : {
"_id.time" : ISODate("2016-06-09T08:05:00.000Z")
}
}
var finalProject = {
$project : {
_id : 0,
timestamp : "$_id.time",
hostname : "$_id.hostname",
deviceList_deviceName : "$_id.deviceList_deviceName",
networkList_name : "$_id.networkList_name",
xreceive : 1,
xtransmit : 1,
xopenConnections : 1
}
}
db.baklap.aggregate([projectTime, unwindNetworkList,
unwindSeviceList,
group,
match,
finalProject
])
db.baklap.findOne()
then output:
{
"xreceive" : NumberLong(0),
"xtransmit" : NumberLong(0),
"xopenConnections" : 4.0,
"timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
"hostname" : "Baklap4",
"deviceList_deviceName" : "wlan0",
"networkList_name" : "46.243.152.50"
}
{
"xreceive" : NumberLong(183263),
"xtransmit" : NumberLong(781595),
"xopenConnections" : 4.0,
"timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
"hostname" : "Baklap4",
"deviceList_deviceName" : "eth0",
"networkList_name" : "46.243.152.50"
}
{
"xreceive" : NumberLong(183263),
"xtransmit" : NumberLong(781595),
"xopenConnections" : 3.0,
"timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
"hostname" : "Baklap4",
"deviceList_deviceName" : "eth0",
"networkList_name" : "46.243.152.13"
}
{
"xreceive" : NumberLong(0),
"xtransmit" : NumberLong(0),
"xopenConnections" : 3.0,
"timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
"hostname" : "Baklap4",
"deviceList_deviceName" : "wlan0",
"networkList_name" : "46.243.152.13"
}
The main point is to be aware that every time $unwind is processed, the data gets multiplied: each document is repeated once per array element. This can have a side effect when summing data, whereas an average stays the same, since (2+2+3+3)/4 is the same as (2+3)/2.
To check that, you could add x:{$push:"$$ROOT"} to the group stage and inspect the values after the pipeline has executed, as you will then have all source documents for the given time period.
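The duplication is easy to reproduce outside MongoDB. The sketch below emulates two consecutive $unwind stages in plain JavaScript on a trimmed copy of the sample document; the helper unwind and the variable names are made up for illustration.

```javascript
// Emulates { $unwind: "$<field>" }: one output row per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const item of doc[field]) {
      out.push({ ...doc, [field]: item });
    }
  }
  return out;
}

// Trimmed copy of the sample document.
const sample = {
  hostname: "Baklap4",
  networkList: [
    { name: "46.243.152.13", openConnections: 3 },
    { name: "46.243.152.50", openConnections: 4 }
  ],
  deviceList: [
    { deviceName: "eth0", receive: 183263 },
    { deviceName: "wlan0", receive: 0 }
  ]
};

// Two unwinds give the cross product: 2 networks x 2 devices = 4 rows.
const rows = unwind(unwind([sample], "networkList"), "deviceList");

const trueSum = sample.networkList.reduce((acc, n) => acc + n.openConnections, 0);
const inflatedSum = rows.reduce((acc, r) => acc + r.networkList.openConnections, 0);
const trueAvg = trueSum / sample.networkList.length;
const unwoundAvg = inflatedSum / rows.length;

console.log(rows.length, trueSum, inflatedSum, trueAvg, unwoundAvg);
```

Summing openConnections over the unwound rows counts every network once per device, so the sum doubles here while the average is unchanged. Putting both the network name and the device name into the group _id, as in the pipeline above, leaves a single combination per source document in each group, which avoids the double counting.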

Only return inner element from nested array in Mongo

I currently have a collection that contains documents like this
{
"_id" : "sHXFGyTkZBYeZXcax",
"name" : "Sunless Sunday",
"description" : "blabla",
"game_id" : "qPrZBahQLHQXabwuv",
"date_checkin" : ISODate("2015-11-07T01:01:00.000Z"),
"date_start" : ISODate("2015-11-12T00:04:00.000Z"),
"date_end" : ISODate("2015-11-19T00:05:00.000Z"),
"company_id" : 1,
"featured" : 1,
"premium" : 0,
"type" : 0,
"ongoing" : 1,
"prizes" : [
{
"place" : 1,
"amount" : 18
},
{
"place" : 2,
"amount" : 2
}
],
"createdAt" : ISODate("2015-11-05T22:34:01.494Z"),
"modifiedAt" : ISODate("2015-11-05T22:34:01.494Z"),
"owner" : "CLEopD9HRAeE9eiXW",
"players" : [
{
"player_id" : "WdLK9aaRgdPnYsw8B",
"status" : 2
},
{
"player_id" : "vF6JEwMy9yaRtKuiG",
"status" : 1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : -1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : -1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : -1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : -1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : -1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : -1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : -1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : 1
}
],
"rounds" : [
{
"roundNumber" : 1,
"participants" : [],
"matchesToWin" : 3,
"matches" : [
{
"matchNumber" : 1,
"party1" : "WdLK9aaRgdPnYsw8B"
}
]
},
{
"roundNumber" : 2,
"participants" : [
{
"player_id" : "WdLK9aaRgdPnYsw8B",
"status" : 2
},
{
"player_id" : "vF6JEwMy9yaRtKuiG",
"status" : 1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"status" : 1
}
],
"matchesToWin" : 2,
"matches" : [
{
"matchNumber" : 1,
"players" : [
{
"party" : 1
},
{
"player_id" : "WdLK9aaRgdPnYsw8B",
"party" : 2
}
],
"party1" : "freewin",
"party2" : "WdLK9aaRgdPnYsw8B",
"currentGame" : 1,
"winner" : "WdLK9aaRgdPnYsw8B",
"matchFinished" : ISODate("2015-11-16T16:25:37.712Z")
},
{
"matchNumber" : 2,
"players" : [
{
"player_id" : "vF6JEwMy9yaRtKuiG",
"party" : 1
},
{
"player_id" : "KD4s2E3AezhFcQDCd",
"party" : 2
}
],
"party1" : "vF6JEwMy9yaRtKuiG",
"party2" : "KD4s2E3AezhFcQDCd",
"score1" : 0,
"score2" : 0,
"currentGame" : 1
}
]
}
]
}
Now I'm trying to see if a certain player has an active match. This means that the player would be either party1 or party2 in a match that has no winner at the moment.
I currently have this query
activeMatch = Tournaments.find({
'ongoing': 1,
'rounds.matches': {
$elemMatch: {
$or: [
{ 'party1': player[0]._id },
{ 'party2': player[0]._id },
],
winner: { $exists: false }
}
}
},
{ "rounds.matches.$": 1 }
);
I have 2 problems at the moment:
My query returns the round it matched, but it does not filter for the match it matched. So I effectively only get the round with roundNumber = 2, but from that round I get every match, whereas I am only interested in one specific match.
The filtering only happens when I execute my query through the shell. When I use it in my .jsx file and console.log the result to the client, I see ALL rounds.
Any help?