MongoDB aggregate + $match + $group + Array - mongodb

Here is my MongoDB query:
profiles.aggregate([{"$match":{"channels.sign_up":true}},{"$group":{"_id":"$channels.slug","user_count":{"$sum":1}}},{"$sort":{"user_count":-1}}])
Here is my code:
$profiles = Profile::raw()->aggregate([
[
'$match' => [
'channels.sign_up' => true
]
],
[
'$group' => [
'_id' => '$channels.slug',
'user_count' => ['$sum' => 1]
]
],
[
'$sort' => [
"user_count" => -1
]
]
]);
Here is the channels array from a sample document in my Mongo collection:
"channels": [
{
"id": "5ae44c1c2b807b3d1c0038e5",
"slug": "swachhata-citizen-android",
"mac_address": "A3:72:5E:DC:0E:D1",
"sign_up": true,
"settings": {
"email_notifications_preferred": true,
"sms_notifications_preferred": true,
"push_notifications_preferred": true
},
"device_token": "ff949faeca60b0f0ff949faeca60b0f0"
},
{
"id": "5ae44c1c2b807b3d1c0038f3",
"slug": "website",
"mac_address": null,
"device_token": null,
"created_at": "2018-06-19 19:15:13",
"last_login_at": "2018-06-19 19:15:13",
"last_login_ip": "127.0.0.1",
"last_login_user_agent": "PostmanRuntime/7.1.5"
}
],
Here is my response:
{
"data": [
{
"_id": [
"swachhata-citizen-android"
],
"user_count": 1
},
{
"_id": [
"icmyc-portal"
],
"user_count": 1
},
{
"_id": [
"swachhata-citizen-android",
"website",
"icmyc-portal"
],
"user_count": 1
}
]
}
What I am expecting is:
{
"data": [
{
"_id": [
"swachhata-citizen-android"
],
"user_count": 1
},
{
"_id": [
"icmyc-portal"
],
"user_count": 1
},
{
"_id": [
"website"
],
"user_count": 1
}
]
}
As you can see, channels is an array, and "sign_up" is true only for the one element through which the user registered. We have many apps, so each user can have more than one channel.
I want to count how many users registered through each channel, but the response groups by all of a user's channels instead of only the channel where sign_up is true.
The count is also wrong: I have two records with "slug": "swachhata-citizen-android" and "sign_up": true.
Any suggestions? :)

Use $unwind to turn each document that contains an array into one document per array element, with that element's fields nested under the array field. In your example:
profiles.aggregate([
{$unwind: '$channels'},
{$match: {'channels.sign_up': true}},
{$group: {_id: '$channels.slug', user_count: {$sum: 1}}},
{$sort: {user_count: -1}}
])
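To see why the extra stage matters: without $unwind, '$channels.slug' resolves to the array of all slugs in a document, and the $match keeps or drops whole documents rather than individual channels. Here is a minimal in-memory sketch of the four stages in plain JavaScript, with no MongoDB involved. The sample profiles and the countSignUps helper are hypothetical, only loosely modelled on the document in the question.

```javascript
// Hypothetical sample data: three users, each with one sign-up channel.
const profiles = [
  { channels: [
    { slug: "swachhata-citizen-android", sign_up: true },
    { slug: "website" },
  ] },
  { channels: [
    { slug: "swachhata-citizen-android", sign_up: true },
    { slug: "icmyc-portal" },
  ] },
  { channels: [
    { slug: "website", sign_up: true },
    { slug: "icmyc-portal" },
  ] },
];

function countSignUps(docs) {
  // $unwind: one record per channel element
  const unwound = docs.flatMap((doc) => doc.channels);
  // $match: keep only the channel the user signed up through
  const matched = unwound.filter((channel) => channel.sign_up === true);
  // $group: count records per slug
  const counts = new Map();
  for (const channel of matched) {
    counts.set(channel.slug, (counts.get(channel.slug) || 0) + 1);
  }
  // $sort: user_count descending
  return [...counts]
    .map(([slug, userCount]) => ({ _id: slug, user_count: userCount }))
    .sort((a, b) => b.user_count - a.user_count);
}

const signUpCounts = countSignUps(profiles);
console.log(signUpCounts);
```

Each _id is now a single slug rather than an array, and the android channel is counted twice.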

Related

How to custom sort a field in MongoDB

I have a collection called INFODOCS with two fields: ML_PRIORITY (HIGH/MEDIUM/LOW) and STATUS (True/False/Null). I want to count the STATUS values for each ML_PRIORITY and then sort by ML_PRIORITY in the order High, Medium, Low.
[
{
"_id": "1",
"ML_PRIORITY" : "HIGH",
"STATUS" : "True"
},
{
"_id": "2",
"ML_PRIORITY" : "HIGH",
"STATUS" : ""
},
{
"_id": "3",
"ML_PRIORITY" : "HIGH",
"STATUS" : "False"
},
{
"_id": "4",
"ML_PRIORITY" : "MEDIUM",
"STATUS" : ""
},
{
"_id": "5",
"ML_PRIORITY" : "Low",
"STATUS" : ""
}
]
I was able to count the STATUS values for each ML_PRIORITY using the aggregation pipeline below,
but I am not sure how to apply a custom sort to ML_PRIORITY, since $sort has only two options (1 and -1).
db.collection.aggregate([
{
'$group': {
'_id': '$ML_PRIORITY',
'QUALITYCHECKDONE': {
'$sum': {
'$cond': [
{
'$eq': [
'$STATUS', 'TRUE'
]
}, 1, 0
]
}
},
'QUALITYCHECKNOTDONE': {
'$sum': {
'$cond': [
{
'$eq': [
'$STATUS', ''
]
}, 1, 0
]
}
},
'QUALITYCHECKNOTREQ': {
'$sum': {
'$cond': [
{
'$eq': [
'$STATUS', 'FALSE'
]
}, 1, 0
]
}
}
}
}, {
'$project': {
'_id': 0,
'ML_PRIORITY': '$_id',
'QUALITYCHECKDONE': 1,
'QUALITYCHECKNOTDONE': 1,
'QUALITYCHECKNOTREQ': 1
}
}
])
Example - https://mongoplayground.net/p/anAwoqZk2Ys
One option is to replace your last step with 3 steps, in order to $set an order field, $sort, and $unset it:
[
{$set: {
order: {$indexOfArray: [["HIGH", "MEDIUM", "Low"], "$_id"]},
"ML_PRIORITY": "$_id"
}},
{$sort: {order: 1}},
{$unset: ["_id", "order"]}
]
See how it works on the playground example
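The same idea can be sketched in plain JavaScript: look up each value's position in a fixed list, sort by that position, then drop the helper field. The grouped rows and the sortByPriority helper below are hypothetical stand-ins for the $group output. Like $indexOfArray, Array.prototype.indexOf returns -1 for values that are not in the list, so unknown priorities would sort first.

```javascript
// Hypothetical $group output, in arbitrary order.
const groupedRows = [
  { _id: "Low", QUALITYCHECKNOTDONE: 1 },
  { _id: "HIGH", QUALITYCHECKNOTDONE: 1 },
  { _id: "MEDIUM", QUALITYCHECKNOTDONE: 1 },
];

const priorityList = ["HIGH", "MEDIUM", "Low"];

function sortByPriority(rows, priorities) {
  return rows
    // $set: add a numeric order field based on the position in the list
    .map((row) => ({ ...row, order: priorities.indexOf(row._id) }))
    // $sort: ascending by that position
    .sort((a, b) => a.order - b.order)
    // $unset: drop _id and order, keep the value as ML_PRIORITY
    .map(({ _id, order, ...rest }) => ({ ML_PRIORITY: _id, ...rest }));
}

const sortedRows = sortByPriority(groupedRows, priorityList);
console.log(sortedRows.map((row) => row.ML_PRIORITY)); // [ 'HIGH', 'MEDIUM', 'Low' ]
```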

get sum of integer from array of objects in mongodb

I want to filter my documents by the sum of a decimal field in an array of objects, but I haven't found a good way to do it. For example, I have documents like these:
[
{
"id": 1,
"limit": NumberDecimal("100000"),
"requests": [
{
"money": NumberDecimal("50000"),
"user": "user1"
}
]
},
{
"id": 2,
"limit": NumberDecimal("100000"),
"requests": [
{
"money": NumberDecimal("100000"),
"user": "user2"
}
]
},
{
"id": 1,
"limit": null,
"requests": [
{
"money": NumberDecimal("50000"),
"user": "user1"
},
{
"money": NumberDecimal("50000"),
"user": "user3"
}
]
},
]
Description of the document fields:
limit - the maximum amount of money available
requests - an array of objects, where money is how much a user has taken from the limit (if user1 takes 50000 of a 100000 limit, the remainder is 50000, i.e. limit - sum(requests.money))
I am querying MongoDB from a Scala project and need to:
get all documents where limit is null
get all documents where at least x money remains (x is an input value)
The first case is easier than the second. I know how to get the sum of requests.money with this query:
db.campaign.aggregate([
{$project: {
total: {$sum: ["$requests.money"]}
}}
])
The Scala filter part:
Filters.or(
Filters.equal("limit", null),
Filters.expr(Document(s""" {$$project: {total: {$$sum: ["$$requests.money"]}}}"""))
)
But I don't want to store the sum or return it in the result. I want to filter by the condition limit >= sum(requests.money) + x, where x is the amount of money a user wants to take, and get back all documents that satisfy it.
Example:
x = 50000
and the output must look like this:
[
{
"id": 1,
"limit": NumberDecimal("100000"),
"requests": [
{
"money": NumberDecimal("50000"),
"user": "user1"
}
]
},
{
"id": 1,
"limit": null,
"requests": [
{
"money": NumberDecimal("50000"),
"user": "user1"
},
{
"money": NumberDecimal("50000"),
"user": "user3"
}
]
},
]
You have to use an aggregation pipeline like this:
db.campaign.aggregate([
{
$set: {
remainder: {
$subtract: [ "$limit", { $sum: "$requests.money" } ]
}
}
},
{
"$match": {
$or: [
{ limit: null },
{ remainder: { $gte: 50000 } } // x = 50000 from the example
]
}
},
{ $unset: "remainder" }
])
This one is also possible, but more difficult to read:
db.campaign.aggregate([
{
"$match": {
$or: [
{ limit: null },
{
$expr: {
$gte: [
{ $subtract: [ "$limit", { $sum: "$requests.money" } ] },
50000 // x = 50000 from the example
]
}
}
]
}
}
])
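Since x is an input value, the pipeline can be built from a parameter. A minimal sketch, assuming the condition is limit - sum(requests.money) >= x as stated in the question. The buildRemainderPipeline name is hypothetical, and the code only constructs the pipeline array, so it runs without a database.

```javascript
// Hypothetical helper: builds the single-$match pipeline for a requested amount x.
function buildRemainderPipeline(x) {
  return [
    {
      $match: {
        $or: [
          // documents with no limit always qualify
          { limit: null },
          // otherwise the remainder must cover the requested amount
          {
            $expr: {
              $gte: [
                { $subtract: ["$limit", { $sum: "$requests.money" }] },
                x,
              ],
            },
          },
        ],
      },
    },
  ];
}

const remainderPipeline = buildRemainderPipeline(50000);
console.log(JSON.stringify(remainderPipeline, null, 2));
```

In mongosh the result could then be passed along as db.campaign.aggregate(remainderPipeline).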

Optimizing MongoDB aggregate query on large Index objects

I have 20 million objects in my MongoDB collection, currently running on an M30 MongoDB instance with 7.5 GB RAM and 40 GB disk.
Data is stored in the collection like this:
{
_id:xxxxx,
id : 1 (int),
from : xxxxxxxx (int),
to : xxxxxx (int),
status : xx (int)
.
.
.
.
},
{
_id:xxxxx,
id : 2 (int),
from : xxxxxxxx (int),
to : xxxxxx (int),
status : xx (int)
.
.
.
.
}
.
.
.
. and so on..
id has a unique index and from has an index in this collection.
I am running a query that filters on a given 'from' value, groups by 'to', returns the max id per group, and sorts by that max id:
$collection->aggregate([
['$project' => ['id'=>1,'to'=>1,'from'=>1]],
[ '$match'=> [
'$and'=>
[
[ 'from'=> xxxxxxxxxx],
[ 'status'=> xx ],
]
]
],
['$group' => [
'_id' =>
'$to',
'max_revision'=>['$max' => '$id'],
]
],
['$sort' => ['max_revision' => -1]],
['$limit' => 20],
]);
The query above runs fine (~2 sec) when the 'from' value matches a small set, around 50-100k documents. But when, for example, 2M objects share the same 'from' value, it takes more than 10 sec to return a result.
A quick example:
case 1 - the query runs in under 2 sec with from = 12345, because 12345 occurs 50k times in the collection.
case 2 - the query takes over 10 sec with from = 98765, because 98765 occurs 2M times in the collection.
Edit: the explain output for the query is below -
{
"command": {
"aggregate": "mycollection",
"pipeline": [
{
"$project": {
"id": 1,
"to": 1,
"from": 1
}
},
{
"$match": {
"$and": [
{
"from": {
"$numberLong": "12345"
}
},
{
"status": 22
}
]
}
},
{
"$group": {
"_id": "$to",
"max_revision": {
"$max": "$id"
}
}
},
{
"$sort": {
"max_revision": -1
}
},
{
"$limit": 20
}
],
"allowDiskUse": false,
"cursor": {},
"$db": "mongo_jc",
"lsid": {
"id": {
"$binary": "8LktsSkpTjOzF3GIC+m1DA==",
"$type": "03"
}
},
"$clusterTime": {
"clusterTime": {
"$timestamp": {
"t": 1597230985,
"i": 1
}
},
"signature": {
"hash": {
"$binary": "PHh4eHh4eD4=",
"$type": "00"
},
"keyId": {
"$numberLong": "6859724943999893507"
}
}
}
},
"planSummary": [
{
"IXSCAN": {
"from": 1
}
}
],
"keysExamined": 1246529,
"docsExamined": 1246529,
"hasSortStage": 1,
"cursorExhausted": 1,
"numYields": 9747,
"nreturned": 0,
"queryHash": "29DAFB9E",
"planCacheKey": "F5EBA6AE",
"reslen": 231,
"locks": {
"ReplicationStateTransition": {
"acquireCount": {
"w": 9847
}
},
"Global": {
"acquireCount": {
"r": 9847
}
},
"Database": {
"acquireCount": {
"r": 9847
}
},
"Collection": {
"acquireCount": {
"r": 9847
}
},
"Mutex": {
"acquireCount": {
"r": 100
}
}
},
"storage": {
"data": {
"bytesRead": {
"$numberLong": "6011370213"
},
"timeReadingMicros": 4350129
},
"timeWaitingMicros": {
"cache": 2203
}
},
"protocol": "op_msg",
"millis": 8548
}
For this specific case the mongod query executor can use an index for the initial match, but not for the sort.
If you were to reorder and modify the stages a bit, it could use an index on {from:1, status:1, id:1} for both matching and sorting:
$collection->aggregate([
[ '$match'=> [
'$and'=>
[
[ 'from'=> xxxxxxxxxx],
[ 'status'=> xx ],
]
]
],
['$sort' => ['id' => -1]],
['$project' => ['id'=>1,'to'=>1,'from'=>1]],
['$group' => [
'_id' => '$to',
'max_revision'=>['$first' => '$id'],
]
],
['$limit' => 20],
]);
This way it should be able to combine the $match and $sort stages into a single index scan.
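The rewrite relies on one equivalence: once the rows are sorted by id descending, the first id seen for each 'to' value is that group's maximum. A small in-memory sketch of that step, using hypothetical rows that stand in for the documents left after the $match:

```javascript
// Hypothetical rows that already passed the $match on from/status.
const matchedRows = [
  { id: 3, to: 10 },
  { id: 7, to: 20 },
  { id: 5, to: 10 },
  { id: 1, to: 20 },
];

function maxRevisionPerTo(docs) {
  // $sort: { id: -1 }
  const sorted = [...docs].sort((a, b) => b.id - a.id);
  // $group with $first: keep the first id seen for each 'to' value
  const firstSeen = new Map();
  for (const doc of sorted) {
    if (!firstSeen.has(doc.to)) firstSeen.set(doc.to, doc.id);
  }
  return [...firstSeen].map(([to, id]) => ({ _id: to, max_revision: id }));
}

const maxRevisions = maxRevisionPerTo(matchedRows);
console.log(maxRevisions);
```

The Map in this sketch happens to keep the groups in descending max_revision order, but MongoDB's $group does not guarantee any output order. If the top 20 by max_revision are what matters, it may still be worth keeping a $sort on max_revision before the $limit.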

Return object from a list if it's child object contains a certain value

I've been stuck on this issue for a while. I feel like I'm close but just can't figure out the solution.
I have a condensed schema that looks like this:
{
"_id": {
"$oid": "5a423f48d3983274668097f3"
},
"id": "59817",
"key": "DW-15450",
"changelog": {
"histories": [
{
"id": "449018",
"created": "2017-12-13T11:11:26.406+0000",
"items": [
{
"field": "status",
"toString": "Released"
}
]
},
{
"id": "448697",
"created": "2017-12-08T09:54:41.822+0000",
"items": [
{
"field": "resolution",
"toString": "Fixed"
},
{
"field": "status",
"toString": "Completed"
}
]
}
]
},
"fields": {
"issuetype": {
"id": "1",
"name": "Bug"
}
}
}
And I would like to grab all changelog.histories that have a changelog.histories.items.toString value of Completed.
Below is my pipeline
"pipeline" => [
[
'$match' => [
'changelog.histories.items.toString' => 'Completed'
]
],
[
'$unwind' => '$changelog.histories'
],
[
'$project' => [
'changelog.histories' => [
'$filter' => [
'input' => '$changelog.histories.items',
'as' => 'item',
'cond' => [
'$eq' => [
'$$item.toString', 'Completed'
]
]
]
]
]
]
]
So ideally I would like the following returned
{
"id": "448697",
"created": "2017-12-08T09:54:41.822+0000",
"items": [
{
"field": "resolution",
"toString": "Fixed"
},
{
"field": "status",
"toString": "Completed"
}
]
}
You can try something like this.
db.changeLogs.aggregate([
{ $unwind: '$changelog.histories' },
{ $match: {'changelog.histories.items.toString': 'Completed'} },
{ $replaceRoot: { newRoot: "$changelog.histories" } }
]);
This solution performs a COLLSCAN, so it is expensive in case of a large collection. Should you have strict performance requirements, you can create an index as follows.
db.changeLogs.createIndex({'changelog.histories.items.toString': 1})
Then, in order to exploit the index, you have to change the query as follows.
db.changeLogs.aggregate([
{ $match: {'changelog.histories.items.toString': 'Completed'} },
{ $unwind: '$changelog.histories' },
{ $match: {'changelog.histories.items.toString': 'Completed'} },
{ $replaceRoot: { newRoot: "$changelog.histories" } }
]);
The first stage keeps the changeLog documents that have at least one history item in the Completed state; this stage uses the index. The second stage unwinds the histories array. The third stage filters the unwound documents again, keeping only those whose history has at least one item in the Completed state. Finally, the fourth stage replaces the root, returning each matching history as a document.
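As a rough illustration of those four stages, here is the same logic over plain JavaScript objects. The issues array is a hypothetical, trimmed copy of the document in the question, and the completedHistories helper is made up for this sketch.

```javascript
// Hypothetical, trimmed version of the document from the question.
const issues = [
  {
    key: "DW-15450",
    changelog: {
      histories: [
        { id: "449018", items: [{ field: "status", toString: "Released" }] },
        {
          id: "448697",
          items: [
            { field: "resolution", toString: "Fixed" },
            { field: "status", toString: "Completed" },
          ],
        },
      ],
    },
  },
];

// True when at least one item of the history is in the Completed state.
const hasCompletedItem = (history) =>
  history.items.some((item) => item.toString === "Completed");

function completedHistories(docs) {
  return docs
    // first $match: documents with at least one Completed item (index-backed in MongoDB)
    .filter((doc) => doc.changelog.histories.some(hasCompletedItem))
    // $unwind, keeping only the history itself (what $replaceRoot does at the end)
    .flatMap((doc) => doc.changelog.histories)
    // second $match: drop the unwound histories without a Completed item
    .filter(hasCompletedItem);
}

const completed = completedHistories(issues);
console.log(completed.map((history) => history.id)); // [ '448697' ]
```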
Edit
Based on your comment, this is an alternate solution that preserves the id and key fields in the returned documents (while still using the index).
db.changeLogs.aggregate([
{ $match: {'changelog.histories.items.toString': 'Completed'} },
{ $unwind: '$changelog.histories' },
{ $match: {'changelog.histories.items.toString': 'Completed'} },
{ $project: { _id: 0, id: 1, key: 1, changelog: 1 }}
]);

Match Documents based on Nested Array Values and Count Unique

I have a MongoDB collection whose documents have the following format:
{
"_id" : ObjectId("595f5661f34ae7b2adee31bc"),
"app_userUpdatedOn" : "2017-03-09T12:01:07.615Z",
"appId" : 31625,
"app_lastCommunicatedAt" : "2017-03-09T12:18:53.067Z",
"currentDate" : "2017-03-09T12:19:28.626Z",
"objectId" : "58c14850e4b0b2406992b29e",
"name" : "APPSESSION",
"action" : "START",
"installationId" : "98088f6641a0fa79",
"userName" : "98088f6641a0fa79",
"properties" : [
[
"userid",
"98088f6641a0fa79"
],
[
"app_os_version",
"6.0.1"
],
[
"app_installAt",
"2017-03-09T12:01:01.307Z"
],
[
"app_model",
"SM-J210F"
],
[
"app_lastCommunicatedAt",
"2017-03-09T12:18:53.067Z"
],
[
"app_carrier",
"Jio 4G"
],
[
"app_counter",
1
],
[
"app_brand",
"samsung"
],
[
"app_lib_version",
"1.0"
],
[
"app_app_version",
"3.0.2"
],
[
"app_os",
"Android"
]
],
"date" : "2017-03-09"
}
{
"_id" : ObjectId("595f5661f34ae7b2adee31bd"),
"app_userUpdatedOn" : "2017-02-05T07:38:32.866Z",
"appId" : 31625,
"app_lastCommunicatedAt" : "2017-03-09T08:09:05.342Z",
"currentDate" : "2017-03-09T12:19:28.806Z",
"objectId" : "58c14850e4b06ec88ecaa9c6",
"name" : "APPINSTALL",
"action" : "START",
"installationId" : "eef436554fbdf4ac",
"userName" : "eef436554fbdf4ac",
"properties" : [
[
"userid",
"eef436554fbdf4ac"
],
[
"app_os_version",
"5.1"
],
[
"app_installAt",
"2017-02-05T11:20:49.809Z"
],
[
"app_model",
"Micromax Q465"
],
[
"app_lastCommunicatedAt",
"2017-03-09T08:09:05.342Z"
],
[
"app_carrier",
"JIO 4G"
],
[
"app_counter",
1
],
[
"app_brand",
"Micromax"
],
[
"app_lib_version",
"1.0"
],
[
"app_app_version",
"3.0.2"
],
[
"app_os",
"Android"
]
],
"date" : "2017-03-09"
}
I want to fetch the count and the unique count of the documents where currentDate lies between startDate and endDate, name is x (e.g. APPSESSION), and properties contains several nested arrays whose first element is known and whose second element can be any non-null value (like ["app_installAt", "This can be any value instead of null"], ["app_model", "This can be any value instead of null"], and so on), grouped by userName.
Previously I wrote a query for the case where both elements of each nested array are known:
db.testing.aggregate(
[
{$match: {currentDate: {$gte:"2017-03-01T00:00:00.000Z", $lt:"2017-03-02T00:00:00.000Z"},name:"INSTALL"}},
{$match: {properties: ["app_os_version","4.4.2"]}},
{$match: {properties: ["app_carrier","telenor"]}},
{$match: {properties: ["app_brand","Micromax"]}},
{$group: {_id: "$userName"}},
{$count: "uniqueCount"}
]
);
But I am unable to find the data when I know only the element at index 0 of each nested property array.
Please help.
Thanks in advance :)
The query for this essentially uses $all for the multiple conditions to match in the array, with $elemMatch and $eq to match the individual array elements.
For example, to match and count only the first document supplied in your question, the parameters would be:
db.testing.find({
"currentDate": {
"$gte": "2017-03-09T00:00:00.000Z",
"$lt": "2017-03-10T00:00:00.000Z"
},
"properties": {
"$all": [
{ "$elemMatch": { "$eq": ["app_os_version","6.0.1"] } },
{ "$elemMatch": { "$eq": ["app_carrier", "Jio 4G"] } },
{ "$elemMatch": { "$eq": ["app_brand", "samsung"] } }
]
}
})
With .aggregate(), you put the whole query into a single $match stage, as in:
db.testing.aggregate([
{ "$match": {
"currentDate": {
"$gte": "2017-03-09T00:00:00.000Z",
"$lt": "2017-03-10T00:00:00.000Z"
},
"properties": {
"$all": [
{ "$elemMatch": { "$eq": ["app_os_version","6.0.1"] } },
{ "$elemMatch": { "$eq": ["app_carrier", "Jio 4G"] } },
{ "$elemMatch": { "$eq": ["app_brand", "samsung"] } }
]
}
}},
{ "$group": { "_id": "$userName" } },
{ "$count": "unique_count" }
])
So $elemMatch in this context examines each "inner" array to see whether it matches the supplied condition, which we give as an array argument to the $eq operator.
The wrapping $all means that all of the provided $elemMatch conditions must be met for a document to match. That is how the selection gets made with this type of structure.
If you need to match on only part of a pair, the "inner" match can address an element of the inner array by position. For the key, it would use "0" as the index position, i.e.:
{ "$elemMatch": { "0": "app_os_version" } },
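The matching rule can be mirrored in plain JavaScript to make the semantics concrete: every condition must be satisfied by at least one inner pair, and a condition may give only the key at index 0. This is an in-memory sketch, not the query itself. The matchesAll helper is hypothetical, and the properties array is a subset of the first document in the question.

```javascript
// Subset of the properties array from the first document in the question.
const sampleProperties = [
  ["userid", "98088f6641a0fa79"],
  ["app_os_version", "6.0.1"],
  ["app_carrier", "Jio 4G"],
  ["app_brand", "samsung"],
];

// Each condition is [key] (match on index 0 only) or [key, value] (exact pair).
function matchesAll(properties, conditions) {
  // $all: every condition has to be met
  return conditions.every(([key, value]) =>
    // $elemMatch: some inner array satisfies this one condition
    properties.some(
      (pair) => pair[0] === key && (value === undefined || pair[1] === value)
    )
  );
}

// Exact pair plus key-only conditions.
console.log(matchesAll(sampleProperties, [["app_os_version", "6.0.1"], ["app_brand"]])); // true
// A known key with a different value does not match.
console.log(matchesAll(sampleProperties, [["app_carrier", "telenor"]])); // false
```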