Mongo aggregation with $ne - mongodb

I can't seem to find any resources at all on Mongo aggregation with the boolean operations. My query looks something like this (I am using the pymongo driver):
db.collection.aggregate([{'$match': {'foo': 3, 'bar': 'baz'}},
{'$project': {'quxx': 1, '_id': 0, 'count': 1}},
{'$group': {'total': {'$sum': '$count'}, '_id': '$quxx'}},
{'$sort': {'total': -1}},
{'$limit': 2000}])
Which all works great ($match is on an index etc). Now, there is a single rogue quxx that I would like to filter out of the pipeline so I thought that I would use the $ne operator. However, I can't seem to figure out the proper way to do it! I'm not sure if I'm not placing it at the right point (I want it after the $match operator but before the $group operator) or I have the syntax wrong but help would be appreciated.
The things I have tried so far (all in their own step after $match) are:
{'$quxx': {'$ne': 'rogue'}}
{'quxx': {'$ne': 'rogue'}}
{'$ne': {'quxx': 'rogue'}}
{'$ne': {'$quxx': 'rogue'}}
Every single one of them gives me unrecognized pipeline op.

You would either put that in its own $match pipeline element, or just include it in the initial $match.
So either add:
{'$match': {'quxx': {'$ne': 'rogue'}}}
or modify the initial $match to:
{'$match': {'foo': 3, 'bar': 'baz', 'quxx': {'$ne': 'rogue'}}}
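For reference, a minimal sketch of the full pipeline with the filter folded into the initial $match. The helper name build_pipeline and its parameters are hypothetical, introduced only for illustration; the stages themselves come from the question.

```python
def build_pipeline(rogue="rogue", limit=2000):
    """Build the question's pipeline with one quxx value filtered out.

    build_pipeline is a hypothetical helper, not part of pymongo.
    """
    return [
        # $ne is a query operator, so it belongs inside a $match stage
        # rather than being a pipeline stage of its own.
        {"$match": {"foo": 3, "bar": "baz", "quxx": {"$ne": rogue}}},
        {"$project": {"quxx": 1, "_id": 0, "count": 1}},
        {"$group": {"_id": "$quxx", "total": {"$sum": "$count"}}},
        {"$sort": {"total": -1}},
        {"$limit": limit},
    ]


pipeline = build_pipeline()
# With a live connection this would be passed to the driver, e.g.:
# results = list(db.collection.aggregate(pipeline))
```

Every stage is a dictionary with exactly one key naming a pipeline operator, which is why the four attempts in the question were rejected as unrecognized pipeline ops.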

Related

Count unique value of an object in array

I am trying to count the occurrences of the field uuid in my collection. The dataset is a collection of Magic: The Gathering decks. I want to see which cards are played the most. This is an example document in the collection:
_id:60aa430832f934a5874f7e3a
code:"JMP"
commander:Array
mainBoard:Array
name:"Rogues (1)"
releaseDate:"2020-07-17"
sideBoard:Array
type:"Jumpstart"
The uuid is located in an object in the mainBoard array. It looks like this:
mainBoard:Array[
{0:Object
artist:"Daarken"
...
uuid:"d99be31c-b176-530a-bb17-88056d2e06b1"}]
This is the aggregation I have tried without achieving my goal:
pipeline = [
{'$project': {"mainBoard": 1}},
{'$unwind': "$mainBoard"},
{'$group': {'_id': '$uuid', "count": {'$sum': 1}}},
{'$sort': {'_id': 1}},
]
# execute pipeline
for doc in database.deck_lists.aggregate(pipeline):
    pprint(doc)
The aggregation above returns: {'_id': None, 'count': 22559}
What am I doing wrong?
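One likely cause, assuming uuid is a direct field of each mainBoard element as in the example document: after $unwind, the card object still sits under mainBoard, so '$uuid' refers to a top-level field that does not exist and every document falls into the None group. A sketch of the adjusted pipeline follows, together with a plain-Python equivalent of the counting logic. The sample decks and the count_uuids helper are made up for illustration, and the sort is changed to count descending so the most played cards come first.

```python
from collections import Counter

pipeline = [
    {"$project": {"mainBoard": 1}},
    {"$unwind": "$mainBoard"},
    # After $unwind, mainBoard holds a single card object,
    # so the uuid is addressed through it.
    {"$group": {"_id": "$mainBoard.uuid", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def count_uuids(decks):
    """Plain-Python equivalent of the $unwind + $group above (illustration only)."""
    return Counter(card["uuid"] for deck in decks for card in deck["mainBoard"])


# Hypothetical sample data, shaped like the documents in the question.
sample_decks = [
    {"mainBoard": [{"uuid": "a"}, {"uuid": "b"}]},
    {"mainBoard": [{"uuid": "a"}]},
]
counts = count_uuids(sample_decks)
```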

MongoDB - Safely sort inner array after group

I'm trying to look up all records that match a certain condition, in this case _id being certain values, and then return only the top 2 results, sorted by the name field.
This is what I have
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {fk: 1, name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$project: {items: {$slice: ["$items", 2]} }}
])
and it works, BUT, it's not guaranteed. According to this Mongo thread $group does not guarantee document order.
This would also mean that all of the suggested solutions here and elsewhere, which recommend using $unwind, followed by $sort, and then $group, would also not work, for the same reason.
What is the best way to accomplish this with Mongo (any version)? I've seen suggestions that this could be accomplished in the $project phase, but I'm not quite sure how.
You are correct in saying that the result of $group is never sorted.
$group does not order its output documents.
Hence sorting by fk first with
{$sort: {fk: 1}}
then grouping with
{$group: {_id: "$fk", ... }},
will be a wasted effort.
But there is a silver lining to sorting by name: -1 before the $group stage. Since you are using $push (not $addToSet), the pushed objects keep their incoming order in the newly created items array of the $group result. You can see this behaviour here (copy of your pipeline).
The items array will always contain
"items": [
{
..
"name": "Michael"
},
{
..
"name": "George"
}
]
in the same order, so your nested array sort is a non-issue! Though I am unable to find an exact quote in the documentation to confirm this behaviour, you can check
this,
or this where it is confirmed.
Also, in the accumulator operator list for $group, $addToSet has "Order of the array elements is undefined." in its description, whereas the similar operator $push does not, which might be indirect evidence? :)
Just a simple modification of your pipeline where you move the fk: 1 sort from pre-$group stage to post-$group stage;
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$sort: {name: -1}},
{$group: {_id: "$fk", items: {$push: "$$ROOT"} }},
{$sort: {_id: 1}},
{$project: {items: {$slice: ["$items", 2]} }}
])
should be sufficient to have the main result array order fixed as well. Check it on mongoplayground
$group doesn't guarantee the order of its output documents, but it keeps the grouped documents in their sorted order within each bucket. So in your case, even though the documents after the $group stage are not sorted by fk, each group (items) is sorted by name descending. If you would like to keep the documents sorted by fk, you can add {$sort: {_id: 1}} after the $group stage (fk has become _id at that point).
If you need to, you could also sort by the order of the values passed in your match query by adding an extra field to each document. Something like
db.getCollection('col1').aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$addFields:{ifk:{$indexOfArray:[[1, 2],"$fk"]}}},
{$sort: {ifk: 1, name: -1}},
{$group: {_id: "$ifk", items: {$push: "$$ROOT"}}},
{$sort: {_id : 1}},
{$project: {items: {$slice: ["$items", 2]}}}
])
Update, to sort the array without the $group operator: I've found the Jira ticket that is going to allow sorting an array.
You could try the $project stage below to sort the array. There may be various ways to do it. This should sort names in descending order. It works, but it is a slower solution.
{"$project":{"items":{"$reduce":{
"input":"$items",
"initialValue":[],
"in":{"$let":{
"vars":{"othis":"$$this","ovalue":"$$value"},
"in":{"$let":{
"vars":{
//return index 0 when comparing the first value with the initial (empty) value; otherwise return the insertion index, i.e. the number of elements in the accumulator array that are greater than the current value.
"index":{"$cond":{
"if":{"$eq":["$$ovalue",[]]},
"then":0,
"else":{"$reduce":{
"input":"$$ovalue",
"initialValue":0,
"in":{"$cond":{
"if":{"$lt":["$$othis.name","$$this.name"]},
"then":{"$add":["$$value",1]},
"else":"$$value"}}}}
}}
},
//insert the current value at the found index
"in":{"$concatArrays":[
{"$slice":["$$ovalue","$$index"]},
["$$othis"],
{"$slice":["$$ovalue",{"$subtract":["$$index",{"$size":"$$ovalue"}]}]}]}
}}}}
}}}}
Simple example with demonstration how each iteration works
db.b.insert({"items":[2,5,4,7,6,3]});
othis  ovalue       index  concat arrays (parts with counts)   return value
2      []           0      [],0         [2]  [],0              [2]
5      [2]          0      [],0         [5]  [2],-1            [5,2]
4      [5,2]        1      [5],1        [4]  [2],-1            [5,4,2]
7      [5,4,2]      0      [],0         [7]  [5,4,2],-3        [7,5,4,2]
6      [7,5,4,2]    1      [7],1        [6]  [5,4,2],-3        [7,6,5,4,2]
3      [7,6,5,4,2]  4      [7,6,5,4],4  [3]  [2],-1            [7,6,5,4,3,2]
Reference - Sorting Array with JavaScript reduce function
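The same insertion logic can be sketched in plain Python to make the iterations easier to follow. This is only an illustration of the idea, not something MongoDB runs; the function names are hypothetical, and it sorts bare numbers (as in the demonstration above) rather than documents with a name field.

```python
from functools import reduce


def insert_desc(ovalue, othis):
    """Insert othis into the descending-sorted list ovalue."""
    # Inner reduce: count the already-placed elements greater than othis.
    # That count is the position where othis has to be inserted.
    index = reduce(lambda acc, item: acc + 1 if othis < item else acc, ovalue, 0)
    # Equivalent of $concatArrays over the two $slice parts and [othis].
    return ovalue[:index] + [othis] + ovalue[index:]


def sort_desc(items):
    """Outer reduce: start from an empty list and insert one item at a time."""
    return reduce(insert_desc, items, [])


result = sort_desc([2, 5, 4, 7, 6, 3])
print(result)  # [7, 6, 5, 4, 3, 2]
```

Each step scans the whole accumulator, so the work grows quadratically with the array length, which matches the remark that this is the slower solution.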
There is a bit of a red herring in the question, as $group does guarantee that it will process incoming documents in order (and that's why you have to sort them before $group to get ordered arrays). But there is an issue with the way you propose doing it, since pushing all the documents into a single grouping is (a) inefficient and (b) could potentially exceed the maximum document size.
Since you only want top two, for each of the unique fk values, the most efficient way to accomplish it is via a "subquery" using $lookup like this:
db.coll.aggregate([
{$match: {fk: {$in: [1, 2]}}},
{$group:{_id:"$fk"}},
{$sort: {_id: 1}},
{$lookup:{
from:"coll",
as:"items",
let:{fk:"$_id"},
pipeline:[
{$match:{$expr:{$eq:["$fk","$$fk"]}}},
{$sort:{name:-1}},
{$limit:2},
{$project:{_id:0, fk:1, name:1}}
]
}}
])
Assuming you have an index on {fk:1, name:-1} as you must to get efficient sort in your proposed code, the first two stages here will use that index via DISTINCT_SCAN plan which is very efficient, and for each of them, $lookup will use that same index to filter by single value of fk and return results already sorted and limited to first two. This will be the most efficient way to do this at least until https://jira.mongodb.org/browse/SERVER-9377 is implemented by the server.

Mongo aggregate pipeline project groups on match (equivalent to SQL select sum as group)

I am new to Mongo DB and would really appreciate some help on the following query on the database. I basically need to select all the results where the ‘Outcome’ field is in ‘first’, ‘second’ or ‘third’ and project that grouped within countries as ‘medals’ (where medals were won).
I then need to do the reverse in the pipeline to select where medals were not won. I guess the equivalent in SQL would be to select the sum of the entries where the outcome is in ‘first’, ‘second’ or ‘third’ as ‘medals’, and select the sum of the entries not in ‘first’, ‘second’ or ‘third’ as ‘non_medals’, then group the results by country.
Below is the query I have managed to come up with thus far, but cannot seem to get it right.
pipeline_4 = [
{'$match': {'Outcome': {'$in': ['first','second', 'third'] } ,'Country': {'$exists': True}}},
{'$group': {'_id': {'outcome': '$Outcome', 'country': '$Country'},
'medals': {'$sum': 1}}},
{'$project': {
'outcome': 1, 'country': 1, 'medals': 1
}},
{'$match': {'Outcome': {'$nin': ['first','second', 'third'] } ,'Country': {'$exists': True}}},
{'$group': {'_id': {'outcome': '$Outcome', 'country': '$Country'},
'non_medals': {'$sum': 1}}},
{'$project': {
'outcome': 1, 'country': 1, 'non_medals': 1
}}
]
Could anyone please advise on this? At the moment I only get the medal groups returned, and they are not grouped. If you need more info please ask; as I say, I am new to Mongo and may be approaching it in too much of a standard SQL way.
Thanks,
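The $project step is wrong.
Since you define {_id:{outcome:"$foo"}} in the $group step, you have to use outcome:"$_id.outcome" in the $project step.
You also have two pipelines in pipeline_4. The second $match runs on the output of the first $project, where the Outcome and Country fields no longer exist, so you need to run them as two separate requests.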
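A rough sketch of what the two separate pipelines could look like, with the $project step reading the grouped values from _id. The helper build_medal_pipeline and the MEDAL_OUTCOMES constant are hypothetical names used only for this illustration; the field names come from the question.

```python
MEDAL_OUTCOMES = ["first", "second", "third"]


def build_medal_pipeline(won_medal):
    """Return the pipeline for medal rows (won_medal=True) or non-medal rows."""
    operator = "$in" if won_medal else "$nin"
    count_field = "medals" if won_medal else "non_medals"
    return [
        {"$match": {"Outcome": {operator: MEDAL_OUTCOMES},
                    "Country": {"$exists": True}}},
        {"$group": {"_id": {"outcome": "$Outcome", "country": "$Country"},
                    count_field: {"$sum": 1}}},
        # The grouped keys live under _id, so they are projected from there.
        {"$project": {"_id": 0,
                      "outcome": "$_id.outcome",
                      "country": "$_id.country",
                      count_field: 1}},
    ]


medals_pipeline = build_medal_pipeline(won_medal=True)
non_medals_pipeline = build_medal_pipeline(won_medal=False)
# Each pipeline would then be sent as its own aggregate() request.
```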

Mongodb aggregation sort on array primitive

Given documents like
{
...
name:'whatever',
games: [122, 199, 201, 222]
}
db.col.aggregate({$match:{}},
{$sort:{'games.0': -1}})
doesn't sort. There are no errors; it just doesn't sort on the first element of the games array.
A query with the same syntax, however, works fine:
col.find({}).sort({'games.0':-1})
If I change the collection so games is an array of objects like
[ {game1:198}, {game2:201} ...]
then the aggregation works using
{$sort:{'games.game1': -1}})
What am I missing to get this to work with an array of numbers?
Try unwinding the array first by applying the $unwind operator on the array, then use $sort on the deconstructed array and finally use $group to get the original documents structure:
db.coll.aggregate([
{"$unwind": "$games"},
{"$sort": {"games": 1}},
{
"$group": {
"_id": "$_id",
"name": {"$first": "$name"},
"games": {"$push": "$games"}
}
}
])
Try this:
db.coll.aggregate([
{"$unwind": "$games"},
{"$sort": {"games": -1}}
])
I hope this will work for you as you expected.
In MongoDB 3.4, the find sort, i.e. db.col.find({}).sort({'games.0':-1}), works as expected, whereas the aggregation sort doesn't.
In MongoDB 3.6, both the find sort and the aggregation sort work as expected.
Jira issue for that: https://jira.mongodb.org/browse/SERVER-19402
I would recommend updating your MongoDB version; your aggregation query will then work fine.

MongoDB - filter in find function

I am writing a query in MongoDB. I want to know how to use a substring function inside the find function. For example, I have a date string, and I want to filter the date by year and then group by month.
This is what I have tried:
db.booking.find({$substr:["$bookingdatetime",5,2]:"14"}).aggregate([ {$group: { _id: {$substr: ['$bookingdatetime', 5, 2]}, numberofbookings: {$sum: 1} }} ])
Where am I going wrong ?
$substr can be used only as a string aggregation operator. You cannot use it as an operator in a find query. You can form the aggregation pipeline as below:
db.booking.aggregate([
{$project:{"year":{$substr:["$bookingdatetime",5,2]},
"numberofbookings":1,
"bookingdatetime":1}},
{$match:{"year":"14"}}, // $substr returns a string, so compare with "14" rather than 14
{$group:{"_id":{$substr: ['$bookingdatetime', 5, 2]},
numberofbookings: {$sum: 1}}}
])
Alternatively,
use a regex to match the date pattern for the specific year.