I have a document with the following structure:
{
'_id': '',
'a': '',
'b': '',
'c': [
{
'_id': '',
'd': '',
'f': [
{
'orderDate': 12345,
'orderProfit': 12,
},
{
'orderDate': 67891,
'orderProfit': 12341,
},
{
'orderDate': 23456,
'orderProfit': 474,
},
],
},
{
'_id': '',
'd': '',
'f': [
{
'orderDate': 14232,
'orderProfit': 12222,
},
{
'orderDate': 643532,
'orderProfit': 4343,
},
{
'orderDate': 33423,
'orderProfit': 5555,
},
],
},
],
}
orderDate is an int64 that represents the date that an order was made
orderProfit is an int64 that represents the profit of an order
I needed to return the document that had the biggest "orderDate" and check whether the "orderProfit" was the one I was looking for.
For that matter I used a query like this (in an aggregate query):
[
{
'$addFields': {
'orders': {
'$map': {
'input': '$c',
'as': 'c',
'in': {
'profit': {
'$filter': {
'input': '$$c.f',
'cond': {
'$eq': [
{
'$max': '$$c.f.orderDate',
},
'$$this.orderDate',
],
},
},
},
},
},
},
},
},
{
'$match': {
'$or': [
{ 'orders.profit.orderProfit': 500 },
],
},
},
];
It is working properly.
The issue comes when trying to add this query to a countDocuments() query in order to fetch the total number of documents.
It is a requirement to use the countDocuments().
I just can't seem to make it work...
$addFields throws an "unknown top level operator" error. If I remove the $addFields, then I can't give countDocuments() the part of the query that finds the max date, and if I remove it entirely, $match is reported as an unknown operator.
db.getCollection('orders').countDocuments(
{
"orders.profit.orderProfit" : {"Query that was shown previously"}
})
You can't.
countDocuments() receives a find query; you can't attach an entire pipeline to it.
However, countDocuments() is just a wrapper. From the docs:
Returns the count of documents that match the query for a collection or view. The method wraps the $group aggregation stage with a $sum expression to perform the count and is available for use in Transactions.
Basically this method just executes the following aggregation:
db.collection.aggregate([
{
$match: yourQuery
},
{
$group: {
_id: null,
sum: {$sum: 1}
}
}
])
It then returns results[0].sum, or 0 if there were no results.
So you can just use your own pipeline and add this stage at the end; complexity-wise it would literally be the same.
db.collection.aggregate([
...
your entire pipeline
...
{
$group: {
_id: null,
sum: {$sum: 1}
}
}
])
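For example, applied to the pipeline from the question, the counting version would look something like this (same stages as above with the counting $group appended; a sketch, not tested against your data):
db.getCollection('orders').aggregate([
  {
    '$addFields': {
      'orders': {
        '$map': {
          'input': '$c',
          'as': 'c',
          'in': {
            'profit': {
              '$filter': {
                'input': '$$c.f',
                'cond': {
                  '$eq': [{ '$max': '$$c.f.orderDate' }, '$$this.orderDate'],
                },
              },
            },
          },
        },
      },
    },
  },
  { '$match': { 'orders.profit.orderProfit': 500 } },
  // This $group reproduces what countDocuments() does internally
  { '$group': { '_id': null, 'sum': { '$sum': 1 } } },
]);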
If there's any other specific reason you want to not use the aggregation framework let me know, maybe there's a workaround.
countDocuments() does not accept an aggregation pipeline as its query.
You can use aggregation operators in a match query via $expr, but this can cause performance issues, so use this approach only when you have no other way.
$let binds variables for use in the specified expression and returns the result of that expression:
vars creates an orders variable holding the result of your $addFields operation,
in uses $map to iterate over the nested $$orders.profit.orderProfit arrays and checks an $in condition: if your profit amount is found it returns true, otherwise false,
$anyElementTrue then checks whether any returned element is true; if so the condition is true, otherwise false.
db.getCollection('orders').countDocuments({
$expr: {
$let: {
vars: {
orders: {
"$map": {
"input": "$c",
"as": "c",
"in": {
"profit": {
"$filter": {
"input": "$$c.f",
"cond": {
"$eq": [{ "$max": "$$c.f.orderDate" }, "$$this.orderDate"]
}
}
}
}
}
}
},
in: {
$anyElementTrue: {
$map: {
input: "$$orders.profit.orderProfit",
in: { $in: [500, "$$this"] }
}
}
}
}
}
})
Playground
Second option, another way to handle this condition with fewer operators:
$filter matches both of your conditions, including orderProfit,
$size gets the total number of results from the above $filter,
$map then returns an array of the sizes returned by the filters,
$sum adds up those numbers; if the result is greater than 0 the condition is true, otherwise false.
db.getCollection('orders').countDocuments({
$expr: {
$sum: {
"$map": {
"input": "$c",
"as": "c",
"in": {
$size: {
"$filter": {
"input": "$$c.f",
"cond": {
$and: [
{ "$eq": [{ "$max": "$$c.f.orderDate" }, "$$this.orderDate"] },
{ "$eq": ["$$this.orderProfit", 500] }
]
}
}
}
}
}
}
}
})
Playground
I have the following DB structure:
Workspaces:
  Key              Index
  id (PK)          id
  content

Projects:
  Key              Index
  id (PK)          id
  workspace (FK)   workspace_1
  deleted          deleted_1
  content

Items:
  Key              Index
  id (PK)          id
  project (FK)     project_1
  type             _type_1
  deleted          deleted_1
  content
I need to calculate the number of items of each type for each project in a workspace, e.g. expected output:
[
{ _id: 'projectId1', itemType1Count: 100, itemType2Count: 50, itemType3Count: 200 },
{ _id: 'projectId2', itemType1Count: 40, itemType2Count: 100, itemType3Count: 300 },
....
]
After a few attempts and some debugging, I've created a query that provides the output I needed:
const pipeline = [
{ $match: { workspace: 'workspaceId1' } },
{
$lookup: {
from: 'items',
let: { id: '$_id' },
pipeline: [
{
$match: {
$expr: {
$eq: ['$project', '$$id'],
},
},
},
// project only fields necessary for later pipelines to not overload
// memory and to not get `exceeded memory limit for $group` error
{ $project: { _id: 1, type: 1, deleted: 1 } },
],
as: 'items',
},
},
// Use $unwind here to optimize aggregation pipeline, see:
// https://stackoverflow.com/questions/45724785/aggregate-lookup-total-size-of-documents-in-matching-pipeline-exceeds-maximum-d
// Without $unwind we may get a `matching pipeline exceeds maximum document size` error.
// The error does not appear on all requests, which makes it really strange and hard to debug.
{ $unwind: '$items' },
{ $match: { 'items.deleted': { $eq: false } } },
{
$group: {
_id: '$_id',
items: { $push: '$items' },
},
},
{
$project: {
_id: 1,
// Note: I have only 3 possible item types, so it's OK that the names are hardcoded.
itemType1Count: {
$size: {
$filter: {
input: '$items',
cond: { $eq: ['$$this.type', 'type1'] },
},
},
},
itemType2Count: {
$size: {
$filter: {
input: '$items',
cond: { $eq: ['$$this.type', 'type2'] },
},
},
},
itemType3Count: {
$size: {
$filter: {
input: '$items',
cond: { $eq: ['$$this.type', 'type3'] },
},
},
},
},
},
]
const counts = await Project.aggregate(pipeline)
The query works as expected, but it is very slow... If I have about 1000 items in one workspace it takes about 8 seconds to complete. Any ideas on how to make it faster are appreciated.
Thanks.
Assuming your indexes are properly set up so that they contain the "correct" fields, we can still make some tweaks to the query itself.
Approach 1: keeping existing collection schema
db.projects.aggregate([
{
$match: {
workspace: "workspaceId1"
}
},
{
$lookup: {
from: "items",
let: {id: "$_id"},
pipeline: [
{
$match: {
$expr: {
$and: [
{$eq: ["$project","$$id"]},
{$eq: ["$deleted",false]}
]
}
}
},
// project only fields necessary for later pipelines to not overload
// memory and to not get `exceeded memory limit for $group` error
{
$project: {
_id: 1,
type: 1,
deleted: 1
}
}
],
as: "items"
}
},
// Use $unwind here to optimize aggregation pipeline, see:
// https://stackoverflow.com/questions/45724785/aggregate-lookup-total-size-of-documents-in-matching-pipeline-exceeds-maximum-d
// Without $unwind we may get a `matching pipeline exceeds maximum document size` error.
// The error does not appear on all requests, which makes it really strange and hard to debug.
{
$unwind: "$items"
},
{
$group: {
_id: "$_id",
itemType1Count: {
$sum: {
"$cond": {
"if": {$eq: ["$items.type","type1"]},
"then": 1,
"else": 0
}
}
},
itemType2Count: {
$sum: {
"$cond": {
"if": {$eq: ["$items.type","type2"]},
"then": 1,
"else": 0
}
}
},
itemType3Count: {
$sum: {
"$cond": {
"if": {$eq: ["$items.type","type3"]},
"then": 1,
"else": 0
}
}
}
}
}
])
There are 2 major changes:
moving the items.deleted: false condition into the $lookup sub-pipeline, so that fewer items documents are looked up
skipping items: { $push: '$items' } and instead doing a conditional sum in the later $group stage
Here is the Mongo playground for your reference. (at least for the correctness of the new query)
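As an aside, and assuming such an index does not already exist (I am guessing at your index list here), a compound index on items that matches the sub-pipeline's equality conditions is worth checking; note that whether $expr inside a $lookup sub-pipeline can actually use it depends on your server version:
// Hypothetical index, not taken from the question: intended to support the
// equality matches on project and deleted in the $lookup sub-pipeline above.
db.items.createIndex({ project: 1, deleted: 1 })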
Approach 2: if the collection schema can be modified, we can denormalize projects.workspace into the items collection like this:
{
"_id": "i1",
"project": "p1",
"workspace": "workspaceId1",
"type": "type1",
"deleted": false
}
In this way, you can skip the $lookup. A simple $match and $group will suffice.
db.items.aggregate([
{
$match: {
"deleted": false,
"workspace": "workspaceId1"
}
},
{
$group: {
_id: "$project",
itemType1Count: {
$sum: {
"$cond": {
"if": {$eq: ["$type","type1"]},
"then": 1,
"else": 0
}
}
},
...
Here is the Mongo playground with denormalized schema for your reference.
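If you go this route, a matching compound index would presumably help the $match stage as well (again an assumption on my part; adjust to your actual workload):
// Hypothetical index for the denormalized schema: lets the $match on
// workspace and deleted be served before the $group.
db.items.createIndex({ workspace: 1, deleted: 1 })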
A document in my DB looks like this :
{
"_id": ObjectId("5e92e63fad262707ff301d6c"),
"uknum": 30,
"area": "bath",
"ukelectors": 62355,
"ukresults": [
{
"party": "con",
"leader": "thatcher",
"ukvotes": 22544
},
{
"party": "lab",
"leader": "foot",
"ukvotes": 7259
},
{
"party": "sdp",
"leader": "jenkins",
"ukvotes": 17240
},
{
"party": "eco",
"leader": "whittaker",
"ukvotes": 441
}
]
}
Requirement :
I need to build a query in Python to get the name of the party that won area: bath. Basically, check who got the maximum votes and choose that party.
The idea was to use the $max aggregation operator, but it does not seem to work.
You can do that using either one of these aggregation-pipeline queries :
Query 1 : Without use of $unwind, by using $reduce on array :
db.collection.aggregate([
{ $match: { area: "bath" } },
{
$addFields: {
ukresults: {
$let: {
vars: {
res: {
$reduce: {
input: "$ukresults",
initialValue: { votes: 0, party: {} },
in: {
votes: {
$cond: [
{ $gt: ["$$this.ukvotes", "$$value.votes"] },
"$$this.ukvotes",
"$$value.votes",
],
},
party: {
$cond: [
{ $gt: ["$$this.ukvotes", "$$value.votes"] },
"$$this",
"$$value.party",
],
},
},
},
},
},
in: "$$res.party",
},
},
},
},
]);
Test : MongoDB-Playground
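For the sample document above, Query 1 should return something along these lines (my reading of the pipeline, not verified output), i.e. the original document with ukresults replaced by the winning entry:
{
  "_id": ObjectId("5e92e63fad262707ff301d6c"),
  "uknum": 30,
  "area": "bath",
  "ukelectors": 62355,
  "ukresults": { "party": "con", "leader": "thatcher", "ukvotes": 22544 }
}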
Query 2 : With use of $unwind :
db.collection.aggregate([
{
$match: {
area: "bath"
}
},
{
$unwind: {
path: "$ukresults",
preserveNullAndEmptyArrays: true
}
},
{
$sort: {
"ukresults.ukvotes": -1
}
},
{
$limit: 1
}
])
Test : MongoDB-Playground
I would say both of these should perform well, since we're mostly operating on one document (because we have $match as the first stage). The first query might take a while to iterate if you have more elements in the array, so give both a try and choose the one that helps most.
Ref : Check this pymongo documentation for aggregation examples : pymongo-aggregation.
The following aggregation query prints the ukresults sub-document with maximum votes.
The aggregation operator $max cannot be applied directly to the ukresults array, as the array holds sub-documents rather than scalar values like numbers. So, we use the $reduce aggregation operator to extract the sub-document with the maximum votes. Note that using the $reduce operator is a reduction operation on an array, as is using $max; the difference is the array element data type.
The PyMongo code:
import pymongo
import pprint
client = pymongo.MongoClient()
collection = client.test.testCollection
pipeline = [
{
"$match": { "area": "bath" }
},
{
"$addFields": {
"maxvotes": {
"$reduce": {
"input": "$ukresults",
"initialValue": { "ukvotes": 0 },
"in": {
"$cond": [
{ "$gt": [ "$$this.ukvotes", "$$value.ukvotes"] },
"$$this",
"$$value"
]
}
}
}
}
},
{
"$project": {
"_id": 0,
"area": 1,
"maxvotes": 1
}
}
]
pprint.pprint(list(collection.aggregate(pipeline)))
The output:
[{'area': 'bath',
'maxvotes': {'leader': 'thatcher', 'party': 'con', 'ukvotes': 22544.0}}]
I am looking for a way to group collection1 by tags that reside in collection2.
The two collections need to be joined ($lookup) on 2 fields (field1, field2).
So far I have come up with the following query:
db.collection1.aggregate([
{
"$lookup": {
"from": "collection2",
"let": { _field1: '$field1', _field2: '$field2' },
"pipeline": [{
"$match": {
"$expr": {
"$and": [
{ "$eq": ["$field1", "$$_field1"] },
{ "$eq": ["$field2", "$$_field2"] }
]
}
}
},
{ "$project": { _id: 0, tags: 1 } },
],
"as": "col2"
}
},
{ "$unwind": "$col2" },
{ $group: { _id: "$col2.tags", count: { $sum: 1 } } }
]);
I got no results at all.
field1 and field2 together are unique in collection2 (they have a unique index).
Your syntax is correct apart from the name of your variables in:
{ _field1: '$field1', _field2: '$field2' },
When you define such variables they are called user variables, and Mongo has certain naming limitations on them that differ from the conventions for "real" variables.
From the docs:
User variable names must begin with a lowercase ascii letter [a-z] or a non-ascii character.
Meaning that in your case the leading underscore is causing the error.
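A minimal fix is therefore to rename the variables without the leading underscore and update the $$ references to match:
"let": { field1: '$field1', field2: '$field2' },
...
{ "$eq": ["$field1", "$$field1"] },
{ "$eq": ["$field2", "$$field2"] }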
OK, I have managed to solve it myself.
I have added a unique index on collection2 (field1, field2)
and added an extra $unwind to flatten the tags array.
My final query is as follows:
db.collection1.aggregate([
{
"$lookup": {
"from": "collection2",
"let": { field1: '$field1', field2: '$field2' },
"pipeline": [{
"$match": {
"$expr": {
"$and": [
{ "$eq": ["$field1", "$$field1"] },
{ "$eq": ["$field2", "$$field2"] }
]
}
}
},
{ "$project": { _id: 0, tags: 1 } },
],
"as": "col2"
}
},
{ "$unwind": "$col2" },
{ "$unwind": "$col2.tags" },
{ $group: { _id: "$col2.tags", count: { $sum: 1 } } },
{ $sort: { count: -1 } },
]);
I've been trying every method I found on SO with no success. Trying
to accomplish a seemingly simple task (very easy with json/lodash, for example) in MongoDB...
I have a collection:
db.users >
[
{
_id: 'userid',
profile: {
username: 'abc',
tests: [
{
_id: 'testid',
meta: {
category: 'math',
date: '9/2/2017',
...
},
questions: [
{
type: 'add',
correct: true,
},
{
type: 'subtract',
correct: true,
},
{
type: 'add',
correct: false,
},
{
type: 'multiply',
correct: false,
},
]
},
...
]
}
},
...
]
I want to end up with an array grouped by question type:
[
{
type: 'add',
correct: 5,
wrong: 3,
},
{
type: 'subtract',
correct: 4,
wrong: 9
}
...
]
I've tried different variations of aggregate; the last one is:
db.users.aggregate([
{ $match: { 'profile.tests.meta.category': 'math' }},
{
$project: {
tests: {
$filter: {
input: "$profile.tests",
as: "test",
cond: { $eq: ['$$test.meta.category', 'math'] }
}
}
}
},
{
$project: {
question: "$tests.questions"
}
},
{ $unwind: "$questions"},
])
Also tried adding $group at the end of the pipeline:
{
$group:
{
_id: '$questions.type',
res: {
$addToSet: { correct: {$eq:['$questions.chosenAnswer', '$questions.answers.correct'] }
}
}
}
No variation gave me what I'm looking for. I'm sure I'm missing a core concept; I've looked over the documentation and couldn't figure it out. What I'm basically looking for is a flatMap to extract all the questions of all users and group them by type.
If anyone can lead me in the right direction, I'll greatly appreciate it :) thx. (Also, I'm using Meteor, so any query has to work in Meteor mongo)
You can try the below aggregation in 3.4.
Use $filter to keep the math categories, with $map to project the questions array from each matching category, followed by $reduce and $concatArrays to merge all questions into a single array across all matching categories.
Then $unwind the questions array, $group by type, and use $sum to compute the correct and wrong counts.
db.users.aggregate([
{
"$match": {
"profile.tests.meta.category": "math"
}
},
{
"$project": {
"questions": {
"$reduce": {
"input": {
"$map": {
"input": {
"$filter": {
"input": "$profile.tests",
"as": "testf",
"cond": {
"$eq": [
"$$testf.meta.category",
"math"
]
}
}
},
"as": "testm",
"in": "$$testm.questions"
}
},
"initialValue": [],
"in": {
"$concatArrays": [
"$$value",
"$$this"
]
}
}
}
}
},
{
"$unwind": "$questions"
},
{
"$group": {
"_id": "$questions.type",
"correct": {
"$sum": {
"$cond": [
{
"$eq": [
"$questions.correct",
true
]
},
1,
0
]
}
},
"wrong": {
"$sum": {
"$cond": [
{
"$eq": [
"$questions.correct",
false
]
},
1,
0
]
}
}
}
}
])
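For the sample user document above, this should produce something along these lines (counts worked out by hand from the four sample questions, so treat this as illustrative):
{ "_id" : "multiply", "correct" : 0, "wrong" : 1 }
{ "_id" : "subtract", "correct" : 1, "wrong" : 0 }
{ "_id" : "add", "correct" : 1, "wrong" : 1 }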
I have an article collection:
{
_id: 9999,
authorId: 12345,
coAuthors: [23456,34567],
title: 'My Article'
},
{
_id: 10000,
authorId: 78910,
title: 'My Second Article'
}
I'm trying to figure out how to get a list of distinct author and co-author ids out of the database. I have tried push, concat, and addToSet, but can't seem to find the right combination. I'm on 2.4.6 so I don't have access to setUnion.
Whilst $setUnion would be the "ideal" way to do this, there is another way that basically involves "switching" on a "type" to alternate which field is picked:
db.collection.aggregate([
{ "$project": {
"authorId": 1,
"coAuthors": { "$ifNull": [ "$coAuthors", [null] ] },
"type": { "$const": [ true,false ] }
}},
{ "$unwind": "$coAuthors" },
{ "$unwind": "$type" },
{ "$group": {
"_id": {
"$cond": [
"$type",
"$authorId",
"$coAuthors"
]
}
}},
{ "$match": { "_id": { "$ne": null } } }
])
And that is it. You may know the $const operation as the $literal operator from MongoDB 2.6. It has always been there, but was only documented and given an "alias" at the 2.6 release.
Of course the $unwind operations in both cases produce more "copies" of the data, but this is grouping for "distinct" values so it does not matter. Depending on the true/false alternating value of the projected "type" field (once unwound), you simply pick the fields alternately.
Also this little mapReduce does much the same thing:
db.collection.mapReduce(
function() {
emit(this.authorId,null);
if ( this.hasOwnProperty("coAuthors"))
this.coAuthors.forEach(function(id) {
emit(id,null);
});
},
function(key,values) {
return null;
},
{ "out": { "inline": 1 } }
)
For the record, $setUnion is of course a lot cleaner and more performant:
db.collection.aggregate([
{ "$project": {
"combined": {
"$setUnion": [
{ "$map": {
"input": ["A"],
"as": "el",
"in": "$authorId"
}},
{ "$ifNull": [ "$coAuthors", [] ] }
]
}
}},
{ "$unwind": "$combined" },
{ "$group": {
"_id": "$combined"
}}
])
So the only real concerns there are converting the singular "authorId" into an array via $map, and feeding in an empty array where the "coAuthors" field is not present in the document.
Both output the same distinct values from the sample documents:
{ "_id" : 78910 }
{ "_id" : 23456 }
{ "_id" : 34567 }
{ "_id" : 12345 }
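One more aside on that $map conversion: on MongoDB 3.2 and later, where aggregation expressions accept array literals (if I recall the version correctly), the $map could presumably be dropped in favor of a plain literal; the rest of the pipeline stays the same:
// Sketch, assuming MongoDB 3.2+ array-literal support in expressions.
{ "$project": {
  "combined": {
    "$setUnion": [
      [ "$authorId" ],
      { "$ifNull": [ "$coAuthors", [] ] }
    ]
  }
}}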