MongoDB aggregation: How to get the index of a document in a collection when it is sorted by a document property

Assume I have a collection with millions of documents. Below is a sample of what the documents look like:
[
  { _id: "1a1", points: [2, 3, 5, 6] },
  { _id: "1a2", points: [2, 6] },
  { _id: "1a3", points: [3, 5, 6] },
  { _id: "1b1", points: [1, 5, 6] },
  { _id: "1c1", points: [5, 6] },
  // ... more documents
]
I want to query a document by _id and get back a result that looks like the one below:
{
  _id: "1a1",
  totalPoints: 16,
  rank: 29
}
I know I can query the whole collection, sort it in descending order, find the index of the document I want by _id, and add one to get its rank. But I have concerns about this method.
If there are millions of documents, won't this be overdoing it, querying a whole collection just to get one document? Is there a way to achieve what I want without querying the whole collection, or does the whole collection have to be involved because of the ranking?
I cannot store the documents pre-ranked because the points keep changing. The actual code is more complex, but the takeaway is that I cannot store them ranked.
totalPoints is the sum of the values in the points array. The rank is calculated by sorting all documents by totalPoints in descending order; the first document becomes rank 1, and so on.

An aggregation pipeline like the following can get the result you want, but how it performs on a collection of millions of documents remains to be seen.
db.collection.aggregate([
  // compute totalPoints per document and collect every document into one array
  {
    $group: {
      _id: null,
      docs: {
        $push: { _id: '$_id', totalPoints: { $sum: '$points' } }
      }
    }
  },
  { $unwind: '$docs' },
  { $replaceWith: '$docs' },
  // order the documents so that array position corresponds to rank
  { $sort: { totalPoints: -1 } },
  {
    $group: {
      _id: null,
      docs: { $push: '$$ROOT' }
    }
  },
  // keep only the requested _id and derive its rank from its index in the sorted array
  {
    $set: {
      docs: {
        $map: {
          input: {
            $filter: {
              input: '$docs',
              as: 'x',
              cond: { $eq: ['$$x._id', '1a3'] }
            }
          },
          as: 'xx',
          in: {
            _id: '$$xx._id',
            totalPoints: '$$xx.totalPoints',
            rank: {
              $add: [{ $indexOfArray: ['$docs._id', '1a3'] }, 1]
            }
          }
        }
      }
    }
  },
  { $unwind: '$docs' },
  { $replaceWith: '$docs' }
])
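If the server is MongoDB 5.0 or newer, a $setWindowFields stage with the $rank window operator may express the same ranking more directly, without pushing every document into a single in-memory array. A minimal sketch, assuming the same collection shape and a hard-coded _id of '1a1':
db.collection.aggregate([
  // compute totalPoints for every document
  { $set: { totalPoints: { $sum: '$points' } } },
  // rank all documents by totalPoints in descending order (requires MongoDB 5.0+)
  {
    $setWindowFields: {
      sortBy: { totalPoints: -1 },
      output: { rank: { $rank: {} } }
    }
  },
  // keep only the document we are interested in
  { $match: { _id: '1a1' } },
  { $project: { _id: 1, totalPoints: 1, rank: 1 } }
])
The whole collection still has to be read to compute the rank, but the documents are not all pushed into a single array field along the way.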

Related

Mongoose - filter matched documents and assign the resultant length to a field

I have this collection (some irrelevant fields were omitted for brevity):
clients: {
  userId: ObjectId,
  clientSalesValue: Number,
  currentDebt: Number,
}
Then I have this query that matches all the clients for a specific user, calculates the sum of all debts and of all sales, and puts each result in a separate field:
await clientsCollection.aggregate([
  {
    $match: { userId: new ObjectId(userId) }
  },
  {
    $group: {
      _id: null,
      totalSalesValue: { $sum: '$clientSalesValue' },
      totalDebts: { $sum: '$currentDebt' },
    }
  },
  {
    $unset: ['_id']
  }
]).exec();
This works as expected: it returns an array with only one item, which is an object. But now I also need to include in that resultant object a field for the number of debtors, that is, the number of clients that have currentDebt > 0. How can I do that in the same query? Is it possible?
PS: I cannot modify the $match condition; it needs to always return all the clients for the corresponding user.
To include a count of how many matching documents have a positive currentDebt, you can use the $sum and $cond operators like so:
await clientsCollection.aggregate([
  {
    $match: { userId: new ObjectId(userId) }
  },
  {
    $group: {
      _id: null,
      totalSalesValue: { $sum: '$clientSalesValue' },
      totalDebts: { $sum: '$currentDebt' },
      numDebtors: {
        $sum: {
          // contribute 1 for each client whose currentDebt is greater than 0, otherwise 0
          $cond: [{ $gt: ['$currentDebt', 0] }, 1, 0]
        }
      },
    }
  },
  {
    $unset: ['_id']
  }
]).exec();
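Since the pipeline groups everything into a single document, the resolved array contains a single summary object (or is empty if the user has no clients). A small usage sketch; the summary variable name is just illustrative:
// the aggregation resolves to an array containing a single summary object
const [summary] = await clientsCollection.aggregate([ /* pipeline above */ ]).exec();
console.log(summary.totalSalesValue, summary.totalDebts, summary.numDebtors);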

MongoDB sorting does not work with inner array

I'm trying to query specific fields in my document and sort them by one of the fields; however, the engine seems to completely ignore the sort.
I use the query:
db.symbols.find({_id:'AAPL'}, {'income_statement.annual.totalRevenue':1,'income_statement.annual.fiscalDateEnding':1}).sort({'income_statement.annual.totalRevenue': 1})
This is the output:
[
  {
    _id: 'AAPL',
    income_statement: {
      annual: [
        {
          fiscalDateEnding: '2021-09-30',
          totalRevenue: '363172000000'
        },
        {
          fiscalDateEnding: '2020-09-30',
          totalRevenue: '271642000000'
        },
        {
          fiscalDateEnding: '2019-09-30',
          totalRevenue: '256598000000'
        },
        {
          fiscalDateEnding: '2018-09-30',
          totalRevenue: '265595000000'
        },
        {
          fiscalDateEnding: '2017-09-30',
          totalRevenue: '229234000000'
        }
      ]
    }
  }
]
I would expect to have the entries sorted by fiscalDateEnding, starting with 2017-09-30 ascending.
However, the order stays the same, even if I use -1 for sorting.
Any ideas?
The sort you are using is for the ordering of documents in the result set. This is different from the ordering of array elements inside the document.
For your case, if you are using a newer version of MongoDB (5.2+), you can use the $sortArray operator.
db.symbols.aggregate([
  {
    $project: {
      _id: 1,
      annual: {
        $sortArray: {
          input: "$income_statement.annual",
          sortBy: { fiscalDateEnding: 1 }
        }
      }
    }
  }
])
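If you would rather keep the original nested shape instead of projecting annual to the top level, a $set stage can overwrite the array in place. A sketch, also assuming MongoDB 5.2+:
db.symbols.aggregate([
  {
    // sort the embedded array in place and leave the rest of the document untouched
    $set: {
      "income_statement.annual": {
        $sortArray: {
          input: "$income_statement.annual",
          sortBy: { fiscalDateEnding: 1 }
        }
      }
    }
  }
])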
If you are using an older version of MongoDB, you can do the following to perform the sorting.
db.collection.aggregate([
  { $unwind: "$income_statement.annual" },
  {
    $sort: { "income_statement.annual.fiscalDateEnding": 1 }
  },
  {
    $group: {
      _id: "$_id",
      annual: { $push: "$income_statement.annual" }
    }
  },
  {
    $project: {
      _id: 1,
      income_statement: { annual: "$annual" }
    }
  }
])

Count nested wildcard array mongodb query

I have the following data of users and car models:
[
  {
    "user_id": "ebebc012-082c-4e7f-889c-755d2679bdab",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 1,
    "car_37c04124-cb12-436c-902b-6120f4c51782": 0,
    "car_b78ddcd0-1136-4f45-8599-3ce8d937911f": 1
  },
  {
    "user_id": "f3eb2a61-5416-46ba-bab4-459fbdcc7e29",
    "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 1,
    "car_0d15eae9-9585-4f49-a416-46ff56cd3685": 1
  }
]
I want to see how many users have a car_ field with the value 1 using MongoDB, something like:
{ "car_1a58db0b-5449-4d2b-a773-ee055a1ab24d": 2 }
for this example.
The issue is that I will never know in advance what the car_ fields are going to be; they have a random structure (wildcard).
Notes:
car_id and user_id are at the same level.
The car_id is not given; I simply want to know, for the entire database, which are the most common car_ fields with value 1.
$group by _id and convert the root object into an array of key/value pairs using $objectToArray
$unwind to deconstruct the root array
$match to keep only the entries where root.v is 1
$group by root.k and get the total count
db.collection.aggregate([
  {
    $group: {
      _id: "$_id",
      root: { $first: { $objectToArray: "$$ROOT" } }
    }
  },
  { $unwind: "$root" },
  { $match: { "root.v": 1 } },
  {
    $group: {
      _id: "$root.k",
      count: { $sum: 1 }
    }
  }
])
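Since the question also asks which car_ fields are the most common across the whole database, the same pipeline could be extended with a $sort and a $limit after the final $group. A sketch; the limit of 10 is an arbitrary illustration:
db.collection.aggregate([
  {
    $group: {
      _id: "$_id",
      root: { $first: { $objectToArray: "$$ROOT" } }
    }
  },
  { $unwind: "$root" },
  // keep only entries whose value is 1 (user_id holds a string, so it is filtered out here)
  { $match: { "root.v": 1 } },
  { $group: { _id: "$root.k", count: { $sum: 1 } } },
  // most common cars first
  { $sort: { count: -1 } },
  { $limit: 10 }
])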

MongoDB get different sum for each list per document

I have these documents in my collection
{
  id: 1,
  small: [{ k: 'A', v: 1 }, { k: 'B', v: 2 }, { k: 'D', v: 3 }],
  big: [{ k: 'A', v: 2 }, { k: 'B', v: 3 }, { k: 'C', v: 1 }, { k: 'D', v: 4 }]
},
{
  id: 2,
  small: [{ k: 'A', v: 1 }, { k: 'B', v: 2 }, { k: 'D', v: 3 }],
  big: [{ k: 'A', v: 2 }, { k: 'B', v: 3 }, { k: 'C', v: 1 }, { k: 'D', v: 4 }]
},
{
  id: 3,
  small: [{ k: 'A', v: 1 }, { k: 'B', v: 2 }, { k: 'D', v: 3 }],
  big: [{ k: 'A', v: 2 }, { k: 'B', v: 3 }, { k: 'C', v: 1 }, { k: 'D', v: 4 }]
}
Now, I want to get the sum for each key in both lists. I want my output to look like this:
{ k: 'A', small: 3, big: 6 },
{ k: 'B', small: 6, big: 9 },
{ k: 'D', small: 9, big: 12 }
Notice that the output does not contain the key 'C'. This is because I only want to output the keys that exist in the 'small' list. What MongoDB functions should I use for this?
Thanks!
Try the aggregation below:
db.col.aggregate([
  { $unwind: "$small" },
  { $unwind: "$big" },
  {
    $redact: {
      $cond: {
        if: { $eq: ["$small.k", "$big.k"] },
        then: "$$KEEP",
        else: "$$PRUNE"
      }
    }
  },
  {
    $group: { _id: "$small.k", small: { $sum: "$small.v" }, big: { $sum: "$big.v" } }
  },
  {
    $sort: { "_id": 1 }
  }
])
The idea is to end up with only one small element and one big element in each document, which is why there is a double $unwind (it produces the cartesian product of the two arrays). Then we keep only the documents where the keys are equal; this is the moment where C is filtered out, since it has no counterpart in small, and $redact is used for that. The rest of the aggregation is just a $group with $sum.
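To make the intermediate shape concrete, here is a sketch of what the first sample document produces after the two $unwind stages and the $redact; only the pairs with matching keys survive:
// for id: 1, the surviving documents after $unwind + $unwind + $redact are:
// { id: 1, small: { k: 'A', v: 1 }, big: { k: 'A', v: 2 } }
// { id: 1, small: { k: 'B', v: 2 }, big: { k: 'B', v: 3 } }
// { id: 1, small: { k: 'D', v: 3 }, big: { k: 'D', v: 4 } }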

Best usage for MongoDB Aggregate request

I would like to pull out a list of documents (with a limit), selected by a list of ObjectId values and ranked in descending order by their timestamp.
Corresponding to this:
db.collection.aggregate([
  { $match: { _id: { $in: [ObjectId("X"), ObjectId("Y")] } } },
  { $sort: { timestamp: -1 } },
  { $group: { _id: "$_id" } },
  { $skip: 0 },
  { $limit: 100 }
])
Knowing that the list from the loop may contain way more than 1000 ObjectIds (in the $in array), do you think my solution is viable? Isn't there a faster and less resource-intensive way?
Best Regards.
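One observation, offered as a sketch rather than a definitive answer: because the $match filters on _id, each document can appear at most once, so the $group stage only drops the other fields, and its output order is not guaranteed to preserve the preceding $sort. If the goal is simply the matched documents sorted by timestamp, a plain find with sort/skip/limit (or the same pipeline without the $group) may be enough; objectIdList below is a hypothetical variable holding the ObjectId array:
// find the matched documents, newest first, with paging
db.collection.find({ _id: { $in: objectIdList } })
  .sort({ timestamp: -1 })
  .skip(0)
  .limit(100)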