Delete multiple occurrences of the same object - mongodb

In our collection, documents have a structure like this:
Object: //below is object metadata from mongo
_id
created_at
lang
source
object: //this is real object data from our db
id
created_at
object_class
I ran the query below on this collection:
db.getCollection('foo').aggregate(
[
{
$match: {
lang: 'bar',
pushed_at:{
$gte: new ISODate("2015-11-09T00:00:00.000Z"),
$lt: new ISODate("2015-11-10T00:00:00.000Z")
}
}
},
{
$group: {
_id: "$object.id",
occurences: {$sum: 1}
}
},
{
$match: {
occurences: {$gt: 1}
}
}
])
Which returned:
It appears that we have duplicate entries in our collection. By duplicate I mean objects with the same Object.object.id.
I'd like to remove the redundant occurrences using the results from the aggregate function I used. Note that I don't want to delete everything, just the redundant copies, so that the aggregate above would return occurences: 1 for each object.
How can I do this, ideally using the results from the aggregation?

I think you can try this in the shell:
db.foo.aggregate(
[
{
$match: {
lang: 'bar',
pushed_at:{
$gte: new ISODate("2015-11-09T00:00:00.000Z"),
$lt: new ISODate("2015-11-10T00:00:00.000Z")
}
}
},
{
$group: {
_id: "$object.id",
occurences: {$sum: 1}
}
},
{
$match: {
occurences: {$gt: 1}
}
}
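]).forEach(function(x) {
// aggregate() returns a cursor in MongoDB 2.6+; on 2.4 and older shells use .result.forEach instead
// the final $match already guarantees x.occurences > 1, so no extra check is needed
for (var i = 0; i < x.occurences - 1; i++) {
// second argument (justOne = true): each call removes a single matching document
db.foo.remove({"object.id": x._id}, true);
}
}
);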
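One thing to watch in the loop above: `remove` filters only on `object.id`, so it does not control which copy survives, and it can also hit copies that fall outside the matched date range. A possible variation, sketched below, collects the `_id` of every copy during `$group` and deletes all but the first by `_id`. The field name `ids` and the helper `redundantIds` are made up for this illustration, and the sample data is invented; the helper is plain JavaScript so it can be tried without a database.

```javascript
// Hypothetical extra accumulator for the $group stage: keep the _id of every copy.
var groupStage = {
  $group: {
    _id: "$object.id",
    ids: { $push: "$_id" },
    occurences: { $sum: 1 }
  }
};

// Given the aggregation results, return every _id except the first of each group.
function redundantIds(groups) {
  var toDelete = [];
  groups.forEach(function (g) {
    // slice(1) skips the first _id, which is the copy we keep
    g.ids.slice(1).forEach(function (id) { toDelete.push(id); });
  });
  return toDelete;
}

// Sample results shaped like the output of the extended pipeline (made-up values).
var sample = [
  { _id: 101, ids: ["a", "b", "c"], occurences: 3 },
  { _id: 202, ids: ["d", "e"], occurences: 2 }
];
console.log(redundantIds(sample)); // [ 'b', 'c', 'e' ]

// In the shell the result could then be passed to a single delete, e.g.:
// db.foo.remove({ _id: { $in: redundantIds(results) } });
```

Deleting by `_id` makes it explicit which documents go away, which is easier to review before running anything destructive.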

Related

Mongoose - filter matched documents and assign the resultant length to a field

I have this collection(some irrelevant fields were omitted for brevity):
clients: {
userId: ObjectId,
clientSalesValue: Number,
currentDebt: Number,
}
Then I have this query, which matches all the clients for a specific user, calculates the sum of all debts and the sum of all sales, and puts each result in its own field:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
}
},
{
$unset: ['_id']
}
]).exec();
This works as expected: it returns an array with a single object. Now I also need that object to include the number of debtors, that is, the number of clients with currentDebt > 0. How can I do that in the same query? Is it possible?
P.S.: I cannot modify the $match condition; it always needs to return all the clients for the corresponding user.
To include a count of how many matching documents have a positive currentDebt, you can use the $sum and $cond operators like so:
await clientsCollection.aggregate([
{
$match: { userId: new ObjectId(userId) }
},
{
$group: {
_id: null,
totalSalesValue: { $sum: '$clientSalesValue' },
totalDebts: { $sum: '$currentDebt' },
numDebtors: {
$sum: {
$cond: [{ $gt: ['$currentDebt', 0] }, 1, 0]
}
},
}
},
{
$unset: ['_id']
}
]).exec();
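To see what the `$sum` plus `$cond` combination computes, here is a rough in-memory equivalent in plain JavaScript. The sample clients are invented for the illustration; only the field names come from the question.

```javascript
// Made-up sample clients for one user.
const clients = [
  { clientSalesValue: 100, currentDebt: 0 },
  { clientSalesValue: 250, currentDebt: 40 },
  { clientSalesValue: 80, currentDebt: 15 }
];

// Mirrors the $group stage: every client feeds all three accumulators,
// but numDebtors only grows by 1 when currentDebt > 0 (the $cond branch).
const totals = clients.reduce(
  (acc, c) => ({
    totalSalesValue: acc.totalSalesValue + c.clientSalesValue,
    totalDebts: acc.totalDebts + c.currentDebt,
    numDebtors: acc.numDebtors + (c.currentDebt > 0 ? 1 : 0)
  }),
  { totalSalesValue: 0, totalDebts: 0, numDebtors: 0 }
);

console.log(totals); // { totalSalesValue: 430, totalDebts: 55, numDebtors: 2 }
```

Because the condition lives inside the accumulator rather than in `$match`, the other totals still cover every client of the user.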

MongoDB sorting does not work with inner array

I'm trying to query specific fields in my document and sort them by one of the fields, however, the engine seems to completely ignore the sort.
I use the query:
db.symbols.find({_id:'AAPL'}, {'income_statement.annual.totalRevenue':1,'income_statement.annual.fiscalDateEnding':1}).sort({'income_statement.annual.totalRevenue': 1})
This is the output:
[
{
_id: 'AAPL',
income_statement: {
annual: [
{
fiscalDateEnding: '2021-09-30',
totalRevenue: '363172000000'
},
{
fiscalDateEnding: '2020-09-30',
totalRevenue: '271642000000'
},
{
fiscalDateEnding: '2019-09-30',
totalRevenue: '256598000000'
},
{
fiscalDateEnding: '2018-09-30',
totalRevenue: '265595000000'
},
{
fiscalDateEnding: '2017-09-30',
totalRevenue: '229234000000'
}
]
}
}
]
I would expect to have the entries sorted by fiscalDateEnding, starting with 2017-09-30 ascending.
However, the order is fixed, even if I use -1 for sorting.
Any ideas?
The sort you are using is for the ordering of documents in the result set. This is different from the ordering of array elements inside the document.
In your case, if you are using a newer version of MongoDB (5.2+), you can use $sortArray:
db.symbols.aggregate([
{
$project: {
_id: 1,
annual: {
$sortArray: {
input: "$income_statement.annual",
sortBy: {
fiscalDateEnding: 1
}
}
}
}
}
])
If you are using an older version of MongoDB, you can unwind the array, sort, and regroup:
db.collection.aggregate([
{
"$unwind": "$income_statement.annual"
},
{
$sort: {
"income_statement.annual.fiscalDateEnding": 1
}
},
{
$group: {
_id: "$_id",
annual: {
$push: "$income_statement.annual"
}
}
},
{
"$project": {
_id: 1,
income_statement: {
annual: "$annual"
}
}
}
])
Here is the Mongo Playground for your reference.
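If a single document is fetched anyway, sorting the inner array on the client is another option. This is only a sketch of that idea in plain JavaScript, not part of the pipelines above; the array is copied from the question's output. Note that `totalRevenue` is stored as a string there, so it is converted to a number before comparing.

```javascript
// Array copied from the question's output.
const annual = [
  { fiscalDateEnding: '2021-09-30', totalRevenue: '363172000000' },
  { fiscalDateEnding: '2020-09-30', totalRevenue: '271642000000' },
  { fiscalDateEnding: '2019-09-30', totalRevenue: '256598000000' },
  { fiscalDateEnding: '2018-09-30', totalRevenue: '265595000000' },
  { fiscalDateEnding: '2017-09-30', totalRevenue: '229234000000' }
];

// Dates in YYYY-MM-DD form sort chronologically when compared as plain strings.
const byDate = [...annual].sort((a, b) =>
  a.fiscalDateEnding < b.fiscalDateEnding ? -1 :
  a.fiscalDateEnding > b.fiscalDateEnding ? 1 : 0
);

// Revenue is a string in the data, so convert before comparing numerically.
const byRevenue = [...annual].sort(
  (a, b) => Number(a.totalRevenue) - Number(b.totalRevenue)
);

console.log(byDate.map(e => e.fiscalDateEnding));
// [ '2017-09-30', '2018-09-30', '2019-09-30', '2020-09-30', '2021-09-30' ]
console.log(byRevenue.map(e => e.fiscalDateEnding));
// [ '2017-09-30', '2019-09-30', '2018-09-30', '2020-09-30', '2021-09-30' ]
```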

MongoDB aggregation: How to get the index of a document in a collection sorted by a document property

Assume I have a collection with millions of documents. Below is a sample of what the documents look like:
[
{ _id:"1a1", points:[2,3,5,6] },
{ _id:"1a2", points:[2,6] },
{ _id:"1a3", points:[3,5,6] },
{ _id:"1b1", points:[1,5,6] },
{ _id:"1c1", points:[5,6] },
// ... more documents
]
I want to query a document by _id and return a document that looks like below:
{
_id:"1a1",
totalPoints: 16,
rank: 29
}
I know I can query the whole collection, sort it in descending order, then take the index of the document I want by _id and add one to get its rank. But I have worries about this method.
If there are millions of documents, won't this be overkill? Querying a whole collection just to get one document? Is there a way to achieve this without querying the whole collection? Or does the whole collection have to be involved because of the ranking?
I cannot save them ranked because the points keep changing. The actual code is more complex, but the takeaway is that I cannot save them ranked.
Total points is the sum of the points in the points array. The rank is calculated by sorting all documents in descending order. The first document becomes rank 1 and so on.
An aggregation pipeline like the following can get the result you want, but how it performs on a collection of millions of documents remains to be seen.
db.collection.aggregate(
[
{
$group: {
_id: null,
docs: {
$push: { _id: '$_id', totalPoints: { $sum: '$points' } }
}
}
},
{
$unwind: '$docs'
},
{
$replaceWith: '$docs'
},
{
$sort: { totalPoints: -1 }
},
{
$group: {
_id: null,
docs: { $push: '$$ROOT' }
}
},
{
$set: {
docs: {
$map: {
input: {
$filter: {
input: '$docs',
as: 'x',
cond: { $eq: ['$$x._id', '1a3'] }
}
},
as: 'xx',
in: {
_id: '$$xx._id',
totalPoints: '$$xx.totalPoints',
rank: {
$add: [{ $indexOfArray: ['$docs._id', '1a3'] }, 1]
}
}
}
}
}
},
{
$unwind: '$docs'
},
{
$replaceWith: '$docs'
}
])
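Another way to think about the rank, which is not what the pipeline above does: a document's rank is one plus the number of documents with a strictly higher total. That avoids pushing the whole collection into a single array. The sketch below shows the idea in memory with the sample documents from the question, plus a possible pipeline shape for the counting step. `rankOf` and `countAheadPipeline` are hypothetical names, and the pipeline still has to compute `totalPoints` for every document unless that value is stored and indexed. Tied documents share a rank here, unlike the sort-and-index approach.

```javascript
// Sample documents from the question.
const docs = [
  { _id: '1a1', points: [2, 3, 5, 6] },
  { _id: '1a2', points: [2, 6] },
  { _id: '1a3', points: [3, 5, 6] },
  { _id: '1b1', points: [1, 5, 6] },
  { _id: '1c1', points: [5, 6] }
];

const sum = arr => arr.reduce((a, b) => a + b, 0);

// rank = 1 + number of documents whose total is strictly higher
function rankOf(allDocs, id) {
  const target = allDocs.find(d => d._id === id);
  if (!target) return null;
  const totalPoints = sum(target.points);
  const ahead = allDocs.filter(d => sum(d.points) > totalPoints).length;
  return { _id: id, totalPoints, rank: ahead + 1 };
}

console.log(rankOf(docs, '1a3')); // { _id: '1a3', totalPoints: 14, rank: 2 }

// Possible server-side counting step (assumes MongoDB 4.2+ for $set).
// $count emits no document when nothing matches, which means rank 1.
const countAheadPipeline = totalPoints => [
  { $set: { totalPoints: { $sum: '$points' } } },
  { $match: { totalPoints: { $gt: totalPoints } } },
  { $count: 'ahead' }
];
```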

Aggregate is not a function - mongodb/meteor

I have this collection (Spieltag) with two documents in MongoDB:
0: Object Note:2.5 SaisonID:201516 SpielerID:105 SpieltagID:1 Tore:1 _id:"vkD5sMCdZdntoCFGP"
1: Object Note:3 SaisonsID:201516 SpielerID:105 SpieltagID:1 Tore:0 _id:"PrqokMS47K4vx4KR4"
I want to sum Note (2.5 + 3) with a "where clause" on SpielerID.
This is what I have tried to use:
Spieltag.aggregate({ $match: {
{ SpielerID: { $gte: 105 } }
} },
{ $group: { _id : null, sum : { $sum: "$Note" } } });
But it doesn't work, throwing Aggregate is not a function. Any idea what's wrong?
First, you need to add the aggregate package for Meteor:
meteor add meteorhacks:aggregate
Second, you must pass the pipeline to aggregate as an array, with $match containing the condition directly instead of an extra pair of braces:
Spieltag.aggregate([{
$match: {
SpielerID: { $gte: 105 },
},
}, {
$group: {
_id: null,
sum: { $sum: '$Note' },
},
}]);
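As a sanity check on what the corrected pipeline should return, here is a rough in-memory equivalent in plain JavaScript using the two documents from the question (only the fields that matter are copied):

```javascript
// The two Spieltag documents from the question, reduced to the relevant fields.
const spieltag = [
  { Note: 2.5, SpielerID: 105, SpieltagID: 1, Tore: 1 },
  { Note: 3, SpielerID: 105, SpieltagID: 1, Tore: 0 }
];

// $match { SpielerID: { $gte: 105 } } followed by $group with $sum: '$Note'
const sum = spieltag
  .filter(d => d.SpielerID >= 105)
  .reduce((acc, d) => acc + d.Note, 0);

console.log([{ _id: null, sum }]); // [ { _id: null, sum: 5.5 } ]
```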

How to simply count the documents by a given criteria (mongo aggregation framework)?

I would like to simply count the documents. What would be the correct way to do the following:
db.my_collection.aggregate({
$match: { // go by the indexed field
date: {
$gte: new Date(2013,1,20),
$lte: new Date(2013,1,27)
}
}
},{
$match: { // go by some other field
someField: 'someValue'
}
},{
$count: { // $sum? $group? $anythingElse?
// ???????
}
})
You should use $group with $sum. Something like this:
$group: {
_id: null,
count: {$sum: 1}
}
See also the SQL to Aggregation Framework mapping chart in the MongoDB documentation.
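Put together, the whole pipeline could look roughly like the sketch below. The two `$match` stages from the question are merged into one here, which should be equivalent. The variable name `pipeline` is just for illustration.

```javascript
// Note: JavaScript Date months are zero-based, so new Date(2013, 1, 20)
// is 20 February 2013 (local time), not 20 January.
const pipeline = [
  {
    $match: {
      // go by the indexed field and the other field in one stage
      date: { $gte: new Date(2013, 1, 20), $lte: new Date(2013, 1, 27) },
      someField: 'someValue'
    }
  },
  {
    // a single group (_id: null) that adds 1 per matching document
    $group: { _id: null, count: { $sum: 1 } }
  }
];

// In the shell: db.my_collection.aggregate(pipeline)
// On MongoDB 3.4+ the $group stage can be replaced with { $count: 'count' }.
console.log(JSON.stringify(pipeline[1])); // {"$group":{"_id":null,"count":{"$sum":1}}}
```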