How do I store the aggregation result into another database - mongodb

I can store an aggregation result into another collection within the same database.
But how can I store the result into another database?
This is for copying a collection into another database:
use test1;
db["user_data"].find().forEach(
function(d){ db.getSiblingDB("test2")['user_data'].insert(d);
});
Aggregation function:
pipeline = [
{
'$group': {
'_id': {
'$year': '$birthday'
},
'count': {
'$sum': 1
}
}
}, {
'$sort': {
'_id': 1
}
}, {
'$out': output_collection
}
];
cur = db[source_collection].runCommand('aggregate', {
pipeline: pipeline,
allowDiskUse: true
});

After running the aggregation into the output collection, you need to run another command that clones the collection into another database. One option is the cloneCollection command (deprecated since MongoDB 4.2), as follows:
db.runCommand({
cloneCollection: "test.output_collection",
from: "mongodb.example.net:27017",
query: { active: true }
})
The above copies the output_collection collection from the test database on the server at mongodb.example.net. The operation only copies documents that satisfy the query { active: true }, but the query argument is optional. cloneCollection always copies indexes.

Starting in MongoDB 4.2, the new $merge aggregation stage can be used to write the result of an aggregation pipeline to a specified collection in another database:
db.collection.aggregate([
// { $group: { "_id": { $year: "$birthday" }, "count": { $sum: 1 } } },
// { $sort: { "_id": 1 } },
{ $merge: { into: { db: "to", coll: "collection" } } }
])
Note that if the target collection already contains documents, $merge offers options (such as whenMatched and whenNotMatched) to specify how incoming documents that conflict with existing ones are handled.
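
As a rough sketch of those options, the helper below builds a pipeline whose final $merge stage replaces matching documents and inserts new ones. The function name buildMergeToOtherDb and the target names "reports" and "birthdays_by_year" are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: builds an aggregation pipeline that writes its
// result into a collection in another database via $merge (MongoDB 4.2+).
function buildMergeToOtherDb(targetDb, targetColl) {
  return [
    { $group: { _id: { $year: "$birthday" }, count: { $sum: 1 } } },
    { $sort: { _id: 1 } },
    {
      $merge: {
        into: { db: targetDb, coll: targetColl },
        on: "_id",                // field used to detect conflicts
        whenMatched: "replace",   // overwrite an existing document with the same _id
        whenNotMatched: "insert"  // otherwise insert a new document
      }
    }
  ];
}

// "reports" and "birthdays_by_year" are assumed names.
const mergePipeline = buildMergeToOtherDb("reports", "birthdays_by_year");

// In the shell, assuming the source collection is user_data:
// db.user_data.aggregate(mergePipeline, { allowDiskUse: true });
```

Other values exist for both options (for example "keepExisting" or "fail" for whenMatched), so the right choice depends on whether reruns should overwrite earlier results.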

Related

MongoDB: How to update document fields based on document field calculations?

I'm wondering how I can perform calculations on document fields, and then update an existing field in that document based on those calculations?
I'm currently using a roundabout way of doing it (below), but I'm wondering if there's a more performant or straight-forward way?
Or if possible, have a document field ("cumulativeField") that is dynamic and updates in the following way:
db.collection
.aggregate([
{ $match: { arrayField: { $exists: true } } },
{ $addFields: { cumulativeField: { $sum: "$arrayField.number" } } }
])
.forEach(function (x){
db.collection.updateOne(
{ id: x.id },
{ $set: { cumulativeField: NumberInt(x.cumulativeField) } }
)})
Note: arrayField = an "array of objects" field, with each object in the array having a key "number" whose value(s) I am summing up to then put as a single value into the "cumulativeField" field.
MongoDB >= 4.2 supports pipeline updates: the update can be written as an aggregation pipeline, and the pipeline's result becomes the new value of the document.
In your case, I think you only need the following. Note that $match is not allowed in an update pipeline, so the condition goes in the filter argument instead:
db.collection.updateMany(
{ arrayField: { $exists: true } },
[{ $addFields: { cumulativeField: { $toInt: { $sum: "$arrayField.number" } } } }]
)

MongoDB - count by field, and sort by count

I am new to MongoDB and to writing anything beyond very basic queries, and I haven't managed to create a query that does the following:
I have a collection like this, where each document represents one "use" of a benefit (e.g. the first document states that benefit "123" was used once):
[
{
"id" : "1111",
"benefit_id":"123"
},
{
"id":"2222",
"benefit_id":"456"
},
{
"id":"3333",
"benefit_id":"456"
},
{
"id":"4444",
"benefit_id":"789"
}
]
I need to create a query that outputs an array with the most-used benefit at the top, along with how many times it was used.
For the above example, the query should output:
[
{
"benefit_id":"456",
"cnt":2
},
{
"benefit_id":"123",
"cnt": 1
},
{
"benefit_id":"789",
"cnt":1
}
]
I have tried to work with the documentation and with $sortByCount, but with no success.
Using $group:
$group by benefit_id and get the count using $sum
$sort by count in descending order
db.collection.aggregate([
{
$group: {
_id: "$benefit_id",
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } }
])
$sortByCount
Same operation using $sortByCount operator
db.collection.aggregate([
{ $sortByCount: "$benefit_id" }
])
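
Both pipelines above return fields named _id and count, while the question asks for benefit_id and cnt. A minimal sketch of one way to rename them, assuming a trailing $project stage is acceptable (the helper name buildBenefitUsagePipeline is hypothetical):

```javascript
// Hypothetical helper: count documents per benefit_id, most used first,
// then rename the output fields to the shape requested in the question.
function buildBenefitUsagePipeline() {
  return [
    // $sortByCount emits { _id: <benefit_id>, count: <n> }, sorted by count descending
    { $sortByCount: "$benefit_id" },
    // Reshape each result into { benefit_id, cnt }
    { $project: { _id: 0, benefit_id: "$_id", cnt: "$count" } }
  ];
}

const benefitPipeline = buildBenefitUsagePipeline();

// In the shell:
// db.collection.aggregate(benefitPipeline);
```

The relative order of benefits with the same count (such as "123" and "789" in the example) is not guaranteed unless a secondary sort key is added.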

how to delete many documents from a collection based on a condition with values from an other collection

Here is the attached aggregate query.
I want to delete all the documents matching the results of this query from the same collection, "history".
How can I do it?
Let's say I have a collection of stock-market company records named "history", with documents like:
{_id:"value",symbol:"value",date:"value",open:"value",close:"value",......}
The collection is supposed to have one document per company per day, 42 days of records in total for each company.
But after checking the data, it seems that some companies don't have all 42 daily records; they have fewer.
So I want to delete the companies that don't have exactly 42 documents.
My group-by is on "symbol" and my count is on "date". I can get the list, but I don't know how to delete those documents.
You can remove them by calling the .remove method for each result:
db.history.aggregate(...).forEach(function(doc){
db.history.remove({symbol: doc._id});
})
Note: this is very slow, because it issues a separate delete for every symbol.
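
One way to cut down on round trips is to collect the offending symbols first and delete them with a single call. This is only a sketch: the grouping pipeline below is an assumption about what the elided aggregate(...) does, and both helper names are hypothetical.

```javascript
// Hypothetical helper: find symbols whose number of documents differs
// from the expected number of days (assumed shape of the asker's query).
function buildIncompleteSymbolsPipeline(expectedDays) {
  return [
    { $group: { _id: "$symbol", nbr_jours: { $sum: 1 } } },
    { $match: { nbr_jours: { $ne: expectedDays } } }
  ];
}

// Hypothetical helper: turn the aggregation results into one delete filter.
function buildDeleteFilter(groups) {
  return { symbol: { $in: groups.map((g) => g._id) } };
}

// In the shell:
// const groups = db.history.aggregate(buildIncompleteSymbolsPipeline(42)).toArray();
// db.history.deleteMany(buildDeleteFilter(groups));
```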
Alternative solution: change the aggregation criteria to return the valid documents, and overwrite the history collection with the $out operator:
db.history.aggregate([
{
$group: {
_id: "$symbol",
nbr_jours: {
$sum: 1
},
data: {
$push: "$$ROOT"
}
}
},
{
$match: {
nbr_jours: {
$eq: 42 // keep only symbols with exactly 42 documents
}
}
},
{
$unwind: "$data"
},
{
$replaceRoot: {
newRoot: "$data"
}
},
{
$out: "history"
}
])
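Note: this is very fast, because the whole operation runs on the server. Be aware that $out replaces the entire contents of the history collection.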

Mongodb - aggregation of subdocument and update with the result

I have the following problem. I have found and summed each value in a subdocument.
It gives the following: [ { _id: 551fb140e4b04589d8997213, sumOfpeople: 342 } ]
I want to take sumOfpeople and store it in the same House (the one with the same req.params.house_id).
House.aggregate([
{ $match: {
id: req.params.house_id
}},
{ $unwind: '$people' }, // unwind creates a doc for every array element
{ $group: {
_id: '$_id',
sumOfpeople: { $sum: '$people.nr'}
}}
], function (err, result) {
if (err) {
console.log(err);
return;
}
console.log(result);
});
This is the model into which I want to insert the result of the aggregation.
module.exports = mongoose.model('House', {
id: String,
people: [{
id: String,
nr: Number
}],
sumOfpeople: Number //this is the field that I want to update after the aggregation
});
I have tried to use $set : {sumOfpeople: { $sum: '$people.nr'}}.
Is it possible to use $set inside an aggregation, or how can it be solved otherwise?
In MongoDB versions before 4.2, there's no way to write results directly into an existing document while doing an aggregation.
You've got 2 options:
Retrieve the results in your application code, and then update the document in a second query.
Use the $out operator, which writes the results of the aggregation into a separate collection. This operation replaces all documents in the target collection with the new results. ( http://docs.mongodb.org/manual/reference/operator/aggregation/out/ )
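
On MongoDB 4.2 or later, a pipeline update is another possibility. The sketch below assumes the driver or Mongoose version in use accepts an aggregation pipeline as the update argument; the helper name buildSumOfPeopleUpdate is hypothetical.

```javascript
// Hypothetical helper: builds the filter and pipeline-style update that
// recompute sumOfpeople from the people array of a single house.
function buildSumOfPeopleUpdate(houseId) {
  return {
    filter: { id: houseId },
    // "$people.nr" resolves to the array of nr values; $sum adds them up
    update: [{ $set: { sumOfpeople: { $sum: "$people.nr" } } }]
  };
}

// Assumed usage inside the route handler:
// const { filter, update } = buildSumOfPeopleUpdate(req.params.house_id);
// await House.updateOne(filter, update);
```

Since the sum is computed on the server from the stored array, no separate aggregation or $unwind is needed for this case.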

mongodb aggregation framework group + project

I have the following issue:
This query returns 1 result, which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
"result" : [
{
"_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
"version" : 1.2000000000000002
}
],
"ok" : 1
}
This query (I just added a projection so I can later query for the entire document) returns multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
"result" : [
{
"_id" : ObjectId("5139310a3899d457ee000003")
},
{
"_id" : ObjectId("513931053899d457ee000002")
},
{
"_id" : ObjectId("513930fd3899d457ee000001")
}
],
"ok" : 1
}
Found the answer.
1. First I need to get all the _ids:
db.items.aggregate( [
{ '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
{ '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. Then I need to query the documents:
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
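
The two steps can be tied together roughly as follows. Note that each stage is its own object in the pipeline array; a single object holding both $group and $project is not a valid pair of stages. The helper names are hypothetical, and the owner id is a placeholder argument.

```javascript
// Hypothetical helper: step 1, one result per item.id carrying the
// largest _id seen for that item.
function buildLatestIdsPipeline(ownerId) {
  return [
    { $match: { "owner.id": ownerId } },
    { $group: { _id: "$item.id", dbid: { $max: "$_id" } } }
  ];
}

// Hypothetical helper: step 2, a find() filter built from the dbid values.
function buildFindByIdsFilter(groups) {
  return { _id: { $in: groups.map((g) => g.dbid) } };
}

// In the shell (ownerId is whatever owner you are filtering on):
// const groups = db.items.aggregate(buildLatestIdsPipeline(ownerId)).toArray();
// db.items.find(buildFindByIdsFilter(groups));
```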
(I know it's late, but I'm still answering so that other people don't have to search for the right answer somewhere else.)
See Deka's answer; it will do the job.
Not all accumulators are available in $project stage. We need to consider what we can do in project with respect to accumulators and what we can do in group. Let's take a look at this:
db.companies.aggregate([{
$match: {
funding_rounds: {
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
funding: {
$push: {
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
}
}
}
}, ]).pretty()
Here we first filter for companies whose funding_rounds array is not empty. The array is then unwound with $unwind, so the $sort and later stages see one document for each element of the funding_rounds array for every company. The first thing we do with those documents is $sort them by:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the $group stage, we group by company name and build an array using $push. $push is specified as part of a document that is the value of a field we name in the $group stage. We can push any valid expression. Here we push documents built from raised_amount and funded_year, and each one is appended to the end of the array being accumulated. So the output of the $group stage is a stream of documents, each with an _id that specifies the company name.
Notice that $push is available in $group stages but not in $project stages. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.
$project, on the other hand, works with one document at a time. So we can calculate an average over an array within an individual document inside a $project stage. But accumulating across documents, where each document that passes through the stage pushes a new value onto a shared result, is something the $project stage is simply not designed to do. For that type of operation we want to use $group.
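
To illustrate the per-document case, here is a small sketch of a $project stage that averages over an array inside each document. The helper name and the output field avg_raised are hypothetical; the collection and field names follow the examples above.

```javascript
// Hypothetical helper: per-company average of raised_amount, computed
// inside $project because it only looks at one document at a time.
function buildAverageRoundPipeline() {
  return [
    { $match: { funding_rounds: { $exists: true, $ne: [] } } },
    {
      $project: {
        _id: 0,
        name: 1,
        // "$funding_rounds.raised_amount" resolves to an array of amounts
        avg_raised: { $avg: "$funding_rounds.raised_amount" }
      }
    }
  ];
}

const avgPipeline = buildAverageRoundPipeline();

// In the shell:
// db.companies.aggregate(avgPipeline);
```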
Let's take a look at another example:
db.companies.aggregate([{
$match: {
funding_rounds: {
$exists: true,
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
first_round: {
$first: "$funding_rounds"
},
last_round: {
$last: "$funding_rounds"
},
num_rounds: {
$sum: 1
},
total_raised: {
$sum: "$funding_rounds.raised_amount"
}
}
}, {
$project: {
_id: 0,
company: "$_id.company",
first_round: {
amount: "$first_round.raised_amount",
article: "$first_round.source_url",
year: "$first_round.funded_year"
},
last_round: {
amount: "$last_round.raised_amount",
article: "$last_round.source_url",
year: "$last_round.funded_year"
},
num_rounds: 1,
total_raised: 1,
}
}, {
$sort: {
total_raised: -1
}
}]).pretty()
In the $group stage, we're using the $first and $last accumulators. As with $push, we can't use $first and $last as accumulators in $project stages, because $project stages are not designed to accumulate values across multiple documents; they're designed to reshape documents one at a time. The total number of rounds is calculated using the $sum operator: the value 1 is added once for every document grouped under a given _id value, so the result is a count of those documents. The $project stage may seem complex, but it's just making the output pretty; it simply carries over num_rounds and total_raised from the previous stage.
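
To make the accumulator semantics concrete, here is a plain JavaScript emulation of that $group stage. It is not MongoDB code, only an illustration; the function name groupRounds and the sample data are made up, and it assumes every raised_amount is a number.

```javascript
// Plain JavaScript emulation of $first, $last and $sum in a $group stage.
// Input: one document per funding round, already sorted by date, as the
// $unwind and $sort stages would produce.
function groupRounds(unwoundDocs) {
  const groups = new Map();
  for (const doc of unwoundDocs) {
    const round = doc.funding_rounds;
    let g = groups.get(doc.name);
    if (!g) {
      // The first document seen for a company fixes first_round ($first)
      g = {
        _id: { company: doc.name },
        first_round: round,
        last_round: round,
        num_rounds: 0,
        total_raised: 0
      };
      groups.set(doc.name, g);
    }
    g.last_round = round;                   // $last: overwritten by each later document
    g.num_rounds += 1;                      // $sum: 1 adds one per document
    g.total_raised += round.raised_amount;  // $sum of a field adds its value
  }
  return [...groups.values()];
}

// Made-up sample data for illustration only.
const sample = [
  { name: "Acme", funding_rounds: { raised_amount: 100, funded_year: 2010 } },
  { name: "Acme", funding_rounds: { raised_amount: 250, funded_year: 2012 } },
  { name: "Initech", funding_rounds: { raised_amount: 50, funded_year: 2011 } }
];
const grouped = groupRounds(sample);
```

The emulation keeps state per group across documents, which is exactly what a stage that handles one document at a time cannot do.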