MongoDB Shell: Join a single field from another collection - mongodb

I have a collection Spots:
- spotId
- name
and Codes:
- codeId
- spotId
- code
I want to introduce the field
- spotName
to Codes (and join by spotId).
How do I update all my existing documents within mongo shell?
When I try $lookup I always get the whole Spot document, not just the name field.

If you only need spotName in each document of the result, you can use an aggregation pipeline. $lookup was a good start: it produces an array of the matching Spots documents, which you can $unwind and then $project down to the fields you need.
[
  {$lookup: {
    from: 'Spots',
    localField: 'spotId',
    foreignField: 'spotId',
    as: 'new_field'
  }},
  {$unwind: '$new_field'},
  {$project: {
    codeId: 1,
    spotId: 1,
    code: 1,
    spotName: '$new_field.name'
  }}
]
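To see what each stage contributes, here is a minimal plain-JavaScript sketch that mimics the three stages on in-memory arrays. The sample documents and the joinSpotName helper are made up for illustration; this is not how MongoDB executes the pipeline internally.

```javascript
// Plain-JavaScript imitation of $lookup + $unwind + $project on in-memory arrays.
// All sample documents below are invented for illustration.
const spots = [
  { spotId: 1, name: "Harbor" },
  { spotId: 2, name: "Station" },
];
const codes = [
  { codeId: 10, spotId: 1, code: "A1" },
  { codeId: 11, spotId: 2, code: "B7" },
  { codeId: 12, spotId: 9, code: "Z0" }, // no matching spot
];

function joinSpotName(codes, spots) {
  const out = [];
  for (const c of codes) {
    // $lookup: collect every spot whose spotId equals the code's spotId
    const matches = spots.filter((s) => s.spotId === c.spotId);
    // $unwind: one output document per match, none when the array is empty
    for (const m of matches) {
      // $project: keep the code fields and copy only the spot's name
      out.push({ codeId: c.codeId, spotId: c.spotId, code: c.code, spotName: m.name });
    }
  }
  return out;
}

const joined = joinSpotName(codes, spots);
console.log(joined);
```

Note that the code with spotId 9 is missing from the output: $unwind drops documents whose lookup array is empty unless you pass preserveNullAndEmptyArrays: true.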
The solution above covers the case where you only need to read the results.
If you actually want to update the collection, one approach that works on any version is to loop over all docs in Codes with find, look up the matching doc in Spots with findOne, and then issue an update for each one. In the mongo shell these calls are synchronous, so no callbacks are needed:
db.Codes.find().forEach(function (doc) {
  // look up the spot that belongs to this code
  var spot = db.Spots.findOne({spotId: doc.spotId});
  if (spot) {
    db.Codes.update({codeId: doc.codeId}, {$set: {spotName: spot.name}});
  }
});
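On newer servers the same update can probably be done inside the aggregation itself. The following is only a sketch under assumptions: $merge needs MongoDB 4.2 or later, and merging into the collection that is being aggregated needs 4.4 or later as far as I know. The helper name buildSpotNameMergePipeline and the field name spot are made up.

```javascript
// Sketch only: requires MongoDB 4.2+ for $merge, and (to my knowledge) 4.4+
// to merge back into the same collection that is being aggregated.
function buildSpotNameMergePipeline() {
  return [
    { $lookup: { from: "Spots", localField: "spotId", foreignField: "spotId", as: "spot" } },
    { $unwind: "$spot" },
    // Keep only _id (included by default) and the new field
    { $project: { spotName: "$spot.name" } },
    // "merge" adds spotName to the existing document and leaves its other fields alone
    { $merge: { into: "Codes", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } },
  ];
}

const pipeline = buildSpotNameMergePipeline();
// In the shell this would be run as: db.Codes.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Try it on a copy of the collection first, since it writes to Codes directly.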

You will have to run a script:
var spotMap = {};
db.Spots.find({}, {spotId: 1, name: 1}).forEach(function (spot) { spotMap[spot.spotId] = spot.name; });
db.Codes.find({}).forEach(function (code) {
  db.Codes.update({_id: code._id}, {$set: {spotName: spotMap[code.spotId]}});
});
Also note that if the collections are very large, holding the whole Spots map in memory and updating every Codes document in a single pass can become a problem. (The 16MB limit applies to a single BSON document, not to a cursor, which is read in batches.) In that case you can run the script in batches, several times:
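db.Codes.find({}).sort({_id: 1}).limit(1000).forEach(function (code) { /* do normal stuff */ });
db.Codes.find({}).sort({_id: 1}).skip(1000).limit(1000).forEach(function (code) { /* do normal stuff */ });
db.Codes.find({}).sort({_id: 1}).skip(2000).limit(1000).forEach(function (code) { /* do normal stuff */ });
and so on. The sort keeps the order stable so that the batches do not overlap.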
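Instead of one update per document, each batch could be sent with a single bulkWrite call. Below is a minimal sketch in plain modern JavaScript; inBatches and toUpdateOps are made-up helper names, the sample data is invented, and the batches come from a plain array here rather than from a live cursor.

```javascript
// Split any iterable into fixed-size chunks.
function* inBatches(items, size) {
  let batch = [];
  for (const item of items) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // last, possibly shorter, chunk
}

// Turn a batch of Codes documents into updateOne operations for bulkWrite.
function toUpdateOps(batch, spotMap) {
  return batch
    .filter((code) => spotMap[code.spotId] !== undefined) // skip codes with no known spot
    .map((code) => ({
      updateOne: {
        filter: { codeId: code.codeId },
        update: { $set: { spotName: spotMap[code.spotId] } },
      },
    }));
}

// Made-up sample data
const spotMap = { 1: "Harbor", 2: "Station" };
const sampleCodes = [
  { codeId: 10, spotId: 1 },
  { codeId: 11, spotId: 2 },
  { codeId: 12, spotId: 9 },
];
const opsPerBatch = [...inBatches(sampleCodes, 2)].map((b) => toUpdateOps(b, spotMap));
console.log(JSON.stringify(opsPerBatch));
// Against a real database, each non-empty array could be passed to db.Codes.bulkWrite(ops).
```

bulkWrite rejects an empty operations array, so skip batches that produce no operations.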

Related

Mongodb aggregate $count

I would like to count the number of documents returned by an aggregation.
I'm sure my initial aggregation works, because I use it later in my program. I build it in a pipeline variable (here called pipelineTest; ask me if you want to see it in detail, but it is quite long, so I have not included it here).
To count the number of documents returned, I push this stage onto my pipeline:
{$count: "totalCount"}
Now I would like to get (or log) the totalCount value. What should I do?
Here is the aggregation :
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
console.log(cursorTest.?)
Thanks for your help. I have read a lot of documentation about aggregation and I still don't understand how to read the result of an aggregation.
Assuming you're using async/await syntax - you need to await on the result of the aggregation.
You can convert the cursor to an array, get the first element of that array and access totalCount.
pipelineTest.push({$count: "totalCount"});
cursorTest = await collection.aggregate(pipelineTest, options).toArray();
console.log(cursorTest[0].totalCount);
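One thing to watch out for: $count emits no document at all when zero documents reach it, so cursorTest[0] can be undefined. A minimal sketch that guards against this is below; extractTotalCount and countMatching are made-up helper names, and collection is assumed to be a Node.js driver collection as in the question.

```javascript
// $count produces no output document when its input is empty,
// so the result array may have no first element.
function extractTotalCount(resultDocs) {
  return resultDocs.length > 0 ? resultDocs[0].totalCount : 0;
}

// Hypothetical wrapper. Copying the pipeline with the spread operator avoids
// mutating the original array, which the question says is reused later.
async function countMatching(collection, pipeline, options) {
  const docs = await collection
    .aggregate([...pipeline, { $count: "totalCount" }], options)
    .toArray();
  return extractTotalCount(docs);
}

console.log(extractTotalCount([{ totalCount: 3 }])); // 3
console.log(extractTotalCount([])); // 0
```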
Aggregation
db.mycollection.aggregate([
{
$count: "totalCount"
}
])
Result
[ { totalCount: 3 } ]
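Your Particulars
aggregate returns a cursor, not the result documents, so you have to read a document from the cursor first. Try the following:
pipelineTest.push({$count: "totalCount"});
cursorTest = collection.aggregate(pipelineTest, options)
resultDoc = await cursorTest.next()
console.log(resultDoc.totalCount)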

MongoDB save aggregation result and retrieve saved result

My mongo version is 3.4
I'm running a complex aggregation and writing the result to a collection using $out stage.
What I want is this: if the aggregation has been run before and the result exists in the target collection, just return the saved result without rerunning the whole pipeline.
Problem 1: when $out is used, the aggregation doesn't return any results, so I have to send a separate query to the target collection to retrieve them.
Problem 2: I need another query to check whether the aggregation has been run before.
Is it possible to combine all 3 queries into a single pipeline, so that it would:
- check outCollection and, if a result with the given user_id exists, return it
- if it doesn't, run the pipeline and save the result to outCollection
- return the aggregation result
My current pipeline looks like:
db.getCollection('sourceCollection').aggregate([
{
$match: {
user_id: "myUserId"
}
},...//perform various steps
{
$out: "outCollection"
}],{
allowDiskUse: true
})
Result is saved to outCollection in a document like:
{ user_id: "myUserId", aggResult: [...] }
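As far as I know a single pipeline cannot do all three steps on 3.4, but a small helper can. Here is a sketch of the check-then-run idea under assumptions: because $out replaces the whole target collection on every run, the sketch drops the $out stage and upserts the result itself. getOrComputeAggResult and makeFakeDb are made-up names, and the fake db exists only so the example runs outside the shell.

```javascript
// Sketch only. `db` is any object exposing getCollection(name) whose collections
// offer findOne / aggregate / updateOne with shell-like synchronous behaviour.
// `pipelineWithoutOut` is the existing pipeline minus its final $out stage.
function getOrComputeAggResult(db, userId, pipelineWithoutOut) {
  const out = db.getCollection("outCollection");
  const cached = out.findOne({ user_id: userId });
  if (cached) return cached.aggResult; // step 1: reuse the saved result

  // step 2: run the pipeline and save the result ourselves
  const docs = db
    .getCollection("sourceCollection")
    .aggregate(pipelineWithoutOut, { allowDiskUse: true })
    .toArray();
  const aggResult = docs.length > 0 ? docs[0].aggResult : [];
  out.updateOne({ user_id: userId }, { $set: { aggResult: aggResult } }, { upsert: true });
  return aggResult; // step 3: return what was just computed
}

// In-memory stand-in for the database, for demonstration only.
function makeFakeDb(sourceDocs) {
  const saved = {};
  let runs = 0;
  const collections = {
    outCollection: {
      findOne: (q) => saved[q.user_id] || null,
      updateOne: (q, u) => {
        saved[q.user_id] = { user_id: q.user_id, aggResult: u.$set.aggResult };
      },
    },
    sourceCollection: {
      aggregate: () => {
        runs += 1;
        return { toArray: () => sourceDocs };
      },
    },
  };
  return { getCollection: (name) => collections[name], runCount: () => runs };
}

const fakeDb = makeFakeDb([{ user_id: "myUserId", aggResult: [1, 2, 3] }]);
const first = getOrComputeAggResult(fakeDb, "myUserId", []);
const second = getOrComputeAggResult(fakeDb, "myUserId", []);
console.log(first, second, fakeDb.runCount());
```

The second call is served from outCollection, so the pipeline runs only once. This still costs one extra query per request, but the expensive aggregation is skipped when a saved result exists.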

how to populate field of one collection with count query results of another collection?

Kind of a complex one here and I'm pretty new to Mongo, so hopefully someone can help. I have a db of users. Each user has a state/province listed. I'm trying to create another collection holding the total number of users in each state/province. Because users sign up pretty regularly, this will be a growing total that I'm trying to generate and display on a map.
I'm able to query the database to find the total number of users in a specific state, but I want to do this for all users and come out with a list of totals for all states/provinces. I want a separate collection in the DB with all states/provinces listed and a field TOTAL that is dynamically populated with the count query on the other collection. But I'm not sure how to make the result of a query the value of a field in another collection.
I used this to get the user totals:
db.users.aggregate([
{"$group" : {_id:"$state", count:{$sum:1}}}
])
My main question is how to make the results of a query the value of a field in each corresponding record in another collection. Or if that's even possible.
Thanks for any help or guidance.
It looks like On-Demand Materialized Views (added in version 4.2 of MongoDB) should solve your problem!
You can create an On-Demand Materialized View using the $merge operator.
A possible definition of the Materialized View could be:
updateUsersLocationTotal = function() {
db.users.aggregate( [
// Optional: add a $match stage here if you need to filter users first (for example on state).
{ $group: { _id:"$state", users_quantity: { $sum: 1} } },
{ $merge: { into: "users_total", whenMatched: "replace" } }
] );
};
And then you perform updates just by calling updateUsersLocationTotal()
After that you can query the view just like a normal collection, using db.users_total.find() or db.users_total.aggregate().

Mongo aggregation does not return the whole result, only a part of it

Can somebody please tell me what the problem is with this simple aggregation command:
db.test.aggregate([
{
$group: {
_id: "$type",
numbers: { $sum: 1 }
}
}
]).pretty()
The collection has about 2 million documents and every one of them has a type field. But the command returns only a few results, plus the message "Type "it" for more". If I type "it", it returns the next part of the aggregation result, and so on until the end. But I want to have the whole aggregation in one result. What am I doing wrong?
Thanks.
The mongo shell does not print the whole result at once: it iterates the cursor in batches (20 documents by default) and waits for you to type "it".
Returning a very large result in one go could also exhaust the memory of your server or computer.
If you do want all of the data at once, it is better to collect it with a script.
You can write a script in your programming language that queries the db, pages through the data and stores it in a variable.
Example
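A minimal sketch of that idea, assuming a cursor that exposes hasNext() and next() as the shell's aggregation cursor does. drainCursor and fakeCursor are made-up names; the fake cursor and its documents are there only so the sketch runs outside the shell.

```javascript
// Collect every document from a cursor-like object that exposes hasNext() / next().
function drainCursor(cursor) {
  const all = [];
  while (cursor.hasNext()) {
    all.push(cursor.next());
  }
  return all;
}

// Stand-in cursor over invented group results, for demonstration only.
function fakeCursor(docs) {
  let i = 0;
  return { hasNext: () => i < docs.length, next: () => docs[i++] };
}

const results = drainCursor(fakeCursor([{ _id: "a", numbers: 2 }, { _id: "b", numbers: 5 }]));
console.log(results.length);
```

In the shell itself, calling .toArray() on the result of db.test.aggregate([...]) has the same effect. Since the pipeline groups by type, the result has one document per distinct type, not one per source document.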

MongoDB - Aggregation on referenced field

I've got a question on the design of documents in order to be able to efficiently perform aggregation. I will take a dummy example of document :
{
product: "Name of the product",
description: "A new product",
comments: [ObjectId(xxxxx), ObjectId(yyyy),....]
}
As you can see, I have a simple document which describes a product and references some comments on it. Imagine this product is so popular that it has millions of comments. A comment is a simple document with a date, a text and possibly some other fields. The problem is that such a product document can easily exceed 16MB, so I cannot embed the comments in the product and have to keep them in a separate collection.
What I would like to do now is perform an aggregation on the product collection. A first step could be, for example, to select various products and sort their comments by date. That is quite an easy operation with embedded documents, but how can I do it with this design? I only have the ObjectId of each comment and not its content. Of course, I'd like to perform this aggregation in a single operation, i.e. I don't want to run the first part of the aggregation, then query the results and run another aggregation.
I don't know if that's clear enough?
I would go about it this way: create a temp collection that is an exact copy of the product collection, with the only exception being the schema of the comments array, which is modified to hold a comment object instead of the object id. The comment object only has the _id and the date field. The above can be done in one step:
db.product.find().forEach( function (doc){
  var comments = []; // reset for every product so comments do not pile up across products
  doc.comments.forEach( function(x) {
    var obj = {"_id": x };
    var comment = db.comment.findOne(obj);
    obj["date"] = comment.date;
    comments.push(obj);
  });
  doc.comments = comments;
  db.temp.insert(doc);
});
You can then run your aggregation query against the temp collection:
db.temp.aggregate([
{
$match: {
// your match query
}
},
{
$unwind: "$comments"
},
{
$sort: { "comments.date": 1 } // sort the pipeline by comments date
}
]);
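If your server supports $lookup (MongoDB 3.2 or later), the temp collection may not be needed at all. Here is a sketch under that assumption; the collection name comment is taken from the script above, while buildCommentsByDatePipeline and the output field name comment are made up.

```javascript
// Sketch only; assumes the comments live in a collection named "comment"
// and that $lookup is available (MongoDB 3.2+).
function buildCommentsByDatePipeline(matchQuery) {
  return [
    { $match: matchQuery },
    // one document per referenced comment id
    { $unwind: "$comments" },
    // fetch the comment document for that id
    { $lookup: { from: "comment", localField: "comments", foreignField: "_id", as: "comment" } },
    // $lookup always produces an array; flatten the single match
    { $unwind: "$comment" },
    { $sort: { "comment.date": 1 } },
  ];
}

const pipeline = buildCommentsByDatePipeline({ product: "Name of the product" });
// In the shell this would be run as: db.product.aggregate(pipeline, { allowDiskUse: true })
console.log(JSON.stringify(pipeline, null, 2));
```

Because the array is unwound before the lookup, each intermediate document carries a single comment, which keeps it well under the 16MB document limit.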