MongoDB: query across three collections

There are three collections:
city: {"_id", "name"}
company: {"_id", "name", "cityID"}
comments: {"_id", "text", "companyID"}
I need to select the last 10 comments on companies in a given city.
Right now I first select the _id of every company in the city, and then fetch the 10 most recent comments for those _ids.
Here's the code:
$db->execute('function () {
    var result = {};
    var company = [];
    result.company = [];
    db.company.find({"city": "msk"}, {"title": 1, "_id": 1}).forEach(function (a) {
        result.company[a._id] = Object.deepExtend(a, result[a._id]);
        company.push(a._id);
    });
    var comments = [];
    result.comments = [];
    db.comments.find({"company": {"$in": company}}).sort({"createTime": -1}).limit(10).forEach(function (a) {
        result.comments[a._id] = Object.deepExtend(a, result[a._id]);
        comments.push(a._id);
    });
    return result;
}');
Is there a better way to do what I need?
Thanks in advance for your advice!

Right now your code
db.comments
.find({"company": {"$in": company}})
.sort({"createTime": -1})
.limit(10)
is only fetching the last 10 comments overall for the specified city, rather than the last 10 for each company. Is this what you want? If so, then it might make sense to just store a "city" field for each comment, and just query the comments collection based on city.
If you want the last 10 for each company, then with your current schema you will have to make an individual query on the comments collection for each relevant company.
What is the use case for your data?
Edit:
In a relational database model, you would be able to execute a JOIN query that would get you the information from both tables (companies and comments) in one query. However, MongoDB does not support joins in an ordinary find() query (the $lookup aggregation stage, added in MongoDB 3.2, is the closest equivalent), so if you want to fetch the information you need in a single query, you need to denormalize the data a little by storing a city field in the comments collection.
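To make the two readings discussed above concrete (the last 10 comments for the whole city versus the last 10 per company), here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB: the sample documents, the denormalized `city` field on each comment, and the helper names `newestFirst`, `lastNForCity` and `lastNPerCompany` are all assumptions made for illustration.

```javascript
// In-memory stand-in for the comments collection. The "city" field is the
// denormalized copy suggested above; all values are made up.
const comments = [
  { _id: 1, company: "c1", city: "msk", createTime: 100, text: "first" },
  { _id: 2, company: "c1", city: "msk", createTime: 300, text: "third" },
  { _id: 3, company: "c2", city: "msk", createTime: 200, text: "second" },
  { _id: 4, company: "c3", city: "spb", createTime: 400, text: "other city" },
];

// Newest-first copy of a list of comments (does not mutate the input).
function newestFirst(docs) {
  return [...docs].sort((a, b) => b.createTime - a.createTime);
}

// Last n comments for a whole city: one filter on the denormalized field.
function lastNForCity(docs, city, n) {
  return newestFirst(docs.filter((c) => c.city === city)).slice(0, n);
}

// Last n comments for each company in a city: walk newest-first and keep
// at most n per company.
function lastNPerCompany(docs, city, n) {
  const byCompany = new Map();
  for (const c of newestFirst(docs.filter((d) => d.city === city))) {
    const list = byCompany.get(c.company) || [];
    if (list.length < n) list.push(c);
    byCompany.set(c.company, list);
  }
  return byCompany;
}

console.log(lastNForCity(comments, "msk", 2).map((c) => c._id));
console.log(lastNPerCompany(comments, "msk", 1));
```

With the denormalized field in place, the first case would correspond in the shell to something like db.comments.find({"city": "msk"}).sort({"createTime": -1}).limit(10), assuming every comment actually carries that field.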

Related

How to update a field in a MongoDB collection so it increments linearly

I've been struggling with MongoDB for some time now, and the idea is quite simple: I have a collection, and I want to add a new ID field to it. This field is controlled by our API, which auto-increments it for each new document inserted.
The thing is, the collection already has some documents, so I must initialize each document with a number sequentially, no matter the order:
collection: holiday {
'date': date,
'name': string
}
The collection has 12 documents, so each document should get an ID property, with values from 1 to 12. What kind of query or function should I use to do this? No restrictions so far, and performance is not a problem.
Maybe it is not optimal but works :)
var newId = 1;
var oldIds = [];
db.holiday.find().forEach(it => {
const documentToMigrate = it;
oldIds.push(it._id);
documentToMigrate._id = newId;
db.holiday.save(documentToMigrate);
++newId;
})
db.holiday.remove({_id: {$in: oldIds}});
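If the goal is to add a separate sequential field rather than to replace `_id`, the numbering logic can be sketched as below. This is plain JavaScript over an in-memory array, not a query against a real collection; the field name `id`, the sample data and the helper `assignSequentialIds` are assumptions for illustration.

```javascript
// In-memory stand-in for the holiday collection (sample data is made up).
const holidays = [
  { _id: "a", date: "2020-01-01", name: "New Year" },
  { _id: "b", date: "2020-05-01", name: "Labour Day" },
  { _id: "c", date: "2020-12-25", name: "Christmas" },
];

// Give every document a sequential "id" starting at 1 and return the next
// free value, which an API could use for later inserts.
function assignSequentialIds(docs) {
  let nextId = 1;
  for (const doc of docs) {
    // Against a real collection this step would be an update that sets
    // the new field on the document with this _id.
    doc.id = nextId;
    nextId += 1;
  }
  return nextId;
}

const nextFreeId = assignSequentialIds(holidays);
console.log(holidays.map((h) => h.id), nextFreeId);
```

Keeping the original `_id` untouched avoids the insert-then-remove step, at the cost of having two identifiers per document.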

Mongo Shell Querying one collection with the results of another

I have 2 different collections and I am trying to query the first collection and take the output of that as an input to the second collection.
var mycursor = db.item.find({"itemId":NumberLong(123)},{"_id":0})
var outp = "";
while(mycursor.hasNext()){
var rec = mycursor.next()
outp = outp + rec.eventId;
}
This query works fine and returns me a list of eventIds.
I have another collection named users, which has an eventId field in it. An eventId can repeat across multiple users. So for each eventId I get from the above query, I want to get the list of users too.
My query for the second collection would be something like this :
db.users.find({"eventId":ObjectId("each eventId from above query")},{"_id":0})
My final result would be a unique list of users.
Well, this should basically work (up to a point, that is):
db.users.find({
"eventId": { "$in": db.item.find({
"itemId": NumberLong(123)
}).map(function(doc) { return doc.eventId }) }
})
Or even a bit better:
db.users.find({
"eventId": { "$in": db.item.distinct("eventId", {
"itemId": NumberLong(123) }) }
})
The reason is that the "inner query" is evaluated before the outer query is sent, so you get an array of arguments for use with $in.
That is basically the same as doing the following, which translates better outside of the shell:
var events = db.item.distinct("eventId", { "itemId": NumberLong(123) });
db.users.find({ "eventId": { "$in": events } })
If the results are "too large", however, then your best approach is to loop over the initial results as you have done already and build up an array of arguments. Once the array reaches a certain size, run the same $in query several times in "pages" to get the results.
But looking for the "distinct" eventId values via .distinct() or .aggregate() will help.
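The "pages" idea for oversized $in lists can be sketched as follows. This is a minimal in-memory illustration in plain JavaScript, not shell code: the `users` array, the page size, and the helpers `chunk`, `findUsersByEventIds` (a stand-in for a find() with an $in filter) and `uniqueUsersFor` are all hypothetical.

```javascript
// Split a list of $in arguments into pages of at most `size` items.
function chunk(values, size) {
  const pages = [];
  for (let i = 0; i < values.length; i += size) {
    pages.push(values.slice(i, i + size));
  }
  return pages;
}

// In-memory stand-in for the users collection (sample data is made up).
const users = [
  { name: "ann", eventId: "e1" },
  { name: "bob", eventId: "e2" },
  { name: "ann", eventId: "e3" },
  { name: "cid", eventId: "e9" },
];

// Hypothetical stand-in for a find() with { eventId: { $in: page } }.
function findUsersByEventIds(page) {
  return users.filter((u) => page.includes(u.eventId));
}

// Run one "query" per page and collect a unique list of user names.
function uniqueUsersFor(eventIds, pageSize) {
  const names = new Set();
  for (const page of chunk(eventIds, pageSize)) {
    for (const user of findUsersByEventIds(page)) names.add(user.name);
  }
  return [...names];
}

console.log(uniqueUsersFor(["e1", "e2", "e3"], 2));
```

Collecting into a Set gives the unique list of users the question asks for, regardless of how many pages a given user shows up in.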

MongoDB - Aggregation on referenced field

I've got a question on the design of documents in order to be able to efficiently perform aggregation. I will take a dummy example of document :
{
product: "Name of the product",
description: "A new product",
comments: [ObjectId(xxxxx), ObjectId(yyyy),....]
}
As you can see, I have a simple document that describes a product and references some comments on it. Imagine this product is very popular, so that it has millions of comments. A comment is a simple document with a date, a text and possibly some other fields. The problem is that such a product can easily grow larger than 16MB, so I cannot embed the comments in the product document; they have to live in a separate collection.
What I would like to do now is perform an aggregation on the product collection. A first step could be, for example, to select various products and sort their comments by date. That is quite an easy operation with embedded documents, but how can I do it with this design? I only have the ObjectIds of the comments, not their content. Of course, I'd like to perform this aggregation in a single operation, i.e. I don't want to have to perform the first part of the aggregation, then query the results and perform another aggregation.
I don't know if that's clear enough? ^^
I would go about it this way: create a temp collection that is the exact copy of the product collection with the only exception being the change in the schema on the comments array, which would be modified to include a comment object instead of the object id. The comment object will only have the _id and the date field. The above can be done in one step:
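db.product.find().forEach( function (doc){
// start with a fresh array for each product, so comments do not leak between products
var comments = [];
doc.comments.forEach( function(x) {
var obj = {"_id": x };
// look up the referenced comment and copy only its date
var comment = db.comment.findOne(obj);
obj["date"] = comment.date;
comments.push(obj);
});
doc.comments = comments;
db.temp.insert(doc);
});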
You can then run your aggregation query against the temp collection:
db.temp.aggregate([
{
$match: {
// your match query
}
},
{
$unwind: "$comments"
},
{
$sort: { "comments.date": 1 } // sort the pipeline by comments date
}
]);
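To make the effect of the $unwind and $sort stages concrete, here is a rough in-memory emulation in plain JavaScript. It is only a sketch of what those stages produce, not how MongoDB implements them; the sample documents and the helper names `unwindComments` and `sortByCommentDate` are assumptions, and dates are plain numbers to keep the example short.

```javascript
// In-memory stand-in for documents in the temp collection (made-up data).
const temp = [
  { product: "A", comments: [{ _id: 1, date: 30 }, { _id: 2, date: 10 }] },
  { product: "B", comments: [{ _id: 3, date: 20 }] },
];

// Rough equivalent of $unwind: one output document per array element.
function unwindComments(docs) {
  return docs.flatMap((doc) =>
    doc.comments.map((comment) => ({ ...doc, comments: comment }))
  );
}

// Rough equivalent of $sort: { "comments.date": 1 } (ascending).
function sortByCommentDate(docs) {
  return [...docs].sort((a, b) => a.comments.date - b.comments.date);
}

const pipelineResult = sortByCommentDate(unwindComments(temp));
console.log(pipelineResult.map((d) => [d.product, d.comments._id]));
```

Note that after unwinding, the sort runs across all products at once, so comments from different products end up interleaved by date.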

MongoDB: How can I execute multiple dereferences in a single query?

There are several collections, i.e. Country, Province, City, Univ.
Just like in the real world, every country has several provinces, and every province has several cities, and every city has several universities.
How can I find out whether a university is in a given country? For example, country0 may have some universities; what are their _ids?
Documents in those collections are shown below:
{
_id:"country0",
provinces:[
{
$ref:"Province",
$id:"province0"
},
...
]
}
{
_id:"province0",
belongs:{$ref:"Country", $id:"country0"},
cities:[
{
$ref:"City",
$id:"city0"
}
...
]
}
{
_id:"city0",
belongsTo:{$ref:"Province",$id:"province0"},
univs:[
{
$ref:"Univ",
$id:"univ0"
}
...
]
}
{
_id:"univ0",
address:{$ref:"City", $id:"city0"}
}
If there are only two collections, I know fetch() may be useful.
Also, python drivers may be useful, but I can't know their performance well, because I can't use db.system.profile in a .py file.
MongoDB doesn't do joins. N queries are required to get information from N collections. In this situation, to get the _id's of universities in a given country in an array one could do the following (in the mongo shell):
> var country = db.countries.findOne({ "_id": "country0" });
> var province_ids = [];
> country.provinces.forEach(function(province) { province_ids.push(province["$id"]); });
> var provinces = db.provinces.find({ "_id": { "$in": province_ids } });
> var city_ids = [];
> provinces.forEach(function(province) { province.cities.forEach(function(city) { city_ids.push(city["$id"]); }); });
> var cities = db.cities.find({ "_id": { "$in": city_ids } });
> var univ_ids = [];
> cities.forEach(function(city) { city.univs.forEach(function(univ) { univ_ids.push(univ["$id"]); }); });
It's also possible to accomplish the same thing using the belongsTo field, following similar steps. This is cumbersome and it seems like there should be a better way. There is! Denormalize the data. Countries have provinces, which have cities, which have universities, but the relationships are fixed and not of huge cardinality. For queries like "what universities are in a given country?" I would suggest storing province documents entirely within country documents and university documents entirely within city documents. You could store cities inside province documents, or inside country documents directly, but a province or country could have hundreds or thousands of cities, and this might be too much information for one document (MongoDB has a 16MB limit per document). Having provinces in countries and universities in cities reduces the number of queries necessary to two.
Another option is to store more information in each child document. Essentially you have a forest (a collection of trees): countries are parents of provinces which are parents of cities which are parents of universities. The belongsTo field is a parent reference. You could store a reference to all ancestors instead of just the parent. Then finding all universities in a given country is one query on the universities collection.
> db.universities.findOne();
{
_id: "univ0",
city: "city0",
province: "province0",
country: "country0"
}
> db.universities.find({ "country": "country0" });
The schema design that is best for you depends on the types of queries your application will need to answer and their relative frequencies and importance. I can't determine that from your question so I can't firmly recommend one schema over another.
As to your mini-question about performance and the db.system.profile collection, note that db.system.profile is a collection. You can query it from a .py file using a driver.
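The ancestor-reference layout described above can be derived from parent references in one pass. The following is a minimal in-memory sketch in plain JavaScript; it uses plain string fields instead of the DBRef objects from the question, and the sample data and the helpers `byId` and `withAncestors` are assumptions for illustration.

```javascript
// In-memory stand-ins with parent references only (made-up data; plain
// string fields replace the DBRef objects used in the question).
const provinces = [{ _id: "province0", country: "country0" }];
const cities = [{ _id: "city0", province: "province0" }];
const univs = [
  { _id: "univ0", city: "city0" },
  { _id: "univ1", city: "city0" },
];

// Index a list of documents by _id for quick parent lookups.
function byId(docs) {
  return new Map(docs.map((d) => [d._id, d]));
}

// Copy every ancestor onto each university document.
function withAncestors(univDocs, cityDocs, provinceDocs) {
  const cityIndex = byId(cityDocs);
  const provinceIndex = byId(provinceDocs);
  return univDocs.map((u) => {
    const city = cityIndex.get(u.city);
    const province = provinceIndex.get(city.province);
    return { ...u, province: province._id, country: province.country };
  });
}

const universities = withAncestors(univs, cities, provinces);
// With ancestors stored, "universities in country0" is a single filter.
console.log(universities.filter((u) => u.country === "country0").map((u) => u._id));
```

The trade-off is the usual one for denormalized data: reads become a single lookup, but moving a city to another province would mean updating every university document beneath it.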

Applying condition to multiple documents for the same field in MongoDB

I have documents with the following structure:
user {id: 123, tag:"tag1"}
user {id: 123, tag:"tag2"}
user {id: 123, tag:"tag3"}
user {id: 456, tag:"tag1"}
Given a user id, I want to find out whether this user has records with all 3 tags (an AND condition).
If the user has records for "tag1" AND "tag2" AND "tag3", then return true.
SQL equivalent would be like this:
SELECT * FROM users WHERE
EXISTS (SELECT * FROM tags WHERE user_id = users.id AND name ='tag1') AND
EXISTS (SELECT * FROM tags WHERE user_id = users.id AND name ='tag2') AND
user_id=123
How can I express something similar in MongoDB?
Because MongoDB does not have a concept of joins, you need to work around this by reducing your input to a single document.
If you change your document structure so that you are storing an array of tags, like the following:
{id: 123, tag:["tag1","tag2","tag3"]}
{id: 456, tag:["tag1"]}
You can do a query like the following:
db.user.find({$and:[{tag:"tag1"},{tag:"tag2"},{tag:"tag3"}]})
If needed, you could write a map-reduce to emit an array of tags for a userid, so you can do the above query on it.
Edit - Including a simple map-reduce
Here is a really simple map-reduce to get the initial input into a format that is useful for the find query given above.
var map = function() {emit(this.id, {tag:[this.tag]});}
var reduce = function(key, values){
var result_array=[];
values.forEach(function(v1){
v1.tag.forEach(function(v2){
result_array.push(v2);
});
});
return {"tag":result_array};}
var op = db.user.mapReduce(map, reduce, {out:"mr_results"})
Then you can query on the map-reduce output collection, like the following:
db.mr_results.find({$and:[{"value.tag":"tag1"},{"value.tag":"tag2"}, {"value.tag":"tag3"}]})
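The logic behind the map-reduce plus $and query (collect each user's tags, then require every tag to be present) can be sketched in plain JavaScript as below. This runs over an in-memory array rather than a collection; the sample data mirrors the question, and the helper names `tagsByUser` and `hasAllTags` are hypothetical.

```javascript
// In-memory stand-in for the one-document-per-tag layout from the question.
const records = [
  { id: 123, tag: "tag1" },
  { id: 123, tag: "tag2" },
  { id: 123, tag: "tag3" },
  { id: 456, tag: "tag1" },
];

// Collect the set of tags seen for each user id (the role of the map-reduce).
function tagsByUser(docs) {
  const result = new Map();
  for (const { id, tag } of docs) {
    if (!result.has(id)) result.set(id, new Set());
    result.get(id).add(tag);
  }
  return result;
}

// True when the user has a record for every required tag (the AND condition).
function hasAllTags(docs, userId, required) {
  const tags = tagsByUser(docs).get(userId) || new Set();
  return required.every((t) => tags.has(t));
}

console.log(hasAllTags(records, 123, ["tag1", "tag2", "tag3"]));
console.log(hasAllTags(records, 456, ["tag1", "tag2", "tag3"]));
```

Using a Set per user also removes duplicate tags, which the reduce function above would keep as repeated array entries.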