Reference an _id in a Subdocument in another Collection Mongodb - mongodb

I am developing an application with MongoDB and Node.js.
I should also mention that I am new to both, so please bear with me.
My database has a categories collection, and in each category I store products as subdocuments,
like this:
{
_id : ObjectId(),
name: String,
type: String,
products : [{
_id : ObjectId(),
name : String,
description : String,
price : String
}]
}
When it comes to storing orders in the database, the orders collection looks like this:
{
receiver : String,
status : String,
subOrders : [
{
products :[{
productId : String,
name : String,
price : String,
status : String
}],
tax : String,
total : String,
status : String,
orderNote : String
}
]
}
As you can see, the orders store the _id of products, which are subdocuments of categories.
Storing is obviously not a problem. Fetching is also fine as long as we only need the limited fields, like name or price. But if we later need extra product fields such as description, they are not stored in orders.
My question is this:
Is there an easy way to access the other fields of a product without looping through all the categories? Namely, I need sample code for querying the description of a product when I only have its _id.
Or is our design wrong, so that I have to redesign it from scratch and move the products out of categories into their own collection?
Please don't post links to websites or blogs that talk about MongoDB and collection design in general unless they focus on an issue very similar to mine.
Thanks in advance.

I'd assume that you want to return the descriptions for every product in the current list. First, a find query cannot return only the matching array elements. Used as a projection operator, $elemMatch returns the first matching element of the array, but not every matching element.
db.categories.find({ "products._id" : "PID1" },
{ products : { $elemMatch : { _id : "PID1" } } })
You'd definitely want to index the "products._id" field to achieve reasonable performance.
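Here is a minimal sketch of that lookup, assuming the Node.js driver and string product ids like "PID1" (if products._id holds ObjectId values, pass an ObjectId instead). The helper name productLookup is made up for illustration. It only builds the filter and projection documents, so it runs without a database.

```javascript
// Hypothetical helper: builds the arguments for finding one embedded product.
function productLookup(productId) {
  return {
    // Match the category document that contains the product.
    filter: { "products._id": productId },
    // Return only the first array element whose _id matches.
    projection: { _id: 0, products: { $elemMatch: { _id: productId } } },
  };
}

const { filter, projection } = productLookup("PID1");
console.log(JSON.stringify(filter), JSON.stringify(projection));

// Assumed usage with the Node.js driver (not run here):
//   const doc = await db.collection("categories").findOne(filter, { projection });
//   const description = doc && doc.products[0].description;
```

The matched product comes back as the single element of the products array, so the description is read from products[0].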
You might instead consider creating a products collection where each document contains a category identifier, much like you would in a relational database. This is a common pattern in MongoDB when embedding doesn't make sense, or complicates queries and aggregations.
Assuming you go that way:
You'll need to load the data from the second collection manually. A find query cannot join collections (the $lookup aggregation stage, added in MongoDB 3.2, is the closest equivalent). You might consider using $in, which takes a list of values for a field and loads all matching documents.
Depending on the driver you're using to access MongoDB, you should be able to use the projection feature of find, which limits the fields returned for a document to just those you've specified.
As product descriptions aren't likely to change frequently, you might also consider caching the values for a period on the client (a web server, for example).
db.products.find({ _id: { $in : [ 'PID1', 'PID2'] } }, { description : 1 })
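Loading the second collection "manually" usually means merging the two result sets in application code. Below is a rough sketch of that step. The helper name attachDescriptions and the sample data are assumptions for illustration; productDocs stands in for whatever the $in query above returns.

```javascript
// Hypothetical helper: attach descriptions to order lines after a $in query.
function attachDescriptions(orderProducts, productDocs) {
  // Index the fetched product documents by _id for constant-time lookups.
  const byId = new Map(productDocs.map((p) => [String(p._id), p]));
  return orderProducts.map((line) => {
    const product = byId.get(String(line.productId));
    return { ...line, description: product ? product.description : null };
  });
}

// Sample data standing in for one sub-order and for the result of
// db.products.find({ _id: { $in: ids } }, { description: 1 }).
const orderProducts = [
  { productId: "PID1", name: "Pen", price: "2" },
  { productId: "PID2", name: "Ink", price: "5" },
];
const ids = orderProducts.map((line) => line.productId);
const productDocs = [{ _id: "PID1", description: "Blue ballpoint pen" }];

console.log(ids, attachDescriptions(orderProducts, productDocs));
```

Products that no longer exist in the products collection get a null description instead of breaking the merge.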

Related

Next.js/MongoDB - Query Optimization

I am building a website using Next.js and MongoDB. On one of the pages I have implemented filters to help search for products. To retrieve and update the filters (the item counts are updated each time a filter changes), I have an API endpoint which queries my MongoDB collection. This collection contains ~200,000 items. Each item has several fields such as brand, model, place, etc.
I filter on 9 fields, so each of them must be fetched through my API every time there is a change. Therefore I have 9 queries running through my API, one for each field/filter, and each query on MongoDB looks like this:
var models = await db_collection
.aggregate([
{
$match: {
$and: [filter],
},
},
{
$group: { _id: '$model', count: { $sum: 1 } },
},
{ $sort: { _id: 1 } },
])
.toArray();
The problem is that, with 9 queries running, updating the page takes ~4 seconds (mostly spent in the queries), which is too long. I would like to get below 1 second. I would like to know if there is a good practice I am missing, such as running one query instead of one per filter, or some optimization on the database side.
Thank you,
I have tried adding a $project stage before $group in the aggregation pipeline to reduce the number of fields returned, and using distinct followed by a sort instead of aggregate, but none of these seems to improve efficiency.
EDIT :
As suggested by R2D2, I am posting the structure of a document on MongoDB in my collection :
{
_id : ObjectId('example_id')
source : string
date : date
brand : string
family : string
model : string
size : string
color : string
condition : string
contact : string
SKU : string
}
Depending on the page, I query the unique values of each field of interest (source, date, brand, family, model, size, color, condition, contact) and their counts depending on the filters (e.g. the count for each unique value of model for the selected brands). I also query documents based on specific values of these fields.
As mentioned, your indexes are important, and if you are querying by those fields I recommend creating compound indexes. See here for index optimization: https://learnmongodbthehardway.com/schema/indexes/
As far as the aggregation pipeline goes, nothing is out of the ordinary, but this specific aggregation only returns the number of items per model matching the criteria, not the matching documents. If that is all the data you need, you might find it useful to create a new collection where you pre-calculate the common searches daily (how many items have the color black, ...). This way, when the page loads, you don't have to scan your 200k+ items, just your pre-calculated statistics collection. Schedule a cron task or use a lambda function to invoke a route on your API that calculates all your stats once a day and upserts them into the new collection.
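To make the daily upsert step concrete, here is a small sketch under some assumptions: the stats collection, its field/value/count/updatedAt layout, and the helper name statsUpserts are all invented for illustration. The helper converts the output of a $group stage into operations in the shape the driver's bulkWrite() accepts. Note that counts stored this way only cover the unfiltered case.

```javascript
// Hypothetical helper: turn $group results for one field into upsert operations
// in the shape accepted by the driver's bulkWrite().
function statsUpserts(field, groups) {
  return groups.map((g) => ({
    updateOne: {
      filter: { field, value: g._id },
      update: { $set: { count: g.count, updatedAt: new Date() } },
      upsert: true,
    },
  }));
}

// Sample output of: [{ $group: { _id: "$color", count: { $sum: 1 } } }]
const groups = [
  { _id: "BLACK", count: 120 },
  { _id: "BLUE", count: 45 },
];
const ops = statsUpserts("color", groups);
console.log(JSON.stringify(ops, null, 2));

// Assumed usage in the daily job (the collection name "stats" is made up):
//   await db.collection("stats").bulkWrite(ops);
```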
Also, I believe the explicit $and is unnecessary, since you can rely on the implicit $and. You can match on an object like:
{
color : {$in : ['BLACK', 'BLUE']},
size : 3
}
rather than :
[{color : 'BLACK'}, {color : 'BLUE'}, {size : 3}]
Reserve the explicit $and for when you really need it.
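As a sketch of how the filter object could be assembled from the selected filter values, assuming the UI hands over one array of selected values per field (the helper name buildFilter and that input shape are assumptions):

```javascript
// Hypothetical helper: build one filter document from the selected values.
// Keys in the same document are combined with an implicit $and.
function buildFilter(selected) {
  const filter = {};
  for (const [field, values] of Object.entries(selected)) {
    if (!values || values.length === 0) continue; // nothing selected for this field
    filter[field] = values.length === 1 ? values[0] : { $in: values };
  }
  return filter;
}

const filter = buildFilter({ color: ["BLACK", "BLUE"], size: [3], brand: [] });
console.log(JSON.stringify(filter));

// Assumed usage: db_collection.aggregate([{ $match: filter }, ...])
```

Fields with no selection are left out of the filter entirely, so they do not restrict the match.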

Is it better to save id of a document in another document as ObjectId or String

Let's take a simple "bad" example: assume I have 2 collections, 'person' and 'address', and in 'address' I want to store the '_id' of the person the address is associated with. Is there any benefit to storing this "referential key" as an ObjectId rather than a string in the 'address' collection?
I feel like storing it as a string should not hurt, but I have not worked with Mongo for very long and do not know whether this pattern will hurt down the road.
I read the post here: Store _Id as object or string in MongoDB?
It says that ObjectId is faster, and I assume that is true if you are fetching/updating by ObjectId in the parent collection (e.g. fetching/updating the 'person' collection using person._id as an ObjectId). But I couldn't find anything that suggests the same holds when searching by the string representation of the id in another collection (in our example, searching the address collection by person._id as a string).
Your feedback is much appreciated.
Regardless of performance, you should store the "referential key" in the same format as the _id field that you are referring to. That means that if your referred document is:
{ _id: ObjectID("68746287..."), value: 'foo' }
then you'd refer to it as:
{ _id: ObjectID(…parent document id…), subDoc: ObjectID("68746287...") }
If the document that you're pointing to has a string as an ID, then it'd look like:
{ _id: "derick-address-1", value: 'foo' }
then you'd refer to it as:
{ _id: ObjectID(…parent document id…), subDoc: "derick-address-1" }
Besides that, because you're talking about persons and addresses, it might make more sense to not have them in two documents altogether, but instead embed the document:
{ _id: ObjectID(…parent document id…),
'name' : 'Derick',
'addresses' : [
{ 'type' : 'Home', 'street' : 'Victoria Road' },
{ 'type' : 'Work', 'street' : 'King William Street' },
]
}
As for using a string as the document id: in a Meteor collection, you can generate the document id either as a string with Random.id() or as an ObjectId with Meteor.Collection.ObjectID().
The discussion thread "Mongodb string id vs ObjectId" has a good summary:
ObjectId Pros
it has an embedded timestamp in it.
it's the default Mongo _id type; ubiquitous
interoperability with other apps and drivers
ObjectId Cons
it's an object, and a little more difficult to manipulate in practice.
there will be times when you forget to wrap your string in new ObjectId()
it requires server side object creation to maintain _id uniqueness
- which makes generating them client-side by minimongo problematic
String Pros
developers can create domain specific _id topologies
String Cons
developer has to ensure uniqueness of _ids
findAndModify() and getNextSequence() queries may be invalidated
All of the information above is based on the Meteor framework. For MongoDB itself, it is better to use ObjectId; the reasons are in the question linked in your question.
Storing it as an ObjectId is beneficial. It is faster because an ObjectId takes 12 bytes, compared with 24 bytes for its string form.
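The size difference and the embedded timestamp can both be seen from the hex form alone. This is a small sketch in plain JavaScript (no driver needed); the helper name objectIdInfo is made up, and the sample id is a 24-character ObjectId hex string. It relies on the first 4 bytes of an ObjectId being the creation time in seconds since the Unix epoch.

```javascript
// Decode the creation time embedded in a 24-character ObjectId hex string.
function objectIdInfo(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    createdAt: new Date(seconds * 1000),
    binaryBytes: hex.length / 2, // 12 bytes when stored as an ObjectId
    stringChars: hex.length, // 24 characters when stored as a string
  };
}

console.log(objectIdInfo("536378bcc9ecd7046700001f"));
```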
Also, you should try to denormalize your collections so that you don't need 2 collections (the opposite of what you would do in an RDBMS).
Something like this might be better in general:
{ _id : "1",
person : {
Name : "abc",
age: 20
},
address : {
street : "1st main",
city: "Bangalore",
country: "India"
}
}
But again, it depends on your use case. This might be not suitable sometimes.
Hope that helps! :)

Fetch all the documents has reference to a certain document in MongoDB

I need all the documents from different collections that have reference to a certain "parent" document. Any easy workaround?
My naive approach would be to loop all the collections and loop all the documents and check the condition. Any easier / more efficient way?
Well, without the data model that is hard to answer, but the reference could have an index, right? I mean you know which field is the reference? For example,
Users (parent) {
Id : ObjectId("123"),
...
}
Comments {
Id : ObjectId("abc"),
UserId : ObjectId("123"),
...
}
Questions {
Id : ObjectId("efc"),
UserId : ObjectId("123"),
...
}
Then, knowing that UserId is the reference, you can call db.Comments.find({ UserId : ObjectId("123") }) and find all references. That way, you don't have to loop through all documents. The only thing you need is a lookup table collection name -> name of the reference field
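A sketch of that lookup table in code, using the collection and field names from the example above. The helper name referenceQueries is made up, and the parent id is passed as a plain value here; with real data it would be the ObjectId of the parent document.

```javascript
// Lookup table from the answer: collection name -> name of the reference field.
const referenceFields = { Comments: "UserId", Questions: "UserId" };

// Hypothetical helper: one filter per collection that references the parent.
function referenceQueries(parentId) {
  return Object.entries(referenceFields).map(([collection, field]) => ({
    collection,
    filter: { [field]: parentId },
  }));
}

const queries = referenceQueries("123");
console.log(JSON.stringify(queries));

// Assumed usage with the Node.js driver (not run here):
//   for (const q of queries) {
//     const docs = await db.collection(q.collection).find(q.filter).toArray();
//   }
```

With an index on each reference field, every one of these queries is an index lookup rather than a scan of the whole collection.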

Need Help on Mongo DB query

There is an existing person collection in the system which is like:
{
"_id" : ObjectId("536378bcc9ecd7046700001f"),
"engagements":{
"5407357013875b9727000111" : {
"role" : "ADMINISTRATOR",
},
"5407357013875b9727000222" : {
"role" : "DEVELOPER",
}
}
}
Multiple persons can have the same engagement, each with a specific role. I need a query that returns all the persons that have a specific engagement in the engagements property of the person collection.
For example, I want to get all the persons which have
5407357013875b9727000222 in the engagements.
I know the $in operator could be used, but the problem is that I need to compare against the keys of the engagements subdocument.
I think it's as simple as this:
db.users.find({'engagements.5407357013875b9727000222': {$exists: true}})
If you want to match against multiple engagement ids, then you'll have to use $or. Sorry, no $in for you here.
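For several engagement ids, the $or filter could be built like this. It is only a sketch: the helper name engagementFilter is an assumption, and it refuses an empty list because MongoDB rejects an empty $or array.

```javascript
// Hypothetical helper: match documents that have any of the given engagement keys.
function engagementFilter(engagementIds) {
  if (engagementIds.length === 0) {
    throw new Error("at least one engagement id is required");
  }
  return {
    $or: engagementIds.map((id) => ({
      ["engagements." + id]: { $exists: true },
    })),
  };
}

const filter = engagementFilter([
  "5407357013875b9727000111",
  "5407357013875b9727000222",
]);
console.log(JSON.stringify(filter));

// Assumed usage: db.users.find(filter)
```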
Note, however, that you need to restructure your data, as this one can't be indexed to help this concrete query. Here I assume you care about performance and this query is used often enough to have impact on the database.
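One possible restructuring, sketched under the assumption that each engagement becomes an array element carrying its id as a value (the id field name and the helper name restructure are made up):

```javascript
// Hypothetical migration step: turn the keyed object into an array of entries.
function restructure(person) {
  const engagements = Object.entries(person.engagements || {}).map(
    ([id, details]) => ({ id, ...details })
  );
  return { ...person, engagements };
}

const person = {
  _id: "536378bcc9ecd7046700001f",
  engagements: {
    "5407357013875b9727000111": { role: "ADMINISTRATOR" },
    "5407357013875b9727000222": { role: "DEVELOPER" },
  },
};
console.log(JSON.stringify(restructure(person), null, 2));

// Assumed follow-up in the shell, once the data is migrated:
//   db.person.createIndex({ "engagements.id": 1 })
//   db.person.find({ "engagements.id": { $in: ["5407357013875b9727000222"] } })
```

Because the engagement id is now a value rather than a key, a single index covers every engagement, and $in works for matching several ids at once.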

Can you do group by as map reduce in Mongo?

I have two collections. "people" are connected with a "location" like so:
location_id = ObjectId()
db.people.insert(
{
_id : ObjectId(),
name : "Nick",
location : location_id
});
db.locations.insert(
{
_id : location_id,
city : "Cape Town"
});
I would like to create a histogram of locations giving the count of people in each city. But I can't seem to do it with the Mongo group command because they are different collections. Is the correct way to do it with map/reduce?
map-reduce also won't help you here. It also operates on a single collection.
It seems to me that you're trying to apply your relational design skills here. Don't do that. MongoDB requires a fresh approach.
If I were you, I'd probably denormalize location into people collection.
db.people.insert(
{
_id : ObjectId(),
name : "Nick",
location : {city: "Cape Town"}
});
This way it's possible to analyze this data with map-reduce, because everything is in the same collection.
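Once the city lives inside each people document, the histogram can also be produced with the aggregation framework's $group stage. That is a swapped-in alternative to the map-reduce mentioned above, not what the answer itself used. The sketch below shows such a pipeline plus a plain JavaScript function (countByCity, a made-up name) that computes the same counts in memory for illustration.

```javascript
// Pipeline for the denormalized documents: one output document per city.
const pipeline = [
  { $group: { _id: "$location.city", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// In-memory equivalent of the $group stage, for illustration only.
function countByCity(people) {
  const counts = {};
  for (const person of people) {
    const city = person.location.city;
    counts[city] = (counts[city] || 0) + 1;
  }
  return counts;
}

const people = [
  { name: "Nick", location: { city: "Cape Town" } },
  { name: "Anna", location: { city: "Cape Town" } },
  { name: "Sam", location: { city: "Durban" } },
];
console.log(JSON.stringify(pipeline), countByCity(people));

// Assumed usage in the shell: db.people.aggregate(pipeline)
```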