MongoDB multiple schemas in one collection

I am new to MongoDB and have done a lot of reading and a proof of concept. There are many discussions about using multiple collections versus embedded documents. Isn't there another choice? Setting aside my relational-database mindset, couldn't you put two different schemas in the same collection?
Crude example:
{
_id: 'f48a2dea-e6ec-490d-862a-bd1791e76d9e',
_owner: '7a147aad-e3fd-4e55-9fd5-e2cb48d31a83',
manufacturer: 'Porsche',
model: '911',
img: '<byte array>'
},{
_id: '821ca9b7-faa1-4516-a27e-aec79fcb89a9',
_owner: '46ade116-cd59-4d0c-a4d3-cd2e517a256c',
manufacturer: 'Nissan',
model: 'GT-R',
img: '<byte array>'
},{
_id: '87999e27-c98b-4cad-b444-75626f161840',
_owner: 'fba765c8-32dd-49ba-91d3-d361b40bf4a7',
manufacturer: 'BMW',
model: 'M3',
wiki:'http://en.wikipedia.org/wiki/Bmw_m3',
img: '<byte array>'
}
and a totally different schema in the same collection as well:
{
_id: '7a147aad-e3fd-4e55-9fd5-e2cb48d31a83',
name: 'Keeley Bosco',
email: 'katlyn@jenkinsmaggio.net',
city: 'Lake Gladysberg',
mac: '08:fd:0b:cd:77:f7',
timestamp: '2015-04-25 13:57:36 +0700',
},{
_id: '46ade116-cd59-4d0c-a4d3-cd2e517a256c',
name: 'Rubye Jerde',
email: 'juvenal@johnston.name',
city: null,
mac: '90:4d:fa:42:63:a2',
timestamp: '2015-04-25 09:02:04 +0700',
},{
_id: 'fba765c8-32dd-49ba-91d3-d361b40bf4a7',
name: 'Miss Darian Breitenberg',
email: null,
city: null,
mac: 'f9:0e:d3:40:cb:e9',
timestamp: '2015-04-25 13:16:03 +0700',
}
(The reason I don't use an embedded document in my real POC is that a person may have 80,000 "cars", which would exceed the 16 MB document size limit.)
Besides the aching desire to compartmentalize data, is there a downside here?
The reason for doing this would be so that we can correlate the records... I do see that 3.2 has a join ($lookup). The project is too new to know all of the business cases.

Although MongoDB supports different schemas within the same collection, as a good practice it is better to stick to one schema (or closely similar schemas) throughout a collection, so that your application logic stays simpler.
In your case, yes, it is good that you didn't use an embedded document, considering the size of the subdocument. However, I would suggest going for a normalized data model (separate collections linked by references), which is not really bad in this kind of situation.
For further reading, see: https://docs.mongodb.com/master/core/data-model-design/
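A minimal in-memory sketch of the normalized alternative, in plain JavaScript: owners and cars are kept as two separate lists (standing in for two collections), and the application correlates them through `_owner`, which is roughly what a `$lookup` stage would do on the server. The IDs are shortened from the question for readability, and `carsByOwner` is a hypothetical helper, not a MongoDB API.

```javascript
// Two separate "collections" (plain arrays here), linked by car._owner === owner._id.
// IDs are shortened versions of the ones in the question.
const owners = [
  { _id: "7a147aad", name: "Keeley Bosco" },
  { _id: "46ade116", name: "Rubye Jerde" },
  { _id: "fba765c8", name: "Miss Darian Breitenberg" },
];

const cars = [
  { _id: "f48a2dea", _owner: "7a147aad", manufacturer: "Porsche", model: "911" },
  { _id: "821ca9b7", _owner: "46ade116", manufacturer: "Nissan", model: "GT-R" },
  { _id: "87999e27", _owner: "fba765c8", manufacturer: "BMW", model: "M3" },
];

// Hypothetical helper: group cars by owner id, then attach each owner's cars,
// similar in spirit to a $lookup from owners into cars on _owner.
function carsByOwner(ownerDocs, carDocs) {
  const byOwner = new Map();
  for (const car of carDocs) {
    if (!byOwner.has(car._owner)) byOwner.set(car._owner, []);
    byOwner.get(car._owner).push(car);
  }
  return ownerDocs.map((o) => ({ ...o, cars: byOwner.get(o._id) || [] }));
}

const joined = carsByOwner(owners, cars);
console.log(joined.map((o) => `${o.name}: ${o.cars.map((c) => c.model).join(", ")}`));
```

Because each car stays its own document, an owner with 80,000 cars never pushes any single document toward the 16 MB limit.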

Related

Mongoose findOne not working as expected on nested records

I've got a collection in MongoDB whose simplified version looks like this:
Dealers = [{
Id: 123,
Name: 'Someone',
Email: 'someone@somewhere.com',
Vehicles: [
{
Id: 1234,
Make: 'Honda',
Model: 'Civic'
},
{
Id: 2345,
Make: 'Ford',
Model: 'Focus'
},
{
Id: 3456,
Make: 'Ford',
Model: 'KA'
}
]
}]
And my Mongoose Model looks a bit like this:
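const mongoose = require('mongoose');

// Mongoose schema definitions use a lowercase `type` key
const dealerSchema = new mongoose.Schema({
  Id: { type: Number },
  Email: { type: String },
  Vehicles: [{
    Id: { type: Number },
    Make: { type: String },
    Model: { type: String }
  }]
});

// findOne() is a model method, so the schema is compiled into a model
const vehicle_model = mongoose.model('Dealer', dealerSchema);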
Note the Ids are not MongoDB Ids, just distinct numbers.
I try doing something like this:
const response = await vehicle_model.findOne({ 'Id': 123, 'Vehicles.Id': 1234 })
But when I do:
console.log(response.Vehicles.length)
It returns all of the nested Vehicles records instead of only the one I'm after.
What am I doing wrong?
Thanks.
This question is asked very frequently. Indeed someone asked a related question here just 18 minutes before this one.
When you query the database, you are asking it to identify and return matching documents to the client. That is an entirely separate action from asking it to transform the shape of those documents before they are sent back to the client.
In MongoDB, the latter operation (transforming the shape of the document) is usually referred to as "Projection". Simple projections, specifically just returning a subset of the fields, can be done directly in find() (and similar) operations. Most drivers and the shell use the second argument to the method as the projection specification, see here in the documentation.
Your particular case is a little more complicated because you want to trim some of the values out of the array. The documentation has a dedicated page titled Project Fields to Return from Query, which goes into more detail about different situations. Near the bottom is a section titled Project Specific Array Elements in the Returned Array, which describes your situation more directly and explains the positional $ operator. You can use that as a starting point as follows:
db.collection.find({
"Id": 123,
"Vehicles.Id": 1234
},
{
"Vehicles.$": 1
})
If you need something more complex, you will have to explore the $elemMatch (projection) operator (not the query variant) or, as @nimrod serok mentions in the comments, the $filter aggregation operator in an aggregation pipeline. The last option is certainly the most expressive and flexible, but also the most verbose.
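To see why the positional projection is a starting point rather than a general answer, the following plain-JavaScript sketch imitates the two behaviours in memory. It does not talk to MongoDB, the `_id` value is a placeholder, and `positionalProjection` and `filterVehicles` are hypothetical helpers. The positional `$` keeps only the first array element that matched the query, whereas `$filter` keeps every element that satisfies its condition.

```javascript
// The dealer document from the question; _id is a placeholder value.
const dealer = {
  _id: "dealer-123",
  Id: 123,
  Name: "Someone",
  Vehicles: [
    { Id: 1234, Make: "Honda", Model: "Civic" },
    { Id: 2345, Make: "Ford", Model: "Focus" },
    { Id: 3456, Make: "Ford", Model: "KA" },
  ],
};

// Imitates the projection { "Vehicles.$": 1 }: besides _id, only the FIRST
// Vehicles element that satisfied the query's array condition is kept.
function positionalProjection(doc, predicate) {
  const first = doc.Vehicles.find(predicate);
  // On the server, a document with no matching element would not be returned at all.
  return { _id: doc._id, Vehicles: first ? [first] : [] };
}

// Imitates a $project stage using $filter: EVERY matching element is kept.
function filterVehicles(doc, predicate) {
  return { _id: doc._id, Vehicles: doc.Vehicles.filter(predicate) };
}

const byId = positionalProjection(dealer, (v) => v.Id === 1234);
const firstFord = positionalProjection(dealer, (v) => v.Make === "Ford");
const allFords = filterVehicles(dealer, (v) => v.Make === "Ford");
console.log(byId.Vehicles.length); // 1
console.log(firstFord.Vehicles.map((v) => v.Model)); // [ 'Focus' ]
console.log(allFords.Vehicles.map((v) => v.Model)); // [ 'Focus', 'KA' ]
```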

Dynamically updating and getting data in MongoDB

So I've been experimenting with MongoDB through Mongoose lately and I've hit a bump. I want to update and get something without specifically targeting it. Let me explain myself better.
I have this schema:
id: {
required: true,
type: String
},
information: {
number: String,
Identification: String,
title: String,
address: String
},
products: {
}
Of course I won't hardcode every product into the schema, because there are a lot of products. What I eventually want to do is something like doc.updateOne({'products.productIDHere.review': newReviewData}, { new: true, upsert: true, setDefaultsOnInsert: true })
That way, whenever a client changes their review, rating, etc., it will be updated.
Here are my questions:
1- How do I insert the products individually without overwriting everything within products:{}.
2- How do I update the review or rating value within a certain product.
3- How do I get information about that product because I cannot do something like doc.products.product.id.review, product.id is the only information I have about the product.
4- Do I need to change something about the schema?
Please try to answer with Mongoose, as some things are done differently in MongoDB than in Mongoose. No problem if you'd rather answer in MongoDB terms, though.
This is a time-honored data design: products and reviews. A good, simple, scalable way to approach it is with two collections: product and reviews. The product collection contains all details about a product and carries the product ID (pid):
{pid: "ABC123", name: "TV", manu: "Sony", ...}
{pid: "G765", name: "Fridge", manu: "Whirlpool", ...}
The reviews collection is an ever-growing list of pid, timestamp, and review information.
{pid: "G765", ts: ISODate("2020-03-04"), author: "A1", review: "Great", rating: 4}
{pid: "G765", ts: ISODate("2020-03-05"), author: "A2", review: "Good", rating: 3}
{pid: "G765", ts: ISODate("2020-03-06"), author: "A3", review: "Awesome", rating: 5}
If you're thinking this sounds very relational, that's because it is and it is a good design pattern.
It answers the OP's questions easily:
1- How do I insert the products individually without overwriting everything within products:{}. ANSWER: You simply add a new product doc with a new pid to the product collection.
2- How do I update the review or rating value within a certain product. ANSWER: I'm not sure you want to do that; you probably want to accumulate reviews over time. But since each review is a separate doc (with its own _id), you can easily do this:
db.reviews.updateOne({_id: targetID}, {$set: {review: "new text"}});
3- How do I get information about that product because I cannot do something like doc.products.product.id.review, product.id is the only information I have about the product.
ANSWER: Easy:
db.product.find({pid:"ABC123"})
or
db.product.find({name:"TV"})
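The two-collection design can be sketched in memory to show why each question becomes simple: adding a product never touches other products, a review is edited through its own `_id`, and a product's reviews are found by `pid`. Plain arrays stand in for the `product` and `reviews` collections; the helper names and the sequential `_id` values are hypothetical (in the shell the equivalents would be `insertOne`, `updateOne` and `find`).

```javascript
// Plain arrays stand in for the `product` and `reviews` collections.
const product = [];
const reviews = [];
let nextId = 1; // hypothetical stand-in for ObjectId generation

// Q1: adding a product appends a document; existing products are untouched.
function insertProduct(doc) {
  product.push(doc);
}

// Each review is its own document carrying the product's pid.
function addReview(pid, author, review, rating) {
  const doc = { _id: nextId++, pid, ts: new Date(), author, review, rating };
  reviews.push(doc);
  return doc._id;
}

// Q2: mirrors db.reviews.updateOne({ _id: id }, { $set: { review: text } }).
function updateReviewText(id, text) {
  const doc = reviews.find((r) => r._id === id);
  if (doc) doc.review = text;
  return doc !== undefined;
}

// Q3: mirrors db.product.findOne({ pid }) plus db.reviews.find({ pid }).
function productInfo(pid) {
  const info = product.find((p) => p.pid === pid) || null;
  return { info, reviews: reviews.filter((r) => r.pid === pid) };
}

insertProduct({ pid: "ABC123", name: "TV", manu: "Sony" });
insertProduct({ pid: "G765", name: "Fridge", manu: "Whirlpool" });
const firstReview = addReview("G765", "A1", "Great", 4);
addReview("G765", "A2", "Good", 3);
addReview("G765", "A3", "Awesome", 5);
updateReviewText(firstReview, "Great, still working");
console.log(productInfo("G765").reviews.length); // 3
```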

Sub-structures vs Flat Data-Structure in MongoDB - NoSQL

I am trying to understand how best to structure a MongoDB schema, and I am therefore looking for guidance, especially on using substructures (embedded documents) vs. a flat data structure.
Let's imagine we want to store a User account within MongoDB. The user has only one address, therefore we could choose one of the two following structures:
{
_id: String,
username: String,
firstname: String,
surname: String,
email: String,
street: String,
city: String,
zip: Number,
}
or
{
_id: String,
name: {
first: String,
last: String,
},
email: String,
address: {
street: String,
city: String,
zip: Number,
}
}
What are the advantages / disadvantages of each of the structures. Is there a rule when to use substructures or when to use a flat structure? What is the reasoning for one against the other?
Thank you in advance!
There are various data modeling patterns and schema designs in MongoDB. I will share my experience: the problems I have faced and the benefits of the different schemas. We will discuss them one by one below:
Embedded vs. flat data structure: there is not much difference between the two patterns, but with the embedded form you group related data together, which makes your queries a little simpler and shorter when you $project data from the collection.
For example, if you want to fetch the complete address, with an embedded document you don't need to $project the address fields individually, and if you want to skip the address when fetching a document, you don't need to exclude each address field individually.
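The projection point can be illustrated with a tiny in-memory stand-in for a projection document. It is a sketch, not MongoDB's implementation: `project` is a hypothetical helper that handles only top-level keys, and the sample user values are made up. With the embedded layout the whole address is one key; with the flat layout every address field must be listed.

```javascript
// Hypothetical stand-in for a top-level projection: values are 1 (include) or 0 (exclude).
function project(doc, spec) {
  const including = Object.values(spec).some((v) => v === 1);
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    const keep = including
      ? spec[key] === 1 || (key === "_id" && spec._id !== 0) // _id is kept unless excluded
      : spec[key] !== 0; // exclusion mode: drop only the listed keys
    if (keep) out[key] = value;
  }
  return out;
}

const flatUser = { _id: "u1", email: "a@example.com", street: "Main St", city: "Springfield", zip: 12345 };
const nestedUser = {
  _id: "u1",
  email: "a@example.com",
  address: { street: "Main St", city: "Springfield", zip: 12345 },
};

// Skipping the address: three keys for the flat form, one for the embedded form.
const flatNoAddr = project(flatUser, { street: 0, city: 0, zip: 0 });
const nestedNoAddr = project(nestedUser, { address: 0 });

// Fetching only the address.
const flatAddr = project(flatUser, { street: 1, city: 1, zip: 1 });
const nestedAddr = project(nestedUser, { address: 1 });
```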
Embedded (one-to-one) vs. embedded (one-to-many): we have discussed the benefits of an embedded document over a flat data structure, but if our users can have more than one address, we need to use embedded documents with a one-to-many relationship.
The schemas for one-to-one and one-to-many relationships are as follows:
One To One Relation schema:
{
_id: String,
name: {
first: String,
last: String,
},
email: String,
address: {
street: String,
city: String,
zip: Number,
}
}
One To Many Relationship schema:
{
_id: String,
name: {
first: String,
last: String,
},
email: String,
address: [{ // Embedded address doc with one to many relationship
street: String,
city: String,
zip: Number,
}]
}
With a one-to-one relationship your queries are not affected much, but with a one-to-many relationship several things change conceptually in how you query.
For example, we mainly face different scenarios when updating these two kinds of data structures, so I will show the difference between the update queries.
To update data embedded with a one-to-one relationship, you can simply use dot notation:
db.collection.updateOne(
{ _id: 'anyId' },
{ $set: { "address.street": "abc" } }
)
To update data embedded with a one-to-many relationship, you need to use positional operators. There are two different cases: first, updating the specific array element that matched the query, and second, updating all elements of the array.
Case 1 query (using the positional $ operator):
db.collection.updateOne(
{ 'address.street': 'abc' },
{ $set: { "address.$.street": "xyz" } }
)
Case 2 query (using the all-positional $[] operator):
db.collection.updateOne(
{ 'address.street': 'abc' },
{ $set: { "address.$[].street": "xyz" } }
)
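The difference between the two update operators can be imitated on a plain array: `$` touches only the first element that matched the query filter, while `$[]` touches every element. This is a plain-JavaScript sketch of the semantics, not how the server implements them; `setFirstMatchingStreet`, `setAllStreets` and the sample addresses are hypothetical.

```javascript
// Imitates filter { "address.street": match } with { $set: { "address.$.street": value } }:
// only the FIRST array element that matched the filter is updated.
function setFirstMatchingStreet(doc, match, value) {
  const i = doc.address.findIndex((a) => a.street === match);
  if (i !== -1) doc.address[i].street = value;
  return doc;
}

// Imitates { $set: { "address.$[].street": value } }: EVERY element is updated.
function setAllStreets(doc, value) {
  doc.address.forEach((a) => {
    a.street = value;
  });
  return doc;
}

// Fresh sample document for each run; two elements share the street "abc".
const makeUser = () => ({
  _id: "anyId",
  address: [
    { street: "abc", city: "X", zip: 1 },
    { street: "def", city: "Y", zip: 2 },
    { street: "abc", city: "Z", zip: 3 },
  ],
});

const one = setFirstMatchingStreet(makeUser(), "abc", "xyz");
const all = setAllStreets(makeUser(), "xyz");
console.log(one.address.map((a) => a.street)); // [ 'xyz', 'def', 'abc' ]
console.log(all.address.map((a) => a.street)); // [ 'xyz', 'xyz', 'xyz' ]
```

If only the matching elements (rather than all of them) should change, MongoDB 3.6 and later also provide the filtered positional operator `$[<identifier>]` together with the `arrayFilters` option.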

Mongo “manual reference” performance compare to traditional DB’s “table joining”

According to the official documentation, the "manual reference" approach is usually preferred, and some experienced users even suggest never using DBRef. So I am seriously concerned about the performance penalty of running two queries when I want to query entities with a related collection, especially compared with a traditional relational DB, where we can retrieve the expected result in one query using table joins.
Denormalized example:
db.blogs.insertOne({
_id: 1,
title: "Investigation on MongoDB",
content: "some investigation contents",
post_date: Date.now(),
permalink: "http://foo.bar/investigation_on_mongodb",
comments: [
{ content: "Gorgeous post!!!", nickname: "Scott", email: "foo@bar.org", timestamp: "1377742184305" },
{ content: "Splendid article!!!", nickname: "Guthrie", email: "foo@bar.org", timestamp: "1377742184305" }
]
})
We can simply use db.blogs.find() to get everything we want: blog posts together with the comments that belong to them.
Normalized example:
db.books.insertOne({
  _id: 1,
  name: "MongoDB Applied Design Patterns",
  price: 35,
  rate: 5,
  author: "Rick Copeland",
  ISBN: "1449340040",
  publisher_id: 1,
  reviews: [
    { isUseful: true, content: "Cool book!", reviewer: "Dick", timestamp: "1377742184305" },
    { isUseful: true, content: "Cool book!", reviewer: "Xiaoshen", timestamp: "1377742184305" }
  ]
});

db.publishers.insertOne({
  _id: 1,
  name: "Packtpub INC",
  address: "2nd Floor, Livery Place 35 Livery Street Birmingham",
  telephone: "+44 0121 265 6484"
});
Now, if I want the complete information about a single book, I have to query twice manually, similar to this:
> var book = db.books.findOne({ "name": { $regex: 'mongo*', $options: 'i' } })
> db.publishers.findOne({ _id: book.publisher_id })
What I know is that these operations are processed by MongoDB "in memory", but here is my question in summary:
In simple words: a document-oriented database advocates denormalizing data so that results can be retrieved with one query; however, when we have to store relational data, it suggests using "manual references", which means two queries, whereas a relational DB needs only one query using a table join.
This makes no sense for me:)
A relational database also performs a JOIN by querying both tables. But it has the advantage that it can do this internally and doesn't have to communicate with the client to do this. It can query the 2nd table immediately.
MongoDB first needs to send the results of the first query to the client before the application can formulate and send the 2nd query back to the database. The time lost by this is:
Network latency between database server and application server (a couple of ms)
Interpreting the response on the application server and generating an $in query from it (a couple of µs)
Network latency between application server and database server (a couple of ms)
Depending on how well the application server and the database server are interconnected, we are talking about a penalty of a few ms here.
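The application-side step in the middle of that list can be sketched as follows: take the first query's results, collect the distinct `publisher_id` values into an `$in` filter, and stitch the second query's results back on. Arrays stand in for the two collections, the second book is hypothetical, and `buildPublisherFilter`/`attachPublishers` are hypothetical helpers; against a real server the filter would be passed to `db.publishers.find(...)`, which is where the extra round trip comes from.

```javascript
const books = [
  { _id: 1, name: "MongoDB Applied Design Patterns", publisher_id: 1 },
  { _id: 2, name: "Hypothetical Second Book", publisher_id: 1 },
];
const publishers = [{ _id: 1, name: "Packtpub INC" }];

// Client side: turn the first result set into an $in filter (duplicates removed).
function buildPublisherFilter(bookDocs) {
  const ids = [...new Set(bookDocs.map((b) => b.publisher_id))];
  return { _id: { $in: ids } };
}

// Stands in for db.publishers.find(filter) followed by merging in application code.
function attachPublishers(bookDocs, publisherDocs, filter) {
  const wanted = new Set(filter._id.$in);
  const byId = new Map(
    publisherDocs.filter((p) => wanted.has(p._id)).map((p) => [p._id, p])
  );
  return bookDocs.map((b) => ({ ...b, publisher: byId.get(b.publisher_id) || null }));
}

const publisherFilter = buildPublisherFilter(books);
const booksWithPublishers = attachPublishers(books, publishers, publisherFilter);
console.log(JSON.stringify(publisherFilter)); // {"_id":{"$in":[1]}}
```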

MongoDB design/refactoring

I have a project that has (ta-daaa) scope-crept on me.
What started as a simple app to track calibrated tools (each tool has a yearly rotation cycle to check calibration) has turned into inventory tracking too.
So my current model has some required fields and an embedded doc of calibrations:
{
_id: ObjectId("51b0d94c3f72fb89c9000014"),
barcode: "H-131887",
calibrations: [
{
_id: ObjectId("51b0d94c3f72fb89c9000015"),
cal_date: ISODate("2013-07-03T16:04:57.893Z"),
cal_date_due: ISODate("2013-07-03T16:04:57.894Z"),
ats_in: ISODate("2013-06-01T16:04:57.895Z"),
ats_out: ISODate("2013-06-06T16:04:57.897Z")
},
{
_id: ObjectId("51b0e6053f72fbb27900001b"),
cal_date: ISODate("2013-06-13T00:00:00Z"),
cal_date_due: ISODate("2014-06-13T00:00:00Z"),
ats_in: ISODate("2013-06-06T00:00:00Z"),
ats_out: ISODate("2013-06-17T00:00:00Z"),
updated_at: ISODate("2013-07-09T14:44:31.113Z"),
created_at: ISODate("2013-06-06T19:41:57.770Z")
}
],
created_at: ISODate("2013-06-06T18:47:40.481Z"),
creator_id: ObjectId("5170547c791e4b1a16000001"),
description: "",
group: "engine",
location: "Cabinet 1",
maker: "MITUTOYO",
model: "2046S",
serial: "QEL228",
status: "In",
tool: "Dial Indicator",
updated_at: ISODate("2013-07-09T14:44:31.103Z")
}
What would be the best way to allow non-calibrated tools in this schema where Barcode/Serial are not required for those tools? Also, they won't have calibration dates, so my current table that lists the tool and most recent calibration date won't be happy returning nil calibrations...
It is unlikely that you will need to refactor your database schema.
MongoDB is designed to work with heterogeneous data. That means not all documents in the same collection need to have the same fields. It is no problem at all for MongoDB when some documents have fields, and even subdocuments, holding calibration information while others don't.
When you have a find query that is not supposed to return documents without calibration information, you can just add the condition calibrations: { $exists: true }, which returns only documents where the calibrations field exists. But even a query like find({"calibrations.cal_date_due": {$lt: ISODate()}}) will not choke on documents that don't have a calibrations field (and thus no calibrations.cal_date_due either). It will just skip those documents silently.
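For the asker's listing table, tools without calibrations only need a null-safe accessor. A plain-JavaScript sketch, assuming documents shaped like the one in the question where `calibrations` may be missing entirely; `hasCalibrations` and `latestCalDate` are hypothetical helpers that mirror, in memory, the `{ calibrations: { $exists: true } }` filter and a "most recent cal_date" lookup, and the second tool is made up.

```javascript
const tools = [
  {
    barcode: "H-131887",
    tool: "Dial Indicator",
    calibrations: [
      { cal_date: new Date("2013-07-03T16:04:57.893Z") },
      { cal_date: new Date("2013-06-13T00:00:00Z") },
    ],
  },
  { tool: "Hammer", location: "Cabinet 2" }, // hypothetical non-calibrated tool
];

// In-memory counterpart of { calibrations: { $exists: true } }
// ($exists also matches a field that is present but null).
const hasCalibrations = (doc) => "calibrations" in doc;

// Most recent cal_date, or null when the tool has no calibration history.
function latestCalDate(doc) {
  if (!Array.isArray(doc.calibrations) || doc.calibrations.length === 0) return null;
  return doc.calibrations.reduce(
    (latest, c) => (c.cal_date > latest ? c.cal_date : latest),
    doc.calibrations[0].cal_date
  );
}

// One row per tool for the listing table; non-calibrated tools get lastCal: null.
const rows = tools.map((t) => ({ tool: t.tool, lastCal: latestCalDate(t) }));
```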