Sub-structures vs Flat Data-Structure in MongoDB - NoSQL


I am trying to understand how best to structure a MongoDB schema, and I am therefore looking for guidance, especially on using substructures (embedded documents) vs. a flat data structure.
Let's imagine we want to store a User account within MongoDB. The user has only one address, so we could choose either of the following two structures:
{
_id: String,
username: String,
firstname: String,
surname: String,
email: String,
street: String,
city: String,
zip: Number,
}
or
{
_id: String,
name: {
first: String,
last: String,
},
email: String,
address: {
street: String,
city: String,
zip: Number,
}
}
What are the advantages and disadvantages of each structure? Is there a rule for when to use substructures and when to use a flat structure? What is the reasoning for choosing one over the other?
Thank you in advance!

There are various data modeling patterns and schema designs available in MongoDB. I will share the problems I have run into and the benefits of the different schemas, one by one:
Embedded vs. flat data structure: here there is not much difference between the two patterns. However, the embedded form groups related data together, which makes your queries a little simpler and shorter whenever you $project data from the collection.
For example, if you want to fetch the complete address, an embedded document means you don't need to $project each address field individually. Likewise, if you want to leave the address out when fetching a document, you don't need to exclude each address field individually.
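To make the difference concrete, here is a small in-memory sketch. It is not MongoDB's implementation: `projectTopLevel` is a hypothetical helper that handles only top-level inclusion. It compares the projection you would pass to something like `db.users.find({}, spec)` for each shape (the collection name `users` is also an assumption).

```javascript
// Hypothetical helper: a simplified inclusion projection on top-level fields.
// It only mimics the part of MongoDB's behaviour needed here.
function projectTopLevel(doc, spec) {
  const out = {};
  // Like MongoDB, _id is kept unless it is explicitly excluded with _id: 0.
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Sample documents (illustrative data) in the flat and the embedded shape.
const flatUser = { _id: "u1", email: "a@example.com", street: "Main St", city: "Springfield", zip: 12345 };
const embeddedUser = {
  _id: "u1",
  email: "a@example.com",
  address: { street: "Main St", city: "Springfield", zip: 12345 },
};

// Flat shape: every address field has to be listed individually.
const flatAddress = projectTopLevel(flatUser, { street: 1, city: 1, zip: 1 });
// Embedded shape: a single key returns the whole subdocument.
const embeddedAddress = projectTopLevel(embeddedUser, { address: 1 });
```

Exclusion follows the same logic in MongoDB. `{ address: 0 }` drops the embedded subdocument in one go, while the flat shape needs `{ street: 0, city: 0, zip: 0 }`.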
Embedded (one-to-one) vs. embedded (one-to-many): we discussed the benefits of embedded documents over a flat data structure. However, if our users can have more than one address, we need embedded documents with a one-to-many relationship, i.e. an array of address subdocuments.
The schemas for the one-to-one and one-to-many relationships are as follows:
One-to-one relationship schema:
{
_id: String,
name: {
first: String,
last: String,
},
email: String,
address: {
street: String,
city: String,
zip: Number,
}
}
One-to-many relationship schema:
{
_id: String,
name: {
first: String,
last: String,
},
email: String,
address: [{ // Embedded address doc with one to many relationship
street: String,
city: String,
zip: Number,
}]
}
With a one-to-one relationship your queries are hardly affected, but with a one-to-many relationship many of your queries change conceptually.
For example, updates are where the two structures differ most in practice, so I will show the difference between the update queries.
To update data embedded with a one-to-one relationship, you can simply use dot notation:
db.collection.updateOne(
{ _id: 'anyId' },
{ $set: { "address.street": "abc" } }
)
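As an in-memory sketch of what `{ $set: { "address.street": "abc" } }` does to one matched document, the hypothetical helper below walks the dotted path and assigns the value at the last segment. It deliberately handles plain embedded objects only (no arrays, no positional operators).

```javascript
// Hypothetical helper mimicking a dotted-path $set on a single document.
function setDotted(doc, path, value) {
  const keys = path.split(".");
  let target = doc;
  for (const key of keys.slice(0, -1)) {
    if (target[key] === undefined) {
      // $set creates missing embedded documents along a dotted path.
      target[key] = {};
    } else if (typeof target[key] !== "object" || target[key] === null || Array.isArray(target[key])) {
      // MongoDB refuses to create a field inside a non-object value;
      // this sketch also rejects arrays, which it does not support.
      throw new Error(`cannot create field inside non-object "${key}"`);
    }
    target = target[key];
  }
  target[keys[keys.length - 1]] = value;
  return doc;
}

const user = { _id: "anyId", address: { street: "old", city: "Springfield", zip: 12345 } };
setDotted(user, "address.street", "abc");
// Only street changes; city and zip are untouched.
```

This is the practical reason to use dot notation. By contrast, `{ $set: { address: { street: "abc" } } }` would replace the whole subdocument and lose `city` and `zip`.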
To update data embedded with a one-to-many relationship, you need a positional operator. There are two cases: updating a specific element of the array, and updating all of its elements.
Case 1 (using the positional $ operator):
db.collection.updateOne(
{ 'address.street': 'abc' },
{ $set: { "address.$.street": "xyz" } }
)
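A small in-memory sketch of the positional `$` semantics helps here. The filter on `address.street` both selects the document and identifies the first array element that matched, and `address.$.street` then updates only that element. The helper name and the sample data are hypothetical, and only equality conditions are handled.

```javascript
// Hypothetical helper mimicking "address.$.<field>" for one document:
// only the FIRST array element that satisfies the condition is updated.
function updateFirstMatch(doc, arrayField, matchField, matchValue, setField, newValue) {
  const arr = doc[arrayField];
  if (!Array.isArray(arr)) return false;
  const idx = arr.findIndex((el) => el[matchField] === matchValue);
  if (idx === -1) return false; // the document would not match the filter at all
  arr[idx][setField] = newValue;
  return true;
}

const user = {
  _id: "anyId",
  address: [
    { street: "abc", city: "Springfield", zip: 12345 },
    { street: "def", city: "Shelbyville", zip: 67890 },
    { street: "abc", city: "Capital City", zip: 11111 },
  ],
};
updateFirstMatch(user, "address", "street", "abc", "street", "xyz");
// Only address[0] changes; address[2] still has street "abc".
```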
Case 2 (using the all-positional $[] operator):
db.collection.updateOne(
{ 'address.street': 'abc' },
{ $set: { "address.$[].street": "xyz" } }
)
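For comparison, here is an in-memory sketch of `$[]` with a hypothetical helper. The update is applied to every element of the array in each matched document, regardless of which element satisfied the filter. The path needs a field after `$[]` (for example `address.$[].street`). `"address.$[]"` on its own would replace each whole element with the new value.

```javascript
// Hypothetical helper mimicking "address.$[].<field>" for one matched document:
// every element of the array gets the new value.
function updateAllElements(doc, arrayField, setField, newValue) {
  const arr = doc[arrayField];
  if (Array.isArray(arr)) {
    for (const el of arr) el[setField] = newValue;
  }
  return doc;
}

const user = {
  _id: "anyId",
  address: [
    { street: "abc", city: "Springfield" },
    { street: "def", city: "Shelbyville" },
  ],
};
updateAllElements(user, "address", "street", "xyz");
// Both elements now have street "xyz"; city is untouched.
```

To update only some of the elements, MongoDB also offers the filtered positional operator `$[<identifier>]` together with the `arrayFilters` option.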

Related

Mongoose findOne not working as expected on nested records

I've got a collection in MongoDB whose simplified version looks like this:
Dealers = [{
Id: 123,
Name: 'Someone',
Email: 'someone@somewhere.com',
Vehicles: [
{
Id: 1234,
Make: 'Honda',
Model: 'Civic'
},
{
Id: 2345,
Make: 'Ford',
Model: 'Focus'
},
{
Id: 3456,
Make: 'Ford',
Model: 'KA'
}
]
}]
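And my Mongoose model looks a bit like this:
const mongoose = require('mongoose');

const dealerSchema = new mongoose.Schema({
Id: {
type: Number
},
Email: {
type: String
},
Vehicles: [{
Id: {
type: Number
},
Make: {
type: String
},
Model: {
type: String
}
}]
});

// findOne() lives on a model, not on a schema (the model name 'Dealer' is illustrative)
const vehicle_model = mongoose.model('Dealer', dealerSchema);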
Note that the Ids are not MongoDB ObjectIds, just distinct numbers.
I tried something like this:
const response = await vehicle_model.findOne({ 'Id': 123, 'Vehicles.Id': 1234 })
But when I do:
console.log(response.Vehicles.length)
it returns all of the nested Vehicles records instead of just the one I'm after.
What am I doing wrong?
Thanks.

This question is asked very frequently. Indeed someone asked a related question here just 18 minutes before this one.
When you query the database, you are asking it to identify matching documents and return them to the client. That is an entirely separate action from asking it to transform the shape of those documents before they are sent back to the client.
In MongoDB, the latter operation (transforming the shape of the document) is usually referred to as "projection". Simple projections, specifically returning just a subset of the fields, can be done directly in find() (and similar) operations. Most drivers and the shell use the second argument to the method as the projection specification; see the documentation here.
Your particular case is a little more complicated because you want to trim some of the values out of the array. There is a dedicated page in the documentation titled Project Fields to Return from Query, which goes into more detail about different situations. Near the bottom is a section titled Project Specific Array Elements in the Returned Array, which describes your situation more directly and explains the positional $ operator. You can use that as a starting point as follows:
db.collection.find({
"Id": 123,
"Vehicles.Id": 1234
},
{
"Vehicles.$": 1
})
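As an in-memory sketch of what that projection returns, consider the hypothetical helper below. It handles only equality conditions, and the `_id` value in the sample is also hypothetical. The positional projection keeps `_id` plus the Vehicles array cut down to the first element that satisfied the "Vehicles.Id" condition. In Mongoose, the same projection object can be passed as the second argument to `findOne()`, e.g. `vehicle_model.findOne({ Id: 123, 'Vehicles.Id': 1234 }, { 'Vehicles.$': 1 })`, assuming `vehicle_model` is a compiled model.

```javascript
// Hypothetical helper mimicking { "<arrayField>.$": 1 } for one matched document:
// an inclusion projection, so only _id and the trimmed array come back.
function projectFirstMatch(doc, arrayField, matchField, matchValue) {
  const first = (doc[arrayField] || []).find((el) => el[matchField] === matchValue);
  return first === undefined ? null : { _id: doc._id, [arrayField]: [first] };
}

const dealer = {
  _id: "dealer1", // hypothetical MongoDB _id
  Id: 123,
  Name: "Someone",
  Email: "someone@somewhere.com",
  Vehicles: [
    { Id: 1234, Make: "Honda", Model: "Civic" },
    { Id: 2345, Make: "Ford", Model: "Focus" },
    { Id: 3456, Make: "Ford", Model: "KA" },
  ],
};
const result = projectFirstMatch(dealer, "Vehicles", "Id", 1234);
// result.Vehicles has a single element: the Honda Civic.
```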
If you need something more complex, you would have to start exploring the $elemMatch (projection) operator (not the query variant) or, as @nimrod serok mentions in the comments, the $filter aggregation operator in an aggregation pipeline. The last option is certainly the most expressive and flexible, but also the most verbose.
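Here is a sketch of the $filter route, assuming the field names from the question. The builder function name is hypothetical, while `$match`, `$project`, `$filter` and `$eq` are standard aggregation syntax. Unlike the positional $ projection, `$filter` keeps every matching element, not just the first.

```javascript
// Hypothetical builder for a pipeline that trims Vehicles to the elements
// whose Id equals vehicleId.
function vehiclesByIdPipeline(dealerId, vehicleId) {
  return [
    // Same conditions as the find() above, so only relevant dealers flow through.
    { $match: { Id: dealerId, "Vehicles.Id": vehicleId } },
    {
      $project: {
        Id: 1, // _id is included by default
        Vehicles: {
          $filter: {
            input: "$Vehicles",
            as: "vehicle",
            cond: { $eq: ["$$vehicle.Id", vehicleId] },
          },
        },
      },
    },
  ];
}

const pipeline = vehiclesByIdPipeline(123, 1234);
```

In Mongoose this could be run as `await vehicle_model.aggregate(pipeline)` on a compiled model, or as `db.collection.aggregate(pipeline)` in the shell. Note that `aggregate()` returns plain objects rather than Mongoose documents.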

How to remove related documents after removing by TTL?

In my MongoDB database I have three different collections: A, B and AtoB.
A and B are different types of entities, and AtoB connects them as follows:
A:
{
_id: ObjectId,
timestamp: Date,
keyA: string
}
B:
{
_id: ObjectId,
timestamp: Date,
keyB: number
}
AtoB:
{
_id: ObjectId,
aId: ObjectId, // Points to a document from A
bId: ObjectId // Points to a document from B
}
I created a TTL index on A so that documents are deleted once their timestamp is older than an hour.
Is it possible to also remove all the related AtoB documents, based on the _id of the removed A documents?
In other words, can the TTL remove not only the A documents but also the documents related to the ones that were removed?
Thanks

In a word - no, not in this set up.
The options you have:
set up a change stream worker to delete the links
set up a cron job to clean up the links collection every minute
embed the AtoB links into the A documents
I would recommend the latter, but it really depends on how feasible the change is for the rest of your application. A dedicated lookup collection is really an RDBMS practice; it has very niche use cases in the Mongo universe.
Your A documents would look like this:
{
_id: ObjectId,
timestamp: Date,
keyA: string,
bIds: [
ObjectId, // each entry is the _id of a B document
ObjectId,
....
]
}
When the document's TTL expires, the document is removed together with all of its links at once.
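For comparison, the cron-job option boils down to finding AtoB links whose aId no longer exists in A. The sketch below shows only that logic, in memory, and the function name and sample data are hypothetical. Against a real database the same idea would need a query (for example an aggregation with $lookup into A) plus a deleteMany on the resulting _ids.

```javascript
// Hypothetical helper: return the _ids of links whose aId no longer refers
// to an existing A document (e.g. because the TTL monitor removed it).
function findOrphanLinkIds(aDocs, links) {
  // String() so that ObjectId-like values compare by value, not by reference.
  const liveAIds = new Set(aDocs.map((a) => String(a._id)));
  return links.filter((link) => !liveAIds.has(String(link.aId))).map((link) => link._id);
}

const A = [{ _id: "a1" }, { _id: "a3" }]; // "a2" has already expired
const AtoB = [
  { _id: "l1", aId: "a1", bId: "b1" },
  { _id: "l2", aId: "a2", bId: "b1" },
  { _id: "l3", aId: "a3", bId: "b2" },
];
const orphans = findOrphanLinkIds(A, AtoB); // ["l2"]
```

The embedded design avoids this extra moving part entirely, which is why it is the recommendation above.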

Sorting nested objects in MongoDB

So I have documents that follow this schema:
{
_id: String,
globalXP: {
xp: {
type: Number,
default: 0
},
level: {
type: Number,
default: 0
}
},
guilds: [{ _id: String, xp: Number, level: Number }]
}
So basically, users have their own global XP plus XP for each guild they are in.
Now I want to make a leaderboard of all the users that have a certain guild ID in their document.
What's the most efficient way to fetch all the user documents that have the guild _id in their guilds array, and how do I sort them afterwards?
I know it might be messy as hell, but bear with me here.

If I've understood correctly, you only need this line of code:
var find = await model.find({"guilds._id":"your_guild_id"}).sort({"globalXP.level":-1})
This query returns all documents whose guilds array contains the specified _id, sorted by player level,
so the highest level is displayed first.
Here is an example of how the query works. Please check whether it works as you expect.
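Here is an in-memory sketch of what that query does, using a hypothetical helper and sample data. It keeps the users whose guilds array has an element with the given _id, then orders them by globalXP.level, highest first.

```javascript
// Hypothetical helper mimicking
// find({ "guilds._id": guildId }).sort({ "globalXP.level": -1 })
function guildLeaderboard(users, guildId) {
  return users
    .filter((u) => u.guilds.some((g) => g._id === guildId)) // array-contains match
    .sort((a, b) => b.globalXP.level - a.globalXP.level); // descending by global level
}

const users = [
  { _id: "u1", globalXP: { xp: 500, level: 5 }, guilds: [{ _id: "g1", xp: 120, level: 2 }] },
  { _id: "u2", globalXP: { xp: 900, level: 9 }, guilds: [{ _id: "g2", xp: 300, level: 4 }] },
  { _id: "u3", globalXP: { xp: 700, level: 7 }, guilds: [{ _id: "g1", xp: 50, level: 1 }, { _id: "g2", xp: 10, level: 1 }] },
];
const board = guildLeaderboard(users, "g1").map((u) => u._id); // ["u3", "u1"]
```

This ranks by global level. If the leaderboard should instead rank by the level earned in that specific guild, note a limitation of a plain sort: `.sort({ "guilds.level": -1 })` is not restricted to the guild element that matched the filter. An aggregation with $unwind, $match and $sort is one way to handle that case.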

MongoDB multiple schemas in one collection

I am new to Mongo and have done a lot of reading and a proof of concept. There are many discussions about multiple collections vs. embedded documents. Isn't there another choice? Ignoring my relational-DB mind... couldn't you put two different schemas in the same collection?
Crude example:
{
_id: 'f48a2dea-e6ec-490d-862a-bd1791e76d9e',
_owner: '7a147aad-e3fd-4e55-9fd5-e2cb48d31a83',
manufacturer: 'Porsche',
model: '911',
img: '<byte array>'
},{
_id: '821ca9b7-faa1-4516-a27e-aec79fcb89a9',
_owner: '46ade116-cd59-4d0c-a4d3-cd2e517a256c',
manufacturer: 'Nissan',
model: 'GT-R',
img: '<byte array>'
},{
_id: '87999e27-c98b-4cad-b444-75626f161840',
_owner: 'fba765c8-32dd-49ba-91d3-d361b40bf4a7',
manufacturer: 'BMW',
model: 'M3',
wiki:'http://en.wikipedia.org/wiki/Bmw_m3',
img: '<byte array>'
}
and a totally different schema in the same collection as well:
{
_id: '7a147aad-e3fd-4e55-9fd5-e2cb48d31a83',
name: 'Keeley Bosco',
email: 'katlyn@jenkinsmaggio.net',
city: 'Lake Gladysberg',
mac: '08:fd:0b:cd:77:f7',
timestamp: '2015-04-25 13:57:36 +0700',
},{
_id: '46ade116-cd59-4d0c-a4d3-cd2e517a256c',
name: 'Rubye Jerde',
email: 'juvenal@johnston.name',
city: null,
mac: '90:4d:fa:42:63:a2',
timestamp: '2015-04-25 09:02:04 +0700',
},{
_id: 'fba765c8-32dd-49ba-91d3-d361b40bf4a7',
name: 'Miss Darian Breitenberg',
email: null,
city: null,
mac: 'f9:0e:d3:40:cb:e9',
timestamp: '2015-04-25 13:16:03 +0700',
}
(The reason I don't use an embedded document in my real POC is that a person may have 80,000 "cars", which would exceed the 16 MB document limit.)
Besides the aching desire to compartmentalize data, is there a downside here?
The reason for doing this may be so that we can correlate the records... I do see that 3.2 has a join. The project is too new to know all of the business cases.

Although MongoDB supports different schemas within the same collection, as a good practice it is better to stick to one schema (or very similar schemas) throughout a collection, so your application logic stays simpler.
In your case, yes, it is good that you didn't use an embedded document, considering the size of the subdocuments. However, I would suggest a normalized data model, which is not really a bad choice in this kind of situation.
For further reading, see: https://docs.mongodb.com/master/core/data-model-design/
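Here is a sketch of the normalized model suggested above. It assumes cars and people live in two separate collections, named `cars` and `people` here (both names are hypothetical), linked by the car's `_owner` field. Since 3.2, a $lookup stage can attach each car's owner. Joining in this direction (car to owner) keeps each output document small. Joining a person to all of their cars would build one array that could again run into the 16 MB limit for someone with 80,000 cars.

```javascript
// Hypothetical builder for a pipeline run on the "cars" collection.
function carsWithOwnerPipeline(manufacturer) {
  return [
    { $match: { manufacturer } },
    // Each car has a single owner, so "owner" is an array of at most one element.
    { $lookup: { from: "people", localField: "_owner", foreignField: "_id", as: "owner" } },
    // Turn the one-element array into an embedded object
    // (cars without a matching owner are dropped by default).
    { $unwind: "$owner" },
    // Leave out the byte arrays so they are not shipped with every result.
    { $project: { img: 0 } },
  ];
}

const pipeline = carsWithOwnerPipeline("Porsche");
```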

MongoDB: Is a range query possible using multikeys?

var jd = {
type: "Person",
attributes: {
name: "John Doe",
age: 30
}
};
var pd = {
type: "Person",
attributes: {
name: "Penelope Doe",
age: 26
}
};
var ss = {
type: "Book",
attributes: {
name: "The Sword Of Shannara",
author: "Terry Brooks"
}
};
db.things.save(jd);
db.things.save(pd);
db.things.save(ss);
db.things.ensureIndex({attributes: 1})
db.things.find({"attributes.age": 30}) // => John Doe
db.things.find({"attributes.age": 30}).explain() // => BasicCursor... (don't want a scan)
db.things.find({"attributes.age": {$gte: 18}}) // John Doe, Penelope Doe (via a scan)
The goal is that all attributes be indexed and searchable via range queries and that the index actually be used (as opposed to a collection scan). There's no telling what attributes a document will have. I have read about multikeys but they seem only to work (by index) with exact-match queries.
Multikeys prefer this format for a document:
var pd = {
type: "Person",
attributes: [
{name: "Penelope Doe"},
{age: 26}
]
};
Is there a pattern whereby a single index lets me find items by attribute using a range?
EDIT:
In a schemaless DB it makes sense to have potentially a limitless array of types, yet a collection name practically implies some sort of type. But if we go to the extreme, we want to allow for any number of types within a collection (so that we don't have to define a collection for every conceivable custom type a user might imagine). Searching, therefore, by attributes (of any sort) with just a single deep index (that supports ranged queries) makes this sort of thing far more feasible. Seems to me a natural fit for a schemaless DB.
I opened a ticket if you want to vote it up:
http://jira.mongodb.org/browse/SERVER-2675

Yes, range queries work with multikeys. However, multikeys are for arrays rather than embedded objects.
In the example above, try:
db.things.ensureIndex({"attributes.age": 1})

Range queries are possible using multikeys; however, expressing the query can be tricky.
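One way to express such a query is the key/value layout often called the attribute pattern. Store the attributes as an array of `{ k, v }` pairs, so that one compound multikey index such as `db.things.createIndex({ "attributes.k": 1, "attributes.v": 1 })` covers every attribute. Then query with `$elemMatch`, which that index should be able to serve for range conditions. The helper names below are hypothetical; the query shape uses the standard `$elemMatch` and `$gte` operators.

```javascript
// Hypothetical helper: convert { name: ..., age: ... } into [{ k, v }, ...].
function toAttributeArray(attributes) {
  return Object.entries(attributes).map(([k, v]) => ({ k, v }));
}

// Hypothetical helper: $elemMatch makes the k and v conditions apply to the SAME
// array element. Without it, { "attributes.k": "age", "attributes.v": { $gte: 18 } }
// could match a document where "age" and some other numeric value >= 18 sit in
// different elements.
function attributeRangeQuery(key, min) {
  return { attributes: { $elemMatch: { k: key, v: { $gte: min } } } };
}

const pd = { type: "Person", attributes: toAttributeArray({ name: "Penelope Doe", age: 26 }) };
const query = attributeRangeQuery("age", 18); // usable as db.things.find(query)
```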
