Performance difference between storing the asset as subdocument vs single document in Mongoose - mongodb

I have an API for synchronizing contacts from the user's phone to our database. The controller essentially iterates over the data sent in the request body and, if an item passes validation, saves a new contact:
const contact = new Contact({ phoneNumber, name, surname, owner });
await contact.save();
With a DB limited to 100 IOPS and an average of around 300 contacts per user, this API takes a long time when the server is busy.
Since the frontend client needs a contact ID for other operations (edit, delete), I was thinking about changing the data structure to subdocuments: instead of saving each Contact as a separate document, the idea is to save one document with many contacts inside:
const userContacts = new mongoose.Schema({
owner: mongoose.Schema.Types.ObjectId, // the id of the contacts' owner
contacts: [new mongoose.Schema({
name: { type: String },
phone: { type: String }
})]
});
This way I have to do just one save. But since Mongo has to generate an ID for each subdocument, is this really that much faster than the original approach?

Summary
This really depends on your exact usage scenarios:
are contacts often updated?
what is the maximum / average number of contacts per user?
are they ever partially loaded, or are they always fetched all together?
But for a fairly common collection such as contacts, I would not recommend storing them in subdocuments.
Instead you should be able to use insertMany for your initial sync scenario.
Explanation
Storing contacts as subdocuments makes the bulk write easier, but it will make querying and updating contacts slower and more awkward than storing them as regular documents.
For example, if I have 100 contacts and I want to view and edit 1 of them, I need to load all 100 contacts. I can make the change via a partial update using $set with the positional $ operator, so the update itself will be OK. But when I add a new contact, I will have to add a new contact subdocument to the Contacts document. This makes it a growing document, meaning your database will suffer from fragmentation, which can slow things down a lot (see this answer).
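To make the update path concrete, here is a minimal sketch assuming the userContacts schema from the question (an `owner` field plus a `contacts` array whose elements get an `_id`). It only builds the plain filter/update objects you would pass to something like `UserContacts.updateOne(filter, update)`; the model name `UserContacts` and both helper names are hypothetical.

```javascript
// Hypothetical helpers that build MongoDB filter/update documents for the
// embedded design (one document per owner, contacts in a `contacts` array).

// Edit ONE existing contact: the filter matches the array element by _id,
// and the positional `$` in the update path refers to that matched element.
function buildEditContact(ownerId, contactId, changes) {
  const set = {};
  for (const [field, value] of Object.entries(changes)) {
    set[`contacts.$.${field}`] = value; // e.g. "contacts.$.name"
  }
  return {
    filter: { owner: ownerId, "contacts._id": contactId },
    update: { $set: set },
  };
}

// Add a NEW contact: $push appends to the array, so the owner's single
// document grows every time a contact is added.
function buildAddContact(ownerId, contact) {
  return {
    filter: { owner: ownerId },
    update: { $push: { contacts: contact } },
  };
}

const edit = buildEditContact("owner-1", "contact-42", { name: "Ann" });
console.log(JSON.stringify(edit));
// With separate Contact documents, the same edit is simply:
//   Contact.updateOne({ _id: contactId, owner: ownerId }, { $set: changes })
```

The edit itself stays a small partial update; it is the `$push` path that keeps enlarging one document.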
You will have to use the positional $ projection, or an aggregation with $unwind, to search through contacts in MongoDB. If you want to apply a specific sort order, this too would have to be done via aggregate or in code.
Matching via the positional $ projection also returns only the first matching contact, which can make duplicate contacts difficult to find.
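A sketch of the search side under the same schema assumptions: an aggregation pipeline (built as a plain array) that unwinds one owner's contacts, filters them by a name prefix and sorts them. The helper names are hypothetical. Note that Mongoose does not cast values inside `Model.aggregate()` pipelines, so `ownerId` would have to be a real ObjectId there.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: search one owner's embedded contacts by name prefix.
function buildContactSearchPipeline(ownerId, namePrefix) {
  return [
    { $match: { owner: ownerId } },              // the owner's single document
    { $unwind: "$contacts" },                     // one output document per contact
    { $match: { "contacts.name": { $regex: "^" + escapeRegex(namePrefix) } } },
    { $sort: { "contacts.name": 1 } },            // sorting must also happen in the pipeline
    { $replaceRoot: { newRoot: "$contacts" } },   // return the contacts themselves
  ];
}

console.log(JSON.stringify(buildContactSearchPipeline("owner-1", "An")));
```

With separate Contact documents the same search is a plain `Contact.find({ owner, name: /^An/ }).sort({ name: 1 })`.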
And this won't scale. What if you get users with 1000s of contacts later? Then this single document will grow large and querying it will become very slow.
Alternatives
If your contacts for sync are in the 100s, you might get away with splitting them into groups of ~50-100 and calling insertMany for each batch.
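A minimal sketch of that batching, assuming a Mongoose `Contact` model like the one in the question. `insertMany` is passed in as a function so the sketch runs without a database; in the app it could be `(docs) => Contact.insertMany(docs)`, which resolves to the inserted documents including their generated `_id` values (the IDs the client needs for edit/delete). The helper names and the default batch size of 100 are assumptions.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Insert contacts batch by batch: one round trip per batch instead of
// one save() per contact. Returns all inserted documents in order.
async function syncContacts(contacts, insertMany, batchSize = 100) {
  const inserted = [];
  for (const batch of chunk(contacts, batchSize)) {
    const docs = await insertMany(batch);
    inserted.push(...docs);
  }
  return inserted;
}
```

For 300 contacts and a batch size of 100 this means 3 insertMany calls instead of 300 separate saves.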
If they grow into the thousands, then I would suggest uploading all contacts, saving them as JSON / CSV files to disk, then slowly processing these in the background in batches.
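A rough sketch of that two-step flow using only Node's built-in `fs` module. The file naming, the JSON layout and the function names are assumptions for illustration, `insertMany` is again injected (e.g. `(docs) => Contact.insertMany(docs)`), and how the worker is scheduled (cron, a queue, etc.) is left out.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Request handler side: persist the uploaded contacts and return quickly.
async function enqueueUpload(ownerId, contacts, dir = os.tmpdir()) {
  const file = path.join(dir, `contacts-${ownerId}-${Date.now()}.json`);
  await fs.writeFile(file, JSON.stringify({ ownerId, contacts }));
  return file;
}

// Background worker side: read one upload file and insert it batch by batch.
async function processUpload(file, insertMany, batchSize = 100) {
  const { ownerId, contacts } = JSON.parse(await fs.readFile(file, "utf8"));
  let inserted = 0;
  for (let i = 0; i < contacts.length; i += batchSize) {
    const batch = contacts
      .slice(i, i + batchSize)
      .map((c) => ({ ...c, owner: ownerId })); // attach the owner to each contact
    await insertMany(batch); // one DB round trip per batch
    inserted += batch.length;
  }
  await fs.unlink(file); // processed: remove the file so it is not picked up again
  return inserted;
}
```

Because the contacts are saved later, the client would not receive their IDs in the upload response and would need to fetch them once processing is done.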

Related

Query documents in one collection that aren't referenced in another collection with Firestore

I have a firestore DB where I'm storing polls in one collection and responses to polls in another collection. I want to get a document from the poll collection that isn't referenced in the responses collection for a particular user.
The naive approach would be to get all of the poll documents and all of the responses filtered by user ID then filter the polls on the client side. The problem is that there may be quite a few polls and responses so those queries would have to pull down a lot of data.
So my question is, is there a way to structure my data so that I can query for polls that haven't been completed by a user without having to pull down the collections in their entirety? Or more generally, is there some pattern to use when you need to query for documents in one collection that aren't referenced by another?
The documents in each of the collections look something like this:
Polls:
{
question: string;
answers: Answer[];
}
Responses:
{
userId: string;
pollId: string;
answerId: string;
}
Any help would be much appreciated!
Queries in Firestore can only return documents from one collection (or from all collections with the same name) and can only contain conditions on the data that they actually return.
Since there's no way to filter based on a condition in some other documents, you'll need to include the information that you want to filter on in the polls documents.
For example, you could include a completionCount field in each poll document that you initially set to 0 and then increment on every poll completion. With that in place, finding polls that nobody has completed becomes a simple query on the completionCount field of the polls collection.
For a specific user, I'd actually add all polls to their profile document and remove each poll from there once the user completes it. Duplicating data is usually the easiest (and sometimes only) way to implement use-cases such as this.
If you're worried about having to add each new poll to every user profile when the poll is created, you can instead query the polls by their creation timestamp the next time you load a user profile, and perform the sync at that moment:
load user profile,
check when they were last active,
query for new polls,
add them to user profile.
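The steps above could look roughly like this. It is an in-memory sketch: the profile fields `pendingPollIds` and `lastActiveAt` and the poll field `createdAt` are hypothetical names. In Firestore, the filtering step would be a query such as `where("createdAt", ">", lastActiveAt)` on the polls collection, and the removal an `arrayRemove` update on the profile document.

```javascript
// In-memory sketch; pendingPollIds, lastActiveAt and createdAt are hypothetical fields.

// On profile load: add polls created since the user was last active.
function syncPendingPolls(profile, allPolls, now) {
  // In Firestore this would be a query on the polls collection,
  // not a client-side scan over all polls.
  const newPolls = allPolls.filter((poll) => poll.createdAt > profile.lastActiveAt);
  const pending = new Set(profile.pendingPollIds); // Set avoids duplicate ids
  for (const poll of newPolls) pending.add(poll.id);
  return { ...profile, pendingPollIds: [...pending], lastActiveAt: now };
}

// When the user answers a poll, drop it from their pending list.
function completePoll(profile, pollId) {
  return {
    ...profile,
    pendingPollIds: profile.pendingPollIds.filter((id) => id !== pollId),
  };
}
```

"Which polls hasn't this user completed?" then becomes a single read of the user's profile document.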

Embedded and references in a data model - mongodb

I want to create a MongoDB database and use an embedded structure. For example, suppose each document in the persons collection is already 16MB. That means I cannot add the contacts sub-document to the person documents.
1- In this case, what should I do?
2- If I create a separate contacts collection, each contact will have to reference a person.
Can we have both embedded and referenced structures in a MongoDB database?
Thank you.
{
nom:'Kox',
prenom:'Karl',
gender:'M',
addres:
{
rue: '123 Fake Street',
appt:108,
city:'mycity',
zip_code:'GGG23'
},
class:
{
name:'CLASS ONE',
group:'C',
section:'SECTION ONE'
}
}
One of the strengths of MongoDB is the flexible schema.
You can certainly have contacts embedded for some person documents, referenced for others, or a single person document that has some of its contacts embedded and some referenced.
One possible use of this is to have recently or frequently used contacts embedded for quick access (similar to a per-person cache) and all contacts available for lookup via reference.
The natural extension of this is that if a person's entire contact list fits within the person document, there is never a need to do a separate contact lookup for that person.
The tradeoff is:
A referenced approach allows contact lists to be arbitrarily large, but requires a separate contact lookup aside from the person lookup.
An embedded approach reduces the load on the database server by requiring only 1 lookup for both person and contacts, but limits the size of the person + contact list to 16MB.
A hybrid embedded/referenced approach requires a bit more complexity in the application code, but provides a reduction in query load on the database server while still allowing a contact list to be extremely large.
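A small in-memory sketch of the hybrid lookup described above: a few recently used contacts embedded in the person document, the complete list in a separate contacts collection that references the person via a `personId` field. `recentContacts`, `personId`, `MAX_EMBEDDED` and the function names are all hypothetical, and the array `.find` calls stand in for database queries.

```javascript
// Keep the embedded part small so the person document stays well under 16MB.
const MAX_EMBEDDED = 3;

// `lookups` counts database round trips, including the person lookup itself.
function getContact(person, contactsCollection, name) {
  // Hit: the person document already holds the contact (1 lookup in total).
  const embedded = person.recentContacts.find((c) => c.name === name);
  if (embedded) return { contact: embedded, lookups: 1 };
  // Miss: a second query against the referenced contacts collection.
  const referenced = contactsCollection.find(
    (c) => c.personId === person._id && c.name === name
  );
  return { contact: referenced ?? null, lookups: 2 };
}

// After using a contact, move it to the front of the embedded list,
// evicting the least recently used one when the list is full.
function touchContact(person, contact) {
  const rest = person.recentContacts.filter((c) => c.name !== contact.name);
  return { ...person, recentContacts: [contact, ...rest].slice(0, MAX_EMBEDDED) };
}
```

The extra application code is exactly the "bit more complexity" mentioned above: keeping the embedded copies in sync with the referenced collection.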

How to model data in NoSQL Firebase datastore?

I want to store following data:
Users,
Events,
Attendees
(similar to Firebase's example given here:
https://www.youtube.com/watch?v=ran_Ylug7AE)
My Firebase store is like the following:
Users - Collection
{
"9582940055" :
{
"name" : "test"
}
}
Every user is a different document. Am I doing it correctly?
If yes, I have kept Mobile number of every user as Document Id instead of auto id, as the mobile number is going to be unique and it will help me in querying. Is this right?
Events - Collection
{
"MkyzuARd8Uelh0qD1WMa" : // auto id for every event
{
"name" : "test",
"attendees" : {
"user": 'Lakshay'
}
}
}
Here, I have kept attendees as a Map inside the Event document. Is it right or should I make Attendees as a collection inside Event document?
Also, "user": 'Lakshay' inside "attendees" is just a string. Is it advisable to use reference data type of Firebase?
Every user is a different document. Am I doing it correctly?
Yes, this is quite common. Initially it may seem a bit weird to have documents with so little data, but over time you'll get used to it (and likely add more data to each user).
I have kept Mobile number of every user as Document Id instead of auto id, as the mobile number is going to be unique and it will help me in querying. Is this right?
If the number is unique for each user in the context of your app, then it can be used to identify users, and thus also as the ID of the user profile documents. It is slightly more idiomatic to use the user's UID for this purpose, since that is the most common way to look up a user. But if you use phone numbers for that and they are unique for each user, you can also use that.
Here, I have kept attendees as a Map inside the Event document. Is it right or should I make Attendees as a collection inside Event document?
That depends...
Storing the events for a user in a single document means you have a limit to how many events you can store for a user, as a document can be no bigger than 1MB.
Storing the events for a user in their document means you always read the data for a user's events, even when you maybe only need to have the user's name. So you'll be reading more data than needed, wasting bandwidth for both you and your users.
Storing the events inside a subcollection allows you to query them, and read a subset of the events of a user.
On the other hand, using a subcollection means you end up reading more, smaller documents. So you'd be paying for more document reads from a subcollection, while paying less for bandwidth.
Also, "user": 'Lakshay' inside "attendees" is just a string. Is it advisable to use reference data type of Firebase?
This makes fairly little difference, as there's not a lot of extra functionality that Firestore's DocumentReference field type gives.

MeteorJS + MongoDB: How should I set up my collections when users can have the same document?

I wasn't quite sure how to word my question in one line, but here's a more in depth description.
I'm building a Meteor app where users can "own" the same document. For example, a user has a list of movies they own, which of course multiple people can own the same movie. There are several ways I've thought of structuring my database/collections for this, but I'm not sure which would be best.
I should also note that the movie info comes from an external API, that I'm currently storing into my own database as people find them in my app to speed up the next lookup.
Option 1 (My current config):
One collection (Movies) that stores all the movies and their info. Another collection that basically stores a list of movie ids in each document based on userId. On startup, I get the list of ids, find the movies in my database, and store them in local collections (there are 3 of them). The benefit that I see from this is I only have to store the movie once. The downside that I've run into so far is difficulty in keeping things in sync and properly loading on startup (waiting on the local collections to populate).
Option 2 :
A Movies collection that stores a list of movie objects for each user. This makes the initial lookup and updating very simple, but it means I'll be storing the same fairly large documents multiple times.
Option 3:
A Movies collection where each movie has an array of the userIds that own it. This sounds pretty good too, but when I update the movie with new info, will an upsert work and keep the userIds safe?
Option 3 seems sensible. Some of the choice may depend on the scale of each collection or the amount of links (will many users own the same movie, will users own many movies).
Some helpful code snippets for using option 3:
Upsert a movie detail (does not affect any other fields on the document if it already exists):
Movies.upsert({name: "Jaws"}, {$set: {year: 1975}});
Set that a user owns a movie (this also does not affect any other document fields; $addToSet will not add the value twice if it is already in the array, while $push would create duplicates):
Movies.update({_id: movieId}, {$addToSet: {userIds: userId}}); // movieId, userId: the ids in question
Set that a user no longer owns a movie:
Movies.update({_id: movieId}, {$pull: {userIds: userId}});
Find all movies that a user owns (Mongo automatically matches against each element of an array field):
Movies.find({userIds: userId});
Find all movies that a user owns, but exclude the userIds field from the result (keeping the documents small when movie.userIds is a large array, or protecting the privacy of other users' ownership):
Movies.find({userIds: userId}, {fields: {userIds: 0}});
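To illustrate why the `$set` form keeps `userIds` safe, here is a deliberately simplified in-memory model of these update semantics (top-level fields and primitive array values only; it is not how MongoDB or Meteor implement them, and `applyUpdate` is a hypothetical name). An update document without any `$` operators is treated as a full replacement, which is the case that would wipe `userIds`.

```javascript
// Simplified model of a few MongoDB update semantics, for illustration only.
function applyUpdate(doc, update) {
  const hasOperators = Object.keys(update).some((key) => key.startsWith("$"));
  if (!hasOperators) {
    // Replacement document: every field except _id is replaced.
    return { ...update, _id: doc._id };
  }
  const out = { ...doc };
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    out[field] = value; // only the named fields change
  }
  for (const [field, value] of Object.entries(update.$addToSet ?? {})) {
    const arr = out[field] ?? [];
    out[field] = arr.includes(value) ? arr : [...arr, value]; // no duplicates
  }
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    out[field] = [...(out[field] ?? []), value]; // duplicates allowed
  }
  for (const [field, value] of Object.entries(update.$pull ?? {})) {
    if (Array.isArray(out[field])) out[field] = out[field].filter((v) => v !== value);
  }
  return out;
}

const movie = { _id: "m1", name: "Jaws", userIds: ["u1"] };
console.log(applyUpdate(movie, { $set: { year: 1975 } }));  // userIds kept
console.log(applyUpdate(movie, { name: "Jaws", year: 1975 })); // userIds gone
```

In short: refresh movie details with `$set` (as in the upsert above), never with a bare replacement document.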

MongoDB storing user-specific data on shared collection objects

I'm designing an application that processes RSS feeds using MongoDB. Currently my collections are as follows:
Entry
fields: content, feed_id, title, publish_date, url
Feed
fields: description, title, url
User
fields: email_address
subscriptions (embedded collection; fields: feed_id, tags)
A user can subscribe to feeds which are linked from the embedded subscription collection. From the subscriptions I can get a list of all the feeds a user should see and also the corresponding entries.
How should I store entry status information (isRead, isStarred, etc.) that is specific to a user? When a user views an entry I need to record isRead = 1. Two common queries I need to be able to perform are:
Find all entries for a specific feed where isRead = 0 or no status exists currently
For a specific user, mark all entries prior to a publish date with isRead = 1 (this could be hundreds or even thousands of records so it must be efficient)
Hmm, this is a tricky one!
It makes sense to me to store a record for entries that are unread, and delete them when they're read. I'm basing this on the assumption that there will be more read posts than unread for each individual user, so you might as well not have documents for all of those already-read entries sitting around in your DB forever. It also makes it easier to not have to worry about the 16MB document size limit if you're not having to drag around years of history with you everywhere.
For starred entries, I would simply add an array of Entry ObjectIds to User. No need to make these subscription-specific; it'll be much easier to pull a list of items a User has starred that way.
For unread entries, it's a little more complex. I'd still add it as an array, but to satisfy your requirement of being able to quickly mark as-read entries before a specific date, I would denormalize and save the publish-date alongside the Entry ObjectId, in a new 'UnreadEntry' document.
User
fields: email_address, starred_entries[]
subscriptions (embedded collection; fields: feed_id, tags, unread_entries[])
UnreadEntry
fields: id is Entry ObjectId, publish_date
You need to be conscious of the document limit, but 16MB is one hell of a lot of unread entries/feeds, so be realistic about whether that's a limit you really need to worry about. (If it is, it should be fairly straightforward to break out User.subscriptions to its own document.)
Both of your queries now become fairly easy to write:
All entries for a specific feed that are unread:
user.subscriptions.find(feedID).unread_entries
Mark all entries prior to a publish date read:
user.subscriptions.find(feedID).unread_entries.where(:publish_date.lte => my_date).delete_all
And, of course, if you simply need to mark all entries in a feed as read, that's very easy:
user.subscriptions.find(feedID).unread_entries.delete_all
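For reference, roughly equivalent MongoDB query documents for these three operations, built as plain objects (e.g. for `db.users.updateOne(q.filter, q.update)` or `findOne(q.filter, { projection: q.projection })` in the Node driver). This assumes each embedded subscription is matched by its `feed_id` and that `unread_entries` holds `{ _id, publish_date }` subdocuments as described above; the helper names are hypothetical. The positional `$` targets the subscription matched by the filter, and `$pull` with a condition removes every matching unread entry in one update.

```javascript
// Assumed User layout:
// { email_address, starred_entries: [],
//   subscriptions: [{ feed_id, tags, unread_entries: [{ _id, publish_date }] }] }

// Match the user AND the subscription element, so `$` can refer to it.
function subscriptionFilter(userId, feedId) {
  return { _id: userId, "subscriptions.feed_id": feedId };
}

// Unread entries for one feed: project only the matched subscription.
function unreadEntriesQuery(userId, feedId) {
  return {
    filter: subscriptionFilter(userId, feedId),
    projection: { "subscriptions.$": 1 },
  };
}

// Mark everything published on or before `date` as read by deleting it
// from the unread list.
function markReadBefore(userId, feedId, date) {
  return {
    filter: subscriptionFilter(userId, feedId),
    update: {
      $pull: { "subscriptions.$.unread_entries": { publish_date: { $lte: date } } },
    },
  };
}

// Mark the whole feed as read: empty its unread list.
function markAllRead(userId, feedId) {
  return {
    filter: subscriptionFilter(userId, feedId),
    update: { $set: { "subscriptions.$.unread_entries": [] } },
  };
}
```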