How to properly "associate" multiple items with a MongoDB object without using relations per se? - mongodb

Let's say I have a number of users who watch my application for notifications about things like activity on the social media account(s) of their choosing.
I'd like to be able to present users with an option to see the updates they've "missed" while they were offline.
If I store each Notification's MongoDB _id in a data object attached to the User model, I foresee a situation where a user has signed up to ALL channels and has missed a few megabytes' worth of updates, making the User object very large:
{
  name: 'John',
  missedNotifications: [ /* 10 million items */ ]
}
On the other hand, Mongoose, though it "supports" associations, sort of runs into the same issue, except the many-to-many association would have this duplicate data in several places.
If the Notification object carries a list of users who've seen it or not, after a few years, scanning the entire Notifications collection may become very time-consuming.
Is there a third method for keeping track of who has seen what and amending the models properly?

Instead of tracking the missed notifications, consider tracking the last notification received. MongoDB's ObjectIds are constructed as follows, as per the documentation:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
Because of the way these ids are constructed, you can generally perform a $gt search on the _id field to retrieve all documents that were inserted after the previous known id (e.g. db.notifications.find({_id: {$gt: last_known_id}})).
In this way, you can retrieve all new notifications that were missed while only tracking one notification id. If you require tracking of multiple notification types and want to have greater granularity in your notification tracking, then just keep track of the last viewed document id for each type.
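For illustration, here is a rough sketch of that approach in the mongo shell. It assumes a lastSeenNotificationId field on the user document; that field name, and the users/notifications collection names, are placeholders rather than anything from the question:

// Look up the newest notification this user has already seen.
var user = db.users.findOne({ _id: userId });

// Everything inserted after that notification, oldest first.
var missed = db.notifications.find({
  _id: { $gt: user.lastSeenNotificationId }
}).sort({ _id: 1 }).toArray();

// Once the missed notifications are delivered, advance the pointer.
if (missed.length > 0) {
  db.users.updateOne(
    { _id: userId },
    { $set: { lastSeenNotificationId: missed[missed.length - 1]._id } }
  );
}

The user document stays a fixed size no matter how many notifications accumulate, since only a single ObjectId is stored per tracked type.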

Related

How to model data in NoSQL Firebase datastore?

I want to store the following data:
Users,
Events,
Attendees
(similar to Firebase's example given here:
https://www.youtube.com/watch?v=ran_Ylug7AE)
My Firebase store is like the following:
Users - Collection
{
  "9582940055" :
  {
    "name" : "test"
  }
}
Every user is a different document. Am I doing it correctly?
If yes, I have kept Mobile number of every user as Document Id instead of auto id, as the mobile number is going to be unique and it will help me in querying. Is this right?
Events - Collection
{
  "MkyzuARd8Uelh0qD1WMa" :   // auto id for every event
  {
    "name" : "test",
    "attendees" : {
      "user": 'Lakshay'
    }
  }
}
Here, I have kept attendees as a Map inside the Event document. Is it right or should I make Attendees as a collection inside Event document?
Also, "user": 'Lakshay' inside "attendees" is just a string. Is it advisable to use reference data type of Firebase?
Every user is a different document. Am I doing it correctly?
Yes, this is quite common. Initially it may seem a bit weird to have documents with so little data, but over time you'll get used to it (and likely add more data to each user).
I have kept Mobile number of every user as Document Id instead of auto id, as the mobile number is going to be unique and it will help me in querying. Is this right?
If the number is unique for each user in the context of your app, then it can be used to identify users, and thus also as the ID of the user profile documents. It is slightly more idiomatic to use the user's UID for this purpose, since that is the most common way to look up a user. But if you use phone numbers for that and they are unique for each user, you can also use that.
Here, I have kept attendees as a Map inside the Event document. Is it right or should I make Attendees as a collection inside Event document?
That depends...
Storing the events for a user in a single document means you have a limit to how many events you can store for a user, as a document can be no bigger than 1MB.
Storing the events for a user in their document means you always read the data for a user's events, even when you maybe only need to have the user's name. So you'll be reading more data than needed, wasting bandwidth for both you and your users.
Storing the events inside a subcollection allows you to query them, and read a subset of the events of a user.
On the other hand: using a subcollection means you end up reading more smaller documents. So you'd be paying for more document reads from a subcollection, while paying less for bandwidth.
Also, "user": 'Lakshay' inside "attendees" is just a string. Is it advisable to use reference data type of Firebase?
This makes fairly little difference, as there's not a lot of extra functionality that Firestore's DocumentReference field type gives.
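To make the two options concrete, here is a small sketch using the Firebase JavaScript SDK (namespaced API), where db is a firebase.firestore() instance; the eventId/userId variables and the attendees naming are illustrative, not from the question:

// Option 1: attendees kept as a map field on the event document.
// Dot notation updates a single key inside the map.
db.collection('events').doc(eventId).update({
  ['attendees.' + userId]: 'Lakshay'
});

// Option 2: attendees kept as a subcollection, one document per attendee.
db.collection('events').doc(eventId)
  .collection('attendees').doc(userId)
  .set({ name: 'Lakshay' });

// The subcollection can be queried and paginated independently of the event document.
db.collection('events').doc(eventId)
  .collection('attendees').limit(50).get();

With the map, every read of the event pulls the whole attendee list; with the subcollection, you pay per attendee document read but only fetch what you query for.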

How to optimize collection subscription in Meteor?

I'm working on a filtered live search module with Meteor.js.
Use case & problem:
A user wants to search through all the users to find friends, but I cannot afford to send each user the complete users collection. The user filters the search using checkboxes. I'd like to subscribe only to the matched users. What is the best way to do it?
I guess it would be better to create the query client-side, then send it to a method to get back the desired set of users. But I wonder: when the filtering criteria change, does the new subscription erase all of the old one? Because if I do a first search that returns [usr1, usr3, usr5], and after that a search that returns [usr2, usr4], the best would be to keep the first set and simply add the new one to the client-side subscribed collection.
And, in addition, if I then do a third search which should return [usr1, usr3, usr2, usr4], the autorun subscription would not send me anything, as I already have the whole result set in my collection.
The goal is to spare processing and data transfer on the server.
I have some ideas, but I haven't coded enough of them yet to share them in an easily comprehensible way.
How would you advise me to proceed to save the most time and processing?
Thank you all.
David
It depends on your application, but you'll probably send a non-empty string to a publisher which uses that string to search the users collection for matching names. For example:
Meteor.publish('usersByName', function(search) {
  check(search, String);
  // make sure the user is logged in and that search is sufficiently long
  if (!(this.userId && search.length > 2))
    return [];
  // search by case insensitive regular expression
  var selector = {username: new RegExp(search, 'i')};
  // only publish the necessary fields
  var options = {fields: {username: 1}};
  return Meteor.users.find(selector, options);
});
Also see common mistakes for why we limit the fields.
Performance
Meteor is clever enough to keep track of the current document set that each client has for each publisher. When the publisher reruns, it knows to only send the difference between the sets. So the situation you described above is already taken care of for you.
If you were subscribed for users: 1,2,3
Then you restarted the subscription for users 2,3,4
The server would send a removed message for 1 and an added message for 4.
Note this will not happen if you stopped the subscription prior to rerunning it.
To my knowledge, there isn't a way to avoid removed messages when modifying the parameters for a single subscription. I can think of two possible (but tricky) alternatives:
Accumulate the intersection of all prior search queries and use that when subscribing. For example, if a user searched for {height: 5} and then searched for {eyes: 'blue'} you could subscribe with {height: 5, eyes: 'blue'}. This may be hard to implement on the client, but it should accomplish what you want with the minimum network traffic.
Accumulate active subscriptions. Rather than modifying the existing subscription each time the user modifies the search, start a new subscription for the new set of documents, and push the subscription handle to an array. When the template is destroyed, you'll need to iterate through all of the handles and call stop() on them. This should work, but it will consume more resources (both network and server memory + CPU).
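A rough sketch of that second alternative with a Blaze template; the template name, event selector, and handle array are made up for illustration:

Template.userSearch.onCreated(function () {
  // keep every subscription handle so they can all be stopped later
  this.searchHandles = [];
});

Template.userSearch.events({
  'submit .search-form': function (event, template) {
    event.preventDefault();
    var search = event.target.search.value;
    // start a new subscription instead of replacing the previous one
    template.searchHandles.push(Meteor.subscribe('usersByName', search));
  }
});

Template.userSearch.onDestroyed(function () {
  // stop every accumulated subscription when the template goes away
  this.searchHandles.forEach(function (handle) { handle.stop(); });
});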
Before attempting either of these solutions, I'd recommend benchmarking the worst case scenario without using them. My main concern is that without fairly tight controls, you could end up publishing the entire users collection after successive searches.
If you want to go easy on your server, you'll want to send as little data to the client as possible. That means every document you send to the client that is NOT a friend is waste. So let's eliminate all that waste.
Collect your filters (e.g. filters = {sex: 'Male', state: 'Oregon'}). Then call a method to search based on your filters (e.g. Users.find(filters)). Additionally, you can run your own proprietary ranking algorithm to determine the % chance that a person is a friend. Maybe base it off of distance from IP address (or from phone GPS history), mutual friends, etc. This will pay dividends in efficiency in a bit. Index things like GPS coords or other highly unique attributes, and maybe try out composite indexes. But remember: more indexes mean slower writes.
Now you've got a cursor with all possible friends, ranked from most likely to least likely.
Next, change your subscription to match those friends, but put a limit:20 on there. Also, only send over the fields you need. That way, if a user wants to skip this step, you only wasted sending 20 partial docs over the wire. Then, have an infinite scroll or 'load more' button the user can click. When they load more, it's an additive subscription, so it's not resending duplicate info. Discover Meteor describes this pattern in great detail, so I won't.
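As a sketch of that limited, additive subscription, assuming a hypothetical possibleFriends publication and a Session-stored filter object (names are placeholders, not from the answer):

// server: publish at most `limit` candidates, and only the fields the list needs
Meteor.publish('possibleFriends', function (filters, limit) {
  check(filters, Object);
  check(limit, Number);
  if (!this.userId)
    return [];
  return Meteor.users.find(filters, {
    fields: { username: 1 },
    limit: Math.min(limit, 100)   // hard cap so a client can never pull everything
  });
});

// client: bumping the limit re-runs the subscription; only the extra documents are sent
Session.setDefault('friendLimit', 20);
Tracker.autorun(function () {
  Meteor.subscribe('possibleFriends', Session.get('filters') || {}, Session.get('friendLimit'));
});

// "load more" just increases the limit
Session.set('friendLimit', Session.get('friendLimit') + 20);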
After a few clicks/scrolls, the user won't find any more friends (because you were smart & sorted them) so they will stop trying & move on to the next step. If you returned 200 possible friends & they stop trying after 60, you just saved 140 docs from going through the pipeline. There's your efficiency.

MongoDB and Write/Read Consistency

I've passed a MongoU course and am midway through a second. I've read what I can and done what I can to learn, but have failed to learn how to handle what I consider a standard situation.
Consider a booking system for Hotel Rooms. There is a collection bookings:
4/1 - Room 1
4/1 - Room 2
4/1 - Room 3
Then when a client checks the bookings collection for { date: '4/1', room: 3 }, they will find a booking and the application can deny the new booking.
However, say two users look for { date: '4/1', room: 4 } at the same time; the application will proceed with the booking for both clients, meaning both will try to create the booking.
What happens next? Does one of the clients get a booking and the other fail, somewhat of a race condition? Or does one client get a booking and the other person overwrite it?
Can this be prevented with a write concern? Or some other lock? Or is this a better case for a more atomic database?
All the demos I see have to do with blogs, which have very little concern for unique data.
In general, you need to be careful with your data models and – of course – your application flow.
One way to prevent duplicate bookings would be to use a compound _id field:
{
  _id: { room: 1, date: "4/1" }
}
So once room 1 is booked for 4/1, there is no way a duplicate booking for room 1 can be created, as _id's are guaranteed to be unique. The second insert will fail. Alternatively, you can create a unique index on an arbitrary compound field.
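In the mongo shell, that behavior looks roughly like this (the bookings collection name is assumed):

// first booking for room 1 on 4/1 succeeds
db.bookings.insertOne({ _id: { room: 1, date: "4/1" }, reservationFor: "userA" });

// a second booking for the same room and date fails with a duplicate key error (E11000)
db.bookings.insertOne({ _id: { room: 1, date: "4/1" }, reservationFor: "userB" });

// alternative: keep the default ObjectId _id and enforce uniqueness with a compound index
db.bookings.createIndex({ room: 1, date: 1 }, { unique: true });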
Be aware though that your application should not do upserts or updates on this document without proper permission checks, which does not apply to MongoDB only. In the simplest case, for updates, you would need to check whether the user trying to update the document actually is the user who booked the room. So our model needs to be expanded:
{
  _id: { room: "1", date: "4/1" },
  reservationFor: "userID"
}
So now it becomes pretty easy: before inserting a reservation, you check for an existing one. If the result is not empty, there already is a reservation for that room. If an exception is thrown because of a duplicate ID on insert, a reservation was made in the meantime. Before doing an update, you need to check whether reservationFor holds the userID of the current user.
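A sketch of such a guarded update in the mongo shell, following the model above (the $set payload is purely illustrative):

// the filter only matches if the current user is the one who made the reservation
var result = db.bookings.updateOne(
  { _id: { room: "1", date: "4/1" }, reservationFor: currentUserId },
  { $set: { note: "late arrival" } }   // illustrative change to the booking
);

// matchedCount === 0 means either no reservation exists or it belongs to someone else
if (result.matchedCount === 0) {
  print("Update refused: reservation missing or owned by another user");
}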
How to react to this kind of situation heavily depends on the language and framework(s) used to develop your application. What I tend to do is catch the corresponding exceptions and modify an errorMessage accordingly.

MongoDB storing user-specific data on shared collection objects

I'm designing an application that processes RSS feeds using MongoDB. Currently my collections are as follows:
Entry
fields: content, feed_id, title, publish_date, url
Feed
fields: description, title, url
User
fields: email_address
subscriptions (embedded collection; fields: feed_id, tags)
A user can subscribe to feeds which are linked from the embedded subscription collection. From the subscriptions I can get a list of all the feeds a user should see and also the corresponding entries.
How should I store entry status information (isRead, isStarred, etc.) that is specific to a user? When a user views an entry I need to record isRead = 1. Two common queries I need to be able to perform are:
Find all entries for a specific feed where isRead = 0 or no status exists currently
For a specific user, mark all entries prior to a publish date with isRead = 1 (this could be hundreds or even thousands of records so it must be efficient)
Hmm, this is a tricky one!
It makes sense to me to store a record for entries that are unread, and delete them when they're read. I'm basing this on the assumption that there will be more read posts than unread for each individual user, so you might as well not have documents for all of those already-read entries sitting around in your DB forever. It also makes it easier to not have to worry about the 16MB document size limit if you're not having to drag around years of history with you everywhere.
For starred entries, I would simply add an array of Entry ObjectIds to User. No need to make these subscription-specific; it'll be much easier to pull a list of items a User has starred that way.
For unread entries, it's a little more complex. I'd still add it as an array, but to satisfy your requirement of being able to quickly mark as-read entries before a specific date, I would denormalize and save the publish-date alongside the Entry ObjectId, in a new 'UnreadEntry' document.
User
fields: email_address, starred_entries[]
subscriptions (embedded collection; fields: feed_id, tags, unread_entries[])
UnreadEntry
fields: _id (the Entry ObjectId), publish_date
You need to be conscious of the document limit, but 16MB is one hell of a lot of unread entries/feeds, so be realistic about whether that's a limit you really need to worry about. (If it is, it should be fairly straightforward to break out User.subscriptions to its own document.)
Both of your queries now become fairly easy to write:
All entries for a specific feed that are unread:
user.subscriptions.find(feedID).unread_entries
Mark all entries prior to a publish date read:
user.subscriptions.find(feedID).unread_entries.where(publish_date.lte => my_date).delete_all
And, of course, if you simply need to mark all entries in a feed as read, that's very easy:
user.subscriptions.find(feedID).unread_entries.delete_all
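For reference, the same operations expressed directly in the mongo shell against the embedded model sketched above (the field paths are assumptions based on that schema):

// unread entries for one feed (the positional $ projects only the matching subscription)
db.users.findOne(
  { _id: userId, "subscriptions.feed_id": feedId },
  { "subscriptions.$": 1 }
);

// mark everything published before a date as read by pulling it out of the array
db.users.updateOne(
  { _id: userId, "subscriptions.feed_id": feedId },
  { $pull: { "subscriptions.$.unread_entries": { publish_date: { $lte: myDate } } } }
);

// mark a whole feed as read
db.users.updateOne(
  { _id: userId, "subscriptions.feed_id": feedId },
  { $set: { "subscriptions.$.unread_entries": [] } }
);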

Is it ok to turn the mongo ObjectId into a string and use it for URLs?

document/show?id=4cf8ce8a8aad6957ff00005b
Generally I think you should be cautious about exposing internals (such as DB ids) to the client. The URL can easily be manipulated, and the user possibly has access to objects you don't want them to have.
For MongoDB in particular, the object ID might even reveal some additional internals (see here), i.e. they aren't completely random. That might be an issue too.
Besides that, I think there's no reason not to use the id.
I generally agree with @MartinStettner's reply. I wanted to add a few points, mostly elaborating on what he said. Yes, a small amount of information is decodable from the ObjectId, and it is trivially accessible if someone recognizes it as a MongoDB ObjectId. The two downsides are:
It might allow someone to guess a different valid ObjectId, and request that object.
It might reveal info about the record (such as its creation date) or the server that you didn't want someone to have.
The "right" fix for the first item is to implement some sort of real access control: 1) a user has to login with a username and password, 2) the object is associated with that username, 3) the app only serves objects to a user that are associated with that username.
MongoDB doesn't do that itself; you'll have to rely on other means. Perhaps your web-app framework, and/or some ad-hoc access control list (which itself could be in MongoDB).
But here is a "quick fix" that mostly solves both problems: create some other "id" for the record, based on a large, high-quality random number.
How large does "large" need to be? A 128-bit random number has 3.4 * 10^38 possible values. So if you have 10,000,000 objects in your database, someone guessing a valid value is a vanishingly small probability: 1 in 3.4 * 10^31. Not good enough? Use a 256-bit random number... or higher!
How do you represent this number in the document? You could use a string (encoding the number as hex or base64), or MongoDB's binary type. (Consult your driver's API docs to figure out how to create a binary object as part of a document.)
While you could add a new field to your document to hold this, you'd then probably also want an index, so the document size is bigger and you spend more memory on that index. Here's what you might not have thought of: simply USE that "truly random id" as your document's "_id" field. That way the per-document size is only a little higher, and you use the index that you [probably] had there anyway.
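A minimal Node.js sketch of that approach with the official mongodb driver; the collection and the 128-bit size are placeholders chosen to match the discussion above:

const crypto = require('crypto');

// 16 random bytes = 128 bits; hex-encoded it becomes a 32-character string
async function insertWithRandomId(collection, data) {
  const randomId = crypto.randomBytes(16).toString('hex');
  // the random value is used directly as _id, so the automatic _id index covers lookups
  await collection.insertOne(Object.assign({ _id: randomId }, data));
  return randomId;   // safe to expose in URLs: it reveals nothing about time or server
}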
I can set both the 128-character session string and the other collections' document object IDs as cookies, and when the user visits, do an asynchronous fetch that retrieves the session, user, and account all at once, instead of fetching the session first and only then fetching the user and account. If the session document is valid, I'll share the user and account documents.
If I do this, every single request for a user or account document will also require the 128-character session cookie to be fetched, which makes exposing the user and account object IDs safer. It means that if anyone is guessing a user ID or account ID, they also have to guess the 128-character string to get any answer from the system.
Another security measure you could take is to wrap the ID in some salt, where only you know the positioning, such as
XXX4cf8ce8XXXXa8aad6957fXXXXXXXf00005bXXXX
Now you know exactly how to slice that up to get the ID.
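A toy sketch of that wrapping and slicing, with made-up offsets (three random characters in front of the 24-character ObjectId string, four behind), not the exact layout shown above:

const crypto = require('crypto');

function wrapId(idString) {
  const noise = function (n) { return crypto.randomBytes(n).toString('hex').slice(0, n); };
  return noise(3) + idString + noise(4);
}

function unwrapId(wrapped) {
  // only the server knows where the real 24-character ObjectId sits
  return wrapped.slice(3, 3 + 24);
}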