MongoDB and Write/Read Consistency

I've passed one MongoU course and am midway through a second. I've read and practiced what I can, but I still haven't learned how to handle what I consider a standard situation.
Consider a booking system for Hotel Rooms. There is a collection bookings:
4/1 - Room 1
4/1 - Room 2
4/1 - Room 3
When a client checks the bookings collection for { date: 4/1, room: 3 }, they will find a booking and the application can deny the request.
However, say two users look for { date: 4/1, room: 4 } at the same time. The application will proceed with the booking for both clients, meaning both will try to create it.
What happens next? Does one client get the booking while the other fails, as in a race condition? Or does one client get the booking and the other overwrite it?
Can this be prevented with a write concern, or with some other lock? Or is this a better case for a more atomic database?
All the demos I see involve a blog, which has very little concern for unique data.

In general, you need to be careful with your data models and – of course – your application flow.
One way to prevent duplicate bookings would be to use a compound _id field:
{
_id: { room: 1, date:"4/1" }
}
So once room 1 is booked for 4/1, there is no way a duplicate booking for room 1 on that date can be created, as _id values are guaranteed to be unique within a collection. The second insert will fail. Alternatively, you can keep the default _id and create a unique compound index on the fields that must be unique together (here, room and date).
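As a rough illustration of why the unique key settles the race, here is a minimal Python sketch. It does not talk to MongoDB: `InMemoryBookings`, `DuplicateKeyError`, and `race_for_room` are hypothetical stand-ins, and the lock only models the assumption that the server applies each single-document insert atomically.

```python
import threading


class DuplicateKeyError(Exception):
    """Raised when an insert would violate the unique _id constraint."""


class InMemoryBookings:
    """Toy stand-in for a collection with a unique _id (not a real driver)."""

    def __init__(self):
        self._docs = {}
        # Models the assumption that each single-document insert is atomic.
        self._lock = threading.Lock()

    def insert_one(self, doc):
        key = (doc["_id"]["room"], doc["_id"]["date"])
        with self._lock:
            if key in self._docs:
                raise DuplicateKeyError(key)
            self._docs[key] = doc


def try_to_book(bookings, room, date, client, results):
    """Insert first and let the unique key decide who wins."""
    try:
        bookings.insert_one({"_id": {"room": room, "date": date}})
        results[client] = "booked"
    except DuplicateKeyError:
        results[client] = "rejected"


def race_for_room(room, date, clients):
    """Let every client try to book the same room and date concurrently."""
    bookings = InMemoryBookings()
    results = {}
    threads = [
        threading.Thread(target=try_to_book, args=(bookings, room, date, c, results))
        for c in clients
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `race_for_room(4, "4/1", ["alice", "bob"])` leaves exactly one client with "booked" and the other with "rejected"; which one wins depends on thread scheduling, but nothing is overwritten.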
Be aware, though, that your application should not perform upserts or updates on this document without proper permission checks; this applies to any database, not only MongoDB. In the simplest case, for updates, you would need to check whether the user trying to update the document is actually the user who booked the room. So our model needs to be expanded:
{
_id: { room: 1, date: "4/1" },
reservationFor: "userID"
}
So now it becomes pretty easy: before inserting a reservation, you check for an existing one. If the result is not empty, there is already a reservation for that room. If the insert throws an exception because of a duplicate ID, a reservation was made in the meantime. Before doing an update, you need to check whether reservationFor holds the userID of the current user.
How to react to this kind of situation depends heavily on the language and framework(s) used to develop your application. What I tend to do is catch the corresponding exceptions and set an error message accordingly.
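The ownership check described above could look roughly like the following Python sketch. It uses a plain dict in place of the collection, and `book` and `update_booking` are hypothetical helper names, not driver calls. With a real driver, the same idea can usually be expressed by putting reservationFor into the update filter so that the check and the write happen in one operation.

```python
PROTECTED_FIELDS = ("_id", "reservationFor")


def book(bookings, room, date, user_id):
    """Create a reservation unless one already exists for (room, date)."""
    key = (room, date)
    if key in bookings:
        return False  # corresponds to the duplicate-key failure on insert
    bookings[key] = {"_id": {"room": room, "date": date}, "reservationFor": user_id}
    return True


def update_booking(bookings, room, date, user_id, changes):
    """Apply changes only if user_id is the user who made the reservation."""
    doc = bookings.get((room, date))
    if doc is None or doc["reservationFor"] != user_id:
        return False
    for field, value in changes.items():
        # Never let an update change the key or the owner.
        if field not in PROTECTED_FIELDS:
            doc[field] = value
    return True
```

A `False` result is the point where the application would set its error message, for example "room already booked" or "not your reservation".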

Related

PostgreSQL Array Contains vs JOIN (performance)

I have a model in which a person can receive one gift for attending one event, or multiple gifts for attending multiple events. Either way, the gift or set of gifts given to a person is treated as a single transaction. I'm using PostgreSQL to implement this model.
For example:
If you attend a certain event, you receive a gift (a single transaction of one gift to one person).
If you attend a set of events, you receive a set of gifts for those events (still a single transaction of gifts to one person).
So in the majority of cases, a transaction covers only one gift to one person, but there will be a few cases like the second example.
To handle these cases, I have two options:
the first is to use a Postgres array and query with array contains,
and the second is to create a new transaction_events table and join against it to query by event.
I would like to know which option performs better and which one the community recommends, taking into account that most transactions will contain only one event and that I cannot change the transactions model.

The second option will perform better, and it has the added benefit that you can have foreign key constraints to enforce data integrity.
In most cases, it is a good idea to avoid composite types like arrays or JSON in the database.
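To make the second option concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL so that it is self-contained, and the events and transactions tables, their columns, and the sample rows are assumptions for illustration; only the transaction_events name comes from the question.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a join table between transactions and events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
    conn.executescript(
        """
        CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE transactions (id INTEGER PRIMARY KEY, person TEXT NOT NULL);
        CREATE TABLE transaction_events (
            transaction_id INTEGER NOT NULL REFERENCES transactions(id),
            event_id INTEGER NOT NULL REFERENCES events(id),
            PRIMARY KEY (transaction_id, event_id)
        );
        -- Index for looking up transactions by event.
        CREATE INDEX idx_transaction_events_event ON transaction_events(event_id);
        INSERT INTO events VALUES (1, 'opening'), (2, 'workshop');
        INSERT INTO transactions VALUES (10, 'ann'), (11, 'ben');
        INSERT INTO transaction_events VALUES (10, 1), (11, 1), (11, 2);
        """
    )
    return conn


def transactions_for_event(conn, event_id):
    """Return (transaction id, person) rows for one event, via the join table."""
    return conn.execute(
        """
        SELECT t.id, t.person
        FROM transactions AS t
        JOIN transaction_events AS te ON te.transaction_id = t.id
        WHERE te.event_id = ?
        ORDER BY t.id
        """,
        (event_id,),
    ).fetchall()
```

With the foreign keys in place, an attempt to insert a transaction_events row that points at a nonexistent transaction or event is rejected by the database, which is the integrity benefit an array column cannot give you.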

How to properly "associate" multiple items with a mongoDB object without using relations per se?

Let's say I have a number of users who watch my application for notifications about things like activity on social media account(s) of their choosing.
I'd like to be able to flash the users with an option to see the updates they've "missed" while they were offline.
If I store the Notification's MongoDB _id in a data object attached to the User model, I foresee a situation where they've signed up to ALL channels and have missed a few megabytes worth of updates, making the User object very large:
{ name: 'John',
missedNotifications: [ /* 10 million items */ ]
}
On the other hand, Mongoose, though it "supports" associations, sort of runs into the same issue, except that the many-to-many association would duplicate this data in several places.
If the Notification object carries a list of users who've seen it or not, after a few years, scanning the entire Notifications collection may become very time-consuming.
Is there a third method for keeping track of who has seen what and amending the models properly?

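Instead of tracking the missed notifications, consider tracking the last notification received. MongoDB's ObjectIds are constructed as follows, as per the documentation (newer versions of the documentation replace the machine identifier and process id with a single 5-byte random value; the leading timestamp is unchanged):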
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
Because of the way these ids are constructed, you can generally perform a $gt search on the _id field to retrieve all documents that were inserted after the previous known id (e.g. db.notifications.find({_id: {$gt: last_known_id}})).
In this way, you can retrieve all new notifications that were missed while only tracking one notification id. If you require tracking of multiple notification types and want to have greater granularity in your notification tracking, then just keep track of the last viewed document id for each type.
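The reason the $gt comparison works can be shown with a short Python sketch. It builds 12-byte ids by hand instead of using a driver's ObjectId type; `make_id`, `id_timestamp`, and `missed_since` are hypothetical helpers, and the eight bytes after the timestamp are treated as opaque.

```python
import struct


def make_id(seconds, tail):
    """Build a 12-byte id: a 4-byte big-endian timestamp followed by 8 opaque bytes."""
    if len(tail) != 8:
        raise ValueError("tail must be exactly 8 bytes")
    return struct.pack(">I", seconds) + tail


def id_timestamp(object_id):
    """Recover the creation time (seconds since the Unix epoch) from an id."""
    return struct.unpack(">I", object_id[:4])[0]


def missed_since(notifications, last_known_id):
    """Return notifications whose id sorts after last_known_id, oldest first."""
    newer = [n for n in notifications if n["_id"] > last_known_id]
    return sorted(newer, key=lambda n: n["_id"])
```

Because the timestamp occupies the most significant bytes, byte-wise comparison orders ids by creation second regardless of the trailing bytes. Ids created within the same second on different machines are not strictly ordered, which is why the approach works "generally" rather than always.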

Update embedded data on referenced data update

I am building a Meteor application and am currently creating the publications, and I am coming up against what seems like a common design quandary around related vs embedded documents. My data model (simplified) has Bookings, each of which has a related Client and a related Service. In order to optimise the speed of retrieving a collection I am embedding the key fields of a Client and Service in the Booking, and also linking to the ID. My Booking model has the following structure:
export interface Booking extends CollectionObject {
client_name: string;
service_name: string;
client_id: string;
service_id: string;
bookingDate: Date;
duration: number;
price: number;
}
In this model, client_id and service_id are references to the linked documents and client_name / service_name are embedded as they are used when displaying a list of bookings.
This all seems fine to me however the missing part of the puzzle is keeping this embedded data up to date. If a user in a separate part of the system updates a service (which would be a reactive collection) then I need this to trigger an update of the service_name to any bookings with the corresponding service ID. Is there an event I should subscribe to for this or am I able to? Client side, I have a form which allows the user to add / edit a Service which simply uses the insert or update method on the MongoObservable collection - the OOP part of me feels like this needs to be overridden in the server code to also then update the related data or am I completely going about this the wrong way?
Is this all irrelevant, and should I actually just use https://atmospherejs.com/reywood/publish-composite and return collections of related documents? (It just feels like it would harm performance in a production environment when returning several hundred bookings at once.)

i use a lot of the "foreign key" concept as you're describing, and do de-normalize data across collection as you're doing with the service name. i do this explicitly to avoid extra lookups / publishes.
i use 2 strategies to keep things up to date. the first is done when the source data is saved, say in a Meteor method call. i'll update the de-normalized data on the spot, touching the other collection(s). i would do all this in a "high read, low write" scenario.
the other strategy is to use collection hooks to fire when the source collection is updated. i use this package: matb33:collection-hooks
conceptually, it's similar to the first, but the hook into knowing when to do it is different.
an example we're using in the current app i'm working on: we have a news feed with comments. news items and comments are in separate collections, and each record in the comment collection has the id of the associated news item.
we keep a running comment count associated with the news item itself. whenever a comment is added or removed, we increment/decrement the count and update the news item right away.
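Here is a rough sketch of the first strategy (updating the de-normalized copies at the moment the source is saved), written in plain Python with dicts and lists standing in for collections. The function names and the field names service_name, service_id, news_id, and comment_count are assumptions modelled on the examples in this thread, not Meteor APIs.

```python
def rename_service(services, bookings, service_id, new_name):
    """Update the source record, then touch every booking that embeds its name."""
    services[service_id]["name"] = new_name
    updated = 0
    for booking in bookings:
        if booking["service_id"] == service_id:
            booking["service_name"] = new_name
            updated += 1
    return updated


def add_comment(news_items, comments, news_id, text):
    """Store the comment and bump the running count on its news item."""
    comments.append({"news_id": news_id, "text": text})
    news_items[news_id]["comment_count"] += 1


def remove_comment(news_items, comments, index):
    """Delete the comment and decrement the count on its news item."""
    comment = comments.pop(index)
    news_items[comment["news_id"]]["comment_count"] -= 1
```

In a collection-hook setup the same bodies would run from an "after update" or "after insert" callback instead of being called directly from the method that saves the source data.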

DB relationship: implementing a conversation

I want to implement a simple conversation feature, where each conversation has a set of messages between two users. My question is, if I have a reference from a message to a conversation, whether I should have a reference the other way as well.
Right now, each message has a conversationId. To retrieve all the messages that belong to a certain conversation, I do Message.find({conversationId: ..}). If I had stored an array of messages in the conversation object, I could do conversation.messages.
Which way is the convention?

It all depends on usage patterns. First, you normalize: 1 conversation has many messages, 1 message belongs to 1 conversation. That means you've got a 1-to-many (1:M) relationship between conversations and messages.
In a 1:M relationship, the SQL standard is to assign the "1" as a foreign key to each of the "M". So, each message would have the conversationId.
In Mongo, you have the option of doing the opposite via arrays. Like you said, you could store an array of messageIds in the conversation. This gets pretty messy because for every new message, you have to edit the conversation doc. You're essentially doubling your writes to the DB & keeping the 2 writes in sync is completely on you (e.g. what if the user deletes a message & it's not deleted from the conversation?).
In Mongo, you also have to consider the difference between 1:M and 1:F (1-to-few). Many times, it's advantageous to nest 1:F relationships, i.e. make the "F" a subdoc of the "1". There is a limit: each doc cannot exceed 16MB (this may lift in future versions). The advantage of nesting subdocs is that updates are atomic because it's all the same doc, not to mention that subscriptions in a pub/sub are easier. This may work, but if you've got a group chat with 20 friends that's been going on for the last 4 years, you might have to get clever (cap it, start a new conversation, etc.).
Nesting would be my recommendation, although your original idea of assigning a conversationId to each message works too (make sure to index it!).

Ensure data coherence across documents in MongoDB

I'm working on a Rails app that implements some social network features such as relationships, following, etc. Everything was fine until I came across a problem with many-to-many relations. As you know, mongo lacks joins, so the recommended workaround is to store the relation as an array of ids on both related documents. OK, it's a bit redundant, but it should work. Let's say:
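field :followers, type: Array, default: []
field :following, type: Array, default: []

# Record that this user follows `who`, updating both documents.
def follow!(who)
  self.following << who.id  # this user now follows `who`
  who.followers << self.id  # `who` gains this user as a follower
  self.save
  who.save
end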
That works pretty well, but this is one of those cases where we would need a transaction, and mongo doesn't support transactions. What if the id is added to the followed user's followers list but not to the follower's following list? I mean, what if the first document is modified properly but the second can't be updated for some reason?
Maybe I'm too pessimistic, but isn't there a better solution?

I would recommend storing relationships in only one direction: store the users someone follows in their user document as "following". Then, if you need to find all followers of user U1, you can query the users collection for {following: "U1"}. Since you can have a multikey index on an array, this query will be fast if you index this field.
The other reason to go in that direction only is a single user has a practical limit to how many different users they may be following. But the number of followers that a really popular user may have could be close to the total number of users in your system. You want to avoid creating an array in a document that could be that large.
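A minimal Python sketch of the one-direction model follows, with a dict standing in for the users collection. The helpers `follow` and `followers_of` are hypothetical, and the linear scan in `followers_of` is only a placeholder for what an indexed query on the following field would do.

```python
def follow(users, follower_id, followed_id):
    """Record the relationship in exactly one place: the follower's document."""
    following = users[follower_id]["following"]
    if followed_id not in following:
        following.append(followed_id)


def followers_of(users, user_id):
    """Derive followers by searching 'following' arrays (an index lookup in a real database)."""
    return sorted(uid for uid, doc in users.items() if user_id in doc["following"])
```

Because each follow touches a single document, there is no second write that can fail and leave the two sides out of sync.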
