Can creating a table per synchronous chat instance be a bad idea? - mongodb

I am developing a chat-style web app to experiment with live group editing.
My idea is something like Wave, where you can edit even messages you have already sent.
I was planning to use MongoDB or something similar, with one table per chat.
My reason for that is: say there are 100 messages in one instance of a chat, and we have 10 such chats. Then there will be 1000 messages in the table that stores them, so even when a person in one chat edits a message, the DB has to look through all 1000. If I use a table per chat instead, I felt it could improve speed and performance.
But I want to hear from people who have done this before.

There are at least 2 obvious issues with the approach (without even getting into the schema design):
You can't have an unlimited number of namespaces (I am not sure how big your use case is).
Check this: http://docs.mongodb.org/manual/reference/limits/#namespaces
The write lock is per database, so splitting into multiple tables/collections won't make inserts any faster.
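
For contrast, here is a minimal sketch of the usual single-collection layout with pymongo (the database, collection, and field names are my assumptions, not from the question). With an index on chat_id, editing one message is a single indexed update, not a scan of all 1000 messages:

    from pymongo import MongoClient, ASCENDING

    db = MongoClient().chatapp  # hypothetical database name

    # One collection for all chats; the compound index keeps per-chat
    # lookups cheap no matter how many chats share the namespace.
    db.messages.create_index([("chat_id", ASCENDING), ("created_at", ASCENDING)])

    def edit_message(chat_id, message_id, new_text):
        # A single indexed update touches just this one document.
        db.messages.update_one(
            {"_id": message_id, "chat_id": chat_id},
            {"$set": {"text": new_text}},
        )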

Related

Pagination and listing in APIs

I want to ask you about lists and pagination in APIs.
I want to build a long list on the home screen, which means this request will get a lot of traffic because it's the main screen, and I want to build it in a good way to handle that traffic.
After searching for ways to implement it, I have these questions:
Can I depend on PostgreSQL for pagination, or do I need to use a search engine like Solr?
If I depend on the database and users start visiting the app, this request will submit a lot of queries to the database. Is this going to kill the database?
Also, I'm using Redis to cache some data, and this will handle some of the traffic, but the problem with the home screen is that the response is too large and I can't cache all of it in one key in Redis.
Can anyone explain to me the best way to implement pagination for this request? Pagination is the only thing I want; I'm not looking to implement full-text search. But to handle the traffic, I read that a search engine would handle it so as not to affect or kill the database.
Thanks a lot :D
You can do this seamlessly with the pagination features we know in PostgreSQL. PostgreSQL has enough functions and capabilities to do this (LIMIT, OFFSET, FETCH).
But let me give you a recommendation.
There are several types of pagination.
The first type is where the count of pages must be known in advance. This technique is outdated and not recommended, because you need to know the count of records in the table, and counting records is a very slow operation, especially on large tables.
The second type is where the number of pages is not known in advance; information from the next page is fetched in parts, only when needed. This is what Google, LinkedIn, and other big companies use. In this case, there is no need to count the records of any table.
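
A minimal sketch of that second type (often called keyset or cursor pagination) in Python with psycopg2; the posts table, its columns, and the connection string are assumptions for illustration:

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical connection string

    def fetch_page(last_created_at=None, last_id=None, page_size=20):
        """Fetch the next page of posts after the given cursor."""
        with conn.cursor() as cur:
            if last_created_at is None:
                # First page: just take the newest rows.
                cur.execute(
                    "SELECT id, title, created_at FROM posts "
                    "ORDER BY created_at DESC, id DESC LIMIT %s",
                    (page_size,),
                )
            else:
                # Later pages: seek past the last row of the previous page.
                # Unlike OFFSET, the cost stays constant however deep the user scrolls.
                cur.execute(
                    "SELECT id, title, created_at FROM posts "
                    "WHERE (created_at, id) < (%s, %s) "
                    "ORDER BY created_at DESC, id DESC LIMIT %s",
                    (last_created_at, last_id, page_size),
                )
            return cur.fetchall()

The client passes back the created_at and id of the last row it received, so each request only needs an index on (created_at, id).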

Proper state management architecture to implement read/unread of items

Context: We are implementing a news app. For now, you can assume the news to be the same across all users, and that it maintains an order based on the parameters we set (according to trends and date).
Problem: We are not sure what the best implementation is for keeping track of what users have read. We want to be able to configure a way in which we can track what users read and what they didn't.
Assumption: You can assume that the posts in the database are in a descending order, based on time.
So, the ideal scenario is: when posts A, B, C, D, E are fetched from the server in the app and the user reads A and B, the user only gets to see C, D, E when they check for the next posts. If they go to previous, they see posts in the order B -> A.
Furthermore, when P and Q are added to the database, the user must see the next posts in the order P -> Q -> C -> D -> E, and so on.
Example: Let us assume there are 20 news posts in our app right now, and Gavin picks up his phone and starts reading from our app. In the midst of his usage, he finds himself occupied with some other work, so he quits the app after reading 5 news posts.
The challenge for us now is to figure out the best way to make sure Gavin doesn't have to re-read the 5 posts he already did.
One way we thought we could solve this problem is through the use of an index. We can assume uniform ordering for our posts, as mentioned in the context, so we could use an index to track where Gavin last was in the order of news and show him news based on that index.
However, one problem with that approach is that we could easily have 5 new posts by the time Gavin picks up his phone and uses our app again. So, if we order the news by date, that indexing approach technically means we omit the 5 unread new posts instead of the 5 read old ones.
We've also thought of maintaining three lists: Read, Unread, and New, so that we fetch only posts that are not in our lists. For example, in my initial example, A-B-C-D-E is in Unread initially. Then, after the user reads A-B, Read becomes A-B. Meanwhile, when P-Q is added in the database, the Unread list becomes P-Q-C-D-E.
How do you solve this problem? Any suggestions are welcome, as we kind of think we're not thinking outside the box when it comes to a solution for this problem. Thank you! :)
When I first read the problem, the solution that came to my mind was also to have 2 different lists, read and unread, where new posts are added to the end of the unread list and the unread list is shown in reverse order, so the most recent posts are on top. However, is it the most efficient way? Debatable. For example, if the number of new posts increases a lot, it will be memory-inefficient. But I assume small numbers in general.
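
To make that concrete, here is a minimal pymongo sketch of the two-list idea, storing the read set per user instead of materializing an unread list (database, collection, and field names are assumptions):

    from pymongo import MongoClient, DESCENDING

    db = MongoClient().newsapp  # hypothetical database name

    def mark_read(user_id, post_id):
        # One document per user holding the set of read post ids.
        db.read_state.update_one(
            {"_id": user_id},
            {"$addToSet": {"read": post_id}},
            upsert=True,
        )

    def next_unread(user_id, limit=5):
        state = db.read_state.find_one({"_id": user_id}) or {"read": []}
        # Newest first, already-read filtered out: freshly added posts (P, Q)
        # naturally come before older unread ones (C, D, E).
        return list(
            db.posts.find({"_id": {"$nin": state["read"]}})
            .sort("created_at", DESCENDING)
            .limit(limit)
        )

The read set grows with each user's history, which is the memory trade-off mentioned above; capping or archiving it is a separate decision.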

Instant Messaging Schema design advice

I'm trying to build Instant Messaging functionality in my app as part of a bigger project.
Chats can have more than 2 participants (group chats).
If participant A deletes a message, it should still be visible to participant B (that's why I used the Message Participants table).
The same applies to Conversations.
By the same logic, if all participants delete the conversation/message, it should be erased from the DB.
Questions:
I'm afraid that this schema is too cumbersome, meaning that the queries will be too slow once the app passes a certain traffic mark (1k active users? I'm guessing).
Message Participants will have multiple records for each message - one for each participant in the chat. Instant messaging means those writes will come with very tight timings. Wouldn't that be a problem?
Should I add a Redis layer to manage the messaging of a chat's active sessions? It would store the recent messages and actively sync the PostgreSQL DB with those messages (perhaps with the asynchronous transaction functionality that PostgreSQL has?).
UPDATED schema :
I would also gladly hear ideas for a "read" status functionality. I'm assuming it's much more complex with group chats, so at least offering it for 1:1 chats would be nice.
I am a little confused by your diagram. Shouldn't the Conversation Participants be linked to the Conversations instead of the Messages? The FKs look all right; just the lines appear wrong.
I wouldn't be worried about performance yet. The Premature Optimization Anti-Pattern warns us not to give up a clean design for performance reasons until we know whether we are going to have a performance problem. You anticipate 1000 users - that's not much for a modern information system. Even if they are all active at the same time and enter a message every 10 seconds, this will just mean 100 transactions per second, which is nothing to be afraid of. Of course, I don't know the platform on which you are going to run this. But it should be an easy task to set up those tables and write a simple test program that inserts those records as fast as possible.
Your second question makes me wonder how "instant" you expect your message passing to be. Shall all viewers of a message receive each keystroke of the text within a millisecond? Or do they just need to see each message appear right after it was posted? Anyway, the limiting factor for user responsiveness will probably be the network, not the database.
Maybe this is not mainly a database design issue. Let's assume you will have a tremendous rate of postings and viewings. But not all conversations will be busy all the time. If the need arises - but not earlier - it might be necessary to hold the currently busy conversations in memory and use the database just as a backup for the times when they aren't busy any more.
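
If it ever comes to that, a minimal sketch of such a layer with redis-py (the key layout is an assumption; the durable INSERT into PostgreSQL is deliberately left out):

    import redis

    r = redis.Redis()

    def post_message(conversation_id, message_json):
        key = f"conv:{conversation_id}:recent"
        # Keep the 100 most recent messages of a busy conversation in memory
        # so active viewers never hit the database for the hot window.
        r.lpush(key, message_json)
        r.ltrim(key, 0, 99)
        # The database write (the durable copy) would follow here.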
Concerning your additional comments:
100k users: This is a topic not for this forum, but for the business development of a startup. Many founders of startup companies imagine huge masses of users being attracted to their site, while in reality most startups just fail or reach only very few. So beware of investments (in money, but also in design and implementation effort) that will only pay off in the highly improbable case that your company becomes the next WhatsApp.
In case you don't really anticipate such masses of users but just want to imagine this as a programming exercise, you still have a difficult task. You won't have the platform to simulate the traffic, so there is no way to make measurements on where you actually have a performance problem to solve. That's one of the reasons for the Premature Optimization warning: Unless you know positively where you have a bottleneck, you - and all of us - will be just guessing and probably make the wrong decisions.
Marking a message as read is easy: introduce a boolean attribute read on Message Participants, and set it to true as soon as, well, the user has read the message. It's up to your business requirements in which cases and to whom you show this.
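
A minimal sketch of that flag in PostgreSQL via psycopg2 (table and column names are assumptions, since I can't see the diagram):

    import psycopg2

    conn = psycopg2.connect("dbname=chat")  # hypothetical connection string

    with conn, conn.cursor() as cur:
        # One row per (message, participant): per-participant delete and
        # read state live here, independent of other participants.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS message_participants (
                message_id bigint NOT NULL,
                user_id    bigint NOT NULL,
                deleted    boolean NOT NULL DEFAULT false,
                read       boolean NOT NULL DEFAULT false,
                PRIMARY KEY (message_id, user_id)
            )
        """)
        # Mark one message as read for one participant.
        cur.execute(
            "UPDATE message_participants SET read = true "
            "WHERE message_id = %s AND user_id = %s",
            (42, 7),  # hypothetical ids
        )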

NoSQL database design using single table

How satisfied are you with the statement "You should maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table." ?
I have listed down my use cases for a NoSQL design, but restricting myself to a single-table design makes the design complex and requires every developer working with the NoSQL table to understand the logical complexity of adhering to the principles of partitioning and performance gain. Just to name a few things, my app does the following:
Registers a user account with a mobile device.
Allows multiple users to check their account on any mobile device that has the app installed.
Logs user activities, which could range from 10 to 1000 per user per day.
Runs a batch job which periodically checks whether any user account got updates, and then sends a notification to all devices the user has logged in on (as the last logged-in user on the device).
Of course, these FCM notifications are based on the user's notification preferences.
And lastly, the user account is hinged on a unique email address, and the user can update all other attributes of their account from any device.
I found that the batch-processing job, which periodically scans the whole table, forced me to create a second table so that I can spawn as many threads as needed to process the rows.
How can I make this all fit in a single table ?
So this is all Rick Houlihan's fault. He's a Principal Engineer at AWS, focusing on DynamoDB.
So the opening statement is that most well-designed applications require only one table; what is missing is the design guidance on how you should structure your table to be effective.
For a single-table design, in your example you would want a partition per device and a partition per user, to which you would write the relevant information per device or per user. You would mix and match all of this in your single table. You would use global secondary indexes to retrieve the relevant data, but that index design would depend on your access patterns.
Essentially the way you model your data for a single table design is completely different to an RDBMS, so you need to throw everything you knew out the window and learn it all again.
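
As a rough illustration only (the table name, PK/SK layout, and GSI attributes below are my assumptions, not a prescribed design), the partition-per-user/per-device idea could look like this with boto3:

    import boto3

    table = boto3.resource("dynamodb").Table("app")  # hypothetical single table

    # User profile item: one partition per user.
    table.put_item(Item={
        "PK": "USER#alice@example.com",
        "SK": "PROFILE",
        "name": "Alice",
    })

    # Device item in the same partition, so one Query returns user + devices.
    # GSI1 inverts the relationship: look up the last logged-in user per device.
    table.put_item(Item={
        "PK": "USER#alice@example.com",
        "SK": "DEVICE#d-123",
        "fcm_token": "fcm-token-abc",
        "GSI1PK": "DEVICE#d-123",
        "GSI1SK": "USER#alice@example.com",
    })

    # Activity log items sort chronologically within the user's partition,
    # so the notification job can Query per user instead of scanning the table.
    table.put_item(Item={
        "PK": "USER#alice@example.com",
        "SK": "ACTIVITY#2024-01-15T10:00:00Z",
        "action": "login",
    })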
I recommend reading these blog posts
https://www.trek10.com/blog/dynamodb-single-table-relational-modeling/
https://www.jeremydaly.com/takeaways-from-dynamodb-deep-dive-advanced-design-patterns-dat403/
and watching Rick's reInvent session on it multiple times... typically by about the 10th time light bulbs start going off...
Rick's DynamoDB Advanced Design Patterns talk - https://www.youtube.com/watch?v=6yqfmXiZTlM
There is a fair amount of depth to it. Good luck!

How to implement the number of views of a particular page

So basically I want to implement the same functionality as StackOverflow's:
viewed 59344 times
So here is some background information:
I want to count only unique visits. The assumption is that registered users will read the article many times (it is evolving).
I use MongoDB as a store
I would like it to be close to real-time
My system will have a registration, but I want to count the views of anonymous users as well
I understand that the best way to count unique visits is through registration, but the thing is that a big chunk of users will be just passive readers who do not need to create an account to read the information in the application. As far as I understand, the most convenient way is to save the IP address of every user who reads the post. I also understand that IP addresses will not provide uniqueness (different users may have the same IP because they are behind the same ISP, and one user can have different IPs by using proxies, Tor, etc.).
The use of Mongo is not absolutely essential; it's just that everything is written in Mongo right now, so I will switch only if it would be much faster/more convenient.
Background
Are you certain you need to track "unique" views?
I actually wouldn't expect popular sites to try to keep the view counts unique - bigger is better, and re-visits for new comments are still additional "views" in the sense of showing new content/comments/ads. There are other possible subtleties to "correctness" that may or may not be important for your use case, such as excluding crawlers or your own company's users/IPs.
Instead of spending time tracking unique views (which isn't overly meaningful), I would look at counting unique user interactions such as voting/liking/commenting on the page. You can then determine the "popularity" of a page with some formula based on those metrics. There is an interesting example of this approach in the Radioactivity module for Drupal, where a "hotness" metric is calculated based on the recency of user interactions.
Approaches to consider
1) For a simple view counter in MongoDB, I would just use $inc to bump up the view count when the page is loaded. You can exclude logging views by user role as needed (for example, admin users). See the sketch after this list.
2) For a more accurate view counter I would pass off the problem to a web analytics platform (which you should be using with your site for more detailed analysis anyway). For example, you can use Google Analytics API or an open source application like Piwik. Web analytics systems already have solutions in place for determining unique users/views, and the API calls for these can be asynchronous via JavaScript.
3) If implementing your own unique view tracking is a definite requirement, I would use a separate collection for tracking views and upsert based on your uniqueness criteria (a unique view per (user, article) pair for registered users, or per (session_id, article) pair for anonymous users). I would combine this with approach #1 (incrementing a counter of article views) by incrementing the counter only if the upsert results in an insert.
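
A minimal pymongo sketch of approaches #1 and #3 combined (database, collection, and field names are assumptions):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    db = MongoClient().blog  # hypothetical database name

    def count_view(article_id, viewer_key):
        """viewer_key: user id for registered users, session id for anonymous ones."""
        # Approach #3: upsert one document per (viewer, article) pair.
        result = db.views.update_one(
            {"viewer": viewer_key, "article": article_id},
            {"$setOnInsert": {"first_seen": datetime.now(timezone.utc)}},
            upsert=True,
        )
        # Approach #1: bump the counter only on a first (unique) view.
        if result.upserted_id is not None:
            db.articles.update_one({"_id": article_id}, {"$inc": {"views": 1}})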
One of the ways you can solve the problem is by using cookies: once a user has visited the page, you can add a cookie saying that he has already visited the page, so you do not need to count him again. You can keep appending keys to track all the pages he has visited. I know cookies can be deleted, but any solution will have a trade-off.
From the MongoDB perspective, if you want very fast inserts and reads, here are a couple of things you can do.
1) As you create an article, create a document like this in, say, your log collection:
{"_id" : "Article URL", "Hit" : 0}
Why am I not suggesting storing IP addresses or any other information? Because as you add IP addresses, the size of the document changes, and MongoDB needs to find newly allocated space for it, which is bad from a performance angle. As you are only incrementing the counter, it will not increase the size of the document, so the document never needs to move. Plus, you have a limit on the maximum size of a document.
2) Creating the document in advance gives you a direct update statement, with no need to check whether a document for the article ID exists.
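
A short pymongo sketch of that pattern (the collection name is an assumption; "Article URL" stands in for the real URL):

    from pymongo import MongoClient

    log = MongoClient().blog.log  # hypothetical log collection

    # 1) Pre-create the fixed-size counter document when the article is created.
    log.insert_one({"_id": "Article URL", "Hit": 0})

    # 2) On every page load, one in-place increment; the document size never
    #    changes, so MongoDB never has to relocate it.
    log.update_one({"_id": "Article URL"}, {"$inc": {"Hit": 1}})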