Twitter exercise with MongoDB and lack of transactions? - mongodb

I was trying to figure out whether MongoDB needs transactions, and why you wouldn't just keep everything in a single document. I also know Twitter uses HBase, which does have transactions, so I thought about a tweet and its watchers.
If I post a tweet, it will be inserted with no problem. But how would I, or anyone else, find my tweet? I've heard MongoDB has indexes, so maybe I can index the author and find my tweet that way, but I can't imagine that being efficient if everyone does it. Time would have to be indexed as well.
From what I understand (I think I saw some slides Twitter released), Twitter keeps a 'timeline' per user: every time a person tweets, Twitter inserts the tweet id into each follower's timeline, which is indexed by date, and when a given user browses, it grabs the available tweets sorted by time.
How would that be done in MongoDB? The only solution I can think of is adding a field to the tweet document, say {SendOut: DateStamp}, which is removed once delivery completes. If delivery didn't complete on the first attempt (checking the timestamp to guess whether it should have finished by now), I would then check all the watchers to see who hasn't received the tweet and insert it for those who haven't. But since there are no transactions, I guess I'd also need to index the SendOut field? Would this solution work? And if it wouldn't, how can I efficiently insert a tweet and deliver it to everyone watching the user?

It sounds like you're describing a model similar to pub/sub. Couldn't you instead just track, with each user object, the last post (by date) that the user read? Users would request tweets the same way, using various indexes including time.
I'm not sure what you need transactions for, but Mongo does support atomic operations.
[Updated]
So in other words, each user's object stores the dateTime of the last tweet read/delivered. Obviously you would also need the list of subscribed author IDs. To fetch new tweets you would query for tweets indexed on the (author_id, time) pair and then sort by time.
By using the last read date from the user object and using it as the secondary index into your tweets collection, I don't believe you need either pub/sub or transactions to do it.
I might be missing something though.
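Here is a minimal sketch of that fan-out-on-read idea, using plain in-memory arrays in place of MongoDB collections. The document shapes and the `following` and `lastReadAt` field names are illustrative assumptions, not Twitter's actual schema:

```javascript
// Stand-ins for a tweets collection and one user document.
const tweets = [
  { _id: 1, authorId: "a", time: 100, text: "first" },
  { _id: 2, authorId: "b", time: 200, text: "second" },
  { _id: 3, authorId: "a", time: 300, text: "third" },
];

const user = { _id: "u1", following: ["a", "b"], lastReadAt: 150 };

// Equivalent to:
//   db.tweets.find({ authorId: { $in: user.following },
//                    time:     { $gt: user.lastReadAt } }).sort({ time: 1 })
// served by a compound index { authorId: 1, time: 1 }.
function fetchNewTweets(user, tweets) {
  return tweets
    .filter(t => user.following.includes(t.authorId) && t.time > user.lastReadAt)
    .sort((x, y) => x.time - y.time);
}

const fresh = fetchNewTweets(user, tweets);
// Advance the read cursor after delivery. This is a single-document
// update, which is atomic in MongoDB, so no transaction is needed.
if (fresh.length) user.lastReadAt = fresh[fresh.length - 1].time;
```

Nothing is ever written per follower at tweet time, which is why the missing transactions don't hurt here.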

Related

Any way in Firebase Firestore to sustainably compare data collections?

I'm fairly new to Firestore NoSQL but not databases or structuring. I know this is not a new question but I can't seem to wrap my head around some of the answers.
I have a simple react native rating style app that presents the items to be rated in a list format, most popular items first. My goal is to not present the user with the same item in the future or on refresh once they have already rated the item.
I have 3 collections: a standard 'user', a 'ratings', and a 'scores'. The item info and the overall calculated rating score are held in 'scores'. The 'ratings' collection keeps track of each user's actual score for each item so it can be searched and changed. The list is built by querying 'scores' ordered by highest calculated score.
Here's where I'm stuck. Even though I technically have a reference to everything a user has voted on in the 'ratings' collection, all of the examples I can find suggest storing an array of reference ids for each item voted on in the user's document in the 'user' collection. When I pull all the items to be scored, I check each item id to see if it has already been scored before I push it into a data array. Here are the problems I see...
If the user has already scored 200 of the first 250 top results, Firestore will count 250 reads each refresh even though the user is only seeing 50 new items. This could scale to a huge cost/problem just to show a few new or extra items.
Firebase arrays I believe have a max of 40,000 index entries per document. Probably never an issue someone rating 40,000 different items but seems crazy to structure data with a dead end.
On the same concept of 40,000 entries, I have played with storing user ids in the 'scores' collection on items the user has voted on. I tried using 'not-in' from the Firestore docs.
const q = query(scoresRef, where('uids', 'not-in', [user.id]));
This seems to work only when the field is a single string value; it appears to have no effect when the field is an array with more than one element.
I don't know if I'm overlooking an obvious answer or trying to force the impossible. I just can't figure out how Firestore can back anything big without some relational data. Sorry for the long question; I have been coding for weeks and now I'm stuck. Any code, links, or suggestions would be appreciated. Also, if this is just impossible with Firestore, any suggestions for a low-cost cloud SQL backend that plays well with React Native Expo would also be appreciated.

Firestore array-not-contains alternative solution

TL;DR
I have created a Flutter Firestore posts application. I want to present the user only new posts, which they didn't read yet.
How do I achieve that using Firestore query?
The problem
Each time a user sees a post, their id is added to the post's views field.
Next time the user opens the app, I want to present only posts they didn't read yet.
The problem is that an array-not-contains query is not supported. How do I achieve that functionality?
You're going to have a real hard time with this because Firestore can only give you documents where you know something about the contents of that document. That's how indexes work - by recording data present in the document. Indexes don't track data not present in a document because that's basically an infinite amount of data.
If you are trying to track documents seen by the user, you would think to mark the document as "seen" using a boolean per user, or tracking the document ID somewhere. But as you can see, you can only query for documents that the user has seen, because that's the data present in the system.
What you can do is query for all documents, then query for all the documents the user has seen, then subtract the seen documents from all documents in order to get the unseen documents. But this probably doesn't scale in a way you'd like. (It's essentially the same problem with Firestore indexes not being able to surface documents without some known data present. Firestore won't do the equivalent of a SQL table scan, since that would be a lot of reads you'd have to pay for.)
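A literal, in-memory version of that subtraction, with two arrays standing in for the two query results (the field names are assumptions):

```javascript
// All posts, as returned by a query over the whole collection, and the
// ids of posts this user has seen, as returned by a second query.
// Every one of these documents costs a read in Firestore, which is why
// this approach scales poorly.
const allPosts = [{ id: "p1" }, { id: "p2" }, { id: "p3" }];
const seenIds = new Set(["p1", "p3"]);

// Unseen = all minus seen; the subtraction happens client-side.
const unseen = allPosts.filter(p => !seenIds.has(p.id));
```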
You can kind of fake it by making sure there is a creation timestamp in each document, and record for each user the timestamp of the most recent seen document. If you require that the user must view the documents in chronological order, then you can simply query for documents with a creation timestamp greater than the timestamp of the latest document seen by the user. This is really as good as it's going to get with Firestore, since you can't query for the absence of data.
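A sketch of that timestamp watermark, again with an in-memory array in place of the collection. `createdAt` and `lastSeenAt` are assumed field names; the real Firestore query would be roughly `query(postsRef, where('createdAt', '>', lastSeenAt), orderBy('createdAt'))`:

```javascript
// Stand-in for the posts collection, each post with a creation timestamp.
const posts = [
  { id: "p1", createdAt: 100 },
  { id: "p2", createdAt: 200 },
  { id: "p3", createdAt: 300 },
];

// Per-user watermark: the timestamp of the latest post already seen.
let lastSeenAt = 150;

function fetchUnseen(posts, lastSeenAt) {
  return posts
    .filter(p => p.createdAt > lastSeenAt)
    .sort((a, b) => a.createdAt - b.createdAt);
}

const unseen = fetchUnseen(posts, lastSeenAt);
// After the user views them, advance the watermark to the newest one.
if (unseen.length) lastSeenAt = unseen[unseen.length - 1].createdAt;
```

Only documents newer than the watermark are ever read, so cost tracks what the user actually sees.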

Is a good idea to store chat messages in a mongodb collection?

I'm developing a chat app with node.js, redis, socket.io and MongoDB. MongoDB comes last, for persisting the messages.
My question is what would be the best approach for this last step?
I'm afraid a collection with all the messages like
{
id,
from,
to,
datetime,
message
}
can get too big too soon, and is going to get very slow for reading purposes, what do you think?
Is there a better approach you already worked with?
In MongoDB, you store your data in the format you will want to read them later.
If what you read from the database is a list of messages filtered on the 'to' field and with a dynamic datetime filter, then this schema is the perfect fit.
Don't forget to add an index on the fields you will be querying on; then it will be reasonably fast to query them, even over millions of records.
If you would, for example, always show a full history of a full day, you would store all messages for a single day in one document. If both types of queries occur a lot, you would even store your messages in both formats.
If storage is an issue, you could also use a TTL index, which will automatically delete messages older than, e.g., one year (a capped collection is similar, but it bounds the collection by size rather than by age).
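As an in-memory sketch of that indexed lookup (collection and field names taken from the schema in the question), the MongoDB equivalent would be `db.messages.createIndex({ to: 1, datetime: 1 })` followed by a `find` on those two fields:

```javascript
// Stand-in for the messages collection from the question's schema.
const messages = [
  { from: "alice", to: "bob",  datetime: 100, message: "hi" },
  { from: "carol", to: "bob",  datetime: 250, message: "hey" },
  { from: "alice", to: "dave", datetime: 300, message: "yo" },
];

// Equivalent to:
//   db.messages.find({ to: to, datetime: { $gte: since } })
// which the compound { to: 1, datetime: 1 } index serves directly.
function messagesTo(to, since) {
  return messages.filter(m => m.to === to && m.datetime >= since);
}

const recentForBob = messagesTo("bob", 200);
```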
I think the db structure is fine the way you mentioned in your question.
You may assign a unique id to the chat between each pair of users and keep it in each chat record, then retrieve based on that id when you want to show the conversation.
Say 12 is the unique id for the chat between A and B; then retrieval should be based on 12 when you want to show the chat for A and B.
So your db structure can be like:-
{
id,
from,
to,
datetime,
message,
uid
}
Remember, you can optimize retrieval by fetching a limited batch (say 100 chats at a time). If the user scrolls beyond those 100, retrieve the next 100, which avoids a lot of unnecessary reads.
When using a limit, retrieve based on the creation date and use sort with the find query as well.
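A sketch of that paged retrieval, in memory. The MongoDB equivalent would be roughly `db.messages.find({ uid: chatId }).sort({ datetime: -1 }).skip(page * 100).limit(100)`; for deep scrolling, a range query on `datetime` scales better than `skip`:

```javascript
// Newest-first pages of a single chat, 100 messages at a time.
function pageOfChat(messages, chatId, page, pageSize = 100) {
  return messages
    .filter(m => m.uid === chatId)                 // one conversation
    .sort((a, b) => b.datetime - a.datetime)       // newest first
    .slice(page * pageSize, (page + 1) * pageSize); // skip + limit
}

// 250 messages in chat 12, datetimes 0..249.
const messages = [];
for (let i = 0; i < 250; i++) {
  messages.push({ uid: 12, datetime: i, message: "m" + i });
}

const firstPage = pageOfChat(messages, 12, 0);
```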
Just a thought here: are the messages plain text, or are you allowed to share images and videos as well?
If it's the latter, then storing all the chats for a single day in one document might not work out.
In fact, if image and video sharing is allowed, you also need to take the 16 MB document size restriction into account.

Database design for queries that has tons of sql-like join

I have a collection named posts consisting of multiple article posts and a collection called users which has a lot of user info. Each post has a field called author that references the post author in the users collection.
On my home page I will query the posts collection and return a list of posts to the client. Since I also want to display the author of the post I need to do sql-like join commands so that all the posts will have author names, ids,...etc.
If I return a list of 40 posts, I'd have to do 40 SQL-like joins, which means 41 queries each time just to get a list of posts with author info. That seems really expensive.
I am thinking to store the author info at the time I am storing the post info. This way I only need to do 1 query to retrieve all posts and author info. However when the user info changes (such as name changes) the list will be outdated and it seems not quite easy to manage lists like this.
So is there's a better or standard approach to this?
p.s: I am using mongodb
MongoDB is a NoSQL DB. By definition, NoSQL solutions are meant to be denormalized (all required data should be located in the same place).
In your example, the relationship between authors and posts is one-to-many, and the number of authors is very small compared to the number of posts.
Based on this, you can safely store author info in the posts collection.
If you know that most of your queries will run against the posts collection, then it makes sense to store the author in posts. It won't take much space to store one attribute, but it will make a huge difference in query performance and make the data much easier to retrieve.
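A sketch of that denormalized write (field names are illustrative; only a small author snapshot is embedded at post time, not the whole user document):

```javascript
// Build the post document with an embedded author snapshot, so the
// home-page query needs no join: one find on posts returns everything.
function buildPostDoc(post, author) {
  return {
    title: post.title,
    body: post.body,
    createdAt: post.createdAt,
    // Embedded copy of just the fields the list page displays. If the
    // author renames, a background update on all posts with this
    // author _id refreshes the copies.
    author: { _id: author._id, name: author.name },
  };
}

const doc = buildPostDoc(
  { title: "Hello", body: "...", createdAt: 1 },
  { _id: "u42", name: "Ada", email: "hidden@example.com" }
);
```

Keeping the snapshot small (id and name only) limits both the storage cost and how much has to be rewritten when the user record changes.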

Use MongoDB in place of MySQL/SQL

We want to store the Facebook output, which is in JSON format, in MongoDB. We were parsing the JSON object and storing its contents as columns in MySQL. What is the best way to store the Facebook data? What steps should I take to avoid storing duplicate posts, and how can we retrieve the data based on conditions?
For example, if we try to retrieve posts about football, Facebook returns a JSON object and we can store it in MongoDB. If we fetch posts about football again after an hour, we should not insert duplicate posts (posted by the same user id at the same time). Similarly, if I want to retrieve records related to football, I should be able to fetch only those records and not records related to tennis, assuming records for tennis exist.
Kindly clarify or post your opinions and suggestions; it would be of great help.
Thanks,
Balaji D
The way you want to store your data in MongoDB depends on the behaviour of your application, e.g. whether it's going to be write-heavy, read-only, etc. I recommend reading about MongoDB schema design. In terms of dealing with Facebook data specifically, you might want to check out this similar post from another user here.
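On the duplicate-post part of the question, one common approach is a unique compound index on the user id plus posted time, e.g. `db.posts.createIndex({ userId: 1, postedAt: 1 }, { unique: true })`, so re-inserting the same post an hour later is rejected. A minimal in-memory sketch, with assumed field names and a Map standing in for the collection:

```javascript
// Map keyed by "userId:postedAt" plays the role of the unique index.
const store = new Map();

function insertPost(post) {
  const key = `${post.userId}:${post.postedAt}`;
  if (store.has(key)) return false; // duplicate, rejected by the index
  store.set(key, post);
  return true;
}

const first = insertPost({ userId: "u1", postedAt: 111, topic: "football" });
const dup   = insertPost({ userId: "u1", postedAt: 111, topic: "football" });

// Conditional retrieval, e.g. only football posts:
//   db.posts.find({ topic: "football" })
function postsAbout(topic) {
  return [...store.values()].filter(p => p.topic === topic);
}
```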