Weird performance drop - MongoDB

Hello everyone!
I am facing a weird performance drop in my Symfony2 app using MongoDB.
The query that creates the problem is the following:
$posts = $dm->getRepository('ngNearBundle:Posts')
    ->findBy(array('author' => new \MongoId($userId)), array('date' => -1), 5);
For one user (the first ever in the database), the whole page loads in 100ms. For another user, it takes 500ms.
MY ASSUMPTION
The first user has almost 140K posts, whereas the second has barely 3, so to gather the limit (5 posts) the cursor has to navigate the whole database looking for 5 posts but only ever finds 3.
If you agree with my assumption, how can I fix this problem? Can indexes help me out here?
I do have an index on the author field, though... Is it because 99% of the posts were written by user 1, so it's easy for Mongo to fetch data for that user?
Please enlighten me.
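For reference, the usual fix for a filter-plus-sort query like this is a compound index covering both the filter field and the sort field. A minimal sketch in the mongo shell, assuming the underlying collection is named Posts:

// Lets MongoDB walk one author's posts already ordered by date,
// so a limit of 5 stops after reading at most 5 index entries:
db.Posts.createIndex({ author: 1, date: -1 });

With only a single-field index (on author or on date), either the filter or the sort has to be resolved by scanning documents, which is one common cause of this kind of asymmetry between users.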

Related

Any way in Firebase Firestore to sustainably compare data collections?

I'm fairly new to Firestore NoSQL but not to databases or structuring. I know this is not a new question, but I can't seem to wrap my head around some of the answers.
I have a simple react native rating style app that presents the items to be rated in a list format, most popular items first. My goal is to not present the user with the same item in the future or on refresh once they have already rated the item.
I have 3 collections: a standard 'user', a 'ratings', and a 'scores'. The item info and overall calculated rating score are held in 'scores'. The 'ratings' collection keeps track of the actual user's score for each item so it can be searched and changed. The list is queried by pulling the 'scores' ordered by highest calculated score.
Here's where I'm stuck. Even though I technically have a reference to everything a user has voted on in the 'ratings' collection, all of the examples I can find suggest storing a reference ID array of each item voted on in the user's 'user' document. When I pull all the items to be scored, I check each item ID to see if it has already been scored before I push it into a data array. Here are the problems I see...
If the user has already scored 200 of the first 250 top results, Firestore will count 250 reads each refresh even though the user is only seeing 50 new items. This could scale to a huge cost/problem just to show a few new or extra items.
Firebase arrays, I believe, have a max of 40,000 index entries per document. That is probably never an issue for someone rating 40,000 different items, but it seems crazy to structure data with a dead end.
On the same concept of 40,000 entries, I have played with the approach of storing user IDs in the 'scores' collection on items the user has voted on. I tried using 'not-in' from the Firestore docs.
import { query, where } from 'firebase/firestore';
const q = query(scoresRef, where('uids', 'not-in', [user.id]));
This seems to only work on single string entries and seems to have no impact on arrays with more than one element.
I don't know if I'm overlooking an obvious answer or if I'm trying to force the impossible. I just can't figure out how Firestore can back anything big without some relational data. Sorry for the long question; I have been coding for weeks and now I'm stuck. Any code, links, or suggestions would be appreciated. Also, if this is just impossible with Firestore, any suggestions for a low-cost cloud SQL backend that plays well with React Native Expo would also be appreciated.
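For what it's worth, here is a minimal sketch of the operator semantics in the v9 modular SDK; the db handle, the scalar field lastVoterUid, and the client-side filter are illustrative assumptions, not recommendations from the Firestore docs:

import { collection, query, where, getDocs } from 'firebase/firestore';

// 'not-in' compares the whole field value against the list, so it only
// behaves as expected on scalar fields (lastVoterUid is a hypothetical
// string field used for illustration):
const scalarQ = query(collection(db, 'scores'),
  where('lastVoterUid', 'not-in', [user.id]));

// For array fields, Firestore offers 'array-contains' (membership) but no
// negated form, which is why 'not-in' has no effect on the 'uids' arrays;
// exclusion generally has to happen client-side after the read:
const snapshot = await getDocs(collection(db, 'scores'));
const unrated = snapshot.docs.filter(d => !(d.data().uids ?? []).includes(user.id));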

Efficiency of compound indexes in MongoDB unclear

I'm looking for a structure to save user data for a Discord bot.
The context is that I need a unique save per user for each Discord server (a.k.a. guild) they are on.
Therefore neither userID nor guildID alone should be unique, but I could use them as a compound index to quickly find users inside the users collection.
Is my train of thought correct so far?
My actual question is:
Which ID should the index be "sorted" by first?
There are multiple hundreds or thousands of users per guild, but a single user is only on about 1-5 of the guilds the bot is on.
Therefore, first searching by guildID would make the amount of data to then search by userID somewhat smaller.
But first searching by userID would make the amount of data to then search by guildID even smaller.
Since the DB will search both indexes completely anyway, so step 1 will be similarly quick for both, the second idea, first filtering by userID and then by guildID, seems more efficient to me.
I'd like to know if my assumption seems viable, and if not, why not.
Or if there would be a better way that I haven't thought of.
Thanks in advance!
Edit: compound indexes worked fine. My data set is still not big enough to see any difference between the two key orderings, so I can't say anything about that.
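A minimal sketch of the two candidate key orders in the mongo shell, using the collection and field names from the question (someUserId and someGuildId are placeholders):

// Option 1: guild first; prefix queries like "all users of a guild"
// can reuse this index:
db.users.createIndex({ guildID: 1, userID: 1 }, { unique: true });

// Option 2: user first; prefix queries like "all guilds of a user" reuse it:
// db.users.createIndex({ userID: 1, guildID: 1 }, { unique: true });

// The combined lookup matches both keys by equality, so either
// ordering serves it equally well:
db.users.findOne({ userID: someUserId, guildID: someGuildId });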

How does Morphia skip deal with new records arriving during pagination?

I've read a lot about using skip with paging (and the related performance issues). For my application, the performance issues are not a problem; however, it's not clear to me what happens with skip if new records arrive in between requesting pages.
For example, let's say I have 10 records, a user requests a page of 5 and we deliver them. When the user is browsing the first page, another 5 records are inserted into the db, the user requests the next page of 5. Assuming we're sorting on id or date, will the user now be returned the same 5 records (because, for the second page, skip skips the first 5 newly added records and returns the next 5, which are now the same records that were originally returned)?
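In mongo shell terms, that scenario looks something like this sketch (the records collection and date field are placeholders), assuming a newest-first sort:

// Page 1: the 5 newest of the 10 existing records:
db.records.find().sort({ date: -1 }).limit(5);

// 5 newer records are inserted, then the user asks for page 2.
// skip(5) now skips exactly the 5 newcomers, so this returns the
// same 5 records the user already saw on page 1:
db.records.find().sort({ date: -1 }).skip(5).limit(5);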
You are correct. Both performance and correctness in the face of added/removed entries are issues.
For a good explanation see http://use-the-index-luke.com/no-offset (Markus Winand has been fighting offset for years ;-) ).
Keyset pagination is neither supported in MongoDB nor in Morphia from what I know, so you'll have to build it yourself. Make sure you're always working with something unique (like date and ID).
Other systems have implemented this feature natively, for example Elasticsearch with search_after.
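A minimal sketch of hand-rolled keyset pagination in the mongo shell, where lastDate and lastId come from the final record of the previous page (collection and field names are placeholders):

// Each page is anchored to the last record seen instead of an offset,
// with _id as the unique tie-breaker, so inserts cannot shift the window:
db.records.find({
  $or: [
    { date: { $lt: lastDate } },
    { date: lastDate, _id: { $lt: lastId } }
  ]
}).sort({ date: -1, _id: -1 }).limit(5);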

Meteor database design to prevent performance degradation

Let's say I have an app that allows users to create ingredients.
I want each user to have 200 ingredients already in the system, ready for them to use and edit upon joining the app. Because I want to allow them the possibility to edit their ingredients, I believe I need to add the 200 ingredients to the database with the createdBy field set to their user ID upon account creation. This way a user will only see and edit their own set of ingredients and not someone else's.
My question is in regards to performance...
If I have 10 users join, the ingredients collection now has 2,000 documents. If in 6 months' time I have 5,000 users, the collection has 1,000,000 documents (plus the ingredients users create on their own). 5,000 users is great, but what if there are even more?
Correct me if I am wrong, but I am pretty sure this would degrade performance as time goes by, wouldn't it? If the query is searching through millions of documents, a user would most likely be waiting quite some time before their ingredients are displayed on the page, right?
Is there any other way I can achieve what I want without this potential problem?
Thanks
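For scale reference, whether millions of documents hurt here depends mostly on indexing: a per-user lookup served by an index touches only that user's ~200 documents, not the whole collection. A minimal sketch in the mongo shell, assuming an ingredients collection with the createdBy field described above:

// One index keyed on the owner:
db.ingredients.createIndex({ createdBy: 1 });

// This read then walks only the index entries for this user,
// regardless of how many documents other users own:
db.ingredients.find({ createdBy: userId });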

Twitter exercise with MongoDB and lack of transactions?

I was trying to figure out whether MongoDB needs transactions and why you wouldn't have everything in a single document. I also know Twitter uses HBase, which does have transactions, so I thought about a tweet and its watchers.
If I post a tweet, it will be inserted with no problem. But how would I or anyone else find my tweet? I heard MongoDB has indexes, so maybe I can index the author and find my tweet; however, I can't imagine that being efficient if everyone does that. Time also has to be indexed.
From what I understand (I think I saw some slides Twitter released), Twitter has a 'timeline': every time a person tweets, Twitter inserts the tweet ID into every follower's timeline, which is indexed by date, and when a given user browses, it grabs the available tweets sorted by time.
How would that be done in MongoDB? The only solution I can think of is having a field in the tweet document, say {SendOut: DateStamp}, which is removed when the fan-out is completed. If it didn't complete on the first attempt (checking the timestamp to guess whether it should have completed by now), I would need to check all the watchers to see who hasn't received it and insert it if they didn't. But since there are no transactions, I guess I also need to index the SendOut field? Would this solution work? How would I efficiently insert a tweet and deliver it to everyone watching the user (if this solution would not work)?
It sounds like you're describing a model similar to pub/sub. Couldn't you instead just track, on each user object, the date of the last post the user read? Users would request tweets the same way, using various indexes including time.
I'm not sure what you need transactions for, but Mongo does support atomic operations.
[Updated]
So in other words, each user's object stores the dateTime of the last tweet read/delivered. Obviously you would also need the list of subscribed author IDs. To fetch new tweets, you would ask for tweets indexed by both the author_id and time properties and then sort by time.
By using the last read date from the user object and using it as the secondary index into your tweets collection, I don't believe you need either pub/sub or transactions to do it.
I might be missing something though.
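A minimal sketch of that fetch in the mongo shell, assuming hypothetical following and lastRead fields on the user document:

// Compound index so the per-author time-range reads stay cheap:
db.tweets.createIndex({ author_id: 1, time: 1 });

// Pull everything the subscribed authors wrote since the user's last read:
const me = db.users.findOne({ _id: myUserId });
db.tweets.find({
  author_id: { $in: me.following },
  time: { $gt: me.lastRead }
}).sort({ time: 1 });

// After delivering, advance lastRead to the newest time returned so the
// next fetch starts where this one left off.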