How to efficiently loop through a MongoDB collection to update a sequence column? - mongodb

I am new to MongoDB/Mongoose and have a challenge I'm trying to solve in order to avoid a performance rabbit hole!
I have a MongoDB collection containing a numeric field called 'sequence'. After inserting a new document, I need to cycle through the collection, starting at the position of the inserted document, and increment the value of sequence by one. This way I maintain a collection of documents numbered from 1 to n (where n = the number of documents in the collection) and can render the collection as a table in which newly inserted records appear in the right place.
Clearly one way to do this is to loop through the collection, incrementing a seq counter on each iteration, and using Model.updateOne() to apply the value of seq to sequence for each document in turn.
My concern is that this involves calling updateOne() potentially hundreds of times, which might not be optimal for performance. Any suggestions on a more efficient approach?
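One way to avoid the per-document loop entirely is a single server-side updateMany with $inc, which shifts every document at or after the insertion point in one round trip. A minimal sketch; the model and field names (Model, sequence) are placeholders matching the question:

```javascript
// Sketch: shift every document at or after `position` up by one in a single
// server-side operation, then insert the new document into the freed slot.
async function insertAtSequence(Model, doc, position) {
  // One round trip: increment `sequence` for all docs at or after `position`
  await Model.updateMany(
    { sequence: { $gte: position } },
    { $inc: { sequence: 1 } }
  );
  // The slot at `position` is now free for the new document
  return Model.create({ ...doc, sequence: position });
}
```

The renumbering cost then scales with the number of shifted documents on the server, not with the number of client round trips.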

Related

MongoDb optimisation

Say I have a MongoDB collection with 5,000 records (or thousands). I want to update this collection with new records.
These new records may be largely the same, with a few updated records among them.
To do this, I have 2 approaches:
1. Iterate over the incoming records and determine which ones to update (based on a condition); if the condition is truthy, update the record.
2. Delete all the existing records (deleteMany), then simply insert the new data.
Which one would be more performant and optimised, and why? Or is there another way?
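A third option worth weighing against the two above is a single bulkWrite of upserts, which updates changed records and inserts new ones in one round trip without deleting anything. A sketch, assuming each record carries a unique key field (a placeholder name):

```javascript
// Sketch: build one upsert operation per incoming record for bulkWrite.
// `key` stands in for whatever unique field identifies a record.
function buildUpsertOps(records) {
  return records.map((r) => ({
    updateOne: {
      filter: { key: r.key },  // match the record's unique key
      update: { $set: r },     // apply the incoming field values
      upsert: true,            // insert if the key doesn't exist yet
    },
  }));
}

// Usage (hypothetical model): await Model.bulkWrite(buildUpsertOps(newRecords));
```

Unlike deleteMany-then-insert, this never leaves the collection momentarily empty and avoids rewriting unchanged documents' index entries wholesale.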

Sharding with array in Cloud Firestore with composite index

I have read in the documentation that write throughput can be limited to 500 writes per second if a collection contains sequential values in an indexed field.
I can add a shard field to avoid this.
Therefore I should add the shard field before the sequential field in a composite index.
But what if my sequential field is an array?
An array must always be the first field in a composite index.
For example:
I have a Collection "users" with an array field "reminders".
The field reminders contains time strings like ["12:15", "17:45", "20:00", ...].
I think these values could result in hot spotting but maybe I am wrong.
I don't know how Firestore handles arrays in composite indexes.
Could my reminders array slow down the writes per second? And if so, how could I implement a shard field? Or is there a completely different solution?
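For the shard-field part of the question, a minimal sketch of the usual pattern: assign each document a small random shard value at write time, so sequential index entries are spread across SHARD_COUNT ranges. Both the field name shard and SHARD_COUNT are assumptions; the right shard count depends on your actual write rate.

```javascript
// Sketch: attach a random shard value (0..SHARD_COUNT-1) to a document
// before writing, to spread hot sequential index ranges across shards.
const SHARD_COUNT = 10; // tunable assumption

function withShard(doc) {
  return { ...doc, shard: Math.floor(Math.random() * SHARD_COUNT) };
}

// Usage (hypothetical): usersRef.add(withShard({ reminders: ['12:15', '17:45'] }));
```

The trade-off is on the read side: a query over all documents must either fan out one query per shard value or use an `in` filter on shard.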

Will the MongoDB $set operator on a non-indexed field be expensive?

I have a collection with multiple indexes, and I often have to push some data into an array in that collection. I tried going through the MongoDB docs, but the best I could find was:
For inserts and updates to un-indexed fields, the overhead for sparse indexes is less than for non-sparse indexes. Also for non-sparse indexes, updates that do not change the record size have less indexing overhead.
I am aware of the difference between sparse and non-sparse indexes, and it makes sense that the overhead for sparse indexes would be less.
But why is it that, even when I am updating just an un-indexed field in my document, all the other indexes have to update? Is it because every index holds the same data and all of it has to be updated?
My Document:
var sample = new Schema({
  // ***
  student_list: [{ type: Schema.Types.Mixed }],
  location: [{ type: Schema.Types.Mixed }],
  // ****
});
student_list.studID will be indexed. A sample element:
{ studID: 1, city: M, Time: "..." }
Now I often have to update the location field. Queries:
db.sample.find({ "student_list.studID": "studid" })
db.sample.find({ "student_list.studID": "studid", "student_list.city": "M" })
both using the student_list_studId_1 index.
Is this approach fine, or should I create a different collection with every student list entry as a separate doc? (Every sample doc will have multiple student IDs, which may be common across different sample docs.)
The reason the indexes are updated on every insert is connected with document size and allocation.
Let's say a document occupies 1765 bytes and we add another 950 bytes (data + BSON overhead). That can trigger a relocation of the document, as it no longer fits in its currently allocated data block, and then the db engine needs to update the pointers in all indexes to point to the new document location.

DB Compound indexing best practices Mongo DB

How costly is it to index some fields in MongoDB?
I have a collection where I want uniqueness across a combination of two fields. Everywhere I search, the suggestion is a compound index with unique set to true. But what I was doing is appending both fields into a single field1_field2 key, so that field2 will always be unique for a given field1 (plus application logic), because I thought indexing was costly.
Also, since the MongoDB documentation advises us not to use custom Object IDs like auto-incrementing numbers, I ended up giving big numbers to models like Classes, Students, etc. (where I could easily have used 1, 2, 3 in SQLite), and I didn't think to add a new field for numbering and index that field for querying.
What is the best-practice advice for production?
The advantage of compound indexes over your own concatenated-field system is that compound indexes allow quicker sorting than regular indexed fields. They also lower the size of every document.
In your case, if you want to get the documents sorted with values in field1 ascending and in field2 descending, it is better to use a compound index. If you only want to get the documents that have some specific value contained in field1_field2, it does not really matter if you use compound indexes or a regular indexed field.
However, if you already have field1 and field2 in separate fields in the documents, and you also have a field containing field1_field2, it could be better to use a compound index on field1 and field2 and simply delete the field containing field1_field2. This could lower the size of every document and ultimately reduce the size of your database.
Regarding the cost of the indexing, you almost have to index field1_field2 anyway if you want to go down that route. Queries based on unindexed fields in MongoDB are really slow. And it does not take much more time to add a document to a database when the document has an indexed field (we're talking 1 millisecond or so). Note that adding an index on many existing documents can take a few minutes. This is why you usually plan the indexing strategy before adding any documents.
TL;DR:
If you have limited disk space or need to sort the results, go with a compound index and delete field1_field2. Otherwise, use field1_field2, but it has to be indexed!
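For concreteness, the compound unique index recommended above can be expressed as the key-spec/options pair you would pass to createIndex; field1 and field2 are the question's placeholder names:

```javascript
// Sketch: a unique compound index over the two fields, replacing the
// concatenated field1_field2 key and the application-level uniqueness check.
const indexSpec = { field1: 1, field2: 1 }; // index both fields, ascending
const indexOptions = { unique: true };      // reject duplicate (field1, field2) pairs

// Usage (mongo shell / driver): db.collection.createIndex(indexSpec, indexOptions);
```

With this in place, the database enforces uniqueness of the pair on insert, so the application-side check becomes unnecessary.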

Possible to retrieve multiple random, non-sequential documents from MongoDB?

I'd like to retrieve a random set of documents from a MongoDB database. So far after lots of Googling, I've only seen ways to retrieve one random document OR a set of documents starting at a random skip position but where the documents are still sequential.
I've tried mongoose-simple-random, and unfortunately it doesn't retrieve a "true" random set. What it does is skip to a random position and then retrieve n documents from that position.
Instead, I'd like to retrieve a random set like MySQL does using one query (or a minimal amount of queries), and I need this list to be random every time. I need this to be efficient -- relatively on par with such a query with MySQL. I want to reproduce the following but in MongoDB:
SELECT * FROM products ORDER BY rand() LIMIT 50;
Is this possible? I'm using Mongoose, but an example with any adapter -- or even a straight MongoDB query -- is cool.
I've seen one method of adding a field to each document, generating a random value for that field in each document, and using {rand: {$gte: rand()}} on each query we want randomized. But my concern is that two queries could theoretically return the same set.
You may do two requests, but in an efficient way:
Your first request just gets the list of all the _id values of the documents in your collection. Be sure to use a projection: db.products.find({}, { '_id': 1 }).
You have a list of "_id", just pick N randomly from the list.
Do a second query using the $in operator.
What is especially important is that your first query is fully supported by an index (because it's "_id"). This index is likely fully in memory (else you'd probably have performance problems). So, only the index is read while running the first query, and it's incredibly fast.
Although the second query means reading actual documents, the index will help a lot.
If you can do things this way, you should try.
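The three steps above can be sketched like this, assuming a Mongoose model (names here are placeholders, not from the original):

```javascript
// Draw n distinct elements from arr without replacement.
function pickRandom(arr, n) {
  const pool = arr.slice();
  const picked = [];
  while (picked.length < n && pool.length > 0) {
    const i = Math.floor(Math.random() * pool.length);
    picked.push(pool.splice(i, 1)[0]); // remove so each element is picked once
  }
  return picked;
}

// Sketch of the two-query approach: covered _id query, client-side
// random pick, then $in fetch of just the chosen documents.
async function randomDocs(Model, n) {
  // 1) read only _id, served straight from the (in-memory) _id index
  const ids = (await Model.find({}, { _id: 1 }).lean()).map((d) => d._id);
  // 2) pick n ids at random on the client
  const chosen = pickRandom(ids, n);
  // 3) fetch only those documents
  return Model.find({ _id: { $in: chosen } });
}
```

The first query stays cheap even for large collections because it never touches the documents themselves, only the index.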
I don't think MySQL ORDER BY rand() is particularly efficient - as I understand it, it essentially assigns a random number to each row, then sorts the table on this random number column and returns the top N results.
If you're willing to accept some overhead on your inserts to the collection, you can reduce the problem to generating N random integers in a range.
Add a counter field to each document: each document will be assigned a unique positive integer, sequentially. It doesn't matter which document gets which number, as long as the assignment is unique and the numbers are sequential, and you either don't delete documents or you complicate the counter scheme to handle holes.
You can do this by making your inserts two-step. In a separate counter collection, keep a document holding the first number that hasn't been used for the counter. When an insert occurs, first findAndModify the counter document to retrieve the next counter value and increment the counter atomically. Then insert the new document with that counter value.
To find N random values: find the max counter value, generate N distinct random numbers in the range defined by the max counter, then use $in to retrieve the documents. Most languages have random libraries that will handle generating the N random integers in a range.
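A sketch of the counter scheme described above, using the Node driver; the collection names ('counters', 'products') and field names (seq, counter) are assumptions:

```javascript
// Generate n distinct random integers in [0, max).
function distinctRandomInts(n, max) {
  const chosen = new Set();
  while (chosen.size < n && chosen.size < max) {
    chosen.add(Math.floor(Math.random() * max)); // duplicates collapse in the Set
  }
  return [...chosen];
}

// Two-step insert: atomically claim the next counter value, then insert.
async function insertWithCounter(db, doc) {
  // findOneAndUpdate with $inc is the atomic findAndModify step; with the
  // modern Node driver this returns the updated counter document directly.
  const counter = await db.collection('counters').findOneAndUpdate(
    { _id: 'products' },
    { $inc: { seq: 1 } },
    { upsert: true, returnDocument: 'after' }
  );
  return db.collection('products').insertOne({ ...doc, counter: counter.seq });
}

// Random read: pick N distinct counters, then fetch them with $in.
async function randomByCounter(db, n, maxCounter) {
  const picks = distinctRandomInts(n, maxCounter);
  return db.collection('products').find({ counter: { $in: picks } }).toArray();
}
```

Indexing the counter field keeps the $in lookup fast, and unlike the rand-field method, two calls are very unlikely to return the same set.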