Bad performance upserting an item into a million-document collection - mongodb

It takes 700~800 ms to upsert an item into a collection that contains about 2 million documents. I have tried the following functions:
Model.findOneAndUpdate()
bulk.find({...}).upsert().updateOne()
But both of them take almost 1 second to upsert ONE item.
I have another 1 million items to insert/upsert, so it will take me several days. How can I improve this?

Adding an index on the field used in the query filter will accelerate the process.
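For example, if the upsert filter queries on a field such as externalId (a hypothetical field name here, standing in for whatever your filter actually uses), a simple index on that field lets MongoDB locate the target document without scanning the whole collection:
// externalId is a hypothetical field name; index whatever field your upsert filter queries on.
db.items.createIndex({ externalId: 1 });
// The upsert can then find its target via the index instead of a collection scan.
db.items.updateOne(
    { externalId: 12345 },
    { $set: { price: 9.99 } },
    { upsert: true }
);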

Related

MongoDb optimisation

Say I have a MongoDB collection with 5,000 records (or thousands). I want to update this collection with new records.
These new records may be identical to the existing ones, or only a few of them may have changed.
To do this, I see 2 approaches:
Iterate over the new records and determine which existing ones to update (based on a condition). If the condition is truthy, update the record.
Delete all the existing records (deleteMany), and then simply insert the new data.
Which one would be more performant and optimised, and why? Or is there another way?
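For what it's worth, here is a minimal sketch of the first approach using bulkWrite, assuming a Mongoose-style model Model, an incoming array newRecords, and a hypothetical field key that uniquely identifies each record; it sends all the upserts to the server in a single round trip rather than one call per record:
// newRecords and key are hypothetical names for the incoming data and its unique identifier.
const ops = newRecords.map(rec => ({
    updateOne: {
        filter: { key: rec.key },
        update: { $set: rec },
        upsert: true               // insert the record if no existing one matches
    }
}));
// One batched command instead of many individual updateOne() calls.
await Model.bulkWrite(ops, { ordered: false });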

How to efficiently loop through a MongoDB collection in order to update a sequence column?

I am new to MongoDB/Mongoose and have a challenge I'm trying to solve in order to avoid a performance rabbit hole!
I have a MongoDB collection containing a numeric field called 'sequence'. After inserting a new document, I need to cycle through the collection, starting at the position of the inserted document, and increment the value of sequence by one for each subsequent document. In this way I maintain a collection of documents numbered from 1 to n (where n = the number of documents in the collection), and can render the collection as a table in which newly inserted records appear in the right place.
Clearly one way to do this is to loop through the collection, doing a seq++ in each iteration, and then using Model.updateOne() to apply the value of seq to sequence for each document in turn.
My concern is that this involves calling updateOne() potentially hundreds of times, which might not be optimal for performance. Any suggestions on how I should approach this in a more efficient way?
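One hedged sketch, assuming a Mongoose model called Item and that the newly inserted document insertedDoc was given the sequence value insertedSeq (all hypothetical names), is to shift every later document with a single updateMany instead of one updateOne() per document:
// Increment the sequence of every document at or after the insertion point, excluding the
// newly inserted one, in a single server-side command rather than hundreds of individual updates.
await Item.updateMany(
    { sequence: { $gte: insertedSeq }, _id: { $ne: insertedDoc._id } },
    { $inc: { sequence: 1 } }
);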

MongoDb java driver projection performance

I have probably encountered a problem using MongoDB. I have 860,000 documents in a collection, and 500 collections like this. Each document has 3 fields: the first and second are of type Array and contain 10 elements each, the third is of type Int64 and holds currentTimeMillis. When I query 1,000 documents from one collection it takes ~2500 ms. But when I execute the same query returning only the first element of the two array fields (using the $slice operator, where each array contains 10 elements), it takes ~2000 ms. This looks weird. MongoDB is on a remote host, so I pull roughly 10 times less data over the network, yet it takes almost the same amount of time. Any thoughts?
The problem turns into this :) When I query 1,000 documents using collection.find(whereQuery), it takes ~2400 ms. But when I query 13 documents using the same code, it takes ~1500 ms. The data is 100 times smaller, yet the time is not even halved. Am I missing something?
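For reference, the projection described (keeping only the first element of each array field) looks roughly like this in shell syntax, with fieldA and fieldB as hypothetical names for the two array columns:
// $slice: 1 returns only the first element of each array field.
db.collection.find(whereQuery, {
    fieldA: { $slice: 1 },
    fieldB: { $slice: 1 },
    timestamp: 1
});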

MongoDB Collection Structure Performance

I have a MongoDB database of semi-complex records and my reporting queries are struggling as the collection size increases. I want to make some reporting Views that are optimized for quick searching and aggregating. Here is a sample format:
var record = {
    fieldOne: "",
    fieldTwo: "",
    fieldThree: "",          // There are approx. 30 fields at this level
    ArrayOne: [
        { subItem1: "" },
        { subItem2: "" }     // There are usually about 10-15 items in this array
    ],
    ArrayTwo: [
        { subItem1: "" },    // ArrayTwo items reference ArrayOne item ids for ref
        { subItem2: "" }     // There are usually about 20-30 items in this array
    ],
    ArrayThree: [
        { subItem1: "" },    // ArrayThree items reference both ArrayOne and ArrayTwo items for ref
        { subItem2: "" },    // There are usually about 200-300 items in this array
        { subArray: [
            { subItem1: "" },
            { subItem2: "" } // There are usually about 5 items in this array
        ]}
    ]
};
I used to have this data where ArrayTwo was inside ArrayOne items and ArrayThree was inside ArrayTwo items so that referencing a parent was implied, but reporting became a nightmare with multiple nested levels of arrays.
I have a field called 'fieldName' at every level, which is how we target objects in the arrays.
I will often need to aggregate values from any of the 3 arrays across thousands of records in a query.
I see two ways of doing it.
A) Flatten and go vertical, making a single smaller record in the database for every item in ArrayThree, essentially adding 200 records per single complex record. I tried this and I already have 200K records in 5 days of new data coming in. The benefit of this is that I have fieldNames that I can put indexes on.
B) Flatten horizontally, making every array flat within a single collection record. I would use the fieldName located in each array object as the key. This would make a single record with 200-300 fields in it. This would mean far fewer records in the collection, but the fields would be dynamic, so adding indexes would not be possible (that I know of).
At this time, I have approx 300K existing records that I would be building this View off of. If I go vertical, that would place 60 Million simple records in the db and if I go Horizontal, it would be 300K records with 200 fields flattened in each with no indexing ability.
What's the right way to approach this?
I'd be inclined to stick with the mongo philosophy and do individual entries for each distinct set/piece of information, rather than relying on references within a weird composite object.
60 million records is "a lot" (but it really isn't "a ton"), and MongoDB loves having lots of little things tossed at it. On the flip side, going horizontal you'd end up with fewer but bigger objects that take up just as much space.
(Using the WiredTiger storage engine with compression will make your disk go further, too.)
Edit:
I'd also add that you really, really, really want indexes at the end of the day, so that's another vote for the vertical approach.
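As a rough sketch of the vertical approach (collection and field names hypothetical), each ArrayThree item becomes its own small document carrying a reference back to its parent record, and the fields the reports filter and aggregate on get indexes:
// One small document per ArrayThree item, referencing its parent complex record.
db.reportItems.insertOne({
    parentId: parentRecord._id,   // link back to the original complex record
    fieldName: "someMetric",      // the targeting field mentioned in the question
    value: 42
});
// Indexes on the fields the reporting queries filter and aggregate on.
db.reportItems.createIndex({ fieldName: 1 });
db.reportItems.createIndex({ parentId: 1 });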

Remove given number of records in Mongodb

I have too many records in my collection. Can I keep only a desired number of records and remove the others, without any condition?
I have a collection called Products with around 100,000 records and it's slowing down my local application. I'm thinking of shrinking this huge number of records to around 1,000. How can I do it?
OR
How to copy a collection with limited number of records?
If you want to copy a collection with a limited number of documents and without any filter condition, forEach can be used. The snippet below copies 1,000 documents from originalCollection to copyCollection.
db.originalCollection.find().limit(1000).forEach(function (doc) {
    db.copyCollection.insert(doc);
});
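An alternative sketch, assuming a MongoDB version where $out is available in the aggregation pipeline, does the same copy in a single server-side operation (note that $out replaces copyCollection if it already exists):
db.originalCollection.aggregate([
    { $limit: 1000 },           // keep only the first 1000 documents
    { $out: "copyCollection" }  // write them to copyCollection on the server
]);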