MongoDB Remove operation does not remove indexes

We have a collection with millions of records and the necessary indexes. We have started archiving data, and at the same time we remove that data from the production collection.
Now the indexes are not shrinking along with the data.
Is there any way to remove the index data along with the documents? Thanks.
For example:
Before Backup:
Number of records - 58002174,
Index Size - 10.3 GB
After Backup:
Number of records - 169376,
Index Size - 10.3 GB
The number of records is far lower, but the index size didn't decrease. I need to reduce the index size.

You can rebuild the index to reduce its size after bulk deletions:
db.collection.reIndex()
See the warnings in the linked documentation regarding locking and sharding.
Or just drop the index and recreate it. That would allow you to recreate it in the background, if desired.
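A minimal sketch of the drop-and-recreate approach, with placeholder collection and field names (the background option applies to MongoDB versions before 4.2; newer versions ignore it and use an optimized build process):
// Drop the existing index by its key pattern (or by its name).
db.mycollection.dropIndex({ someField: 1 })
// Recreate it; on MongoDB < 4.2, background: true avoids blocking other operations.
db.mycollection.createIndex({ someField: 1 }, { background: true })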

Try the compact command: https://docs.mongodb.com/manual/reference/command/compact/#dbcmd.compact
It is advertised to "rewrite and defragment all data and indexes in a collection".
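A sketch of invoking it from the shell (the collection name is a placeholder); check the locking warnings on the linked page before running it on a busy system:
// Run this against the database that owns the collection.
db.runCommand({ compact: "mycollection" })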

Related

MongoDB - does scanning indexes require first retrieving the index from disk?

Do indexes always stay in RAM?
Hence, does scanning an index require first retrieving it from disk?
EDITED:
My question is more about whether MongoDB will always keep the index in RAM, assuming there is enough space. Actual data is pushed out of RAM when it has not been accessed recently, to make room for more recently accessed data. Is this the case with indexes as well? Will indexes be pushed out of RAM based on recency? Or does MongoDB treat indexes with priority and always keep them in RAM if there is enough room?
That is not guaranteed.
MongoDB does store indexes in the same cache as documents, and that cache does evict on an LRU basis.
It does not load the entire index structure into memory; it loads pages as they are needed, so the amount of the index in memory will depend on how it is accessed.
Indexes do get a bit of priority, but that is not absolute, so index pages can be evicted.
An insert into a collection will likely need to update all of the indexes, so it would be a reasonable assumption that any collection that is not totally idle will have at least the root page of each index in the cache.
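As a rough illustration (a sketch assuming the WiredTiger engine and a placeholder collection name), per-index cache residency can be inspected via collection stats with indexDetails:
// Each indexDetails entry holds the WiredTiger counters for one index,
// including how many bytes of it currently sit in the cache.
var stats = db.mycollection.stats({ indexDetails: true })
for (var name in stats.indexDetails) {
    print(name + ": " + stats.indexDetails[name].cache["bytes currently in the cache"] + " bytes in cache")
}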

Why does index size not decrease when deleting documents from a large mongodb collection?

Environment: MongoDB 4.2.7, CentOS 7
I have a collection with about 500 million documents and an index that is about 8GB. If I delete half the documents, I would expect the indexSize to decrease by about 50%, but it doesn't. Why does it not go down? Is there a way to compact the index?
Just to verify that the index should be smaller, I copied 50% of the documents to a brand new MongoDB instance and created the index there. The index is indeed about 50% smaller.
Well, I just found the answer to my own question. MongoDB actually has a compact command:
https://docs.mongodb.com/manual/reference/command/compact/
This should rebuild the indexes and let me reclaim the free space.
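To confirm that the space is actually reclaimed, index sizes can be checked from the shell before and after running compact (the collection name is a placeholder; sizes are in bytes):
// Total size of all indexes on the collection.
db.mycollection.totalIndexSize()
// Per-index breakdown, to see which index shrank.
db.mycollection.stats().indexSizes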
It seems you need to compact/repair/resync your db.
Ref: https://dzone.com/articles/reclaiming-disk-space-from-mongodb

MongoDB write performance drops after adding new index on a sharded collection

I have created a collection in MongoDB that has four indexes (one for _id, one for the shard key, and two other indexes for query optimization on fields f1 and f2), and it is sharded across an 8-node cluster (each node has 14GB RAM). The application is write-intensive.
Updated: I am using WiredTiger as the storage engine.
The problem is that when I remove one of the secondary indexes (on f1 or f2), the insertion speed reaches an acceptable rate, but when I add the index back, the insertion performance drops rapidly!
I guess the problem is that the index does not fit in RAM, and because the access pattern is nearly random, HDD speed becomes the bottleneck. But I expected MongoDB to load all indexes into RAM, since the total RAM of each node is 14GB and the 'top' command says MongoDB is using only about 6GB on each node. The index sizes are as follows:
Each Node:
2GB for _id index
1.5GB for shard_key index
3GB for f1 index
3GB for f2 index
Total: 9.5GB for all indexes
As you can see, the total index size is about 9.5GB, MongoDB is using about 6GB, and the available RAM is 14GB, so my questions are:
Why does the performance drop after adding the new index?
If the problem is random access to the index, why does MongoDB not load all indexes into RAM?
How can I determine which part of each index is loaded into RAM and which part isn't?
Best Regards
Why does the performance drop after adding the new index?
It's expected that an index slows write performance, as each index increases the amount of work necessary to complete a write. How much does performance degrade? Can you quantify how much it degrades and what performance change would be acceptable? Can you show us an example document and specify what the indexes are that you are creating? Some indexes are much more costly to maintain than others.
If the problem is random access to the index, why does MongoDB not load all indexes into RAM?
It will load what is being used. How do you know it's not loading the indexes into RAM? Are you seeing a lot of page faults despite having extra RAM? What's your WiredTiger cache size set to?
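As a sketch, the configured WiredTiger cache size and how much of it is currently in use can be read from serverStatus on each node (the field names are the standard WiredTiger cache counters):
// Configured cache size and current usage, in bytes.
var cache = db.serverStatus().wiredTiger.cache
print("configured: " + cache["maximum bytes configured"])
print("in use:     " + cache["bytes currently in the cache"])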
How can I determine which part of each index is loaded into RAM and which part isn't?
I don't believe there is a simple way to do this.

TTL index on oplog or reducing the size of oplog?

I am using MongoDB with Elasticsearch for my application. Elasticsearch creates its indexes by monitoring the oplog collection. When both applications are running constantly, any changes to the collections in MongoDB are indexed immediately. The only problem I face is that if, for some reason, I have to delete and recreate the index, it takes ages (2 days) for the indexing to complete.
When I looked at the size of my oplog, its default capacity is 40GB and it is holding around 60 million transactions, which is why creating a fresh index takes so long.
What would be the best way to optimize fresh index creation?
Is it to reduce the size of the oplog so that it holds fewer transactions while still not affecting my replication, or is it possible to create a TTL index (which I failed to do on several attempts) on the oplog?
I am using elasticsearch with mongodb using mongodb river https://github.com/richardwilly98/elasticsearch-river-mongodb/.
Any help to overcome the above mentioned issues is appreciated.
I am not an Elasticsearch pro, but your question:
What would be the best way to optimize fresh index creation?
applies, at least a little, to everyone who uses third-party full-text search technologies with MongoDB.
The first thing to note is that if you have a LOT of records, there is no easy way around this unless you are prepared to lose some of them.
The oplog isn't really a good fit for this. You should probably look at a custom script driven by timestamps in the main collection, or a change table that gives you a single place to quickly query for new or updated records.
Unless you are filtering the oplog for specific records, i.e. inserts, you could be pulling out ALL oplog records, including deletes, collection operations and even database operations. You could try stripping the unneeded records out of your oplog search; however, that creates a new problem: the oplog has no indexes and no index updating.
This means that if you start to read it in a more selective manner, you will actually be running an unindexed query over those 60 million records, which will result in slow(er) performance.
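For illustration, a sketch of the kind of filtered oplog query described above, for a replica-set member; the namespace is a placeholder:
// The oplog lives in the local database; "op" is the operation type ("i" = insert)
// and "ns" is the namespace. Note that this scan cannot use an index.
db.getSiblingDB("local").oplog.rs.find({ op: "i", ns: "mydb.mycollection" })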
The oplog having no index updating answers another one of your questions:
is it possible to create a TTL index (which I failed to do on several attempts) on the oplog?
Nope.
As for the other one of your questions:
Is it to reduce the size of the oplog so that it holds fewer transactions
Yes, but you will have a smaller replication recovery window, and on top of that you will lose records from your "fresh" index, so only part of your data is actually indexed. I am unsure, from your question, whether this is a problem or not.
You can reduce the oplog for a single secondary member that no other member is syncing from. Look up rs.syncFrom and "Change the Size of the Oplog" in the MongoDB docs.
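On MongoDB 3.6 and later the resize can be done with a single admin command; older versions need the restart procedure described in that documentation page. A sketch, with the size (in megabytes) purely as an example value:
// Run on the member whose oplog you want to shrink or grow (MongoDB 3.6+).
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })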

Mongodb is slow on 8 million rows collection

I have a collection with 8 million documents, and I am new to MongoDB.
It loads 20 documents really slowly...
What should I do to speed it up?
You probably need to add an index.
Optimization Guidelines
More information is needed, such as the server's RAM.
Optimization steps:
1. Index your query field (see the sketch below).
2. Check the index's storage size. If the index is larger than RAM, MongoDB has to read it via disk I/O, which is slower.
8 million documents is not large. It is not a big deal.
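A minimal sketch of the steps above; the collection and field names are placeholders:
// 1. Index the field the query filters on.
db.mycollection.createIndex({ queryField: 1 })
// Confirm the query uses the index: look for an IXSCAN stage in the plan.
db.mycollection.find({ queryField: "some value" }).limit(20).explain("executionStats")
// 2. Compare index sizes (in bytes) with the server's available RAM.
db.mycollection.stats().indexSizes
db.mycollection.totalIndexSize()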