I wonder whether it is necessary to reindex the geospatial data in MongoDB when new geo-data is inserted, in order to be able to search it. Say we have a document which looks like:
{user:'a',loc:[363.236,-45.365]}, and it is indexed. Later on, I insert document b, which looks like: {user:'b',loc:[42.3654,-56.3]}. In order to search, do I have to reindex (using ensureIndex()) the collection every time a new document is inserted? Will frequent reindexing affect the overall application performance?
Thanks.
You only need to call ensureIndex() once; after that, MongoDB maintains the index on every insert, update, and delete.
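To illustrate, here is a minimal sketch in pymongo, assuming a local server; the places collection and the coordinate values are made up for the example (create_index is pymongo's equivalent of ensureIndex, and a default 2d index requires values within [-180, 180]):

from pymongo import MongoClient, GEO2D

# Hypothetical database/collection names; assumes a local MongoDB.
coll = MongoClient()["test"]["places"]

# Build the geospatial index once.
coll.create_index([("loc", GEO2D)])

# Documents inserted afterwards are indexed automatically.
coll.insert_one({"user": "a", "loc": [36.236, -45.365]})
coll.insert_one({"user": "b", "loc": [42.3654, -56.3]})

# No re-indexing needed: $near immediately sees the new documents.
for doc in coll.find({"loc": {"$near": [42.0, -56.0]}}).limit(2):
    print(doc)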
You can defragment and rebuild an index to make it smaller; that is why the functionality exists. A useful post:
http://jasonwilder.com/blog/2012/02/08/optimizing-mongodb-indexes/
We're using MongoDB 4.0 with Spring Data MongoDB, and we noticed that when doing some housekeeping, batch-deleting millions of documents using the external tool Studio3T, the entries on all indexes stayed untouched. I read a lot of MongoDB documentation on this but couldn't find any reference to that circumstance.
If this code does not trigger an index update, then which code does?
import java.time.LocalDateTime;

import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// Select every document modified up to now.
Query query = new Query();
query.addCriteria(Criteria.where("modifiedAt").lte(LocalDateTime.now()));

// Does not remove index entries
mongoTemplate.findAllAndRemove(query, MyModel.class);
// Does not either
mongoTemplate.remove(query, MyModel.class);
// Does not either
mongoTemplate.findAll(MyModel.class).forEach(mongoTemplate::remove);
Having an effective mechanism for removing documents for housekeeping purposes, with their index entries removed at the same time, is important to us because the index size keeps growing and no longer fits in memory. We are therefore forced to scale up our hardware, which is unnecessarily expensive.
I know there are ways to trigger this manually, e.g. dropping indexes and recreating them, or using the compact administrative command. However, in a 24/7 online-shop use case this seems rather impractical.
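For reference, the two manual options mentioned above look roughly like this; a sketch in pymongo with made-up database, collection, and field names:

from pymongo import MongoClient, ASCENDING

db = MongoClient()["shop"]   # hypothetical database
coll = db["orders"]          # hypothetical collection

# Option 1: drop the indexes and rebuild them from scratch.
coll.drop_indexes()
coll.create_index([("modifiedAt", ASCENDING)])

# Option 2: the compact administrative command, which on WiredTiger
# rewrites and defragments the collection's data and indexes.
db.command("compact", "orders")

Both options are heavyweight, which is exactly why they fit poorly into a 24/7 workload.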
I've learned a lot about indexing and found some useful material here:
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement. If an appropriate index exists for a query, MongoDB can use the index to limit the number of documents it must inspect.
But I still have some questions:
While creating an index using createIndex, is the record always stored in RAM?
Do I need to create the index again every time my application restarts?
What happens in the case of the default id (_id)? Is it always stored in RAM?
_id is a default index; does that mean all records are always stored in RAM for a particular collection?
Please correct me if I am wrong.
Thanks.
I think you are under the impression that indexes are stored in RAM. What if I told you they are not?
First of all, we need to understand what indexes are: an index is basically a pointer that tells where on disk a document is. Just like the index in a book, which, for faster access, lets us see which topic is on which page number.
So when indexes are created, they are also stored on disk. But while an application is running, based on frequent use and for even faster access, they get loaded into RAM; there is a difference between loaded and created.
Also, loading an index is not the same as loading a collection or its records into RAM. If an index is loaded, we know exactly which documents to pick up from disk, instead of loading every document and verifying each one. That is how indexes avoid a collection scan.
Creating an index is a one-time process, but each write to a document can potentially alter the index, so some part of it might need to be recalculated, because entries may get shuffled based on the change in data. That's why indexing makes writes slow and reads fast.
Again, think of a book: if you add a new topic of, say, 2 pages in the middle of the book, all the index entries after that point need to be recalculated accordingly.
While creating an index using createIndex, is the record always stored in RAM?
No, records are not stored in RAM. While creating the index, MongoDB processes all the documents in the collection and builds an index structure; understandably, this is time consuming if there are many documents, which is why there is an option to create the index in the background.
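A minimal pymongo sketch of such a one-time build; the collection and field names are hypothetical, and the background option only matters on servers before MongoDB 4.2 (newer servers always build indexes without blocking everything):

from pymongo import MongoClient, ASCENDING

coll = MongoClient()["test"]["users"]  # hypothetical collection

# One-time build: scans every existing document once.
coll.create_index([("user", ASCENDING)])

# background=True (pre-4.2 servers) builds without blocking other
# operations, at the cost of a slower build.
coll.create_index([("email", ASCENDING)], background=True)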
Do I need to create the index again every time my application restarts?
Indexes are created one time; you can delete one and create it again, but it won't be recreated on an application or DB restart. That would be insane for a huge collection in a sharded environment.
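Because indexes persist on disk, an application can simply inspect what already exists at startup rather than rebuild anything; a hedged sketch using pymongo's index_information() with a made-up collection:

from pymongo import MongoClient, ASCENDING

coll = MongoClient()["test"]["users"]  # hypothetical collection

# Indexes survive restarts; this just lists what is already on disk,
# e.g. {'_id_': {'key': [('_id', 1)], 'v': 2}, ...}
print(coll.index_information())

# create_index is effectively idempotent: if an identical index
# already exists, the server leaves it as-is.
coll.create_index([("user", ASCENDING)])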
What happens in the case of the default id (_id)? Is it always stored in RAM?
Again, that's not true. _id comes as an indexed field by default, so the index is already created even for an empty collection, and each write recalculates it. Since it's a unique index, the processing is faster.
_id is a default index; does that mean all records are always stored in RAM for a particular collection?
All records would only be stored in RAM when you are using MongoDB's in-memory storage engine, which, I think, comes with the Enterprise edition. Indexing by itself does not automatically load the records into RAM.
To answer the question from the title:
MongoDB indexes use a B-tree data structure.
source: https://docs.mongodb.com/manual/indexes/index.html#b-tree
I have a Python application that iterates over every document in a MongoDB (3.0.2) collection (typically between 10K and 1M documents) and adds new fields (probably doubling or tripling the number of fields in each document).
My initial thought was that I would upsert the entire revised documents (using pyMongo); now I'm questioning that:
Given that the revised documents are significantly bigger, should I be inserting only the new fields, or just replacing the whole document?
Also, is it better to perform writes to the collection document by document, or in bulk?
This is actually a great question that can be solved a few different ways, depending on how you are managing your data.
If you are upserting additional fields, does that mean your data gains additional fields at a later point in time, with the only change being the addition of those fields? If so, you could set a TTL on your documents so that the old ones drop off over time. Keep in mind that if you do this, you will want to rely on an index that sorts your results by descending _id, so that the most recent additions are selected before the older ones.
The benefit of doing it this way is that you are continually writing data, as opposed to seeking out and updating data, so it is faster.
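A hedged sketch of that approach in pymongo; the collection name, the key field, and the 30-day expiry are made-up values:

import datetime
from pymongo import MongoClient, DESCENDING

coll = MongoClient()["test"]["events"]  # hypothetical collection

# TTL index: the server removes documents once createdAt is older
# than expireAfterSeconds (here, roughly 30 days).
coll.create_index("createdAt", expireAfterSeconds=30 * 24 * 3600)

coll.insert_one({"key": "user-42", "createdAt": datetime.datetime.utcnow()})

# The built-in _id index can be walked backwards, so a descending
# sort returns the most recently appended record first.
current = coll.find_one({"key": "user-42"}, sort=[("_id", DESCENDING)])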
In regards to upserts vs. bulk inserts: bulk inserts are always faster than upserts, since upserting requires you to find the original document first.
Given that the revised documents are significantly bigger, should I be inserting only the new fields, or just replacing the whole document?
You really need to understand your data fully to determine what is best, but if the only change to the data is additional fields, or changes that only need to be considered from that point forward, then bulk inserting and setting a TTL on your older data is the better method from the standpoint of write operations, as opposed to seek, find and update. When using this method, you will want db.document.find_one() as opposed to db.document.find(), so that only your current record is returned.
Also, is it better to perform writes to the collection document by document, or in bulk?
Bulk inserts will be faster than inserting each document sequentially.
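To make the difference concrete, a pymongo sketch with made-up names; insert_many never looks anything up, while every upsert in a bulk_write still pays for a server-side find of the matching document:

from pymongo import MongoClient, UpdateOne

coll = MongoClient()["test"]["docs"]  # hypothetical collection

revised = [{"key": i, "old": i, "new": i * 2} for i in range(1000)]

# Alternative A: plain bulk insert, one batch, no per-document lookup.
coll.insert_many(revised)

# Alternative B: bulk upserts; each operation must first locate its
# target document, which is the slower path described above.
ops = [
    UpdateOne({"key": d["key"]}, {"$set": {"new": d["new"]}}, upsert=True)
    for d in revised
]
coll.bulk_write(ops)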
Since MongoDB 3.x introduces per-document locking, rather than locking the whole collection or database, does it make sense to write all of your data to a single collection with one extra identifier field, "documentType"?
It would help simulate a "join" through a map-reduce operation.
Couchbase does the same thing with "buckets" instead of collections.
Does anybody see any disadvantages with this approach?
There's one big general-case disadvantage: indexes.
With Mongo, you generally want to set up indexes so that most, if not all, queries you make, use them. So in addition to the one on _id, you'll set up indexes on the primary fields you search by (often compounded with those you sort by).
If you're storing everything in one single collection, that means you need to have all those indexes on that collection. Which means two things:
The indexes will be bigger, since there are more documents to index. Granted, this can be somewhat mitigated by using sparse indexes (see the sketch below).
Inserting or modifying documents in the collection requires Mongo to update all these indexes (where it'd just update the relevant indexes in the standard use-many-collections approach). This kills your write performance.
Furthermore, if you have in your application a query that somehow doesn't use one of those many indexes, it needs to scan through the entire collection, which is O(n) where n is the number of documents in the collection -- in your case, that means the number of documents in the entire database.
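As for the sparse-index mitigation mentioned above, a small pymongo sketch with hypothetical names: a sparse index only holds entries for documents that actually contain the indexed field.

from pymongo import MongoClient, ASCENDING

coll = MongoClient()["test"]["everything"]  # hypothetical single collection

# Only documents that have a "price" field (say, documentType
# "product") get an entry here; "user" documents without the field
# add nothing to this index.
coll.create_index([("price", ASCENDING)], sparse=True)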
Collections are cheap. Use them ;)
That might seem a stupid question, but when should I create an index on my collection?
To be more explicit, I was wondering if I just have to create it once, when I create my collection, and it will then be updated automatically when I add new documents. Or do I have to regenerate it regularly in the background?
The index will be kept up-to-date by MongoDB as you update/insert documents.
Performance-wise, do not create an index until you need it (to speed up queries). And when doing massive bulk inserts, it may be more efficient to drop the index and recreate it after you are done inserting.
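A sketch of that bulk-load pattern in pymongo, with hypothetical collection, field, and index names; whether it actually wins depends on the data volume, so measure before adopting it:

from pymongo import MongoClient, ASCENDING

coll = MongoClient()["test"]["items"]  # hypothetical collection

# Drop the secondary index (assumed to be named "name_1") so the bulk
# insert does not maintain it row by row; the _id index always stays.
coll.drop_index("name_1")

coll.insert_many({"name": f"item-{i}"} for i in range(100_000))

# Rebuild once, in a single pass over the inserted data.
coll.create_index([("name", ASCENDING)])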
MongoDB will maintain any and all indexes itself, in other words only once.
This does, however, mean you need to be careful about which indexes you ensure, as each index adds significant overhead to write operations. The more indexes you have, the more MongoDB will have to update to perform a single write.