How to update a Lucene index synchronously? - Lucene.Net

I have created a Lucene index (using Lucene.Net), and searching works fine.
My concern is this:
I built the index from data in my SQL database. This data keeps growing, and I can't find a way to modify the index without deleting and recreating it. Is there a way to modify a Lucene index without the delete-and-recreate process?

IndexWriter has methods such as addDocument, updateDocument and deleteDocuments (AddDocument, UpdateDocument and DeleteDocuments in Lucene.Net) for modifying the documents in an existing index. Behind the scenes, updating a document deletes the old version and indexes the new one, but only the affected document is touched; you never have to recreate the entire index.
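To make the delete-then-add behavior concrete, here is a minimal, stdlib-only Java sketch. It is a toy in-memory model, not Lucene: the class ToyIndex and its methods are hypothetical stand-ins named after the IndexWriter operations. It assumes every SQL row has a unique primary key that serves as the update key. In real Lucene you would typically index that key as an untokenized field (the name "id" is an assumption) and pass a Term on it to updateDocument or deleteDocuments; the changes become visible to searchers only after the writer commits and a reader is reopened.

```java
import java.util.*;

// Toy in-memory model of Lucene's update semantics (NOT the Lucene API).
// Each document has a unique id; updateDocument(id, text) removes whatever
// is stored under that id and then adds the new version, so only the
// changed row is touched and the rest of the index is never rebuilt.
class ToyIndex {
    private final Map<String, String> docsById = new HashMap<>();       // id -> text
    private final Map<String, Set<String>> postings = new HashMap<>();  // term -> ids

    void addDocument(String id, String text) {
        docsById.put(id, text);
        for (String term : text.toLowerCase().split("\\s+")) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    void deleteDocuments(String id) {
        String old = docsById.remove(id);
        if (old == null) return;  // nothing indexed under this id
        for (String term : old.toLowerCase().split("\\s+")) {
            Set<String> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) postings.remove(term);  // drop terms no document uses
            }
        }
    }

    // Same idea as IndexWriter.updateDocument(Term, doc): delete by key, then re-add.
    void updateDocument(String id, String text) {
        deleteDocuments(id);
        addDocument(id, text);
    }

    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    int size() {
        return docsById.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("1", "red bicycle");
        index.addDocument("2", "blue car");
        index.updateDocument("2", "green car");  // row 2 changed in SQL
        index.addDocument("3", "red car");       // new row inserted in SQL
        index.deleteDocuments("1");              // row 1 deleted in SQL
        System.out.println(index.search("red")); // [3]
        System.out.println(index.search("car")); // [2, 3]
    }
}
```

Each database change maps to one call on the index, so keeping the index in step with the SQL data costs work proportional to the number of changed rows, not to the size of the table.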

Related

Does MongoDB not update index entries upon document deletion?

We're using MongoDB 4.0 with Spring Data MongoDB, and we noticed that when we did some housekeeping by batch-deleting millions of documents with the external tool Studio 3T, all entries in all indexes stayed untouched. I have read a lot of MongoDB documentation on this but couldn't find any reference to this behavior.
If this code does not trigger an index update, then which code does?
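import java.time.LocalDateTime;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// Match every document whose modifiedAt is at or before the current time
Query query = new Query();
query.addCriteria(Criteria.where("modifiedAt").lte(LocalDateTime.now()));
// Attempt 1: find, return and delete the matching documents; index entries are not removed
mongoTemplate.findAllAndRemove(query, MyModel.class);
// Attempt 2: delete the matching documents by query; same result
mongoTemplate.remove(query, MyModel.class);
// Attempt 3: load every document and delete each one individually; same result
mongoTemplate.findAll(MyModel.class).forEach(mongoTemplate::remove);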
An effective way to remove documents for housekeeping, with their index entries removed at the same time, is important to us: the index size keeps growing and no longer fits in memory, so we would have to scale up our hardware, which is an unnecessary expense.
I know there are ways to trigger this manually, e.g. dropping and recreating the indexes, or running the compact administrative command. However, in a 24/7 online shop this seems rather impractical.

MongoDB: what happens when MongoDB drops indexes?

Creating an index on a large MongoDB collection is slow, which is easy to understand. But why is dropping indexes so fast? Does the data structure change after indexes are dropped?
Creating a new index on a collection is like creating a new collection arranged as a B-tree, so that you can search on the key fields quickly. It is similar to copying part of the collection.
Dropping an index is therefore like dropping a collection: MongoDB just removes the index structure, and it is done. The documents themselves are not changed.
You can think about it the way a file system works. Copying a file to a disk takes time, because all of its data has to be written. Removing a file takes almost no time, because the file system just marks its space as unused.
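The asymmetry can be illustrated with a tiny Java model. This is a hypothetical sketch, not how MongoDB actually stores indexes: here an index is a TreeMap built from the documents, so creating it has to read every document, while dropping it only discards the map and leaves the documents untouched. The documentsRead counter is an assumption added to make the difference in work visible.

```java
import java.util.*;

// Toy model, not MongoDB internals: an index is a separate sorted structure
// derived from the documents. documentsRead counts the work done.
class ToyCollection {
    final List<Map<String, String>> documents = new ArrayList<>();
    final Map<String, TreeMap<String, List<Integer>>> indexes = new HashMap<>();
    int documentsRead = 0;

    // Building: read every document and insert its key into a sorted tree.
    void createIndex(String field) {
        TreeMap<String, List<Integer>> tree = new TreeMap<>();
        for (int i = 0; i < documents.size(); i++) {
            documentsRead++;
            String key = documents.get(i).get(field);
            if (key != null) {
                tree.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
            }
        }
        indexes.put(field, tree);
    }

    // Dropping: discard the tree; no document is read or rewritten.
    void dropIndex(String field) {
        indexes.remove(field);
    }

    public static void main(String[] args) {
        ToyCollection c = new ToyCollection();
        for (int i = 0; i < 1000; i++) {
            c.documents.add(Map.of("sku", "item-" + i));
        }
        c.createIndex("sku");
        System.out.println("reads to build: " + c.documentsRead);   // 1000
        c.dropIndex("sku");
        System.out.println("reads after drop: " + c.documentsRead); // still 1000
    }
}
```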

When should I create my index in MongoDB?

This might seem like a stupid question, but when should I create an index on my collection?
To be more explicit: do I just have to create it once, when I create my collection, and will it then be updated automatically as I add new documents? Or do I have to regenerate it regularly in the background?
The index will be kept up-to-date by MongoDB as you update/insert documents.
Performance-wise, do not create an index until you need it (to speed up queries). When doing massive bulk inserts, it may be more efficient to drop the index and recreate it after you are done inserting.
MongoDB maintains all indexes itself, so you only need to create each index once.
This does, however, mean you need to be careful about which indexes you create, because each index adds significant overhead to write operations: the more indexes you have, the more MongoDB has to update for a single write.
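Both points can be sketched in Java with a hypothetical in-memory IndexedCollection (not a MongoDB API): an index is created once and then kept current by every insert, and the indexUpdates counter (also an assumption, for illustration) shows that the work per insert grows with the number of indexes.

```java
import java.util.*;

// Toy model, not MongoDB's storage engine: every insert updates every
// existing index, so indexes never need regenerating, but each extra
// index adds work to each write.
class IndexedCollection {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final Map<String, Map<String, List<Integer>>> indexes = new HashMap<>();
    int indexUpdates = 0;

    // Created once: existing documents are indexed here, later ones in insert().
    void createIndex(String field) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            index.computeIfAbsent(docs.get(i).get(field), k -> new ArrayList<>()).add(i);
        }
        indexes.put(field, index);
    }

    void insert(Map<String, String> doc) {
        int pos = docs.size();
        docs.add(doc);
        for (Map.Entry<String, Map<String, List<Integer>>> e : indexes.entrySet()) {
            e.getValue().computeIfAbsent(doc.get(e.getKey()), k -> new ArrayList<>()).add(pos);
            indexUpdates++;  // one index update per index, per inserted document
        }
    }

    List<Integer> find(String field, String value) {
        Map<String, List<Integer>> index = indexes.get(field);
        if (index == null) throw new IllegalStateException("no index on " + field);
        return index.getOrDefault(value, List.of());
    }

    public static void main(String[] args) {
        IndexedCollection users = new IndexedCollection();
        users.createIndex("name");
        users.createIndex("city");
        users.insert(Map.of("name", "ann", "city", "oslo"));
        users.insert(Map.of("name", "bob", "city", "oslo"));
        System.out.println(users.find("city", "oslo")); // [0, 1]
        System.out.println(users.indexUpdates);         // 4: 2 inserts x 2 indexes
    }
}
```

The bulk-insert advice follows from the same counter: inserts made while no index exists do no index work, and createIndex then builds the index in a single pass.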

How to avoid dropping existing index when reindexing

When reindexing with Sunspot, the existing index is cleared first. This means users see blank search results for a short time, which is bad in a production environment. Is there a way to reindex without clearing the existing index?
The clearing happens both when I run the rake task and when I call solr_reindex in the console.
Looking at the code, I think calling Model.solr_index is enough: it indexes every record without clearing the index first. When indexing is complete, you can start searching on the newly indexed fields.
The searchable schema is not shared across all records of a model, so indexing a single record updates the searchable schema of that record.
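A minimal Java sketch of why indexing in place avoids the blank-results window. It is a hypothetical in-memory model, not Sunspot's implementation: the clearFirst flag stands for the clearing observed with solr_reindex, and the in-place path for the per-record indexing of Model.solr_index. One caveat of the in-place path is visible in the model: a record that no longer exists in the database stays in the index, because nothing overwrites it.

```java
import java.util.*;

// Toy model, not Sunspot or Solr code: compares what a concurrent searcher
// sees during a clear-first reindex versus an in-place (upsert) reindex.
class ReindexDemo {
    // The index is a map of record id -> indexed text.
    static int hits(Map<Integer, String> index, String word) {
        int n = 0;
        for (String text : index.values()) {
            if (text.contains(word)) n++;
        }
        return n;
    }

    // Returns the hit count a user would get just before each record is (re)indexed.
    static List<Integer> reindex(Map<Integer, String> index, SortedMap<Integer, String> records,
                                 boolean clearFirst, String probe) {
        List<Integer> seenByUsers = new ArrayList<>();
        if (clearFirst) index.clear();            // existing results disappear here
        for (Map.Entry<Integer, String> r : records.entrySet()) {
            seenByUsers.add(hits(index, probe));
            index.put(r.getKey(), r.getValue());  // overwrite the old version in place
        }
        return seenByUsers;
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> records = new TreeMap<>(Map.of(1, "ruby gem", 2, "ruby book"));
        Map<Integer, String> a = new HashMap<>(records);
        Map<Integer, String> b = new HashMap<>(records);
        System.out.println(reindex(a, records, true, "ruby"));  // [0, 1]
        System.out.println(reindex(b, records, false, "ruby")); // [2, 2]
    }
}
```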

Is it OK to call ensureIndex on non-existent collections?

I read somewhere that calling ensureIndex() actually creates the collection if it does not exist. But an index is always on some fields, not all of them. If I ensure an index on, say, { name: 1 } and then add documents with many more fields to that collection, will the index still work? I know there is no schema, but coming from the RDBMS world I just want to make sure. :) I'd like to create indexes when my website starts, but initially the database is empty. I don't need any data before ensuring indexes, is that correct?
ensureIndex will create the collection if it does not yet exist. It does not matter if you add documents that lack the property the index covers; you just can't use that index to find those documents by that property. As I understand it, in versions before 1.7.4 a document that is missing an indexed property is indexed as though it had that property, but with a null value. From 1.7.4 on you can create sparse indexes, which leave such documents out entirely. The difference is slight but may be significant in some situations.
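The difference between a regular and a sparse index can be sketched with a hypothetical Java FieldIndex class (a toy model, not MongoDB's implementation). A regular index gives a document without the field an entry under a null key; a sparse index gives it no entry at all. The extra "email" field in the sample documents also shows that fields outside the index are simply ignored by it.

```java
import java.util.*;

// Toy model, not MongoDB's implementation: a regular index stores documents
// that lack the field under a null key; a sparse index skips them entirely.
class FieldIndex {
    private final String field;
    private final boolean sparse;
    private final Map<String, List<Integer>> entries = new HashMap<>();  // HashMap permits a null key

    FieldIndex(String field, boolean sparse) {
        this.field = field;
        this.sparse = sparse;
    }

    void add(int docId, Map<String, String> doc) {
        if (sparse && !doc.containsKey(field)) return;  // sparse: no entry at all
        // regular: a missing field is indexed as null
        entries.computeIfAbsent(doc.get(field), k -> new ArrayList<>()).add(docId);
    }

    int entryCount() {
        int n = 0;
        for (List<Integer> ids : entries.values()) n += ids.size();
        return n;
    }

    List<Integer> lookup(String value) {
        return entries.getOrDefault(value, List.of());
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("name", "ann", "email", "ann@example.com"),
            Map.of("email", "anon@example.com"));  // no "name" field
        FieldIndex regular = new FieldIndex("name", false);
        FieldIndex sparseIdx = new FieldIndex("name", true);
        for (int i = 0; i < docs.size(); i++) {
            regular.add(i, docs.get(i));
            sparseIdx.add(i, docs.get(i));
        }
        System.out.println(regular.entryCount() + " " + regular.lookup(null));     // 2 [1]
        System.out.println(sparseIdx.entryCount() + " " + sparseIdx.lookup(null)); // 1 []
    }
}
```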
Depending on the circumstances, it may not be a good idea to create indexes when the app starts. Suppose you deploy a new version that adds new indexes on startup. In development you won't notice anything, since the database is small, but in production the database may be huge and building the index can take a long time. While the index is being built, your app will hang and can't serve requests. You can create indexes with the background flag set to true (the syntax depends on which driver you're using), but in most cases it's better to add indexes manually or as part of a setup script. That way you have to think before you change indexes.
Deprecated since version 3.0: db.collection.ensureIndex() has been replaced by db.collection.createIndex().
Ref: https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/