How to avoid dropping existing index when reindexing - sunspot

When reindexing with Sunspot, the existing index is cleared first. This means users see blank search results for a short time, which is bad in a production environment. Is there a way to reindex without clearing the existing index?
The clearing occurs both when I run the rake task and when I call solr_reindex in the console.

From reading the code, I think calling Model.solr_index (rather than solr_reindex) is enough. Once indexing is complete, you can start searching on the newly indexed fields.
The searchable schema is not shared across all records of a model, so indexing a single record updates the searchable schema for that record.

Related

Does MongoDB not update index entries upon document deletion?

We're using MongoDB 4.0 with Spring Data MongoDB. While doing some housekeeping by batch-deleting millions of documents with an external tool (Studio3T), we noticed that the entries in all indexes stayed untouched. I have read a lot of MongoDB documentation on this but couldn't find any reference to this behavior.
If this code does not trigger an index update, then which code does?
Query query = new Query();
query.addCriteria(Criteria.where("modifiedAt").lte(LocalDateTime.now()));
// Does not remove index entries
mongoTemplate.findAllAndRemove(query, MyModel.class);
// Does not either
mongoTemplate.remove(query, MyModel.class);
// Does not either
mongoTemplate.findAll(MyModel.class).forEach(mongoTemplate::remove);
An effective mechanism for removing documents during housekeeping, with their index entries removed at the same time, is important to us because the index size keeps growing and no longer fits in memory. As a result we have to scale up our hardware, which is an unnecessary expense.
I know there are ways to trigger this manually, e.g. dropping the indexes and recreating them, or using the compact administrative command. However, in a 24/7 online shop use case this seems rather impractical.

MongoDB: is indexing a pain?

Speaking generally, what are the best practices for querying (and therefore indexing) schemaless data structures, i.e. documents?
Let's say I use MongoDB to store and query deterministic data structures in a collection. At this point all documents have the same structure, so I can easily create indexes for any query in my app, since I know each document has the field(s) the index requires.
What happens after I change the structure and try to save new documents to the db? Let's say I merged the two fields FirstName and Lastname into FullName. As a result the collection contains nondeterministic data. I see two problems here:
Old indexes cannot cover the new data, so new indexes are needed that handle both the old and the new fields.
The app has to deal with two representations of the documents.
This can become a big problem when there are many changes in the db, resulting in many versions of the document structure.
I see two main approaches:
Lazy migration. Each document is migrated on demand (i.e. only after it is loaded from the collection) to the final structure and then stored back in the collection. This approach does not actually solve the problems, because the collection can be nondeterministic at any point in time.
Forced migration. This is the same approach as for RDBMS migrations. The migration is performed for all documents at one point in time while the app is not running. The main drawback is app downtime.
So the question: Is there any good way of solving the problem, especially without app downtime?
If you can't have downtime, then the only choice is to do the migration "on the fly":
1. Change the application so that the new field is created whenever documents are saved, while reads still use the old fields.
2. Update your collection with a script/queries to add the new field to the existing documents.
3. Create new indexes on that field.
4. Change the application so that it reads from the new field.
5. Drop the unnecessary indexes and remove the old fields from the documents.
Changing the schema on a live database is never an easy process, no matter what database you use. It always requires some forward thinking and careful planning.
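The transitional handling behind the steps above can be sketched roughly as follows. This is only a minimal, driver-free illustration: a plain Map stands in for a MongoDB document, the class and method names (FullNameMigration, readFullName, upgrade, dropOldFields) are hypothetical, and the field names are taken from the FirstName/Lastname/FullName example in the question. The reader shown here is a tolerant variant that prefers the new field and falls back to the old ones, which is an assumption about how the application might bridge the period between steps 1 and 4.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; a plain Map stands in for a MongoDB document.
final class FullNameMigration {

    // Transitional read: works whether or not the document has been migrated.
    // Prefers the new FullName field and falls back to the old pair of fields.
    static String readFullName(Map<String, Object> doc) {
        Object fullName = doc.get("FullName");
        if (fullName != null) {
            return fullName.toString();
        }
        Object first = doc.get("FirstName");
        Object last = doc.get("Lastname");
        String joined = (first == null ? "" : first.toString())
                + " "
                + (last == null ? "" : last.toString());
        return joined.trim();
    }

    // Steps 1 and 2: return a copy with the new field added.
    // The old fields are kept so that older application code can still read them.
    static Map<String, Object> upgrade(Map<String, Object> doc) {
        Map<String, Object> upgraded = new HashMap<>(doc);
        upgraded.putIfAbsent("FullName", readFullName(doc));
        return upgraded;
    }

    // Step 5: return a copy without the old fields, once nothing reads them.
    static Map<String, Object> dropOldFields(Map<String, Object> doc) {
        Map<String, Object> cleaned = new HashMap<>(doc);
        cleaned.remove("FirstName");
        cleaned.remove("Lastname");
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, Object> oldDoc = new HashMap<>();
        oldDoc.put("FirstName", "Ada");
        oldDoc.put("Lastname", "Lovelace");

        Map<String, Object> upgraded = upgrade(oldDoc);
        System.out.println(readFullName(oldDoc));
        System.out.println(upgraded.get("FullName"));
        System.out.println(dropOldFields(upgraded).keySet());
    }
}
```

In a real application the same logic would sit in the code that maps documents to model objects, and the bulk part of step 2 would be run as a script against the collection rather than in memory.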
is indexing a pain?
Indexing is not a pain, but premature optimization is. You should always test and check that you actually need indexes before adding them and when you have them, check that they are being properly used.
If you're worried about performance issues on a live system when creating indexes, then you should consider having replica sets and doing rolling maintenance (in short: taking secondaries down from replication, creating indexes on them, bringing them back into replication and then repeating the process for all the subsequent replica set members).
Edit
What I was describing is basically a process of migrating your schema to a new one while temporarily supporting both versions of the documents.
In step 1, you're basically adding support for multiple versions of documents. You're updating existing documents i.e. creating new fields, while you're reading data from the previous version fields. Step 2 is optional, because you can gradually update your documents as they are being saved.
In step 4 you're removing the support for the previous versions from your application code and migrating to a new version. Finally, in step 5 you're removing the previous version fields from your actual MongoDB documents.

MongoDB update (add field to nearly every document) is very very slow

I am working on a MongoDB cluster.
One DB named bnccdb, with a collection named AnalysedLiterature. It has about 7,000,000 documents in it.
For each document, I want to add two keys and then update this document.
I am using the Java client. I query each document, add both keys to the BasicDBObject and then use the save() method to update it. This is so slow that updating the whole collection would take several weeks.
I suspect the update is slow because I am adding keys.
Adding keys causes disk/block rearrangement in the background, which makes the operation extremely time-consuming.
After I changed from save() to update(), the problem remains. This is my status information.
From the output of mongostat, it is obvious that the page fault rate is very high, but I don't know what caused it.
Can anyone help me?

How to update lucene index synchronously?

I have created a Lucene index (using Lucene.net) and the search is working fine.
My concern is as follows:
I used data from my SQL database to create the index. This data keeps growing, and I can't find a way to modify the index without deleting and recreating it. Please let me know if there is a way to modify the Lucene index without the delete-recreate process.
IndexWriter has methods like addDocument, updateDocument, and deleteDocuments, which are used to modify data in the index. Updating a document does require the document to be deleted and reindexed behind the scenes, but it shouldn't require you to recreate the entire index.

Is it OK to call ensureIndex on non-existent collections?

I read somewhere that calling ensureIndex() actually creates a collection if it does not exist. But the index is always on some fields, not all of them, so if I ensure an index on, say, { name:1 } and then add documents to that collection that have many more fields, will the index still work? I know we don't have a schema; coming from the RDBMS world, I just want to make sure. :) I'd like to create indexes when my website starts, but initially the database is empty. I do not need to have any data prior to ensuring indexes, is that correct?
ensureIndex will create the collection if it does not yet exist. It does not matter if you add documents that don't have the property that the index covers; you just can't use that index to find those documents. The way I understand it, in versions before 1.7.4 a document that is missing a property for which there is an index will be indexed as though it had that property, but with a null value. In versions after 1.7.4 you can create sparse indexes that don't include these documents at all. The difference is slight but may be significant in some situations.
Depending on the circumstances, it may not be a good idea to create indexes when the app starts. Consider the situation where you deploy a new version which adds new indexes when it starts up. In development you will not notice this, as you only have a small database, but in production you may have a huge database and adding the index will take a lot of time. During the index creation your app will hang and you can't serve requests. You can create indexes with the background flag set to true (the syntax depends on which driver you're using), but in most cases it's better to add indexes manually, or as part of a setup script. That way you will have to think before you update indexes.
Deprecated since version 3.0: db.collection.ensureIndex() has been replaced by db.collection.createIndex().
Ref: https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/