zend_search_lucene rebuild index - zend-framework

I'm wondering if anybody can suggest the right way to re-index with zend_search_lucene. There isn't an option to update documents; you need to delete and re-add them. I've got a bunch of database tables which I'm going to cycle over, adding a document to the index for each row. I can't see any point in deleting documents as I go: I may as well empty the entire index and then add everything afresh.
There doesn't seem to be a simple deleteAllDocs() method, so I have to find all the documents first, loop over them and delete them one by one, and then loop over my database tables and add them all again. There isn't a getAllDocuments method either (although there is a solution here: http://forums.zend.com/viewtopic.php?f=69&t=9121).
Obviously I could write something fancy which checks whether a document has changed and only deletes it if it has, but this involves comparing all fields, doesn't it?
I feel like I must be missing something.

I delete the index and create a new one, more or less as here
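A minimal sketch of the delete-and-recreate approach, in Python rather than PHP and under stated assumptions: the index is treated as a plain directory, and add_document is a hypothetical callback standing in for whatever the real indexing library does per row. None of these names come from zend_search_lucene.

```python
import shutil
from pathlib import Path


def rebuild_index(index_dir, rows, add_document):
    """Throw away the old index directory and rebuild it from scratch."""
    index_dir = Path(index_dir)
    # Remove the whole index in one go instead of deleting documents one by one.
    shutil.rmtree(index_dir, ignore_errors=True)
    index_dir.mkdir(parents=True)
    count = 0
    for row in rows:
        add_document(index_dir, row)  # hypothetical per-row indexing callback
        count += 1
    return count


def add_document(index_dir, row):
    # Stand-in for real indexing: write one small file per database row.
    (index_dir / f"{row['id']}.txt").write_text(row["title"])
```

Searches that run while the rebuild is in progress see an empty or partial index, so this fits best where a short gap is acceptable.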

Related

Replace instead update

I have a parent document with references. The question is: is it OK to delete all referenced documents and insert new ones, instead of updating old ones, inserting new ones and deleting removed ones? In SQL this is not very good practice, because the index becomes fragmented.
When you start inserting documents into MongoDB, it puts each
document right next to the previous one on disk. Thus, if a document
gets bigger, it will no longer fit in the space it was originally
written to and will be moved to another part of the collection
I believe it's better to remove and insert if we are not sure of the size; otherwise, if the updated document is bigger, we can run into performance problems when it has to be relocated.
If I am not wrong, what you are trying to achieve is document replacement. I believe you can use db.collection.findAndModify(); it has update and remove fields, which can help you achieve the desired behavior.

MongoDB update (add field to nearly every document) is very very slow

I am working on a MongoDB cluster.
One DB named bnccdb, with a collection named AnalysedLiterature. It has about 7,000,000 documents in it.
For each document, I want to add two keys and then update this document.
I am using the Java client. I query each document, add both keys to the BasicDBObject and then use the save() method to update the object. The speed is so slow that it would take several weeks to complete the update for the whole collection.
I wonder whether the reason my update operation is so slow is that I am adding keys.
This would cause a disk/block rearrangement in the background, so the operation becomes extremely time-consuming.
After I changed from save() to update, the problem remains. This is my status information.
From the output of mongostat, it is obvious that the faults rate is very high, but I don't know what caused it.
Can anyone help me?

How to avoid dropping existing index when reindexing

When reindexing with Sunspot, the existing index is cleared/dropped first. This means users will see blank search results for a short time, which is bad in a production environment. Is there a way to reindex without clearing the existing index?
The clearing occurs both when I run the rake task and when I call solr_reindex in the console.
By looking into the code, I think doing a Model.solr_index is enough. When indexing is complete, one can start searching into new indexed fields.
The searchable schema is not something shared across all records from one model. Therefore indexing a single record will update the searchable schema of that record.

Overwriting a collection in mongoDB (involving remove + bulk save) . How to make sure this is performed as transaction?

In some situations I need to completely overwrite a specific MongoDB collection by doing:
1. db.collection.remove()
2. db.collection.insert(doc) multiple times.
What if 1. succeeds but somewhere 2. fails?
Is there a way to do a rollback when this fails?
Any other way to go about this?
If your collection isn't sharded you could:
Rename the original collection.
Create a new collection using the original name.
Populate the new collection.
If all went well, drop the original collection, otherwise drop the new collection and rename the original one back to the original name.
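The rename-and-swap steps above can be sketched roughly as follows. FakeDB is an in-memory stand-in invented for this illustration, not a MongoDB driver; against a real server the matching shell methods would be renameCollection() and drop(). The "_backup" suffix is also an assumption.

```python
class FakeDB:
    """Tiny in-memory stand-in: maps collection name -> list of documents."""

    def __init__(self):
        self.collections = {}

    def rename(self, old, new):
        if new in self.collections:
            raise ValueError(f"target collection {new!r} already exists")
        self.collections[new] = self.collections.pop(old)

    def drop(self, name):
        self.collections.pop(name, None)

    def insert(self, name, doc):
        self.collections.setdefault(name, []).append(doc)


def overwrite_collection(db, name, docs, backup_suffix="_backup"):
    """Replace the contents of `name`, restoring the old data on failure."""
    backup = name + backup_suffix
    db.rename(name, backup)           # move the original out of the way
    try:
        for doc in docs:
            db.insert(name, doc)      # populate a fresh collection under the old name
    except Exception:
        db.drop(name)                 # discard the half-filled collection
        db.rename(backup, name)       # put the original back
        raise
    db.drop(backup)                   # success: the old data is no longer needed
```

This is not a real transaction: readers can still observe the new collection while it is being populated, and a crash between steps leaves the backup behind for manual recovery.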

Is it OK to call ensureIndex on non-existent collections?

I read somewhere that calling ensureIndex() actually creates a collection if it does not exist. But the index is always on some fields, not all of them, so if I ensure an index on, say, { name:1 } and then add documents to that collection that have many more fields, will the index work? I know we don't have a schema; coming from the RDBMS world, I just want to make sure. :) I'd like to create indexes when my website starts, but initially the database is empty. I do not need to have any data prior to ensuring indexes, is that correct?
ensureIndex will create the collection if it does not yet exist. It does not matter if you add documents that don't have the property that the index covers; you just can't use that index to find those documents. The way I understand it, in versions before 1.7.4 a document that is missing a property for which there is an index will be indexed as though it had that property, but with a null value. In versions after 1.7.4 you can create sparse indexes that don't include these objects at all. The difference is slight but may be significant in some situations.
Depending on the circumstances, it may not be a good idea to create indexes when the app starts. Consider the situation where you deploy a new version which adds new indexes when it starts up. In development you will not notice this, as you only have a small database, but in production you may have a huge database and adding the index will take a lot of time. During the index creation your app will hang and you can't serve requests. You can create indexes with the background flag set to true (the syntax depends on which driver you're using), but in most cases it's better to add indexes manually, or as part of a setup script. That way you will have to think before you update indexes.
Deprecated since version 3.0: db.collection.ensureIndex() has been
replaced by db.collection.createIndex().
Ref: https://docs.mongodb.com/manual/reference/method/db.collection.ensureIndex/
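As a rough illustration of the options discussed in the answer, here is a sketch in Python. It assumes `collection` behaves like a pymongo Collection, whose create_index() accepts `background` and `sparse` keyword options. The "nickname" field is hypothetical, and whether the server still honours `background` depends on its version.

```python
def create_startup_indexes(collection):
    """Create the indexes the app relies on; fine to call on an empty database."""
    # Index on a single field; documents may carry any number of other fields.
    # background=True asks the server not to block other operations while it
    # builds (assumption: a server version that still honours this option).
    name_index = collection.create_index([("name", 1)], background=True)
    # sparse=True leaves out documents that lack the "nickname" field entirely.
    nickname_index = collection.create_index([("nickname", 1)], sparse=True)
    return [name_index, nickname_index]
```

Following the advice in the answer, a function like this probably belongs in a setup or migration script rather than in the application's startup path.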