Cassandra does not show or apply index when adding secondary index

Cassandra seems to have automatically deleted an important secondary index, so I have to restore it.
I used both PHPcassa and the Cassandra CLI to re-create the index.
Both reported that the index was applied, but when I then tried to retrieve data through the index, Cassandra complained that the indexed column is not there.
This is on the latest Cassandra, version 1.1.2.
Thanks and help quick!
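For reference, re-creating and querying a secondary index normally looks like this in cqlsh (the `users` column family and `state` column are illustrative names, not from the question):

```
-- Re-create the secondary index on an existing column
CREATE INDEX users_state_idx ON users (state);

-- Query through the index; this is the query that fails with a
-- "no indexed columns" style error if the index was not actually built
SELECT * FROM users WHERE state = 'TX';
```

If the CREATE INDEX succeeds but queries still fail, the index build may not have completed on every node, so it is worth checking each node rather than assuming a single successful CREATE means the index is usable cluster-wide.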

Related

How to manually create empty MongoDB index on a new field?

I have a huge collection, more than 2 TiB of data. When releasing a new feature I add an index on a new field that I am 100% sure does not exist in any document, yet MongoDB will still perform a full scan for this field, which may take a long time.
Is there any hack to manually create an empty index file with a valid structure and notify the MongoDB node about it, so that it loads the index into memory and does everything else MongoDB does when an index is created?
Unlike a relational DBMS, MongoDB creates index entries even for documents in which the field does not exist, i.e. it scans the entire collection.
Index creation runs in the background, so it should not hurt much.
See createIndexes
Changed in version 4.2.
For feature compatibility version (fcv) "4.2", all index builds use an optimized build process that holds the exclusive lock only at the beginning and end of the build process. The rest of the build process yields to interleaving read and write operations. MongoDB ignores the background option if specified.
If you run MongoDB version 4.2 or earlier, you may specify the option { background: true }.
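In the mongo shell the two variants look like this (the collection and field names are illustrative):

```
// MongoDB 4.2+ (fcv 4.2): the background option is ignored and the
// optimized build process described above is always used
db.mycoll.createIndex({ newField: 1 })

// MongoDB 4.0 and earlier: request a background build explicitly so the
// build does not hold the exclusive lock for its whole duration
db.mycoll.createIndex({ newField: 1 }, { background: true })
```

Either way, the build still has to scan the whole collection once; the background option only changes how disruptive that scan is to concurrent reads and writes.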

Distributing MongoDB background index creation for single machine MongoDB setup

I have a single-machine MongoDB setup that satisfies the needs of my application at runtime but imposes a significant bottleneck at data-ingestion time, as the background indexing on an array field (an inverted index) takes days to complete. It seems to be the same issue as posted here: MongoDB large index build very slow. I wonder if it makes sense to delegate/distribute index creation and then deploy the resulting index on the main machine. If anyone has considered this, I would appreciate you sharing the experience. Here are some ideas I wanted to test:
Use a distributed job on Hadoop or DataFlow to create the index tuples, then load them back either into MongoDB directly or into another DB that is more efficient at storing an inverted index.
Use another service such as Elasticsearch that can potentially handle indexing more efficiently; however, I have no experience with it and want to continue hosting everything on the same machine.
In the end I decided to generate all the tuples to index with Apache Beam/DataFlow, import them with mongoimport, and then create an index on the fields I need. This way I get a queryable index in hours rather than days.
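The pipeline that worked can be sketched with standard MongoDB tooling (file names, database, and field names below are assumptions for illustration):

```
# 1. Generate the tuples to index with Apache Beam/DataFlow, writing
#    them out as newline-delimited JSON, e.g. tuples.json

# 2. Bulk-load them; mongoimport is far faster than per-document inserts
mongoimport --db mydb --collection inverted --file tuples.json

# 3. Build the index once, after all the data is in place
mongo mydb --eval 'db.inverted.createIndex({ term: 1 })'
```

Building the index after the bulk load, instead of maintaining it incrementally while documents trickle in, is the main reason this finishes in hours rather than days.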

Update settings of existing river in Elasticsearch

I created a MongoDB river with an index in Elasticsearch, but later noticed that I don't need several of the fields and that I want to use a different username/password for the river.
How to:
1. Trigger an update of the river settings for the new username/password?
2. Delete/exclude the useless fields of the index from Elasticsearch without rebuilding the whole index from the beginning?
I have about 20-30 GB of indexed data, and the whole process of getting the data through the river can take long hours.
All I found were DELETE and PUT; no update for index fields or for the river is mentioned either in the docs or on Google.
It's not currently possible to remove a field from a mapping. In order to remove all values of a field from all documents, you need to reindex all documents with this field removed.
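On Elasticsearch versions that have the Reindex API (the river mechanism itself predates it and has since been removed), that reindex-with-fields-removed step can be done server-side; the index and field names here are illustrative:

```
# Copy every document into a new index, dropping the unwanted field on the way
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "old_index" },
  "dest":   { "index": "new_index" },
  "script": { "source": "ctx._source.remove(\"useless_field\")" }
}'
```

Create `new_index` first with the trimmed mapping, then point your application (or an alias) at it once the reindex finishes; the 20-30 GB still has to be copied, but from Elasticsearch to Elasticsearch rather than back through the river.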

How to update lucene index synchronously?

I have created a Lucene index (using Lucene.net) and the search is working fine.
My concern is as follows:
I used data from my SQL database to create the index. The thing is, this data is growing, and I am unable to find a way to modify the index without deleting and recreating it. Please let me know if there is a way of modifying the Lucene index without the delete-and-recreate process.
IndexWriter has methods like addDocument, updateDocument, and deleteDocuments, which are used to modify data in the index. Updating a document does delete and reindex that one document behind the scenes, but it shouldn't require you to recreate the entire index.
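In Lucene.Net that incremental update loop looks roughly like this (a sketch against the 4.8 API; the unique `id` key field, `rowId`, `doc`, and `updatedDoc` are assumptions standing in for your SQL rows):

```csharp
// Open the existing index for appending; OpenMode.CREATE would wipe it
var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    OpenMode = OpenMode.CREATE_OR_APPEND
};
using (var writer = new IndexWriter(indexDirectory, config))
{
    // New row in SQL: just add its document
    writer.AddDocument(doc);

    // Changed row: delete-and-readd, keyed on a unique "id" field
    writer.UpdateDocument(new Term("id", rowId), updatedDoc);

    // Row deleted from SQL: remove its document
    writer.DeleteDocuments(new Term("id", rowId));

    writer.Commit();
}
```

For this to work, each document needs a stored field that uniquely identifies its source row (the SQL primary key is the natural choice), so updateDocument/deleteDocuments can target exactly one document.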

Can I modify existing index in MongoDB without dropping it?

Can I modify existing index in MongoDB without dropping it ? I don't see anything about it in documentation.
I have a non-unique index on a String field. The collection is ~6M documents. It's a replica set.
I know I can delete the index and add a new one. But that's problematic for two reasons:
1) while the index doesn't exist, some queries will be very, very slow.
2) adding the new index creates (in my project) a very high load on the DB, which visibly slows down my website.
There is no way to alter an index as you describe, and if there were, I think the outcome in terms of performance would be similar - how, for example, would the database use the half-created/altered index while that operation was going on?
Instead I would recommend using the background option to build the index on a single node, if that is your configuration - it will take longer but will not interfere with your normal operation as much. Once it is finished you can drop the old index at your leisure.
However, if you have a replica set (recommended), you should be aware that index creation is always (currently) done in the foreground on secondaries. If you want to avoid load on your secondaries, then you should follow the steps outlined here to take members out one at a time and build the required index before rejoining them to the set:
http://docs.mongodb.org/manual/administration/indexes/#index-building-replica-sets
Update
Background index builds on secondaries will be possible starting with the 2.6 release (see release notes for details). This is not going to be backported to prior versions, so the above note will be true for versions prior to 2.6.
Finally, as a general note, indexes built in the background will generally be larger and less efficient than those built in the foreground, so the methodology above still has its uses.
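The rolling procedure from the linked docs boils down to the following, per secondary (a sketch for the 2.x era; ports, paths, and names are assumptions):

```
# 1. Take the secondary out of the set: restart mongod standalone
#    (without --replSet) on a different port so clients don't find it
mongod --port 27217 --dbpath /data/db

# 2. Build the index in the foreground on that node; nothing else is
#    querying it, so the global impact is nil
mongo --port 27217 --eval 'db.mycoll.createIndex({ field: 1 })'

# 3. Restart with the original --replSet options, let the node rejoin
#    and catch up from the oplog, then move on to the next secondary.
#    Finish by stepping down the primary and repeating there.
```

The old index can then be dropped with db.mycoll.dropIndex(...) once the new one exists everywhere, so there is never a window in which queries run unindexed.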