Is there a multikey index and a compound index in HBase? - mongodb

I'm familiar with MongoDB.
As you know, MongoDB has many index types, such as:
multikey index: http://docs.mongodb.org/manual/core/index-multikey/
, which is very useful for keyword search; I once used it to build a simple search engine.
The compound index is also very useful in MongoDB: http://docs.mongodb.org/manual/tutorial/create-a-compound-index/ It is used for queries on multiple fields.
But I need to migrate my database from MongoDB to HBase. Do you know of a similar index in HBase that provides the same functionality as the multikey and compound indexes in MongoDB?

HBase doesn't support secondary indexes natively; that's one of the trade-offs it makes in order to scale to massive data sets. These are the options you have:
http://hbase.apache.org/book/secondary.indexes.html
It all depends on the amount of data you're going to handle and your access patterns. For me, dual writing to "index" tables and summary tables are the best approaches; just keep in mind that both have to be maintained manually by your application.
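A minimal sketch of the dual-write idea, using in-memory Maps as stand-ins for HBase tables (no HBase client is involved, and the names dataTable, indexTable, putUser and scanPrefix are made up for illustration). A composite row key plays the role of a compound index, and one index row per array element plays the role of a multikey index:

```javascript
// In-memory stand-ins for two HBase tables (hypothetical; a real setup would use the HBase client).
const dataTable = new Map();   // row key: userId -> document
const indexTable = new Map();  // row key: composite index key -> userId

// Dual write: store the row, then write one index row per indexed value.
function putUser(userId, doc) {
  dataTable.set(userId, doc);
  // "Compound" index: concatenate the fields into a single row key.
  // The age is zero-padded so that lexicographic order matches numeric order.
  const age = String(doc.age).padStart(3, "0");
  indexTable.set(`city_age|${doc.city}|${age}|${userId}`, userId);
  // "Multikey" index: one index row per array element.
  for (const tag of doc.tags) {
    indexTable.set(`tag|${tag}|${userId}`, userId);
  }
}

// Emulates a prefix scan over lexicographically sorted row keys.
function scanPrefix(prefix) {
  return [...indexTable.keys()]
    .filter((key) => key.startsWith(prefix))
    .sort()
    .map((key) => dataTable.get(indexTable.get(key)));
}

putUser("u1", { name: "Ann", city: "Paris", age: 31, tags: ["hbase", "search"] });
putUser("u2", { name: "Bob", city: "Paris", age: 25, tags: ["mongodb"] });
putUser("u3", { name: "Cid", city: "Rome", age: 40, tags: ["search"] });

const parisians = scanPrefix("city_age|Paris|").map((u) => u.name);
const searchers = scanPrefix("tag|search|").map((u) => u.name);
console.log(parisians, searchers);
```

The application owns both writes here, which is exactly the manual bookkeeping mentioned above: if the data row changes, the matching index rows have to be rewritten as well.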

There is no concept of indexing in HBase as of now. I know there is some demand for indexing within the community, and there are other projects that provide indexing on top of HBase. One in particular that I looked at was Huawei's hindex.

Related

How to view MongoDB indexes data structure?

In the MongoDB docs it is stated that
Indexes are special data structures [1] that store a small portion of
the collection’s data set in an easy to traverse form.
How can I see these data structures? Is it possible?
I was going through this question and I saw that in this answer they gave an example of a schema for an index. Is there such a thing in MongoDB? That is what I am trying to see. I am trying to understand indexes in MongoDB better.
When you create an index in Mongo (using createIndex) you specify which fields the index will use, which is what you call the index "schema".
As mentioned in the docs, these indexes are built as B-trees (don't read too much into this, as indexes are a "black box" for us users). Viewing the exact tree structure is not possible, but you can use the $indexStats aggregation stage to get usage statistics for an index you created, and db.collection.getIndexes() to list the index definitions.
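For intuition only, here is a toy model of what an index holds: a list of (field value, document position) entries kept in sorted order, so a lookup can binary-search instead of scanning every document. This is not MongoDB's actual internal format, and buildIndex and findByKey are hypothetical helper names:

```javascript
// Toy model only: MongoDB's real index format is internal and differs from this.
const collection = [
  { _id: 1, email: "zoe@example.com" },
  { _id: 2, email: "adam@example.com" },
  { _id: 3, email: "mia@example.com" },
];

// Build a sorted list of [fieldValue, positionInCollection] entries.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => [doc[field], pos])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Binary search over the sorted entries instead of scanning every document.
function findByKey(index, docs, value) {
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [key, pos] = index[mid];
    if (key === value) return docs[pos];
    if (key < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return null;
}

const emailIndex = buildIndex(collection, "email");
const found = findByKey(emailIndex, collection, "mia@example.com");
console.log(emailIndex.map(([key]) => key), found);
```

The index stores only the indexed field and a pointer back to the document, which is the "small portion of the collection's data set" the docs refer to.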

How can we create an Index on MongoDB?

I want to create an index on a MongoDB database to improve performance. Could you please help me with how to do it?
Your help will be appreciated.
To index the email field of the users collection:
db.users.createIndex({"email":1}, {background:true})
Before applying indexes to MongoDB collections, you need to understand the following aspects of indexing:
Indexing strategy:
Check what types of queries your application sends to MongoDB.
List all such queries.
Based on the number and type of operations, decide which index type to use.
Choose the correct type of index for the application's needs. The type can be a single-field index, compound index, partial index, TTL index and so on.
Do your queries involve sort operations? Follow this guide on indexing for operations with sort.
A more detailed guide on indexing strategy is here.
Test your indexes:
Once you have the list of indexes to apply, test their performance using explain.
Generate sample application calls against your database and enable the profiler (in dev or staging) to check how your indexes are performing.
How to index:
Create indexes in the background so that the index build does not block other operations. (Since MongoDB 4.2 the background option is ignored, because all index builds use a process that holds an exclusive lock only briefly.)
Depending on your data size, if the indexes are to be created on large collections, consider doing it during low-traffic hours or in a scheduled maintenance window.
You may need to consider a rolling index build in certain use cases to minimize the impact of indexing.
Keep track of indexes you create:
Document your indexes. This may include when you have created those indexes, why and so on.
Measure your index usage stats in production:
Once you have applied these indexes in production, check their usage stats after a week or two to see whether they're really being used.
Consider dropping the indexes if they're not used at all.
Caution:
Indexes add a performance penalty to write operations. Design and apply only the indexes that are a must for your application.
The basic syntax is:
db.collection.createIndex(keys, options)
So, for example:
db.users.createIndex({"username" : 1})
See MongoDB Indexes for the full details.

Does ElasticSearch have the same indexes functionality that mongodb have?

MongoDB has an index creation feature to speed up queries (https://docs.mongodb.org/v3.0/indexes/). What does Elasticsearch offer for this purpose? I googled it but was unable to find any suitable information. I used indexes in MongoDB on the most frequently used fields to speed up queries, and now I want to do the same in Elasticsearch. Is there anything Elasticsearch provides for this? Thanks.
Elasticsearch also has indices: https://www.elastic.co/blog/what-is-an-elasticsearch-index
They are also used as part of the database's key features to provide swift search capabilities.
It is annoying that "index" is used in a different context with ES and many other databases. I'm not as familiar with MongoDB so I'll resort to their documentation at v3.0/core/index-types.
Basically Elasticsearch was designed to serve efficient "filtering" (yes/no queries) and "scoring" (relevance ranking via tf-idf etc.), and it uses Lucene as the underlying inverted index.
MongoDB concepts and their ES counterparts:
Single Field Index: trivially supported, perhaps as not_analyzed fields (the keyword type since Elasticsearch 5.0) for exact matching
Compound Index: Lucene applies AND filter conditions via efficient bitmaps, and can merge any "single field" indexes ad hoc
Multikey Index: transparent support, no difference between a single value and an array of values
Geospatial Index: directly supported via geo-shapes
Text Index: In some way ES was optimized for this use-case as analyzed field type
In my view, in search applications relevance is more important than plain filtering of the results, as some words occur in almost every document and are thus less relevant when searching.
Elasticsearch has other very useful concepts as well such as aggregations, nested documents and child/parent relationships.
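To make the inverted-index idea behind the list above concrete, here is a toy sketch (Lucene's real data structures are far more elaborate; tickets, addPosting and andQuery are made-up names). Each term maps to the set of documents containing it, an array field simply contributes one posting per element, and a multi-field filter is an intersection of posting sets:

```javascript
// Toy inverted index: term -> set of document ids.
const tickets = {
  1: { tags: ["hbase", "index"], status: "open" },
  2: { tags: ["mongodb", "index"], status: "open" },
  3: { tags: ["index"], status: "closed" },
};

const inverted = new Map();
function addPosting(term, id) {
  if (!inverted.has(term)) inverted.set(term, new Set());
  inverted.get(term).add(id);
}

for (const [id, doc] of Object.entries(tickets)) {
  // An array field is indexed like a single value: one posting per element.
  for (const tag of doc.tags) addPosting(`tags:${tag}`, Number(id));
  addPosting(`status:${doc.status}`, Number(id));
}

// AND query: intersect the posting sets of all the terms.
function andQuery(...terms) {
  const sets = terms.map((term) => inverted.get(term) || new Set());
  return [...sets[0]]
    .filter((id) => sets.every((set) => set.has(id)))
    .sort((a, b) => a - b);
}

const openIndexTickets = andQuery("tags:index", "status:open");
console.log(openIndexTickets);
```

No dedicated compound index is declared anywhere: any combination of fields can be filtered by combining the per-field postings at query time.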

Best database for multiple-column indexes?

I have a sparse database. Some fields are of Boolean type (these fields should be indexed), some other fields are of Nominal type (again, these fields should also be indexed) whereas some other fields are of Text type (but those ones should not be indexed). I would like to save my data in a database so that I can search based on any combination of the indexed fields and get back the results. Should I consider using Elasticsearch, MongoDB or another databases?
Any help is appreciated.
Based on the description above, I suggest MongoDB as the best fit for your requirements, since it has powerful index management and supports multiple types of indexes.
Indexes allow MongoDB to process and fulfill queries quickly by
creating small and efficient representations of the documents in a
collection.
For a more detailed description of the index types in MongoDB, please refer to the documentation at the following URL:
https://docs.mongodb.org/manual/core/index-types/

Do you need Solr/Lucene for MongoDB, CouchDB and Cassandra?

If you have an RDBMS, you probably have to use Solr to index your relational tables as fully nested documents.
I'm new to NoSQL databases like MongoDB, CouchDB and Cassandra, but it seems to me that the data you save is already in a document structure, like the documents saved in Solr/Lucene.
Does this mean that you don't have to use Solr/Lucene when using these databases?
Is the data already indexed so that you can do full-text search?
It depends on your needs. They do have full-text search. In CouchDB the search is based on Lucene (the same library Solr uses). Unfortunately, this is just a full-text index; if you need complex scoring or DisMax-type searching, you'll likely want the added capabilities of an independent Solr index.
Solr (Lucene) uses an algorithm to return relevant documents for a query. It returns a score that indicates how relevant each document is to the query.
This is different from what a database (relational or not) does, which is to return the results that match a query and leave out those that do not.
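The difference between matching and scoring can be sketched with a simplified TF-IDF calculation. This is an illustration only, not the scoring formula Solr or Lucene actually use, and the corpus and function names are made up:

```javascript
const corpus = [
  "the index of the database",
  "the search engine ranks the results",
  "search the search index",
];
const tokenized = corpus.map((text) => text.split(" "));

// Boolean matching: a document either contains the term or it does not.
function matches(term) {
  return tokenized.map((words) => words.includes(term));
}

// Simplified TF-IDF: term frequency times log(N / document frequency).
function score(term) {
  const df = tokenized.filter((words) => words.includes(term)).length;
  if (df === 0) return tokenized.map(() => 0);
  const idf = Math.log(tokenized.length / df);
  return tokenized.map((words) => {
    const tf = words.filter((word) => word === term).length / words.length;
    return tf * idf;
  });
}

const searchMatches = matches("search");
const searchScores = score("search");
const theScores = score("the");
console.log(searchMatches, searchScores, theScores);
```

Both the second and third documents match "search", but the third scores higher because the term makes up more of it. A word such as "the", which occurs in every document, scores zero everywhere, so it contributes nothing to the ranking.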