Do you need Solr/Lucene for MongoDB, CouchDB and Cassandra? - mongodb

If you have an RDBMS, you probably have to use Solr to index your relational tables as fully nested documents.
I'm new to NoSQL databases like MongoDB, CouchDB and Cassandra, but it seems to me that the data you save already has the same document structure as the documents saved in Solr/Lucene.
Does this mean that you don't have to use Solr/Lucene when using these databases?
Is it already indexed so that you can do full-text search?

It depends on your needs. They have full-text search. In CouchDB the search is Lucene (the same engine Solr uses). However, this is just a full-text index; if you need complex scoring or DisMax-type searching, you'll likely want the added capabilities of an independent Solr index.

Solr (Lucene) uses an algorithm to return the documents relevant to a query. It returns a score that indicates how relevant each document is to the query.
This differs from what a database (relational or not) does, which is to return the results that either match a query or do not.
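
To make the difference concrete, here is a minimal sketch in plain Python. It assumes a toy corpus and a simplified tf-idf formula; it is not Lucene's actual scoring, and the names `boolean_match` and `tf_idf_rank` are made up for illustration.

```python
import math
from collections import Counter

# Toy corpus; the documents and the query are made up for illustration.
docs = {
    1: "mongodb stores documents",
    2: "solr ranks documents by relevance",
    3: "solr solr solr search",
}

def tokenize(text):
    return text.lower().split()

def boolean_match(query, docs):
    """Database-style answer: ids of documents containing every query term."""
    terms = set(tokenize(query))
    return sorted(doc_id for doc_id, text in docs.items()
                  if terms <= set(tokenize(text)))

def tf_idf_rank(query, docs):
    """Search-engine-style answer: (doc_id, score) pairs, best match first."""
    n = len(docs)
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    scores = {}
    for term in tokenize(query):
        # Document frequency: in how many documents the term occurs.
        df = sum(1 for counts in tokenized.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(1 + n / df)  # simplified idf, an assumption of this sketch
        for doc_id, counts in tokenized.items():
            if counts[term]:
                scores[doc_id] = scores.get(doc_id, 0.0) + counts[term] * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

matches = boolean_match("solr", docs)   # which documents match, unordered by relevance
ranking = tf_idf_rank("solr", docs)     # the same documents, ordered by score
```

Both functions find documents 2 and 3 for the query "solr", but only the second one says that document 3, which mentions the term more often, is the better hit.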

Related

How to view MongoDB indexes data structure?

In the MongoDB docs it is stated that
Indexes are special data structures [1] that store a small portion of
the collection’s data set in an easy to traverse form.
How can I see these data structures? Is it possible?
I was going through this question and I saw that in this answer they gave an example of a schema for an index. Is there such a thing in MongoDB? That is what I am trying to see. I am trying to understand indexes in MongoDB better.
When you create an index in Mongo (using createIndex) you specify which fields the index will use, or what you call the index "schema".
As mentioned in the docs, these indexes are built as B-trees (don't read too much into this, as indexes are a "black box" for us users). Viewing the exact tree structure is not possible, but you can use indexStats to get some more information on an index you created.
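
If it helps to picture what "a small portion of the data set in an easy to traverse form" means, here is a rough sketch in plain Python. It uses a sorted list with binary search as a stand-in for the B-tree, so it is only a conceptual model and not MongoDB's real on-disk structure; the collection, field names and helper functions are hypothetical.

```python
import bisect

# Hypothetical collection; field names and values are made up.
collection = [
    {"_id": 0, "name": "carol", "age": 41},
    {"_id": 1, "name": "alice", "age": 29},
    {"_id": 2, "name": "dave", "age": 35},
    {"_id": 3, "name": "bob", "age": 29},
]

def build_index(collection, field):
    """Return sorted (field value, position in collection) pairs."""
    return sorted((doc[field], pos) for pos, doc in enumerate(collection))

def find_range(collection, index, low, high):
    """Return documents with low <= field value <= high, using binary search."""
    start = bisect.bisect_left(index, (low, -1))
    end = bisect.bisect_right(index, (high, len(collection)))
    return [collection[pos] for _, pos in index[start:end]]

# The index holds only the indexed field plus a pointer back to the document.
age_index = build_index(collection, "age")
in_range = find_range(collection, age_index, 29, 35)
```

The index stores just the `age` values and document positions in sorted order, so a range query can jump to the right spot instead of scanning every document.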

Does Elasticsearch have the same index functionality that MongoDB has?

MongoDB has an index creation feature to speed up queries (https://docs.mongodb.org/v3.0/indexes/). What does Elasticsearch offer for this purpose? I googled it but was unable to find any suitable information. I used indexes in MongoDB on the most frequently used fields to speed up queries, and now I want to do the same in Elasticsearch. Is there anything Elasticsearch provides for this? Thanks.
Elasticsearch also has indices: https://www.elastic.co/blog/what-is-an-elasticsearch-index
They are also used as part of the database's key features to provide swift search capabilities.
It is annoying that "index" means something different in ES than in many other databases. I'm not as familiar with MongoDB, so I'll resort to their documentation at v3.0/core/index-types.
Basically Elasticsearch was designed to serve efficient "filtering" (yes/no queries) and "scoring" (relevance ranking via tf-idf etc.), and it uses Lucene as the underlying inverted index.
MongoDB concepts and their ES counterparts:
Single Field Index: trivially supported, perhaps as not_analyzed fields for exact matching
Compound Index: Lucene applies AND filter conditions via efficient bitmaps and can merge any "single field" indexes ad hoc
Multikey Index: transparent support; no difference between a single value and an array of values
Geospatial Index: directly supported via geo-shapes
Text Index: in a way, ES was optimized for this use case via the analyzed field type
In my view, in search applications relevance is more important than plain filtering of the results, as some words occur in almost every document and are thus less relevant when searching.
Elasticsearch has other very useful concepts as well such as aggregations, nested documents and child/parent relationships.
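
The point above about combining "single field" indexes through bitmaps can be sketched roughly like this. It is a plain-Python illustration that uses integers as bitmaps; it is not how Lucene is actually implemented, and the documents, field names and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents; field names and values are made up.
docs = [
    {"color": "red", "in_stock": True, "size": "m"},
    {"color": "blue", "in_stock": True, "size": "m"},
    {"color": "red", "in_stock": False, "size": "l"},
    {"color": "red", "in_stock": True, "size": "l"},
]

def build_bitmaps(docs):
    """Map each (field, value) pair to an int whose bit i is set if doc i matches."""
    bitmaps = defaultdict(int)
    for i, doc in enumerate(docs):
        for field, value in doc.items():
            bitmaps[(field, value)] |= 1 << i
    return bitmaps

def filter_and(bitmaps, n_docs, **conditions):
    """Return positions of documents that satisfy every field=value condition."""
    result = (1 << n_docs) - 1  # start with all documents selected
    for field, value in conditions.items():
        result &= bitmaps.get((field, value), 0)  # AND in one single-field bitmap
    return [i for i in range(n_docs) if result >> i & 1]

bitmaps = build_bitmaps(docs)
hits = filter_and(bitmaps, len(docs), color="red", in_stock=True)
```

Because every field gets its own bitmap, any combination of fields can be filtered at query time without declaring a compound index up front.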

Best database for multiple-column indexes?

I have a sparse database. Some fields are of Boolean type (these fields should be indexed), some other fields are of Nominal type (again, these fields should also be indexed) whereas some other fields are of Text type (but those ones should not be indexed). I would like to save my data in a database so that I can search based on any combination of the indexed fields and get back the results. Should I consider using Elasticsearch, MongoDB or another database?
Any help is appreciated.
Based on the description above, I suggest MongoDB as the best fit for your requirements, as it has powerful index management and supports multiple types of indexes.
Indexes allow MongoDB to process and fulfill queries quickly by
creating small and efficient representations of the documents in a
collection.
For a more detailed description of the index types in MongoDB, please refer to the documentation at the following URL:
https://docs.mongodb.org/manual/core/index-types/

What is the typical usage of ElasticSearch in conjunction with other storage?

It is not recommended to use ElasticSearch as the only storage, for some obvious reasons like security, transactions etc. So how is it usually used together with another database?
Say I want to store some documents in MongoDB and be able to search effectively by some of their properties. What I'd do is store the full document in Mongo as usual and then trigger an insertion into ElasticSearch, but I'd insert only the searchable properties plus the MongoDB ObjectID there. Then I can search using ElasticSearch and, having found the ObjectID, go to Mongo and fetch the whole documents.
Is this correct usage of ElasticSearch? I don't want to duplicate the whole data set as I already have it in Mongo.
For now, the best practice is to duplicate the documents in ES.
The cool thing here is that when you search, you don't have to return to your database to fetch content, as ES provides it in a single call.
You have everything in the ES search response to display results to your user.
My 2 cents.
You may like to use the MongoDB river; take a look at this post.
There are more issues than the size of the data you store or index. You might like to have MongoDB as a backup with "near real time" queries for inserted data, and as a queue for the data to be indexed (you may like to run MongoDB as a cluster with the write concern suited to your application).
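
For what it's worth, the two-step lookup described in the question (search for ids first, then fetch the full documents by id) can be sketched as follows. This is plain Python with dicts standing in for both stores, not real MongoDB or Elasticsearch client code; `SEARCHABLE_FIELDS`, `save` and `search` are hypothetical names.

```python
# Stand-ins for the two stores; plain dicts, not real database clients.
primary_store = {}   # full documents keyed by id (the role MongoDB plays)
search_index = {}    # term -> set of ids (the role ElasticSearch plays)

SEARCHABLE_FIELDS = ("title", "tags")  # hypothetical choice of fields

def save(doc_id, doc):
    """Store the full document, then index only its searchable fields."""
    primary_store[doc_id] = doc
    for field in SEARCHABLE_FIELDS:
        for term in str(doc.get(field, "")).lower().split():
            search_index.setdefault(term, set()).add(doc_id)

def search(term):
    """Step 1: ask the index for ids. Step 2: fetch full documents by id."""
    ids = sorted(search_index.get(term.lower(), set()))
    return [primary_store[doc_id] for doc_id in ids]

save("a1", {"title": "Intro to Lucene", "tags": "search", "body": "long text ..."})
save("b2", {"title": "Cassandra notes", "tags": "nosql", "body": "lucene appears only here"})
results = search("lucene")
```

Only the first document comes back for "lucene", because the `body` field was never indexed. Duplicating the whole document in the index, as suggested in the first answer, would remove step 2 at the cost of storing the data twice.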

Sphinx + NoSQL Help

So I'm looking to run Sphinx over a NoSQL system such as MongoDB, HBase, Cassandra, etc.
Right now, we're comparing all the NoSQL systems out there. Basically, we need to query 50+ Million rows of product data with fulltext searches thousands of times a second, so we're trying to find the most efficient NoSQL system.
Here is our question, though. If we use any NoSQL system with Sphinx, when we perform the actual searches, will the search have any interaction with the NoSQL system itself, or will Sphinx be doing the work as it has the data indexed? If it's only Sphinx, then wouldn't the performance of the NoSQL system be only secondary?
Thanks!
Using the new string attributes, you can cut the database out of the search completely, which will be much more efficient.
As I understand it, you can do this. Because I'm only familiar with MongoDB and HBase, I can only answer based on these two databases. You need to do some work on the indexer to build the data/attributes into the Sphinx index file, and include the primary key (which uniquely identifies the record in the database) as well (for MongoDB it's the object_id, for HBase it's the row key). Then, after you do the full-text search, you can get the whole data/attributes from the database by the primary key.
Besides, there is another full-text search engine that supports NoSQL databases very well: Solr. You can try it if its performance satisfies your requirements.