I have an application that needs to filter data based on more than 7 fields.
At least two of these fields are arrays, currently stored in MongoDB (each of them individually holds thousands of hexadecimal IDs). In MongoDB it's not possible to create parallel indexes over multiple array fields (for very understandable reasons), therefore I'm only able to index on one single field. A similar issue has already been discussed in the following thread:
elasticsearch v.s. MongoDB for filtering application
The answer provides some good insights into how Elasticsearch differs from NoSQL databases, but I'm still unsure whether Elasticsearch will be performant if I just create nested mappings for the two array fields.
Will the described "Vector Space Model" help me filter on multiple array fields with good performance when I do exact-match / range searches?
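For illustration, a minimal sketch of this kind of exact-match / range filtering, assuming the Elasticsearch 8.x Python client and made-up index and field names. Plain keyword arrays (rather than nested mappings) are assumed, since Elasticsearch transparently treats any field as an array of values when you only need exact matches on individual elements:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical index "items" with two array fields mapped as keyword.
resp = es.search(
    index="items",
    query={
        "bool": {
            "filter": [                               # filter context: yes/no matching, no scoring
                {"term": {"group_ids": "5f3a9c1e"}},  # matches if the array contains this value
                {"term": {"tag_ids": "7b2d0aff"}},
                {"range": {"created_at": {"gte": "2020-01-01"}}},
            ]
        }
    },
)
print(resp["hits"]["total"])
```

Because everything runs in filter context, the clauses are cacheable yes/no checks rather than scored full-text matches.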
Related
This question is about choosing the type of database to run queries on for an application. Keeping other factors aside for the moment, and given that the choice is between MongoDB and Elasticsearch, the key criterion is that queries should be resolved in near real time. The queries will be ad hoc and as such can contain any of the fields in the JSON objects, and will likely contain aggregations and sub-aggregations. Furthermore, there will be no nested objects and none of the fields will contain 'descriptive' text (like movie reviews etc.), i.e. all the fields will be keyword-type fields like State, Country, City, Name etc.
Now, I have read that Elasticsearch performance is near real time and that Elasticsearch uses inverted indices, which it creates automatically for every field.
Given all the above, my questions are as follows.
(There is a similar question posted on Stack Overflow, but I do not think it answers my questions:
elasticsearch v.s. MongoDB for filtering application)
1) Since the fields in the use case I mentioned do not contain descriptive text, and hence would not require the full-text search capability and other additional features that Elasticsearch provides (especially for text search), which would be the better choice, Elasticsearch or MongoDB? How would Elasticsearch and MongoDB query/aggregation performance compare if I were to create single-field indexes on all the available fields in MongoDB?
2) I am not familiar with advanced indexing, so I am assuming that it would be possible to create indexes on all available fields in MongoDB (either using multiple single-field indexes or maybe compound indexes?). I understand that this comes with a cost in storage and write speed, which is true for Elasticsearch as well.
3) Also, in Elasticsearch the user can trade off write speed (indexing rate) against the speed with which a written document becomes available for a query (refresh_interval; a sketch follows this list). Is there a similar feature in MongoDB?
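To make question 3 concrete, here is a hedged sketch of the refresh_interval trade-off (Elasticsearch 8.x Python client, hypothetical index name); relaxing the refresh interval favours indexing throughput at the cost of how quickly new documents become searchable:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Documents become searchable roughly every 30 seconds instead of the
# default 1 second, reducing refresh overhead during heavy writes.
es.indices.put_settings(
    index="my-index",  # hypothetical index name
    settings={"index": {"refresh_interval": "30s"}},
)
```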
I think the size of your data set is also a very important aspect of choosing a DB engine. According to this benchmark (2015), if you have over 10 million documents, Elasticsearch could be the better choice. If your data set is small, there should be no obvious difference in performance between Elasticsearch and MongoDB.
I have data in MongoDB and synced data in ElasticSearch. My requirement is to filter data based on certain parameters.
Let's say I am filtering data based on a couple of parameters and retrieving a couple of hundred results from 10,000 documents. (I am mentioning numbers for perspective.)
Since this query is based on filtering and not search, which of the two performs better: MongoDB or Elasticsearch?
Intuitively it feels that ElasticSearch is fast and returns data quickly.
Given this scenario and indexed values in DB, is Mongo competitive with ElasticSearch? Should I even consider ElasticSearch at this scale?
Elasticsearch is the right choice for your requirement. It has two different concepts: query and filter.
Please see the link below for more explanation:
http://blog.quarkslab.com/mongodb-vs-elasticsearch-the-quest-of-the-holy-performances.html
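For completeness, a small sketch of the query/filter distinction mentioned above, using the Elasticsearch Python client with hypothetical index and field names. In current Elasticsearch versions the distinction survives as query context ("must", scored) versus filter context ("filter", unscored and cacheable) inside a bool query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="products",  # hypothetical index and fields
    query={
        "bool": {
            # query context: contributes to relevance scoring
            "must": [{"match": {"title": "wireless headphones"}}],
            # filter context: plain yes/no conditions, cacheable
            "filter": [
                {"term": {"brand": "acme"}},
                {"range": {"price": {"lte": 200}}},
            ],
        }
    },
)
```

This is one reason pure filtering workloads like the one described tend to be cheap in Elasticsearch: filter clauses are never scored.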
MongoDB has an index-creation feature to speed up queries (https://docs.mongodb.org/v3.0/indexes/). What does Elasticsearch provide for this purpose? I googled it but was unable to find suitable information. In MongoDB I used indexes on the most frequently queried fields to speed up queries, and now I want to do the same in Elasticsearch. Is there anything comparable that Elasticsearch provides? Thanks.
Elasticsearch also has indices: https://www.elastic.co/blog/what-is-an-elasticsearch-index
They are also used as part of the database's key features to provide swift search capabilities.
It is annoying that "index" is used in a different context in ES than in many other databases. I'm not as familiar with MongoDB, so I'll refer to their documentation at v3.0/core/index-types.
Basically Elasticsearch was designed to serve efficient "filtering" (yes/no queries) and "scoring" (relevance ranking via tf-idf etc.), and it uses Lucene as the underlying inverted index.
MongoDB concepts and their ES counter-parts:
Single Field Index: trivially supported, perhaps as not_analyzed fields for exact matching
Compound Index: Lucene applies AND filter conditions via efficient bitmaps and can ad hoc combine any "single field" indexes
Multikey Index: transparent support; no difference between a single value and an array of values
Geospatial Index: directly supported via geo-shapes
Text Index: in a way ES is optimized for this use case, via analyzed field types (a mapping sketch covering these counterparts follows below)
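As a rough illustration of these counterparts (not an authoritative mapping for any particular data set), the sketch below creates an index with keyword, geo_shape and analyzed text fields using the Elasticsearch 8.x Python client; since Elasticsearch 5 the keyword type plays the role of not_analyzed strings, and array values need no special mapping:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical index illustrating the MongoDB-to-ES counterparts above.
es.indices.create(
    index="places",
    mappings={
        "properties": {
            "name":     {"type": "keyword"},    # exact matching (single-field index style)
            "tags":     {"type": "keyword"},    # an array of tags works out of the box (multikey style)
            "boundary": {"type": "geo_shape"},  # geospatial counterpart
            "review":   {"type": "text"},       # analyzed field, the text-index counterpart
        }
    },
)
```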
In my view, for search applications relevance is more important than plain filtering of the results, as some words occur in almost every document and are thus less relevant when searching.
Elasticsearch has other very useful concepts as well such as aggregations, nested documents and child/parent relationships.
I have a sparse database. Some fields are of Boolean type (these fields should be indexed), some other fields are of Nominal type (again, these should also be indexed), whereas some other fields are of Text type (those should not be indexed). I would like to save my data in a database so that I can search based on any combination of the indexed fields and get back the results. Should I consider using Elasticsearch, MongoDB or another database?
Any help is appreciated.
Based on the description above, I suggest MongoDB is well suited to your requirement, as MongoDB has powerful index management and supports multiple types of indexes.
Indexes allow MongoDB to process and fulfill queries quickly by creating small and efficient representations of the documents in a collection.
For a more detailed description of the index types in MongoDB, please refer to the documentation at the following URL:
https://docs.mongodb.org/manual/core/index-types/
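If MongoDB is chosen, a minimal sketch along these lines (PyMongo, with invented field names) would index only the Boolean and Nominal fields, mark those indexes as sparse so that documents missing a field carry no index entry, and then query on any combination of the indexed fields:

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
coll = client["demo"]["items"]

# Sparse indexes only contain entries for documents that actually have the
# field, which keeps them small for a sparse data set.
coll.create_index([("is_active", ASCENDING)], sparse=True)  # Boolean field
coll.create_index([("category", ASCENDING)], sparse=True)   # Nominal field
# Text-type fields are simply left unindexed.

# Any combination of the indexed fields can then be queried:
for doc in coll.find({"is_active": True, "category": "books"}):
    print(doc["_id"])
```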
I have an application that stores items (e.g. web documents). Each item can feature an arbitrarily large set of tags, and a typical query is to retrieve all documents with a given set of tags. A pretty common web application, really.
Now I'm thinking about a NoSQL database as the persistent storage. Various NoSQL systems (e.g. MongoDB) support secondary indexes and, with that, keyword-based searches. Examples showing how to do it in different systems are easy to find. The problem is, I would like to know what's going on "under the hood", i.e. how/where the secondary indexes are stored, and how a query with a list of tags is actually executed, particularly in systems with many nodes.
I'm aware of solutions based on Map/Reduce or similar. But here I'm interested how the indexing works. Questions I have, for example, are:
Does the secondary index only store the item/object id or more?
If a query contains k tags, are k subqueries - one for each tag - executed and the k partial results combined on the initiating node?
Where can I find such information for different NoSQL systems? Thanks a lot for any hints.
Christian
In MongoDB an index on tags would be done by utilizing the multikey feature, whereby the database matches documents against each element of an array. You would index the tags attribute of a document, which creates a B-tree constructed out of the ranges of tags in that array.
You can learn more about multikeys here and can get more information about indexing in MongoDB by watching this presentation: MongoDB Internals
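As a concrete illustration (PyMongo, with a hypothetical collection and tag values): indexing the tags array yields a multikey index, and a "documents with a given set of tags" query can then be expressed with $all:

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
docs = client["demo"]["documents"]

# Indexing an array field creates a multikey index: one B-tree entry per tag.
docs.create_index([("tags", ASCENDING)])

# $all requires every listed tag to be present in the tags array.
results = docs.find({"tags": {"$all": ["nosql", "indexing"]}})
```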
Does the secondary index only store the item/object id or more?
The indexes consist of the indexed field (let's say it's a tags array in your case; then the field would be a single tag) and an offset used to efficiently locate the document in memory. There is also some padding + other overhead as described here.
If a query contains k tags, are k subqueries - one for each tag - executed and the k partial results combined on the initiating node?
It depends, but if, for example, the query used an $or on the tag field, I think the subqueries are performed in parallel, each in O(log n) time, and the results are combined to form the result set - but I'm not certain about this.
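For comparison with the $all example above, a sketch of the $or case being described (PyMongo, hypothetical tag values); MongoDB can evaluate each clause against the tags index and merge the partial results, dropping duplicates:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
docs = client["demo"]["documents"]

# Documents matching *any* of the tags; each clause can use the tags index,
# and the merged result set contains each matching document once.
results = docs.find({"$or": [{"tags": "nosql"}, {"tags": "indexing"}]})
```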