MongoDB vs. Elasticsearch filtering

I have data in MongoDB and the same data synced into Elasticsearch. My requirement is to filter data based on certain parameters.
Let's say I am filtering on a couple of parameters and retrieving a couple of hundred results from 10,000 documents. (I am mentioning numbers for perspective.)
Since this query is based on filtering and not full-text search, which of the two performs better: MongoDB or Elasticsearch?
Intuitively it feels like Elasticsearch is fast and returns data quickly.
Given this scenario and indexed values in the DB, is Mongo competitive with Elasticsearch? Should I even consider Elasticsearch at this scale?

Elasticsearch is the right choice for your requirement. It has two different concepts: query and filter.
See the link below for more explanation:
http://blog.quarkslab.com/mongodb-vs-elasticsearch-the-quest-of-the-holy-performances.html
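As a minimal sketch of the filter concept (assuming the Python Elasticsearch 8.x client; the index and field names are hypothetical), the filtering criteria would sit in the filter clause of a bool query, which matches documents yes/no without scoring and can be cached:

    # Rough sketch, not taken from the linked post. Index and field names
    # ("products", "category", "price") are made up.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(
        index="products",
        query={
            "bool": {
                "filter": [                      # filter context: yes/no match, no scoring
                    {"term": {"category": "books"}},
                    {"range": {"price": {"lte": 20}}},
                ]
            }
        },
        size=200,  # "a couple of hundred results"
    )
    print(resp["hits"]["total"], len(resp["hits"]["hits"]))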

Related

MongoDB vs Elasticsearch - indexing parallel arrays

I have an application that needs to filter data based on 7+ fields.
Two or more of those fields are arrays, currently stored in MongoDB (each of them individually holds thousands of hexadecimal IDs). In MongoDB it's not possible to index parallel arrays in a single compound index (for very understandable reasons), so I'm only able to index on one of those fields. A similar issue has already been discussed in the following thread:
elasticsearch v.s. MongoDB for filtering application
The answer provides some good insights into how Elasticsearch differs from NoSQL databases, but I'm still unsure whether Elasticsearch will be performant if I just create nested mappings for the two array fields.
Will the described "Vector Space Model" help me filter on multiple array fields with good performance when I do exact-match / range searches?
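For what it's worth, if the two array fields hold plain scalar IDs rather than objects, nested mappings are not required: Elasticsearch indexes every element of an array under the field, so two keyword fields are enough for exact-match filtering. A minimal sketch assuming the Python client (index and field names are hypothetical):

    # Hypothetical sketch: two array fields mapped as keyword. Nested mappings
    # are only needed for arrays of objects whose sub-fields must stay correlated.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    es.indices.create(
        index="items",
        mappings={
            "properties": {
                "tag_ids": {"type": "keyword"},    # first array of hex IDs
                "group_ids": {"type": "keyword"},  # second array of hex IDs
            }
        },
    )

    # Exact-match filtering on both array fields at once
    resp = es.search(
        index="items",
        query={
            "bool": {
                "filter": [
                    {"term": {"tag_ids": "5f1a3c9e0d4b2a7c1e8f6a3b"}},
                    {"term": {"group_ids": "7b2d9e114ac8f03355aa90d1"}},
                ]
            }
        },
    )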

Filtering data from Elasticsearch based on MongoDB

I have a list of items in Elasticsearch. A user enters a query and I fetch the results from Elasticsearch. Now, I have some user preferences stored in MongoDB, based on which I want to filter the Elasticsearch results.
Suppose I get a list of items (item_ids) from Elasticsearch.
MongoDB has the following schema:
id, user_id, item_id
I chose this MongoDB schema because a user could have a very big list of items (on the order of millions) that they don't want to see in the results.
How do I achieve this at scale? Do I need to change my schema?
You should use Elasticsearch filtering for this. Include the filter criteria in your ES query, which reduces the number of results to return; otherwise you have to pull a huge data set from ES and then do the filtering in MongoDB, which is a two-step process and costly on both the ES and Mongo sides.
With filters in ES, less data comes back, which avoids extra post-processing in MongoDB. Filters are also executed first and cached by default on the Elasticsearch side, so you don't need an extra caching solution like Redis.
Refer to the filter and query context documentation; the same official doc says this about the filter cache:
Frequently used filters will be cached automatically by Elasticsearch,
to speed up performance.
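A minimal sketch of that approach, assuming the Python clients for both stores (collection, index, and field names are hypothetical): fetch the user's excluded item_ids from MongoDB once, then hand them to Elasticsearch as part of the query so the excluded items are never returned.

    # Hypothetical sketch: the exclusion list comes from MongoDB,
    # but the filtering itself happens inside Elasticsearch.
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch

    mongo = MongoClient("mongodb://localhost:27017")
    es = Elasticsearch("http://localhost:9200")

    user_id = "u-123"
    excluded_ids = mongo["app"]["preferences"].distinct("item_id", {"user_id": user_id})

    resp = es.search(
        index="items",
        query={
            "bool": {
                "must": [{"match": {"title": "running shoes"}}],      # the user's search
                "must_not": [{"terms": {"item_id": excluded_ids}}],   # hide excluded items
            }
        },
    )

Note that with exclusion lists in the millions, a single terms clause like this would hit Elasticsearch's default terms limit (65,536 values), so at that scale you would likely need to batch the exclusions or denormalize the preference data into the ES documents themselves.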

mongoDB vs. elasticsearch query/aggregation performance comparison

This question is about choosing the type of database to run queries on for an application. Keeping other factors aside for the moment, and given that the choice is between MongoDB and Elastic, the key criterion is that queries should be resolved in near real time. The queries will be ad hoc and as such can contain any of the fields in the JSON objects, and will likely contain aggregations and sub-aggregations. Furthermore, there will not be nested objects, and none of the fields will contain 'descriptive' text (like movie reviews etc.); i.e., all the fields will be keyword-type fields like State, Country, City, Name etc.
Now, I have read that elasticsearch performance is near real time and that elasticsearch uses inverted indices and creates them automatically for every field.
Given all of the above, my questions are as follows.
(There is a similar question posted on Stack Overflow, but I do not think it answers my questions:
elasticsearch v.s. MongoDB for filtering application.)
1) Since the fields in the use case I mentioned do not contain descriptive text, and hence would not require the full-text search capability and other additional features Elastic provides (especially for text search), which would be the better choice between Elastic and Mongo? How would Elasticsearch and Mongo query/aggregation performance compare if I were to create single-field indexes on all the available fields in Mongo?
2) I am not familiar with advanced indexing, so I am assuming it would be possible to create indexes on all available fields in Mongo (either multiple single-field indexes or maybe compound indexes; a rough sketch of what I mean follows point 3 below). I understand that this comes with a cost in storage and write speed, which is true for Elastic as well.
3) Also, in Elastic the user can trade off write speed (indexing rate) against the speed with which a written document becomes available for queries (refresh_interval). Is there a similar feature in Mongo?
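To make point 2 concrete, this is roughly what I have in mind on the Mongo side (collection and field names are just placeholders):

    # Rough sketch: one single-field index per queryable field,
    # or a compound index when queries always lead with the same fields.
    from pymongo import MongoClient

    coll = MongoClient("mongodb://localhost:27017")["app"]["records"]

    for field in ["state", "country", "city", "name"]:
        coll.create_index(field)

    coll.create_index([("country", 1), ("state", 1), ("city", 1)])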
I think the size of your data set is also a very important aspect of choosing the DB engine. According to this benchmark (2015), if you have over 10 million documents, Elasticsearch could be a better choice. If your data set is small, there should be no obvious difference in performance between Elasticsearch and MongoDB.
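Regarding your point 3, on the Elasticsearch side that trade-off is an ordinary per-index setting; a minimal sketch assuming the Python client (the index name is hypothetical):

    # Hypothetical sketch: refresh every 30s instead of the default 1s.
    # Indexing gets cheaper, but new documents take up to 30s to become searchable.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    es.indices.put_settings(
        index="events",
        settings={"index": {"refresh_interval": "30s"}},
    )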

should I use elasticsearch for non-free-text searches

I use Postgres as a data warehouse. I need to do free-text search on many of the fields. My DBA recommends not using Postgres for free-text searches, so I am now considering Elasticsearch. The question is what to do when the user filters both by free text and by some structured dimension. Should I query both Elastic and Postgres and take the intersection, or can I serve the whole query from Elastic? And if there is no free text in the filter at all, is Elastic appropriate for my general-purpose querying?
EDIT: as requested, some more information. The database will contain a few million rows. I cannot give concrete details about the data except that a row will contain ~30 columns, half of them strings ranging from one word to a few sentences. The reasons to use Elastic are not just the DBA's objection to a full-text index in Postgres; Elastic also gives result ranking and specific text-search semantics.
It is true that Elasticsearch is great for full-text search, since it uses Lucene under the covers, but it's also very good at structured search through filters. One other great thing you can do with it is data analytics, which lets you visualize aggregations of your data.
That said, you don't necessarily need full-text search requirements in order to make good use of Elasticsearch. There are many use cases where Elasticsearch is used for only one of the three aspects I mentioned: full-text search, structured search, or data analytics. The next step is to combine them.
Your use case is quite common, and I would suggest going ahead and running the structured queries against Elasticsearch too, instead of querying two systems. The only obstacle I can foresee is document relations, which need to be properly represented and handled within Elasticsearch.
Have a look at the Elasticsearch query DSL, which is used to represent queries and lets you effectively combine structured and unstructured search.
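As a minimal sketch (assuming the Python Elasticsearch client; index and field names are hypothetical), a single bool query can carry both parts, with the free text scored under must and the structured criteria under filter:

    # Hypothetical sketch: full-text relevance plus structured filters in one query.
    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    resp = es.search(
        index="warehouse",
        query={
            "bool": {
                "must": [
                    {"match": {"description": "late delivery refund"}}  # scored, free text
                ],
                "filter": [
                    {"term": {"country": "DE"}},                         # exact, unscored
                    {"range": {"order_date": {"gte": "2023-01-01"}}},
                ],
            }
        },
    )
    for hit in resp["hits"]["hits"]:
        print(hit["_score"], hit["_source"].get("country"))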

What is the typical usage of ElasticSearch in conjunction with other storage?

It is not recommended to use Elasticsearch as the only storage, for some obvious reasons like security, transactions, etc. So how is it usually used together with another database?
Say I want to store some documents in MongoDB and be able to search effectively by some of their properties. What I'd do is store the full document in Mongo as usual, and then trigger an insertion into Elasticsearch of only the searchable properties plus the MongoDB ObjectID. Then I can search using Elasticsearch and, having found the ObjectID, go to Mongo and fetch the whole document.
Is this the correct way to use Elasticsearch? I don't want to duplicate the whole data set, since I already have it in Mongo. Something like the sketch below is what I have in mind.
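(Index, collection, and field names below are made up; I'm assuming the Python clients for both stores.)

    # Rough sketch of the two-step flow: search ES for the stored Mongo IDs,
    # then fetch the full documents from MongoDB.
    from bson import ObjectId
    from pymongo import MongoClient
    from elasticsearch import Elasticsearch

    mongo = MongoClient("mongodb://localhost:27017")
    es = Elasticsearch("http://localhost:9200")

    # 1) ES holds only the searchable fields plus the Mongo ObjectID as a string
    resp = es.search(index="articles", query={"match": {"title": "elasticsearch"}})
    ids = [ObjectId(hit["_source"]["mongo_id"]) for hit in resp["hits"]["hits"]]

    # 2) Hydrate the full documents from MongoDB
    docs = list(mongo["app"]["articles"].find({"_id": {"$in": ids}}))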
For now, the best practice is to duplicate the documents in ES.
The cool thing here is that when you search, you don't have to go back to your database to fetch the content, since ES provides it in one single call:
you have everything in the ES search response to display the results to your user.
My 2 cents.
You may like to use the MongoDB river; take a look at this post.
There are more issues than just the size of the data you store or index. You might like to keep MongoDB as a backup with "near real time" querying of inserted data, and as a queue for the data to be indexed (you may want to run MongoDB as a cluster with the write concern suited to your application).