Filtering data from Elasticsearch based on MongoDB

I have a list of items in my Elasticsearch index. A user enters a query and I fetch the results from Elasticsearch. I also have some user preferences stored in MongoDB, based on which I want to filter the Elasticsearch results.
Suppose I get a list of items (item_ids) from Elasticsearch.
MongoDB has the following schema:
id, user_id, item_id
I chose this MongoDB schema because a user could have a very large list of items (on the order of millions) which he doesn't want to see in results.
How do I achieve this at scale? Do I need to change my schema?

You should use Elasticsearch filtering for this: include the filter criteria in your ES query, which reduces the number of results to return. Otherwise
you have to return a huge data set from ES and then do the filtering in MongoDB, which is a two-step process that is costly on both the ES and Mongo side.
With filters in ES, less data is returned, which avoids extra post-processing in MongoDB. Filters are executed first and are cached by default on the Elasticsearch side, so you don't need a further caching solution like Redis.
Refer to filter and query context in the official docs; the same page has info about the filter cache:
Frequently used filters will be cached automatically by Elasticsearch,
to speed up performance.
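As a minimal sketch of what that filtered ES query could look like: the field names (item_id, title) and the hidden-id list are assumptions for illustration, not your actual mapping.

```javascript
// Hidden item ids fetched from MongoDB for the current user (assumed).
const hiddenItemIds = ["item_42", "item_99"];

// Bool query: the "must" part scores matches against the user's query text,
// while the "must_not" clause excludes hidden items without affecting scoring.
function buildSearchBody(userQuery, excludedIds) {
  return {
    query: {
      bool: {
        must: [{ match: { title: userQuery } }],
        must_not: [{ terms: { item_id: excludedIds } }],
      },
    },
  };
}

const body = buildSearchBody("red shoes", hiddenItemIds);
console.log(JSON.stringify(body));
```

Note that an inline terms clause won't hold millions of ids (as in the question); for exclusion lists that large, Elasticsearch's terms lookup (referencing an id list stored in another index) is the usual workaround.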

Related

Mongoose skip and limit (MongoDB Pagination)

I am using skip and limit in MongoDB/Mongoose and it returns random documents.
Sometimes it returns the same documents.
I am trying to get a limited number of documents, and after skipping I want to get the next batch of documents.
I guess you are trying to use the pagination concept here. In both SQL and NoSQL, the data must be sorted in a specific order to achieve pagination without jumbled data on each db call.
for example:
await Event.find({}).sort({"createdDate": 1}).skip(10).limit(10);
In the above case I'm fetching documents 11–20, sorted by createdDate. Because of the sort, the data won't be shuffled between fetches, unless you insert or delete documents.
I hope this answers your question
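Building on that example, here is a minimal sketch of page-number pagination on top of skip/limit. The Event model and createdDate field come from the answer above; the helper name is my own.

```javascript
// Convert a 1-based page number into Mongoose skip/limit options.
function pageToOptions(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// With Mongoose this would be applied as (not run here, needs a DB):
// const opts = pageToOptions(2, 10);
// await Event.find({}).sort({ createdDate: 1 }).skip(opts.skip).limit(opts.limit);

console.log(pageToOptions(2, 10)); // { skip: 10, limit: 10 }
```

Keep in mind that skip/limit degrades on deep pages, since the server still walks past all skipped documents.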

Use simultaneously Elasticsearch and Postgres to perform queries

I'm working on a querying tool that would allow making complex relational queries alongside full-text search on quite big datasets. My data is stored in a Postgres database, with an Elasticsearch engine for convenient and efficient full-text search. However, some queries require complex joins, cardinality tests, filters on joined data, etc.
My dilemma is that I cannot use only Elasticsearch or only Postgres. I need both to answer specific needs. But combining them seems to be a really difficult task.
My approach
... was to perform the ES query first and then use the resulting ids as a filter for the SQL query. The problem is that ES has a max_result_window that prevents getting all the matching data at once. Even worse, ES may return the first 10K results matching the full-text search, but the subsequent SQL query may narrow those results down to something ridiculously small, while there are actually many more matching items; the 10K limit acts as a bottleneck.
Taking the other way around is no better: if we use the results of the SQL query as document-id filters for the Elasticsearch query, the max_clause_count limit is easily reached, and the ES query cannot be performed on more than (by default) 1024 items.
Maybe my logic isn't good. Is there any other approach to combining ES and Postgres queries? Thanks.
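One commonly suggested way past max_result_window is search_after pagination, which streams results page by page instead of using from/size. A sketch of the request bodies only; the content field, page size, and the unique keyword field id used as a sort tiebreaker are assumptions.

```javascript
// Build one page of a search_after scan. Each page repeats the query and
// starts after the sort values of the last hit from the previous page.
function buildPage(queryText, lastSortValues) {
  const body = {
    size: 1000,
    query: { match: { content: queryText } },
    // The unique tiebreaker field makes the sort order total and stable.
    sort: [{ _score: "desc" }, { id: "asc" }],
  };
  if (lastSortValues) body.search_after = lastSortValues;
  return body;
}

const firstPage = buildPage("fulltext terms");
// "lastSortValues" would come from the previous response's last hit.sort:
const nextPage = buildPage("fulltext terms", [3.14, "doc_1000"]);
```

This lets the ES-then-SQL approach collect all matching ids in batches rather than stopping at 10K, at the cost of multiple round trips.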

mongoDB vs. elasticsearch query/aggregation performance comparison

This question is about choosing the type of database to run queries on for an application. Keeping other factors aside for the moment, and given that the choice is between mongodb and elastic, the key criterion is that the query should be resolved in near real time. The queries will be ad-hoc and as such can contain any of the fields in the JSON objects and will likely contain aggregations and subaggregations. Furthermore, there will not be nested objects and none of the fields will be containing 'descriptive' text (like movie reviews etc.), i.e., all the fields will be keyword type fields like State, Country, City, Name etc.
Now, I have read that elasticsearch performance is near real time and that elasticsearch uses inverted indices and creates them automatically for every field.
Given all the above, my questions are as follows.
(there is a similar question posted on Stack Overflow but I do not think it answers my questions:
elasticsearch v.s. MongoDB for filtering application)
1) Since the fields in the use case I mentioned do not contain descriptive text and hence would not require the full-text search capability and other additional features that elastic provides (especially for text search), what would be a better choice between elastic and mongo? How would elastic search and mongo query/aggregation performance compare if I were to create single field indices on all the available fields in mongo?
2) I am not familiar with advanced indexing, so I am assuming that it would be possible to create indices on all available fields in mongo (either using multiple single field indices or maybe compound indices?). I understand that this will come with a cost for storage and write speed which is true for elastic as well.
3) Also, in elastic the user can trade off write speed (indexing rate) with the speed with which the written document becomes available (refresh_interval) for a query. Is there a similar feature in mongo?
I think the size of your data set is also a very important aspect of choosing a DB engine. According to this benchmark (2015), if you have over 10 million documents, Elasticsearch could be a better choice. If your data set is small, there should be no obvious difference in performance between Elasticsearch and MongoDB.
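On question 3 above: the refresh_interval trade-off is set per index in Elasticsearch. A sketch of the settings body only; the index name and interval value are assumptions.

```javascript
// Raising refresh_interval (or setting "-1" to disable refresh entirely)
// speeds up bulk indexing at the cost of how soon newly written documents
// become visible to searches.
const settings = {
  index: {
    refresh_interval: "30s", // default is "1s"; "-1" disables refresh
  },
};

// Applied via: PUT /my-index/_settings with the body above.
console.log(JSON.stringify(settings));
```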

MongoDB vs. Elasticsearch filtering

I have data in MongoDB and synced data in ElasticSearch. My requirement is to filter data based on certain parameters.
Let's say I am filtering data based on a couple of parameters and retrieving a couple of hundred results from 10,000 documents. (I am mentioning numbers for perspective.)
Since this query is based on filtering and not search, which of the two perform better? MongoDB or ElasticSearch?
Intuitively it feels that ElasticSearch is fast and returns data quickly.
Given this scenario and indexed values in DB, is Mongo competitive with ElasticSearch? Should I even consider ElasticSearch at this scale?
Elasticsearch is the right choice for your requirement. It has two different concepts: query and filter.
Please see the link below for more explanation:
http://blog.quarkslab.com/mongodb-vs-elasticsearch-the-quest-of-the-holy-performances.html
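To make the comparison concrete, here is the same parameter filter expressed for both stores. The field names (status, city) are assumptions for illustration.

```javascript
// MongoDB: a plain find() filter; backed by an index it avoids a full scan.
const mongoFilter = { status: "active", city: "Berlin" };

// Elasticsearch: the same criteria in filter context, so no relevance
// scoring is computed and the clauses are eligible for caching.
const esBody = {
  query: {
    bool: {
      filter: [
        { term: { status: "active" } },
        { term: { city: "Berlin" } },
      ],
    },
  },
};

console.log(JSON.stringify(esBody));
```

At the scale mentioned in the question (hundreds of results out of 10,000 documents), either store with proper indexes should answer this comfortably.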

What is the typical usage of ElasticSearch in conjuncion with other storage?

It is not recommended to use Elasticsearch as the only storage, for some obvious reasons like security, transactions, etc. So how is it usually used together with another database?
Say I want to store some documents in MongoDB and be able to search effectively by some of their properties. What I'd do is store the full document in Mongo as usual and then trigger an insertion into Elasticsearch, but I'd insert only the searchable properties plus the MongoDB ObjectID there. Then I can search using Elasticsearch and, having found the ObjectID, go to Mongo and fetch the whole document.
Is this a correct usage of Elasticsearch? I don't want to duplicate the whole data as I already have it in Mongo.
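The lookup pattern described in the question can be sketched as two small pure functions; the hit shape is the standard ES response envelope, while the collection name in the usage comment is an assumption.

```javascript
// Extract the document ids from an ES search response.
function idsFromHits(esResponse) {
  return esResponse.hits.hits.map((h) => h._id);
}

// Build the MongoDB query that fetches the full documents by those ids.
// (In a real app the string ids must be converted to ObjectId instances.)
function mongoFetchQuery(ids) {
  return { _id: { $in: ids } };
}

// Usage (not run here, needs live ES and Mongo connections):
// const res = await esClient.search({ index: "docs", body });
// const docs = await db.collection("docs")
//   .find(mongoFetchQuery(idsFromHits(res))).toArray();

const fakeResponse = { hits: { hits: [{ _id: "a1" }, { _id: "b2" }] } };
console.log(mongoFetchQuery(idsFromHits(fakeResponse)));
```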
The best practice, for now, is to duplicate documents in ES.
The cool thing here is that when you search, you don't have to go back to your database to fetch content, as ES provides it in a single call.
You have everything in the ES search response to display results to your user.
My 2 cents.
You may like to use the MongoDB river; take a look at this post.
There are more issues than just the size of the data you store or index. You might like to have MongoDB as a backup with "near real time" queries for inserted data, and as a queue for the data to be indexed (you may like to run MongoDB as a cluster with the write concern suited to your application).