I was looking through the documentation for Algolia and could not find anything related to doing aggregations like you can in elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
An example of what I want to do is this:
When a user searches, I show in real time the results of an aggregation on that query as well, such as a count of everything matching a particular filter (e.g. count of all the red items, blue items, yellow items etc)
What ES calls "aggregations" is actually a subset of the "facets" that Algolia provides (see https://www.algolia.com/doc/search/filtering-faceting#faceting). Algolia's faceting capabilities are what you can achieve in ES with the terms aggregation.
Faceting computes the counts associated with each faceted value and provides a way to filter on those values.
Since Algolia has been highly optimized and designed for as-you-type full-text search, the engine doesn't provide deep aggregation capabilities. The only aggregations you get are the min, max, and avg values, and only when the underlying facet values are numbers.
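To make the correspondence concrete, here is a sketch of the two request shapes side by side, written as plain Python dicts. The index name and the "color"/"name" attributes are hypothetical, and these are illustrative request bodies only, not client code:

```python
# Illustrative request bodies; index and attribute names ("name", "color")
# are hypothetical examples.

# Elasticsearch: a "terms" aggregation returns a count per color
# alongside the search hits.
es_request_body = {
    "query": {"match": {"name": "shirt"}},
    "aggs": {
        "colors": {"terms": {"field": "color"}}
    },
}

# Algolia: declare "color" as an attribute for faceting in the index
# settings, then request facet counts with each query.
algolia_search_params = {
    "query": "shirt",
    "facets": ["color"],  # response includes per-value counts, e.g. red/blue/yellow
}

print(es_request_body["aggs"]["colors"])
print(algolia_search_params["facets"])
```

In both cases the counts are recomputed per query, which is exactly the "real-time counts as the user types" behavior described in the question.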
This RediSearch page refers to 5 scoring models listed below.
We are using MongoDB as our primary store, but using RediSearch for faster cached queries. We would like the same result for each.
Does one of the scoring models listed below for RediSearch match the one in MongoDB? Do they both use Lucene under the covers?
Scoring model
RediSearch comes with a few very basic scoring functions to evaluate document relevance. They are all based on document scores and term frequency. This is regardless of the ability to use sortable fields (see below). Scoring functions are specified by adding the SCORER {scorer_name} argument to a search request.
If you prefer a custom scoring function, it is possible to add more functions using the Extension API.
These are the pre-bundled scoring functions available in RediSearch:
TFIDF (default): basic TF-IDF scoring with document score and proximity boosting factored in.
TFIDF.DOCNORM: identical to the default TFIDF scorer, with one important distinction: term frequencies are normalized by the length of the document.
BM25: a variation on the basic TF-IDF scorer; see the Wikipedia article on Okapi BM25 for more info.
DISMAX: a simple scorer that sums up the frequencies of the matched terms; in the case of union clauses, it will give the maximum value of those matches.
DOCSCORE: a scoring function that just returns the a priori score of the document without applying any calculations to it. Since document scores can be updated, this can be useful if you'd like to use an external score and nothing further.
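To illustrate how these scoring families differ, here is a toy Python sketch. This is NOT RediSearch's actual implementation (or MongoDB's); the documents, scores, and formulas are made up purely to show the shape of each scorer:

```python
import math

# Toy corpus; document texts and a priori scores are invented for illustration.
docs = {
    "d1": "redis is an in memory store and redis is fast",
    "d2": "mongodb is a document store",
}
doc_scores = {"d1": 1.0, "d2": 0.5}  # a priori per-document scores

def tf(term, doc_id):
    # Term frequency: occurrences of the term relative to document length.
    words = docs[doc_id].split()
    return words.count(term) / len(words)

def idf(term):
    # Inverse document frequency: rarer terms weigh more.
    n = sum(1 for text in docs.values() if term in text.split())
    return math.log(1 + len(docs) / (1 + n))

def tfidf_score(terms, doc_id):
    # TFIDF-style: tf * idf per term, weighted by the document's a priori score.
    return doc_scores[doc_id] * sum(tf(t, doc_id) * idf(t) for t in terms)

def dismax_score(terms, doc_id):
    # DISMAX-style: simply sum the raw frequencies of the matched terms.
    return sum(docs[doc_id].split().count(t) for t in terms)

def docscore(terms, doc_id):
    # DOCSCORE: return the stored document score untouched.
    return doc_scores[doc_id]

print(tfidf_score(["redis", "store"], "d1"))
print(dismax_score(["redis", "store"], "d1"))  # 2 + 1 = 3
print(docscore(["redis"], "d2"))               # 0.5
```

Note this also hints at the answer to the question: these are frequency-based models computed inside the engine, not Lucene, and MongoDB's text search score is yet another implementation, so identical scores across the two stores should not be expected.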
This question is about choosing the type of database to run queries on for an application. Keeping other factors aside for the moment, and given that the choice is between MongoDB and Elasticsearch, the key criterion is that queries should be resolved in near real time. The queries will be ad hoc, so they can reference any of the fields in the JSON objects, and will likely contain aggregations and sub-aggregations. Furthermore, there will be no nested objects, and none of the fields will contain 'descriptive' text (like movie reviews etc.); i.e., all the fields will be keyword-type fields like State, Country, City, Name etc.
Now, I have read that Elasticsearch offers near-real-time performance, uses inverted indices, and creates them automatically for every field.
Given all the above, my questions are as follows.
(There is a similar question posted on Stack Overflow, but I do not think it answers my questions:
elasticsearch v.s. MongoDB for filtering application)
1) Since the fields in my use case do not contain descriptive text, and hence would not require the full-text search capability and other additional features that Elasticsearch provides (especially for text search), which would be the better choice between Elasticsearch and MongoDB? How would Elasticsearch and MongoDB query/aggregation performance compare if I were to create single-field indices on all the available fields in MongoDB?
2) I am not familiar with advanced indexing, so I am assuming that it would be possible to create indices on all available fields in MongoDB (either with multiple single-field indices or perhaps compound indices?). I understand that this comes at a cost in storage and write speed, which is true for Elasticsearch as well.
3) Also, in Elasticsearch the user can trade off write speed (indexing rate) against how quickly a written document becomes available for queries (refresh_interval). Is there a similar feature in MongoDB?
I think the size of your data set is also a very important factor in choosing a DB engine. According to this benchmark (2015), if you have over 10 million documents, Elasticsearch could be the better choice. If your data set is small, there should be no obvious difference in performance between Elasticsearch and MongoDB.
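Regarding the refresh_interval trade-off asked about in question 3, here is a sketch of the relevant knob on each side, expressed as plain settings dicts. The index name is irrelevant here, and this is illustrative, not client code:

```python
# Elasticsearch: lengthening refresh_interval trades result freshness for
# indexing throughput (the default is "1s").
es_index_settings = {"index": {"refresh_interval": "30s"}}

# MongoDB has no refresh_interval: a committed write is visible to
# subsequent queries immediately. The closest tunable trade-off is the
# write concern, which trades durability guarantees against write latency.
mongo_write_concern = {"w": 1, "j": False}  # acknowledge without waiting for the journal

print(es_index_settings)
print(mongo_write_concern)
```

So the honest answer to question 3 is "no direct equivalent": in MongoDB the visibility delay simply does not exist, and the tuning surface is on the durability side instead.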
Status:
As far as I understand from reading the MongoDB documentation, and from playing with indexes, queries and monitoring results, geolocation queries in MongoDB work as follows:
Start at the given location
look at EVERY document from close to far
keep those matching additional query criteria
until either number_limit or distance_limit is reached
Example
To show what we are trying to do: Let's take the mongodb tutorial example as a base: https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/
Let's assume we have a list of restaurants with location and much more information on top, like established_at, type (chinese, thai, italian, ...), priceOfACoke, numberOfWaiters, wheelchairAccess, ...
Problem:
Let's assume you want to query the collection of all restaurants in the US of A to return all Italian restaurants close to the city center of Pittsburgh that were established between 2 and 5 years ago, with wheelchair access and more than 50 waiters, where a coke costs less than $1.
This is a geo query with restrictive additional criteria and no distance limit; and since the "waiters > 50 and coke cheaper than $1" condition filters out most/all of the results, this query seems to run through the whole collection and takes very long.
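Concretely, such a query would be expressed as a $geoNear aggregation stage roughly like the following sketch (collection and field names such as "cuisine" and "wheelchairAccess" are illustrative, and the coordinates are approximate):

```python
# Sketch of the geo query under discussion as an aggregation pipeline.
# Field names are hypothetical; only the $geoNear options shown are real.
pipeline = [
    {
        "$geoNear": {
            "near": {"type": "Point", "coordinates": [-79.9959, 40.4406]},  # Pittsburgh
            "distanceField": "dist",
            "spherical": True,
            # "maxDistance": 5000,  # distance_limit, in meters (omitted here)
            # The additional criteria go in "query" -- but they are applied
            # to each candidate as it is scanned close-to-far, not used to
            # pre-select documents via a traditional index.
            "query": {"cuisine": "italian", "wheelchairAccess": True},
        }
    },
    {"$limit": 100},  # number_limit: stop after 100 matches
]
print(pipeline[0]["$geoNear"]["query"])
```

This placement of the filter inside $geoNear is exactly why the close-to-far scan dominates the cost when the query predicate is highly selective.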
If run without geoNear, assuming there is a combined index on the fields in question, this query is quite fast, even if there are only 10 results out of 1 million documents.
However, as soon as geoNear comes into play, the performance is terrible.
From what I understand, there can only be ONE geo index per collection and only ONE additional property in the geo index, so there is not much that can be done to help MongoDB find the results with several criteria, as a traditional index does not seem to be used.
Also, when using the aggregation framework, the geo stage has to come first...
Are there any hints or pointers to speed up queries like this?
If possible, I'd prefer not to get "use Elasticsearch" or "use multiple collections" responses - I still hope there is another way to help MongoDB reduce the number of documents to check BEFORE it starts the geoNear part.
Upon user request, I would like to graph median values across many documents. I'd prefer not to transfer entire documents from the database to my application solely to determine median values.
I understand that development is still planned for a median aggregator in MongoDB; however, I see that the following operations are currently supported:
sort
count
limit
Short of editing the MongoDB source code, is there any reasonable way to combine these operations to obtain median values - for example, to sort the values, count them, and limit the result so that the median is returned?
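The combination proposed in the question can at least be sketched in plain Python to show what a server-side version would have to do - sort, count, then limit the sorted stream to the middle element(s). This is a client-side illustration of the idea, not a MongoDB aggregation:

```python
# Sketch of "sort, count, limit" as a median computation.
def median_via_sort_count_limit(values):
    ordered = sorted(values)       # "sort"
    n = len(ordered)               # "count"
    half = ordered[: n // 2 + 1]   # "limit" the stream to its middle
    if n % 2 == 1:
        return half[-1]
    return (half[-2] + half[-1]) / 2

print(median_via_sort_count_limit([3, 1, 7]))      # -> 3
print(median_via_sort_count_limit([4, 1, 3, 2]))   # -> 2.5
```

The catch, of course, is that limiting only bounds how much of the sorted stream is transferred; the server still sorts everything, and expressing "take the last element of the limited stream" is the part the listed operators do not directly give you.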
It appears that editing Mongo source code is the only solution.
MongoDB has an index-creation feature to speed up queries (https://docs.mongodb.org/v3.0/indexes/). What does Elasticsearch offer for this purpose? I googled it but was unable to find any suitable information. I used indexes in MongoDB on the most frequently queried fields to speed up queries, and now I want to do the same in Elasticsearch - is there anything comparable that Elasticsearch provides? Thanks.
Elasticsearch also has indices: https://www.elastic.co/blog/what-is-an-elasticsearch-index
They are also used as part of the database's key features to provide swift search capabilities.
It is annoying that "index" is used in a different context in ES than in many other databases. I'm not as familiar with MongoDB, so I'll refer to their documentation at v3.0/core/index-types.
Basically, Elasticsearch was designed to serve efficient "filtering" (yes/no queries) and "scoring" (relevance ranking via TF-IDF etc.), and it uses Lucene as the underlying inverted index.
MongoDB concepts and their ES counterparts:
Single Field Index: trivially supported, e.g. as not_analyzed fields for exact matching
Compound Index: Lucene applies AND filter conditions via efficient bitmaps, and can merge any "single field" indexes ad hoc
Multikey Index: transparent support; no difference between a single value and an array of values
Geospatial Index: directly supported via geo-shapes
Text Index: in some ways ES was optimized for this use case, via the analyzed field type
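The "compound index" point above - that per-field posting lists can be merged ad hoc - can be illustrated with a toy inverted index in Python. This is a didactic sketch with made-up documents, not Lucene's implementation (which uses compressed bitmaps rather than Python sets):

```python
# Toy inverted index: each (field, value) pair maps to the set of
# document ids containing it; an AND query is just a set intersection.
docs = {
    1: {"state": "PA", "city": "Pittsburgh"},
    2: {"state": "PA", "city": "Philadelphia"},
    3: {"state": "NY", "city": "New York"},
}

# Build one posting list per (field, value) pair.
index = {}
for doc_id, fields in docs.items():
    for field, value in fields.items():
        index.setdefault((field, value), set()).add(doc_id)

def and_query(*conditions):
    # Intersect the posting lists of every condition; any combination of
    # single-field conditions can be merged without a precomputed
    # compound index.
    sets = [index.get(cond, set()) for cond in conditions]
    return set.intersection(*sets) if sets else set()

print(and_query(("state", "PA"), ("city", "Pittsburgh")))  # -> {1}
```

This is why, for the keyword-only workload described earlier, ES does not need you to anticipate field combinations the way MongoDB compound indexes do.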
In my view, in search applications relevance is more important than plain filtering of the results, as some words occur in almost every document and are thus less relevant when searching.
Elasticsearch has other very useful concepts as well, such as aggregations, nested documents and child/parent relationships.