Firestore index merge with inequality condition? - google-cloud-firestore

I have the following query:
.where("field1", "<", 10)
.where("field2", "==", false)
.where("field3", "==", "test1")
.where("field4", "==", "test2")
I am trying to understand why this query still works even though there is no exactly matching composite index.
After some research, the only explanation I found is the index merging feature, but according to the documentation that should only apply when there is no inequality condition, which is not the case here. I must be missing something; why does my query succeed?
FYI, I have the following composite indexes covering some of the fields:
field1 DESC field2 ASC field3 ASC
field1 DESC field2 ASC field4 ASC

Since there are multiple equality filters, it looks like index merging is at play here.
The Firestore docs don't specifically state that an inequality condition breaks the requirements for index merging.
Merging uses a zig-zag merge join algorithm, so the clause with the inequality will not be merged, but the others can be. Here are the details and requirements for index merging:
Although Cloud Firestore uses an index for every query, it does not necessarily require one index per query. For queries with multiple equality (==) clauses and, optionally, an orderBy clause, Cloud Firestore can re-use existing indexes.
It looks like the above query fits these criteria.

Related

In PostgreSQL, are all expressions in ORDER BY clause evaluated?

I'm looking to optimise a PostgreSQL query of mine and couldn't find anything on the internal workings of the query planner when it comes to ORDER BY.
Consider the following PostgreSQL query:
SELECT *
FROM mytable
ORDER BY rank, ST_Distance(geom1, geom2)
Is ST_Distance(geom1, geom2) calculated for all rows in mytable, even where rank is unique?
This is not a postgis question. ST_Distance(geom1, geom2) could be any expression that requires (expensive) computations.
All are evaluated. PostgreSQL's sort projects all the expressions it needs to do the comparison up front and stores the results. It does not defer the expression evaluation until there is a tie to break.
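One way to see this on your own data is to look at the plan; assuming the table and columns from the question, the Sort node lists both sort keys, including the distance expression:
EXPLAIN (ANALYZE)
SELECT *
FROM mytable
ORDER BY rank, ST_Distance(geom1, geom2);
-- The plan's Sort node shows something like:
--   Sort Key: rank, (st_distance(geom1, geom2))
-- i.e. the distance is computed for every row fed into the sort.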

What is the performance/logical difference between jsonb_path_match and ->> in where clause

I am trying to query FHIR data in a PostgreSQL database. The data is stored in a jsonb column. Though the following queries yield the same result, I would like to know whether there are any major differences in terms of performance or execution plan.
Query 1 - Using ->> operator
SELECT resource->'subject' FROM resourcetable
WHERE resource ->> 'resourceType' = 'MedicationRequest' AND resource ->> 'status' = 'active';
Query 2 - Using jsonb_path_match function
SELECT jsonb_path_query(resource, '$.subject') from resourcetable
WHERE jsonb_path_match(resource, '$.resourceType == "MedicationRequest"')
AND jsonb_path_match(resource, '$.status=="active"');
You should try it on your own data and see. If you use EXPLAIN before the query text, you will get the execution plan. If you put EXPLAIN (ANALYZE, BUFFERS) before the query, it will actually run that plan and report the timing and statistics.
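For example, applied to Query 1 (same table and keys as in the question):
EXPLAIN (ANALYZE, BUFFERS)
SELECT resource->'subject' FROM resourcetable
WHERE resource ->> 'resourceType' = 'MedicationRequest'
  AND resource ->> 'status' = 'active';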
Your first formulation might benefit from a functional index defined like so:
create index on resourcetable ((resource ->> 'resourceType'),(resource ->> 'status'));
Your second formulation can't benefit from an index, but if you rewrite it to use the @@ operator form:
SELECT jsonb_path_query(resource, '$.subject') from resourcetable
WHERE resource @@ '$.resourceType == "MedicationRequest"'
AND resource @@ '$.status == "active"';
Then it can benefit from an index defined as:
create index on resourcetable using gin (resource jsonb_path_ops);
The index defined without jsonb_path_ops would also be usable, but is probably less efficient.
You could also rewrite the condition to use the jsonpath && operator rather than SQL's AND, for example:
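A sketch that folds both predicates into a single jsonpath expression (same keys as above):
SELECT jsonb_path_query(resource, '$.subject') from resourcetable
WHERE resource @@ '$.resourceType == "MedicationRequest" && $.status == "active"';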
This second index can also be used with a query that uses the containment operator @> rather than the jsonpath match @@, which I find less confusing.
SELECT resource->'subject' FROM resourcetable
WHERE resource @> '{"resourceType": "MedicationRequest", "status": "active"}';
Finally, your two queries actually return different results under some conditions, like if a row matching the WHERE does not have a key "subject".

Postgres - Index with multiple where clauses

I have a query that includes two WHERE clauses. Looks like this:
SELECT m
FROM Media m
WHERE m.userid = :id
AND m.timestamp = (SELECT MAX(mm.timestamp)
FROM Media mm
WHERE mm.userid = :id
AND mm.source IN :sources
AND mm.timestamp < :date)
What I want to know is whether this query will be faster with one index, or whether I should create two separate indexes, one for each WHERE clause, like:
1st index for 1st WHERE = (userid, timestamp)
2nd index for 2nd WHERE = (userid, source, timestamp)
EDIT:
I have created 2 indexes.
1 - (userid, source, timestamp)
2 - (userid, timestamp)
When I analyze the query, it always shows the second index being used.
Assuming that user.id is really userid, the perfect index would be
CREATE INDEX ON media(userid, source, timestamp);
That is perfect for the inner query, and the index is also good for the outer query.
To expand on that: the above assumes that all these conditions are selective, that is, that they significantly reduce the number of result rows.
In your case, it seems that the condition mm.source IN :sources is not very selective, perhaps because there are only a few distinct values in the column, or because you happen to query for a value that occurs frequently.
In that case, it is better to omit the column from the index, because that will make the index smaller without much loss. All other things being equal, PostgreSQL will choose to scan the smaller index.
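If you want to verify this on your own data, here is a sketch (plain SQL rather than JPQL; the index names, parameter values and date are placeholders):
CREATE INDEX media_userid_ts_idx ON media (userid, timestamp);
CREATE INDEX media_userid_src_ts_idx ON media (userid, source, timestamp);
-- Compare the plans; the wider index only wins if "source" is selective.
EXPLAIN (ANALYZE, BUFFERS)
SELECT m.*
FROM media m
WHERE m.userid = 42
  AND m.timestamp = (SELECT MAX(mm.timestamp)
                     FROM media mm
                     WHERE mm.userid = 42
                       AND mm.source IN ('web', 'mobile')
                       AND mm.timestamp < DATE '2024-01-01');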

Are there plans to add 'OR' to attribute searches in Sphinx?

A little background is in order for this question, since on the surface it is too generic:
Recently I ran into an issue where I had to move attribute values that I was pushing into my SphinxQL query into the full-text part, because the attributes needed to be part of an 'OR' condition.
In other words I was doing:
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3)
When I tried to add an 'OR' to the attributes such as:
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3) OR customer_id in (4,5,6)
it failed because Sphinx 2.* does not support OR in the attribute query.
I was also unable to simply put the name and customer IDs into the query:
Select * from idx_test where MATCH('Terms ((@name_id 1|2|3)|(@customer_id 4|5|6))')
Because (as far as I can tell) you can't push integer attributes into the full-text search.
My solution was to index the id fields a second time with _text appended:
Select name_id, name_id as name_id_text
and then add that to the field list:
sql_attr_uint = name_id
sql_field_string = name_id_text
sql_attr_uint = customer_id
sql_field_string = customer_id_text
So now I can do my OR query as full-text:
Select * from idx_test where MATCH('Terms ((@name_id_text 1|2|3)|(@customer_id_text 4|5|6))')
However, I recently found an article that discusses the tradeoff between attribute and full-text searches. The upshot is that "it could reduce performance of queries that otherwise match few records", which is precisely what my name_id/customer_id query does. In an ideal world, then, I'd be able to go back to:
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3) OR customer_id in (4,5,6)
if only Sphinx allowed OR between attributes, since as far as I can tell, once a query filters down to a relatively low number of results, it would be much faster using attributes than full-text.
So my two-part question therefore is:
Am I in fact correct that this is the case (a query that significantly reduces the number of results is better served by attributes than by full-text)?
If so are there plans to add OR to the attribute part of the SphinxQL query?
If so, when?
An OR filter has been added in Manticore, the Sphinx fork (from the 2.3 branch); see https://github.com/manticoresoftware/manticore/commit/76b04de04feb8a4db60d7309bf1e57114052e298
For now it's only between attributes; OR between MATCH and attributes is not supported yet.
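Assuming that fork accepts standard parenthesised OR between attribute filters alongside MATCH (I haven't verified the exact syntax there), the original query might then be written roughly as:
SELECT * FROM idx_test
WHERE MATCH('Terms')
  AND (name_id IN (1,2,3) OR customer_id IN (4,5,6));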
While OR is not supported directly in WHERE, you can still run the query. Your
Select * from idx_test where MATCH('Terms') and name_id in (1,2,3) OR customer_id in (4,5,6)
example can be written as
Select *, IN(name_id,1,2,3) + IN(customer_id,4,5,6) as filter
from idx_test where MATCH('Terms') and filter > 0
It is a bit more cumbersome, but should work. You still get the full benefit of the full-text inverted index, so performance actually shouldn't be bad. The filter is only executed against docs matching the terms.
(This may look crazy if you're coming from, say, a MySQL background, but remember SphinxQL isn't MySQL.)
You don't get short-circuiting (i.e. the customer_id filter will still be run even if name_id matches), so perhaps
Select *, IF(IN(name_id,1,2,3) OR IN(customer_id,4,5,6),1,0) as filter
from idx_test where MATCH('Terms') and filter =1
is even better; the IF function has an OR operator (Sphinx could potentially short-circuit there, though I don't know if it does).
(But also yes, if the 'filter' is highly selective (matching few rows), then including it in the full-text query can be good, as it discards rows earlier in processing. The problem with non-selective filters is that they match lots of rows, so there is a long doclist to process during text-query processing.)

Improve Oracle query

How can I modify this query to improve it?
I think that doing a join would be better.
UPDATE t1 HIJA
SET IND_ESTADO = 'P'
WHERE IND_ESTADO = 'D'
AND NOT EXISTS
(SELECT COD_OPERACION
FROM t1 PADRE
WHERE PADRE.COD_SISTEMA_ORIGEN = HIJA.COD_SISTEMA_ORIGEN
AND PADRE.COD_OPERACION = HIJA.COD_OPERACION_DEPENDIENTE)
Best regards.
According to this article by Quassnoi:
Oracle's optimizer is able to see that NOT EXISTS, NOT IN and LEFT JOIN / IS NULL are semantically equivalent as long as the list values are declared as NOT NULL.
It uses the same execution plan for all three methods, and they yield the same results in the same time.
In Oracle, it is safe to use any of the three methods described above to select values from a table that are missing in another table.
However, if the values are not guaranteed to be NOT NULL, LEFT JOIN / IS NULL or NOT EXISTS should be used rather than NOT IN, since the latter will produce different results depending on whether or not there are NULL values in the subquery resultset.
So what you have is already fine. A JOIN would be as good, but not better.
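For illustration, here is the LEFT JOIN / IS NULL form of the same anti-join, written as a SELECT that identifies the rows the UPDATE would touch (the UPDATE itself can stay as it is):
SELECT HIJA.*
FROM t1 HIJA
LEFT JOIN t1 PADRE
  ON PADRE.COD_SISTEMA_ORIGEN = HIJA.COD_SISTEMA_ORIGEN
 AND PADRE.COD_OPERACION = HIJA.COD_OPERACION_DEPENDIENTE
WHERE HIJA.IND_ESTADO = 'D'
  AND PADRE.COD_OPERACION IS NULL;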
If performance is a problem, there are several guidelines for re-writing a where not exists into a more efficient form:
When given the choice between not exists and not in, most DBAs prefer to use the not exists clause.
When SQL includes a not in clause, a subquery is generally used, while with not exists, a correlated subquery is used.
In many cases a NOT IN will produce the same execution plan as a NOT EXISTS query or a not-equal query (!=).
In some cases a correlated NOT EXISTS subquery can be re-written as a standard outer join with an IS NULL test.
Some NOT EXISTS subqueries can be tuned using the MINUS operator (see the sketch below).
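As a sketch of the MINUS idea (SELECT form only, returning the key pairs rather than whole rows, and ignoring differences in NULL handling):
SELECT COD_SISTEMA_ORIGEN, COD_OPERACION_DEPENDIENTE
FROM t1
WHERE IND_ESTADO = 'D'
MINUS
SELECT COD_SISTEMA_ORIGEN, COD_OPERACION
FROM t1;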
See Burleson for more information.