Firebird 3 not using index in plan - firebird

I am trying to understand why Firebird 3 does not use an index in my simplified query below. If I run it without the PLAN clause, it uses
PLAN JOIN (L NATURAL, T INDEX (FK_TRANS_LEDGER), P INDEX (IDX_PRODUCTIONS1))
If I explicitly specify the plan with the index (as below), it throws an error saying the index cannot be used in the plan. (The index is unique on ledgerkey.)
SELECT P.prodname FROM TRANS t
join productions p on t.jobkey=p.prodkey
join ledger l on l.ledgerkey=t.lkey
PLAN JOIN (L INDEX (IDX_LEDGER4), T INDEX (FK_TRANS_LEDGER), P INDEX (IDX_PRODUCTIONS1))
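One way to double-check which columns the rejected index actually covers is to query Firebird's system tables; a sketch (the table name LEDGER is assumed from the query above):

```sql
-- List every index on LEDGER with its uniqueness flag and segment columns.
SELECT i.RDB$INDEX_NAME, i.RDB$UNIQUE_FLAG,
       s.RDB$FIELD_NAME, s.RDB$FIELD_POSITION
FROM RDB$INDICES i
JOIN RDB$INDEX_SEGMENTS s ON s.RDB$INDEX_NAME = i.RDB$INDEX_NAME
WHERE i.RDB$RELATION_NAME = 'LEDGER'
ORDER BY i.RDB$INDEX_NAME, s.RDB$FIELD_POSITION;
```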

Related

PostgreSQL - should i only create index for the rest of columns that don't have index yet?

If table_name has 3 columns (a, b, c) and I'm going to create an index on those 3 columns:
CREATE INDEX idx_table_name_a_b_c ON table_name (a,b,c);
But there's already an index on column a that I previously created:
CREATE INDEX idx_table_name_a ON table_name (a);
Should I create an index only on the other 2 columns, or on all 3 columns, including column a (with the query above)?
Note that index considerations are only possible if you have a query. It never makes sense to index a table as such, but only to index a table for a query.
So let's assume that you have a query that would benefit from a three-column index, like
SELECT count(*) FROM table_name
WHERE a = 12 AND b = 42 AND c BETWEEN 7 AND 22;
The best option is to create that index and drop the existing one, because the three-column index can serve all purposes that the single-column index can (that is because a is the leading column in the index).
Such an index will lead to a single index-only scan on the table, which (if you have VACUUMed the table) is the most efficient way to execute the query.
The second best option is to create the two-column index you proposed and leave the single-column index on a.
Then the optimizer's strategy will depend on the distribution of values.
If the condition on a is selective enough, PostgreSQL will ignore your new index and just scan the one on a.
If the condition on b and c is selective, PostgreSQL will scan only your new index.
If all conditions together are not selective, PostgreSQL may choose a sequential scan of the table and ignore all your indexes.
If neither the condition on a nor the conditions on b and c together are selective, but all three conditions together are selective, PostgreSQL can opt to perform a bitmap index scan on both indexes and combine the result.
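The first option above could be carried out roughly like this (a sketch reusing the index names from the question):

```sql
-- The three-column index can serve every query the single-column
-- index on (a) could, because a is its leading column.
CREATE INDEX idx_table_name_a_b_c ON table_name (a, b, c);
DROP INDEX idx_table_name_a;
```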

In outer join, where does a plain filter condition come from?

From PostgreSQL document, when explaining basics of EXPLAIN command:
When dealing with outer joins, you might see join plan nodes with both
“Join Filter” and plain “Filter” conditions attached. Join Filter
conditions come from the outer join's ON clause, so a row that
fails the Join Filter condition could still get emitted as a
null-extended row. But a plain Filter condition is applied after the outer-join rules and so acts to remove rows unconditionally. In an inner join there is no semantic
difference between these types of filters.
"Join Filter conditions come from the outer join's ON clause". Then in outer join, where does a plain filter condition come from?
Could you give some examples?
Thanks.
The term "plain Filter condition" is not used anywhere else in the Postgres documentation, so I suspect the author meant the word "plain" literally: not decorated or elaborate; simple or ordinary in character.
So really they are saying: when a filter is applied in an OUTER JOIN's ON clause, the table or derived table being joined is simply filtered before the join occurs. For rows that fail this filter, any columns from that table or derived table appear as NULL in the result set.
Here is a little example that might enlighten you:
CREATE TABLE a(a_id) AS VALUES (1), (3), (4);
CREATE TABLE b(b_id) AS VALUES (1), (2), (5);
Now we have to force a nested loop join:
SET enable_hashjoin = off;
SET enable_mergejoin = off;
Our query is:
SELECT *
FROM a
LEFT JOIN b ON a_id = b_id
WHERE a_id > coalesce(b_id, 0);
 a_id | b_id
------+------
    3 |
    4 |
(2 rows)
The plan is:
                QUERY PLAN
------------------------------------------
 Nested Loop Left Join
   Join Filter: (a.a_id = b.b_id)
   Filter: (a.a_id > COALESCE(b.b_id, 0))
   ->  Seq Scan on a
   ->  Materialize
         ->  Seq Scan on b
The “plain filter” is a condition that is applied after the join.
It is a frequent mistake to believe that conditions in the WHERE clause are the same as conditions in a JOIN … ON clause. That is only the case for inner joins. For outer joins, rows from the outer side that don't meet the condition are also included in the result.
That makes it necessary to have two different filters.
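To see the two filters side by side, compare what happens when the same condition is moved from the WHERE clause into the ON clause (a sketch using the tables above):

```sql
SELECT *
FROM a
LEFT JOIN b ON a_id = b_id AND a_id > coalesce(b_id, 0);
```

Since a_id = b_id and a_id > b_id can never both hold, no b row ever matches, and all three rows of a come back null-extended; the WHERE version above instead removes the row with a_id = 1 unconditionally. In EXPLAIN, both conditions now show up in the Join Filter and the plain Filter is gone.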

PostgreSQL - existence of index causes hash-join

I was looking at the EXPLAIN output of a natural join query of two simple tables. At first, the PostgreSQL planner uses a merge join. Then I add an index on the join attribute, and this causes the planner to use a hash join instead (and with a sequential scan of the data!).
So my question is: why does the existence of an index cause a hash join?
Additional data & code:
I defined two relations: R(A,B) and S(B,C). (without primary keys or
such).
Filled the tables with a few rows of data (~5 each, so that there are common values of attribute B in R and S).
then executed:
EXPLAIN VERBOSE SELECT * FROM R NATURAL JOIN S;
which resulted in
Merge Join (cost=317.01..711.38 rows=25538 width=12)...
and finally, executed:
CREATE INDEX SI on S(B);
EXPLAIN VERBOSE SELECT * FROM R NATURAL JOIN S;
which resulted in
Hash Join (cost=1.09..42.62 rows=45 width=12)...
Seq Scan on "user".s (cost=0.00..1.04 rows=4 width=8)

Why doesn't PostgreSQL use indexes for "WHERE NOT IN" conditions?

I have two tables db100 and db60 with the same fields: x, y, z.
Indexes are created for both the tables on field z like this:
CREATE INDEX db100_z_idx
ON db100
USING btree
(z COLLATE pg_catalog."default");
CREATE INDEX db60_z_idx
ON db60
USING btree
(z COLLATE pg_catalog."default");
Trying to find z values from db60 that don't exist in db100:
select db60.z from db60 where db60.z not in (select db100.z from db100)
As far as I understand, all the information required to execute the query is present in the indexes. So I would expect only the indexes to be used.
However, it uses sequential scans on the tables instead:
"Seq Scan on db60 (cost=0.00..25951290012.84 rows=291282 width=4)"
" Filter: (NOT (SubPlan 1))"
" SubPlan 1"
" -> Materialize (cost=0.00..80786.26 rows=3322884 width=4)"
" -> Seq Scan on db100 (cost=0.00..51190.84 rows=3322884 width=4)"
Can someone please explain why PostgreSQL doesn't use the indexes in this example?
Both tables contain a few million records and the execution takes a while.
I know that using a left join with an "is null" condition gives better results. However, the question is about this particular syntax.
I'm on PostgreSQL 9.5.
SubPlan 1 is for select db100.z from db100. You select all rows, so an index is useless. You really want select DISTINCT db100.z from db100 here, and then the index should be used.
In the main query you have select db60.z from db60 where db60.z not in .... Again, you select all rows except those matching a condition, so again the index does not apply, because it covers the inverse condition.
In general, an index is only used if the planner thinks that its use will speed up query processing. That always depends on how many distinct values there are and how the rows are distributed over the physical pages on disk. An index for finding all rows where a column has a certain value is not the same as one for finding the rows that do not have that value; the index indicates on which pages and at which locations to find the matching rows, but that set cannot simply be inverted.
Given that - in your case - z is some text type, a meaningful "negative" index cannot be constructed (this is almost a truism, although in some cases a "negative" index could be conceivable). You could also look into trigram indexes, as these tend to work much faster than btree indexes on text.
Do you really want to extract all 291,282 rows, or could you use a DISTINCT clause here too? That should speed things up quite a bit.
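For completeness, the left-join rewrite the question alludes to, and an equivalent NOT EXISTS form, typically let PostgreSQL plan an anti-join that can use the indexes; a sketch (note that neither is semantically identical to NOT IN when z can be NULL):

```sql
-- Left join with IS NULL, as mentioned in the question:
SELECT db60.z
FROM db60
LEFT JOIN db100 ON db100.z = db60.z
WHERE db100.z IS NULL;

-- NOT EXISTS form, usually planned as a (hash) anti-join:
SELECT db60.z
FROM db60
WHERE NOT EXISTS (SELECT 1 FROM db100 WHERE db100.z = db60.z);
```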

Why index seek and not a scan for the following setup in SQL Server 2005

I have created a table
create table #temp(a int, b int, c int)
I have 2 indexes on this table:
Non clustered non unique index on c
Clustered Index on a
When I try to execute the following query:
select b from #temp where c = 3
I see that the system does an index scan. This is fine, because the nonclustered index doesn't have b as a key column. Hence it scans the clustered index on column a.
But when I try to execute the query below:
select b from #temp where c= 3 and a = 3
I see that the execution plan shows only an index seek, no scan. Why is that?
Neither the clustered index nor the nonclustered index has b as one of its columns.
I was expecting an index scan.
Please clarify.
If you have a as your clustering key, then that column is included in all non-clustered indices on that table.
So your index on c also includes a, so the condition
where c= 3 and a = 3
can be found in that index using an index seek. Most likely, the query optimizer decided that doing an index seek to find a and c plus a key lookup to get the rest of the data is faster/more efficient here than an index scan.
BTW: why did you expect / prefer an index scan over an index seek? The index seek typically is faster and uses a lot less resources - I would always strive to get index seeks over scans.
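As an aside, if the key lookup for b ever becomes a cost concern, a covering index with an included column (supported since SQL Server 2005) would let the seek alone satisfy the query; a sketch (the index name is made up):

```sql
-- c is the index key, b is carried as an included (non-key) column,
-- and a is implicitly present because it is the clustering key.
CREATE NONCLUSTERED INDEX IX_temp_c_incl_b
    ON #temp (c)
    INCLUDE (b);
```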
This is fine, because the non clustered index doesn't have b as the key value. Hence it does an index scan from column a.
This assumption is not right: index seeks and scans are driven by the WHERE clause, not by the SELECT clause.
Now to your question:
The WHERE clause is optimized by the SQL optimizer, and since there is an a = 3 condition, the clustered index can be used for a seek.