PostgreSQL is not selecting an index when OR is used

I have a strange situation where PostgreSQL ignores an index when an OR condition is used in a join.
My data is split in two: one part is linked through a foreign key, and the other part only has a 'loose reference'.
The table names are just for demonstration (a synthetic example),
but the logic is as follows:
when an 'order' cannot be mapped exactly, we need to use other text fields to find a match.
I have tried the following queries:
SELECT DISTINCT product.id, client.id, o.id
FROM clients AS client
CROSS JOIN products AS product
LEFT JOIN orders AS o ON
(
o.product_id_fk = product.id AND
o.user_fk = client.id
)
OR
(
o.product_id_fk = product.id AND
o.user_fk IS NULL AND
o.user_group = client.user_group -- text
)
WHERE product.id = 1 -- param
and this query
SELECT DISTINCT product.id, client.id, o.id
FROM clients AS client
CROSS JOIN products AS product
LEFT JOIN orders AS o ON
o.product_id_fk = product.id AND
(
o.user_fk = client.id
OR
(
o.user_fk IS NULL AND
o.user_group = client.user_group -- this search should be applied only when fk is not set
)
)
WHERE product.id = 1 -- param
For both queries the index is ignored, and the query takes 12 seconds to run.
At the same time, the following query is very fast, and PostgreSQL uses both indexes as expected:
SELECT DISTINCT client.id, COALESCE(o1.id, o2.id)
FROM clients AS client
CROSS JOIN products AS product
LEFT JOIN orders AS o1 ON
o1.product_id_fk = product.id AND o1.user_fk = client.id
LEFT JOIN orders AS o2 ON
o2.product_id_fk = product.id AND o2.user_fk IS NULL AND o2.user_group = client.user_group
WHERE product.id = 1
I have the following indices:
CREATE INDEX ON orders (product_id_fk, user_fk) WHERE user_fk IS NOT NULL;
CREATE INDEX ON orders (product_id_fk, user_group) WHERE user_fk IS NULL;
CREATE INDEX ON orders (product_id_fk, user_fk, user_group) WHERE user_fk IS NULL;
I have also tried an index without the WHERE condition, but it was ignored as well.
EXPLAIN just shows that a Seq Scan will be used for the first two queries.
I would appreciate any ideas on why the indexes are ignored for the first two queries and how to analyze this better.

Creating separate indexes instead of combined indexes might help.
For example:
CREATE INDEX ON orders(product_id_fk);
CREATE INDEX ON orders(user_group);
CREATE INDEX ON orders(user_fk);
You can also add WHERE conditions (partial indexes) when creating the indexes, if needed.
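The fast two-join form from the question is only a safe replacement for the OR join if it returns the same rows, whichever indexes you end up with. Below is a minimal sketch that compares both forms on hypothetical toy data using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, so the sketch checks results only, not query plans. The alias o replaces order, which is a reserved word.

```python
import sqlite3

# Hypothetical toy data, not from the question.
SCHEMA = """
CREATE TABLE clients  (id INTEGER PRIMARY KEY, user_group TEXT);
CREATE TABLE products (id INTEGER PRIMARY KEY);
CREATE TABLE orders   (id INTEGER PRIMARY KEY, product_id_fk INTEGER,
                       user_fk INTEGER, user_group TEXT);
INSERT INTO clients  VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO products VALUES (1), (2);
-- 10: exact FK match, 11: loose match by user_group, 12: another product
INSERT INTO orders   VALUES (10, 1, 1, NULL), (11, 1, NULL, 'B'), (12, 2, 3, NULL);
"""

# OR inside the join condition, as in the slow queries.
OR_JOIN = """
SELECT DISTINCT product.id, client.id, o.id
FROM clients AS client
CROSS JOIN products AS product
LEFT JOIN orders AS o
  ON o.product_id_fk = product.id
 AND (o.user_fk = client.id
      OR (o.user_fk IS NULL AND o.user_group = client.user_group))
WHERE product.id = ?
"""

# One LEFT JOIN per branch, merged with COALESCE, as in the fast query.
TWO_JOINS = """
SELECT DISTINCT product.id, client.id, COALESCE(o1.id, o2.id)
FROM clients AS client
CROSS JOIN products AS product
LEFT JOIN orders AS o1
  ON o1.product_id_fk = product.id AND o1.user_fk = client.id
LEFT JOIN orders AS o2
  ON o2.product_id_fk = product.id AND o2.user_fk IS NULL
 AND o2.user_group = client.user_group
WHERE product.id = ?
"""


def run(sql: str, product_id: int = 1) -> set:
    """Run one query on a fresh in-memory database and return its rows as a set."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return set(conn.execute(sql, (product_id,)).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(OR_JOIN) == run(TWO_JOINS))
```

On this data the two forms agree. They are not equivalent in general. If a client matches one order through user_fk and another through user_group, the OR join returns both order ids, while COALESCE(o1.id, o2.id) keeps only the user_fk match.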

Related

operator does not exist: integer = integer[] postgres error in inner join question

I tried the following in Postgres. Why doesn't it work?
SELECT
users.countries
FROM
users
INNER JOIN countries
ON countries.id = users.countries
ORDER BY countries;
Your join condition compares an integer with an integer array (integer = integer[]), which is not valid. You must extract the array elements first; then you can use them in the join condition. The best way to extract array elements in PostgreSQL is the unnest function, which also performs well. Examples:
-- Sample 1
select us.*, ct.country_name
from
users us
inner join
countries ct on ct.id in (select unnest(us.countries))
order by
us.countries;
-- Sample 2
select t_us.*, ct.country_name from
(
select us.username, us.first_name, us.last_name, unnest(us.countries) as country_id
from users us
) as t_us
inner join countries ct on ct.id = t_us.country_id
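The two samples do not compute quite the same thing, and a plain-Python model of them makes the difference visible. This is a model with hypothetical data, not PostgreSQL itself. Both are inner joins, so a user with an empty array (carol) disappears. They differ only when an array repeats an id. Sample 1 is a membership test, which is also what = ANY(...) does. Sample 2 emits one row per array element.

```python
# Hypothetical data: bob's array lists country 2 twice, carol's is empty.
USERS = {"alice": [1, 2], "bob": [2, 2], "carol": []}
COUNTRIES = {1: "France", 2: "Japan", 3: "Peru"}


def membership_join(users, countries):
    """Like Sample 1 (id IN (SELECT unnest(...))) or = ANY(array): one row per matching (user, country)."""
    return [(user, name)
            for user, ids in users.items()
            for cid, name in countries.items()
            if cid in ids]


def flatten_then_join(users, countries):
    """Like Sample 2: unnest first (one row per array element), then an equality join."""
    flattened = [(user, cid) for user, ids in users.items() for cid in ids]
    return [(user, countries[cid]) for user, cid in flattened if cid in countries]


if __name__ == "__main__":
    print(membership_join(USERS, COUNTRIES))
    print(flatten_then_join(USERS, COUNTRIES))
```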
You need to use the ANY operator to compare a single value with an array of values:
SELECT users.countries, countries.*
FROM users
JOIN countries ON countries.id = ANY(users.countries)
ORDER BY countries;
I do not recommend modeling one-to-many (or actually many-to-many) relationships with arrays. It is better (more efficient and more robust) to do that with a "classic" mapping table between users and countries.
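Here is a minimal sketch of such a mapping table, using Python's built-in sqlite3 module so it runs anywhere. It uses SQLite instead of PostgreSQL, and the table name user_countries and the data are hypothetical. The composite primary key rules out duplicate pairs, and the foreign keys reject dangling ids (SQLite only enforces them after PRAGMA foreign_keys = ON). The join itself becomes a plain equality join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users     (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE countries (id INTEGER PRIMARY KEY, country_name TEXT NOT NULL);
-- Hypothetical mapping table replacing the users.countries array.
CREATE TABLE user_countries (
    user_id    INTEGER NOT NULL REFERENCES users(id),
    country_id INTEGER NOT NULL REFERENCES countries(id),
    PRIMARY KEY (user_id, country_id)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO countries VALUES (1, 'France'), (2, 'Japan'), (3, 'Peru');
INSERT INTO user_countries VALUES (1, 1), (1, 2), (2, 2);
""")

# No unnest or ANY needed: two ordinary equality joins.
rows = conn.execute("""
    SELECT u.username, c.country_name
    FROM users u
    JOIN user_countries uc ON uc.user_id = u.id
    JOIN countries c ON c.id = uc.country_id
    ORDER BY u.username, c.country_name
""").fetchall()


def insert_rejected(user_id: int, country_id: int) -> bool:
    """Try to add a mapping row; True if a constraint (PK or FK) rejected it."""
    try:
        conn.execute("INSERT INTO user_countries VALUES (?, ?)", (user_id, country_id))
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    print(rows)
    print(insert_rejected(1, 1), insert_rejected(1, 99))  # duplicate pair, unknown country
```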

Indexes for optimising SQL Joins in Postgres

Given the query below:
SELECT * FROM A
INNER JOIN B ON (A.b_id = B.id)
WHERE (A.user_id = 'XXX' AND B.provider_id = 'XXX' AND A.type = 'PENDING')
ORDER BY A.created_at DESC LIMIT 1;
The variable values in the query are A.user_id and B.provider_id; type is always 'PENDING'.
I am planning to add a composite partial index on A:
A(user_id, created_at) where type = 'PENDING'
Also, A has far more rows than B (A >> B).
A.user_id, B.provider_id and A.b_id are all foreign keys. Is there any way I can optimize the query?
Given that you are doing an inner join, I would first express the query as follows, with the join in the opposite direction:
SELECT *
FROM B
INNER JOIN A ON A.b_id = B.id
WHERE A.user_id = 'XXX' AND A.type = 'PENDING' AND
B.provider_id = 'XXX'
ORDER BY
A.created_at DESC
LIMIT 1;
Then I would add the following index to the A table:
CREATE INDEX idx_a ON A (user_id, type, created_at, b_id);
This four-column index should cover the join from B to A, the WHERE clause, and the ORDER BY sort at the end of the query. Note that we could probably also have left the join order as you originally wrote it, and this index could still be used.
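You can check whether a planner actually picks the suggested index by building the schema on toy data and looking at the plan. The sketch below does this with Python's sqlite3 module on hypothetical data. SQLite's planner is not PostgreSQL's, so treat the printed plan as illustrative only. In PostgreSQL the equivalent check is EXPLAIN (ANALYZE, BUFFERS) on realistic data. The sketch also confirms that the query returns the newest PENDING row for the given user and provider.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE B (id INTEGER PRIMARY KEY, provider_id TEXT);
CREATE TABLE A (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id),
                user_id TEXT, type TEXT, created_at TEXT);
CREATE INDEX idx_a ON A (user_id, type, created_at, b_id);
INSERT INTO B VALUES (1, 'p1'), (2, 'p2');
INSERT INTO A VALUES
  (1, 1, 'u1', 'PENDING', '2024-01-01'),
  (2, 1, 'u1', 'PENDING', '2024-03-01'),
  (3, 2, 'u1', 'PENDING', '2024-04-01'),  -- newer, but other provider
  (4, 1, 'u1', 'DONE',    '2024-05-01'),  -- newer, but not PENDING
  (5, 1, 'u2', 'PENDING', '2024-06-01');  -- other user
""")

QUERY = """
SELECT A.id, A.created_at
FROM B
JOIN A ON A.b_id = B.id
WHERE A.user_id = ? AND A.type = 'PENDING' AND B.provider_id = ?
ORDER BY A.created_at DESC
LIMIT 1
"""

latest = conn.execute(QUERY, ("u1", "p1")).fetchone()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("u1", "p1"))]

if __name__ == "__main__":
    print(latest)
    for line in plan:
        print(line)
```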

JPA Criteria join query

I have written a complex JPA 2 Criteria API query (my provider is EclipseLink), where I find myself re-using the same subquery over and over again. Unless the DB (Oracle) does something clever, I think that the subquery will be executed each time it is found in the query. I am looking for a way to execute the subquery only once.
We have field-level access, which means that a user can see a DB column only if certain conditions are met. In the example below, the user has the following access:
COLUMN_1 is visible if the result belongs to category 1
COLUMN_2 is visible if the result belongs to category 2
COLUMN_3 is visible if the result belongs to category 1 or category 2
This is a pseudo-query:
SELECT T.PK
FROM MY_TABLE T
WHERE
(
T.COLUMN_1 = 'A'
AND
T.PK IN (SELECT PKs of category 1)
)
AND
(
T.COLUMN_2 = 'B'
AND
T.PK IN (SELECT PKs of category 2)
)
AND
(
T.COLUMN_3 = 'C'
AND
(
T.PK IN (SELECT PKs of category 1)
OR
T.PK IN (SELECT PKs of category 2)
)
)
If I would write it by hand in SQL, I would write it by OUTER JOINing the two queries, like this:
SELECT T.PK
FROM MY_TABLE T
LEFT OUTER JOIN (SELECT PKs of category 1) IS_CAT_1 ON T.PK = IS_CAT_1.PK
LEFT OUTER JOIN (SELECT PKs of category 2) IS_CAT_2 ON T.PK = IS_CAT_2.PK
WHERE
(
T.COLUMN_1 = 'A'
AND
IS_CAT_1.RESULT = true
)
AND
(
T.COLUMN_2 = 'B'
AND
IS_CAT_2.RESULT = true
)
AND
(
T.COLUMN_3 = 'C'
AND
(
IS_CAT_1.RESULT = true
OR
IS_CAT_2.RESULT = true
)
)
Can I join a query as a table with the Criteria API? Creating a View would be my very last choice (the DB is not maintained by me).
Note: I have seen that EclipseLink provides such vendor-specific support in JPQL (link), but I haven't seen this available for Criteria.
In the end, we created a DB View, mapped it as a new JPA entity and joined it to the other entities.
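The IN-subquery and OUTER JOIN forms from the question can be checked against each other on toy data. This sketch uses Python's sqlite3 module with a hypothetical table category_member standing in for the elided "PKs of category N" subqueries. A non-NULL joined PK serves as the "is in category" flag. The DISTINCT inside each derived table matters: without it, a PK listed twice in one category would duplicate rows of MY_TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (pk INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT, column_3 TEXT);
-- Hypothetical source of "PKs of category N".
CREATE TABLE category_member (pk INTEGER, category INTEGER);
INSERT INTO my_table VALUES (1, 'A', 'B', 'C'), (2, 'A', 'B', 'C'), (3, 'A', 'X', 'C');
INSERT INTO category_member VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2);
""")

# Each category subquery repeated wherever it is needed.
IN_SUBQUERIES = """
SELECT t.pk FROM my_table t
WHERE (t.column_1 = 'A' AND t.pk IN (SELECT pk FROM category_member WHERE category = 1))
  AND (t.column_2 = 'B' AND t.pk IN (SELECT pk FROM category_member WHERE category = 2))
  AND (t.column_3 = 'C' AND (t.pk IN (SELECT pk FROM category_member WHERE category = 1)
                          OR t.pk IN (SELECT pk FROM category_member WHERE category = 2)))
"""

# Each category subquery joined once; IS NOT NULL acts as the membership flag.
OUTER_JOINS = """
SELECT t.pk FROM my_table t
LEFT JOIN (SELECT DISTINCT pk FROM category_member WHERE category = 1) is_cat_1
       ON t.pk = is_cat_1.pk
LEFT JOIN (SELECT DISTINCT pk FROM category_member WHERE category = 2) is_cat_2
       ON t.pk = is_cat_2.pk
WHERE (t.column_1 = 'A' AND is_cat_1.pk IS NOT NULL)
  AND (t.column_2 = 'B' AND is_cat_2.pk IS NOT NULL)
  AND (t.column_3 = 'C' AND (is_cat_1.pk IS NOT NULL OR is_cat_2.pk IS NOT NULL))
"""

in_result = sorted(r[0] for r in conn.execute(IN_SUBQUERIES))
join_result = sorted(r[0] for r in conn.execute(OUTER_JOINS))

if __name__ == "__main__":
    print(in_result, join_result)
```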

T-SQL query one table, get presence or absence of other table value

I'm not sure what this type of query is called, so I've been unable to search for it properly. I've got two tables: Table A has about 10,000 rows, and Table B has a variable number of rows.
I want to write a query that returns all of Table A's rows with an added column whose value is a boolean saying whether the row also appears in Table B.
I've written this query, which works but is slow. It doesn't return a boolean but a count that will be either zero or one. Any suggested improvements are gratefully accepted:
SELECT u.number,u.name,u.deliveryaddress,
(SELECT COUNT(productUserid)
FROM ProductUser
WHERE number = u.number and productid = #ProductId)
AS IsInPromo
FROM Users u
UPDATE
I've run the query with the actual execution plan enabled. I'm not sure how to show the results, but the various costs are:
Nested Loops (left semi join): 29%
Clustered Index scan (User Table): 41%
Clustered Index Scan (ProductUser table): 29%
NUMBERS
There are 7366 users in the Users table and currently 18 rows in the ProductUser table (although this will change and could be in the thousands).
You can use EXISTS to short circuit after the first row is found rather than COUNT-ing all matching rows.
SQL Server does not have a boolean data type. The closest equivalent is BIT.
SELECT u.number,
u.name,
u.deliveryaddress,
CASE
WHEN EXISTS (SELECT *
FROM ProductUser
WHERE number = u.number
AND productid = #ProductId) THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT)
END AS IsInPromo
FROM Users u
RE: "I'm not sure what this type of query is called". This will give a plan with a semi join. See Subqueries in CASE Expressions for more about this.
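Here is a small check of the EXISTS version against the original COUNT subquery, using Python's sqlite3 module with hypothetical data. #ProductId is replaced by a ? parameter. SQLite has no BIT type either, so the CASE simply yields the integers 1 or 0. User 1 has a duplicate ProductUser row, so COUNT returns 2 for that user, whereas EXISTS always gives 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (number INTEGER PRIMARY KEY, name TEXT, deliveryaddress TEXT);
CREATE TABLE ProductUser (productUserid INTEGER PRIMARY KEY, number INTEGER, productid INTEGER);
INSERT INTO Users VALUES (1, 'Ann', 'a st'), (2, 'Bo', 'b st'), (3, 'Cy', 'c st');
-- user 1 is in product 7 twice, user 2 only in product 8
INSERT INTO ProductUser VALUES (1, 1, 7), (2, 1, 7), (3, 2, 8);
""")

# EXISTS stops at the first matching row and yields a 0/1 flag.
EXISTS_FLAG = """
SELECT u.number,
       CASE WHEN EXISTS (SELECT 1 FROM ProductUser p
                         WHERE p.number = u.number AND p.productid = ?)
            THEN 1 ELSE 0 END AS IsInPromo
FROM Users u
ORDER BY u.number
"""

# The original approach: count every matching row.
COUNT_VALUE = """
SELECT u.number,
       (SELECT COUNT(productUserid) FROM ProductUser
        WHERE number = u.number AND productid = ?) AS IsInPromo
FROM Users u
ORDER BY u.number
"""

flags = conn.execute(EXISTS_FLAG, (7,)).fetchall()
counts = conn.execute(COUNT_VALUE, (7,)).fetchall()

if __name__ == "__main__":
    print(flags)
    print(counts)
```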
Which database management system are you using?
Try this:
SELECT u.number,u.name,u.deliveryaddress,
case when COUNT(p.productUserid) > 0 then 1 else 0 end
FROM Users u
left join ProductUser p on p.number = u.number and productid = #ProductId
group by u.number,u.name,u.deliveryaddress
Update: this could be faster on MS SQL Server:
;with fff as
(
select distinct p.number from ProductUser p where p.productid = #ProductId
)
select u.number,u.name,u.deliveryaddress,
case when f.number is null then 0 else 1 end
from Users u left join fff f on f.number = u.number
Since you seem concerned about performance, this query may perform faster, as it can use an index seek on both tables instead of an index scan:
SELECT u.number,
u.name,
u.deliveryaddress,
CASE WHEN p.number IS NULL THEN 0 ELSE 1 END AS IsInPromo
FROM Users u
LEFT JOIN ProductUser p ON p.number = u.number
AND p.productid = #ProductId
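Placing the productid filter in WHERE rather than in the ON clause of a LEFT JOIN changes the result, not just the plan. Here is a minimal sketch with Python's sqlite3 module and hypothetical data. With the filter in WHERE, users without a matching row (whose p columns are NULL) are filtered out, so the LEFT JOIN behaves like an inner join. With the filter in ON, every user is kept and gets a 0 flag when there is no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ProductUser (number INTEGER, productid INTEGER);
INSERT INTO Users VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy');
INSERT INTO ProductUser VALUES (1, 7), (2, 8);
""")

# Filter in WHERE: rows whose p.productid is NULL or another product are dropped.
IN_WHERE = """
SELECT u.number, CASE WHEN p.number IS NULL THEN 0 ELSE 1 END
FROM Users u LEFT JOIN ProductUser p ON p.number = u.number
WHERE p.productid = ?
ORDER BY u.number
"""

# Filter in ON: every user is kept; non-matches get NULLs and therefore a 0 flag.
IN_ON = """
SELECT u.number, CASE WHEN p.number IS NULL THEN 0 ELSE 1 END
FROM Users u LEFT JOIN ProductUser p ON p.number = u.number AND p.productid = ?
ORDER BY u.number
"""

where_rows = conn.execute(IN_WHERE, (7,)).fetchall()
on_rows = conn.execute(IN_ON, (7,)).fetchall()

if __name__ == "__main__":
    print(where_rows)
    print(on_rows)
```

If a user can appear more than once for the same product in ProductUser, the join returns that user more than once in either version.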

Selecting non-repeating values in Postgres

SELECT DISTINCT a.s_id, select2Result.s_id, select2Result."mNrPhone",
select2Result."dNrPhone"
FROM "Table1" AS a INNER JOIN
(
SELECT b.s_id, c."mNrPhone", c."dNrPhone" FROM "Table2" AS b, "Table3" AS c
WHERE b.a_id = 1001 AND b.s_id = c.s_id
ORDER BY b.last_name) AS select2Result
ON a.a_id = select2Result.student_id
WHERE a.k_id = 11211
It returns:
1001;1001;"";""
1002;1002;"";""
1002;1002;"2342342232123";"2342342"
1003;1003;"";""
1004;1004;"";""
The value 1002 appears twice, but it shouldn't, because I used DISTINCT and no other table has an id repeated twice.
You can use DISTINCT ON like this:
SELECT DISTINCT ON (a.s_id)
a.s_id, select2Result.s_id, select2Result."mNrPhone",
select2Result."dNrPhone"
...
But as others have told you, the "repeated" records really are different.
The qualifier DISTINCT applies to the entire row, not to the first column in the select list. Since columns 3 and 4 (mNrPhone and dNrPhone) are different for the two rows with s_id = 1002, the DBMS correctly lists both rows. You have to write your query differently if you only want s_id = 1002 to appear once, and you have to decide which auxiliary data you want shown.
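You can reproduce this by loading the five rows from the question into an in-memory SQLite table through Python's sqlite3 module. The table name r and the column names a_s_id and s_s_id are illustrative.

```python
import sqlite3

# The five rows returned in the question.
ROWS = [(1001, 1001, "", ""), (1002, 1002, "", ""),
        (1002, 1002, "2342342232123", "2342342"),
        (1003, 1003, "", ""), (1004, 1004, "", "")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE r (a_s_id INTEGER, s_s_id INTEGER, "mNrPhone" TEXT, "dNrPhone" TEXT)')
conn.executemany("INSERT INTO r VALUES (?, ?, ?, ?)", ROWS)

# DISTINCT compares whole rows, so both 1002 rows survive (their phone columns differ).
distinct_rows = conn.execute("SELECT DISTINCT * FROM r ORDER BY 1, 3").fetchall()

# DISTINCT over the id column alone collapses them.
distinct_ids = [r[0] for r in conn.execute("SELECT DISTINCT a_s_id FROM r ORDER BY 1")]

if __name__ == "__main__":
    print(len(distinct_rows), distinct_ids)
```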
As an aside, it is strongly recommended that you always use the explicit JOIN notation (which was introduced in SQL-92) in all queries and sub-queries. Do not use the old implicit join notation (which is all that was available in SQL-86 or SQL-89), and especially do not use a mixture of explicit and implicit join notations (where your sub-query uses the implicit join, but the main query uses explicit join). You need to know the old notation so you can understand old queries. You should write new queries in the new notation.
First of all, the query as shown does not work at all: student_id is missing from the sub-query, but you use it in the JOIN later.
More interestingly:
Pick a certain row out of a set with DISTINCT
DISTINCT and DISTINCT ON keep one row from each set of rows that are equal in the distinct columns. For plain DISTINCT that means all output columns; for DISTINCT ON, only the listed expressions. With DISTINCT ON, the row that is kept is the first one of each set according to ORDER BY. This gives you the opportunity to prefer certain rows in a set over others.
For instance, if you prefer rows with a non-empty "mNrPhone" in your example:
SELECT DISTINCT ON (a.s_id) -- sure you didn't want a.a_id?
a.s_id AS a_s_id -- use aliases to avoid dupe name
,s.s_id AS s_s_id
,s."mNrPhone"
,s."dNrPhone"
FROM "Table1" a
JOIN (
SELECT b.s_id, c."mNrPhone", c."dNrPhone", ??.student_id -- missing!
FROM "Table2" b
JOIN "Table3" c USING (s_id)
WHERE b.a_id = 1001
-- ORDER BY b.last_name -- pointless, DISTINCT will re-order
) s ON a.a_id = s.student_id
WHERE a.k_id = 11211
ORDER BY a.s_id -- first col must agree with DISTINCT ON, could add DESC though
,("mNrPhone" <> '') DESC -- non-empty first
ORDER BY cannot disagree with DISTINCT ON at the same query level: its leading expressions must match the DISTINCT ON expressions. To get around this, you can either use GROUP BY instead or put the whole query in a sub-query and run another SELECT with ORDER BY on it.
The ORDER BY you had in the sub-query is now pointless.
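SQLite has no DISTINCT ON, but the rule described above (order the rows, then keep the first row of each group) is easy to model in plain Python. The helper distinct_on below is hypothetical. It sketches the semantics using the question's five rows and the same preference for a non-empty mNrPhone.

```python
from itertools import groupby

# The five rows returned in the question: (a_s_id, s_s_id, mNrPhone, dNrPhone).
ROWS = [(1001, 1001, "", ""), (1002, 1002, "", ""),
        (1002, 1002, "2342342232123", "2342342"),
        (1003, 1003, "", ""), (1004, 1004, "", "")]


def distinct_on(rows, key, order):
    """Model of DISTINCT ON (key) ... ORDER BY key, order: sort, then keep the first row per key."""
    ordered = sorted(rows, key=lambda r: (key(r), order(r)))
    return [next(group) for _, group in groupby(ordered, key=key)]


# False sorts before True, so "phone is empty" as tiebreaker puts non-empty phones first,
# like ("mNrPhone" <> '') DESC.
picked = distinct_on(ROWS, key=lambda r: r[0], order=lambda r: r[2] == "")

if __name__ == "__main__":
    for row in picked:
        print(row)
```

There is one difference from PostgreSQL. In PostgreSQL, a NULL mNrPhone makes ("mNrPhone" <> '') NULL, and NULLs sort first under DESC by default. Add NULLS LAST if the column can be NULL.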
In this particular case, if (as it seems) the dupes come only from the sub-query (you'd have to verify that), you could instead do:
SELECT a.a_id, s.s_id, s."mNrPhone", s."dNrPhone" -- picking a.a_id over s_id
FROM "Table1" a
JOIN (
SELECT DISTINCT ON (b.s_id)
b.s_id, c."mNrPhone", c."dNrPhone", ??.student_id -- missing!
FROM "Table2" b
JOIN "Table3" c USING (s_id)
WHERE b.a_id = 1001
ORDER BY b.s_id, (c."mNrPhone" <> '') DESC -- pick non-empty first
) s ON a.a_id = s.student_id
WHERE a.k_id = 11211
ORDER BY a.a_id -- now you can ORDER BY freely