Unable to improve query performance in PostgreSQL

I am trying to join 9 tables together. The row count and indexes of each table are given below, along with the query. Green in the screenshot indicates the keys used to join. Note, however, that I also join on another column of the visit_occurrence table, visit_occurrence_id, which is not indexed.
DROP MATERIALIZED VIEW IF EXISTS cdm.dummy CASCADE;
CREATE MATERIALIZED VIEW cdm.dummy as
select
f.person_id,f.gender_id
from cdm.visit_occurrence a
left join
cdm.condition_occurrence b
on a.person_id = b.person_id and a.visit_occurrence_id =
b.visit_occurrence_id
left join
cdm.measurement c
on a.person_id = c.person_id and a.visit_occurrence_id =
c.visit_occurrence_id
left join
cdm.drug_exposure d
on a.person_id = d.person_id and a.visit_occurrence_id =
d.visit_occurrence_id
left join
cdm.procedure_occurrence e
on a.person_id = e.person_id and a.visit_occurrence_id =
e.visit_occurrence_id
left join
cdm.person f
on a.person_id = f.person_id
left join
cdm.observation g
on a.person_id = g.person_id and a.visit_occurrence_id =
g.visit_occurrence_id
left join
cdm.observation_period h
on a.person_id = g.person_id
left join
cdm.death i
on a.person_id = i.person_id
EXPLAIN output
EXPLAIN output with enable_nestloop = off
Please note that visit_occurrence is the base table. I pick the columns person_id and visit_occurrence_id from the visit_occurrence table to join with the other tables, as shown in the query. I see that visit_occurrence_id, which is used to join from the base table to the other tables, isn't an indexed column in the base table.
a) Is this the reason for the slow performance, given that it is the base table? In all the other tables, the joining keys are indexed, as shown in the screenshot above (green indicates the keys used to join).
b) Are the record counts an issue?
Can you help me adapt my query to fix this?
It's been running for more than 5-6 hours with no output yet.
Any help is much appreciated.
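One thing worth checking in the query above: the join to cdm.observation_period h is written as a.person_id = g.person_id, so its condition never mentions h. The sketch below uses SQLite through Python's sqlite3 module rather than PostgreSQL, and the miniature tables and rows are invented purely for illustration. It shows how such a condition pairs every qualifying row with every row of h, which could plausibly account for a large part of the run time.

```python
import sqlite3

# Hypothetical miniature tables; names echo the query, the data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE visit (person_id INT, visit_occurrence_id INT);
    CREATE TABLE observation (person_id INT, visit_occurrence_id INT);
    CREATE TABLE observation_period (person_id INT);
    INSERT INTO visit VALUES (1, 10), (2, 20);
    INSERT INTO observation VALUES (1, 10), (2, 20);
    INSERT INTO observation_period VALUES (1), (2), (3), (4);
""")

base = """
    SELECT count(*)
    FROM visit a
    LEFT JOIN observation g
      ON a.person_id = g.person_id
     AND a.visit_occurrence_id = g.visit_occurrence_id
    LEFT JOIN observation_period h
      ON {cond}
"""

# Condition as written in the question: h does not appear in it,
# so each row where a and g agree is paired with all rows of h.
rows_unconstrained = con.execute(
    base.format(cond="a.person_id = g.person_id")).fetchone()[0]

# Condition that actually references h.
rows_constrained = con.execute(
    base.format(cond="a.person_id = h.person_id")).fetchone()[0]

print(rows_unconstrained, rows_constrained)  # 8 2
```

With 2 visits and 4 observation periods, the first form returns 2 x 4 = 8 rows and the second returns 2. On tables with millions of rows the same effect would be far larger.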

Related

Snowflake "Exploding Join" issue while doing left join for multiple tables

I am trying to do some left joins on multiple tables and facing the following issue.
Row Counts of tables
Table 1: 1.6M
Table 2: 1.7M
Table 3: 1.5M
When I left join Table 1 and Table 2 with the following query, I get a row count of 1.8 M (acceptable):
SELECT Table1.ID1, Table1.ID2, Table2.Name, Table2.City
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
;
Similarly, when I left join Table 1 and Table 3 with the following query, I get a row count of 1.9 M (acceptable):
SELECT Table1.ID1, Table1.ID2, Table3.Name, Table3.City
FROM Table1
LEFT JOIN Table3
ON Table1.ID1 = Table3.ID1
AND Table1.ID2 = Table3.ID2
AND Table1.Source_System = Table3.Source_System
;
But when I left join Table 1, 2 and 3 with the following query, I get a row count of 11.9 G (ISSUE):
SELECT
Table1.ID1, Table1.ID2,
Table2.Name, Table2.City,
Table3.Name as Name1, Table3.City as City1
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
LEFT JOIN Table3
ON Table1.ID1 = Table3.ID1
AND Table1.ID2 = Table3.ID2
AND Table1.Source_System = Table3.Source_System
;
So it seems you have assumed that the data in Table1 and Table2 join in a 1:1 ratio, and that Table1 and Table3 also join in a 1:1 ratio, and therefore that when the three tables are joined the ratio should again be on the order of 1:1.
But if half your entries in Table1 are not in Table2, then to get the 1.8M result the matching rows would have to be duplicated more than 2 times. If only a tenth of the rows match, there would need to be more than 10 duplicates each. Thus, to get the 4 orders of magnitude of growth you have, it seems that only about a hundredth of the rows match, but with more than 100 duplicates in each table, which when cross joined give the 10,000-fold growth in rows.
This can be seen via:
SELECT Table1.ID1, Table1.ID2, Table1.Source_System, count(*) as counts
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
GROUP BY 1,2,3
ORDER BY counts DESC
;
This will show the distinct key combinations and which of them are the worst contributors to the combinatorial explosion.
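To make the arithmetic concrete, here is a minimal sketch using SQLite through Python's sqlite3 module. The tables t1, t2 and t3 and their contents are invented stand-ins, not the asker's data. It shows that each pairwise join can look modest while the three-way join multiplies the per-key match counts.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INT);
    CREATE TABLE t2 (id INT, name TEXT);
    CREATE TABLE t3 (id INT, name TEXT);
    INSERT INTO t1 VALUES (1), (2);
""")
# Key 1 has 3 matches in t2 and 4 in t3; key 2 has no match in either.
con.executemany("INSERT INTO t2 VALUES (1, ?)", [(f"a{i}",) for i in range(3)])
con.executemany("INSERT INTO t3 VALUES (1, ?)", [(f"b{i}",) for i in range(4)])


def count(sql):
    """Run a query that returns a single count and return it as an int."""
    return con.execute(sql).fetchone()[0]


n12 = count("SELECT count(*) FROM t1 LEFT JOIN t2 ON t1.id = t2.id")
n13 = count("SELECT count(*) FROM t1 LEFT JOIN t3 ON t1.id = t3.id")
n123 = count("""
    SELECT count(*) FROM t1
    LEFT JOIN t2 ON t1.id = t2.id
    LEFT JOIN t3 ON t1.id = t3.id
""")

print(n12, n13, n123)  # 4 5 13
```

Key 1 contributes 3 rows in the first join and 4 in the second, but 3 x 4 = 12 in the combined one. The unmatched key 2 contributes a single row each time.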
When a left join produces more rows than the referenced table, that should not be considered acceptable! It should be a warning sign about your join condition and your data. Either investigate those records in the table to avoid the problem in the first place, or keep adjusting your SQL until the join is clean and returns exactly the reference table's row count. Otherwise, it is very common for a left join to another table with even a few duplicate records to multiply the row count, as you are seeing here.
These related questions may also help: here and here.
Just to add about investigating and finding those rows: use the following SQL to find, in each table, the rows that share the same ID1, ID2 and Source_System values,
i.e.:
Select ID1, ID2 ,Source_System, COUNT(*) AS NUM_RECORDS_DUPS
FROM TABLE1
GROUP BY ID1, ID2 , Source_System
HAVING COUNT(*) > 1 -- keep only join keys that occur on more than one row
Run the same query against each of the tables to find those records, then either add another condition that makes the join unique, aggregate the table on the joining keys, or ask for those records to be cleansed.
Have you tried adding a DISTINCT clause?
SELECT DISTINCT columns, of, choice
FROM Table1
LEFT JOIN Table2 on ...
LEFT JOIN Table3 on ...
I think what's happening is that you have duplicates that left join onto another giant set of duplicates.
Use the proper keys to join the two tables, it solves the issue.

Postgresql query deletes all rows

I wrote a simple delete query in a PostgreSQL function, with a USING clause, a left join and a WHERE clause. But the query does not take the WHERE condition into consideration. It deletes all rows.
I wrote two versions of the query; both produce the same result.
Query 1
delete from "StockInfos" using "StockInfos" as si
left outer join "PurchaseOrderInfos" as poi on poi."Id" = si."PurchaseOrderInfoId"
left outer join "ReceivingInfos" as ri on ri."PurchaseOrderInfoId" = poi."Id"
where ri."Id" = (delete_data->>'Id')::bigint;
Query 2
delete from "StockInfos" where exists (
select * from "StockInfos" as si
left join "PurchaseOrderInfos" as poi on poi."Id" = si."PurchaseOrderInfoId"
left outer join "ReceivingInfos" as ri on ri."PurchaseOrderInfoId" = poi."Id"
where ri."Id" = (delete_data->>'Id')::bigint
);
I don't understand what the problem is. Can anyone tell me what is going wrong?
I would rephrase this with a correlated subquery. This makes the logic much cleaner, and should do what you want:
delete from "StockInfos" si
where exists (
select 1
from "PurchaseOrderInfos" poi
inner join "ReceivingInfos" as ri on ri."PurchaseOrderInfoId" = poi."Id"
where
poi."Id" = si."PurchaseOrderInfoId"
and ri."Id" = (delete_data->>'Id')::bigint
)
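Why the originals touch every row: in both of the question's queries the inner "StockInfos" as si is a separate reference to the table, and nothing ties it to the row being deleted. As soon as the subquery finds any match, the condition is true for every row. The sketch below illustrates this with SQLite through Python's sqlite3 module instead of PostgreSQL. The tables stock, po and receiving are simplified, hypothetical stand-ins, and it counts the rows each WHERE clause would select rather than actually deleting them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE po (id INT);
    CREATE TABLE stock (id INT, po_id INT);
    CREATE TABLE receiving (id INT, po_id INT);
    INSERT INTO po VALUES (1), (2);
    INSERT INTO stock VALUES (100, 1), (101, 1), (200, 2);
    INSERT INTO receiving VALUES (7, 1), (8, 2);
""")

# Uncorrelated: the subquery scans its own copy of stock (alias si),
# so it is either true for all outer rows or false for all of them.
would_delete_uncorrelated = con.execute("""
    SELECT count(*) FROM stock WHERE EXISTS (
        SELECT 1 FROM stock si
        JOIN po ON po.id = si.po_id
        JOIN receiving ri ON ri.po_id = po.id
        WHERE ri.id = ?
    )
""", (7,)).fetchone()[0]

# Correlated: the subquery refers to the outer row via stock.po_id.
would_delete_correlated = con.execute("""
    SELECT count(*) FROM stock WHERE EXISTS (
        SELECT 1 FROM po
        JOIN receiving ri ON ri.po_id = po.id
        WHERE po.id = stock.po_id AND ri.id = ?
    )
""", (7,)).fetchone()[0]

print(would_delete_uncorrelated, would_delete_correlated)  # 3 2
```

The uncorrelated predicate selects all 3 stock rows. The correlated one selects only the 2 rows that belong to the purchase order behind receiving record 7.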

MYSQL- query too slow to load

My query works, but it takes a long time to display the data. Can you help me make it faster?
$sql="SELECT allinvty3.*, stock_transfer_tb.* from stock_transfer_tb
INNER JOIN allinvty3 on stock_transfer_tb.in_code = allinvty3.in_code
where stock_transfer_tb.in_code NOT IN (SELECT barcode.itemcode from barcode where stock_transfer_tb.refnumber = barcode.refitem)";
I would recommend using the following query:
SELECT
a.*,
s.*
FROM stock_transfer_tb s
INNER JOIN allinvty3 a
ON s.in_code = a.in_code
WHERE
NOT EXISTS (SELECT 1 FROM barcode b
WHERE s.refnumber = b.refitem AND s.in_code = b.itemcode);
If this still doesn't give you the performance you want, then you should look into adding indices on all columns involved in the join and where clause.
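As a rough illustration of the indexing suggestion, the sketch below builds the three tables in SQLite through Python's sqlite3 module (not MySQL), adds one index per join or filter path, and checks that the NOT EXISTS form returns the same rows as the original NOT IN form on a small invented data set. The index names, the descr column and all the rows are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stock_transfer_tb (in_code TEXT, refnumber TEXT);
    CREATE TABLE allinvty3 (in_code TEXT, descr TEXT);
    CREATE TABLE barcode (itemcode TEXT, refitem TEXT);

    -- Hypothetical index names; one index per join/filter path.
    CREATE INDEX idx_st_in_code ON stock_transfer_tb (in_code);
    CREATE INDEX idx_inv_in_code ON allinvty3 (in_code);
    CREATE INDEX idx_barcode_ref_item ON barcode (refitem, itemcode);

    INSERT INTO stock_transfer_tb VALUES ('A', 'R1'), ('B', 'R1'), ('C', 'R2');
    INSERT INTO allinvty3 VALUES ('A', 'x'), ('B', 'y'), ('C', 'z');
    INSERT INTO barcode VALUES ('A', 'R1'), ('C', 'R9');
""")

not_exists = [r[0] for r in con.execute("""
    SELECT s.in_code
    FROM stock_transfer_tb s
    INNER JOIN allinvty3 a ON s.in_code = a.in_code
    WHERE NOT EXISTS (SELECT 1 FROM barcode b
                      WHERE s.refnumber = b.refitem
                        AND s.in_code = b.itemcode)
    ORDER BY s.in_code
""")]

not_in = [r[0] for r in con.execute("""
    SELECT s.in_code
    FROM stock_transfer_tb s
    INNER JOIN allinvty3 a ON s.in_code = a.in_code
    WHERE s.in_code NOT IN (SELECT b.itemcode FROM barcode b
                            WHERE s.refnumber = b.refitem)
    ORDER BY s.in_code
""")]

print(not_exists, not_in)  # ['B', 'C'] ['B', 'C']
```

The two forms agree here because the sample data contains no NULL item codes. If barcode.itemcode can be NULL, NOT IN and NOT EXISTS can return different results, so that is worth checking before swapping one for the other.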

Linq to SQL self join where right part of join is filtered

I am trying to map a self join where the right table must be filtered, e.g. SQL such as this:
select t2.* from table t
left join table t2
on t2.parentID = t.ID and t2.active=1;
I can figure out the syntax if I wanted to filter the left table:
// works
var query = from t in table
where t.active == 1
join t2 in table
on t.parentID equals t2.ID into joined
from r in joined.DefaultIfEmpty() ...
But I can't figure out how to filter the right table. It seems like it should be something like this...
// does not work
var query = from t in table
join t2 in table
where t.field = 1
on t.parentID equals t2.ID into joined
from r in joined.DefaultIfEmpty() ...
(not valid... join can't have where). There is discussion of using multiple from clauses, but if I create more than one from clause, so I can add a where to the 2nd one, I can't figure out how to join the results of them into a new temporary table.
I can't just add a "where" after the join; the right table must be filtered first or matches will occur, and a where clause at the end would remove the row from the left table that I do want in the output. That is, the output should have rows where there's nothing matched from filtered right table. So I need to filter the right table before the join.
I think you are looking to do this:
var query = from t in table
join t2 in
(from t3 in table
where t3.field == 1
select t3)
on t.parentID equals t2.ID into joined
from r in joined.DefaultIfEmpty() ...
Another way is to use multiple from like this:
var query = from t in table
from t2 in table.Where(x => x.field == 1)
.Where(x => x.ID == t.parentID)
.DefaultIfEmpty()
select ....
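The distinction the question draws, between filtering the right table before the join and filtering after it, can be seen in plain SQL. The sketch below uses SQLite through Python's sqlite3 module rather than LINQ. The table node and its rows are invented for illustration, since the question's table is simply called "table".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE node (ID INT, parentID INT, active INT);
    INSERT INTO node VALUES
        (1, NULL, 1),
        (2, 1, 1),    -- active child of 1
        (3, 1, 0),    -- inactive child of 1
        (4, 2, 0);    -- inactive child of 2
""")

# Filter inside ON: the right side is restricted before matching,
# so left rows without an active child are kept with NULLs.
on_filter = con.execute("""
    SELECT t.ID, t2.ID FROM node t
    LEFT JOIN node t2 ON t2.parentID = t.ID AND t2.active = 1
    ORDER BY t.ID, t2.ID
""").fetchall()

# Filter in WHERE: applied after the join, so the NULL-extended
# rows are discarded and the left join behaves like an inner join.
where_filter = con.execute("""
    SELECT t.ID, t2.ID FROM node t
    LEFT JOIN node t2 ON t2.parentID = t.ID
    WHERE t2.active = 1
    ORDER BY t.ID, t2.ID
""").fetchall()

print(on_filter)     # [(1, 2), (2, None), (3, None), (4, None)]
print(where_filter)  # [(1, 2)]
```

Joining against a pre-filtered subquery, as in the first LINQ answer, corresponds to the ON-filter form: every left row survives, matched or not.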

select more fields within one select

I managed to do the selection with several SELECTs and a loop, using 4 tables (the last one was just for collecting all the data).
But now I'm thinking of a way to select all the fields I need with just one SELECT statement. Here is the huge select :)
SELECT vbak~vbeln vbak~audat
tvakt~bezei
vbap~posnr vbap~matnr vbap~kwmeng vbap~vrkme
lips~vbeln lips~posnr lips~werks lips~lfimg
vbfa~vbtyp_n
FROM vbak JOIN vbap ON vbak~vbeln = vbap~vbeln
JOIN tvakt ON vbak~auart = tvakt~auart
LEFT JOIN vbfa ON vbfa~vbelv = vbak~vbeln AND vbfa~posnv = vbap~posnr
JOIN lips ON vbfa~vbeln = lips~vbeln AND vbfa~posnn = lips~posnr
INTO TABLE gt_salord
WHERE tvakt~spras = 'EN' AND
vbak~vbeln IN s_vbeln AND
vbak~audat IN s_audat.
The problem is that this doesn't work. When I try to activate it, it throws this error: "Unable to compare with "VBAP~POSNR". A table can be joined with a maximum of one other table using LEFT OUTER JOIN"
If I don't use LEFT JOIN and use only JOIN, it works, but I don't get everything I want. I need to get all the SALES ORDERS, even if they don't have a DELIVERY ORDER assigned. Is there a way to do that, or do I really have to split my select?
I have noticed that in SAP it is often more efficient to simplify SELECT statements and then use LOOP and SELECT SINGLE for tables that do not participate in the data selection.
In your case, the data from table VBFA could be fetched after the main selection (it does not restrict the amount of data fetched from the DB).
Of course it depends on indexes, application server buffering and so on, but even though it might be counter-intuitive for SQL experts, keeping SELECT statements from getting too complex tends to work best in SAP.
Can you try the following selection:
SELECT vbak~vbeln vbak~audat
tvakt~bezei
vbap~posnr vbap~matnr vbap~kwmeng vbap~vrkme
lips~vbeln lips~posnr lips~werks lips~lfimg
vbfa~vbtyp_n
FROM vbak JOIN vbap ON vbak~vbeln = vbap~vbeln
JOIN tvakt ON vbak~auart = tvakt~auart
LEFT JOIN vbfa ON vbfa~vbelv = vbap~vbeln AND vbfa~posnv = vbap~posnr
JOIN lips ON vbfa~vbeln = lips~vbeln AND vbfa~posnn = lips~posnr
INTO TABLE gt_salord
WHERE tvakt~spras = 'EN' AND
vbak~vbeln IN s_vbeln AND
vbak~audat IN s_audat.
I can't test the result, but the syntax check says OK.
There is only one tiny difference, in the first comparison of the ON clause:
Mine:  LEFT JOIN vbfa ON vbfa~vbelv = vbap~vbeln AND vbfa~posnv = vbap~posnr
Yours: LEFT JOIN vbfa ON vbfa~vbelv = vbak~vbeln AND vbfa~posnv = vbap~posnr
You compared vbfa~vbelv with vbak~vbeln; I compare it with vbap~vbeln. Both hold the same value, but the second condition of the ON clause already uses vbap, so this way vbfa is left-joined to only one table.
I don't know SAP ABAP, but from an SQL point of view you can use a derived query, if SAP supports it.
Here is one approach:
select * from
(
select * from
table1 inner join table2 on table1.key=table2.key
inner join table3 on table1.key=table3.key
) a left outer join table4 b
on a.key=b.key
I am posting this because the question is tagged as SQL. I hope it works.
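A variation on the derived-query idea, sketched in plain SQL through Python's sqlite3 module and not in ABAP: put the two optional tables (document flow and delivery) inside the derived query and left join the sales orders to it. The tables orders, flow and delivery are simplified, hypothetical stand-ins for VBAK, VBFA and LIPS, and whether Open SQL accepts an equivalent construct is not something this sketch establishes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (vbeln TEXT);
    CREATE TABLE flow (vbelv TEXT, vbeln TEXT);   -- order -> delivery link
    CREATE TABLE delivery (vbeln TEXT, lfimg INT);
    INSERT INTO orders VALUES ('SO1'), ('SO2');   -- SO2 has no delivery
    INSERT INTO flow VALUES ('SO1', 'DL1');
    INSERT INTO delivery VALUES ('DL1', 5);
""")

# LEFT JOIN followed by an inner JOIN on the optional table:
# the inner join discards the NULL-extended row for SO2.
chained = con.execute("""
    SELECT o.vbeln, d.vbeln FROM orders o
    LEFT JOIN flow f ON f.vbelv = o.vbeln
    JOIN delivery d ON d.vbeln = f.vbeln
    ORDER BY o.vbeln
""").fetchall()

# Optional tables combined first, then left joined as one unit.
derived = con.execute("""
    SELECT o.vbeln, x.delivery FROM orders o
    LEFT JOIN (
        SELECT f.vbelv AS sales_order, d.vbeln AS delivery
        FROM flow f
        JOIN delivery d ON d.vbeln = f.vbeln
    ) x ON x.sales_order = o.vbeln
    ORDER BY o.vbeln
""").fetchall()

print(chained)  # [('SO1', 'DL1')]
print(derived)  # [('SO1', 'DL1'), ('SO2', None)]
```

Only the second form keeps the sales order that has no delivery, which is the result the question asks for.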
Try changing the order of the fields in the ON clause of the left join: put vbap~vbeln = vbfa~vbelv.