How can I list other matching values even if there is an unmatched value in the query? - postgresql

My query filters on a value that has no match in the demand_category table. Because that one value does not match, the query returns no rows at all, even though the other values do match.
How can I list the other matching values even if one value in the query is unmatched?
process Table
fk_unit_id fk_unit_position fk_demand_category
1 2 1
unit table
unit_id
1
unit_position table
unit_position
2
demand_category table
demand_category
1
Query:
SELECT unit_name,unit_position_name,demand_category_name From process
INNER JOIN unit ON process.fk_unit_id = unit_id and unit_id =1
INNER JOIN unit_position ON process.fk_unit_position_id = unit_position_id and unit_position_id = 2
INNER JOIN demand_category ON process.fk_demand_category_id = demand_category_id and demand_category_id =0 ;

Replace the INNER JOIN on demand_category with a LEFT JOIN.
A LEFT JOIN returns every row from the left table together with the related rows from the right table. If a left-table row has no related row, any columns you select from the right table contain NULL.
SELECT unit_name,unit_position_name,demand_category_name From process
INNER JOIN unit ON process.fk_unit_id = unit_id and unit_id =1
INNER JOIN unit_position ON process.fk_unit_position_id = unit_position_id and unit_position_id = 2
LEFT JOIN demand_category ON process.fk_demand_category_id = demand_category_id and demand_category_id =0 ;

You can use an outer join to keep the rows that don't match; the corresponding columns from the other table are padded with NULL. Another way is to use the IN operator, but that may give slower query performance.
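The difference between the two join types can be checked on a toy copy of the schema. The following is only a sketch: it uses Python's built-in sqlite3 module rather than PostgreSQL, and the name columns and their values ('Unit A' and so on) are assumptions, since the question shows only the id columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data modelled on the question's tables.
conn.executescript("""
    CREATE TABLE unit (unit_id INTEGER, unit_name TEXT);
    CREATE TABLE unit_position (unit_position_id INTEGER, unit_position_name TEXT);
    CREATE TABLE demand_category (demand_category_id INTEGER, demand_category_name TEXT);
    CREATE TABLE process (fk_unit_id INTEGER, fk_unit_position_id INTEGER,
                          fk_demand_category_id INTEGER);
    INSERT INTO unit VALUES (1, 'Unit A');
    INSERT INTO unit_position VALUES (2, 'Position B');
    INSERT INTO demand_category VALUES (1, 'Category C');
    INSERT INTO process VALUES (1, 2, 1);
""")

# Same query as in the question; only the last join type varies.
QUERY = """
    SELECT unit_name, unit_position_name, demand_category_name
    FROM process
    INNER JOIN unit ON process.fk_unit_id = unit_id AND unit_id = 1
    INNER JOIN unit_position
        ON process.fk_unit_position_id = unit_position_id AND unit_position_id = 2
    {join_type} JOIN demand_category
        ON process.fk_demand_category_id = demand_category_id AND demand_category_id = 0
"""

inner_rows = conn.execute(QUERY.format(join_type="INNER")).fetchall()
left_rows = conn.execute(QUERY.format(join_type="LEFT")).fetchall()

print(inner_rows)  # no rows: the unmatched category removes the whole row
print(left_rows)   # one row, with NULL (None) in the category column
```

With the INNER JOIN the single unmatched condition eliminates the row; with the LEFT JOIN the row survives and only demand_category_name is NULL.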

Related

Snowflake "Exploding Join" issue while doing left join for multiple tables

I am trying to do some left joins on multiple tables and facing the following issue.
Row Counts of tables
Table 1: 1.6M
Table 2: 1.7M
Table 3: 1.5M
When I left join Table 1 and Table 2 using the following query, I get a row count of 1.8 M (acceptable):
SELECT Table1.ID1, Table1.ID2, Table2.Name, Table2.City
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
;
Similarly, when I left join Table 1 and Table 3 using the following query, I get a row count of 1.9 M (acceptable):
SELECT Table1.ID1, Table1.ID2, Table3.Name, Table3.City
FROM Table1
LEFT JOIN Table3
ON Table1.ID1 = Table3.ID1
AND Table1.ID2 = Table3.ID2
AND Table1.Source_System = Table3.Source_System
;
But when I left join Table 1, 2 and 3 using the following query, I get a row count of 11.9 G (ISSUE):
SELECT
Table1.ID1, Table1.ID2,
Table2.Name, Table2.City,
Table3.Name as Name1, Table3.City as City1
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
LEFT JOIN Table3
ON Table1.ID1 = Table3.ID1
AND Table1.ID2 = Table3.ID2
AND Table1.Source_System = Table3.Source_System
;
So it seems you have assumed that table1 and table2 join in a 1:1 ratio, and that table1 and table3 also join in a 1:1 ratio, and therefore that joining all three tables should again give a ratio of roughly 1:1.
But if half your entries in table1 are not in table2, then to get the 1.8M result the common rows would have to be duplicated more than 2.0 times. If we change that from half matching to only a tenth matching, there would need to be more than 10.0 duplicates. Thus, to get the four orders of magnitude of growth you have, it seems that only about a hundredth of the rows match but each has more than 100 duplicates, which when cross joined gives the 10,000-fold growth in rows.
This can be seen via:
SELECT Table1.ID1, Table1.ID2, Table1.Source_System, count(*) as counts
FROM Table1
LEFT JOIN Table2
ON Table1.ID1 = Table2.ID1
AND Table1.ID2 = Table2.ID2
AND Table1.Source_System = Table2.Source_System
GROUP BY 1,2,3
ORDER BY counts DESC
;
This will show the distinct key combinations, and which of them are the worst contributors to the combinatorial explosion.
When your left join produces more records than the referenced table, that should not be considered acceptable! It should signal a warning about your join condition and your data. Either investigate those records in the table to avoid the problem in the first place, or keep tweaking your SQL until you have a clean join that produces exactly the reference table's row count. Otherwise, it is very common for a left join to another table with even a few duplicate records to multiply the row count, as you are facing here.
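The multiplication is easy to reproduce at small scale. This is a sketch only, using Python's sqlite3 module instead of Snowflake and made-up data: a single join key that appears 3 times in Table1, 4 times in Table2 and 5 times in Table3. The helper names left_join and count_rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID1 INTEGER, ID2 INTEGER, Source_System TEXT);
    CREATE TABLE Table2 (ID1 INTEGER, ID2 INTEGER, Source_System TEXT, Name TEXT);
    CREATE TABLE Table3 (ID1 INTEGER, ID2 INTEGER, Source_System TEXT, Name TEXT);
""")

# Hypothetical data: one join key repeated 3, 4 and 5 times respectively.
key = (1, 1, "S")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", [key] * 3)
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?)",
                 [key + (f"a{i}",) for i in range(4)])
conn.executemany("INSERT INTO Table3 VALUES (?, ?, ?, ?)",
                 [key + (f"b{i}",) for i in range(5)])


def left_join(table):
    """Build the LEFT JOIN clause from the question for the given table."""
    return (f"LEFT JOIN {table} ON Table1.ID1 = {table}.ID1 "
            f"AND Table1.ID2 = {table}.ID2 "
            f"AND Table1.Source_System = {table}.Source_System ")


def count_rows(joins):
    """Count the rows produced by Table1 plus the given join clauses."""
    return conn.execute("SELECT COUNT(*) FROM Table1 " + joins).fetchone()[0]


with_t2 = count_rows(left_join("Table2"))                         # 3 * 4
with_t3 = count_rows(left_join("Table3"))                         # 3 * 5
with_both = count_rows(left_join("Table2") + left_join("Table3"))  # 3 * 4 * 5
print(with_t2, with_t3, with_both)
```

Each pairwise join looks only modestly larger than Table1, but in the three-way join every Table2 match is paired with every Table3 match for the same key, so the per-key counts multiply.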
Try reading these questions here to help here and here
Just to add about investigating and finding those rows: use the following SQL to find which rows in each table share the same ID1, ID2 and Source_System values, i.e.:
Select ID1, ID2 ,Source_System, COUNT(*) AS NUM_RECORDS_DUPS
FROM TABLE1
GROUP BY ID1, ID2 , Source_System
HAVING COUNT(*)>1 -- keep only keys that have more than one row satisfying the join condition
Run the same query against each of the tables to find those records, then either add another condition that makes the join unique, aggregate the table on the joining keys, or ask for the data to be cleansed for those records.
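As a rough illustration of the "aggregate the table on the joining keys" option, the sketch below (Python's sqlite3, made-up data) first finds the duplicated key and then joins against a de-duplicated subquery. Picking MIN(Name) as the surviving value is purely an assumption for the example; which row to keep depends on the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: key (1, 1, 'S') appears twice in Table2.
conn.executescript("""
    CREATE TABLE Table1 (ID1 INTEGER, ID2 INTEGER, Source_System TEXT);
    CREATE TABLE Table2 (ID1 INTEGER, ID2 INTEGER, Source_System TEXT, Name TEXT);
    INSERT INTO Table1 VALUES (1, 1, 'S'), (2, 2, 'S');
    INSERT INTO Table2 VALUES (1, 1, 'S', 'a'), (1, 1, 'S', 'b'), (2, 2, 'S', 'c');
""")

# Step 1: find join keys that occur more than once in Table2.
dups = conn.execute("""
    SELECT ID1, ID2, Source_System, COUNT(*) AS NUM_RECORDS_DUPS
    FROM Table2
    GROUP BY ID1, ID2, Source_System
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: the plain left join returns more rows than Table1 has.
raw_count = conn.execute("""
    SELECT COUNT(*) FROM Table1
    LEFT JOIN Table2
      ON Table1.ID1 = Table2.ID1
     AND Table1.ID2 = Table2.ID2
     AND Table1.Source_System = Table2.Source_System
""").fetchone()[0]

# Step 3: aggregate Table2 to one row per key before joining.
dedup_count = conn.execute("""
    SELECT COUNT(*) FROM Table1
    LEFT JOIN (
        SELECT ID1, ID2, Source_System, MIN(Name) AS Name  -- arbitrary pick
        FROM Table2
        GROUP BY ID1, ID2, Source_System
    ) AS T2
      ON Table1.ID1 = T2.ID1
     AND Table1.ID2 = T2.ID2
     AND Table1.Source_System = T2.Source_System
""").fetchone()[0]

print(dups, raw_count, dedup_count)
```

After the aggregation the join returns exactly one row per Table1 row, which is the "clean join" condition described above.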
Have you tried adding a DISTINCT clause?
SELECT DISTINCT columns, of, choice
FROM Table1
LEFT JOIN Table2 on ...
LEFT JOIN Table3 on ...
I think what's happening is that you have duplicates that left join onto another giant set of duplicates.
Use the proper keys to join the tables; that solves the issue.

Update table with from sub select

I have two tables, a and b.
For each id, I want to update the most recently inserted row in table a from the earliest inserted row in table b, where a.id = b.id.
I've been trying to use an update statement with a sub select in the from clause.
If I execute the sub query on its own it returns x rows, but when I execute the whole update statement it updates y rows.
update a
set title = b.title,
created_at = b.created_at
from
(
select
e.id,e.title,e.created_at
from
(
select
l.id,
l.title,
l.created_at,
l.t_insert
from b l
left join b r
on l.id = r.id and l.t_insert > r.t_insert
) e
join
(
select
l.id,
l.title,
l.created_at,
l.t_insert
from a l
left join a r on l.report_id = r.report_id and l.t_insert <
r.t_insert
) f
)
where
a.id=b.id
I want the same number of rows to be updated as returned in the sub select query in the from.
In this case, having fewer rows updated than returned by the subquery could be because one id is returned more than once by the subquery. If that happens, the update statement will still update the matching row only once. I'm assuming the statement you've provided is not exactly what you're running, but you should check that the subquery does not return duplicates in its id field (by using DISTINCT or GROUP BY, or by double-checking your JOIN conditions).
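The mismatch between the two row counts can be reproduced with a small sketch. It uses Python's sqlite3 module, made-up data, and a correlated subquery in place of the UPDATE ... FROM form in the question (an assumption made so the example runs on older SQLite versions); the point it illustrates is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: id 1 appears twice in b, id 3 does not appear at all.
conn.executescript("""
    CREATE TABLE a (id INTEGER, title TEXT);
    CREATE TABLE b (id INTEGER, title TEXT, t_insert INTEGER);
    INSERT INTO a VALUES (1, 'old'), (2, 'old'), (3, 'old');
    INSERT INTO b VALUES (1, 'first', 1), (1, 'second', 2), (2, 'only', 1);
""")

# Rows returned by the source query on its own (contains id 1 twice).
subquery_rows = conn.execute("SELECT id FROM b").fetchall()

# Each row of a is updated at most once, taking the earliest insert in b.
cur = conn.execute("""
    UPDATE a
    SET title = (SELECT b.title FROM b
                 WHERE b.id = a.id
                 ORDER BY b.t_insert
                 LIMIT 1)
    WHERE a.id IN (SELECT id FROM b)
""")
updated = cur.rowcount

titles = conn.execute("SELECT id, title FROM a ORDER BY id").fetchall()
print(len(subquery_rows), updated, titles)
```

The source query yields three rows but only two rows of a are updated, because two of the three source rows target the same id.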

How to count total number of records after join the three tables in postgresql?

I have a query that returns 12408 records in total, but I want the total number of records to be returned as a count column.
select
c.complaint_id,c.server_time,c.completion_date,c.road_id,c.photo,c.dept_code,c.dist_code,c.eng_userid,c.feedback_type,c.status,p.dist_name,p.road_name,p.road_dept,e.display_name,e.mobile
from complaints as c INNER JOIN pwd_roads as p ON p.road_id=c.road_id
INNER JOIN enc_details as e ON CAST(e.enc_code as INTEGER) = p.enccode
where c.complaint_id=c.parent_complaint_id and c.dept_code='PWDBnR'
and c.server_time between '2018-09-03' and '2018-12-19'
You can solve this using window functions. For example, if you want the first column to be a count of the total rows returned by the SELECT statement:
select count(1) over(range between unbounded preceding and unbounded following) as total_row_count
, c.complaint_id,c.server_time,c.completion_date,c.road_id,c.photo,c.dept_code,c.dist_code,c.eng_userid,c.feedback_type,c.status,p.dist_name,p.road_name,p.road_dept,e.display_name,e.mobile from complaints as c INNER JOIN pwd_roads as p ON p.road_id=c.road_id INNER JOIN enc_details as e ON CAST(e.enc_code as INTEGER) = p.enccode where c.complaint_id=c.parent_complaint_id and c.dept_code='PWDBnR' and c.server_time between '2018-09-03' and '2018-12-19'
Note that the window function is evaluated before the LIMIT clause if one is used, so if you were to add LIMIT 100 to the query it might give a row count greater than 100 even though a max of 100 rows would be returned.
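The interaction with LIMIT can be seen in a short sketch. It runs against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) and uses a made-up five-row complaints table, so treat it as an illustration rather than the original PostgreSQL setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with five rows.
conn.execute("CREATE TABLE complaints (complaint_id INTEGER)")
conn.executemany("INSERT INTO complaints VALUES (?)", [(i,) for i in range(1, 6)])

# The window count is computed over the full result set, before LIMIT applies.
rows = conn.execute("""
    SELECT count(*) OVER () AS total_row_count, complaint_id
    FROM complaints
    ORDER BY complaint_id
    LIMIT 2
""").fetchall()

print(rows)
```

Only two rows come back, yet total_row_count on each of them is 5, the size of the result before the limit.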
The easiest, though not very elegant, way to do this is:
select count(*)
from
(
select c.complaint_id,c.server_time,c.completion_date,c.road_id,c.photo,c.dept_code,c.dist_code,c.eng_userid,c.feedback_type,c.status,p.dist_name,p.road_name,p.road_dept,e.display_name,e.mobile from complaints as c INNER JOIN pwd_roads as p ON p.road_id=c.road_id INNER JOIN enc_details as e ON CAST(e.enc_code as INTEGER) = p.enccode where c.complaint_id=c.parent_complaint_id and c.dept_code='PWDBnR' and c.server_time between '2018-09-03' and '2018-12-19'
) as t

Getting Different results with "LEFT OUTER JOIN" and "IN", where did my logic go wrong?

I have four tables: one is a Master Invoice table and the three others hold invoices from different regions. What I am trying to achieve is to return only the records from the Master Invoice table whose invoice number is in one of the other three tables. For example:
SELECT * FROM Invoice_Master M
LEFT OUTER JOIN Invoice_North N
ON M.InvNo = N.InvNo
LEFT OUTER JOIN Invoice_East E
ON M.InvNo = E.InvNo
LEFT OUTER JOIN Invoice_South S
ON M.InvNo = S.InvNo
WHERE N.InvNo IS NOT NULL
OR E.InvNo IS NOT NULL
OR S.InvNo IS NOT NULL
The logic is that if I "LEFT OUTER JOIN" the 3 tables to the Master table and any InvNo is not null, then the invoice must exist in the original Master table.
However, when I write the code as this implicit join, I get slightly fewer records in return:
select * FROM Invoice_Master
WHERE InvNo IN (
SELECT InvNo FROM Invoice_North)
OR InvNo IN (
SELECT InvNo FROM Invoice_East)
OR InvNo IN (
SELECT InvNo FROM Invoice_South)
Where did my logic go wrong?
The difference could be due to the fact that the second query selects distinct rows from the master table, whereas your first query could be returning join results that contain duplicate master rows. For example, if the left outer join matched two rows in, say, Invoice_North, then both of those rows will be shown in the main select.
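A small sketch makes the difference concrete. It uses Python's sqlite3 module and made-up invoice numbers, with invoice 100 deliberately stored twice in Invoice_North.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: invoice 100 is duplicated in Invoice_North.
conn.executescript("""
    CREATE TABLE Invoice_Master (InvNo INTEGER);
    CREATE TABLE Invoice_North (InvNo INTEGER);
    CREATE TABLE Invoice_East (InvNo INTEGER);
    CREATE TABLE Invoice_South (InvNo INTEGER);
    INSERT INTO Invoice_Master VALUES (100), (200), (300);
    INSERT INTO Invoice_North VALUES (100), (100);
    INSERT INTO Invoice_East VALUES (200);
""")

# Join version: one output row per matching combination of regional rows.
join_rows = conn.execute("""
    SELECT M.InvNo FROM Invoice_Master M
    LEFT OUTER JOIN Invoice_North N ON M.InvNo = N.InvNo
    LEFT OUTER JOIN Invoice_East E ON M.InvNo = E.InvNo
    LEFT OUTER JOIN Invoice_South S ON M.InvNo = S.InvNo
    WHERE N.InvNo IS NOT NULL
       OR E.InvNo IS NOT NULL
       OR S.InvNo IS NOT NULL
    ORDER BY M.InvNo
""").fetchall()

# IN version: each master row is returned at most once.
in_rows = conn.execute("""
    SELECT InvNo FROM Invoice_Master
    WHERE InvNo IN (SELECT InvNo FROM Invoice_North)
       OR InvNo IN (SELECT InvNo FROM Invoice_East)
       OR InvNo IN (SELECT InvNo FROM Invoice_South)
    ORDER BY InvNo
""").fetchall()

print(join_rows, in_rows)
```

The join returns invoice 100 twice, once per matching Invoice_North row, while the IN version returns it once, which accounts for the smaller row count.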

T-SQL Query not bringing back a count of 0

I have a feeling I am making some sort of foolish mistake here; however, I am trying to do a query over two tables. One table contains the value I want to aggregate over, which in this case I have called the StoreCharge table. The other table contains the values I want to count.
SELECT StoreCharge.StoreId,
COUNT(DISTINCT(ISNULL(WholesalerInvoice.WholesalerId,0))) AS Invoices
FROM StoreCharge
LEFT OUTER JOIN
WholesalerInvoice ON StoreCharge.StoreId = WholesalerInvoice.StoreId
WHERE StoreCharge.CompanyId = 2
AND WholesalerInvoice.StoreInvoiceId IS NULL
AND DATEDIFF(day,WholesalerInvoice.InvoiceDate,'20100627') >= 0
AND DATEDIFF(day,dateadd(day,-7,'20100627'),WholesalerInvoice.InvoiceDate) > 0
GROUP BY StoreCharge.StoreId
My problem is that if there are rows in the counting table that match the WHERE clause, the query works OK. However, when no rows match the criteria, nothing is returned, instead of a list of the values in StoreCharge with a count of 0.
The WHERE clause is evaluated after the LEFT OUTER JOIN, so a filter on WholesalerInvoice columns discards the rows where those columns are NULL.
Try moving the WHERE filters that relate to WholesalerInvoice into the OUTER JOIN condition:
SELECT StoreCharge.StoreId,
COUNT(DISTINCT(ISNULL(WholesalerInvoice.WholesalerId,0))) AS Invoices
FROM StoreCharge
LEFT OUTER JOIN
WholesalerInvoice ON StoreCharge.StoreId = WholesalerInvoice.StoreId
AND DATEDIFF(day,WholesalerInvoice.InvoiceDate,'20100627') >= 0
AND DATEDIFF(day,dateadd(day,-7,'20100627'),WholesalerInvoice.InvoiceDate) > 0
WHERE StoreCharge.CompanyId = 2
GROUP BY StoreCharge.StoreId
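This will filter the WholesalerInvoice records as required and leave the StoreCharge table intact. Note that COUNT(DISTINCT(ISNULL(WholesalerInvoice.WholesalerId,0))) counts the substituted 0 as a value, so a store with no matching invoices shows 1 rather than 0; counting WholesalerInvoice.WholesalerId directly, without ISNULL, gives 0 for those stores.
Based on the query in the example, you don't actually use any columns from the table you join in. Unless there is more to the query, a subquery would produce the desired result.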
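The effect of where the filter sits can be sketched with Python's sqlite3 module. The data is made up, and because SQLite has no DATEDIFF the date window is written as a plain comparison on ISO date strings (an assumption that mirrors the question's one-week window ending on 2010-06-27). The count is taken on WholesalerId directly, so NULLs from unmatched rows are not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: store 2 only has an invoice outside the date window.
conn.executescript("""
    CREATE TABLE StoreCharge (StoreId INTEGER, CompanyId INTEGER);
    CREATE TABLE WholesalerInvoice (StoreId INTEGER, WholesalerId INTEGER,
                                    InvoiceDate TEXT);
    INSERT INTO StoreCharge VALUES (1, 2), (2, 2);
    INSERT INTO WholesalerInvoice VALUES (1, 10, '2010-06-25'),
                                         (2, 11, '2010-01-01');
""")

# Filter in WHERE: rows for stores without a matching invoice are discarded.
where_rows = conn.execute("""
    SELECT s.StoreId, COUNT(DISTINCT w.WholesalerId) AS Invoices
    FROM StoreCharge s
    LEFT OUTER JOIN WholesalerInvoice w ON s.StoreId = w.StoreId
    WHERE s.CompanyId = 2
      AND w.InvoiceDate > '2010-06-20' AND w.InvoiceDate <= '2010-06-27'
    GROUP BY s.StoreId
    ORDER BY s.StoreId
""").fetchall()

# Filter in ON: every StoreCharge row survives; unmatched stores count 0.
on_rows = conn.execute("""
    SELECT s.StoreId, COUNT(DISTINCT w.WholesalerId) AS Invoices
    FROM StoreCharge s
    LEFT OUTER JOIN WholesalerInvoice w
      ON s.StoreId = w.StoreId
     AND w.InvoiceDate > '2010-06-20' AND w.InvoiceDate <= '2010-06-27'
    WHERE s.CompanyId = 2
    GROUP BY s.StoreId
    ORDER BY s.StoreId
""").fetchall()

print(where_rows, on_rows)
```

With the filter in WHERE, store 2 disappears from the result; with the filter in the ON clause it is kept and reported with a count of 0.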
SELECT StoreCharge.StoreId,
(SELECT COUNT(0) FROM WholesalerInvoice WHERE WholesalerInvoice.StoreId = StoreCharge.StoreId
AND DATEDIFF(day,WholesalerInvoice.InvoiceDate,'20100627') >= 0
AND DATEDIFF(day,dateadd(day,-7,'20100627'),WholesalerInvoice.InvoiceDate) > 0) [Invoices]
FROM StoreCharge
WHERE StoreCharge.CompanyId = 2