Updating for each row in a table - postgresql

I have this query here which returns an error because of too many rows returned:
UPDATE tmp_rsl2 SET comm_percent=( SELECT c2.comm_percent
FROM tmp_rsl2 t1
INNER JOIN gn_salesperson g1 ON t1.sales_person=g1.sales_person
INNER JOIN comm_schema c1 ON g1.comm_schema=c1.comm_schema
INNER JOIN comm_schema_dt c2 ON c1.comm_schema_id=c2.comm_schema_id AND (t1.balance_amount::numeric <= (COALESCE(c2.value_amount,0)) );`
Basically for each row of the comm_percent column, I want to update all of them using the subquery SELECT statement. I imagine using a FOR loop or something but I'd like to hear ideas or to know a proper way to do this.

The error TOO_MANY_ROWS is about assigning a value to a variable, that can only take '1' (one) value, whereas the SELECT query is returning more than one.
Without a reference schema, its difficult to give an SQL that'd work (not to say that the issue lies with the Schema), but you need to ensure that the value assigned to comm_percent from the SELECT statement returns only 1 row. A very blind attempt at how it 'might' work in your case (given below), but again without knowing the schema its difficult to gauge whether it'd work.
UPDATE tmp_rsl2
SET comm_percent = c2.comm_percent
FROM gn_salesperson g1 ON
INNER JOIN comm_schema c1 ON g1.comm_schema = c1.comm_schema
INNER JOIN comm_schema_dt c2 ON c1.comm_schema_id = c2.comm_schema_id
AND (tmp_rsl2.balance_amount::NUMERIC <= (COALESCE(c2.value_amount, 0)))
WHERE tmp_rsl2.sales_person = g1.sales_person
UPDATE
As per below comments, have given an unrelated SQLFiddle example that should give an idea of how to perform an UPDATE of all rows of a table looking up corresponding values from another table.

Related

Postgresql optional join

Is it possible in Postgres to have an optional join?
My use case is something like
select ...
from a
inner join b using (b_id)
where b.type in (...)
a is a very large reporting table. b is used to filter a, BUT the most common use case is that we will want all b.types, and therefore all the b records in the join. In other words, in most cases we don't want to filter by b at all, and would not need the join in that case, but the filtering optionality still needs to be there in cases when the user wants to filter by type.
So is it possible to invoke the join optionally, and save the join effort in cases when we just want all of a?
If not, what's my next best option? IF ... THEN or CTE with a union of separate queries?
If you don't need any of b's columns, there is no need to JOIN table b, You can filter by using EXISTS(SELECT .. FROM b WHERE ...).
If you want to conditionally exclude a part of the WHERE clause, you could use the following construct: (the ignore_b boolean will function as an on/off switch)
-- $ignore_b is a Boolean flag
-- when True, the optimiser will ignore the exists(...)
SELECT ...
FROM a
WHERE ( $ignore_b OR EXISTS (
SELECT *
FROM b
WHERE b.b_id = a.some_id
AND b.type in (1,2,3,4,5)
)
);
In our example, you are still filtering based on b, based on whether a row with that b_id exists in b in the first place.
Postgresql will remove unneeded joins under very specific circumstances. You write the join as a left join, so that no rows of A can be removed due to the absence of corresponding rows in B. The column B.b_id is a declared unique or primary key, so that no rows of A can be duplicated due to duplicate matches in B. And of course, no column of B can referenced in the query (except the reference to the key column in the left join condition).
In those cases, you can just always write the LEFT JOIN, and PostgreSQL will figure out that it can skip it.
You can argue that if you have a declared foreign key constraint on the join condition, then you shouldn't need the JOIN to be a LEFT JOIN in order to implement this optimization. I think that that argument is correct, but PostgreSQL does not implement it that way.
I would just do it programatically. If you are already programmatically adding references to B in the WHERE clause, you should be able to do it for the join as well.

Using EXCEPT and flagging column differences

What Im looking to do is select data from a postgres table, which does not appear in another. Both tables have identical columns, bar the use of boolean over Varchar(1) but the issue is that the data in those columns do not match up.
I know I can do this with a SELECT EXCEPT SELECT statement, which I have implemented and is working.
What I would like to do is find a method to flag the columns that do not match up. As an idea, I have thought to append a character to the end of the data in the fields that do not match.
For example if the updateflag is different in one table to the other, I would be returned '* f' instead of 'f'
SELECT id, number, "updateflag" from dbc.person
EXCEPT
SELECT id, number, "updateflag":bool from dbg.person;
Should I be joining the two tables together, post executing this statement to identify the differences, from whats returned?
I have tried to research methods to implement this but have no found anything on the topic
I prefer a full outer join for this
select *
from dbc.person p1
full join dbg.person p2 on p1.id = p2.id
where p1 is distinct from p2;
The id column is assumed the primary key column that "links" the two tables together.
This will only return rows where at least one column is different.
If you want to see the differences, you could use a hstore feature
select hstore(p1) - hstore(p2) as columns_diff_p1,
hstore(p2) - hstore(p1) as columns_diff_p2
from dbc.person p1
full join dbg.person p2 on p1.id = p2.id
where p1 is distinct from p2;

Update with leftjoin requires to join the update table itself

I want to update a table with values from another table, which not always exist. So I need to left join the other table. The only way I found is this:
UPDATE lessonentity update
SET title=a.test
FROM lessonentity l
LEFT JOIN (SELECT 'hoho1' test) a ON(true)
where l.lessonid=48552
AND update.lessonid=l.lessonid
My question: Is it possible to left-join another table, without inner-joining (where) the updating-table again?
Yes, but not using an explicit join. In your case, this is sufficient given that a has only one row:
UPDATE lessonentity le
SET title = a.test
FROM (SELECT 'hoho1' test) a
WHERE le.lessonid = 48552;
Normally, there would be an additional condition in the WHERE, connecting a and le, but that is not necessary in this case because the table has a single row.

How to specify two expressions in the select list when the subquery is not introduced with EXISTS

I have a query that uses a subquery and I am having a problem returning the expected results. The error I receive is..."Only one expression can be specified in the select list when the subquery is not introduced with EXISTS." How can I rewrite this to work?
SELECT
a.Part,
b.Location,
b.LeadTime
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
AND
Date IN (SELECT Location, MAX(Date) FROM dbo.Vendor GROUP BY Location)
GROUP BY
a.Part,
b.Location,
b.LeadTime
ORDER BY
a.Part
I think something like this may be what you're looking for. You didn't say what version of SQL Server--this works in SQL 2005 and up:
SELECT
p.Part,
p.Location, -- from *p*, otherwise if no match we'll get a NULL
v.LeadTime
FROM
dbo.Parts p
OUTER APPLY (
SELECT TOP (1) * -- * here is okay because we specify columns outside
FROM dbo.Vendor v
WHERE p.Location = v.Location -- the correlation part
ORDER BY v.Date DESC
) v
WHERE
p.Location IN ('A','B','C')
ORDER BY
p.Part
;
Now, your query can be repaired as is by adding the "correlation" part to change your query into a correlated subquery as demonstrated in Kory's answer (you'd also remove the GROUP BY clause). However, that method still requires an additional and unnecessary join, hurting performance, plus it can only pull one column at a time. This method allows you to pull all the columns from the other table, and has no extra join.
Note: this gives logically the same results as Lamak's answer, however I prefer it for a few reasons:
When there is an index on the correlation columns (Location, here) this can be satisfied with seeks, but the Row_Number solution has to scan (I believe).
I prefer the way this expresses the intent of the query more directly and succinctly. In the Row_Number method, one must get out to the outer condition to see that we are only grabbing the rn = 1 values, then bop back into the CTE to see what that is.
Using CROSS APPLY or OUTER APPLY, all the other tables not involved in the single-inner-row-per-outer-row selection are outside where (to me) they belong. We aren't squishing concerns together. Using Row_Number feels a bit like throwing a DISTINCT on a query to fix duplication rather than dealing with the underlying issue. I guess this is basically the same issue as the previous point worded in a different way.
The moment you have TWO tables from which you wish to pull the most recent value, the Row_Number() solution blows up completely. With this syntax, you just easily add another APPLY clause, and it's crystal clear what you're doing. There is a way to use Row_Number for the multiple tables scenario by moving the other tables outside, but I still don't prefer that syntax.
Using this syntax allows you to perform additional joins based on whether the selected row exists or not (in the case that no matching row was found). In the Row_Number solution, you can only reasonably do that NOT NULL checking in the outer query--so you are forced to split up the query into multiple, separated parts (you don't want to be joining to values you will be discarding!).
P.S. I strongly encourage you to use aliases that hint at the table they represent. Please don't use a and b. I used p for Parts and v for Vendor--this helps you and others make sense of the query more quickly in the future.
If I understood you corrrectly, you want the rows with the max date for locations A, B and C. Now, assuming SQL Server 2005+, you can do this:
;WITH CTE AS
(
SELECT
a.Part,
b.Location,
b.LeadTime,
RN = ROW_NUMBER() OVER(PARTITION BY a.Part ORDER BY [Date] DESC)
FROM
dbo.Parts a
LEFT OUTER JOIN dbo.Vendor b ON b.Part = a.Part
WHERE
b.Location IN ('A','B','C')
)
SELECT Part,
Location,
LeadTime
FROM CTE
WHERE RN = 1
ORDER BY Part
In your subquery you need to correlate the Location and Part to the outer query.
Example:
Date = (SELECT MAX(Date)
FROM dbo.Vender v
WHERE v.Location = b.Location
AND v.Part = b.Part
)
So this will bring back one date for each location and part

DB2 Query Structure Using User-Defined Function as a Table

I'm a little new to DB2, and am having trouble developing a query. I have created a user-defined function that returns a table of data which I want to then join and select from in larger select statement. I'm working on a sensitive db, so the query below isn't what I'm literally running, but it's almost exactly like it (without the other 10 joins I have to do lol).
select
A.customerId,
A.firstname,
A.lastname,
B.orderId,
B.orderDate,
F.currentLocationDate,
F.currentLocation
from
customer A
INNER JOIN order B
on A.customerId = B.customerId
INNER JOIN table(getShippingHistory(B.customerId)) as F
on B.orderId = F.orderId
where B.orderId = 35
This works great if I run this query without the where clause (or some other where clause that doesn't check for an ID). When I include the where clause, I get the following error:
Error during Prepare 58004(-901)[IBM][CLI Driver][DB2/LINUXX8664]
SQL0901N The SQL statement failed because of a non-severe system
error. Subsequent SQL statements can be processed. (Reason "Bad Plan;
Unresolved QNC found".) SQLSTATE=58004
I have tracked the issue down to fact that I'm using one of join criteria for the parameters (B.customerId). I have validated this fact by replacing B.customerId with a valid customerId, and the query works great. Problem is, I don't know the customerId when calling this query. I know only the orderId (in this example).
Any thoughts on how to restructure this so I can make only 1 call to get all the info? I know the plan is the problem b/c the customerId isn't getting resolved before the function is called.
So if I understand correctly, the function getShippingHistory(customerId) returns a table.
And if you call it with a single customer Id that table gets joined in your query above no problem at all.
But the way you have the query written above, you are asking db2 to call the function for every row returned by your query (i.e. every b.customerId that matches your join and where conditions).
So I'm not sure what behaviour you are expecting, because what you're asking for is a table back for every row in your query, and db2 (nor I) can figure out what the result is supposed to look like.
So in terms of restructuring your query, think about how you can change the getShippingHistory logic when multiple customer Ids are involved.
i found the best solution (given the current query structure) is to use a LEFT join instead of an INNER join in order force the LEFT part of the join to happen which will resolve the customerId to a value by the time it gets to the function call.
select
A.customerId,
A.firstname,
A.lastname,
B.orderId,
B.orderDate,
F.currentLocationDate,
F.currentLocation
from
customer A
INNER JOIN order B
on A.customerId = B.customerId
LEFT JOIN table(getShippingHistory(B.customerId)) as F
on B.orderId = F.orderId
where B.orderId = 35