Cancelled amount and a corresponding entry - Postgres - postgresql

I have the payment table:
There can be erroneous entries: a payment is made by mistake (see row 5) and is then cancelled out (see row 6). I cannot figure out a query that removes not only the negative amounts but also the corresponding positive entries. Here is the desired outcome:
There are also cases where several wrong payments were made; then I need to cancel out all payments that sum to the cancelled amount.
The desired outcome:
I found "Remove Rows That Sum Zero For A Given Key", "Selecting positive aggregate value and ignoring negative in Postgres SQL" and https://www.sqlservercentral.com/forums/topic/select-all-negative-values-that-have-a-positive-value but they are not exactly what I need.
I no longer mind cases like case 2. I would at least like a reliable way to exclude pairs like 5; -5.

You can try this to delete the rows from the table:
WITH RECURSIVE cancel_list (id, total_cancel, sum_cancel, index_to_cancel) AS
( SELECT p.id, abs(p.amount), 0, array[p.index]
FROM payment_table AS p
WHERE p.amount < 0
AND p.id = id_to_check_and_cancel -- this condition can be suppressed in order to go through the full table payment
UNION ALL
SELECT DISTINCT ON (l.id) l.id, l.total_cancel, l.sum_cancel + p.amount, l.index_to_cancel || p.index
FROM cancel_list AS l
INNER JOIN payment_table AS p
ON p.id = l.id
WHERE l.sum_cancel + p.amount <= l.total_cancel
AND NOT l.index_to_cancel @> array[p.index] -- array containment check, to avoid loops
)
DELETE FROM payment_table AS p
USING (SELECT DISTINCT ON (c.id) c.id, unnest(c.index_to_cancel) AS index_to_cancel
FROM cancel_list AS c
ORDER BY c.id, array_length(c.index_to_cancel, 1) DESC
) AS c
WHERE p.index = c.index_to_cancel;
You can try this to query the table without the cancelled rows:
WITH RECURSIVE cancel_list (id, total_cancel, sum_cancel, index_to_cancel) AS
( SELECT p.id, abs(p.amount), 0, array[p.index]
FROM payment_table AS p
WHERE p.amount < 0
AND p.id = id_to_check_and_cancel -- this condition can be suppressed in order to go through the full table payment
UNION ALL
SELECT DISTINCT ON (l.id) l.id, l.total_cancel, l.sum_cancel + p.amount, l.index_to_cancel || p.index
FROM cancel_list AS l
INNER JOIN payment_table AS p
ON p.id = l.id
WHERE l.sum_cancel + p.amount <= l.total_cancel
AND NOT l.index_to_cancel @> array[p.index] -- array containment check, to avoid loops
)
SELECT *
FROM payment_table AS p
LEFT JOIN (SELECT DISTINCT ON (c.id) c.id, c.index_to_cancel
FROM cancel_list AS c
ORDER BY c.id, array_length(c.index_to_cancel, 1) DESC
) AS c
ON c.index_to_cancel @> array[p.index]
WHERE c.index_to_cancel IS NULL ;
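For the simpler case the question settles for (excluding exact pairs like 5; -5), the matching rule can be sketched outside SQL. This is only an illustration under assumptions: rows are (index, id, amount) tuples in insertion order, using the column names from the queries above, and each negative amount cancels the most recent unmatched positive payment with the same id and the same absolute amount. It does not handle cancellations that match a sum of several payments.

```python
from collections import defaultdict


def drop_cancelled_pairs(rows):
    """Remove each negative row together with one matching positive row.

    rows: list of (index, id, amount) tuples in insertion order (assumed layout).
    """
    # (id, amount) -> positions of positive rows not yet cancelled
    open_payments = defaultdict(list)
    cancelled = set()
    for pos, (_index, pid, amount) in enumerate(rows):
        if amount > 0:
            open_payments[(pid, amount)].append(pos)
        elif amount < 0 and open_payments[(pid, -amount)]:
            # Cancel the negative row and the latest matching positive row.
            cancelled.add(pos)
            cancelled.add(open_payments[(pid, -amount)].pop())
    return [row for pos, row in enumerate(rows) if pos not in cancelled]


rows = [
    (1, 1, 10),
    (2, 1, 5),
    (3, 1, -5),   # cancels row 2
    (4, 2, 7),
    (5, 2, 7),
    (6, 2, -7),   # cancels row 5 only
    (7, 3, -4),   # no matching payment, so it is kept
]
kept = drop_cancelled_pairs(rows)
print(kept)
```

Unmatched negative rows are left in place here; whether they should be kept or flagged is a decision the question does not settle.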

Related

Is it possible to do a "LIMIT 1" on a left join in Postgres?

I have two tables: one for money and attributes surrounding it (e.g. who earnt it) and a child table for the "ledger" - this contains one or more entries that represent the history of money that has moved.
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
This query works well when there is only one ledger item, but when more are added the SUM will increase. I want to join only the latest row. So hypothetically:
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id ORDER BY pl.ts DESC LIMIT 1
WHERE ...
ORDER BY ...
LIMIT ...
(which sadly doesn't work)
What I have tried:
Using a subquery works, but is painfully slow given the size of the data set (and other omitted properties and where clauses etc.):
SELECT SUM(pl.achieved)
FROM payout p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id AND pl.id = (SELECT id FROM payout_ledgers WHERE payout_id = p.id ORDER BY ts DESC LIMIT 1)
Incidentally, I'm unsure why this subquery is so slow (~12 seconds, as opposed to 150ms with no subquery). I would have expected it to be quicker given that we're only selecting based on the foreign key (payout_id).
Another thing I tried was to do a select from the join - my logic being that if we select from small joined dataset instead of the whole table it would be quicker. However I was met with relation "pl" does not exist error:
SELECT SUM(pl.achieved)
FROM payouts p
LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
WHERE pl.id = (SELECT id FROM pl ORDER BY ts DESC LIMIT 1)
Thank you in advance for any suggestions. I am also open to suggestions for schema changes that could make this type of logic easier, although my preference would be to try and get the query working since the schema is not easy to change on our production environment.
If you're on Postgres 9.3 or later, you can use a LEFT JOIN LATERAL:
SELECT SUM(sub.achieved)
FROM payout p
LEFT JOIN LATERAL (SELECT achieved
FROM payout_ledgers pl
WHERE pl.payout_id = p.id
ORDER BY pl.ts DESC LIMIT 1) sub ON true
This will return the sum of the "achieved" field in the most recent entry in payout_ledgers for all payouts.
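To see the difference between summing every ledger row and summing only the latest one, here is a small sketch that runs against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Postgres and has no LATERAL, so a correlated scalar subquery with ORDER BY ... LIMIT 1 plays that role; the sample data and the integer ts column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payout (id INTEGER PRIMARY KEY);
    CREATE TABLE payout_ledgers (
        id INTEGER PRIMARY KEY,
        payout_id INTEGER REFERENCES payout(id),
        ts INTEGER,        -- stand-in for a timestamp
        achieved INTEGER
    );
    INSERT INTO payout (id) VALUES (1), (2), (3);
    INSERT INTO payout_ledgers (payout_id, ts, achieved) VALUES
        (1, 1, 10), (1, 2, 30),   -- latest entry for payout 1 is 30
        (2, 1, 5);                -- payout 3 has no ledger rows
""")

# Plain join: every ledger row contributes to the sum.
naive_sum = conn.execute("""
    SELECT SUM(pl.achieved)
    FROM payout p
    LEFT JOIN payout_ledgers pl ON pl.payout_id = p.id
""").fetchone()[0]

# Only the most recent ledger row per payout contributes.
latest_sum = conn.execute("""
    SELECT SUM(latest)
    FROM (SELECT (SELECT pl.achieved
                  FROM payout_ledgers pl
                  WHERE pl.payout_id = p.id
                  ORDER BY pl.ts DESC
                  LIMIT 1) AS latest
          FROM payout p) sub
""").fetchone()[0]

print(naive_sum, latest_sum)
```

Payout 3 has no ledger rows, so its subquery yields NULL, which SUM ignores.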
Alternatively, use window functions:
-- using row_number()
SELECT SUM(sss.achieved)
FROM (SELECT pl.achieved
, row_number() OVER (PARTITION BY pl.payout_id ORDER BY pl.ts DESC) AS rn
FROM payouts p
JOIN payout_ledgers pl ON pl.payout_id = p.id
) sss
WHERE sss.rn =1
;
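-- using last_value()
SELECT SUM(sss.achieved)
FROM (SELECT DISTINCT pl.payout_id
, last_value(pl.achieved) OVER (PARTITION BY pl.payout_id ORDER BY pl.ts ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS achieved -- full-partition frame; the default frame stops at the current row
FROM payouts p
JOIN payout_ledgers pl ON pl.payout_id = p.id
) sss
;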
BTW: you do not need the LEFT JOIN. A payout without ledger rows only contributes a NULL, which SUM ignores, so an inner join gives the same sum.

How do I create a basic looping calculation in sql to avoid doing 200+ Joins

All -
I have a basic issue, which is becoming a major nightmare.
I am creating a payment schedule table for future dates. In order to calculate the future balances, I need to continuously reduce the starting balance in Table A on each date, based upon the future payment amount in table B.
The problem is that I have to left join Table B based upon what the balance is in Table A, and do that for every single row because the ending balance in Row 1 is the starting balance in Row 2, and the ending balance in Row 2, is the starting balance in Row 3. This is a cumulative / looping calculation.
Here is a depiction of what I am trying to do:
[Image: Table Examples]
Here is the actual SQL:
with dates_rns as (
select
a.loan_id
,as_of_date as payment_date
,upb_usd as starting_balance
,principal_amount as payment
,new_upb as new_balance
,row_number() over (partition by loan_id order by a.as_of_date) as rn
from scratchpad.iit1 a
where row_value < 100
), payment_sched as (
select loan_id, payment_date, starting_balance,
payment, new_balance, rn
from dates_rns a
where rn = 1
union all
select n.loan_id, n.payment_date,
p.new_balance as starting_balance,
least(b.principal_amount, p.new_balance),
greatest(p.new_balance - b.principal_amount, 0.00) as new_balance,
n.rn
from dates_rns n -- 'n' is for this payment
join payment_sched p -- 'p' is for previous payment
on n.rn = p.rn + 1
join scratchpad.collectability_1_Princ b -- payment lookup
on b.loan_id = p.loan_id
and round(b.previous_upb) > round(p.new_balance)
and round(b.remaining_upb) <= round(p.new_balance)
)
select *
from payment_sched
You are looking for a recursive query here. These queries can go haywire depending on your data, so I will restrict it to one loan_id, passed in the params CTE.
Change the 12345 to a valid loan_id, and see if this works for you. Please let me know in comments if it gives you trouble.
You cannot use an outer join inside of the recursive CTE, so table_b has to cover the full range.
with recursive params as (
select 12345 as loan_id
), dates_rns as (
select a.loan_id, a.payment_date, a.starting_balance, a.payment, a.new_balance,
row_number() over (order by a.payment_date) as rn
from params p
join table_a a on a.loan_id = p.loan_id
), payment_sched as (
select loan_id, payment_date, starting_balance,
payment, new_balance, rn
from dates_rns a
where rn = 1
union all
select n.loan_id, n.payment_date,
p.new_balance as starting_balance,
least(b.payment, p.new_balance),
greatest(p.new_balance - b.payment, 0.00) as new_balance,
n.rn
from dates_rns n -- 'n' is for this payment
join payment_sched p -- 'p' is for previous payment
on n.rn = p.rn + 1
join table_b b -- payment lookup
on b.loan_id = p.loan_id
and round(b.previous_balance) > round(p.new_balance)
and round(b.remaining_balance) <= round(p.new_balance)
)
select *
from payment_sched;
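The core of the recursion (each row's starting balance is the previous row's ending balance) can be tried in isolation. The sketch below is a simplified stand-in, not the query above: it uses a hypothetical schedule table with a fixed payment per row, a hard-coded opening balance of 1000, and SQLite through Python's sqlite3 module, where the two-argument MIN and MAX act like least and greatest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schedule: one payment per row number.
    CREATE TABLE schedule (rn INTEGER PRIMARY KEY, payment INTEGER);
    INSERT INTO schedule (rn, payment) VALUES (1, 400), (2, 400), (3, 400);
""")

rows = conn.execute("""
    WITH RECURSIVE payment_sched (rn, starting_balance, payment, new_balance) AS (
        -- Anchor: first row starts from the opening balance (1000, assumed).
        SELECT s.rn, 1000, MIN(s.payment, 1000), MAX(1000 - s.payment, 0)
        FROM schedule s
        WHERE s.rn = 1
        UNION ALL
        -- Step: the previous new_balance becomes this row's starting balance.
        SELECT s.rn,
               p.new_balance,
               MIN(s.payment, p.new_balance),
               MAX(p.new_balance - s.payment, 0)
        FROM schedule s
        JOIN payment_sched p ON s.rn = p.rn + 1
    )
    SELECT rn, starting_balance, payment, new_balance
    FROM payment_sched
    ORDER BY rn
""").fetchall()

for row in rows:
    print(row)
```

The last payment is capped at the remaining balance, so the balance never goes below zero.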

Strange Behaviour on Postgresql query

We created a view in Postgres and I am getting strange result.
View Name: event_puchase_product_overview
When I select records with *, I get the correct result, but when I select specific fields, I get wrong values.
I hope the screens attached here can explain the problem well.
select *
from event_puchase_product_overview
where id = 15065;
select id, departure_id
from event_puchase_product_overview
where id = 15065;
VIEW definition:
CREATE OR REPLACE VIEW public.event_puchase_product_overview AS
SELECT row_number() OVER () AS id,
e.id AS departure_id,
e.type AS event_type,
e.name,
p.id AS product_id,
pc.name AS product_type,
product_date.attribute AS option,
p.upcomming_date AS supply_date,
pr.date_end AS bid_deadline,
CASE
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_hotel'::text) tt)) THEN e.maximum_rooms
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_flight'::text) tt)) THEN e.maximum_seats
WHEN (pt.categ_id IN ( SELECT unnest(tt.category_ids) AS unnest
FROM ( SELECT string_to_array(btrim(ir_config_parameter.value, '[]'::text), ', '::text)::integer[] AS category_ids
FROM ir_config_parameter
WHERE ir_config_parameter.key::text = 'trip_product_flight.product_category_bike'::text) tt)) THEN e.maximum_bikes
ELSE e.maximum_seats
END AS departure_qty,
CASE
WHEN now()::date > pr.date_end AND po.state::text = 'draft'::text THEN true
ELSE false
END AS is_deadline,
pl.product_qty::integer AS purchased_qty,
pl.comments,
pl.price_unit AS unit_price,
rp.id AS supplier,
po.id AS po_ref,
po.state AS po_state,
po.date_order AS po_date,
po.user_id AS operator,
pl.po_state_line AS line_status
FROM event_event e
LEFT JOIN product_product p ON p.related_departure = e.id
LEFT JOIN product_template pt ON pt.id = p.product_tmpl_id
LEFT JOIN product_category pc ON pc.id = pt.categ_id
LEFT JOIN purchase_order_line pl ON pl.product_id = p.id
LEFT JOIN purchase_order po ON po.id = pl.order_id
LEFT JOIN purchase_order_purchase_requisition_rel prr ON prr.purchase_order_id = po.id
LEFT JOIN purchase_requisition pr ON pr.id = prr.purchase_requisition_id
LEFT JOIN res_partner rp ON rp.id = po.partner_id
LEFT JOIN ( SELECT p_1.id AS product_id,
pav.name AS attribute
FROM product_product p_1
LEFT JOIN product_attribute_value_product_product_rel pa ON pa.prod_id = p_1.id
LEFT JOIN product_attribute_value pav ON pav.id = pa.att_id
LEFT JOIN product_attribute pat ON pat.id = pav.attribute_id
WHERE pat.name::text <> ALL (ARRAY['Date'::character varying, 'Departure'::character varying]::text[])) product_date ON product_date.product_id = p.id
WHERE (p.id IN ( SELECT DISTINCT mrp_bom_line.product_id
FROM mrp_bom_line)) AND p.active
ORDER BY e.id, pt.categ_id, p.id;
If a new event_event or product_product row is added, row_number is recomputed in the view, so the view's id column is not stable.
At the very least, you can't use row_number as the id of the view.
If you insist on using row_number, order it by a creation date. That way new records always land at the end of the view, and the correspondence between id (row_number) and the other columns does not change.
Hope that helps!
Very likely the execution plan of your query depends on the columns you select. Compare the execution plans!
Your id is generated using the row_number window function. Now window functions are executed before the ORDER BY clause, so the order will depend on the execution plan and hence on the columns you select.
Using row_number without an explicit ordering doesn't make any sense.
To fix that, don't use
row_number() OVER ()
but
row_number() OVER (ORDER BY e.id, pt.categ_id, p.id)
so that you have a reliable ordering.
In addition, you should omit the ORDER BY clause at the end.

Can I rank results obtained from different WHERE clauses?

Say I want to select the posts that have certain tags or match the keyword.
select t1.*
from (
select p.*, count(p.id) from plainto_tsquery('hElLo') AS q , post p
left join post_tag pt on pt.post_id = p.id
left join tag t on t.id = pt.tag_id
WHERE (tsv @@ q) or t.id in (2,3)
group by p.id
) as t1
order by count desc, ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc
limit 5;
The above does select what I want. In tsv, I gave the title weight A and the description weight D. Sorting by count is now fairly pointless because each entry has the same weight. Is it possible to sort rows that were picked by t.id in (2,3) first and then sort by ts_rank_cd, or to give each matching tag weight 'A', the title weight B, and the description weight D?
Try CASE WHEN
select t1.*
from (
select p.*, count(p.id),
(CASE WHEN t.id in (2,3) THEN 1 ELSE 2 END) as ranking
from plainto_tsquery('hElLo') AS q , post p
left join post_tag pt on pt.post_id = p.id
left join tag t on t.id = pt.tag_id
WHERE (tsv @@ q) or t.id in (2,3)
group by p.id
) as t1
order by count desc, ranking asc, ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc
limit 5;
Edited (correct answer):
select t1.*
from (
select p.*, count(p.id),
COUNT(1) filter(where t.id in (2,3)) ranking
from plainto_tsquery('hElLo') AS q , post p
left join post_tag pt on pt.post_id = p.id
left join tag t on t.id = pt.tag_id
WHERE (tsv @@ q) or t.id in (2,3)
group by p.id
) as t1
order by count desc, ranking desc, ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc -- ranking counts matching tags, so larger values sort first
limit 5;
I'm not sure why the count would be the same, but you can add more keys to the order by:
order by count desc,
(t.id in (2, 3)) desc,
ts_rank_cd(t1.tsv, plainto_tsquery('hElLo')) desc ;
The desc is because true > false, and you want the true values to be first.
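The effect of sorting on a boolean key first can be checked with a small sketch. It uses a made-up hits table in an in-memory SQLite database via Python's sqlite3 module, where a comparison evaluates to 1 or 0 rather than a true boolean, but the ordering idea is the same. The sample rows deliberately avoid NULL tag ids, because Postgres and SQLite place NULLs differently when sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, already-joined search hits: one tag and one text rank per post.
    CREATE TABLE hits (post TEXT, tag_id INTEGER, text_rank REAL);
    INSERT INTO hits (post, tag_id, text_rank) VALUES
        ('a', 9, 0.9),
        ('b', 2, 0.1),
        ('c', 3, 0.5),
        ('d', 7, 0.7);
""")

# Rows whose tag is in (2, 3) come first; text_rank breaks ties within each group.
ordered = [row[0] for row in conn.execute("""
    SELECT post
    FROM hits
    ORDER BY (tag_id IN (2, 3)) DESC, text_rank DESC
""")]

print(ordered)
```

Posts 'c' and 'b' match a wanted tag and so precede 'a' and 'd', even though 'a' has the highest text rank.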

Avoiding Order By in T-SQL

The sample query below is part of my main query. I found that the SORT operator in it accounts for 30% of the cost.
Avoiding the SORT would require creating indexes. Is there any other way to optimize this code?
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA
WHERE ID = r.ID
AND Status = 3
AND TableA_ID >ISNULL((
SELECT TOP 1 TableA_ID
FROM TableA
WHERE ID = r.ID
AND Status <> 3
ORDER BY T_Date DESC
), 0)
ORDER BY T_Date ASC
Looks like you can use not exists rather than the sorts. I think you'll probably get a better performance boost by using a CTE or derived table instead of a scalar subquery.
select *
from r ... left outer join
(
select ID, min(t_date) as min_date from TableA t1
where status = 3 and not exists (
select 1 from TableA t2
where t2.ID = t1.ID
and t2.status <> 3 and t2.t_date > t1.t_date
)
group by ID
) as md on md.ID = r.ID ...
or
select *
from r ... left outer join
(
select t1.ID, min(t1.t_date) as min_date
from TableA t1 left outer join TableA t2
on t2.ID = t1.ID and t2.status <> 3 and t2.t_date > t1.t_date
where t1.status = 3 and t2.ID is null -- keep only rows with no later non-3 row
group by t1.ID
) as md on md.ID = r.ID ...
It also appears that you're relying on an identity column but it's not clear what those values mean. I'm basically ignoring it and using the date column instead.
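As a rough check of the NOT EXISTS idea, the sketch below compares it with an anti-join formulation (LEFT JOIN plus IS NULL) on a few made-up rows. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, stores dates as ISO text, and says nothing about which form the SQL Server optimizer handles better.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (
        TableA_ID INTEGER PRIMARY KEY,
        ID INTEGER,
        Status INTEGER,
        T_Date TEXT      -- ISO dates, so text comparison matches date order
    );
    INSERT INTO TableA (TableA_ID, ID, Status, T_Date) VALUES
        (1, 1, 3, '2020-01-01'),   -- followed by a later non-3 row, so excluded
        (2, 1, 1, '2020-01-05'),
        (3, 1, 3, '2020-01-10'),
        (4, 1, 3, '2020-01-20'),
        (5, 2, 3, '2020-02-01');
""")

# Earliest status-3 date per ID that has no later non-3 row.
not_exists_rows = conn.execute("""
    SELECT t1.ID, MIN(t1.T_Date) AS min_date
    FROM TableA t1
    WHERE t1.Status = 3
      AND NOT EXISTS (SELECT 1
                      FROM TableA t2
                      WHERE t2.ID = t1.ID
                        AND t2.Status <> 3
                        AND t2.T_Date > t1.T_Date)
    GROUP BY t1.ID
    ORDER BY t1.ID
""").fetchall()

# Same result expressed as an anti-join.
anti_join_rows = conn.execute("""
    SELECT t1.ID, MIN(t1.T_Date) AS min_date
    FROM TableA t1
    LEFT JOIN TableA t2
           ON t2.ID = t1.ID AND t2.Status <> 3 AND t2.T_Date > t1.T_Date
    WHERE t1.Status = 3 AND t2.ID IS NULL
    GROUP BY t1.ID
    ORDER BY t1.ID
""").fetchall()

print(not_exists_rows)
print(anti_join_rows)
```

Note that the date comparison sits in the ON clause of the anti-join; moving it to WHERE would discard the unmatched rows that the IS NULL test relies on.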
Try this:
SELECT TOP 1 CONVERT( DATE, T_Date) AS T_Date
FROM TableA a1
LEFT JOIN (
SELECT ID, MAX(TableA_ID) AS MaxAID
FROM TableA
WHERE Status <> 3
GROUP BY ID
) a2 ON a2.ID = a1.ID
WHERE a1.ID = r.ID AND a1.Status = 3
AND a1.TableA_ID > COALESCE(a2.MaxAID, 0) -- filter in WHERE; inside the LEFT JOIN's ON clause it would not exclude any a1 rows
ORDER BY T_Date ASC
The use of TOP 1 in combination with the unexplained r alias concerns me. There's almost certainly a MUCH better way to get this data into your results that doesn't involve doing this in a subquery (unless this is for an APPLY operation).