Postgresql running sum of previous groups? - postgresql

Given the following data:
sequence | amount
1 100000
1 20000
2 10000
2 10000
I'd like to write a sql query that gives me the sum of the current sequence, plus the sum of the previous sequence. Like so:
sequence | current | previous
1 120000 0
2 20000 120000
I know the solution likely involves windowing functions but I'm not too sure how to implement it without subqueries.

select
seq,
amount,
lag(amount::int, 1, 0) over(order by seq) as previous
from (
select seq, sum(amount) as amount
from sa
group by seq
) s
order by seq
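The grouping and the lag can also be written at a single query level, because window functions are evaluated after GROUP BY and can take an aggregate as their argument. Below is a minimal sketch of that form. It runs the SQL through Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only to keep the example self-contained; SQLite 3.25 or later is needed for window functions). The table name mytable and the aliases current_sum and previous_sum are illustrative.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (sequence INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 100000), (1, 20000), (2, 10000), (2, 10000)],
)

# SUM(amount) is computed per group first; LAG then looks one group back
# and falls back to 0 for the first group.
query = """
SELECT sequence,
       SUM(amount) AS current_sum,
       LAG(SUM(amount), 1, 0) OVER (ORDER BY sequence) AS previous_sum
FROM mytable
GROUP BY sequence
ORDER BY sequence
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because LAG works on the ordered groups rather than on sequence - 1, this form should also tolerate holes in the sequence.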

If your sequence is sequential, without holes, you can simply do:
SELECT t1.sequence,
SUM(t1.amount),
(SELECT SUM(t2.amount) from mytable t2 WHERE t2.sequence = t1.sequence - 1)
FROM mytable t1
GROUP BY t1.sequence
ORDER BY t1.sequence
Otherwise, instead of t2.sequence = t1.sequence - 1 you could do:
SELECT t1.sequence,
SUM(t1.amount),
(SELECT SUM(t2.amount)
from mytable t2
WHERE t2.sequence = (SELECT MAX(t3.sequence)
FROM mytable t3
WHERE t3.sequence < t1.sequence))
FROM mytable t1
GROUP BY t1.sequence
ORDER BY t1.sequence;
You can see both approaches in this fiddle

Related

Express Nearest Neighbor Join in Postgresql?

I have two tables Q and T, both containing a column of float numbers.
For each number in Q, I want to find the number in T that has the smallest distance to it.
For example, for T={1,7,9} and Q={2,6,10}, I want to return Q,T pairs as {(2,1),(6,7),(10,9)}.
How should I express this query with SQL?
In addition, is it possible to accelerate this join with an index, e.g. by adding an operator class that binds "FOR ORDER BY <->" to an fabs calculation?
create table t (val_t integer);
create table q (val_q integer);
insert into t values (1),(7),(9);
insert into q values (2),(6),(10);
Start with a query that cross joins the two tables and adds a rank based on the difference:
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true ;
Use this query in a cte or subquery and filter by rank:
WITH src AS(
SELECT val_q, val_t, rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))
FROM t
JOIN q ON true )
SELECT val_q, val_t FROM src
WHERE rank = 1;
val_q | val_t
-------+-------
2 | 1
6 | 7
10 | 9
See https://www.postgresql.org/docs/12/tutorial-window.html
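One caveat with rank(): when two values in t are equally close to a value in q, both get rank 1 and both rows come back. If exactly one match per val_q is wanted, row_number() with an explicit tie-breaker is one option. The sketch below shows the difference. It runs through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; SQLite 3.25 or later), and the extra value 8 in q is added here only to produce a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (val_t INTEGER);
CREATE TABLE q (val_q INTEGER);
INSERT INTO t VALUES (1), (7), (9);
INSERT INTO q VALUES (2), (6), (8), (10);  -- 8 is equally far from 7 and 9
""")

def nearest(window_call):
    """Run the filter-by-rank query with the given window function call."""
    sql = f"""
    WITH src AS (
        SELECT val_q, val_t, {window_call} AS rn
        FROM q CROSS JOIN t
    )
    SELECT val_q, val_t FROM src WHERE rn = 1 ORDER BY val_q, val_t
    """
    return conn.execute(sql).fetchall()

# rank() keeps every tied row.
with_rank = nearest(
    "rank() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q))"
)
# row_number() keeps one row; val_t breaks the tie in favour of the smaller value.
with_row_number = nearest(
    "row_number() OVER (PARTITION BY val_q ORDER BY abs(val_t - val_q), val_t)"
)
print(with_rank)
print(with_row_number)
```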
Given this schema:
create table t (tn float);
insert into t values (1), (7), (9);
create table q (qn float);
insert into q values (2), (6), (10);
DISTINCT ON is the most straightforward way:
select distinct on (qn) qn, tn
from q
cross join t
order by qn, abs(qn - tn);
Exploiting a numeric range may perform better depending on your data sizes. If performance is an issue, then you can create an actual temp table for the range_tn CTE and put a gist index on it:
with all_tn as (
select tn
from t
union select null
), range_tn as (
select numrange(tn::numeric, (lead(tn) over w)::numeric, '[]') as tr
from all_tn
window w as (order by tn nulls first)
)
select qn,
case
when lower_inf(tr) then upper(tr)
when upper_inf(tr) then lower(tr)
when 2 * qn - lower(tr) - upper(tr) > 0 then upper(tr)
else lower(tr)
end as tn
from q
join range_tn
on qn::numeric <@ tr;
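The idea behind the range approach is that, once the values of t are sorted, the nearest value to any qn is either its immediate predecessor or its immediate successor, so a full cross join is not needed. The same logic can be sketched in plain Python with a binary search. The function name nearest is illustrative, and sending ties to the smaller value is a choice made here, not something the original query specifies.

```python
from bisect import bisect_left

def nearest(sorted_t, q):
    """Return the element of the non-empty sorted list closest to q.

    Only the neighbours on either side of q's insertion point are examined;
    the smaller value wins a tie.
    """
    i = bisect_left(sorted_t, q)
    candidates = sorted_t[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: (abs(t - q), t))

t_values = sorted([1.0, 7.0, 9.0])
pairs = [(q, nearest(t_values, q)) for q in [2.0, 6.0, 10.0]]
print(pairs)
```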

Summing Multiple Records by maxdate

I have a table with the following data
Bldg Suit SQFT Date
1 1 1,000 9/24/2012
1 1 1,500 12/31/2011
1 2 800 8/31/2012
1 2 500 10/1/2005
I want to write a query that sums the SQFT of the row with the latest date for each suit, so the desired result would be 1,800, and it must come back in one cell/row. This will ultimately be part of a subquery; I am just not getting what I expect with the queries I have written so far.
Thanks in advance.
You can use the following (See SQL Fiddle with Demo):
select sum(t1.sqft) Total
from yourtable t1
inner join
(
select max(dt) mxdt, suit, bldg
from yourtable
group by suit, bldg
) t2
on t1.dt = t2.mxdt
and t1.bldg = t2.bldg
and t1.suit = t2.suit
; With Data As
(
Select Bldg, Suit, SQFT, Row_Number() Over (Partition By Bldg, Suit Order By Date DESC) As RowID
From YourTableNameHere
)
Select Bldg, Sum(SQFT) As TotalSQFT
From Data
Where RowId = 1
Group By Bldg
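The two answers differ in shape: the join version returns a single total, while the Row_Number version returns one total per Bldg. They can also differ when two rows of the same suit share the latest date, since the join keeps both rows and Row_Number keeps one. Below is a small check of the Row_Number approach against the sample data, collapsed to a single cell as the question asks. It uses Python's sqlite3 module as a stand-in engine (an assumption; SQLite 3.25 or later), and the column name dt with ISO-formatted dates is chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (bldg INTEGER, suit INTEGER, sqft INTEGER, dt TEXT)"
)
# ISO dates are stored as text so that string order equals date order.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1000, "2012-09-24"),
        (1, 1, 1500, "2011-12-31"),
        (1, 2, 800, "2012-08-31"),
        (1, 2, 500, "2005-10-01"),
    ],
)

# Number the rows of each (bldg, suit) from newest to oldest, keep the newest,
# and sum what is left into a single value.
query = """
WITH latest AS (
    SELECT sqft,
           ROW_NUMBER() OVER (PARTITION BY bldg, suit ORDER BY dt DESC) AS rn
    FROM yourtable
)
SELECT SUM(sqft) FROM latest WHERE rn = 1
"""
total = conn.execute(query).fetchone()[0]
print(total)
```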

TSQL - COUNT number of rows in a different state than current row

It's kind of hard to explain, but from this example it should be clear.
Table TABLE:
Name State Time
--------------------
A 1 1/4/2012
B 0 1/3/2012
C 0 1/2/2012
D 1 1/1/2012
I would like to run
select * from TABLE where state=1 order by Time desc
plus an additional column 'Skipped' containing, for each row where state=1, the number of rows in state 0 that follow it (in this ordering) before the next row where state=1. In other words, the output should look like this:
Name State Time Skipped
A 1 1/4/2012 2 -- 2 rows after A where State != 1
D 1 1/1/2012 0 -- 0 rows after D where State != 1
0 should also be reported when 2 consecutive rows are in state = 1, i.e. there is nothing between these rows in a state other than 1.
It seems like a CTE is a must here, but I can't figure out how to count the rows where state != 1.
Any help will be appreciated.
(MS Sql Server 2008)
I've used a CTE to establish RowNo, so that you're not dependent on consecutive dates:
WITH CTE_Rows as
(
select name,state,time,
rowno = ROW_NUMBER() over (order by [time])
from MyTable
)
select name,state,time,
gap = isnull(r.rowno - x.rowno - 1,0)
from
CTE_Rows r
outer apply (
select top 1 rowno
from CTE_Rows sub
where sub.rowno < r.rowno and sub.state = 1
order by sub.rowno desc) x
where r.state = 1
If you just want to do it by date, then it's simpler; you just need an outer apply:
select name,state,r.time,
gap = convert(int,isnull(r.time - x.time - 1,0))
from
MyTable r
outer apply (
select top 1 time
from MyTable sub
where sub.time < r.time and sub.state = 1
order by sub.time desc) x
where r.state = 1
FYI, the test data I used was created as follows:
create table MyTable
(Name char(1), [state] tinyint, [Time] datetime)
insert MyTable
values
('E',1,'2012-01-05'),
('A',1,'2012-01-04'),
('B',0,'2012-01-03'),
('C',0,'2012-01-02'),
('D',1,'2012-01-01')
Okay, here you go (it gets a little messy):
SELECT U.CurrentTime,
(SELECT COUNT(*)
FROM StateTable AS T3
WHERE T3.State=0
AND T3.Time BETWEEN U.LastTime AND U.CurrentTime) AS Skipped
FROM (SELECT T1.Time AS CurrentTime,
(SELECT TOP 1 T2.Time
FROM StateTable AS T2
WHERE T2.Time < T1.Time AND T2.State=1
ORDER BY T2.Time DESC) AS LastTime
FROM StateTable AS T1 WHERE T1.State = 1) AS U
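The counting rule that these queries implement can also be stated procedurally, which may help when checking query output against expected values. Below is a sketch in plain Python, not T-SQL. The function name skipped_counts is illustrative, and rows in state 0 that come before the first state=1 row are ignored, which matches the 0 that the queries above report in that case.

```python
rows = [  # (name, state, time); ISO dates so string order equals date order
    ("A", 1, "2012-01-04"),
    ("B", 0, "2012-01-03"),
    ("C", 0, "2012-01-02"),
    ("D", 1, "2012-01-01"),
]

def skipped_counts(rows):
    """For each state=1 row, count the other rows since the previous state=1 row."""
    result = []
    pending = 0           # rows not in state 1 seen since the last state=1 row
    seen_state_1 = False  # nothing is counted before the first state=1 row
    for name, state, time in sorted(rows, key=lambda r: r[2]):
        if state == 1:
            result.append((name, time, pending))
            pending = 0
            seen_state_1 = True
        elif seen_state_1:
            pending += 1
    # Report newest first, like "order by Time desc" in the question.
    return sorted(result, key=lambda r: r[1], reverse=True)

report = skipped_counts(rows)
print(report)
```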

Postgresql Update Based on count, min and group by

Thank you for taking the time to look at my question.
I've seen similar questions, but not the same depth. Please help!
I would like to update a column in all rows of a table that holds user_id and date_created, where the row has the lowest date_created for its user_id.
The following select gives me all the rows I would like to update:
select user_id, min(date_created) from mytable s1 where
(select count(1) from mytable s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id order by user_id;
I would have expected this update to work:
update mytable set join_status = 1 where date_created =
(select min(date_created) from mytable s1 where
(select count(1) from simplepay_payment s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id);
But it gave the following error:
ERROR: more than one row returned by a subquery used as an expression
I've tried a few different solutions, but nothing seems to help.
Does anyone have any ideas for me?
Thanks again.
Change your SQL to:
update mytable set join_status = 1 where date_created IN
(select min(date_created) from mytable s1 where
(select count(1) from simplepay_payment s2 where
s1.user_id = s2.user_id group by s2.user_id)
> 1 group by user_id);
Read more on row comparison in the docs.
EDIT:
In the subquery you're performing GROUP BY user_id. This means that you will receive many rows, based on the number of unique user_id values in your simplepay_payment table.
To make your query work as expected, you should join using 2 columns: user_id and date_created. As you've mentioned, you already have the query that gives you the correct results, so you can use it like this:
WITH desired AS (
SELECT user_id, min(date_created) AS mindt
FROM mytable s1 where
(SELECT count(1) FROM mytable s2
WHERE s1.user_id = s2.user_id GROUP BY s2.user_id) > 1
GROUP BY user_id)
UPDATE mytable m SET join_status = 1 FROM desired d
WHERE d.user_id = m.user_id AND d.mindt = m.date_created;
I've wrapped your query in a Common Table Expression and used it in the UPDATE statement. You can add RETURNING m.* at the end of the query to see the rows that were updated and their new values.
You can test this query on SQL Fiddle.
EDIT2:
Common Table Expressions (WITH-queries) are not available before version 9.1 for UPDATE statements. You can simply move the CTE subquery into the update, like this:
UPDATE mytable m SET join_status = 1 FROM (
SELECT user_id, min(date_created) AS mindt
FROM mytable s1 where
(SELECT count(1) FROM mytable s2
WHERE s1.user_id = s2.user_id GROUP BY s2.user_id) > 1
GROUP BY user_id) d
WHERE d.user_id = m.user_id AND d.mindt = m.date_created;
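To see why matching on the date alone is not enough, consider two users where one user's second date equals the other user's first date. The sketch below compares the IN form with a match on user_id and date together, written here as a correlated UPDATE. It uses Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example runs on its own). The sample rows are invented for illustration, and HAVING count(*) > 1 is used as a shorter way to express the "more than one row per user" condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (user_id INTEGER, date_created TEXT,"
    " join_status INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO mytable (user_id, date_created) VALUES (?, ?)",
    [
        (1, "2012-01-01"), (1, "2012-01-05"),
        (2, "2012-01-05"), (2, "2012-01-09"),
        (3, "2012-01-09"),  # only one row for this user: must stay untouched
    ],
)

# Date-only match: picks every row whose date equals SOME user's minimum.
in_matches = conn.execute("""
    SELECT user_id, date_created FROM mytable
    WHERE date_created IN (
        SELECT min(date_created) FROM mytable
        GROUP BY user_id HAVING count(*) > 1)
    ORDER BY user_id, date_created
""").fetchall()

# Match on user_id and date together: each row is compared with its own
# user's minimum, and users with a single row are skipped.
conn.execute("""
    UPDATE mytable SET join_status = 1
    WHERE (SELECT count(*) FROM mytable s
           WHERE s.user_id = mytable.user_id) > 1
      AND date_created = (SELECT min(s.date_created) FROM mytable s
                          WHERE s.user_id = mytable.user_id)
""")
updated = conn.execute(
    "SELECT user_id, date_created FROM mytable WHERE join_status = 1 "
    "ORDER BY user_id, date_created"
).fetchall()
print(in_matches)
print(updated)
```

In this data the date-only match also selects user 1's second row, because its date happens to be user 2's earliest date.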

Perform arithmetic in select statement

Let's suppose I have balance 2000, and want to select balance as
balance=balance-Cr+Dr
So my balance column will give values as below.
balance DR Cr
40000 0 60000
100000 60000 0
0 0 100000
How is this possible in SQL query?
Here is a recursive CTE that calculates the balance using the balance from the previous row. You need something that defines the order of the rows. I use the ID column in the sample table.
-- Test table
declare @T table
(
ID int identity primary key,
DR int,
Cr int
)
-- Sample data
insert into @T (DR, Cr)
select 0, 60000 union all
select 60000, 0 union all
select 0, 100000
-- In value
declare @StartBalance int
set @StartBalance = 100000
-- Recursive cte calculating balance as a running sum
;with cte as
(
-- Anchor row: apply the first row to the start balance
select
T.ID,
@StartBalance - T.Cr + T.DR as Balance,
T.DR,
T.Cr
from @T as T
where T.ID = 1
union all
-- Recursive step: apply the next row (ID + 1) to the previous balance
select
T.ID,
C.Balance - T.Cr + T.DR as Balance,
T.DR,
T.Cr
from cte as C
inner join @T as T
on C.ID+1 = T.ID
)
select Balance, DR, Cr
from cte
option (maxrecursion 0)
Result:
Balance DR Cr
----------- ----------- -----------
40000 0 60000
100000 60000 0
0 0 100000
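On engines that support an ordered SUM() OVER (PostgreSQL, Oracle, and SQL Server 2012 or later), the same running balance can be written without recursion, as the start balance plus a running sum of DR - Cr. Below is a minimal sketch of that alternative. It is run through Python's sqlite3 module as a stand-in engine (an assumption; SQLite 3.25 or later), and the table name ledger is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (id INTEGER PRIMARY KEY, dr INTEGER, cr INTEGER)"
)
# Same sample rows as above; id is assigned 1, 2, 3 in insertion order.
conn.executemany(
    "INSERT INTO ledger (dr, cr) VALUES (?, ?)",
    [(0, 60000), (60000, 0), (0, 100000)],
)

start_balance = 100000
# With ORDER BY in the OVER clause, SUM accumulates from the first row
# up to the current row, which gives the running total.
query = """
SELECT ? + SUM(dr - cr) OVER (ORDER BY id) AS balance, dr, cr
FROM ledger
ORDER BY id
"""
balances = conn.execute(query, (start_balance,)).fetchall()
for row in balances:
    print(row)
```

Unlike the join on C.ID+1 = T.ID, this form only needs the ordering column to be unique, so gaps in ID should not break it.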
This should work:
SELECT (T.BALANCE-T.CR+T.DR) as "Balance", T.DR, T.CR
FROM <table-name> T
If you use Oracle, there is a function called LAG that reads data from the previous row: http://www.adp-gmbh.ch/ora/sql/analytical/lag.html
If you read this link, I think you will see that this is exactly what you need, but only if you use Oracle.