Querying historical balance data from a transactions table - postgresql

I need help getting the total balance of all customers on a daily basis, backtracking through the historical data.
I have the following table structures in a Postgres database:
Table1: accounts (acc)
|id|acc_created|
|1 |2019-01-01 |
|2 |2019-01-01 |
|3 |2019-01-01 |
Table2: transactions
|transaction_id|acc_id|balance|txn_created |
|1 |1 |100 |2019-01-01 07:00:00|
|2 |1 |50 |2019-01-01 16:32:10|
|3 |1 |25 |2019-01-01 22:10:59|
|4 |2 |200 |2019-01-02 18:34:22|
|5 |3 |150 |2019-01-02 15:09:43|
|6 |1 |125 |2019-01-04 04:52:31|
|7 |1 |0 |2019-01-05 05:10:00|
|8 |2 |300 |2019-01-05 12:34:56|
|9 |3 |120 |2019-01-06 23:59:59|
The transactions table shows the balance after a transaction is made on the account.
To be honest, I am unsure how to write the query, or whether I am overthinking the situation. I know it would involve last_value() and coalesce(), and possibly lag() and lead(). Essentially, the criteria I would like to fulfill are:
It takes the last balance value of that day, for that account.
(i.e. the balance for acc_id = '1' on 2019-01-01 would be $25, acc_id ='2' and '3' would be $0)
For days on which an account makes no transaction, the balance is carried over from that account's previous balance.
(i.e. the balance for acc_id = '1' on 2019-01-03 would be $25)
Lastly, I would like the total balance of all accounts aggregated by date.
(i.e. at the end of 2019-01-02, the total balance should be $375 (=25+200+150))
I have tried the query below:
SELECT date_trunc('day',date), sum(balance_of_day) FROM (
SELECT txn_created as date,
acc_id,
row_number() over (partition BY acc_id ORDER BY txn_created ASC) as order_of_created,
last_value(balance) over (partition by acc_id ORDER BY txn_created RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as balance_of_day
FROM transactions) X
where X.order_of_created = 1
GROUP BY 1
However, this only gives me the total balance if a transaction was made by any account on a certain day.
The expected end result (based on the example) should be:
|date |total_balance|
|2019-01-01 |25 |
|2019-01-02 |375 |
|2019-01-03 |375 |
|2019-01-04 |475 |
|2019-01-05 |450 |
|2019-01-06 |420 |
I won't need to present the different account numbers, just the total accumulated balance from all customers at the end of the day. Please let me know how I can solve this! Many thanks!

You can use a few cool Postgres features to accomplish this. First, to get the last balance per day, use DISTINCT ON:
SELECT DISTINCT on(acc_id, txn_created::date)
transaction_id, acc_id, balance, txn_created::date as day
FROM transactions
ORDER BY acc_id, txn_created::date, txn_created desc;
To figure out the balance on any given day, we'll use a daterange per row that includes the current row and excludes the next row, partitioned by acc_id:
SELECT transaction_id, acc_id, balance, daterange(day, lead(day, 1) OVER (partition by acc_id order by day), '[)')
FROM (
SELECT DISTINCT on(acc_id, txn_created::date)
transaction_id, acc_id, balance, txn_created::date as day
FROM transactions
ORDER BY acc_id, txn_created::date, txn_created desc
) sub;
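To see what the half-open '[)' ranges look like, here is a minimal sketch using account 1's end-of-day balances from the question, inlined as a VALUES list (the CTE name daily is only for illustration). The last row has no following row, so lead() returns NULL and the range is unbounded above, which is why the most recent balance keeps applying to every later date.

```sql
-- Sketch only: the rows are account 1's last balances per day from the question.
WITH daily(acc_id, day, balance) AS (
    VALUES (1, DATE '2019-01-01', 25),
           (1, DATE '2019-01-04', 125),
           (1, DATE '2019-01-05', 0)
)
SELECT acc_id,
       balance,
       -- lead() is NULL on the last row, so that range has no upper bound
       daterange(day, lead(day) OVER (PARTITION BY acc_id ORDER BY day), '[)') AS drange
FROM daily;

-- Containment check: the lower bound is included, the upper bound is excluded.
SELECT DATE '2019-01-03' <@ daterange(DATE '2019-01-01', DATE '2019-01-04', '[)') AS inside,         -- true
       DATE '2019-01-04' <@ daterange(DATE '2019-01-01', DATE '2019-01-04', '[)') AS at_upper_bound; -- false
```

Because 2019-01-04 falls outside the first range but inside the second, each date matches at most one row per account.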
Lastly, join to generate_series. We can join where the date in generate_series is contained by the daterange we created in the last step. The dateranges are intentionally not overlapping, so we can query on any date safely.
WITH balances as (
SELECT transaction_id, acc_id, balance, daterange(day, lead(day, 1) OVER (partition by acc_id order by day), '[)') as drange
FROM (
SELECT DISTINCT on(acc_id, txn_created::date)
transaction_id, acc_id, balance, txn_created::date as day
FROM transactions
ORDER BY acc_id, txn_created::date, txn_created desc
) sub
)
SELECT d::date, sum(balance)
FROM generate_series('2019-01-01'::date, '2019-01-06'::date, '1 day') as g(d)
JOIN balances ON d::date <@ drange
GROUP BY d::date;
d | sum
------------+-----
2019-01-01 | 25
2019-01-02 | 375
2019-01-03 | 375
2019-01-04 | 475
2019-01-05 | 450
2019-01-06 | 420
(6 rows)

Select the maximum rows of sorted subgroups

Using PostgreSQL 11, I have a table containing a DAY and MONTH_TO_DAY entry for each day of every month. I would like to select the most recent MONTH_TO_DAY entry for each account.
My table is:
+------+------------+--------------+------------+--------------------------+
|id |account |code |interval |timestamp |
+------+------------+--------------+------------+--------------------------+
|387276|ALPBls6EsP |52 |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387275|ALPBls6EsP |52 |DAY |2020-09-01 01:05:00.000000|
|387272|YkON8lk8A8 |25 |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387271|YkON8lk8A8 |25 |DAY |2020-08-01 01:05:00.000000|
|387273|ALPBls6EsP |32 |MONTH_TO_DAY|2020-08-31 01:05:00.000000|
|387274|ALPBls6EsP |32 |DAY |2020-08-31 01:05:00.000000|
|387272|ALPBls6EsP |27 |MONTH_TO_DAY|2020-08-30 01:05:00.000000|
|387271|ALPBls6EsP |27 |DAY |2020-08-30 01:05:00.000000|
+------+------------+--------------+------------+--------------------------+
If it helps, the entries are always in descending order timewise.
In a query asking for all accounts, since the 31st is the last day of 08 and the 1st is the most recent entry of 09, my expected output would be
+------+------------+--------------+------------+--------------------------+
|id |account |code |interval |timestamp |
+------+------------+--------------+------------+--------------------------+
|387276|ALPBls6EsP |52 |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387272|YkON8lk8A8 |25 |MONTH_TO_DAY|2020-09-01 01:05:00.000000|
|387273|ALPBls6EsP |32 |MONTH_TO_DAY|2020-08-31 01:05:00.000000|
+------+------------+--------------+------------+--------------------------+
I was thinking I'd like to group entries by month (truncate the dd/hh/ss), and then select the row with the maximum timestamp in each group. I can get the right rows with this but I can't figure out how to get any of the other fields.
SELECT max(timestamp)
FROM mytable
GROUP BY date_trunc('month', mytable.timestamp);
I also thought I could use distinct on something like the below, but I'm not too familiar with distinct on or date_trunc and I can't figure out how to use them together.
SELECT distinct on (timestamp)
*
FROM mytable
ORDER BY date_trunc('month', mytable.timestamp)
You do want distinct on, but you want to apply it to the account:
select distinct on (account) *
from mytable
where interval = 'MONTH_TO_DAY'
order by account, timestamp desc;
If you want the latest by account by month, then this should work:
select distinct on (date_trunc('month', timestamp), account) *
from mytable
where interval = 'MONTH_TO_DAY'
order by date_trunc('month', timestamp), account, timestamp desc;
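If DISTINCT ON is unfamiliar, the same per-month, per-account pick can be sketched with row_number(), which also works in databases that lack DISTINCT ON. This assumes the table and column names from the question; interval and timestamp are double-quoted only as a precaution, since they are also type names.

```sql
-- Sketch: keep the newest MONTH_TO_DAY row per (month, account).
SELECT id, account, code, "interval", "timestamp"
FROM (
    SELECT t.*,
           row_number() OVER (
               PARTITION BY date_trunc('month', t."timestamp"), t.account
               ORDER BY t."timestamp" DESC          -- newest row gets rn = 1
           ) AS rn
    FROM mytable AS t
    WHERE t."interval" = 'MONTH_TO_DAY'
) AS ranked
WHERE rn = 1
ORDER BY "timestamp" DESC, account;
```

The PARTITION BY list plays the role of the DISTINCT ON expressions, and the window's ORDER BY decides which row in each group survives.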

I need a type of group-sort that I couldn't figure out with ROW_NUMBER on T-SQL

I have a table with a table_id column and 2 other columns. I want a kind of numbering using the row_number function, and I want the result to look like this:
id |col1 |col2 |what I want
------------------------------
1 |x |a |1
2 |x |b |2
3 |x |a |3
4 |x |a |3
5 |x |c |4
6 |x |c |4
7 |x |c |4
Please consider that:
there's only one x, so "partition by col1" is OK. Other than that,
there are two sequences of a's, and they must be counted separately
(not 1,2,1,1,3,3,3), and the sorting must be by id, not by col2 (so
order by col2 is NOT OK).
I want that number to increase by one any time col2 changes compared to the previous row.
row_number() over (partition by col1 order by col2) DOESN'T WORK, because I want it ordered by id.
Using LAG and a windowed COUNT appears to get you what you are after:
WITH Previous AS(
SELECT V.id,
V.col1,
V.col2,
V.[What I want],
LAG(V.Col2,1,V.Col2) OVER (ORDER BY ID ASC) AS PrevCol2
FROM (VALUES(1,'x','a',1),
(2,'x','b',2),
(3,'x','a',3),
(4,'x','a',3),
(5,'x','c',4),
(6,'x','c',4),
(7,'x','c',4))V(id, col1, col2, [What I want]))
SELECT P.id,
P.col1,
P.col2,
P.[What I want],
COUNT(CASE P.Col2 WHEN P.PrevCol2 THEN NULL ELSE 1 END) OVER (ORDER BY P.ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) +1 AS [What you get]
FROM Previous P;
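A variation on the same idea, sketched under the assumption that the data lives in a table called dbo.MyTable (a hypothetical name): flag each row where col2 differs from the previous row, then take a running SUM of the flags. It also partitions by col1, as mentioned in the question, so the numbering would restart for each col1 value.

```sql
-- Sketch: dbo.MyTable is a placeholder for the real table.
WITH Flagged AS (
    SELECT t.id,
           t.col1,
           t.col2,
           -- 1 when col2 differs from the previous row (or there is no previous row)
           CASE WHEN LAG(t.col2) OVER (PARTITION BY t.col1 ORDER BY t.id) = t.col2
                THEN 0 ELSE 1 END AS IsChange
    FROM dbo.MyTable AS t
)
SELECT F.id,
       F.col1,
       F.col2,
       -- running total of change flags = group number
       SUM(F.IsChange) OVER (PARTITION BY F.col1 ORDER BY F.id
                             ROWS UNBOUNDED PRECEDING) AS GroupNo
FROM Flagged F
ORDER BY F.id;
```

For the sample rows the flags are 1,1,1,0,1,0,0, so the running sum is 1,2,3,3,4,4,4. Note that a NULL in col2 would always count as a change in this sketch.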

Aggregating a table based on one column and then joining it with another table

I am working with the following two tables;
Table 1
Key |Clicks |Impressions
-------------+-------+-----------
USA-SIM-CARDS|55667 |544343
DE-SIM-CARDS |4563 |234829
AU-SIM-CARDS |3213 |232242
UK-SIM-CARDS |3213 |1333223
CA-SIM-CARDS |4321 |8883111
MX-SIM-CARDS |3193 |3291023
Table 2
Key |Conversions |Final Conversions|Active Sims
-----------------+------------+-----------------+-----------
USA-SIM-CARDS |456 |43 |4
USA-SIM-CARDS |65 |2 |1
UK-SIM-CARDS |123 |4 |3
UK-SIM-CARDS |145 |34 |5
The goal is to get the following output;
Key |Clicks |Impressions|Conversions|Final Conversions|Active Sims
-------------+-------+-----------+-----------+-----------------+-----------
USA-SIM-CARDS|55667 |544343 |521 |45 |5
DE-SIM-CARDS |4563 |234829 | | |
AU-SIM-CARDS |3213 |232242 | | |
UK-SIM-CARDS |3213 |1333223 |268 |38 |8
CA-SIM-CARDS |4321 |8883111 | | |
MX-SIM-CARDS |3193 |3291023 | | |
The most crucial part of this involves aggregating the second table by Key (summing Conversions, Final Conversions and Active Sims).
I imagine I would then do this with an inner join.
Thank you.
Take this in two steps then:
1) Aggregate the second table:
SELECT Key, sum(Conversions) as Conversions, sum("Final Conversions") as FinalConversions, Sum("Active Sims") as ActiveSims FROM Table2 GROUP BY key
2) Use that as a subquery/derived table joining to your first table:
SELECT
t1.key,
t1.clicks,
t1.impressions,
t2.conversions,
t2.finalConversions,
t2.ActiveSims
From Table1 t1
LEFT OUTER JOIN (SELECT Key, sum(Conversions) as Conversions, sum("Final Conversions") as FinalConversions, Sum("Active Sims") as ActiveSims FROM Table2 GROUP BY Key) t2
ON t1.key = t2.key;
As an alternative, you could join and then group by as well since there isn't any need to aggregate twice or anything:
SELECT
t1.key,
t1.clicks,
t1.impressions,
sum(Conversions) as Conversions,
sum("Final Conversions") as FinalConversions,
Sum("Active Sims") as ActiveSims
From Table1 t1
LEFT OUTER JOIN table2 t2
ON t1.key = t2.key
GROUP BY t1.key, t1.clicks, t1.impressions
The only other important thing here is that we are using a LEFT OUTER JOIN, since we want all records from Table1 and any records from Table2 that match on the key.
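To check the approach without creating tables, here is a self-contained sketch written for PostgreSQL, with a few of the question's rows inlined as VALUES lists. The lowercase CTE names and the snake_case output aliases are illustrative choices, not part of the original schema.

```sql
-- Sketch: a subset of the sample rows, inlined so the query runs on its own.
WITH table1(key, clicks, impressions) AS (
    VALUES ('USA-SIM-CARDS', 55667, 544343),
           ('DE-SIM-CARDS',   4563, 234829),
           ('UK-SIM-CARDS',   3213, 1333223)
),
table2(key, conversions, "Final Conversions", "Active Sims") AS (
    VALUES ('USA-SIM-CARDS', 456, 43, 4),
           ('USA-SIM-CARDS',  65,  2, 1),
           ('UK-SIM-CARDS',  123,  4, 3),
           ('UK-SIM-CARDS',  145, 34, 5)
)
SELECT t1.key, t1.clicks, t1.impressions,
       t2.conversions, t2.final_conversions, t2.active_sims
FROM table1 t1
LEFT JOIN (
    -- one row per key, so the join cannot multiply rows of table1
    SELECT key,
           sum(conversions)         AS conversions,
           sum("Final Conversions") AS final_conversions,
           sum("Active Sims")       AS active_sims
    FROM table2
    GROUP BY key
) t2 ON t1.key = t2.key
ORDER BY t1.key;
```

USA-SIM-CARDS comes back with 521, 45 and 5, UK-SIM-CARDS with 268, 38 and 8, and DE-SIM-CARDS with NULLs in the three aggregated columns, which matches the desired output.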

SQL calculating stock per month

I have a specific task and don't know how to implement it. I hope someone can help me =)
I have stock_move table:
product_id |location_id |location_dest_id |product_qty |date_expected |
-----------|------------|-----------------|------------|--------------------|
327 |80 |84 |10 |2014-05-28 00:00:00 |
327 |80 |84 |10 |2014-05-23 00:00:00 |
327 |80 |84 |10 |2014-02-26 00:00:00 |
327 |80 |85 |10 |2014-02-21 00:00:00 |
327 |80 |84 |10 |2014-02-12 00:00:00 |
327 |84 |85 |20 |2014-02-06 00:00:00 |
322 |84 |80 |120 |2015-12-16 00:00:00 |
322 |80 |84 |30 |2015-12-10 00:00:00 |
322 |80 |84 |30 |2015-12-04 00:00:00 |
322 |80 |84 |15 |2015-11-26 00:00:00 |
i.e. it's a table of product moves from one warehouse to another.
I can calculate stock at custom date if I use something like this:
select
coalesce(si.product_id, so.product_id) as "Product",
(coalesce(si.stock, 0) - coalesce(so.stock, 0)) as "Stock"
from
(
select
product_id
,sum(product_qty * price_unit) as stock
from stock_move
where
location_dest_id = 80
and date_expected < now()
group by product_id
) as si
full outer join (
select
product_id
,sum(product_qty * price_unit) as stock
from stock_move
where
location_id = 80
and date_expected < now()
group by product_id
) as so
on si.product_id = so.product_id
As a result, I get the current stock:
Product |Stock |
--------|------|
325 |1058 |
313 |34862 |
304 |2364 |
BUT what should I do if I need stock per month?
Something like this?
Month |Total Stock |
--------|------------|
Jan |130238 |
Feb |348262 |
Mar |2323364 |
How can I sum product qty from the start of the period to the end of each month?
I have just one idea: use 24 subqueries to get the stock for each month (example below)
Jan |Feb | Mar |
----|----|-----|
123 |234 |345 |
And after this, rotate rows and columns?
I think this is stupid, but I don't know another way... Help me pls =)
Something like this could give you monthly "ending" inventory snapshots. The trick is that your data may omit certain months for certain parts, but that part will still have a balance (i.e. 50 received in January, nothing happened in February, but you still want to show February with a running total of 50).
One way to handle this is to come up with all possible combinations of part and date. I assumed 1/1/14 + 24 months in this example, but that's easily changed in the all_months subquery. For example, you may only want to start with the minimum date from the stock_move table.
with all_months as (
select '2014-01-01'::date + interval '1 month' * generate_series(0, 23) as month_begin
),
stock_calc as (
select
product_id, date_expected,
date_trunc ('month', date_expected)::date as month_expected,
case
when location_id = 80 then -product_qty * price_unit
when location_dest_id = 80 then product_qty * price_unit
else 0
end as qty
from stock_move
union all
select distinct
s.product_id, m.month_begin::date, m.month_begin::date, 0
from
stock_move s
cross join all_months m
),
running_totals as (
select
product_id, date_expected, month_expected,
sum (qty) over (partition by product_id order by date_expected) as end_qty,
row_number() over (partition by product_id, month_expected
order by date_expected desc) as rn
from stock_calc
)
select
product_id, month_expected, end_qty
from running_totals
where
rn = 1
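As a sketch of the "start with the minimum date" variation mentioned above, the month list can be derived from stock_move itself instead of being hard-coded. The bounds CTE is a name introduced here for illustration; all_months keeps the same column name so it could stand in for the version in the query above.

```sql
-- Sketch: build all_months from the first and last month present in stock_move.
with bounds as (
    select date_trunc('month', min(date_expected)) as first_month,
           date_trunc('month', max(date_expected)) as last_month
    from stock_move
),
all_months as (
    select g.month_begin::date as month_begin
    from bounds b
    cross join lateral
         generate_series(b.first_month, b.last_month, interval '1 month') as g(month_begin)
)
select month_begin
from all_months
order by month_begin;
```

If stock_move is empty, both bounds are NULL and generate_series returns no rows, so the month list is simply empty.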

postgresql Recycle ID numbers

I have a table as follows
|GroupID | UserID |
--------------------
|1 | 1 |
|1 | 2 |
|1 | 3 |
|2 | 1 |
|2 | 2 |
|3 | 20 |
|3 | 30 |
|5 | 200 |
|5 | 100 |
Basically what this does is create a "group" which user IDs get associated with, so when I wish to request members of a group I can call on the table.
Users have the option of leaving a group, and creating a new one.
When all users have left a group, there's no longer that groupID in my table.
Let's pretend this is for a chat application where users might close and open chats constantly. The group IDs will add up very quickly, but the number of chats will realistically not reach millions of chats with hundreds of users.
I'd like to recycle the group ID numbers, such that when I go to insert a new record, if group 4 is unused (as is the case above), it gets assigned.
There are good reasons not to do this, but it's pretty straightforward in PostgreSQL. The technique--using generate_series() to find gaps in a sequence--is useful in other contexts, too.
WITH group_id_range AS (
SELECT generate_series((SELECT MIN(group_id) FROM groups),
(SELECT MAX(group_id) FROM groups)) group_id
)
SELECT min(gir.group_id)
FROM group_id_range gir
LEFT JOIN groups g ON (gir.group_id = g.group_id)
WHERE g.group_id IS NULL;
That query will return NULL if there are no gaps or if there are no rows at all in the table "groups". If you want to use this to return the next group id number regardless of the state of the table "groups", use this instead.
WITH group_id_range AS (
SELECT generate_series(
(COALESCE((SELECT MIN(group_id) FROM groups), 1)),
(COALESCE((SELECT MAX(group_id) FROM groups), 1))
) group_id
)
SELECT COALESCE(min(gir.group_id), (SELECT MAX(group_id)+1 FROM groups))
FROM group_id_range gir
LEFT JOIN groups g ON (gir.group_id = g.group_id)
WHERE g.group_id IS NULL;
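To try the gap search without touching a real table, here is a minimal sketch that inlines the distinct group IDs from the question (1, 2, 3, 5) as a CTE named groups, which shadows any real table of that name for the duration of the query. The alias next_free_id is illustrative.

```sql
-- Sketch: the group IDs from the question, inlined as a CTE.
WITH groups(group_id) AS (
    VALUES (1), (2), (3), (5)
),
group_id_range AS (
    -- every integer between the smallest and largest ID in use
    SELECT generate_series(
               (SELECT min(group_id) FROM groups),
               (SELECT max(group_id) FROM groups)
           ) AS group_id
)
SELECT min(gir.group_id) AS next_free_id   -- 4 for this sample
FROM group_id_range gir
LEFT JOIN groups g ON gir.group_id = g.group_id
WHERE g.group_id IS NULL;                  -- keep only IDs with no matching group
```

Note that the search only looks between the current minimum and maximum, so IDs below the smallest one in use are never offered.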