Looking for a way to generate "snapshots" of historical data when I have the final data, and the change to the data each week.
I have this data in table CurrentData:
CurrentDate CurrentAmount
--------------------------
2013-07-24 400
And I also have this data in table ChangeData:
ChangeDate ChangeAmount
--------------------------
2013-07-23 -2
2013-07-22 -4
2013-07-21 10
2013-07-20 1
And I want to be able to show what the data looked like over time. For example:
TotalDate TotalAsOfThisDate
--------------------------------
2013-07-24 400
2013-07-23 402
2013-07-22 406
2013-07-21 396
2013-07-20 395
Understanding that I will have to build each day's total off the prior day's data, I have tried a plethora of different things: cursors, temp tables, etc. I'm wondering how I would go about building this type of view in SQL. I am running SQL Server 2008 R2.
This might not be the best-performing query, but it should work fine for small tables:
SELECT CurrentDate AS TotalDate, CurrentAmount AS TotalAsOfThisDate
FROM CurrentData
UNION ALL
SELECT ChangeDate,
       (SELECT CurrentAmount FROM CurrentData)
       - (SELECT SUM(ChangeAmount)
          FROM ChangeData cd2
          WHERE cd2.ChangeDate >= cd1.ChangeDate)
FROM ChangeData cd1
ORDER BY TotalDate DESC
There's a trick you can pull with the OVER clause and a CTE (note: SUM() OVER with an ORDER BY, i.e. a running total, requires SQL Server 2012 or later, so this won't run on 2008 R2):
;WITH Source AS (
    SELECT CurrentDate, CurrentAmount AS Amount FROM CurrentData
    UNION ALL
    SELECT ChangeDate, -ChangeAmount FROM ChangeData
)
SELECT CurrentDate AS TotalDate,
       SUM(Amount) OVER (ORDER BY CurrentDate DESC) AS TotalAsOfThisDate
FROM Source
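To sanity-check the running-total idea, here is a self-contained sketch that runs the same CTE-plus-window-function query through Python's bundled sqlite3 module (SQLite 3.25+ for window functions); the table and column names mirror the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CurrentData(CurrentDate TEXT, CurrentAmount INTEGER);
CREATE TABLE ChangeData(ChangeDate TEXT, ChangeAmount INTEGER);
INSERT INTO CurrentData VALUES ('2013-07-24', 400);
INSERT INTO ChangeData VALUES
    ('2013-07-23', -2), ('2013-07-22', -4),
    ('2013-07-21', 10), ('2013-07-20',  1);
""")

# The snapshot for day D is the current amount minus every change made on
# or after D -- i.e. a running sum walking backwards in time.
rows = con.execute("""
WITH Source(TotalDate, Amount) AS (
    SELECT CurrentDate, CurrentAmount FROM CurrentData
    UNION ALL
    SELECT ChangeDate, -ChangeAmount FROM ChangeData
)
SELECT TotalDate,
       SUM(Amount) OVER (ORDER BY TotalDate DESC) AS TotalAsOfThisDate
FROM Source
ORDER BY TotalDate DESC
""").fetchall()

print(rows)
# [('2013-07-24', 400), ('2013-07-23', 402), ('2013-07-22', 406),
#  ('2013-07-21', 396), ('2013-07-20', 395)]
```

The output matches the expected table in the question.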
Related
I am working on a query to return the next 7 days worth of data every time an event happens indicated by "where event = 1". The goal is to then group all the data by the user id and perform aggregate functions on this data after the event happens - the event is encoded as binary [0, 1].
So far, I have been attempting to use nested select statements to structure the data how I would like to have it, but using the window functions is starting to restrict me. I am now thinking a self join could be more appropriate but need help in constructing such a query.
The query currently first creates daily aggregate values grouped by user and date (3rd-level nested select). The 2nd level then sums "value_x" to obtain an aggregate value per user. The 1st-level nested select uses the lead function, partitioned by user and ordered by date, to grab the next row's value, which acts as selecting the next day's value when event = 1. Finally, the outer select averages "sum_next_day_value_after_event" grouped by user where event = 1. Put together, where event = 1, the query returns the avg(value_x) of the next row's total value_x.
However, this doesn't follow my time rule; "where event = 1", return the next 7 days worth of data after the event happens. If there is not 7 days worth of data, then return whatever data is <= 7 days. Yes, I currently only have one lead with the offset as 1, but you could just put 6 more of these functions to grab the next 6 rows. But, the lead function currently just grabs the next row without regard to date. So theoretically, the next row's "value_x" could actually be 15 days from where "event = 1". Also, as can be seen below in the data table, a user may have more than one row per day.
Here is the following query I have so far:
select
f.user_id,
avg(f.sum_next_day_value_after_event) as sum_next_day_values
from (
select
bld.user_id,
lead(bld.value_x, 1) over(partition by bld.user_id order by bld.daily) as sum_next_day_value_after_event
from (
select
l.user_id,
l.daily,
sum(l.value_x) as sum_daily_value_x
from (
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
group by date_part('day', day_ts), user_id, value_x) l
group by l.user_id, l.day_ts
order by l.user_id) bld) f
group by f.user_id
Below is a snippet of the data from table_1:
user_id  day_ts         value_x  event
50       4/2/21 07:37   25       0
50       4/2/21 07:42   45       0
50       4/2/21 09:14   67       1
50       4/5/21 10:09   8        0
50       4/5/21 10:24   75       0
50       4/8/21 11:08   34       0
50       4/15/21 13:09  32       1
50       4/16/21 14:23  12       0
50       4/29/21 14:34  90       0
55       4/4/21 15:31   12       0
55       4/5/21 15:23   34       0
55       4/17/21 18:58  32       1
55       4/17/21 19:00  66       1
55       4/18/21 19:57  54       0
55       4/23/21 20:02  34       0
55       4/29/21 20:39  57       0
55       4/30/21 21:46  43       0
Technical details:
PostgreSQL, supported by EDB, version = 14.1
pgAdmin4, version 5.7
Thanks for the help!
"The query currently first creates daily aggregate values"
I don't see any aggregate function in your first query, so the GROUP BY clause is useless.
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
group by date_part('day', day_ts), user_id, value_x
could be simplified as
select
user_id, value_x, date_part('day', day_ts) as daily
from table_1
which in turn provides no real added value, so this first query could be removed and the second query would become:
select user_id
, date_part('day', day_ts) as daily
, sum(value_x) as sum_daily_value_x
from table_1
group by user_id, date_part('day', day_ts)
The order by user_id clause can also be removed at this step.
Now, if you want to calculate the average of sum_daily_value_x over the period of 7 days after the event (I'm referring to the avg() function in your top-level query), you can use avg() as a window function restricted to the 7 days following the event:
select f.user_id
, avg(f.sum_daily_value_x) over (order by f.daily range between current row and '7 days' following) as sum_next_day_values
from (
select user_id
, date_part('day', day_ts) as daily
, sum(value_x) as sum_daily_value_x
from table_1
group by user_id, date_part('day', day_ts)
) AS f
group by f.user_id
The partition by f.user_id clause in the window function is useless because the rows have already been grouped by f.user_id before the window function is applied.
You can replace the avg() window function with any other one, for instance sum(), which would better fit the alias sum_next_day_values.
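For the strict 7-day rule, a self join (which the asker was already considering) may be more direct than stacking lead() calls: join each event row to every row of the same user that falls within the following 7 days, then aggregate. A minimal sketch through Python's sqlite3 with a cut-down version of the sample data; treating the window as "strictly after the event's day, up to 7 days later" is my assumption, so adjust the date bounds if same-day rows should count:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE table_1(user_id INTEGER, day_ts TEXT, value_x INTEGER, event INTEGER);
INSERT INTO table_1 VALUES
    (50, '2021-04-02 09:14', 67, 1),
    (50, '2021-04-05 10:09',  8, 0),
    (50, '2021-04-05 10:24', 75, 0),
    (50, '2021-04-08 11:08', 34, 0),
    (50, '2021-04-15 13:09', 32, 1),
    (50, '2021-04-16 14:23', 12, 0),
    (50, '2021-04-29 14:34', 90, 0);
""")

# For every event row, sum value_x over all rows of the same user that fall
# strictly after the event's day and no more than 7 days later.
rows = con.execute("""
SELECT e.user_id,
       e.day_ts       AS event_ts,
       SUM(n.value_x) AS next_7_day_total
FROM table_1 e
JOIN table_1 n
  ON  n.user_id = e.user_id
  AND date(n.day_ts) >  date(e.day_ts)
  AND date(n.day_ts) <= date(e.day_ts, '+7 days')
WHERE e.event = 1
GROUP BY e.user_id, e.day_ts
ORDER BY e.user_id, e.day_ts
""").fetchall()

print(rows)  # [(50, '2021-04-02 09:14', 117), (50, '2021-04-15 13:09', 12)]
```

Note that the 4/29 row is 14 days after the second event and is correctly excluded. The per-user average your outer query wants is then just one more SELECT user_id, AVG(next_7_day_total) ... GROUP BY user_id over this result.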
I have two tables and I am trying to find data gaps in them where the dates do not overlap.
Item Table:
id unique start_date end_date data
1 a 2019-01-01 2019-01-31 X
2 a 2019-02-01 2019-02-28 Y
3 b 2019-01-01 2019-06-30 Y
Plan Table:
id item_unique start_date end_date
1 a 2019-01-01 2019-01-10
2 a 2019-01-15 'infinity'
I am trying to find a way to produce the following
Missing:
item_unique from to
a 2019-01-11 2019-01-14
b 2019-01-01 2019-06-30
step-by-step demo: db<>fiddle
WITH excepts AS (
SELECT
item,
generate_series(start_date, end_date, interval '1 day') gs
FROM items
EXCEPT
SELECT
item,
generate_series(start_date, CASE WHEN end_date = 'infinity' THEN ( SELECT MAX(end_date) as max_date FROM items) ELSE end_date END, interval '1 day')
FROM plan
)
SELECT
item,
MIN(gs::date) AS start_date,
MAX(gs::date) AS end_date
FROM (
SELECT
*,
SUM(same_day) OVER (PARTITION BY item ORDER BY gs)
FROM (
SELECT
item,
gs,
COALESCE((gs - LAG(gs) OVER (PARTITION BY item ORDER BY gs) >= interval '2 days')::int, 0) as same_day
FROM excepts
) s
) s
GROUP BY item, sum
ORDER BY 1,2
Finding the missing days is quite simple. This is done within the WITH clause:
Generating all days of the date range and subtracting this result from the expanded list of the second table. All dates that do not occur in the second table are kept. The infinity end is a little tricky, so I replaced the infinity occurrence with the max date of the first table; this avoids expanding an infinite list of dates.
The more interesting part is to reaggregate this list again, which is the part outside the WITH clause:
The lag() window function takes the previous date. If the gap to the previous date is at least two days, the expression yields 1 (a time-change issue cropped up here: between 2019-03-31 and 2019-04-01 there are only 23 hours because of daylight saving time, which is why I test for a 2-day difference rather than exactly one day).
These 0 and 1 values are summed cumulatively. Each gap greater than one day starts a new interval (the days in between are covered).
This results in a groupable column which can be used to aggregate and find the max and min date of each interval
I tried something with date ranges, which seems a better way, especially to avoid expanding long date lists, but didn't come up with a proper solution. Maybe someone else will?
I'm trying to do an insert with this "with" statement, but it seems to support select statements only, so I want to convert it into a select statement. I'm just amazed that this works at all. I found a similar example on Stack and changed it to fit my needs.
with temp (startdate, enddate, maxdate) as (
select min(salesdate) startdate, min(salesdate)+3 months enddate, max(salesdate) maxdate
from SALES
union all
select startdate + 3 months + 1 days, enddate + 3 months + 1 days, maxdate from temp
where enddate <= maxdate
)
select startdate, min(enddate, maxdate) from temp;
Thanks in advance.
Edit: It seems my query was misunderstood. Here is pseudo code for what the query is supposed to do. The query returns the expected result, which is pretty amazing to me; I don't know how the recursion avoids overlapping after I added 1 day. After writing the pseudo code, I see that the select startdate + 3 months + 1 days should have been written as select enddate + 1 days, which logically says what it's supposed to do instead of magically working:
rows = []
startdate = min(salesdate)
enddate = startdate + 3 months
maxdate = max(salesdate)
i = 0;
do {
rows[i++] = [startdate, min(enddate, maxdate)] // min for final iteration where enddate > maxdate.
startdate = enddate + 1 days
enddate = enddate + 1 days + 3 months // aka: startdate + 3 months
} while (enddate <= maxdate)
return rows
Hence, I've broken a huge date range into smaller chunks of 3 months ranges. Whether it is exactly 90 days or 91 days is not important, as long as I get every single date without gap and without overlap.
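A direct Python translation of that pseudo code (datetime only; the add_months helper is mine, and I loop on startdate rather than enddate, because checking enddate as the pseudo code is written would drop the final partial chunk that the RCTE's min(enddate, maxdate) keeps):

```python
from datetime import date, timedelta

def add_months(d, n):
    """Advance d by n calendar months, clamping the day-of-month."""
    y, m = divmod(d.month - 1 + n, 12)
    y, m = d.year + y, m + 1
    # last day of the target month
    last = (date(y + m // 12, m % 12 + 1, 1) - timedelta(days=1)).day
    return date(y, m, min(d.day, last))

def three_month_chunks(startdate, maxdate):
    """Split [startdate, maxdate] into contiguous ~3-month ranges."""
    rows = []
    enddate = add_months(startdate, 3)
    while startdate <= maxdate:
        rows.append((startdate, min(enddate, maxdate)))
        startdate = enddate + timedelta(days=1)  # next chunk starts one day later
        enddate = add_months(startdate, 3)
    return rows

chunks = three_month_chunks(date(2011, 1, 1), date(2011, 12, 31))
# adjacent chunks meet with no gap and no overlap, and both ends are covered
```

Whether a chunk is exactly 90 or 91 days varies with the calendar, matching the "no gap, no overlap" requirement rather than a fixed length.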
I'm curious about your decision that a query with a recursive common table expression (RCTE) is "not normal". IBM calls it a 'select-statement' and considers it normal. If it's an educational question and you don't want to use an RCTE for some reason, then consider the following example.
select s + (3*(x.i-1)) month start, s + (3*x.i) month - 1 day end
from table(values (date('2011-01-01'), date('2012-01-01'))) d(s, e)
, xmltable('for $id in (1 to $e) return <i>{number($id)}</i>'
passing ((year(e)-year(s))*12 + (month(e)-month(s)))/3 as "e"
columns i int path '.'
) x;
START END
---------- ----------
2011-01-01 2011-03-31
2011-04-01 2011-06-30
2011-07-01 2011-09-30
2011-10-01 2011-12-31
;
It's a little complicated, since you must pass the desired number of rows to the xmltable table function, which returns a single column with values 1 to N. In other words, you must compute the desired number of 3-month intervals and pass it to the function.
An (R)CTE can't be used in UPDATE/DELETE statements, where you may use only so-called fullselect statements (which don't allow CTEs). If you really need a CTE for UPDATE/DELETE, as in this case, you can do one of the following:
If you ARE ABLE to compute a temporary result set for whole delete/update statement, you can do something like this (I don't use here RCTE for simplicity, but a simple CTE only):
with a (id) as (values 1)
select count(1)
from old table(
delete from test t
where exists (select 1 from a where a.id=t.id)
);
If you ARE NOT ABLE to compute a temporary result set for whole delete/update statement, you can create a table/scalar function with the corresponding parameters, where you are able to use your RCTE. This function can be used in the outer statement afterwards.
I was able to do the insert by moving the insert statement in front of the "with" statement. Copying the answer here so I know next time. I'm still interested in seeing how to convert it to a pure select statement, though, so I will select that answer as the correct one.
insert into my_temp_table
with temp (startdate, enddate, maxdate) as (
select min(salesdate) startdate, min(salesdate)+3 months enddate, max(salesdate) maxdate
from SALES
union all
select startdate + 3 months + 1 days, enddate + 3 months + 1 days, maxdate from temp
where enddate <= maxdate
)
select startdate, min(enddate, maxdate) from temp;
I have a database table that contains a start visdate and an end visdate. If a date is within this range the asset is marked available. Assets belong to a user. My query takes in a date range (start and end date). I need to return data so that for a date range it will query the database and return a count of assets for each day in the date range that assets are available.
I know there are a few examples; I was wondering if it's possible to execute this as a plain query/common table expression rather than using a function or a temporary table. I'm also finding it complicated because the assets table does not contain a single date on which an asset is available; I'm querying a range of dates against a visibility window. What is the best way to do this? Should I just run a separate query for each day in the given date range?
Asset Table
StartvisDate Timestamp
EndvisDate Timestamp
ID int
User Table
ID
User & Asset Join table
UserID
AssetID
Date | Number of Assets Available | User
11/11/14 5 UK
12/11/14 6 Greece
13/11/14 4 America
14/11/14 0 Italy
You need to use a set returning function to generate the needed rows. See this related question:
SQL/Postgres datetime division / normalizing
Example query to get you started:
with data as (
select id, start_date, end_date
from (values
(1, '2014-12-02 14:12:00+00'::timestamptz, '2014-12-03 06:45:00+00'::timestamptz),
(2, '2014-12-05 15:25:00+00'::timestamptz, '2014-12-05 07:29:00+00'::timestamptz)
) as rows (id, start_date, end_date)
)
select data.id,
count(data.id)
from data
join generate_series(
date_trunc('day', data.start_date),
date_trunc('day', data.end_date),
'1 day'
) as days (d)
on days.d >= date_trunc('day', data.start_date)
and days.d <= date_trunc('day', data.end_date)
group by data.id
id | count
----+-------
1 | 2
2 | 1
(2 rows)
You'll want to convert it to using ranges instead, and adapt it to your own schema and data, but it's basically the same kind of query as the one you want.
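If the goal is a per-day availability count (as in the desired output above), the adapted shape might look like the following sketch through Python's sqlite3, where a recursive CTE plays the role of generate_series; the sample rows and date range are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE asset(id INTEGER, startvisdate TEXT, endvisdate TEXT);
INSERT INTO asset VALUES
    (1, '2014-11-11', '2014-11-13'),
    (2, '2014-11-12', '2014-11-14'),
    (3, '2014-11-11', '2014-11-11');
""")

# Generate every day of the queried range, then count the assets whose
# visibility window contains each day.  LEFT JOIN keeps zero-count days.
rows = con.execute("""
WITH RECURSIVE days(d) AS (
    SELECT '2014-11-11'
    UNION ALL
    SELECT date(d, '+1 day') FROM days WHERE d < '2014-11-14'
)
SELECT days.d, COUNT(a.id) AS available
FROM days
LEFT JOIN asset a
  ON days.d BETWEEN a.startvisdate AND a.endvisdate
GROUP BY days.d
ORDER BY days.d
""").fetchall()

print(rows)
# [('2014-11-11', 2), ('2014-11-12', 2), ('2014-11-13', 2), ('2014-11-14', 1)]
```

Joining through the user/asset join table to split the counts per user would be one more JOIN plus a GROUP BY on the user column.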
I want to return the sum of daily spent since the beginning of the current insertion order (invoice) for a number of clients. Each client unfortunately has a different start date for the current insertion order.
I don't have any problem to pull the start date for each client but I don't get how to create a sort of lookup to a table with the start dates associated to each client.
Let's say I have a table IO:
ClientId StartDate
1 2014-10-01
2 2014-10-04
3 2014-09-17
...
And another table with the DailySpend for each Client:
Date Client Spend
2014-10-01 1 2325
2014-10-01 2 195
2014-10-01 3 434
2014-10-02 1 43624
...
Now, I would simply want to check for each client how much we spend from the start date of the current insertion order until yesterday.
Maybe something like this:
SELECT b.Client,
       SUM(b.Spend)
FROM [IO] a
JOIN DailySpend b
  ON  a.ClientId = b.Client
  AND a.StartDate <= b.Date
WHERE b.Date <= DATEADD(dd, -1, CAST(GETDATE() AS DATE))
GROUP BY b.Client
select *
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date >= IO.StartDate
and datediff(dd, getdate(), DailySpend.Date) <= 1
select DailySpend.Client, sum(DailySpend.Spend)
from IO
join DailySpend
on IO.ClientId = DailySpend.Client
and DailySpend.Date >= IO.StartDate
and datediff(dd, getdate(), DailySpend.Date) <= 1
group by DailySpend.Client
you may need to flip the date order in the datediff
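To double-check the join direction, here is the corrected logic run end-to-end against the question's sample data through Python's sqlite3 (a fixed literal stands in for "yesterday" so the result is reproducible):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE io(clientid INTEGER, startdate TEXT);
CREATE TABLE dailyspend(date TEXT, client INTEGER, spend INTEGER);
INSERT INTO io VALUES (1, '2014-10-01'), (2, '2014-10-04'), (3, '2014-09-17');
INSERT INTO dailyspend VALUES
    ('2014-10-01', 1,  2325),
    ('2014-10-01', 2,   195),
    ('2014-10-01', 3,   434),
    ('2014-10-02', 1, 43624);
""")

# Keep only spend rows on or after each client's start date, through
# "yesterday" (hard-coded here as 2014-10-02), then sum per client.
rows = con.execute("""
SELECT s.client, SUM(s.spend) AS total_spend
FROM io i
JOIN dailyspend s
  ON  i.clientid = s.client
  AND s.date >= i.startdate
WHERE s.date <= '2014-10-02'
GROUP BY s.client
ORDER BY s.client
""").fetchall()

print(rows)  # [(1, 45949), (3, 434)]
```

Client 2 drops out because its insertion order starts 2014-10-04, after all of its spend rows in the sample, which is exactly the per-client cutoff the question asks for.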