Complex logic to create a time series in Postgres

I have a sample dataset like the one below, and I would like to create a report in which the Value is repeated for every date between the Start and End dates.
Input Dataset
ID Start End Value
232 "2022-06-08 18:49:00" "2022-11-18 08:06:00" 55
456 "2022-10-17 10:24:00" "2022-12-16 12:52:00" 100
From the above Dataset I would like to create another dataset as below.
I need to generate the date series between the Start and End dates from the input dataset and fill every generated date with the same Value.
Any ideas or suggestions will be helpful.
Expected Output
ID Date Value
232 "2022-06-08" 55
232 "2022-06-09" 55
232 "2022-06-10" 55
232 "2022-06-11" 55
232 "2022-06-12" 55
.
.
232 "2022-11-17" 55
232 "2022-11-18" 55
456 "2022-10-17" 100
456 "2022-10-18" 100
456 "2022-10-19" 100
.
.
456 "2022-12-15" 100
456 "2022-12-16" 100
Database: Postgres 12

You can use generate_series():
select t.id,
       g.dt::date as date,
       t.value
from the_table t
  cross join generate_series(t."Start"::date, t."End"::date, interval '1 day') as g(dt)
order by t.id, g.dt;
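If you want to try this without creating the table first, here is a quick, self-contained sketch that inlines the sample rows from the question in a CTE (the_table and the quoted "Start"/"End" column names are just the placeholders used above):
-- Sketch: same generate_series() approach, run against the question's sample
-- rows inlined in a CTE so it can be executed as-is.
with the_table (id, "Start", "End", value) as (
    values
        (232, timestamp '2022-06-08 18:49:00', timestamp '2022-11-18 08:06:00', 55),
        (456, timestamp '2022-10-17 10:24:00', timestamp '2022-12-16 12:52:00', 100)
)
select t.id,
       g.dt::date as date,
       t.value
from the_table t
  cross join generate_series(t."Start"::date, t."End"::date, interval '1 day') as g(dt)
order by t.id, g.dt;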

Related

Getting percentage change between selected data within a column in PostgreSQL

I am using PostgreSQL and am trying to calculate the percentage change between two values in the same column, grouped by the name column, and I am having trouble.
Suppose I have the following table:
name   day   score
-----  ---   -----
Allen  1     87
Allen  2     89
Allen  3     95
Bob    1     64
Bob    2     68
Bob    3     75
Carl   1     71
Carl   2     77
Carl   3     80
I want the result to be the name and the percentage change for each person between day 3 and day 1. So Allen would be 9.2 because from 87 to 95 is a 9.2 percent increase.
I want the result to be:
name   percent_change
-----  --------------
Allen  9.2
Bob    17.2
Carl   12.7
Thanks for your help.
Try this...
with dummy_table as (
    select
        name,
        day,
        score as first_day_score,
        -- ordered by day desc, lag(score, 2) on the day 1 row reaches the day 3 score
        lag(score, 2) over (partition by name order by day desc) as last_day_score
    from YOUR_TABLE_NAME
)
select
    name,
    -- multiply by 100 and round so the result matches the expected output (e.g. 9.2)
    round(100 * (last_day_score - first_day_score) / first_day_score::decimal, 1) as percent_change
from dummy_table
where last_day_score is not null
Just replace YOUR_TABLE_NAME. There are likely more performant and fancier solutions, but this works.
You can also try the lag() window function, something like this (day-over-day growth):
select name,
       day,
       score,
       100.0 * (score - lag(score, 1) over (partition by name order by day))
             / lag(score, 1) over (partition by name order by day) as growth_percentage
from YOUR_TABLE_NAME
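If only the day 1 and day 3 rows matter, another option (just a sketch, reusing the YOUR_TABLE_NAME placeholder and assuming every name has rows for days 1 and 3) is conditional aggregation with FILTER:
-- Sketch: day 1 vs day 3 percent change via conditional aggregation.
select name,
       round(100.0 * (max(score) filter (where day = 3)
                      - max(score) filter (where day = 1))
                   / max(score) filter (where day = 1), 1) as percent_change
from YOUR_TABLE_NAME
group by name
order by name;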

Adding a column to a table from the previous row in T-SQL

Given rows with a timestamp column and a value column (from a device), already stored in a table in an Azure SQL database, I want to add a new column to each row taken from the most recent earlier record that meets certain criteria (most recent being defined by the timestamp column). The criterion is that the value falls into a range (between 5 and 95). I want to do this for every row.
Here is an input table:
ts (Timestamp) value (integer)
------------------------------------
2019-09-22 00:00:00 90
2019-09-21 23:10:05 75
2019-09-21 23:09:00 85
2019-09-21 22:09:00 00
2019-09-21 14:09:00 70
Now I want to add a column to this table:
ts (Timestamp) value prev_value
---------------------------------------
2019-09-22 00:00:00 90 75
2019-09-21 23:10:05 75 85
2019-09-21 23:09:00 85 70
2019-09-21 22:09:00 00 70
2019-09-21 14:09:00 70 NULL
I have been trying different SQL statements but haven't been successful so far.
So basically you want something like lag, but with a condition.
The easy way to do that is to use a correlated subquery.
First, create and populate a sample table (please save us this step in your future questions):
DECLARE @T AS TABLE
(
    ts datetime2,
    [value] int
);
INSERT INTO @T (ts, [value]) VALUES
('2019-09-22T00:00:00', 90),
('2019-09-21T23:10:05', 75),
('2019-09-21T23:09:00', 85),
('2019-09-21T22:09:00', 00),
('2019-09-21T14:09:00', 70);
The query:
SELECT ts,
       [value],
       (
           SELECT TOP 1 [value]
           FROM @T T1
           WHERE T0.ts > T1.ts
             AND T1.[value] >= 5
             AND T1.[value] <= 95
           ORDER BY T1.ts DESC
       ) AS prev_value
FROM @T T0
ORDER BY ts DESC;
Results:
ts value prev_value
2019-09-22 00:00:00 90 75
2019-09-21 23:10:05 75 85
2019-09-21 23:09:00 85 70
2019-09-21 22:09:00 0 70
2019-09-21 14:09:00 70 NULL
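Just as a variation (a sketch against the same table variable, not necessarily faster), the same "previous in-range value" lookup can be written with OUTER APPLY:
-- Sketch: same lookup as above, phrased with OUTER APPLY.
SELECT T0.ts,
       T0.[value],
       P.[value] AS prev_value
FROM @T T0
OUTER APPLY (
    SELECT TOP 1 T1.[value]
    FROM @T T1
    WHERE T1.ts < T0.ts
      AND T1.[value] BETWEEN 5 AND 95
    ORDER BY T1.ts DESC
) AS P
ORDER BY T0.ts DESC;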

Tableau: How to perform "Summarize totals except top 3"?

I have data something like below for the name of person and the total sales he/she made:
ABC1 34
ABC2 45
ABC3 78
ABC4 79
ABC5 23
ABC6 61
ABC7 34
ABC8 54
ABC9 90
I have to display the dashboard as below: the top 3 sales guys, plus the overall total sales made by the rest of the team as ROT, which is 498 - (90 + 79 + 78) = 251:
ABC9 90
ABC4 79
ABC3 78
ROT 251
For the top sales, I applied a filter on the sales person name with the limit condition set to "Top 3". But I am struggling to display the ROT, even in a separate worksheet. Any help is appreciated.
Right-click on your dimension [Sales Guy] and choose Create/Set.
Define the set by the Top N (either hard-code it or use a parameter so you can change it easily) and call it [TopNSalesGuy].
Create a calculated field [TopNSalesGuysPlusOther] with the formula:
IF [TopNSalesGuy] THEN [Sales Guy] ELSE 'ROT' END
Use [TopNSalesGuysPlusOther] in your table/graph and you should have the top N sales guys by name and everything else as 'ROT'.

PostgreSQL query to display records every 45 days

I have a table with the user_id of each user and the timestamp at which they joined.
If I need to display the data month-wise, I can just use:
select
    count(user_id),
    date_trunc('month', (to_timestamp(users.timestamp))::timestamp)::date
from
    users
group by 2
date_trunc also accepts 'second', 'day', 'week', etc., so I can get data grouped by those periods.
How do I get data grouped by an n-day period, say 45 days?
Basically I need to display the number of users per 45-day period.
Any suggestion or guidance is appreciated!
Currently I get:
Date Users
2015-03-01 47
2015-04-01 72
2015-05-01 123
2015-06-01 132
2015-07-01 136
2015-08-01 166
2015-09-01 129
2015-10-01 189
I would like the data to come in 45-day intervals. Something like:
Date Users
2015-03-01 85
2015-04-15 157
2015-05-30 192
2015-07-14 229
2015-08-28 210
2015-10-12 294
UPDATE:
I used the following to get the output, but one problem remains. I'm getting values that are offset.
with new_window as (
    select
        generate_series as cohort,
        lag(generate_series, 1) over () as cohort_lag
    from (
        select *
        from generate_series('2015-03-01'::date, '2016-01-01', '45 day')
    ) t
)
select
    --cohort
    cohort_lag  -- This worked. !!!
    , count(*)
from new_window
join users on
    user_timestamp <= cohort
    and user_timestamp > cohort_lag
group by 1
order by 1
But the output I am getting is:
Date Users
2015-04-15 85
2015-05-30 157
2015-07-14 193
2015-08-28 225
2015-10-12 210
Basically, the users displayed at 2015-03-01 should be the users between 2015-03-01 and 2015-04-15, and so on. But I seem to be getting each count labelled with the end of its interval, i.e. the 85 users show up against 2015-04-15, which is not the result I want.
Any help here ?
Try this query:
SELECT to_char(i::date, 'YYYY-MM-DD') as date, 0 as users
FROM generate_series('2015-03-01', '2015-11-30', '45 day'::interval) as i;
OUTPUT :
date users
2015-03-01 0
2015-04-15 0
2015-05-30 0
2015-07-14 0
2015-08-28 0
2015-10-12 0
2015-11-26 0
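To turn that skeleton into actual counts, the generated dates can be left-joined to users on the 45-day window each date opens. A rough sketch (assuming, as in the question's first query, that users.timestamp holds an epoch value):
-- Sketch: count users per 45-day window starting at each generated date.
SELECT i::date AS date,
       count(u.user_id) AS users
FROM generate_series('2015-03-01', '2015-11-30', '45 day'::interval) AS i
LEFT JOIN users u
       ON to_timestamp(u.timestamp) >= i
      AND to_timestamp(u.timestamp) < i + '45 day'::interval
GROUP BY i
ORDER BY i;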
This looks like a hot mess, and it might be better wrapped in a function where you could use some variables, but would something like this work?
with number_of_intervals as (
    select
        min (timestamp)::date as first_date,
        ceiling (extract (days from max (timestamp) - min (timestamp)) / 45)::int as num
    from users
),
intervals as (
    select
        generate_series(0, num - 1, 1) int_start,
        generate_series(1, num, 1) int_end
    from number_of_intervals
),
date_spans as (
    select
        n.first_date + 45 * i.int_start as interval_start,
        n.first_date + 45 * i.int_end as interval_end
    from
        number_of_intervals n
        cross join intervals i
)
select
    d.interval_start, count (*) as user_count
from
    users u
    join date_spans d on
        u.timestamp >= d.interval_start and
        u.timestamp < d.interval_end
group by
    d.interval_start
order by
    d.interval_start
With this sample data:
User Id  timestamp   derived range  count
-------  ---------   -------------  -----
1        3/1/2015    3/1-4/15
2        3/26/2015   "
3        4/4/2015    "
4        4/6/2015    "              (4)
5        5/6/2015    4/16-5/30
6        5/19/2015   "              (2)
7        6/16/2015   5/31-7/14
8        6/27/2015   "
9        7/9/2015    "              (3)
10       7/15/2015   7/15-8/28
11       8/8/2015    "
12       8/9/2015    "
13       8/22/2015   "
14       8/27/2015   "              (5)
Here is the output:
2015-03-01 4
2015-04-15 2
2015-05-30 3
2015-07-14 5
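If the anchor date is fixed rather than derived from min(timestamp), the same bucketing can also be done without generating intervals at all, by integer division on the day offset. A sketch (hard-coding 2015-03-01 as the anchor and again assuming users.timestamp is an epoch value):
-- Sketch: bucket each user into a 45-day period via integer division on the
-- number of days elapsed since a fixed anchor date.
select date '2015-03-01'
       + 45 * ((to_timestamp(u.timestamp)::date - date '2015-03-01') / 45) as period_start,
       count(*) as users
from users u
group by 1
order by 1;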

Not getting desired format from Oracle query

I am trying to fetch data from the database in the format below:
Month Count
----- -----
201208 124
201209 0
201210 56
201211 25
201212 0
201301 184
201302 0
In the database I have entries like:
Month Count
----- -----
201206 56
201208 124
201210 56
201211 25
201301 184
201304 49
Below is my query:
SELECT MONTH, Count
FROM TABLE_NAME
WHERE MONTH BETWEEN 201208 AND 201302
AND ID = 'X'
Output :
Month Count
----- -----
201208 124
201210 56
201211 25
201301 184
Can anyone help me get the data in the desired format?
First you should generate the full sequence of months between these dates. You can do it with CONNECT BY LEVEL in Oracle, then just JOIN this sequence with your table:
SELECT MonthSeq.MONTH,
       NVL(Count, 0) Count
FROM TABLE_NAME
RIGHT JOIN
(
    SELECT TO_CHAR(ADD_MONTHS(TO_DATE('201208', 'YYYYMM'), (ROWNUM - 1)), 'YYYYMM') MONTH
    FROM DUAL
    CONNECT BY LEVEL <= MONTHS_BETWEEN(TO_DATE('201302', 'YYYYMM'),
                                       TO_DATE('201208', 'YYYYMM')) + 1
) MonthSeq
  ON TABLE_NAME.MONTH = MonthSeq.MONTH
ORDER BY MonthSeq.MONTH
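If you want to experiment with the idea without the original table, here is a self-contained sketch with the question's sample rows inlined (the data column is named CNT here, and the join is flipped to a LEFT JOIN from the generated month list; otherwise it is the same approach):
-- Sketch: same gap-filling idea, runnable as-is against inlined sample rows.
WITH sample_data AS (
    SELECT '201206' AS MONTH, 56 AS CNT FROM DUAL UNION ALL
    SELECT '201208', 124 FROM DUAL UNION ALL
    SELECT '201210', 56 FROM DUAL UNION ALL
    SELECT '201211', 25 FROM DUAL UNION ALL
    SELECT '201301', 184 FROM DUAL UNION ALL
    SELECT '201304', 49 FROM DUAL
),
MonthSeq AS (
    SELECT TO_CHAR(ADD_MONTHS(TO_DATE('201208', 'YYYYMM'), LEVEL - 1), 'YYYYMM') AS MONTH
    FROM DUAL
    CONNECT BY LEVEL <= MONTHS_BETWEEN(TO_DATE('201302', 'YYYYMM'),
                                       TO_DATE('201208', 'YYYYMM')) + 1
)
SELECT m.MONTH, NVL(d.CNT, 0) AS CNT
FROM MonthSeq m
LEFT JOIN sample_data d ON d.MONTH = m.MONTH
ORDER BY m.MONTH;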
UPD:
Your query from the comment should look like the following. You should move the WHERE condition into the JOIN's ON clause; if you put it in WHERE, you don't get the rows with zero counts.
SELECT MonthSeq.MONTH,
       NVL(SUM(TOTAL_SESSIONS), 0) AS SESSIONS
FROM X
RIGHT JOIN
(
    SELECT TO_CHAR(ADD_MONTHS(TO_DATE('201208', 'YYYYMM'), (ROWNUM - 1)), 'YYYYMM') MONTH
    FROM DUAL
    CONNECT BY LEVEL <= MONTHS_BETWEEN(TO_DATE('201302', 'YYYYMM'),
                                       TO_DATE('201208', 'YYYYMM')) + 1
) MonthSeq
  ON X.MONTH = MonthSeq.MONTH AND X.acct_id = 'ABCD'
GROUP BY MonthSeq.MONTH
ORDER BY MonthSeq.MONTH
You need to use the TO_DATE function to convert the month field to a DATE before comparing. Try something like this:
SELECT TO_CHAR(TO_DATE(month, 'YYYYMM'), 'YYYYMM') month, count
FROM TABLE_NAME
WHERE TO_DATE(month, 'YYYYMM') BETWEEN TO_DATE('201208', 'YYYYMM') AND TO_DATE('201302', 'YYYYMM')
  AND id = 'X'
ORDER BY TO_DATE(month, 'YYYYMM');