PostgreSQL - How can I SUM until a certain hour of the day? - postgresql

I'm trying to create a metric for a PostgreSQL integrated dashboard that shows today's "Total Payment Value" (TPV) of a certain product, as well as yesterday's TPV of the same product up until the same moment as today. So if I'm accessing the dashboard at 5 pm, it should show yesterday's TPV until 5 pm and today's TPV so far.
edit: My question wasn't very clear so I'm adding a few more lines and editing the query, which had a mistake.
I tried this:
select
sum(case when table.product in (13,14,15,16) then amount else 0 end) as "TPV"
,date_trunc('day', table.date) as "Day"
from table
where
date > current_date - 1
group by date_trunc('day', table.date)
order by 2,1
I only want to sum the amount when product = 13, 14, 15 or 16
An example of the product, date and amount would be like this:
product  amount  date
8        4750    19/03/2019 00:21
14       7840    12/04/2019 22:40
14       15000   22/03/2019 18:27
14       11715   19/03/2019 00:12
14       1054    22/03/2019 18:22
14       18491   17/03/2019 14:28
14       12253   17/03/2019 14:30
14       27600   17/03/2019 14:32
14       3936    17/03/2019 14:28
14       19007   19/03/2019 00:14
8        9400    19/03/2019 00:21
8        4750    19/03/2019 00:21
8        25000   19/03/2019 00:17
14       10346   22/03/2019 18:23
I would like to have a metric that always calculates the sum of the product value today up until the current moment - when the "product" corresponds to values 13, 14, 15 or 16 - as well as the same metric for yesterday, e.g., it's 1 PM now, I want today's TPV until 1 PM and yesterday's TPV until 1 PM as well!
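One way to think about the cutoff is as two filters on each row: the calendar day must be today or yesterday, and the time of day must not be later than the current time of day. The Python sketch below only illustrates that logic outside the database. The function name tpv_until_now, the fixed "now" value, the two rows dated 21/03/2019 and the 09:00 row are made up for illustration. In PostgreSQL the same idea could presumably be expressed by adding a time-of-day condition such as date::time <= localtime to the WHERE clause.

```python
from datetime import datetime, timedelta

PRODUCTS = {13, 14, 15, 16}  # products that count towards TPV


def tpv_until_now(rows, now):
    """Return (today_tpv, yesterday_tpv), each summed up to now's time of day."""
    today = now.date()
    yesterday = today - timedelta(days=1)
    totals = {today: 0, yesterday: 0}
    for product, amount, ts in rows:
        if product not in PRODUCTS:
            continue
        # Keep only today/yesterday rows at or before the current time of day.
        if ts.date() in totals and ts.time() <= now.time():
            totals[ts.date()] += amount
    return totals[today], totals[yesterday]


rows = [
    (14, 15000, datetime(2019, 3, 22, 18, 27)),  # from the sample data
    (14, 1054, datetime(2019, 3, 22, 18, 22)),   # from the sample data
    (14, 10346, datetime(2019, 3, 22, 18, 23)),  # from the sample data
    (8, 4750, datetime(2019, 3, 22, 9, 0)),      # hypothetical, wrong product
    (14, 2000, datetime(2019, 3, 21, 12, 0)),    # hypothetical, before cutoff
    (14, 3000, datetime(2019, 3, 21, 20, 0)),    # hypothetical, after cutoff
]
now = datetime(2019, 3, 22, 18, 25)  # pretend the dashboard is opened at 18:25
today_tpv, yesterday_tpv = tpv_until_now(rows, now)
print(today_tpv, yesterday_tpv)  # 11400 2000
```

The 18:27 row and the 20:00 row are excluded because they fall after the 18:25 cutoff on their respective days.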

Related

Maximum count of overlapping intervals in PostgreSQL

Suppose there is a table structured as follows:
id start end
--------------------
01 00:18 00:23
02 00:22 00:31
03 00:23 00:48
04 00:23 00:39
05 00:24 00:25
06 00:24 00:31
07 00:24 00:38
08 00:25 00:37
09 00:26 00:42
10 00:31 00:34
11 00:33 00:38
The objective is to compute the overall maximum number of rows having been active (i.e. between start and end) at any given moment in time. This would be relatively straightforward using a procedural algorithm, but I'm not sure how to do this in SQL.
According to the above example, this maximum value would be 8 and would correspond to the 00:31 timestamp where active rows were 2, 3, 4, 6, 7, 8, 9, 10 (as shown in the schema below).
Obtaining the timestamp(s) and the active rows corresponding to the maximum value is not important; all that is needed is the value itself.
At first I was thinking of using generate_series() to iterate over every minute, count the active intervals for each one, and then take the max of those counts.
You can improve on your idea and check only the "start" values from the table, because the moment with the maximum number of active rows always coincides with one of the "start" points.
select id, start,
(select count(1) from tbl t where tbl.start between t.start and t."end")
from tbl;
Here are the results:
id start count
-----------------
1 00:18:00 1
2 00:22:00 2
3 00:23:00 4
4 00:23:00 4
5 00:24:00 6
6 00:24:00 6
7 00:24:00 6
8 00:25:00 7
9 00:26:00 7
10 00:31:00 8
11 00:33:00 7
So this query gives you the maximum number of rows that were active at the same time:
select
max((select count(1) from tbl t where tbl.start between t.start and t."end"))
from tbl;
max
-----
8
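The question mentions that a procedural algorithm would be straightforward. As a cross-check of the result, here is a minimal Python sketch of one such approach (a sweep over sorted start/end events, which is a different technique from the correlated subquery above). The function name max_overlap is made up for illustration, and the end points are treated as inclusive, matching the BETWEEN condition in the query.

```python
def max_overlap(intervals):
    """Maximum number of intervals active at once; end points are inclusive."""
    events = []
    for start, end in intervals:
        events.append((start, 0, 1))   # 0 sorts first: open before closing at the same minute
        events.append((end, 1, -1))
    active = best = 0
    for _, _, delta in sorted(events):
        active += delta
        best = max(best, active)
    return best


# The rows from the question, as zero-padded "HH:MM" strings (these sort correctly as text).
intervals = [
    ("00:18", "00:23"), ("00:22", "00:31"), ("00:23", "00:48"),
    ("00:23", "00:39"), ("00:24", "00:25"), ("00:24", "00:31"),
    ("00:24", "00:38"), ("00:25", "00:37"), ("00:26", "00:42"),
    ("00:31", "00:34"), ("00:33", "00:38"),
]
print(max_overlap(intervals))  # 8
```

The peak is reached at 00:31, when row 10 opens while rows 2 and 6 have not yet closed.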

Billing cycle, get a date every month (no such Feb 30)

I have a column called anchor, which is a timestamp. I have a row with a value of Jan 30 2020. I want to compare this to Feb 29 2020, and it should give me 1 month, even though it's not 30 days, because Feb has no more days after the 29th. I am trying to bill every month.
Here is my sql fiddle - http://sqlfiddle.com/#!17/6906d/2
create table subscription (
id serial,
anchor timestamp
);
insert into subscription (anchor) values
('2020-01-30T00:00:00.0Z'),
('2019-01-30T00:00:00.0Z');
select id,
anchor,
AGE('2020-02-29T00:00:00.0Z', anchor) as "monthsToFeb29-2020",
AGE('2019-02-28T00:00:00.0Z', anchor) as "monthsToFeb28-2019"
from subscription;
Is it possible to get the age in the way I am describing?
My expected results:
For age from Jan 30 2020 to Feb 29 2020 I expect 1.0 month
For age from Jan 30 2020 to Feb 28 2019 I expect -11.0 months
For age from Jan 30 2019 to Feb 29 2020 I expect 13.0 months
For age from Jan 30 2019 to Feb 28 2019 I expect 1.0 month
(this is how momentjs library does it for those node/js guys out there):
const moment = require('moment');
moment('Jan 30 2019', 'MMM DD YYYY').diff(moment('Feb 29 2020', 'MMM DD YYYY'), 'months', true) === -13.0
moment('Jan 30 2019', 'MMM DD YYYY').diff(moment('Feb 28 2019', 'MMM DD YYYY'), 'months', true) === -1.0
How about:
select round(('2/29/2020'::date - '1/30/2020'::date) / 30.0);
round
-------
1
select round(('02/28/2019'::date - '1/30/2020'::date ) / 30.0);
round
-------
-11
select round(('2/29/2020'::date - '1/30/2019'::date) / 30.0);
round
-------
13
select round(('2/28/2019'::date - '01/30/2019'::date) / 30.0);
round
-------
1
Subtracting one date from another gives you an integer number of days; you then divide by a 30-day month and round to the nearest integer. You could put this in a function and use that.
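Dividing by 30 is an approximation, and it can drift over longer spans: Jan 30 2015 to Jan 30 2020 is 1826 days, and 1826 / 30 rounds to 61 rather than 60. A possible alternative is to count calendar months and clamp the anchor day to the last day of shorter months. The Python sketch below illustrates that idea only; add_months and billing_months are hypothetical helper names, not part of PostgreSQL or moment.js, and partial months are truncated toward zero.

```python
import calendar
from datetime import date


def add_months(anchor, n):
    """Shift anchor by n calendar months, clamping the day to the month's last day."""
    year, month0 = divmod(anchor.year * 12 + (anchor.month - 1) + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(anchor.day, last_day))


def billing_months(anchor, target):
    """Whole billing months from anchor to target (negative if target is earlier)."""
    n = (target.year - anchor.year) * 12 + (target.month - anchor.month)
    shifted = add_months(anchor, n)
    if n > 0 and shifted > target:
        n -= 1  # the n-th billing date has not been reached yet
    elif n < 0 and shifted < target:
        n += 1  # same adjustment when counting backwards
    return n


print(billing_months(date(2020, 1, 30), date(2020, 2, 29)))  # 1
print(billing_months(date(2020, 1, 30), date(2019, 2, 28)))  # -11
print(billing_months(date(2019, 1, 30), date(2020, 2, 29)))  # 13
print(billing_months(date(2019, 1, 30), date(2019, 2, 28)))  # 1
print(round((date(2020, 1, 30) - date(2015, 1, 30)).days / 30))  # 61, days / 30 drifts
print(billing_months(date(2015, 1, 30), date(2020, 1, 30)))  # 60
```

The same add_months helper can also generate the billing dates themselves, for example add_months(date(2020, 1, 30), 1) gives Feb 29 2020.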

Past year sales value flag in tableau

I have a view with sales values for Jan, Feb and Mar, state-wise. I want to show a flag in a column that indicates whether sales exceed the previous month's sales, say for March.
                 Sales  Sales  Sales
City1  Person 1  12     29     10
       Person 2  14     15     19
       Person 3  23     24     11
City2  Person 4  22     28     30
       Person 5  14     15     10
       Person 6  23     24     2
                 Jan    Feb    Mar    and so on
So basically expected output would be:
                 Sales  Sales  Sales  Flag
City1  Person 1  12     29     10     Down arrow
       Person 2  14     15     19     Up arrow
       Person 3  23     24     11     Down arrow
City2  Person 4  22     28     30     Up arrow
       Person 5  14     15     10     Down arrow
       Person 6  23     24     2      Down arrow
                 Jan    Feb    Mar    and so on
Can anyone tell me how to do this in a graph visualisation?
The sales values are SUM(Sales) for Jan, Feb and Mar respectively, for each person within each city.

Day and night average per day in R

I have a data set from April to October with data registered every 5 minutes each day. I want to get the average temperature and RH of day and night for every day, considering "day" from 7:30 to 18:30 and "night" for the rest of the hours.
The table looks like this:
Date Time Temp RH
18/04/2018 00:00:00 21.9 73
18/04/2018 00:05:00 21.9 73
18/04/2018 00:10:00 21.8 73
18/04/2018 00:15:00 21.6 73
18/04/2018 00:20:00 21.6 72
18/04/2018 00:25:00 21.5 72
18/04/2018 00:30:00 21.4 74
And so on until October. I have tried code from similar questions but, for one reason or another, I always get an error. In one example I saw that there is a column with "AM/PM" values to make this simpler, but then I'd have to create this new column for all the rows. I also tried "hourly.apply", but it seems that the function doesn't exist.
What I want to obtain is this:
Date Time Temp RH
18/04/2018 day 25.8 80
18/04/2018 night 17.3 43
19/04/2018 day 24.2 73
19/04/2018 night 15.1 42
I typed the code:
> n=287
> T24_GH111 <- aggregate(GH111[,3],list(rep(1:nrow(GH111%%n+1), each=n, leng=nrow(GH111))),mean)[-1]
But this will give me the average of 24 hours.
Thanks in advance!
Let's start with a simple example and create a data frame with datetimes.
library(lubridate) # for datetime manipulation
# Creating simple example
Datetime <- c(as.POSIXct("2018-04-17 22:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 01:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 10:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 13:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 22:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-19 01:00", tz="Europe/Berlin")
)
x <- c(1,3,10,20,2,5)
df <- data.frame(Datetime,x)
Now, we are using local_time() from the lubridate package to define a new day/night variable.
# Getting local time in hours
df$time <- local_time(df$Datetime, units ="hours")
# Setting day night parameter
t1 <- 7.5 # 07:30
t2 <- 18.5 # 18:30
df$dayNight <- ""
# "day" is 07:30 <= time < 18:30, everything else is "night"
idx <- t1 <= df$time & df$time < t2
df$dayNight[idx] <- "day"
df$dayNight[!idx] <- "night"
To aggregate by day, we need to move all datetimes before 07:30 back by one day, so that the early-morning hours are grouped with the night that started the previous evening. Fortunately, we have already set up the local time. So, let's use this for setting up a dummyDate variable. (This will be the resulting Date.)
nightCondition <- df$time < t1
# Using dummyDate for aggregate for dayNight values per day
df$dummyDate <- df$Datetime
df$dummyDate[nightCondition] <- df$Datetime[nightCondition] - days(1)
df$dummyDate <- floor_date(df$dummyDate, unit = "day") # flooring date for aggregation
df
Datetime x time dayNight dummyDate
1 2018-04-17 22:00:00 1 22 hours night 2018-04-17
2 2018-04-18 01:00:00 3 1 hours night 2018-04-17
3 2018-04-18 10:00:00 10 10 hours day 2018-04-18
4 2018-04-18 13:00:00 20 13 hours day 2018-04-18
5 2018-04-18 22:00:00 2 22 hours night 2018-04-18
6 2018-04-19 01:00:00 5 1 hours night 2018-04-18
Now we have set up all the variables needed to use the aggregate function to calculate the mean of x by dayNight and dummyDate.
# Aggregating x value per dummyDate and daynight variables
dfAgg <- aggregate(df[,2], list(Date = df$dummyDate, Time = df$dayNight), mean)
dfAgg
Date Time x
1 2018-04-18 day 15.0
2 2018-04-17 night 2.0
3 2018-04-18 night 3.5
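For readers who want to check the grouping rule independently of R, here is a minimal Python sketch using only the standard library. It assumes "day" means 07:30 <= time < 18:30, and that readings before 07:30 belong to the night that started on the previous calendar date; the function name day_night_means is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta
from statistics import mean

DAY_START = time(7, 30)
DAY_END = time(18, 30)


def day_night_means(readings):
    """Average the values per (date, 'day' or 'night') group."""
    groups = defaultdict(list)
    for ts, value in readings:
        if DAY_START <= ts.time() < DAY_END:
            key = (ts.date(), "day")
        elif ts.time() < DAY_START:
            # Early morning counts towards the previous date's night.
            key = (ts.date() - timedelta(days=1), "night")
        else:
            key = (ts.date(), "night")
        groups[key].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


# Same six sample points as in the simple example above.
readings = [
    (datetime(2018, 4, 17, 22, 0), 1),
    (datetime(2018, 4, 18, 1, 0), 3),
    (datetime(2018, 4, 18, 10, 0), 10),
    (datetime(2018, 4, 18, 13, 0), 20),
    (datetime(2018, 4, 18, 22, 0), 2),
    (datetime(2018, 4, 19, 1, 0), 5),
]
for (day, part), avg in day_night_means(readings).items():
    print(day, part, avg)
```

With this rule the 22:00 and 01:00 readings are averaged together as one night, and the 10:00 and 13:00 readings form the day of 2018-04-18.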

Calculating Running Avg for YTD Sum with constant denominator for a year

I have the following table from SQL
ID Date Score
-----+-------------+----------
10 2015-01-10 5
20 2015-01-10 5
10 2015-02-10 15
40 2015-02-10 25
30 2015-02-10 5
10 2015-03-10 15
10 2014-01-10 25
20 2014-02-10 35
50 2014-03-10 45
In Tableau I want a line graph to display
(YTD Sum of Score)/Total number of IDs for a year.
For Jan 2015 - 10/4=2.5
For Feb 2015 - 55/4=13.75
For Feb 2014 - 60/3=20
The denominator should remain constant throughout the year and not change monthwise.
Looks like you can achieve your desired result with two calculated fields. First, make a [Year] field with:
year([Date])
Then make a second calculated field as follows:
sum([Score])/sum({fixed [Year] : countd([Id])})
This will sum the score and divide it by the number of distinct IDs for the given year. It uses a Level of Detail (LOD) calculation.
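To check the expected numbers outside Tableau, the arithmetic can be reproduced with a small Python sketch: count the distinct IDs once per year (the constant denominator) and accumulate the monthly sums within each year (the YTD numerator). The function name ytd_average is made up for illustration, and this is not Tableau syntax.

```python
from collections import defaultdict

# (ID, Date, Score) rows from the question
rows = [
    (10, "2015-01-10", 5), (20, "2015-01-10", 5),
    (10, "2015-02-10", 15), (40, "2015-02-10", 25), (30, "2015-02-10", 5),
    (10, "2015-03-10", 15),
    (10, "2014-01-10", 25), (20, "2014-02-10", 35), (50, "2014-03-10", 45),
]


def ytd_average(rows):
    """Map (year, month) to YTD sum of Score / distinct IDs in that year."""
    ids_per_year = defaultdict(set)
    month_sum = defaultdict(int)
    for id_, date_str, score in rows:
        year, month = int(date_str[:4]), int(date_str[5:7])
        ids_per_year[year].add(id_)
        month_sum[(year, month)] += score
    running = defaultdict(int)  # running total per year
    result = {}
    for year, month in sorted(month_sum):
        running[year] += month_sum[(year, month)]
        result[(year, month)] = running[year] / len(ids_per_year[year])
    return result


result = ytd_average(rows)
print(result[(2015, 1)])  # 2.5
print(result[(2015, 2)])  # 13.75
print(result[(2014, 2)])  # 20.0
```

The denominator stays at 4 for every month of 2015 and at 3 for every month of 2014, while the numerator grows month by month.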