Day and night average per day in R - aggregate

I have a data set from April to October with readings recorded every 5 minutes. I want the average temperature and RH for day and night on every date, where "day" runs from 7:30 to 18:30 and "night" covers the remaining hours.
The table looks like this:
Date Time Temp RH
18/04/2018 00:00:00 21.9 73
18/04/2018 00:05:00 21.9 73
18/04/2018 00:10:00 21.8 73
18/04/2018 00:15:00 21.6 73
18/04/2018 00:20:00 21.6 72
18/04/2018 00:25:00 21.5 72
18/04/2018 00:30:00 21.4 74
And so on until October. I have tried code from similar questions, but for one reason or another I always get an error. In one example I saw a column with "AM/PM" values that makes this simpler, but then I would have to create that column for every row. I also tried "hourly.apply", but that function doesn't seem to exist.
What I want to obtain is this:
Date Time Temp RH
18/04/2018 day 25.8 80
18/04/2018 night 17.3 43
19/04/2018 day 24.2 73
19/04/2018 night 15.1 42
I typed the code:
n=287
T24_GH111 <- aggregate(GH111[,3],list(rep(1:nrow(GH111%%n+1), each=n, leng=nrow(GH111))),mean)[-1]
But this will give me the average of 24 hours.
Thanks in advance!

Let's start with a simple example and create a data frame with datetimes.
library(lubridate) # for datetime manipulation
# Creating simple example
Datetime <- c(as.POSIXct("2018-04-17 22:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 01:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 10:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 13:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-18 22:00", tz="Europe/Berlin"),
as.POSIXct("2018-04-19 01:00", tz="Europe/Berlin")
)
x <- c(1,3,10,20,2,5)
df <- data.frame(Datetime,x)
Next, we use local_time() from the lubridate package to get the clock time of each reading and define a new day/night variable.
# Getting local time in hours
df$time <- local_time(df$Datetime, units = "hours")
# Setting day/night boundaries
t1 <- 7.5 # 07:30
t2 <- 18.5 # 18:30
# "day" is 07:30 (inclusive) to 18:30 (exclusive); everything else is "night"
isDay <- t1 <= df$time & df$time < t2
df$dayNight <- ifelse(isDay, "day", "night")
To aggregate by day, a night has to be kept together across midnight, so all readings before 07:30 are assigned to the previous date. We have already set up the local time, so let's use it to build a dummyDate variable. (This will be the resulting Date.)
nightCondition <- df$time < t1
# Using dummyDate to aggregate the dayNight values per day
df$dummyDate <- df$Datetime
df$dummyDate[nightCondition] <- df$Datetime[nightCondition] - days(1)
df$dummyDate <- floor_date(df$dummyDate, unit = "day") # flooring date for aggregation
df
             Datetime  x     time dayNight  dummyDate
1 2018-04-17 22:00:00  1 22 hours    night 2018-04-17
2 2018-04-18 01:00:00  3  1 hours    night 2018-04-17
3 2018-04-18 10:00:00 10 10 hours      day 2018-04-18
4 2018-04-18 13:00:00 20 13 hours      day 2018-04-18
5 2018-04-18 22:00:00  2 22 hours    night 2018-04-18
6 2018-04-19 01:00:00  5  1 hours    night 2018-04-18
Now all variables are in place to use the aggregate function to calculate the mean of x by dayNight and dummyDate.
# Aggregating x value per dummyDate and dayNight variables
dfAgg <- aggregate(df[,2], list(Date = df$dummyDate, Time = df$dayNight), mean)
dfAgg
        Date  Time    x
1 2018-04-18   day 15.0
2 2018-04-17 night  2.0
3 2018-04-18 night  3.5
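For readers who want to check the grouping logic outside R, here is a minimal sketch in Python using only the standard library. It reuses the six example readings from above; the names `label_reading` and `day_night_means` are made up for this illustration, and the choice to treat 07:30 as day and 18:30 as night is an assumption.

```python
from collections import defaultdict
from datetime import datetime, time, timedelta
from statistics import mean

DAY_START = time(7, 30)   # assumed inclusive
DAY_END = time(18, 30)    # assumed exclusive


def label_reading(ts):
    """Return (date, 'day' or 'night') for one timestamp."""
    t = ts.time()
    if DAY_START <= t < DAY_END:
        return ts.date(), "day"
    # Readings before 07:30 belong to the night that started the previous evening.
    night_date = ts.date() - timedelta(days=1) if t < DAY_START else ts.date()
    return night_date, "night"


def day_night_means(readings):
    """Average the values per (date, day/night) group."""
    groups = defaultdict(list)
    for ts, value in readings:
        groups[label_reading(ts)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


readings = [
    (datetime(2018, 4, 17, 22, 0), 1.0),
    (datetime(2018, 4, 18, 1, 0), 3.0),
    (datetime(2018, 4, 18, 10, 0), 10.0),
    (datetime(2018, 4, 18, 13, 0), 20.0),
    (datetime(2018, 4, 18, 22, 0), 2.0),
    (datetime(2018, 4, 19, 1, 0), 5.0),
]
means = day_night_means(readings)
for (day, part), value in means.items():
    print(day, part, value)
```

The key design choice is the same as in the R version: the date is shifted back by one day only for readings before 07:30, so each night is averaged as one block even though it crosses midnight.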

Related

How to split and aggregate days into different month

db fiddle
Running select *, return_date - pickup_date as total from order_history order by id; returns the following result:
id pickup_date return_date date_ranges total
1 2020-03-01 2020-03-12 [2020-03-01,2020-04-01) 11
2 2020-03-01 2020-03-22 [2020-03-01,2020-04-01) 21
3 2020-03-11 2020-03-22 [2020-03-01,2020-04-01) 11
4 2020-02-11 2020-03-22 [2020-02-01,2020-03-01) 40
5 2020-01-01 2020-01-22 [2020-01-01,2020-02-01) 21
6 2020-01-01 2020-04-22 [2020-01-01,2020-02-01) 112
For example:
--id=6. total = 112. 112 = 22 + 31 + 29 + 30
--therefore the total should be split: jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
First split, then aggregate. Aggregate over the range from min(pickup_date) to max(return_date), then cast with to_char to 'YYYY-MM'. In this case the aggregate should group by 2020-01, 2020-02, 2020-03, 2020-04.
But if pickup_date is in the same month as return_date, then compute return_date - pickup_date and sum the result, grouped by to_char(pickup_date,'YYYY-MM').
step-by-step demo: db<>fiddle
Not quite perfect, but a sketch:
SELECT
    id,
    ARRAY_AGG(                                                   -- 4
        LEAST(return_date, gs + interval '1 month - 1 day')      -- 2
        - GREATEST(pickup_date, gs)                              -- 3
        + interval '1 day'
    )
FROM order_history,
    generate_series(                                             -- 1
        date_trunc('month', pickup_date),
        date_trunc('month', return_date),
        interval '1 month'
    ) gs
GROUP BY id
1. Generate the set of months that are covered by the given date range.
2. a) Calculate the last day of the month (the first of a month + 1 month is the first of the next month; minus 1 day is the last day of the current month). This is the latest possible return day within this month. b) If the return happened earlier, take the earlier day (LEAST()).
3. Same for the pickup day (GREATEST()). Afterwards, calculate the difference, i.e. the number of days kept within that month.
4. Aggregate the per-month values for each id.
Open questions / Potential enhancements:
You said:
jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
Why is JAN given with 30 days? On the other hand you count APR 22 days (1st - 22nd). Following the logic, JAN should be 31, shouldn't it?
If you don't want to count the very first day, then you can change (3.) to
GREATEST(pickup_date + interval '1 day', gs)
There's a problem with daylight saving time in March (30 days, 23 hours instead of 31 days). This can be handled with some rounding, for example.
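The splitting logic can also be sketched outside SQL. The following Python example is only an illustration under stated assumptions: the function name `split_days_by_month` and the `count_first_day` switch are hypothetical, and working with plain dates sidesteps the daylight saving issue mentioned above. It clamps the range to each month (the LEAST/GREATEST step) and counts the days that remain.

```python
from datetime import date, timedelta


def split_days_by_month(pickup, return_date, count_first_day=True):
    """Split the days between pickup and return_date into per-month counts."""
    # Optionally skip the pickup day itself, as discussed in the open questions.
    start = pickup if count_first_day else pickup + timedelta(days=1)
    result = {}
    month_start = date(start.year, start.month, 1)
    while month_start <= return_date:
        # First day of the following month.
        if month_start.month == 12:
            next_month = date(month_start.year + 1, 1, 1)
        else:
            next_month = date(month_start.year, month_start.month + 1, 1)
        month_end = next_month - timedelta(days=1)
        first = max(start, month_start)      # like GREATEST(pickup_date, gs)
        last = min(return_date, month_end)   # like LEAST(return_date, end of month)
        days = (last - first).days + 1
        if days > 0:
            result[month_start.strftime("%Y-%m")] = days
        month_start = next_month
    return result


# Row id=6 from the question: 2020-01-01 to 2020-04-22
inclusive = split_days_by_month(date(2020, 1, 1), date(2020, 4, 22))
exclusive = split_days_by_month(date(2020, 1, 1), date(2020, 4, 22), count_first_day=False)
print(inclusive)
print(exclusive)
```

With `count_first_day=False` the per-month counts add up to the 112 days that return_date - pickup_date yields for id=6; with the pickup day included, January has 31 days and the total is 113.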

PostgreSQL - How can I SUM until a certain hour of the day?

I'm trying to create a metric for a PostgreSQL integrated dashboard that shows today's "Total Payment Value" (TPV) of a certain product, as well as yesterday's TPV of the same product up until the same moment as today. So if I'm accessing the dashboard at 5 pm, it should show yesterday's TPV until 5 pm and today's TPV.
edit: My question wasn't very clear so I'm adding a few more lines and editing the query, which had a mistake.
I tried this:
select
sum(case when table.product in (13,14,15,16) then amount else 0 end) as "TPV"
,date_trunc('day', table.date) as "Day"
from table
where
date > current_date - 1
group by date_trunc('day', table.date)
order by 2,1
I only want to sum the amount when product = 13, 14, 15 or 16
An example of the product, date and amount would be like this:
product amount date
8 4750 19/03/2019 00:21
14 7840 12/04/2019 22:40
14 15000 22/03/2019 18:27
14 11715 19/03/2019 00:12
14 1054 22/03/2019 18:22
14 18491 17/03/2019 14:28
14 12253 17/03/2019 14:30
14 27600 17/03/2019 14:32
14 3936 17/03/2019 14:28
14 19007 19/03/2019 00:14
8 9400 19/03/2019 00:21
8 4750 19/03/2019 00:21
8 25000 19/03/2019 00:17
14 10346 22/03/2019 18:23
I would like to have a metric that always calculates the sum of the product value today up until the current moment - when the "product" corresponds to values 13, 14, 15 or 16 - as well as the same metric for yesterday, e.g., it's 1 PM now, I want today's TPV until 1 PM and yesterday's TPV until 1 PM as well!
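To make the requirement concrete, here is a small sketch of the cutoff logic in Python rather than SQL. Everything in it is an assumption for illustration: the function name `tpv_until_now`, the sample rows, and the fixed `now` value are all made up. Today's window runs from midnight until now, and yesterday's window runs from yesterday's midnight until the same clock time one day earlier.

```python
from datetime import datetime, timedelta

PRODUCTS = {13, 14, 15, 16}  # only these products count towards TPV


def tpv_until_now(rows, now):
    """Sum amounts for today up to `now` and for yesterday up to the same clock time."""
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    yesterday_start = today_start - timedelta(days=1)
    yesterday_cutoff = now - timedelta(days=1)
    totals = {"today": 0, "yesterday": 0}
    for product, amount, ts in rows:
        if product not in PRODUCTS:
            continue
        if today_start <= ts <= now:
            totals["today"] += amount
        elif yesterday_start <= ts <= yesterday_cutoff:
            totals["yesterday"] += amount
    return totals


# Hypothetical rows: (product, amount, date)
rows = [
    (14, 1000, datetime(2019, 3, 22, 9, 0)),    # today, before the cutoff
    (8, 4750, datetime(2019, 3, 22, 10, 0)),    # wrong product, ignored
    (14, 2000, datetime(2019, 3, 21, 12, 0)),   # yesterday, before 13:00
    (15, 3000, datetime(2019, 3, 21, 18, 0)),   # yesterday, after 13:00, ignored
]
totals = tpv_until_now(rows, now=datetime(2019, 3, 22, 13, 0))
print(totals)
```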

PostgreSQL - filter function for dates

I am trying to use the built-in filter function in PostgreSQL to filter for a date range in order to sum only entries falling within this time-frame.
I cannot understand why the filter isn't being applied.
I am trying to filter for all product transactions that have a created_at date of the previous month (so in this case that were created in June 2017).
SELECT pt.created_at::date, pt.customer_id,
       sum(pt.amount/100::double precision) filter (where (date_part('month', pt.created_at) = date_part('month', NOW() - interval '1 month') and
           date_part('year', pt.created_at) = date_part('year', NOW()) ))
from
    product_transactions pt
    LEFT JOIN customers c
        ON c.id = pt.customer_id
GROUP BY pt.created_at::date, pt.customer_id
Please find my expected results (sum of the amount for each day in the previous month - for each customer_id if an entry for that day exists) and the actual results I get from the query - below (using date_trunc).
Expected results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
Results:
created_at| customer_id | amount
2017-06-30 1 220.5
2017-06-28 15 34.8
2017-06-28 12 157
2017-06-28 48 105.6
2017-06-27 332 425.8
2017-06-25 1 58.0
2017-06-25 23 22.5
2017-06-21 14 88.9
2017-06-17 2 34.8
2017-06-12 87 250
2017-06-05 48 135.2
2017-06-05 12 95.7
2017-06-01 44 120
2017-05-30 XX YYY
2017-05-25 XX YYY
2017-05-15 XX YYY
2017-04-30 XX YYY
2017-03-02 XX YYY
2016-11-02 XX YYY
The actual results give me the sum for all dates in the database, so no date time-frame is being applied in the query for a reason I cannot understand. I'm seeing dates that are both not for June 2017 and also from previous years.
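Use the date_trunc(..) function, which compares month and year in one step:
SELECT pt.created_at::date, pt.customer_id, c.name,
       sum(pt.amount/100::double precision) filter (where date_trunc('month', pt.created_at) = date_trunc('month', NOW() - interval '1 month'))
from
    product_transactions pt
    LEFT JOIN customers c
        ON c.id = pt.customer_id
GROUP BY pt.created_at::date, pt.customer_id, c.name
Note that FILTER only restricts which rows feed the sum; it does not remove groups, so dates from other months still appear in the result (with a NULL sum). To drop those rows entirely, put the same condition in a WHERE clause.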
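The difference between filtering inside the aggregate and filtering rows before grouping may be easier to see outside SQL. The sketch below mimics both behaviours in plain Python; the three sample rows and the fixed target month are hypothetical, and this is only an illustration of the semantics, not of how PostgreSQL executes the query.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (created_at, customer_id, amount in cents)
rows = [
    (date(2017, 6, 30), 1, 22050),
    (date(2017, 6, 28), 15, 3480),
    (date(2017, 5, 30), 7, 1000),
]
TARGET = (2017, 6)  # "previous month" fixed for the example


def in_target_month(d):
    return (d.year, d.month) == TARGET


# FILTER-like: every group exists; only matching rows are added to its sum.
filter_style = {}
for created, customer, cents in rows:
    key = (created, customer)
    filter_style.setdefault(key, None)  # the group appears even if nothing matches
    if in_target_month(created):
        filter_style[key] = (filter_style[key] or 0) + cents / 100

# WHERE-like: non-matching rows are discarded before grouping.
where_sums = defaultdict(float)
for created, customer, cents in rows:
    if in_target_month(created):
        where_sums[(created, customer)] += cents / 100
where_style = dict(where_sums)

print(filter_style)
print(where_style)
```

In the FILTER-like result the May row is still present with an empty sum, which matches the extra dates in the actual results above; in the WHERE-like result it is gone.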

Calculating Running Avg for YTD Sum with constant denominator for a year

I have the following table from SQL
ID Date Score
-----+-------------+----------
10 2015-01-10 5
20 2015-01-10 5
10 2015-02-10 15
40 2015-02-10 25
30 2015-02-10 5
10 2015-03-10 15
10 2014-01-10 25
20 2014-02-10 35
50 2014-03-10 45
In Tableau I want a line graph to display
(YTD Sum of Score)/Total number of IDs for a year.
For Jan 2015 - 10/4=2.5
For Feb 2015 - 55/4=13.75
For Feb 2014 - 60/3=20
The denominator should remain constant throughout the year and not change monthwise.
Looks like you can achieve your desired result with two calculated fields. First, make a [Year] field with:
year([Date])
Then make a second calculated field as follows:
sum([Score])/sum({fixed [Year] : countd([Id])})
This will sum the score and divide it by the number of distinct IDs for the given year. It uses a Level of Detail (LOD) expression, so the denominator stays fixed per year.
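As a cross-check of the arithmetic, the sketch below recomputes the year-to-date sum divided by the number of distinct IDs per year in Python, using the nine rows from the question. It is not Tableau code; the variable names are made up for the illustration.

```python
from collections import defaultdict
from datetime import date

# (ID, Date, Score) rows from the question
rows = [
    (10, date(2015, 1, 10), 5),
    (20, date(2015, 1, 10), 5),
    (10, date(2015, 2, 10), 15),
    (40, date(2015, 2, 10), 25),
    (30, date(2015, 2, 10), 5),
    (10, date(2015, 3, 10), 15),
    (10, date(2014, 1, 10), 25),
    (20, date(2014, 2, 10), 35),
    (50, date(2014, 3, 10), 45),
]

ids_per_year = defaultdict(set)   # distinct IDs per year (the constant denominator)
monthly_score = defaultdict(int)  # score summed per (year, month)
for id_, day, score in rows:
    ids_per_year[day.year].add(id_)
    monthly_score[(day.year, day.month)] += score

ytd_avg = {}
running = defaultdict(int)  # running (year-to-date) sum per year
for year, month in sorted(monthly_score):
    running[year] += monthly_score[(year, month)]
    ytd_avg[(year, month)] = running[year] / len(ids_per_year[year])

for key, value in ytd_avg.items():
    print(key, value)
```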

Group by each date in a range having events with StartingTimestamp and EndingTimestamp

I would like to count all the events in a calendar within January and group them by date. Each event has a StartingTimestamp and an EndingTimestamp.
For example (Table rp.Calendar):
StartingTimestamp EndingTimestamp Title
24.01.2014 08:00 24.01.2014 10:00 Meeting
25.01.2014 17:00 26.01.2014 08:00 Home time
24.01.2014 26.01.2014 Holiday
26.01.2014 17:00 29.01.2014 08:00 Weekend
The result I need is:
Date Counter
24.01.2014 2
25.01.2014 2
26.01.2014 3
27.01.2014 1
28.01.2014 1
29.01.2014 1
This is your answer:
SELECT CONVERT(varchar(10),StartingTimestamp,110) AS Date, Count(*) AS Counter
FROM YourTableName
GROUP BY CONVERT(varchar(10),StartingTimestamp,110)
Change 110 to the desired style code:
101 mm/dd/yyyy
102 yyyy.mm.dd
103 dd/mm/yyyy
104 dd.mm.yyyy
105 dd-mm-yyyy
106 dd mon yyyy
107 Mon dd, yyyy
108 hh:mm:ss
110 mm-dd-yyyy
111 yyyy/mm/dd
112 yyyymmdd
see more on http://technet.microsoft.com/en-us/library/aa226054(v=sql.80).aspx
This will do for January, or any single month, as long as each event starts and ends within that month; it can be tweaked for longer periods if required:
WITH January AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n+1 FROM January WHERE n+1<=31
)
SELECT n, COUNT(*)
FROM January
JOIN yourtable ON n BETWEEN datepart(d,StartingTimestamp) AND datepart(d,EndingTimestamp)
GROUP BY n
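The idea behind the second query, expanding every event to each calendar day it touches and then counting per day, can be sketched in Python as follows. The four events are taken from the question with the times of day dropped, and the function name `events_per_day` is made up for this illustration.

```python
from collections import Counter
from datetime import date, timedelta

# (start date, end date, title) from the question, times of day dropped
events = [
    (date(2014, 1, 24), date(2014, 1, 24), "Meeting"),
    (date(2014, 1, 25), date(2014, 1, 26), "Home time"),
    (date(2014, 1, 24), date(2014, 1, 26), "Holiday"),
    (date(2014, 1, 26), date(2014, 1, 29), "Weekend"),
]


def events_per_day(events):
    """Count how many events cover each calendar day."""
    counter = Counter()
    for start, end, _title in events:
        day = start
        while day <= end:  # walk every day from start to end, inclusive
            counter[day] += 1
            day += timedelta(days=1)
    return dict(sorted(counter.items()))


counts = events_per_day(events)
for day, n in counts.items():
    print(day.strftime("%d.%m.%Y"), n)
```

Because it walks real dates instead of day-of-month numbers, this version also handles events that cross a month boundary.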