Generate random dates within a range in T-SQL

Generate 650 random datetimes between 3 Jan and 16 Jan 2023, excluding weekends, with the time restricted to working hours (9 AM to 5 PM) and randomised right down to the millisecond.

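-- Draw the random candidates once and store them, so each NEWID()-based
-- value is computed exactly once before it is filtered and sorted.
SELECT TOP (1000)
    DATEADD(MILLISECOND, ABS(CHECKSUM(NEWID()) % 28800000),   -- random offset up to 7:59:59.999
        DATEADD(HOUR, 9,                                        -- start of the working day
            DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 14),           -- random day, 3 Jan to 16 Jan
                CAST('2023-01-03' AS datetime2(3))))) AS RandomDate
INTO #Candidates
FROM sys.all_objects;

SELECT RandomDate
FROM (
    SELECT TOP (650) RandomDate   -- no ORDER BY here, so the latest dates are not cut off
    FROM #Candidates
    WHERE (DATEPART(WEEKDAY, RandomDate) + @@DATEFIRST) % 7 NOT IN (0, 1)   -- 0 = Saturday, 1 = Sunday
) AS picked
ORDER BY RandomDate;

DROP TABLE #Candidates;
In this query, nested DATEADD calls build each random value. The innermost adds a random number of days (0 to 13) to 3 Jan 2023, giving a date from 3 Jan to 16 Jan 2023. The next adds 9 hours to move to the start of the working day. The outermost adds a random number of milliseconds from 0 to 28,799,999 (8 hours), so the time falls anywhere between 09:00:00.000 and 16:59:59.999 and the seconds are random as well. The start date is cast to datetime2(3) because the datetime type rounds milliseconds to increments of .000, .003 or .007 seconds. Applying % before ABS avoids the overflow error ABS raises when CHECKSUM returns -2147483648. Storing the candidates in #Candidates first means each random value is computed once, before it is filtered. The WHERE clause removes Saturdays and Sundays; adding @@DATEFIRST makes the test independent of the session's DATEFIRST setting, whereas DATEPART(WEEKDAY, ...) NOT IN (1, 7) only works when DATEFIRST is 7 (the us_english default). No extra BETWEEN filter is needed: a single range filter cannot restrict the time of day on each date, and the working-hours limit is already built into the generated values. Of the 1,000 candidates about 714 are expected to fall on the 10 weekdays in the range, so 650 rows are almost always available; draw more candidates if exactly 650 must be guaranteed. Finally, TOP (650) is applied in a derived table without ORDER BY and the sorting is done outside it, because TOP 650 combined with ORDER BY RandomDate would keep only the 650 earliest values and systematically drop the latest dates.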
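For checking the expected shape of the result outside SQL Server, the same logic can be sketched in Python. This is only an illustration, not part of the T-SQL answer; the function name `random_working_datetimes` and the `seed` parameter are hypothetical. Instead of filtering weekends afterwards, it picks one of the working days directly, so it always returns exactly n values.

```python
import random
from datetime import date, datetime, time, timedelta


def random_working_datetimes(n, start, end, seed=None):
    """Return n sorted random datetimes that fall on a weekday between start
    and end (inclusive), between 09:00:00.000 and 16:59:59.999, with
    millisecond resolution. Hypothetical helper for illustration."""
    rng = random.Random(seed)
    # Enumerate the calendar days once and keep Monday..Friday (weekday() 0..4).
    span = (end - start).days + 1
    workdays = [start + timedelta(days=i) for i in range(span)]
    workdays = [d for d in workdays if d.weekday() < 5]
    eight_hours_ms = 8 * 60 * 60 * 1000  # 28,800,000 ms between 9 AM and 5 PM
    values = []
    for _ in range(n):
        day = rng.choice(workdays)
        offset_ms = rng.randrange(eight_hours_ms)  # 0 .. 28,799,999
        values.append(datetime.combine(day, time(9)) + timedelta(milliseconds=offset_ms))
    return sorted(values)


if __name__ == "__main__":
    sample = random_working_datetimes(650, date(2023, 1, 3), date(2023, 1, 16), seed=42)
    print(len(sample), sample[0], sample[-1])
```

Choosing uniformly among the ten working days gives the same distribution as drawing from all 14 days and discarding weekends, without the risk of ending up with fewer rows than requested.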

Related

Certain Range of Date in each Month

I'd like to get the days 20 to 25 of each month in BigQuery, but I don't know what syntax to use. For example:
Jan 20 - 25
Feb 20 - 25
and so on
The only approach I can think of is creating a CTE for every month and then combining them with UNION ALL.
Consider the query below.
SELECT DATE_ADD(month, INTERVAL day - 1 DAY) date_range,
FROM UNNEST(GENERATE_DATE_ARRAY('2022-01-01', '2022-03-01', INTERVAL 1 MONTH)) month,
UNNEST(GENERATE_ARRAY(20, 25)) day;
The query below is simpler than my original answer, and you can adjust the date range by changing the condition in the WHERE clause.
SELECT *
FROM UNNEST(GENERATE_DATE_ARRAY('2022-01-01', '2022-12-31', INTERVAL 1 DAY)) date_range
WHERE EXTRACT(DAY FROM date_range) BETWEEN 20 AND 25
For the use case in your comment (the 21st onwards plus the 1st of each month):
WHERE EXTRACT(DAY FROM date_range) >= 21 OR EXTRACT(DAY FROM date_range) = 1
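The filter-based version walks every day of the year and keeps the ones whose day of month passes the WHERE condition. A small Python illustration of both conditions (`dates_where` is a hypothetical helper, not a BigQuery function):

```python
from datetime import date, timedelta


def dates_where(start, end, keep):
    """Walk every day from start to end inclusive, like
    GENERATE_DATE_ARRAY(start, end, INTERVAL 1 DAY), and keep the dates
    whose day of month satisfies keep(), like the WHERE clause."""
    result = []
    current = start
    while current <= end:
        if keep(current.day):  # EXTRACT(DAY FROM date_range)
            result.append(current)
        current += timedelta(days=1)
    return result


days_20_to_25 = dates_where(date(2022, 1, 1), date(2022, 12, 31), lambda d: 20 <= d <= 25)
late_or_first = dates_where(date(2022, 1, 1), date(2022, 12, 31), lambda d: d >= 21 or d == 1)
```

For 2022 the first condition keeps 72 days (6 per month) and the second keeps 137: 125 days from the 21st onwards plus the 12 first-of-month dates.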

How to split and aggregate days into different month

Running select *, return_date - pickup_date as total from order_history order by id; returns the following result:
id pickup_date return_date date_ranges total
1 2020-03-01 2020-03-12 [2020-03-01,2020-04-01) 11
2 2020-03-01 2020-03-22 [2020-03-01,2020-04-01) 21
3 2020-03-11 2020-03-22 [2020-03-01,2020-04-01) 11
4 2020-02-11 2020-03-22 [2020-02-01,2020-03-01) 40
5 2020-01-01 2020-01-22 [2020-01-01,2020-02-01) 21
6 2020-01-01 2020-04-22 [2020-01-01,2020-02-01) 112
for example:
--id=6: total = 112, and 112 = 22 + 31 + 29 + 30
--therefore the total should be split: jan2020: 30, feb2020: 29, march2020: 31, 2020apr: 22.
First split, then aggregate. The aggregation should cover the range from min(pickup_date) to max(return_date), with each month converted with to_char to 'YYYY-MM'; in this case the result should be grouped by 2020-01, 2020-02, 2020-03 and 2020-04.
But if pickup_date is in the same month as return_date, compute return_date - pickup_date, then sum the results grouped by to_char(pickup_date, 'YYYY-MM').
Not quite perfect, but a sketch:
SELECT
id,
ARRAY_AGG( -- 4
LEAST(return_date, gs + interval '1 month - 1 day') -- 2
- GREATEST(pickup_date, gs) -- 3
+ interval '1 day'
ORDER BY gs -- keep the months in calendar order
)
FROM order_history,
generate_series( -- 1
date_trunc('month', pickup_date),
date_trunc('month', return_date),
interval '1 month'
) gs
GROUP BY id
1. Generate the set of months included in the given date range.
2. a) Calculate the last day of each month (the first of a month + 1 month is the first of the next month; minus 1 day gives the last day of the current month). This is the latest possible return day in this month. b) If the return happened earlier, take the earlier day (LEAST()).
3. Do the same for the pickup day, taking the later of pickup_date and the first of the month (GREATEST()). Then calculate the difference between the two days, plus 1 day so that both ends are counted, to get the days that fall in this month.
4. Aggregate the per-month values (here into one array per id).
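The four steps can also be sketched in Python, for illustration only; the function names and the ORDER_HISTORY list are hypothetical. It follows the answer's counting, where both the pickup day and the return day count, so id 6 gets 31 days for January rather than the question's 30. Unlike the SQL sketch, step 4 here sums per month across all rentals, which is the grouping the question asks for.

```python
from collections import Counter
from datetime import date, timedelta


def next_month(d):
    """First day of the month after d's month."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def split_by_month(pickup, ret):
    """Return {'YYYY-MM': days} for one rental, counting every calendar day
    from pickup to ret inclusive (LEAST - GREATEST + 1 day)."""
    result = {}
    gs = pickup.replace(day=1)                       # 1: months in the range
    while gs <= ret:
        last_day = next_month(gs) - timedelta(days=1)
        hi = min(ret, last_day)                      # 2: LEAST(...)
        lo = max(pickup, gs)                         # 3: GREATEST(...)
        result[gs.strftime("%Y-%m")] = (hi - lo).days + 1
        gs = next_month(gs)
    return result


def aggregate(rentals):
    """4: sum the per-month day counts over all rentals."""
    totals = Counter()
    for pickup, ret in rentals:
        totals.update(split_by_month(pickup, ret))
    return dict(sorted(totals.items()))


ORDER_HISTORY = [  # (pickup_date, return_date) for ids 1..6 from the question
    (date(2020, 3, 1), date(2020, 3, 12)),
    (date(2020, 3, 1), date(2020, 3, 22)),
    (date(2020, 3, 11), date(2020, 3, 22)),
    (date(2020, 2, 11), date(2020, 3, 22)),
    (date(2020, 1, 1), date(2020, 1, 22)),
    (date(2020, 1, 1), date(2020, 4, 22)),
]

if __name__ == "__main__":
    print(split_by_month(*ORDER_HISTORY[5]))
    print(aggregate(ORDER_HISTORY))
```

Switching to the question's return_date - pickup_date convention would mean dropping the first day from the count, which is the change the answer describes for step 3.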
Open questions / Potential enhancements:
You said:
jan2020: 30, feb2020:29, march2020: 31, 2020apr:22.
Why is JAN given with 30 days? On the other hand you count APR 22 days (1st - 22nd). Following the logic, JAN should be 31, shouldn't it?
If you don't want to count the very first day, then you can change (3.) to
GREATEST(pickup_date + interval '1 day', gs)
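There's a problem with daylight saving time in March (30 days 23 hours instead of 31 days), likely because date_trunc() on a date resolves to its timestamp with time zone variant, so in a session time zone with DST the subtraction is affected by the clock change. This can be handled by rounding, for example, or by casting the month boundaries back to date ((gs + interval '1 month - 1 day')::date and gs::date) and adding 1 instead of interval '1 day', so that date minus date yields whole days as an integer.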

Group by year, calculate sum and percentage per year

I have a table with the columns datefield and area.
I want to calculate sum of area per year and a percentage column
year sum percentage
2022 5 12
2023 10 24
2024 6 15
[null] 20 49
(I have many more years in the table which I want to include)
WITH total as(
select extract(YEAR from "datefield") theyear, sum(area) as totalarea
from thetable
group by extract(YEAR from "datefield")
)
select total.theyear, total.totalarea,
totalarea/(SUM(totalarea) OVER (PARTITION BY theyear))*100
from total
I get the correct sums, but all the percentages are 100.
What am I doing wrong?
Some sample data:
2019 7.05
2020 4.77
2020 3.56
2021 1.64
2021 8.37
2021 3.51
2021 1.43
2021 9.94
2022 1.91
2022 5.3
I would like the result
2019 7.05 15
2020 8.33 18
2021 24.89 52
2022 7.21 15
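The percentages are all 100 because PARTITION BY theyear gives every year its own window, so each year's total is divided by itself. Compute the grand total over all years with an empty OVER () instead (100.0 also avoids integer division if area is an integer column):
WITH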
total as
(
select extract(YEAR from "datefield") theyear, sum(area) as totalarea,
SUM(sum(area)) OVER() as SUM_totalarea
from thetable
group by extract(YEAR from "datefield")
)
SELECT theyear, totalarea, 100.0 * totalarea / SUM_totalarea AS PERCENTAGE
FROM total
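The effect of the empty OVER () can be checked with a small Python sketch (`yearly_share` and `SAMPLE` are hypothetical names; the numbers are the question's sample data). The grand total is computed once over all years, which is what SUM(sum(area)) OVER () does, and each yearly total is divided by it.

```python
from collections import defaultdict


def yearly_share(rows):
    """rows: iterable of (year, area); year may be None (a NULL datefield).
    Returns [(year, total_area, percent_of_grand_total)], NULL year last."""
    totals = defaultdict(float)
    for year, area in rows:
        totals[year] += area                      # sum(area) ... GROUP BY year
    grand_total = sum(totals.values())            # SUM(sum(area)) OVER ()
    ordered = sorted(totals.items(), key=lambda kv: (kv[0] is None, kv[0] or 0))
    return [(year, total, 100.0 * total / grand_total) for year, total in ordered]


SAMPLE = [(2019, 7.05), (2020, 4.77), (2020, 3.56), (2021, 1.64), (2021, 8.37),
          (2021, 3.51), (2021, 1.43), (2021, 9.94), (2022, 1.91), (2022, 5.3)]

if __name__ == "__main__":
    for year, total, pct in yearly_share(SAMPLE):
        print(year, round(total, 2), round(pct))
```

Rounded, the percentages come out as 15, 18, 52 and 15, matching the desired result.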

7 Day Return/Retention Rate

I've been trying to calculate the 7 Day Return Rate (also known as Classic Retention Rate, as described here: https://www.braze.com/blog/calculate-retention-rate/) in PostgreSQL, and then take a 30-day rolling average to reduce noise.
However, I'm sure I'm doing something wrong. First of all, the numbers look far higher than I would intuitively expect (the rest of the sector is generally around 5%). Also, I believe the first 7 days should show 0, as users need at least 7 days to count as a "return". However, I get around 40-70%, as shown below.
Would someone mind taking a look at the code below and seeing if there are any errors? 7 Day Return Rate is a really common metric for apps, and I haven't found any questions using postgresql that calculate it to this level of sophistication on Stack Exchange (or even the rest of the web), so I feel like a solid response could be very useful to a lot of people.
Sample output (rolling average retention rate, in %):
Wednesday, August 1, 2018 12:00 AM 71.14
Thursday, August 2, 2018 12:00 AM 55.44
Friday, August 3, 2018 12:00 AM 50.09
Saturday, August 4, 2018 12:00 AM 45.81
Sunday, August 5, 2018 12:00 AM 43.27
Monday, August 6, 2018 12:00 AM 40.61
Tuesday, August 7, 2018 12:00 AM 39.38
Wednesday, August 8, 2018 12:00 AM 38.46
Thursday, August 9, 2018 12:00 AM 36.81
Friday, August 10, 2018 12:00 AM 35.94
with
user_first_event as (
select distinct id, min(timestamp)::date as first_event_date
from log
where
timestamp <= current_date
and timestamp >= {{start_date}} and timestamp <= {{end_date}}
group by id),
event as (
select distinct id, timestamp::date as user_event_date
from log
where timestamp <= current_date and timestamp >= {{start_date}}),
gap as (
select
user_first_event.id,
user_first_event.first_event_date,
event.user_event_date,
event.user_event_date - user_first_event.first_event_date as days_since_signup
from user_first_event
join event on user_first_event.id = event.id
where user_first_event.first_event_date <= event.user_event_date),
conversion_rate as (
select
first_event_date,
(sum(case when days_since_signup = 7 then 1 else 0 end) * 100.0 /
count(distinct id)
) as seven_day_retention_rate
from gap
group by first_event_date
)
SELECT first_event_date,
AVG(seven_day_retention_rate)
OVER(ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_avg_retention_rate
FROM conversion_rate
The problem is simpler than your query makes it seem: you can do it with just one subquery and one outer query, as follows:
select first_event_date
, avg(seven_day_return) as seven_day_return_day_only
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user, 1 if they retain and 0 if they do not
select min(timestamp)::date as first_event_date
, case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;
Note that this weights each day equally rather than each user equally across days. If you want the average weighted by users, change the outer calculation to use additional aggregates and windows, for example a rolling sum of returning users divided by a rolling sum of users.
Reference: http://sqlfiddle.com/#!17/ee17e/1/0
If you don't have access to array_agg (but do have window functions), you can replace the array check with a correlated EXISTS subquery:
select first_event_date
, avg(seven_day_return) as day_seven_day_return
, avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
--inner query to get value for user
select min(timestamp)::date as first_event_date
, case when exists(select 1 from log l2 where l2.id = log.id and l2.timestamp::date = min(log.timestamp)::date + 7) then 1 else 0 end as seven_day_return
from log
group by id ) t
group by t.first_event_date;
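As a cross-check of the logic, here is a Python sketch rather than a PostgreSQL query. It builds the same per-user flag as the inner query (1 if the user has an event exactly 7 days after their first event), averages it per first_event_date, and computes both the day-weighted rolling average used above and a user-weighted alternative. The function names are hypothetical, and `events` is assumed to be a list of (id, date) pairs taken from the log table.

```python
from collections import defaultdict
from datetime import date, timedelta


def seven_day_flags(events):
    """events: iterable of (user_id, event_date).
    Returns {first_event_date: [flag, ...]} with one flag per user:
    1 if the user has an event exactly 7 days after their first event, else 0."""
    days_by_user = defaultdict(set)
    for user_id, day in events:
        days_by_user[user_id].add(day)
    cohorts = defaultdict(list)
    for days in days_by_user.values():
        first = min(days)
        cohorts[first].append(1 if first + timedelta(days=7) in days else 0)
    return dict(cohorts)


def rolling_rates(cohorts, window=30):
    """For each cohort date return (daily rate, day-weighted rolling average,
    user-weighted rolling average) over the current and up to window - 1
    preceding cohort rows, like ROWS BETWEEN 29 PRECEDING AND CURRENT ROW."""
    dates = sorted(cohorts)
    result = {}
    for i, day in enumerate(dates):
        frame = dates[max(0, i - window + 1): i + 1]
        daily = [sum(cohorts[d]) / len(cohorts[d]) for d in frame]   # avg(seven_day_return)
        users = sum(len(cohorts[d]) for d in frame)
        returned = sum(sum(cohorts[d]) for d in frame)
        result[day] = (daily[-1], sum(daily) / len(daily), returned / users)
    return result


if __name__ == "__main__":
    demo = [("a", date(2023, 1, 1)), ("a", date(2023, 1, 8)), ("b", date(2023, 1, 1))]
    print(rolling_rates(seven_day_flags(demo)))
```

Like the SQL ROWS frame, the window counts cohort rows, not calendar days, so dates with no first events simply do not appear in the frame.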

Date range search in T-SQL using nvarchar

Apologies for not using the correct type (date). Using nvarchar was a poor choice, but I cannot convert the column at this stage.
To the question:
I want to be able to search for data in a certain date range, e.g., 10.01.16 to 19.02.16 (dd.mm.yy).
However, it seems to compare only the first two digits, so it returns everything with a day between 10 and 19, regardless of month and year.
My query is as follows:
SELECT ID, Day, Date FROM oneHr$
WHERE date >= CONVERT(NVARCHAR, '10.01.16', 4)
AND date <= CONVERT(NVARCHAR , '19.02.16', 4)
ORDER BY Date ASC
Any ideas? Help very much appreciated and thanks in advance.
This is what is being returned:
ID Day Date
--------------------
943 fri 10.02.15
746 mon 10.02.16
234 tue 10.03.15
835 fri 10.04.15
988 tue 10.05.15
487 wed 11.01.16
343 wed 11.02.15
874 mon 12.01.16
663 thu 12.01.15
198 tue 12.02.15
775 wed 13.01.16
993 thu 14.01.15
375 fri 15.03.15
337 wed 16.12.15
784 tue 17.11.15
777 mon 18.08.15
252 thu 19.01.16
664 wed 19.02.15
UPDATE
So, I've changed Date to type datetime and everything looks good. However, I'm trying to define the range with variables rather than hard-code it, and it isn't working. Any ideas?
set #date1 = '2016-01-01 00:00:00' -- Date1 (start range)
set #date2 = '2016-01-10 00:00:00' -- Date2 (end range)
/* Not Working */
select * from oneHr$
where Date >= #date1
and Date <= #date2
order by ID
/* Working */
select * from oneHr$
where Date >= '2016-01-01 00:00:00'
and Date <= '2016-01-10 00:00:00'
order by ID
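Because the column is nvarchar, the comparison is done on strings, character by character, so the leading day digits decide the result. Convert the column to a date in the WHERE clause and compare it with date values instead: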
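DECLARE @Date1 date = '2016-01-10', @Date2 date = '2016-02-19';  -- inclusive range
-- Style 4 parses the nvarchar column as dd.mm.yy, so rows are compared as dates, not strings
SELECT ID, Day, Date FROM oneHr$
WHERE CONVERT(DATE, Date, 4) >= @Date1
AND CONVERT(DATE, Date, 4) <= @Date2
ORDER BY CONVERT(DATE, Date, 4) ASC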
Then you won't have to convert your inputs to nvarchar at all.
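The difference between the two comparisons can be reproduced with a few of the strings from the question's output. This Python sketch is only an illustration; the function names are hypothetical, and parse_ddmmyy merely plays the role of CONVERT(DATE, value, 4).

```python
from datetime import datetime


def in_range_as_text(value, low, high):
    """What comparing nvarchar values does: a character-by-character
    comparison, so the leading day digits dominate."""
    return low <= value <= high


def parse_ddmmyy(text):
    """Parse dd.mm.yy, playing the role of CONVERT(DATE, text, 4)."""
    return datetime.strptime(text, "%d.%m.%y").date()


def in_range_as_date(value, low, high):
    """Compare real dates after parsing all three values."""
    return parse_ddmmyy(low) <= parse_ddmmyy(value) <= parse_ddmmyy(high)


SAMPLE_ROWS = ["10.02.15", "10.02.16", "11.01.16", "12.01.15", "19.01.16", "19.02.15"]

if __name__ == "__main__":
    print([r for r in SAMPLE_ROWS if in_range_as_text(r, "10.01.16", "19.02.16")])
    print([r for r in SAMPLE_ROWS if in_range_as_date(r, "10.01.16", "19.02.16")])
```

All six sample strings pass the text comparison, while only the three dates that actually fall between 10 Jan 2016 and 19 Feb 2016 pass the date comparison.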