Using generate_series() to produce 30-day date ranges for each day? - PostgreSQL

My goal is to create a table that looks something like this using PostgreSQL:
date         date_start   date_end
12/16/2015   11/17/2015   12/16/2015
12/17/2015   11/18/2015   12/17/2015
etc.
So that I can then join to a different table to get the aggregations for each date on a rolling 30 day window. I've been doing some research and I think generate_series() is what I want to use, but I am unsure.

Something like this:
SELECT '2015-12-16'::date + g - 30 AS date_start
, '2015-12-16'::date + g AS date_end
FROM generate_series (0, 25) g; -- number of rows
I skipped the redundant date column. (You shouldn't use a basic type name like date as an identifier anyway.)
There is a variant of generate_series() that works with timestamps, but the simple version generating integer numbers is just as good for dates. Maybe even better because you avoid possible confusion with time zones.
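For comparison, the timestamp variant of the same series might look like this (a sketch; the upper bound is just 2015-12-16 plus 25 days):
SELECT g::date - 30 AS date_start
     , g::date AS date_end
FROM generate_series(timestamp '2015-12-16', timestamp '2016-01-10', interval '1 day') g;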
Always use the ISO 8601 format for date literals, which is unambiguous with any datestyle or locale settings.
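To then get the rolling 30-day aggregates, join the generated ranges to your detail table. A minimal sketch, assuming a hypothetical table events(event_date, amount):
SELECT d.date_end
     , sum(e.amount) AS rolling_sum
FROM (
   SELECT '2015-12-16'::date + g - 30 AS date_start
        , '2015-12-16'::date + g AS date_end
   FROM generate_series(0, 25) g
   ) d
LEFT JOIN events e ON e.event_date >  d.date_start
                  AND e.event_date <= d.date_end
GROUP BY d.date_end
ORDER BY d.date_end;
The half-open comparison (> date_start, <= date_end) makes each window span exactly 30 days.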
Related:
Rolling sum / count / average over date interval

Related

Get truncated data from a table - PostgreSQL

I want to get truncated data over the last month. My time is in Unix timestamps and I need to get data from the last 30 days for each specific day.
The data is in the following form:
{
  "id": "648637",
  "exchange_name": "BYBIT",
  "exchange_icon_url": "https://cryptowisdom.com.au/wp-content/uploads/2022/01/Bybit-colored-logo.png",
  "trade_time": "1675262081986",
  "price_in_quote_asset": 23057.5,
  "price_in_usd": 1,
  "trade_value": 60180.075,
  "base_asset_icon": "https://assets.coingecko.com/coins/images/1/large/bitcoin.png?1547033579",
  "qty": 2.61,
  "quoteqty": 60180.075,
  "is_buyer_maker": true,
  "pair": "BTCUSDT",
  "base_asset_trade": "BTC",
  "quote_asset_trade": "USDT"
}
I need to truncate the data based on trade_time. How do I write the query?
The secret sauce is the date_trunc function, which takes a timestamp with time zone and truncates it to a specific precision (hour, day, week, etc.). You can then group based on this value.
In your case we need to convert these JavaScript-style Unix timestamps (milliseconds since the epoch) to timestamp with time zone first, which we can do with to_timestamp, but it's still a fairly simple query.
SELECT
date_trunc('day', to_timestamp(trade_time / 1000.0)),
COUNT(1)
FROM pings_raw
GROUP BY date_trunc('day', to_timestamp(trade_time / 1000.0))
Another approach would be to leave everything as numbers, which might be marginally faster, though I find it less readable:
SELECT
  (trade_time/(1000*60*60*24))::int * (1000*60*60*24)::bigint,  -- ::bigint avoids int4 overflow in the multiplication
  COUNT(1)
FROM pings_raw
GROUP BY (trade_time/(1000*60*60*24))::int;
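To keep only the last 30 days, add a WHERE clause. A sketch, assuming trade_time is stored as a bigint of milliseconds (cast it first if it is text, as in the sample JSON):
SELECT
  date_trunc('day', to_timestamp(trade_time / 1000.0)) AS trade_day,
  COUNT(1)
FROM pings_raw
WHERE to_timestamp(trade_time / 1000.0) >= now() - interval '30 days'
GROUP BY trade_day
ORDER BY trade_day;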

I need to add up lots of values between date ranges as quickly as possible using PostgreSQL. What's the best method?

Here's a simple example of what I'm trying to do:
CREATE TABLE daily_factors (
  factor_date  date,
  factor_value numeric(3,1));

CREATE TABLE customer_date_ranges (
  customer_id int,
  date_from   date,
  date_to     date);

INSERT INTO daily_factors
SELECT
  t.factor_date,
  (random() * 10 + 30)::numeric(3,1)
FROM
  generate_series(timestamp '20170101', timestamp '20210211', interval '1 day') AS t(factor_date);

WITH customer_id AS (
  SELECT generate_series(1, 100000) AS customer_id),
date_from AS (
  SELECT
    customer_id,
    (timestamp '20170101' + random() * (timestamp '20201231' - timestamp '20170101'))::date AS date_from
  FROM
    customer_id)
INSERT INTO customer_date_ranges
SELECT
  d.customer_id,
  d.date_from,
  (d.date_from::timestamp + random() * (timestamp '20210211' - d.date_from::timestamp))::date AS date_to
FROM
  date_from d;
So I'm basically making two tables:
a list of daily factors, one for every day from 1st Jan 2017 until today's date;
a list of 100,000 "customers" all who have a date range between 1st Jan 2017 and today, some long, some short, basically random.
Then I want to add up the factors for each customer in their date range, and take the average value.
SELECT
  cd.customer_id,
  AVG(df.factor_value) AS average_value
FROM customer_date_ranges cd
INNER JOIN daily_factors df ON df.factor_date BETWEEN cd.date_from AND cd.date_to
GROUP BY cd.customer_id;
Having a non-equi join on a date range is never going to be pretty, but is there any way to speed this up?
The only index I could think of was this one:
CREATE INDEX performance_idx ON daily_factors (factor_date);
It makes a tiny difference to the execution time. When I run this locally I'm seeing around 32 seconds with no index, and around 28 seconds with the index.
I can see that this is a massive bottleneck in the system I'm building, but I can't think of any way to make things faster. The ideas I did have were:
instead of using daily factors I could largely get away with monthly ones, but now I have the added complexity of "whole months and partial months" to work with. It doesn't seem like it's going to be worth it for the added complexity, e.g. "take 7 whole months for Feb to Aug 2020, then 10/31 of Jan 2020 and 15/30 of September 2020";
I could pre-calculate every average I will ever need, but with 1,503 factors (and that count grows by one each day), that's already 1,128,753 numbers to store (assuming we ignore zero-length date ranges and that my maths is right). Also, my real-world system has an extra level of complexity, a second identifier with 20 possible values, so this would mean having c.20 million numbers to pre-calculate. And the number of values to store grows quadratically, since each new day adds a new possible end date for every existing start date;
I could take this work out of the database, and do it in code (in memory), as it seems like a relational database might not be the best solution here?
Any other suggestions?
The classic way to deal with this is to store running sums of factor_value, not (or in addition to) individual values. Then you just look up the running sum at the two end points (actually at the end, and one before the start), and take the difference. And of course divide by the count, to turn it into an average. I've never done this inside a database, but there is no reason it can't be done there.
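A minimal sketch of that idea against the example schema (the daily_factor_sums table and its column names are assumptions, not part of the original setup):
-- Precompute one running sum and row number per day.
CREATE TABLE daily_factor_sums AS
SELECT factor_date,
       SUM(factor_value) OVER (ORDER BY factor_date) AS running_sum,
       ROW_NUMBER() OVER (ORDER BY factor_date) AS day_number
FROM daily_factors;

CREATE UNIQUE INDEX ON daily_factor_sums (factor_date);

-- Average per customer: difference of running sums at the range ends,
-- divided by the number of days in the range.
SELECT cd.customer_id,
       (s_to.running_sum - COALESCE(s_before.running_sum, 0))
         / (s_to.day_number - COALESCE(s_before.day_number, 0)) AS average_value
FROM customer_date_ranges cd
JOIN daily_factor_sums s_to ON s_to.factor_date = cd.date_to
LEFT JOIN daily_factor_sums s_before ON s_before.factor_date = cd.date_from - 1;
This replaces the range scan per customer with two index lookups.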

Difference between two timestamps as timestamp across multiple days

I have two timestamps and I would like to have a result with the difference between them. I found a similar question asked here but I have noticed that:
select
to_char(column1::timestamp - column2::timestamp, 'HH:MS:SS')
from
table
Gives me an incorrect result if these timestamps cross multiple days. I know that I can use EPOCH to work out the number of hours/days/minutes/seconds etc., but my use case requires the result as a timestamp (or a string... anything but an interval!).
In the case of multiple days I would like to continue counting the hours, even if it should go past 24. This would allow results like:
36:55:01
I'd use the built-in date_part function (as previously described in an older thread: How to convert an interval like "1 day 01:30:00" into "25:30:00"?) but finally cast the result to the type you desire:
SELECT
  from_date,
  to_date,
  to_date - from_date AS date_diff_interval,
  (date_part('epoch', to_date - from_date) * INTERVAL '1 second')::text AS date_diff_text
FROM (
  SELECT
    '2018-01-01 04:03:06'::timestamp AS from_date,
    '2018-01-02 16:58:07'::timestamp AS to_date
) AS dates;
This results in the following:
      from_date      |       to_date       | date_diff_interval | date_diff_text
---------------------+---------------------+--------------------+----------------
 2018-01-01 04:03:06 | 2018-01-02 16:58:07 | 1 day 12:55:01     | 36:55:01
I'm currently unaware of any way to convert this interval into a timestamp and also not sure whether there is a use for it. You're still dealing with an interval and you'd need a point of reference in time to transform that interval into an actual timestamp.
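That said, if plain text is enough, the HH>24 string can also be assembled by hand from the epoch (a sketch using the same sample dates):
SELECT floor(extract(epoch FROM to_date - from_date) / 3600)::text
       || to_char(to_date - from_date, ':MI:SS') AS date_diff_text
FROM (SELECT '2018-01-01 04:03:06'::timestamp AS from_date,
             '2018-01-02 16:58:07'::timestamp AS to_date) AS dates;  -- also returns 36:55:01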

Q (KDB) selecting today's date within date range

I am trying to set up a dynamic threshold by user, but only return results for today's date. I was able to return all the records from the past 30 days, but I am having trouble outputting only today's date based on the calculation over the past 30 days. I am new to the q language and really having trouble with this simple statement :( (I have tried and/or statements but they don't execute.) Thank you for all the help in advance!
select user, date, real*110 from table where date >= .z.D - 30, real> (3*(dev;real) fby user)+((avg;real) fby user)
Are you saying that you want to determine if any of today's "real" values are greater than 3 sigma based on the past 30 days? If so (without knowing much about your table structure), I'm guessing you could use something like this:
q)t:t,update user:`user2,real+(.0,39#10.0) from t:([] date:.z.D-til 40;user:`user1;real:20.1,10.0+39?.1 .0 -.1);
q)sigma:{avg[y]+x*dev y};
q)select from t where date>=.z.D-30, ({(.z.D=x`date)&x[`real]>sigma[3]exec real from x where date<>.z.D};([]date;real)) fby user
date user real
---------------------
2016.03.21 user1 20.1

How to change the first day of the week in PostgreSQL

I need to change the week start day from Monday to Saturday in PostgreSQL. I have tried SET DATEFIRST 6; but it doesn't work in PostgreSQL. Please suggest a solution.
It appears that DATEFIRST is a thing in Microsoft Transact-SQL.
I don't believe Postgres has any exact equivalent, but you should be able to approximate it.
Postgres supports extracting various parts of a TIMESTAMP via the EXTRACT function. For your purposes, you would want to use either DOW or ISODOW.
DOW numbers Sunday (0) through Saturday (6), while ISODOW, which adheres to the ISO 8601 standard, numbers Monday (1) through Sunday (7).
From the Postgres doc:
This:
SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-18 20:38:40');
Returns 0, while this:
SELECT EXTRACT(ISODOW FROM TIMESTAMP '2001-02-18 20:38:40');
Returns 7.
So you would use a version of EXTRACT in your queries to get the day-of-week number. If you're going to use it in many queries, I would recommend creating a function that centralizes the logic in one spot and returns the number transposed so that the week starts on Saturday (the transposition depends on which numbering method you use in EXTRACT). Then you could simply call that function in any SELECT and it would return the transposed number.
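A sketch of such a function (the name and the Saturday = 0 numbering are assumptions for illustration):
CREATE FUNCTION dow_saturday_first(d date) RETURNS int AS $$
  -- DOW gives Sunday = 0 .. Saturday = 6; shift by 1 and wrap so Saturday = 0 .. Friday = 6
  SELECT (EXTRACT(DOW FROM d)::int + 1) % 7;
$$ LANGUAGE sql IMMUTABLE;

SELECT dow_saturday_first(DATE '2018-06-02');  -- a Saturday, returns 0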
You can just add +x, where x is the offset between the Postgres dow value and the numbering you want to achieve.
For example:
You want Sunday to be the first day of the week. 2018-06-03 is a Sunday, and you want to extract its dow:
SELECT EXTRACT(DOW FROM DATE '2018-06-03')+1;
returns 1
That is probably the reason why the Postgres dow numbering starts Sunday at 0: a plain offset never overflows. If Sunday were 7 instead, then
SELECT EXTRACT(DOW FROM DATE '2018-06-03')+1;
would return 8.
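For the Saturday-first week from the question, a plain offset does overflow (Saturday is 6, and 6 + 1 = 7), so wrap with modulo instead; and if you need date_trunc to respect the Saturday week start too, shift before and after truncating. A sketch:
SELECT (EXTRACT(DOW FROM DATE '2018-06-02')::int + 1) % 7;  -- Saturday -> 0

-- Shift +2 days so Saturday lands on the Monday boundary date_trunc uses,
-- truncate to the week, then shift back.
SELECT (date_trunc('week', DATE '2018-06-03' + 2) - interval '2 days')::date;  -- 2018-06-02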