Get truncked data from a table - postgresSQL - postgresql

I want to get truncked data over the last month. My time is in unix timestamps and I need to get data from last 30 days for each specific day.
The data is in the following form:
{
"id":"648637",
"exchange_name":"BYBIT",
"exchange_icon_url":"https://cryptowisdom.com.au/wp-content/uploads/2022/01/Bybit-colored-logo.png",
"trade_time":"1675262081986",
"price_in_quote_asset":23057.5,
"price_in_usd":1,
"trade_value":60180.075,
"base_asset_icon":"https://assets.coingecko.com/coins/images/1/large/bitcoin.png?1547033579",
"qty":2.61,
"quoteqty":60180.075,
"is_buyer_maker":true,
"pair":"BTCUSDT",
"base_asset_trade":"BTC",
"quote_asset_trade":"USDT"
}
I need to truncate data based on trade_time
How do I write the query?

The secret sauce is the date_trunc function, which takes a timestamp with time zone and truncates it to a specific precision (hour, day, week, etc). You can then group based on this value.
In your case we need to convert these unix timestamps javascript style timestamps to timestamp with time zone first, which we can do with to_timestamp, but it's still a fairly simple query.
SELECT
date_trunc('day', to_timestamp(trade_time / 1000.0)),
COUNT(1)
FROM pings_raw
GROUP BY date_trunc('day', to_timestamp(trade_time / 1000.0))
Another approach would be to leave everything as numbers, which might be marginally faster, though I find it less readable
SELECT
(trade_time/(1000*60*60*24))::int * (1000*60*60*24),
COUNT(1)
FROM pings_raw
GROUP BY (trade_time/(1000*60*60*24))::int

Related

Subract Additional Time from $__timeFilter

I want to subract additional time in $__timeFilter in grafana. Like if I have selected Last 7 days, I want to run 2 queries which do a comparison like one query gives me avg cpu utilization for last 7 days and another one gives me avg cpu utilzation for now() - 14d to now() - 7d. And this is dynamic. I can get for 6hrs, 2days or anything selected.
My database is TimescaleDB and grafana version in 8.3.5
Edit
Query is
select avg(cpu) from cpu_utilization where $__timeFilter(timestamp)
Whatever is selected in the time filter in grafana, the query is manipulated accordingly
Now with grafana understands this query becomes as following. if I select last 24hrs
select avg(cpu) from cpu_utilization where timestamp BETWEEN '2022-09-07 05:32:10' and '2022-09-08 05:32:10'
This is normal behaviour. Now I wanted that if I select last 24hrs, this query to behave as it is but an additional query becomes
select avg(cpu) from cpu_utilization where timestamp BETWEEN '2022-09-06 05:32:10' and '2022-09-07 05:32:10'
(I just don't want it for last 24hrs, but any relative time period selected in the filter)
Answer : https://stackoverflow.com/a/73658919/14817486
You can use the global variables $__to and $__from.
For example, ${__from:date:seconds} will give you a timestamp in seconds. You can then subtract 7 days (= 604800 seconds) from it and use it in your query's WHERE clause. Depending on your SQL dialect, that might be by using TIMESTAMP(), TO_TIMESTAMP() or something similar. So it would look similar to this:
[...] WHERE timestamp BETWEEN TO_TIMESTAMP(${__from:date:seconds}-604800) AND TO_TIMESTAMP(${__to:date:seconds}-604800) [...]
Interesting question! If I understood correctly, you could use the timestamp column as the reference as the grafana is already filtering by this to the comparison query. So you can get the min(timestamp) and max(timestamp) to know the limits of your period and then build something from it.
Like min(timestamp) - INTERVAL '7 days' would give you the start of the previous range, and max(timestamp) - INTERVAL '7 days' would offer the final scope.

Difference between two timestamps as timestamp across multiple days

I have two timestamps and I would like to have a result with the difference between them. I found a similar question asked here but I have noticed that:
select
to_char(column1::timestamp - column2::timestamp, 'HH:MS:SS')
from
table
Gives me an incorrect return if these timestamps cross multiple days. I know that I can use EPOCH to work out the number of hours/days/minutes/seconds etc but my use case requires the result as a timestamp (or a string...anything not an interval!).
In the case of multiple days I would like to continue counting the hours, even if it should go past 24. This would allow results like:
36:55:01
I'd use the built-in date_part function (as previously described in an older thread: How to convert an interval like "1 day 01:30:00" into "25:30:00"?) but finally cast the result to the type you desire:
SELECT
from_date,
to_date,
to_date - from_date as date_diff_interval,
(date_part('epoch', to_date - from_date) * INTERVAL '1 second')::text as date_diff_text
from (
(select
'2018-01-01 04:03:06'::timestamp as from_date,
'2018-01-02 16:58:07'::timestamp as to_date)
) as dates;
This results in the following:
I'm currently unaware of any way to convert this interval into a timestamp and also not sure whether there is a use for it. You're still dealing with an interval and you'd need a point of reference in time to transform that interval into an actual timestamp.

Postgres: using timestamps for pagination

I have table with created (timestamptz) property. Now, i need to create pagination based on timestamp, because while user is watching first page, new items could be submitted into this table, which will make data inconsistent in case if i'll use OFFSET for pagination.
So, the question is: should i keep created type as timestamptz or it's better to convert it into integer (unix, e.g. 1472031802812). If so, is there any disadvantages? Also, atm i have now() as default value in created - is there alternative function to create unix timestamp?
Let me rewrite things from comments to my answer. You want to use timestamp type instead of integer simply because that's exactly what it was designed for. Doing manual convertions between timestamp integers and timestamp objects is just a pain and you gain nothing. And you will need it eventually for more complex datetime based queries.
To answer a question about pagination. You simply do a query
SELECT *
FROM table_name
WHERE created < lastTimestamp
ORDER BY created DESC
LIMIT 30
If it is first query then you set say lastTimestamp = '3000-01-01'. Otherwise you set lastTimestamp = last_query.last_row.created.
Optimization
Note that if the table is big then ORDER BY created DESC might not be efficient (especially if called parallely with different ranges). In this case you can use moving "time windows", for example:
SELECT *
FROM table_name
WHERE
created < lastTimestamp
AND created >= lastTimestamp - interval '1 day'
The 1 day interval is picked arbitrarly (tune it to your needs). You can also sort results in the app.
If results is not empty then you update (in your app)
lastTimestamp = last_query.last_row.created
(assuming you've done sorting, otherwise you take min(last_query.row.created))
If results is empty then you repeat the query with lastTimestamp = lastTimestamp - interval '1 day' until you fetch something. Also you have to stop if lastTimestamp becomes to low, i.e. when it is lower then any other timestamp in the table (which has to be prefetched).
All of that is under some assumptions for inserts:
new_row.created >= any_row.created and
new_row.created ~ current_time
The distribution of new_row.created is more or less uniform
Assumption 1 ensures that pagination results in consistent data while assumption 2 is only needed for the default 3000-01-01 date. Assumption 3 is to make sure that you don't have big empty gaps when you have to issue many empty queries.
You mean something like this?
select extract(epoch from now())::integer as unix_time

Using generate_series() to produce 30 day date ranges for each day?

My goal is to create a table that looks something like this using PostgreSQL:
date date_start date_end
12/16/2015 11/17/2015 12/16/2015
12/17/2015 11/18/2015 12/17/2015
etc.
So that I can then join to a different table to get the aggregations for each date on a rolling 30 day window. I've been doing some research and I think generate_series() is what I want to use, but I am unsure.
Something like this:
SELECT '2015-12-16'::date + g - 30 AS date_start
, '2015-12-16'::date + g AS date_end
FROM generate_series (0, 25) g; -- number of rows
I skipped the redundant date column. (You shouldn't use a basic type name as identifier anyways.)
There is a variant of generate_series() that works with timestamps, but the simple version generating integer numbers is just as good for dates. Maybe even better because you avoid possible confusion with time zones.
Always use the ISO 8601 format for date literals, which is unambiguous with any datestyle or locale settings.
Related:
Rolling sum / count / average over date interval

Selecting records between two timestamps

I am converting an Unix script with a SQL transact command to a PostgreSQL command.
I have a table with records that have a field last_update_time(xtime) and I want to select every record in the table that has been updated within a selected period.
Say, the current time it 05/01/2012 10:00:00 and the selected time is 04/01/2012 23:55:00. How do I select all the records from a table that have been updated between these dates. I have converted the 2 times to seconds in the Unix script prior to issuing the psql command, and have calculated the interval in seconds between the 2 periods.
I thought something like
SELECT A,B,C FROM table
WHERE xtime BETWEEN now() - interval '$selectedtimeParm(in secs)' AND now();
I am having trouble evaluating the Parm for the selectedtimeParm - it doesn't resolve properly.
Editor's note: I did not change the inaccurate use of the terms period, time frame, time and date for the datetime type timestamp because I discuss that in my answer.
What's wrong with:
SELECT a,b,c
FROM table
WHERE xtime BETWEEN '2012-04-01 23:55:00'::timestamp
AND now()::timestamp;
If you want to operate with a count of seconds as interval:
...
WHERE xtime BETWEEN now()::timestamp - (interval '1s') * $selectedtimeParm
AND now()::timestamp;
Note the standard ISO 8601 date format YYYY-MM-DD h24:mi:ss which is unambiguous with any locale or DateStyle setting.
The first value for the BETWEEN construct must be the smaller one. If you don't know which value is smaller use BETWEEN SYMMETRIC instead.
In your question you refer to the datetime type timestamp as "date", "time" and "period". In the title you used the term "time frames", which I changed to "timestamps". All of these terms are wrong. Freely interchanging them makes the question even harder to understand.
That, and the fact that you only tagged the question psql (the problem hardly concerns the command line terminal) might help to explain why nobody answered for days. Normally, it's a matter of minutes around here. I had a hard time understanding your question.
Understand the data types date, interval, time and timestamp - with or without time zone. Start by reading the chapter "Date/Time Types" in the manual.
Error message would have gone a long way, too.
For anyone who is looking for the fix to this. You need to remove timestamp from the where clause and use BETWEEN!
TABLENAME.COL-NAME-FOR-TIMESTAMP BETWEEN '2020-01-29 04:18:00-06' AND CURRENT_TIMESTAMP