Get the last timestamps in a group by time query in Influxdb - select

I have an InfluxDB measurement containing prices with nanosecond-precision timestamps. When I run a select grouped by time like this one:
select first(price),last(price) from priceseries where time>=1496815212834974866 and time<=1496865599580302882 group by time(1s)
I get back a time column in which each timestamp is aligned to the second that begins the group. For example, one timestamp will be 08:00:00 and the next will be 08:00:01.
How can I:
1. apply an aggregation function to the record timestamps themselves, like last(time) or first(time), so that I get the real first and last timestamps of the group (I can have many prices within a group)?
2. make the time column in the response show the closing second rather than the opening second? That is, if the group goes from 08:00:00 to 08:00:01, I want to see 08:00:01 in my time column instead of the 08:00:00 I see now.

Not when using an aggregation function together with group by.
select first(price), last(price) from priceseries where time >= <..> and time <= <..> will give you the first and last price within that time window.
When the query has a group by, the aggregation applies only to values within the intervals. The values themselves are the real values that fall in the 08:00:00 - 08:00:01 interval, it's just that the timestamp shown is for the interval itself, not the actual values.
This means that a query for 08:00:00 to 08:00:01 without a group by and a query with group by time(1s) over the same period return the same values. The only difference is the timestamp: without group by, a query with a single selector such as first(price) returns the value's actual timestamp, whereas the group by query returns the interval's timestamp instead.
The timestamp returned with group by marks the start of the interval, so you can calculate the end time as start time + interval. Which timestamp is shown is not configurable in the query language.
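As a minimal sketch of the "start time + interval" arithmetic, done on the client after the query returns: the helper name interval_end and the sample timestamps are hypothetical, and the 1-second interval is assumed to match the group by time(1s) in the question.

```python
from datetime import datetime, timedelta, timezone

def interval_end(start: datetime, interval: timedelta) -> datetime:
    """Return the closing timestamp of a group-by bucket given its start."""
    return start + interval

# Hypothetical bucket start times as a group by time(1s) query would report them.
starts = [
    datetime(2017, 6, 7, 8, 0, 0, tzinfo=timezone.utc),
    datetime(2017, 6, 7, 8, 0, 1, tzinfo=timezone.utc),
]

# Shift every bucket start by the interval to label buckets by closing second.
ends = [interval_end(s, timedelta(seconds=1)) for s in starts]
```

This only relabels the buckets. It does not recover the real first and last timestamps of the points inside each bucket.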

Related

Get truncated data from a table - PostgreSQL

I want to get data truncated by day over the last month. My time is stored as Unix timestamps, and I need the data from the last 30 days for each specific day.
The data is in the following form:
{
"id":"648637",
"exchange_name":"BYBIT",
"exchange_icon_url":"https://cryptowisdom.com.au/wp-content/uploads/2022/01/Bybit-colored-logo.png",
"trade_time":"1675262081986",
"price_in_quote_asset":23057.5,
"price_in_usd":1,
"trade_value":60180.075,
"base_asset_icon":"https://assets.coingecko.com/coins/images/1/large/bitcoin.png?1547033579",
"qty":2.61,
"quoteqty":60180.075,
"is_buyer_maker":true,
"pair":"BTCUSDT",
"base_asset_trade":"BTC",
"quote_asset_trade":"USDT"
}
I need to truncate data based on trade_time
How do I write the query?
The secret sauce is the date_trunc function, which takes a timestamp with time zone and truncates it to a specific precision (hour, day, week, etc). You can then group based on this value.
In your case we first need to convert these JavaScript-style Unix timestamps (milliseconds since the epoch) to timestamp with time zone, which we can do with to_timestamp, but it's still a fairly simple query.
SELECT
date_trunc('day', to_timestamp(trade_time / 1000.0)),
COUNT(1)
FROM pings_raw
GROUP BY date_trunc('day', to_timestamp(trade_time / 1000.0))
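Another approach would be to leave everything as numbers, which might be marginally faster, though I find it less readable. This assumes trade_time is an integer column, so the division truncates. The cast is to bigint rather than int because the start of the day in milliseconds does not fit in a 32-bit integer.
SELECT
(trade_time/(1000*60*60*24))::bigint * (1000*60*60*24),
COUNT(1)
FROM pings_raw
GROUP BY (trade_time/(1000*60*60*24))::bigint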
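To make the numeric bucketing concrete, here is a small sketch of the same arithmetic outside the database. The names MS_PER_DAY and day_bucket_ms are hypothetical. The sample value is the trade_time from the question, converted from its string form.

```python
from datetime import datetime, timezone

MS_PER_DAY = 1000 * 60 * 60 * 24  # milliseconds in one day

def day_bucket_ms(trade_time_ms: int) -> int:
    """Truncate a millisecond Unix timestamp to the start of its UTC day."""
    return (trade_time_ms // MS_PER_DAY) * MS_PER_DAY

trade_time = int("1675262081986")  # trade_time from the sample record
bucket = day_bucket_ms(trade_time)

# Convert the bucket back to a datetime to check which day it represents.
bucket_day = datetime.fromtimestamp(bucket / 1000, tz=timezone.utc)
```

Every trade on the same UTC day maps to the same bucket value, which is what makes it usable as a grouping key.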

Subtract Additional Time from $__timeFilter

I want to subtract additional time from $__timeFilter in Grafana. For example, if I have selected Last 7 days, I want to run two queries for comparison: one gives me the average CPU utilization for the last 7 days, and the other gives me the average CPU utilization for now() - 14d to now() - 7d. This has to be dynamic, since I may select 6 hours, 2 days or any other range.
My database is TimescaleDB and the Grafana version is 8.3.5.
Edit
Query is
select avg(cpu) from cpu_utilization where $__timeFilter(timestamp)
Whatever is selected in the time filter in grafana, the query is manipulated accordingly
Grafana expands this query as follows if I select the last 24 hours:
select avg(cpu) from cpu_utilization where timestamp BETWEEN '2022-09-07 05:32:10' and '2022-09-08 05:32:10'
This is normal behaviour. What I want is that, if I select the last 24 hours, this query behaves as it does now but an additional query becomes
select avg(cpu) from cpu_utilization where timestamp BETWEEN '2022-09-06 05:32:10' and '2022-09-07 05:32:10'
(I don't want this just for the last 24 hours, but for any relative time period selected in the filter)
Answer : https://stackoverflow.com/a/73658919/14817486
You can use the global variables $__to and $__from.
For example, ${__from:date:seconds} will give you a timestamp in seconds. You can then subtract 7 days (= 604800 seconds) from it and use it in your query's WHERE clause. Depending on your SQL dialect, that might be by using TIMESTAMP(), TO_TIMESTAMP() or something similar. So it would look similar to this:
[...] WHERE timestamp BETWEEN TO_TIMESTAMP(${__from:date:seconds}-604800) AND TO_TIMESTAMP(${__to:date:seconds}-604800) [...]
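The fixed offset of 604800 seconds only matches a 7-day selection, while the question asks for any selected range. One way to make the shift dynamic is to subtract the length of the selected window itself. The sketch below shows only that arithmetic. The function name previous_window is hypothetical, and the two arguments stand in for the values Grafana would substitute for ${__from:date:seconds} and ${__to:date:seconds}.

```python
from datetime import datetime, timezone
from typing import Tuple

def previous_window(from_s: int, to_s: int) -> Tuple[int, int]:
    """Return the window of equal length that ends where the selected one starts."""
    span = to_s - from_s  # length of the selected range in seconds
    return from_s - span, to_s - span

# Epoch seconds for the "last 24 hours" example in the question (assumed UTC).
selected_from = int(datetime(2022, 9, 7, 5, 32, 10, tzinfo=timezone.utc).timestamp())
selected_to = int(datetime(2022, 9, 8, 5, 32, 10, tzinfo=timezone.utc).timestamp())

prev_from, prev_to = previous_window(selected_from, selected_to)
```

The same subtraction could presumably be written inline in the WHERE clause by replacing 604800 with the difference of the two variables, though that form is untested here.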
Interesting question! If I understood correctly, you could use the timestamp column as the reference, since Grafana is already filtering the query by it. You can take min(timestamp) and max(timestamp) to find the limits of the selected period and then build the comparison query from them.
For example, min(timestamp) - INTERVAL '7 days' would give you the start of the previous range, and max(timestamp) - INTERVAL '7 days' would give you its end.

Median of time interval in Amazon Redshift?

I'm trying to get the median time interval in a group by. My dataset has two columns: column1 holds user_ids and column2 holds a time interval with the time that user spent on a website. When I group by id and call the MEDIAN function, Redshift throws an error stating that "median(interval)" is not allowed. I left the other columns out of the description since they don't really matter.
Interval is not a Redshift native data type - you cannot have a column of type interval. However, Redshift does understand interval literals. Just convert your timestamp differences into seconds (or minutes or hours or days), run MEDIAN(), and then display the result as an interval (if that is what you want).
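The convert, take the median, convert back sequence can be sketched outside the database as follows. The helper median_interval and the sample per-user data are hypothetical and only illustrate the idea of running the median over plain seconds.

```python
from datetime import timedelta
from statistics import median

def median_interval(intervals):
    """Median of a list of timedeltas, computed on their length in seconds."""
    seconds = [iv.total_seconds() for iv in intervals]
    return timedelta(seconds=median(seconds))

# Hypothetical time-on-site values grouped by user id.
time_spent = {
    "user_a": [timedelta(minutes=1), timedelta(minutes=30), timedelta(minutes=5)],
    "user_b": [timedelta(seconds=10), timedelta(seconds=20)],
}

# One median per user, displayed again as an interval.
medians = {user: median_interval(ivs) for user, ivs in time_spent.items()}
```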

Extract highest date per month from a list of dates

I have a date column which I am trying to query to return only the largest date per month.
What I currently have, albeit very simple, returns 99% of what I am looking for. For example, if I list the column in ascending order, the first entry is 2016-10-17 and the values range up to 2017-10-06.
A point to note is that the last day of every month may not be present in the data, so I'm really just looking to pull back whatever is the "largest" date present for any existing month.
The query I'm running at the moment looks like
SELECT MAX(date_col)
FROM schema_name.table_name
WHERE <condition1>
AND <condition2>
GROUP BY EXTRACT (MONTH FROM date_col)
ORDER BY max;
This does actually return most of what I'm looking for - what I'm actually getting back is
"2016-11-30"
"2016-12-30"
"2017-01-31"
"2017-02-28"
"2017-03-31"
"2017-04-28"
"2017-05-31"
"2017-06-30"
"2017-07-31"
"2017-08-31"
"2017-09-29"
"2017-10-06"
which are indeed the maximal values present for every month in the column. However, the result set doesn't seem to include the maximum date value from October 2016 (the first month's worth of data in the column). There are multiple values in the column for that month, ranging up to 2016-10-31.
If anyone could point out why the max value for this month isn't being returned, I'd much appreciate it.
You are grouping by month (1 to 12) rather than by month and year. Since 2017-10-06 is greater than any day in October 2016, that's what you get for the "October" group.
You should group by the month truncated from the full date instead:
GROUP BY date_trunc('month', date_col)
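The difference between the two grouping keys can be reproduced with a few of the dates from the question. This is only an illustrative sketch, and the helper max_per_group is hypothetical. Grouping by month number alone merges October 2016 with October 2017, while grouping by year and month keeps them apart.

```python
from datetime import date

def max_per_group(dates, key):
    """Return the largest date in each group, sorted ascending."""
    best = {}
    for d in dates:
        k = key(d)
        if k not in best or d > best[k]:
            best[k] = d
    return sorted(best.values())

dates = [date(2016, 10, 17), date(2016, 10, 31), date(2016, 11, 30), date(2017, 10, 6)]

# Like EXTRACT(MONTH FROM date_col): both Octobers share the key 10.
by_month = max_per_group(dates, key=lambda d: d.month)

# Like date_trunc('month', date_col): the year is part of the key.
by_year_month = max_per_group(dates, key=lambda d: (d.year, d.month))
```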

Get each value's difference from overall min value in Postgres

I have a table with a column in timestamp with time zone format (called "time"). I also have an empty table that takes in interval data type values. For each row in the original data, I want to insert into the empty table the interval difference between that row's timestamp and the overall minimum timestamp value in the original data. I'm trying to do something like this:
INSERT INTO
time_pyramid
SELECT
"time" - MIN("time")
FROM
time_raw;
But it tells me "ERROR: column "time_raw.time" must appear in the GROUP BY clause or be used in an aggregate function". I know I want each timestamp value's interval difference from the table's overall minimum timestamp value, and "time" is not going to end up having duplicate values from this interval conversion, so I don't really think I should use GROUP BY in that context. I also see no reason to use an aggregation function on the first "time", so how can I fix my query to reflect what I want?
Edit: Actually, "Get each value as its interval difference from the min" is a better title for this question
Use min() as a window function:
with time_raw("time") as (
values
('2016-01-11'::timestamp),
('2016-01-01'::timestamp),
('2016-01-21'::timestamp)
)
select
"time"- min("time") over () as interval
from
time_raw;
interval
----------
10 days
00:00:00
20 days
(3 rows)
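For comparison, the same computation on the three sample values can be sketched in plain code. The overall minimum is computed once and then subtracted from every row, which is what min("time") over () does with an empty window. The variable names are hypothetical.

```python
from datetime import datetime, timedelta

# The three sample timestamps from the query above.
times = [datetime(2016, 1, 11), datetime(2016, 1, 1), datetime(2016, 1, 21)]

# Overall minimum, computed once for the whole table.
earliest = min(times)

# Each row keeps its own value and is compared against that single minimum.
differences = [t - earliest for t in times]
```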