I have a very simple table like below
Events:
Event_name Event_time
A 2022-02-10
B 2022-05-11
C 2022-07-17
D 2022-10-20
New events are added to this table over time, but we only ever query the events from the last X days (for example, 30 days), so the query result changes from run to run.
I would like to transform the above table into this:
A B C D
2022-02-10 2022-05-11 2022-07-17 2022-10-20
In general, the number of columns won't be constant. If that is not possible, we can cap the number of columns, for example at 10.
I tried crosstab, but I had to list the column names manually, which is not what I want, and it doesn't work with the CTE query:
WITH CTE AS (
SELECT DISTINCT
1 AS "Id",
event_time,
event_name,
ROW_NUMBER() OVER(ORDER BY event_time) AS nr
FROM
events
WHERE
event_time >= CURRENT_DATE - INTERVAL '31 days')
SELECT *
FROM
crosstab (
'SELECT id, event_name, event_time
FROM
CTE
WHERE
nr <= 10
ORDER BY
nr') AS ct(id int,
event_name text,
EventTime1 timestamp,
EventTime2 timestamp,
EventTime3 timestamp,
EventTime4 timestamp,
EventTime5 timestamp,
EventTime6 timestamp,
EventTime7 timestamp,
EventTime8 timestamp,
EventTime9 timestamp,
EventTime10 timestamp)
This query will be used as a data source in Tableau (data visualization and analysis software), so it would be great if it could be a single query (without temp tables, new functions, etc.).
Thanks!
Is there a way to add an extra group by to the toolkit_experimental.interpolated_average function? Say my data has power measurements for different sensors; how would I add a group by on the sensor_id?
with s as (
select sensor_id,
time_bucket('30 minutes', timestamp) bucket,
time_weight('LOCF', timestamp, value) agg
from
measurements m
inner join sensor_definition sd on m.sensor_id = sd.id
where asset_id = '<battery_id>' and sensor_name = 'power' and
timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
group by sensor_id, bucket)
select sensor_id,
bucket,
toolkit_experimental.interpolated_average(
agg,
bucket,
'30 minutes'::interval,
lag(agg) over (order by bucket),
lead(agg) over (order by bucket)
)
from s
group by sensor_id;
The above query does not work, because I would also need to add bucket and agg as group by columns.
You can find the relevant schemas below.
create table measurements
(
sensor_id uuid not null,
timestamp timestamp with time zone not null,
value double precision not null
);
create table sensor_definition
(
id uuid default uuid_generate_v4() not null
primary key,
asset_id uuid not null,
sensor_name varchar(256) not null,
sensor_type varchar(256) not null,
unique (asset_id, sensor_name, sensor_type)
);
Any suggestions?
This is a great question and cool use case. There's definitely a way to do this! I like your CTE at the top, though I prefer to name them a little more descriptively. The join looks good for selection and you could even quite easily then sub out the "on-the-fly" aggregation for a continuous aggregate at some point in the future and just do the same join against the continuous aggregate...so that's great!
The only thing you need to do is modify the window clause of the lead and lag functions so that they work within each sensor's rows rather than over the full ordered data set (PARTITION BY sensor_id), and then you don't need a group by clause at all!
WITH weighted_sensor AS (
SELECT
sensor_id,
time_bucket('30 minutes', timestamp) bucket,
time_weight('LOCF', timestamp, value) agg
FROM
measurements m
INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
WHERE asset_id = '<battery_id>' AND sensor_name = 'power' and
timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
GROUP BY sensor_id, bucket)
SELECT
sensor_id,
bucket,
toolkit_experimental.interpolated_average(
agg,
bucket,
'30 minutes'::interval,
lag(agg) OVER (PARTITION BY sensor_id ORDER BY bucket),
lead(agg) OVER (PARTITION BY sensor_id ORDER BY bucket)
)
FROM weighted_sensor;
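To see what the PARTITION BY changes, here is a minimal sketch in Python that runs lag() both ways against an in-memory SQLite database. The readings table and its integer values are made up for illustration (plain numbers stand in for the time_weight summaries), and it assumes the bundled SQLite is 3.25 or newer, since that is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: two sensors, two buckets each.
conn.execute("CREATE TABLE readings (sensor_id TEXT, bucket INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("a", 1, 10), ("a", 2, 20), ("b", 1, 30), ("b", 2, 40)],
)

rows = conn.execute(
    """
    SELECT sensor_id,
           bucket,
           -- one window over all rows: the previous row can belong to another sensor
           lag(value) OVER (ORDER BY bucket, sensor_id) AS lag_all,
           -- one window per sensor: the previous row is always the same sensor
           lag(value) OVER (PARTITION BY sensor_id ORDER BY bucket) AS lag_per_sensor
    FROM readings
    ORDER BY sensor_id, bucket
    """
).fetchall()

for row in rows:
    print(row)
```

Without the partition, sensor a's second bucket picks up sensor b's value as its predecessor; with it, each sensor only sees its own earlier bucket.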
You can also split the window definition out into a separate, named WINDOW clause. This helps especially if you use it more than once. If you also wanted the integral function, for instance, to get total energy utilization in a period, you might do something like this:
WITH weighted_sensor AS (
SELECT
sensor_id,
time_bucket('30 minutes', timestamp) bucket,
time_weight('LOCF', timestamp, value) agg
FROM
measurements m
INNER JOIN sensor_definition sd ON m.sensor_id = sd.id
WHERE asset_id = '<battery_id>' AND sensor_name = 'power' and
timestamp between '2023-01-05 23:30:00' and '2023-01-07 00:30:00'
GROUP BY sensor_id, bucket)
SELECT
sensor_id,
bucket,
toolkit_experimental.interpolated_average(
agg,
bucket,
'30 minutes'::interval,
lag(agg) OVER sensor_times,
lead(agg) OVER sensor_times
),
toolkit_experimental.interpolated_integral(
agg,
bucket,
'30 minutes'::interval,
lag(agg) OVER sensor_times,
lead(agg) OVER sensor_times,
'hours'
)
FROM weighted_sensor
WINDOW sensor_times AS (PARTITION BY sensor_id ORDER BY bucket);
I used hours as the unit as I figure energy is often measured in watt-hours or the like...
I am trying to get the time difference in minutes between the event times of rows that share the same nodeid and code, for nodeid/code pairs that occur more than once:
nodeid code event_time
CAI0015 14961045 2017-04-22 21:22:00
CAI0024 14961045 2017-04-23 19:44:00
CAI0024 14961045 2017-04-23 09:07:00
CAI0040 14971047 2017-04-23 13:58:00
CAI0046 14961045 2017-04-23 11:19:00
CAI0050 14961045 2017-04-24 02:06:00
output should be like this:
nodeid code difference(min)
CAI0024 14961045 637
Use the formula extract(epoch from <later> - <earlier>) / 60 to get timestamp difference in minutes.
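As a quick check of the arithmetic, the same calculation can be sketched in Python for the two CAI0024 rows from the question: take the interval in seconds and divide by 60. The variable names are only illustrative.

```python
from datetime import datetime

# The two CAI0024 / 14961045 rows from the sample data in the question.
earlier = datetime(2017, 4, 23, 9, 7, 0)
later = datetime(2017, 4, 23, 19, 44, 0)

# Same idea as extract(epoch from later - earlier) / 60:
# express the interval in seconds, then divide by 60.
difference_minutes = (later - earlier).total_seconds() / 60

print(difference_minutes)  # 637.0
```

This matches the 637 minutes shown in the expected output.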
To select every pairwise difference (where nodeid & code are the same) within the table, use a self-join:
select nodeid, code, extract(epoch from e2.event_time - e1.event_time) / 60 difference
from events e1
join events e2 using (nodeid, code)
where e1.event_time < e2.event_time
However, this is not very useful once a given nodeid & code pair has more than 2 rows, because every earlier row is paired with every later one. To calculate the difference from the immediately preceding row only, use the lag() window function:
select nodeid, code, extract(epoch from event_time - lag) / 60 difference
from (select *, lag(event_time) over (partition by nodeid, code order by event_time)
from events) e
where lag is not null
http://rextester.com/HGY2600
Note: both queries return at most one row per nodeid & code pair only as long as no pair has more than 2 rows.
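The lag() approach can be tried against the sample rows from the question without a PostgreSQL server. The sketch below uses Python's sqlite3 module, so the epoch arithmetic is written with SQLite's strftime('%s', ...) as a stand-in for extract(epoch from ...); the prev_time alias is an illustrative name, and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (nodeid TEXT, code INTEGER, event_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("CAI0015", 14961045, "2017-04-22 21:22:00"),
        ("CAI0024", 14961045, "2017-04-23 19:44:00"),
        ("CAI0024", 14961045, "2017-04-23 09:07:00"),
        ("CAI0040", 14971047, "2017-04-23 13:58:00"),
        ("CAI0046", 14961045, "2017-04-23 11:19:00"),
        ("CAI0050", 14961045, "2017-04-24 02:06:00"),
    ],
)

rows = conn.execute(
    """
    SELECT nodeid,
           code,
           -- SQLite stand-in for extract(epoch from event_time - prev_time) / 60
           (CAST(strftime('%s', event_time) AS INTEGER)
            - CAST(strftime('%s', prev_time) AS INTEGER)) / 60 AS difference
    FROM (SELECT *,
                 lag(event_time) OVER (PARTITION BY nodeid, code
                                       ORDER BY event_time) AS prev_time
          FROM events) e
    WHERE prev_time IS NOT NULL
    """
).fetchall()

print(rows)  # [('CAI0024', 14961045, 637)]
```

Only CAI0024 appears, because it is the only nodeid & code pair with more than one row.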
Try using dense_rank to find the next event:
SELECT t_a.nodeid, t_a.code,
-- t_b is the event right after t_a, so subtract in that order to get positive minutes
EXTRACT(EPOCH FROM (t_b.event_time - t_a.event_time)) / 60 AS difference
FROM
(SELECT nodeid, code, event_time, dense_rank() over (partition by nodeid, code order by event_time) as rnk
FROM events) t_a
JOIN
(SELECT nodeid, code, event_time, dense_rank() over (partition by nodeid, code order by event_time) as rnk
FROM events) t_b
ON (t_a.nodeid = t_b.nodeid and t_a.code = t_b.code and t_a.rnk + 1 = t_b.rnk)
Just have a standard orders table:
order_id
order_date
customer_id
order_total
I am trying to write a query that adds a column showing the days since the previous purchase, for each customer. If the customer has no prior orders, the value should be zero.
I have tried something like this:
WITH user_data AS (
SELECT customer_id, order_total, order_date::DATE,
ROW_NUMBER() OVER (
PARTITION BY customer_id ORDER BY order_date::DATE DESC
)
AS order_count
FROM transactions
WHERE STATUS = 100 AND order_total > 0
)
SELECT * FROM user_data WHERE order_count < 3;
I could feed this into Tableau and then use some table calculations to wrangle the data, but I would really like to understand the SQL approach. My approach also only looks at the 2 most recent transactions, which is a drawback.
Thanks
You should use the lag() function:
select *,
lag(order_date) over (partition by customer_id order by order_date)
as prior_order_date
from transactions
order by order_id
To have the number of days since last order, just subtract the prior order date from the current order date:
select *,
order_date - lag(order_date) over (partition by customer_id order by order_date)
as days_since_last_order
from transactions
order by order_id
The query selects null if there is no prior order. You can use coalesce() to change it to zero.
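A small sketch of the coalesce() variant, run through Python's sqlite3 module so it needs no server. The four sample orders are invented for illustration, and because SQLite has no date type the subtraction goes through julianday(); the PostgreSQL form with a DATE column is shown in the SQL comment. SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)"
)
# Hypothetical sample orders.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 1, "2015-01-01"),
        (2, 1, "2015-01-11"),
        (3, 2, "2015-03-05"),
        (4, 1, "2015-02-10"),
    ],
)

rows = conn.execute(
    """
    SELECT order_id,
           customer_id,
           order_date,
           -- In PostgreSQL with a DATE column this would be:
           --   coalesce(order_date - lag(order_date) over (...), 0)
           coalesce(
               CAST(julianday(order_date)
                    - julianday(lag(order_date) OVER (PARTITION BY customer_id
                                                      ORDER BY order_date))
                    AS INTEGER),
               0) AS days_since_last_order
    FROM transactions
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```

The first order of each customer gets 0 instead of null, and later orders get the number of days since that customer's previous order.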
You indicated that you need to calculate the number of days since the last purchase:
"Trying to write a query that generates a column that shows the days since the last purchase"
So basically you need to get the difference between now and the last purchase date for each customer. The query could look like this:
-- test DDL
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
customer_id INTEGER,
order_total INTEGER
);
INSERT INTO orders(order_date, customer_id, order_total) VALUES
('01-01-2015'::DATE,1,2),
('01-02-2015'::DATE,1,3),
('02-01-2015'::DATE,2,4),
('02-02-2015'::DATE,2,5),
('03-01-2015'::DATE,3,6),
('03-02-2015'::DATE,3,7);
WITH orderdata AS (
SELECT customer_id,order_total,order_date,
(now()::DATE - max(order_date) OVER (PARTITION BY customer_id)) as days_since_purchase
FROM orders
WHERE order_total > 0
)
SELECT DISTINCT customer_id ,days_since_purchase FROM orderdata ORDER BY customer_id;
I need to select the rows for which the difference between max(date) and the date just before max(date) is smaller than 366 days. I know about SELECT MAX(date) FROM table to get the last date from now, but how could I get the date before?
I would need a query of this kind:
SELECT code, MAX(date) - before_date FROM troncon WHERE MAX(date) - before_date < 366 ;
NB: before_date does not refer to anything real; it is a placeholder to be replaced by something that works.
Edit : Example of the table I'm testing it on:
CREATE TABLE troncon (code TEXT, ope_date DATE) ;
INSERT INTO troncon (code, ope_date) VALUES
('C086000-T10001', '2014-11-11'),
('C086000-T10001', '2014-11-11'),
('C086000-T10002', '2014-12-03'),
('C086000-T10002', '2014-01-03'),
('C086000-T10003', '2014-08-11'),
('C086000-T10003', '2014-03-03'),
('C086000-T10003', '2012-02-27'),
('C086000-T10004', '2014-08-11'),
('C086000-T10004', '2013-12-30'),
('C086000-T10004', '2013-06-01'),
('C086000-T10004', '2012-07-31'),
('C086000-T10005', '2013-10-01'),
('C086000-T10005', '2012-11-01'),
('C086000-T10006', '2014-04-01'),
('C086000-T10006', '2014-05-15'),
('C086000-T10001', '2014-07-05'),
('C086000-T10003', '2014-03-03');
Many thanks!
The subquery attaches the maximum date of each code to every row; the outer query then keeps only the rows whose date is less than 366 days before that maximum:
select * from
(
SELECT code, ope_date, max(ope_date) over (partition by code) max_date FROM troncon
) A
-- date - date yields an integer number of days in PostgreSQL
where max_date - ope_date < 366
PS: As @a_horse_with_no_name said, you can partition by code to get the maximum date for each code.
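If the goal is specifically the gap between the latest date and the one just before it for each code, one possible approach is lag() combined with row_number(). The sketch below is not part of the answer above; it runs through Python's sqlite3 module on a small invented data set (codes T1 to T3), uses julianday() because SQLite has no date type, and assumes SQLite 3.25 or newer. Duplicate dates are collapsed first so that a repeated maximum does not count as its own predecessor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE troncon (code TEXT, ope_date TEXT)")
# Hypothetical rows: T1 has a gap under 366 days, T2 a larger gap, T3 one date only.
conn.executemany(
    "INSERT INTO troncon VALUES (?, ?)",
    [
        ("T1", "2014-12-03"),
        ("T1", "2014-01-03"),
        ("T2", "2014-08-11"),
        ("T2", "2012-02-27"),
        ("T3", "2014-04-01"),
    ],
)

rows = conn.execute(
    """
    SELECT code,
           ope_date AS max_date,
           prev_date,
           CAST(julianday(ope_date) - julianday(prev_date) AS INTEGER) AS gap_days
    FROM (SELECT code,
                 ope_date,
                 lag(ope_date) OVER (PARTITION BY code ORDER BY ope_date) AS prev_date,
                 row_number() OVER (PARTITION BY code ORDER BY ope_date DESC) AS rn
          FROM (SELECT DISTINCT code, ope_date FROM troncon) d) t
    WHERE rn = 1                 -- keep only the latest date of each code
      AND prev_date IS NOT NULL  -- skip codes that have a single date
      AND julianday(ope_date) - julianday(prev_date) < 366
    ORDER BY code
    """
).fetchall()

print(rows)  # [('T1', '2014-12-03', '2014-01-03', 334)]
```

In PostgreSQL with a DATE column, the julianday() arithmetic would simply be ope_date - prev_date.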
I have two tables that I am joining together. I want to filter the results based on whether or not the row was created within the previous 24 hours. Here are my tables.
table user_infos (
id integer,
date_created timestamp with time zone,
name varchar(40)
);
table user_data (
id integer,
team_name varchar(40)
);
This is my query that I am using to join them together and hopefully filter them:
SELECT timestampdiff(HOUR, user_infos.date_created, now()) as hours_since,
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE timestampdiff(HOUR, user_infos.date_created, now()) < 24
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0
What I am trying to do is join the two tables so that id, team_name, name, and date_created are treated as one table.
Then I would like to filter it so that I only get the results that were created within the last 24 hours. This is what I am using timestampdiff for.
Then I order them by name and id in ascending order.
Then I limit the results to 50.
Everything looks good except that it doesn't work. When I run this query, it tells me that the "hour" column does not exist.
Clearly there is something subtle here that is messing everything up. Does anyone have any suggestions?
Alternatively, I've tried this, but it tells me that there is a syntax error at 1;
SELECT
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE user_infos.date_created
BETWEEN DATE( DATE_SUB( NOW() , INTERVAL 1 DAY ) ) AND
DATE ( NOW() )
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0
I think your problem is with your data types. You are checking whether a timestamp column lies between two values that have been cast to dates (which removes the time portion). NOW() is not the same as DATE(NOW()).
So you have 2 options. You can either remove the DATE() casting and it should work, or you can cast the date_created to a date.
SELECT
user_data.id, user_data.team_name,
user_infos.name, user_infos.date_created
FROM user_data
JOIN user_infos
ON user_infos.id=user_data.id
WHERE user_infos.date_created
-- PostgreSQL interval arithmetic (DATE_SUB is MySQL syntax)
BETWEEN NOW() - INTERVAL '1 day' AND
NOW()
ORDER BY name ASC, id ASC
LIMIT 50 OFFSET 0
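The effect of the date cast can be illustrated with a small Python sketch. The two timestamps are hypothetical: a row created 6.5 hours before "now" falls outside the date-cast window, because both bounds collapse to midnight, but inside the plain 24-hour window.

```python
from datetime import datetime, time, timedelta

now = datetime(2024, 5, 10, 15, 30)    # hypothetical value of NOW()
created = datetime(2024, 5, 10, 9, 0)  # row created 6.5 hours earlier

# Bounds after casting to a date: the time portion becomes midnight.
lower_date = datetime.combine((now - timedelta(days=1)).date(), time.min)
upper_date = datetime.combine(now.date(), time.min)
in_date_window = lower_date <= created <= upper_date

# Bounds without the cast: a true rolling 24-hour window.
lower_ts = now - timedelta(days=1)
in_ts_window = lower_ts <= created <= now

print(in_date_window, in_ts_window)  # False True
```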