How to group by month off a timestamp field from Redshift in Superset - amazon-redshift

I am trying to show a monthly trend in Superset from a table that has a timestamp field called created_at, but I have no idea how to get it right.
The SQL query Superset generates is the following:
SELECT
DATE_TRUNC('month', created_at) AT TIME ZONE 'UTC' AS __timestamp,
SUM(cost) AS "SUM(cost)"
FROM xxxx_from_redshift
WHERE created_at >= '2017-01-01 00:00:00'
AND created_at <= '2018-07-25 20:42:13'
GROUP BY DATE_TRUNC('month', created_at) AT TIME ZONE 'UTC'
ORDER BY "SUM(cost)" DESC
LIMIT 50000;
As mentioned above, I don't know how to make this work. My second question is why ORDER BY uses SUM(cost). If this is a time series, shouldn't it use ORDER BY 1 instead? I tried changing Sort By, but to no avail.

It is quite silly but I found that SUM(cost) doesn't work while sum(cost) works. It is a bug in Superset and will be addressed in https://github.com/apache/incubator-superset/pull/5487

Related

Redshift with Grafana - ERROR: This type of correlated subquery pattern is not supported due to internal error

I am running a query on Redshift from Grafana, and adding a specific WHERE clause produces the error above. Without this WHERE clause the query runs fine. If we use a literal value instead, say where first_week > 2, there is no error either.
where clause:
WHERE first_week >= EXTRACT(YEAR FROM $__timeFrom AT TIME ZONE 'UTC')
from
(select a.distinct_id, a.login_week, b.first_week as first_week, a.login_week - first_week as week_number
from (select distinct_id, EXTRACT(WEEK FROM timestamp AT TIME ZONE 'UTC') AS login_week from posthog_event where distinct_id IN ( select distinct_id from activated_user) group by 1, 2) a,
(select distinct_id, MIN(EXTRACT(WEEK FROM timestamp AT TIME ZONE 'UTC')) AS first_week from posthog_event where distinct_id IN ( select distinct_id from activated_user) group by 1) b
where a.distinct_id = b.distinct_id
) as with_week_number
where first_week>= EXTRACT(YEAR FROM '2022-03-01T17:01:18Z' AT TIME ZONE 'UTC')
group by first_week order by first_week
Any idea where I am going wrong, or what could be done to get the WHERE clause to work?
From the plugin documentation (https://grafana.com/grafana/plugins/grafana-redshift-datasource/): $__timeFrom() outputs the current starting time of the range of the panel, with quotes.
The Grafana macro is $__timeFrom(), with parentheses, not $__timeFrom, so the correct condition is:
WHERE first_week >= EXTRACT(YEAR FROM $__timeFrom() AT TIME ZONE 'UTC')
As usual, the documentation is your friend.

LAG() Function To Get The Average REDSHIFT

I am trying to find the AVERAGE time users spend whilst going through registration within an app.
This is my dataset:
customer_id | app_event      | timestamp
------------+----------------+-----------------------
1           | OPEN_APP       | '2022-01-01 19:00:25'
1           | CLICK_REGISTER | '2022-01-01 19:00:30'
1           | ENTER_DETAILS  | '2022-01-01 19:00:40'
1           | CLOSE_APP      | '2022-01-01 19:00:50'
2           | OPEN_APP       | '2022-01-01 20:00:25'
2           | CLICK_REGISTER | '2022-01-01 20:00:26'
2           | ENTER_DETAILS  | '2022-01-01 20:00:27'
2           | CLOSE_APP      | '2022-01-01 20:00:28'
This is my query:
WITH cte AS (
SELECT
customer_id,
app_event,
timestamp AS ts,
EXTRACT(EPOCH ((ts - lag(ts, 1) OVER (PARTITION BY customer_id, app_event ORDER BY ts ASC))) AS time_spent
FROM table
GROUP BY customer_id, app_event, timestamp
)
SELECT
app_event,
AVG(time_spent)
FROM cte
GROUP BY app_event
This is the outcome I want:
app_event      | time_spent
---------------+-----------
OPEN_APP       |
CLICK_REGISTER | 3
ENTER_DETAILS  | 5.5
CLOSE_APP      | 5.5
I see a few issues with your query. You haven't said exactly what goes wrong, so I'll work from these.
First, EXTRACT() operates on a timestamp, but the difference between two timestamps is an interval. I think you want DATEDIFF(sec, ts1, ts2), which gives the time difference in seconds between the two timestamps.
Most importantly, you are partitioning by app_event, which makes LAG() consider only rows with the same event. You would get the difference between the timestamps of, say, CLICK_REGISTER rows only, and since each customer has a single row per event in your sample, every LAG() would return NULL. Remove app_event from the PARTITION BY list of the LAG() function.
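To check the effect of partitioning by customer_id only, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python. SQLite is only a stand-in for Redshift: it has no DATEDIFF(), so the seconds are computed with strftime('%s', ...) instead, and the table name events is made up for the example. It assumes the bundled SQLite is 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Sample rows copied from the question; SQLite stands in for Redshift here.
rows = [
    (1, "OPEN_APP", "2022-01-01 19:00:25"),
    (1, "CLICK_REGISTER", "2022-01-01 19:00:30"),
    (1, "ENTER_DETAILS", "2022-01-01 19:00:40"),
    (1, "CLOSE_APP", "2022-01-01 19:00:50"),
    (2, "OPEN_APP", "2022-01-01 20:00:25"),
    (2, "CLICK_REGISTER", "2022-01-01 20:00:26"),
    (2, "ENTER_DETAILS", "2022-01-01 20:00:27"),
    (2, "CLOSE_APP", "2022-01-01 20:00:28"),
]

conn = sqlite3.connect(":memory:")
# "events" is a hypothetical table name chosen for this sketch.
conn.execute("CREATE TABLE events (customer_id INTEGER, app_event TEXT, ts TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

QUERY = """
WITH cte AS (
    SELECT
        app_event,
        -- Seconds since the same customer's previous event.
        -- On Redshift this would be DATEDIFF(sec, LAG(ts) OVER (...), ts).
        CAST(strftime('%s', ts) AS INTEGER)
          - CAST(strftime('%s',
                 LAG(ts) OVER (PARTITION BY customer_id ORDER BY ts)) AS INTEGER)
          AS time_spent
    FROM events
)
SELECT app_event, AVG(time_spent) AS avg_time_spent
FROM cte
GROUP BY app_event
"""

# Map each event to its average; the first event per customer has no
# predecessor, so its average is NULL (None in Python).
averages = dict(conn.execute(QUERY).fetchall())
for event, avg in averages.items():
    print(event, avg)
```

With app_event removed from the partition, the averages match the outcome requested in the question, and OPEN_APP stays empty because nothing precedes it.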

Postgresql Newbie - Looking for insight

I am in an introduction to sql class (using postgresql) and struggling to take simple queries to the next step. I have a single table with two datetime columns (start_time & end_time) that I want to extract as two date only columns. I figured out how to extract just the date from datetime using the following:
Select start_time,
CAST(start_time as date) as Start_Date
from [table];
or
Select end_time,
CAST(end_time as date) as End_Date
from [table];
Problem: I can't figure out the next step, which is combining both of these queries into a single one. I tried using WHERE, but I am still doing something wrong.
1st wrong example
SELECT start_time, end_time
From baywheels_2017
WHERE
CAST(start_time AS DATE) AS Start_Date
AND (CAST(end_time AS DATE) AS End_Date);
Any help is greatly appreciated. Thanks for taking the time to look.
You don't need to select the underlying field in order to later cast it; each field in the "select" clause is relatively independent. With the table created by:
CREATE TABLE test (
id SERIAL PRIMARY KEY,
start_time TIMESTAMP WITH TIME ZONE NOT NULL,
end_time TIMESTAMP WITH TIME ZONE NOT NULL
);
INSERT INTO test(start_time, end_time)
VALUES ('2022-10-31T12:30:00Z', '2022-12-31T23:59:59Z');
You could run the select:
SELECT
cast(start_time as date) as start_date,
cast(end_time as date) as end_date
FROM test;
(You can try this out on a website like DB-Fiddle.)

PostgreSQL: Syntax error at or near "14" when trying to run a delete query

First question here!
So I have a table with a column like this:
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
I would like to run a query that deletes all the data in my table older than 14 days.
This is my query:
DELETE FROM customers WHERE timestamp < NOW() - INTERVAL 14 DAY;
This is the error: syntax error at or near "14"
Does anyone know why this is not working, and/or how I could achieve my goal?
Many thanks!!
The interval value must be quoted:
DELETE FROM customers WHERE created_at < NOW() - INTERVAL '14 DAYS';
See the doc
The singular unit works as well:
DELETE FROM customers WHERE created_at < NOW() - INTERVAL '14 DAY';
Another variation on the existing answers (possibly easier to remember):
DELETE FROM customers WHERE created_at < NOW() - '14 days'::interval;
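To try the age filter without a PostgreSQL server, here is a rough sketch in Python against an in-memory SQLite database. SQLite has no INTERVAL type, so datetime('now', '-14 days') plays the role of NOW() - INTERVAL '14 days'; the name column and the three rows are invented for the example. The filter is applied to created_at, the column from the question's table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
)
# Invented rows: one 20 days old, one from yesterday, one created right now.
conn.execute(
    "INSERT INTO customers (name, created_at) VALUES ('old', datetime('now', '-20 days'))"
)
conn.execute(
    "INSERT INTO customers (name, created_at) VALUES ('recent', datetime('now', '-1 days'))"
)
conn.execute("INSERT INTO customers (name) VALUES ('new')")

# SQLite counterpart of: created_at < NOW() - INTERVAL '14 days'
deleted = conn.execute(
    "DELETE FROM customers WHERE created_at < datetime('now', '-14 days')"
).rowcount

remaining = [name for (name,) in conn.execute("SELECT name FROM customers ORDER BY name")]
print(deleted, remaining)
```

Only the row older than 14 days is removed; the other two stay in the table.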

Converting a MySQL GROUP BY to Postgres

I'm working on a query that I've run successfully in MySQL for a while, but in Postgres it fails with the old:
ERROR: column "orders.created_at" must appear in the GROUP BY clause
or be used in an aggregate function
Here's the query:
SELECT SUM(total) AS total, to_char(created_at, 'YYYY/MM/DD') AS order_date
FROM orders
WHERE created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY to_char(created_at, 'DD')
ORDER BY created_at ASC;
It's just supposed to return something like this:
total | order_date
---------+------------
1099.90 | 2013/01/15
650.00 | 2013/01/16
4399.00 | 2013/01/17
The main thing is I want the sum grouped by each individual day of the month.
Anyone have ideas?
UPDATE:
The reason I'm grouping by day is because the graph will be labeled with each day of the month, and the total sales for each.
1st - $3400.00
2nd - $2237.00
3rd - $1489.00
etc.
I'm not sure why you're doing a conversion there. I think the better thing to do would be this:
SELECT
SUM(total) AS total,
created_at::date AS order_date
FROM
orders
WHERE
created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY
created_at::date
ORDER BY
created_at::date ASC;
I would recommend this query, with the daily labels formatted through the graph's own settings, so that the same day of different months cannot end up in one group. However, to get the labels shown in your edit, you can do this:
SELECT
SUM(total) AS total,
to_char(created_at, 'DDth') AS order_date
FROM
orders
WHERE
created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY
to_char(created_at, 'DDth')
ORDER BY
to_char(created_at, 'DDth') ASC;
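As a quick way to see the group-by-date approach at work, the sketch below runs the same shape of query from Python on an in-memory SQLite database. It is a stand-in, not PostgreSQL: date(created_at) replaces created_at::date, the WHERE filter on NOW() is left out because the rows use fixed dates, and the four order rows are invented so that the daily totals match the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (total REAL, created_at TEXT)")
# Invented rows; two orders fall on the same day to show the grouping.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1000.00, "2013-01-15 09:30:00"),
        (99.90, "2013-01-15 18:05:00"),
        (650.00, "2013-01-16 11:00:00"),
        (4399.00, "2013-01-17 14:45:00"),
    ],
)

# The SELECT list, GROUP BY and ORDER BY all use the same date expression,
# which is what the PostgreSQL error in the question is asking for.
daily = conn.execute(
    """
    SELECT date(created_at) AS order_date, SUM(total) AS total
    FROM orders
    GROUP BY date(created_at)
    ORDER BY date(created_at)
    """
).fetchall()

for order_date, total in daily:
    print(order_date, round(total, 2))
```

Grouping on the full date rather than on the day number keeps, for example, January 15 and February 15 in separate rows.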
Here is the SQL you need to run this. In PostgreSQL every non-aggregated expression in the SELECT list and in ORDER BY must also appear in GROUP BY, so all three use the same to_char() expression.
SELECT SUM(total) AS total,
to_char(created_at, 'YYYY/MM/DD') AS order_date
FROM orders
WHERE created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY to_char(created_at, 'YYYY/MM/DD')
order by to_char(created_at, 'YYYY/MM/DD')
http://sqlfiddle.com/#!12/52d99/2
Hope this helps,
Matt