Converting a MySQL GROUP BY to Postgres - postgresql

I'm working with a query that I've run successfully in MySQL for a while, but in Postgres it fails with the old familiar:
ERROR: column "orders.created_at" must appear in the GROUP BY clause
or be used in an aggregate function
Here's the query:
SELECT SUM(total) AS total, to_char(created_at, 'YYYY/MM/DD') AS order_date
FROM orders
WHERE created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY to_char(created_at, 'DD')
ORDER BY created_at ASC;
It's just supposed to return something like this:
total | order_date
---------+------------
1099.90 | 2013/01/15
650.00 | 2013/01/16
4399.00 | 2013/01/17
The main thing is I want the sum grouped by each individual day of the month.
Anyone have ideas?
UPDATE:
The reason I'm grouping by day is because the graph will be labeled with each day of the month, and the total sales for each.
1st - $3400.00
2nd - $2237.00
3rd - $1489.00
etc.

I'm not sure why you're doing a conversion there. I think the better thing to do would be this:
SELECT
SUM(total) AS total,
created_at::date AS order_date
FROM
orders
WHERE
created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY
created_at::date
ORDER BY
created_at::date ASC;
I would recommend this query, and then format the daily labels through your graph settings; that way you avoid the same day of different months getting grouped together. However, to get exactly what you show in your edit, you can do this:
SELECT
SUM(total) AS total,
to_char(created_at, 'DDth') AS order_date
FROM
orders
WHERE
created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY
to_char(created_at, 'DDth')
ORDER BY
to_char(created_at, 'DDth') ASC;

Here is the SQL you need in order to run this: the GROUP BY and ORDER BY need to contain the same expression.
SELECT SUM(total) AS total,
to_char(created_at, 'YYYY/MM/DD') AS order_date
FROM orders
WHERE created_at >= (NOW() - INTERVAL '2 DAYS')
GROUP BY to_char(created_at, 'YYYY/MM/DD')
ORDER BY to_char(created_at, 'YYYY/MM/DD');
http://sqlfiddle.com/#!12/52d99/2
Hope this helps,
Matt

Related

LAG() Function To Get The Average REDSHIFT

I am trying to find the AVERAGE time users spend whilst going through registration within an app.
This is my dataset:
customer_id | app_event      | timestamp
------------+----------------+-----------------------
1           | OPEN_APP       | '2022-01-01 19:00:25'
1           | CLICK_REGISTER | '2022-01-01 19:00:30'
1           | ENTER_DETAILS  | '2022-01-01 19:00:40'
1           | CLOSE_APP      | '2022-01-01 19:00:50'
2           | OPEN_APP       | '2022-01-01 20:00:25'
2           | CLICK_REGISTER | '2022-01-01 20:00:26'
2           | ENTER_DETAILS  | '2022-01-01 20:00:27'
2           | CLOSE_APP      | '2022-01-01 20:00:28'
This is my query:
WITH cte AS (
SELECT
customer_id,
app_event,
timestamp AS ts,
EXTRACT(EPOCH ((ts - lag(ts, 1) OVER (PARTITION BY customer_id, app_event ORDER BY ts ASC))) AS time_spent
FROM table
GROUP BY customer_id, app_event, timestamp
)
SELECT
app_event,
AVG(time_spent)
FROM cte
GROUP BY app_event
This is the outcome I want:
app_event      | time_spent
---------------+-----------
OPEN_APP       |
CLICK_REGISTER | 3
ENTER_DETAILS  | 5.5
CLOSE_APP      | 5.5
I see a few issues with your query. You haven't said exactly what goes wrong, so I'll work from these.
First, EXTRACT() operates on a timestamp, but the difference between two timestamps is an interval. In Redshift you want DATEDIFF(second, ts1, ts2), which gives the time difference between the two timestamps in seconds.
Most importantly, you are partitioning by app_event, which makes LAG() only consider rows with the same event value, so you would get the difference between consecutive CLICK_REGISTER timestamps, for example. You need to remove app_event from the PARTITION BY list of the LAG() window.
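Putting those fixes together, a sketch of the corrected Redshift query (assuming the table is named app_events; timestamp is a reserved word, so it is quoted):

WITH cte AS (
    SELECT
        customer_id,
        app_event,
        -- seconds since this customer's previous event, whatever it was
        DATEDIFF(
            second,
            LAG("timestamp") OVER (PARTITION BY customer_id ORDER BY "timestamp"),
            "timestamp"
        ) AS time_spent
    FROM app_events
)
SELECT
    app_event,
    AVG(time_spent::float) AS time_spent  -- cast so the average keeps decimals
FROM cte
GROUP BY app_event;

OPEN_APP rows have no previous event, so their time_spent is NULL and AVG() returns NULL for that group, which matches the blank cell in the desired output.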

Postgres: How to change start day of week and apply it in date_part?

with partial as(
select
date_part('week', activated_at) as weekly,
count(*) as count
from vendors
where activated_at notnull
group by weekly
)
This query counts the number of vendors activating per week. I need to change the start day of the week from Monday to Saturday. Similar posts, like how to change the first day of the week in PostgreSQL or Making Postgres date_trunc() use a Sunday based week, cover the idea, but none of them explains how to embed it in the date_part function. I would like to know how to use this function in my query with the week starting on Saturday.
Thanks in advance.
Maybe a little bit of overkill for that, but you can use some CTEs and window functions. First generate your intervals: start with your first Saturday (e.g. 2018-01-06 00:00) and end with the last day you want (2018-12-31). Then select your data, join it, and sum it; as a benefit you also get weeks with zero activations:
with temp_days as (
SELECT a,
a + '7 days'::interval as e
FROM generate_series('2018-01-06 00:00'::timestamp,
'2018-12-31 00:00', '7 day') as a
),
temp_data as (
select
1 as counter,
vendors.activated_at
from vendors
where activated_at notnull
),
temp_order as
(
select *
from temp_days
left join temp_data on temp_data.activated_at >= temp_days.a and temp_data.activated_at < temp_days.e -- half-open, so a boundary timestamp is not counted in two weeks
)
select
distinct on (temp_order.a)
temp_order.a,
temp_order.e,
coalesce(sum(temp_order.counter) over (partition by temp_order.a),0) as result
from temp_order
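To embed this directly in date_part, as the question asks, a common lighter-weight trick (a sketch, not tested against your data) is to shift the timestamp forward by two days so that Saturdays land on Postgres's native Monday week boundary:

select
    date_part('week', activated_at + interval '2 days') as weekly,
    count(*) as count
from vendors
where activated_at is not null
group by weekly;

Adding two days maps each Saturday onto a Monday, so every Saturday-to-Friday span falls in a single ISO week. To get the actual week start date instead of a week number, use date_trunc('week', activated_at + interval '2 days') - interval '2 days'.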

How to group by month off a timestamp field from Redshift in Superset

I am trying to show a monthly trend in Superset from a table that has a timestamp field called created_at, but I have no idea how to get it right.
The SQL query generated from this is the following:
SELECT
DATE_TRUNC('month', created_at) AT TIME ZONE 'UTC' AS __timestamp,
SUM(cost) AS "SUM(cost)"
FROM xxxx_from_redshift
WHERE created_at >= '2017-01-01 00:00:00'
AND created_at <= '2018-07-25 20:42:13'
GROUP BY DATE_TRUNC('month', created_at) AT TIME ZONE 'UTC'
ORDER BY "SUM(cost)" DESC
LIMIT 50000;
Like I mentioned above, I don't know how to make this work. A second question: why does ORDER BY use SUM(cost)? If this is a time series, shouldn't it use ORDER BY 1 instead? I tried to change Sort By, but to no avail.
It is quite silly but I found that SUM(cost) doesn't work while sum(cost) works. It is a bug in Superset and will be addressed in https://github.com/apache/incubator-superset/pull/5487
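With that workaround applied, the generated query would look like this (a sketch; only the case of the aggregate changes, and ORDER BY 1 sorts by month, addressing the second question as well):

SELECT
    DATE_TRUNC('month', created_at) AT TIME ZONE 'UTC' AS __timestamp,
    sum(cost) AS "sum(cost)"
FROM xxxx_from_redshift
WHERE created_at >= '2017-01-01 00:00:00'
  AND created_at <= '2018-07-25 20:42:13'
GROUP BY DATE_TRUNC('month', created_at) AT TIME ZONE 'UTC'
ORDER BY 1
LIMIT 50000;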

Redshift (Postgres) RANGE Function

I have the following query that's worked in another DBMS, but I can't get it to work in Redshift. It doesn't seem to like the range with the interval. Any advice on how to modify it accordingly?
COUNT(1) OVER (partition by include_flag,grouping_dimension,requested_hour,dow ORDER BY ts RANGE BETWEEN interval '30 days' PRECEDING AND interval '1 second' PRECEDING) as observations_grouping
Thank you!
Unfortunately, in Redshift a window frame can only specify a number of rows to look back/forward, not a precise RANGE condition. For a condition like this you can use a self-join, grouping per row of t1 (including its timestamp) so the count matches what the window function would produce:
select t1.include_flag, t1.grouping_dimension, t1.requested_hour, t1.dow, t1.ts,
       count(t2.ts) as observations_grouping
from source_table t1
left join source_table t2
  on t2.include_flag = t1.include_flag
 and t2.grouping_dimension = t1.grouping_dimension
 and t2.requested_hour = t1.requested_hour
 and t2.dow = t1.dow
 and t2.ts between t1.ts - interval '30 days' and t1.ts - interval '1 second'
group by 1, 2, 3, 4, 5

What is wrong with my PostgreSQL statement? This runs on MySql

When running the following query in PostgreSQL:
SELECT group_,
COUNT(*),
MIN(time)
FROM TRIP
WHERE time >= NOW()
GROUP BY group_
ORDER BY time ASC
I am getting this error:
ERROR: column "trip.time" must appear in the GROUP BY clause or be used in an aggregate function
I don't understand. Isn't MIN an aggregate function? This query runs on MySQL.
time is neither a GROUP BY item nor an aggregate term, so you cannot reference it in the ORDER BY of a grouped query. (MySQL accepts this when ONLY_FULL_GROUP_BY is disabled; Postgres never does.) Presumably, you meant to order by MIN(time), which can be stated explicitly:
SELECT group_,
COUNT(*),
MIN(time)
FROM TRIP
WHERE time >= NOW()
GROUP BY group_
ORDER BY MIN(time) ASC
Or by a positional index:
SELECT group_,
COUNT(*),
MIN(time)
FROM TRIP
WHERE time >= NOW()
GROUP BY group_
ORDER BY 3 ASC