Update Redshift table from query - postgresql

I'm trying to update a table in Redshift from query:
update mr_usage_au au
inner join (select mr.UserId,
                   date(mr.ActionDate) as ActionDate,
                   count(case when mr.EventId in (32) then mr.UserId end) as Moods,
                   count(case when mr.EventId in (33) then mr.UserId end) as Activities,
                   sum(case when mr.EventId in (10) then mr.Duration end) as Duration
            from mr_session_log mr
            where mr.EventTime >= current_date - interval '1 days' and mr.EventTime < current_date
            group by mr.UserId,
                     date(mr.ActionDate)) slog
        on slog.UserId = au.UserId
       and slog.ActionDate = au.Date
set au.Moods = slog.Moods,
    au.Activities = slog.Activities,
    au.Durarion = slog.Duration
But I receive the following error:
ERROR: syntax error at or near "au".

This is invalid syntax for Redshift (or Postgres); it reads like MySQL or SQL Server syntax.
It should work like this (at least on current Postgres):
UPDATE mr_usage_au
SET    Moods      = slog.Moods
     , Activities = slog.Activities
     , Durarion   = slog.Duration
FROM  (
   SELECT UserId
        , ActionDate::date
        , count(CASE WHEN EventId = 32 THEN UserId END) AS Moods
        , count(CASE WHEN EventId = 33 THEN UserId END) AS Activities
        , sum(CASE WHEN EventId = 10 THEN Duration END) AS Duration
   FROM   mr_session_log
   WHERE  EventTime >= current_date - 1  -- just subtract an integer from a date
   AND    EventTime <  current_date
   GROUP  BY UserId, ActionDate::date
   ) slog
WHERE  slog.UserId = mr_usage_au.UserId
AND    slog.ActionDate = mr_usage_au.Date;
This is generally the case for Postgres and Redshift:
Use a FROM clause to join in additional tables.
You cannot table-qualify target columns in the SET clause.
Also, Redshift was forked from PostgreSQL 8.0.2, which was a very long time ago; only some later updates to Postgres were applied.
For instance, Postgres 8.0 did not yet allow a table alias in an UPDATE statement, which is the reason for the error you see.
I simplified some other details.
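As a minimal sketch of the general UPDATE ... FROM pattern described above (target_table, source_table, and the columns are made-up names, purely for illustration):

UPDATE target_table
SET    col1 = src.col1                -- target columns are not table-qualified in SET
     , col2 = src.col2
FROM   source_table src               -- the joined-in table goes in the FROM clause
WHERE  src.id = target_table.id;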

Related

Calculations inside window function in PostgreSQL

I have a dataset of sales. To summarize, the structure is
client_id
date_purchase
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is:
date_purchase   number_of_customers
2022-12-19      200
2022-12-18      194
(...)
Please note this calculates, for any given date, the number of customers with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I haven't been able to calculate the number correctly for each date.
A window function doesn't seem relevant here because there is no relationship between the rows of the same window. A simple query or a self-join query should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table:
1. Query for a given date reference_date:
SELECT a.reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT reference_date, client_id   -- reference_date stands for the given date (a literal or parameter)
       FROM my_table
       WHERE date_purchase <= reference_date AND date_purchase >= reference_date - INTERVAL '90 days'
       GROUP BY client_id
       HAVING count(*) >= 2
     ) AS a
GROUP BY a.reference_date
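For example, a usage sketch substituting a literal date for the reference_date placeholder (2022-12-19 is just borrowed from the sample output above; the count it returns depends on your data):

SELECT DATE '2022-12-19' AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT client_id
       FROM my_table
       WHERE date_purchase <= DATE '2022-12-19'
         AND date_purchase >= DATE '2022-12-19' - INTERVAL '90 days'
       GROUP BY client_id
       HAVING count(*) >= 2
     ) AS a;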
2. Query for a given interval of dates reference_date => reference_date + INTERVAL '20 days' :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for all the date_purchase values in my_table:
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
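To try query 3 end to end, here is a hypothetical table definition and a few made-up sample rows (purely illustrative, matching the client_id / date_purchase structure described in the question):

CREATE TABLE my_table (
  client_id     integer NOT NULL,
  date_purchase date    NOT NULL
);

INSERT INTO my_table (client_id, date_purchase) VALUES
  (1, DATE '2022-12-19'),
  (1, DATE '2022-11-01'),   -- client 1 has 2 purchases within 90 days of 2022-12-19
  (2, DATE '2022-12-18'),
  (3, DATE '2022-12-18'),
  (3, DATE '2022-12-19');   -- client 3 also has 2 purchases in the 2022-12-19 window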

How to subtract a separate count from one grouping

I have a postgres query like this
select application.status as status, count(*) as "current_month" from application
where to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
group by application.status
it returns the table below, which has the number of applications grouped by status for the current month. However, I want to subtract the total count of a separate but related query from the internal_review number only. I want to count the number of rows with type = 'abc' within the same table and for the same date range, and then subtract that amount from the internal_review number (type is a separate field). current_month_desired is how it should look.
status            current_month   current_month_desired
fail              22              22
internal_review   95              22
pass              146             146
UNTESTED, but maybe...
The intent here is to use a conditional CASE expression inside the aggregate so you only "count" the rows you want. This way, the subtraction is not needed in the first place.
SELECT application.status AS status
     , sum(case when type = 'abc'
                 and application.status = 'internal_review' then 0
                else 1 end) as "current_month"
FROM application
WHERE to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
  and date_part('year', application.created) = date_part('year', CURRENT_DATE)
  and application.job_status != 'expired'
GROUP BY application.status
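Equivalently, a sketch using Postgres's FILTER clause on the same assumed columns; IS DISTINCT FROM keeps rows with a NULL type counted, matching the CASE version above:

SELECT application.status AS status
     , count(*) FILTER (WHERE type IS DISTINCT FROM 'abc'
                           OR application.status <> 'internal_review') AS "current_month"
FROM application
WHERE to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
  AND date_part('year', application.created) = date_part('year', CURRENT_DATE)
  AND application.job_status != 'expired'
GROUP BY application.status;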

How to create a pivot table in Postgresql using case when?

I want to create a pivot table using postgresql. I could accomplish this using SQLite, and I thought the logic would be similar, but it doesn't seem to be the case.
Here's the sample table:
create table df(
campaign varchar(50),
date date not null,
revenue integer not null
);
insert into df(campaign,date,revenue) values('A','2019-01-01',10000);
insert into df(campaign,date,revenue) values('B','2019-01-02',7000);
insert into df(campaign,date,revenue) values('A','2018-01-01',5000);
insert into df(campaign,date,revenue) values('B','2018-01-01',3500);
here's my sqlite code to transform the tidy data into pivot table:
select
sum(case when strftime('%Y', date) = '2019' then revenue else 0 end) as '2019',
sum(case when strftime('%Y', date) = '2018' then revenue else 0 end) as '2018',
campaign
from df
group by campaign
the result would be like this:
2018   2019    campaign
5000   10000   A
3500   7000    B
I tried writing similar code using Postgres; I will just use the year 2019:
select
sum(case when extract('year' from date) = '2019' then revenue else 0 end) as '2019',
campaign
from df
group by campaign
Somehow the code doesn't work, and I don't understand what's wrong.
Query Error: error: syntax error at or near "'2019'"
What am I missing here?
db-fiddle link:
https://www.db-fiddle.com/f/f1WjMAAxwSPRvB8BrxECN7/0
The function strftime() is used to extract various parts of a date in SQLite, but is not supported by Postgresql.
Use date_part():
select campaign,
sum(case when date_part('year', date) = '2019' then revenue else 0 end) as "2019",
sum(case when date_part('year', date) = '2018' then revenue else 0 end) as "2018"
from df
group by campaign
Or use Postgresql's FILTER clause:
select campaign,
sum(revenue) filter (where date_part('year', date) = '2019') as "2019",
sum(revenue) filter (where date_part('year', date) = '2018') as "2018"
from df
group by campaign
Also, don't use single quotes for table/column names: SQLite allows it, but Postgres does not. It accepts only double quotes, which is the SQL standard.
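For instance, a minimal illustration of the quoting rule in Postgres:

select 1 as "2019";   -- works: double quotes delimit an identifier
select 1 as '2019';   -- fails with a syntax error: single quotes produce a string literal, not a column alias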

Adding rows to SQL query result

I have a custom query in my Java application that looks like that:
select
  to_char(searches.timestamp, 'Mon') as mon,
  COUNT(DISTINCT searches.ip_address)
from
  searches
WHERE
  searches.city = 1
group by 1;
which should return all months that occur within the database, and number of distinct IP addresses within each month. However, at this point, some months do not have any entries, and they are missing in the SQL query result. How can I make sure that all of the months are displayed there, even if their count is 0?
Got it working with:
select
  to_char(gs.m, 'Mon') as mon,
  count(distinct searches.ip_address)
from
  generate_series(
    date_trunc('month', current_date - interval '11 month'),
    current_date,
    '1 month'
  ) gs (m)
left join searches
  on date_trunc('month', searches.timestamp) = gs.m
 and searches.city = 1
group by gs.m
order by gs.m;
select
  to_char(gs.m, 'Mon') as mon,
  count(distinct searches.ip_address)
from
  searches
right join
  generate_series(
    date_trunc('month', current_date - interval '1 year'),
    current_date,
    '1 month'
  ) gs (m) on date_trunc('month', searches.timestamp) = gs.m
          and searches.city = 1   -- keep this filter in the join condition so empty months are not dropped
group by gs.m
order by gs.m;
Something like this (untested):
select
  months.mon
, count(distinct searches.ip_address)
from
  (select
     to_char(searches.timestamp, 'Mon') as mon
   from
     searches
   group by 1
  ) months
left join searches
  on to_char(searches.timestamp, 'Mon') = months.mon
 and searches.city = 1
group by 1;
And if you wanted the years in there, too, try something like this (untested):
select
  months.yrmon
, count(distinct searches.ip_address)
from
  (select
     to_char(extract(year from searches.timestamp), '9999') || to_char(searches.timestamp, 'Mon') as yrmon
   from
     searches
   group by 1
  ) months
left join searches
  on to_char(extract(year from searches.timestamp), '9999') || to_char(searches.timestamp, 'Mon') = months.yrmon
 and searches.city = 1
group by 1;

Monthly retention in Amazon Redshift

I'm trying to calculate monthly retention rate in Amazon Redshift and have come up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
EXTRACT(month FROM activity.created_at) AS Month,
COUNT(DISTINCT activity.member_id) AS active_users,
COUNT(DISTINCT future_activity.member_id) AS retained_users,
COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging AS future_activity
ON activity.mongo_id = future_activity.mongo_id
AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
Month
ORDER BY Year,
Month
For some reason this query returns zero retained_users and zero retention. I'd appreciate any help on why this may be happening, or a completely different query for monthly retention that would work.
I modified the query as per another SO post and here it goes:
Query 2
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
ORDER BY date_trunc('month', created_at))
= date_trunc('month', created_at) - interval '1 month'
OR NULL AS repeat_transaction
FROM ads.fbs_page_view_staging
WHERE created_at >= '2016-01-01'::date
AND created_at < '2016-04-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
This query gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('m...
[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details:
-----------------------------------------------
error: Interval values with month or year parts are not supported
code: 8001
context: interval months: "1"
query: 616822
location: cg_constmanager.cpp:145
process: padbmaster [pid=15116]
-----------------------------------------------;
I have a feeling that Query 2 would fare better than Query 1, so I'd prefer to fix the error on that.
Any help would be much appreciated.
Query 1's approach looks good; I tried a similar one, see below. You are using a self join on the table (ads.fbs_page_view_staging) and joining on mongo_id. Assuming mongo_id is unique, each row can only match itself, so datediff('month', activity.created_at, future_activity.created_at) will always return 0 and the condition datediff('month', activity.created_at, future_activity.created_at) = 1 will always be false, which is why retained_users comes out as zero.
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct
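Following that diagnosis, the self join in Query 1 presumably needs to be keyed on the user identifier rather than on mongo_id. A hedged sketch of that change, using the same table and columns as in the question (whether this matches your exact retention definition depends on your data):

SELECT EXTRACT(year FROM activity.created_at) AS Year,
       EXTRACT(month FROM activity.created_at) AS Month,
       COUNT(DISTINCT activity.member_id) AS active_users,
       COUNT(DISTINCT future_activity.member_id) AS retained_users,
       COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging future_activity
       ON activity.member_id = future_activity.member_id          -- join users to themselves, not rows
      AND datediff('month', activity.created_at, future_activity.created_at) = 1
GROUP BY 1, 2
ORDER BY 1, 2;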