with tab as
(select abc
from job_desc
where id = 5 and jd_date >= current_date - interval '12 months' and jd_date < current_date
and (loc is not null and loc != '')
and workloads && '{Warehousing}'
and skills && '{Customer Management}'
)
select abc, count(*)
from tab
group by tab.abc
order by count desc
The query is not executing in parallel. My data is partitioned by month. Any suggestions?
I have a dataset of sales. To summarize, the structure is
client_id
date_purchase
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is:
date_purchase | number_of_customers
2022-12-19 | 200
2022-12-18 | 194
(...)
Please note that this calculates, for any given date, the number of customers with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I haven't been able to calculate the correct number for each date.
A window function doesn't seem to be relevant here because there is no relationship between the rows of the same window. A simple query or a self-join query should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table :
1. Query for a given date reference_date :
SELECT a.reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT reference_date AS reference_date, client_id -- reference_date is a placeholder for the date of interest
FROM my_table
WHERE date_purchase <= reference_date AND date_purchase >= reference_date - INTERVAL '90 days'
GROUP BY client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.reference_date
2. Query for a given interval of dates reference_date => reference_date + INTERVAL '20 days' :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for all the date_purchase values in my_table :
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
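To sanity-check what query 3 is meant to return, here is a minimal Python sketch of the same logic on a tiny in-memory sample. The function name, the sample data, and the parameter names are made up for illustration; the window is inclusive on both ends, like the `<=` / `>=` conditions in the join above.

```python
from collections import Counter
from datetime import date, timedelta

def customers_with_repeat_purchases(purchases, window_days=90, min_purchases=2):
    """For each distinct purchase date, count the clients that have at least
    min_purchases purchases in [date - window_days, date] (both ends inclusive)."""
    result = {}
    for ref in sorted({d for _, d in purchases}):
        start = ref - timedelta(days=window_days)
        # Purchases per client inside the window ending at ref
        per_client = Counter(c for c, d in purchases if start <= d <= ref)
        result[ref] = sum(1 for n in per_client.values() if n >= min_purchases)
    return result

# Hypothetical sample rows: (client_id, date_purchase)
purchases = [
    (1, date(2022, 9, 1)),
    (1, date(2022, 11, 1)),
    (1, date(2022, 12, 19)),
    (2, date(2022, 9, 10)),
    (2, date(2022, 9, 15)),
]
counts = customers_with_repeat_purchases(purchases)
for day, n in counts.items():
    print(day, n)
```

Client 2 counts on 2022-09-15 and 2022-11-01 but drops out by 2022-12-19, because both of that client's purchases are more than 90 days old by then.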
I have a query below that sums the daily gap between the max and min time_stamp over a range of time (current_date - 2 to current_date - 1). Now I need to query the day shift and the extra shift separately (the day shift runs from 5am to 3pm; the extra shift is the remainder).
select sum(gap) from (
select to_char(time_stamp, 'yyyy/mm/dd') as day,
EXTRACT(EPOCH FROM (max(time_stamp) - min(time_stamp))) /3600 as gap
from group_table_debarker
where time_stamp >= (current_date - 2)
and time_stamp <= (current_date - 1)
and to_char(time_stamp, 'hh:mi') > '03:00' and to_char(time_stamp, 'hh:mi') < '15:00'
group by to_char(time_stamp, 'yyyy/mm/dd')
) as xxx
I've tried this, but the result wasn't what I expected.
Can I combine the two SELECTs below into a single query that returns the result of the first in the first column and the result of the second in the second column?
select count(*) from rrhh.empleado where fecha_contratado > current_date - interval '100 days'; -- select1
select count(*) from rrhh.empleado where fecha_fin_contrato > current_date - interval '100 days'; -- select2
Thank you
try:
with a as (
select
case when fecha_contratado > current_date - interval '100 days' then 1
else 0 end q1
, case when fecha_fin_contrato > current_date - interval '100 days' then 1
else 0 end q2
from rrhh.empleado
)
select sum(q1), sum(q2)
from a
;
This is a typical case for conditional aggregation:
select count(*) filter (where fecha_contratado > current_date - interval '100 days'),
count(*) filter (where fecha_fin_contrato > current_date - interval '100 days')
from rrhh.empleado
You can use a CASE expression (and the fact that most aggregates ignore NULL values) for PostgreSQL versions earlier than 9.4:
select count(case when fecha_contratado > current_date - interval '100 days' then 1 end),
count(case when fecha_fin_contrato > current_date - interval '100 days' then 1 end)
from rrhh.empleado
Note: these queries will scan the whole table, while your original queries could make use of indexes on fecha_contratado and fecha_fin_contrato. If performance matters to you, you could append a filter to these queries too. A row contributes to either count as soon as one of the two dates is recent, so the filter has to test the later of the two:
where greatest(fecha_contratado, fecha_fin_contrato) > current_date - interval '100 days'
and you could index the expression: greatest(fecha_contratado, fecha_fin_contrato).
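For anyone who wants to try the CASE-based variant without a Postgres instance, here is a small sketch using Python's built-in sqlite3 module. The table has no schema prefix, the rows are invented, and a fixed cutoff date stands in for `current_date - interval '100 days'` so the result is reproducible; treat it as an illustration of conditional aggregation, not of Postgres-specific behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE empleado (fecha_contratado TEXT, fecha_fin_contrato TEXT)"
)
# Hypothetical rows; dates are ISO strings, which compare correctly as text
conn.executemany(
    "INSERT INTO empleado VALUES (?, ?)",
    [
        ("2024-01-10", "2024-06-01"),  # both dates after the cutoff
        ("2020-05-01", "2024-05-20"),  # only the contract end is recent
        ("2024-02-15", None),          # only the hire date is recent
        ("2024-03-05", None),          # only the hire date is recent
        ("2019-03-01", "2021-03-01"),  # neither date is recent
    ],
)

cutoff = "2024-01-01"  # stand-in for current_date - interval '100 days'
hired, ended = conn.execute(
    """
    SELECT count(CASE WHEN fecha_contratado   > :cutoff THEN 1 END),
           count(CASE WHEN fecha_fin_contrato > :cutoff THEN 1 END)
    FROM empleado
    """,
    {"cutoff": cutoff},
).fetchone()
print(hired, ended)
```

Rows where the CASE condition is false or NULL yield NULL, and count() skips those, which is why one pass over the table can produce both numbers.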
I'm trying to calculate monthly retention rate in Amazon Redshift and have come up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
EXTRACT(month FROM activity.created_at) AS Month,
COUNT(DISTINCT activity.member_id) AS active_users,
COUNT(DISTINCT future_activity.member_id) AS retained_users,
COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging AS future_activity
ON activity.mongo_id = future_activity.mongo_id
AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
Month
ORDER BY Year,
Month
For some reason this query returns zero retained_users and zero retention. I'd appreciate any help regarding why this may be happening or maybe a completely different query for monthly retention would work.
I modified the query as per another SO post and here it goes:
Query 2
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
ORDER BY date_trunc('month', created_at))
= date_trunc('month', created_at) - interval '1 month'
OR NULL AS repeat_transaction
FROM ads.fbs_page_view_staging
WHERE created_at >= '2016-01-01'::date
AND created_at < '2016-04-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
This query gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('m...
[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details:
-----------------------------------------------
error: Interval values with month or year parts are not supported
code: 8001
context: interval months: "1"
query: 616822
location: cg_constmanager.cpp:145
process: padbmaster [pid=15116]
-----------------------------------------------;
I have a feeling that Query 2 would fare better than Query 1, so I'd prefer to fix the error on that.
Any help would be much appreciated.
Query 1 looks good. I tried a similar one; see below. You are self-joining the table (ads.fbs_page_view_staging) on mongo_id and comparing the same column (created_at) on both sides. Assuming mongo_id is unique, each row only ever joins to itself, so datediff('month', ...) will always return 0 and datediff ('month',activity.created_at,future_activity.created_at) = 1 will always be false.
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct
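To make the intent of Query 1 concrete, here is a rough Python sketch of month-over-month retention, assuming the join key should be member_id rather than mongo_id (that is an assumption based on the columns being counted, not something the question confirms). The function name and sample events are invented. Months are compared through a running month index (year * 12 + month), which matches the "exactly one calendar month later" condition in the query.

```python
from collections import defaultdict
from datetime import date

def monthly_retention(events):
    """events: iterable of (member_id, created_at) pairs.
    Returns rows of (year, month, active_users, retained_users, retention),
    where a user is retained if they are also active in the following month."""
    active = defaultdict(set)
    for member_id, created_at in events:
        # Running month index, so December -> January is still a difference of 1
        active[created_at.year * 12 + created_at.month - 1].add(member_id)

    rows = []
    for idx in sorted(active):
        users = active[idx]
        retained = users & active.get(idx + 1, set())
        rows.append(
            (idx // 12, idx % 12 + 1, len(users), len(retained),
             len(retained) / len(users))
        )
    return rows

# Hypothetical page-view events: (member_id, created_at)
events = [
    (1, date(2016, 1, 5)),
    (1, date(2016, 1, 25)),
    (2, date(2016, 1, 20)),
    (1, date(2016, 2, 3)),
    (3, date(2016, 2, 10)),
    (3, date(2016, 3, 1)),
]
rows = monthly_retention(events)
for row in rows:
    print(row)
```

Member 1 is active in January and February, so January retains one of its two users; member 3 does the same for February, and March has nobody to retain yet.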
I'm trying to update a table in Redshift from a query:
update mr_usage_au au
inner join(select mr.UserId,
date(mr.ActionDate) as ActionDate,
count(case when mr.EventId in (32) then mr.UserId end) as Moods,
count(case when mr.EventId in (33) then mr.UserId end) as Activities,
sum(case when mr.EventId in (10) then mr.Duration end) as Duration
from mr_session_log mr
where mr.EventTime >= current_date - interval '1 days' and mr.EventTime < current_date
Group By mr.UserId,
date(mr.ActionDate)) slog on slog.UserId=au.UserId
and slog.ActionDate=au.Date
set au.Moods = slog.Moods,
au.Activities=slog.Activities,
au.Durarion=slog.Duration
But I receive the following error:
ERROR: syntax error at or near "au".
This is completely invalid syntax for Redshift (or Postgres). Reminds me of SQL Server ...
Should work like this (at least on current Postgres):
UPDATE mr_usage_au
SET Moods = slog.Moods
, Activities = slog.Activities
, Durarion = slog.Duration
FROM (
SELECT UserId
, ActionDate::date
, count(CASE WHEN EventId = 32 THEN UserId END) AS Moods
, count(CASE WHEN EventId = 33 THEN UserId END) AS Activities
, sum(CASE WHEN EventId = 10 THEN Duration END) AS Duration
FROM mr_session_log
WHERE EventTime >= current_date - 1 -- just subtract integer from a date
AND EventTime < current_date
GROUP BY UserId, ActionDate::date
) slog
WHERE slog.UserId = mr_usage_au.UserId
AND slog.ActionDate = mr_usage_au.Date;
This is generally the case for Postgres and Redshift:
Use a FROM clause to join in additional tables.
You cannot table-qualify target columns in the SET clause.
Also, Redshift was forked from PostgreSQL 8.0.2, which was a very long time ago. Only some later updates to Postgres have been applied.
For instance, Postgres 8.0 did not yet allow a table alias in an UPDATE statement, which is the reason behind the error you see.
I simplified some other details.