need to capture NULL in a query - postgresql

I have a query that shows the success rate for staff and works splendidly, except for one thing: if staff member "Bob" has had no activity in the date range, he does not appear in the results. If he had at least one code in the query, it would result in 0% or 100%, but with no codes attached to his name he does not show up at all. I have seen an example of
ISNULL(s.code, 'No Entry') AS NoContact
but I guess I am not using it correctly, and I just cannot figure out how to add it to the query. Can someone assist?
Here is the current query, which works great (but omits any staff who do not have any of the codes):
SELECT st.staff_id
,round((count(s.code IN ('10401','10402','10403') OR NULL) * 100.0)
/ count(*), 1) AS successes
-- unsuccessful code is 10405
FROM notes n
JOIN services s ON s.zzud_service = n.zrud_service
JOIN staff st ON st.zzud_staff = n.zrud_staff
WHERE n.date_service >= DATE '07/01/2014' AND n.date_service <= CURRENT_DATE
-- n.date_service BETWEEN (now() - '30 days'::interval) AND now()
AND s.code IN ('10401','10402','10403','10405')
GROUP BY st.staff_id;
Here is a sample result:
Staff  SuccessRate  Explanation
Sam    100%         (has 1 successful and 0 unsuccessful)
Joe    50%          (has 1 successful and 1 unsuccessful)
Amy    0%           (has 1 unsuccessful)
Bob    does not show (no discharges in the date range)

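Wrapping a column in ISNULL (or COALESCE, its PostgreSQL counterpart) cannot help here, because Bob's row is never produced in the first place. Since you place the staff table at the end of the join chain, you need to RIGHT JOIN it and move the filter conditions into the join conditions; conditions left in the WHERE clause would discard the NULL-extended rows of staff without activity.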
select
st.staff_id,
round(
count(s.code in ('10401','10402','10403') or null) * 100.0
/
count(*)
, 1) as successes
-- unsuccessful code is 10405
from
notes n
inner join
services s on
s.zzud_service = n.zrud_service and
n.date_service >= date '07/01/2014' and
n.date_service <= current_date
right join
staff st on
st.zzud_staff = n.zrud_staff
-- n.date_service between (now() - '30 days'::interval) and now()
and s.code in ('10401','10402','10403','10405')
group by st.staff_id;
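To see why the position of the filters matters, here is a small self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table and column names mirror the question, but the rows are invented and dates are stored as ISO text. It starts from staff and LEFT JOINs the filtered activity, which is the mirror image of the RIGHT JOIN above. One deliberate variation, which is an assumption and not part of the answer: the denominator is NULLIF(COUNT(code), 0) instead of count(*). With count(*), Bob's single NULL-extended row is counted, so he would show 0.0 exactly like Amy; with NULLIF he shows NULL, which can then be labelled as "no entry" (for example with COALESCE on the formatted value).

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL. Table/column names
# mirror the question; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff    (zzud_staff INTEGER PRIMARY KEY, staff_id TEXT);
CREATE TABLE services (zzud_service INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE notes    (zrud_staff INTEGER, zrud_service INTEGER, date_service TEXT);

INSERT INTO staff VALUES (1, 'Sam'), (2, 'Joe'), (3, 'Amy'), (4, 'Bob');
INSERT INTO services VALUES (10, '10401'), (20, '10405');
-- Sam: 1 success; Joe: 1 success + 1 failure; Amy: 1 failure; Bob: nothing.
INSERT INTO notes VALUES
  (1, 10, '2014-07-05'),
  (2, 10, '2014-07-06'), (2, 20, '2014-07-07'),
  (3, 20, '2014-07-08');
""")

# All filtering happens inside the derived table "act"; the outer LEFT JOIN
# from staff then keeps every staff member, with NULLs where there is no activity.
query = """
SELECT st.staff_id,
       ROUND(100.0 * COUNT(CASE WHEN act.code IN ('10401','10402','10403') THEN 1 END)
             / NULLIF(COUNT(act.code), 0), 1) AS successes
FROM staff st
LEFT JOIN (
    SELECT n.zrud_staff, s.code
    FROM notes n
    JOIN services s ON s.zzud_service = n.zrud_service
    WHERE s.code IN ('10401','10402','10403','10405')
      AND n.date_service BETWEEN '2014-07-01' AND '2014-12-31'
) act ON act.zrud_staff = st.zzud_staff
GROUP BY st.staff_id
ORDER BY st.staff_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Amy', 0.0), ('Bob', None), ('Joe', 50.0), ('Sam', 100.0)]
```

Moving the two WHERE conditions of the derived table to the outer query instead would drop Bob again, which is the behaviour described in the question.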


Calculations inside window function in PostgreSQL

I have a dataset of sales. To summarize, it has two columns: client_id and date_purchase.
There might be several purchases done by the same customer on different dates. There can also be several purchases done on the same date (by different or the same customer).
My goal is to get the number of customers, for any given day, that made 2 or more purchases between that day and 90 days prior.
That is, the expected output is:
date_purchase   number_of_customers
2022-12-19      200
2022-12-18      194
(...)
Please note that this calculates, for any given date, the number of customers with 2+ purchases between that date and 90 days prior.
I know it has something to do with a window function. But so far I have not found a way to calculate, for every window of 90 days, how many customers have done 2+ purchases.
I've tried several window functions with no success:
partition by date_purchase
range between interval '90 days' preceding and current row
So far I can't get it to calculate the number correctly for each date.
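A window function doesn't seem relevant here, because the rows of the same window have no relationship to one another: what is needed is, for each reference date, a per-client count of the purchases in the preceding 90 days, followed by a count of the clients that reach 2. A simple query or a self-join query should provide the expected result.
Assuming that client_id and date_purchase are two columns of my_table: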
1. Query for a given date reference_date:
-- reference_date stands for a date value, e.g. DATE '2022-12-19'
SELECT reference_date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT client_id
       FROM my_table
       WHERE date_purchase <= reference_date AND date_purchase >= reference_date - INTERVAL '90 days'
       GROUP BY client_id
       HAVING count(*) >= 2
     ) AS a
2. Query for a range of dates, from reference_date to reference_date + INTERVAL '20 days':
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN generate_series(reference_date, reference_date + INTERVAL '20 days', '1 day') AS ref(date)
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
3. Query for every date_purchase value in my_table:
SELECT a.date AS date_purchase, count(*) AS number_of_customers
FROM ( SELECT ref.date, t.client_id
FROM my_table AS t
INNER JOIN (SELECT DISTINCT date_purchase AS date FROM my_table) AS ref
ON t.date_purchase <= ref.date AND t.date_purchase >= ref.date - INTERVAL '90 days'
GROUP BY ref.date, t.client_id
HAVING count(*) >= 2
) AS a
GROUP BY a.date
ORDER BY a.date
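Query 3 can be tried out end to end with a tiny invented dataset. This sketch again uses Python's sqlite3 as a stand-in for PostgreSQL, so date(x, '-90 days') replaces x - INTERVAL '90 days' and the reference-date column is named ref_date; everything else follows the shape of query 3 (reference dates from DISTINCT date_purchase, range join, per-client HAVING, then a count per date).

```python
import sqlite3

# SQLite stand-in for PostgreSQL; my_table(client_id, date_purchase) as in the
# question, with invented rows. Client 3 buys twice on the same day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (client_id INTEGER, date_purchase TEXT);
INSERT INTO my_table VALUES
  (1, '2022-01-01'), (1, '2022-02-01'),
  (2, '2022-01-15'), (2, '2022-06-01'),
  (3, '2022-02-10'), (3, '2022-02-10');
""")

# Every distinct purchase date is a reference date; each purchase is joined to
# the reference dates whose 90-day window contains it, clients with 2+
# purchases in that window are kept, and those clients are counted per date.
query = """
SELECT a.ref_date AS date_purchase, COUNT(*) AS number_of_customers
FROM (SELECT ref.ref_date, t.client_id
      FROM my_table AS t
      JOIN (SELECT DISTINCT date_purchase AS ref_date FROM my_table) AS ref
        ON t.date_purchase <= ref.ref_date
       AND t.date_purchase >= date(ref.ref_date, '-90 days')
      GROUP BY ref.ref_date, t.client_id
      HAVING COUNT(*) >= 2) AS a
GROUP BY a.ref_date
ORDER BY a.ref_date
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2022-02-01', 1), ('2022-02-10', 2)]
```

Dates where no client qualifies (here 2022-01-01, 2022-01-15 and 2022-06-01) are simply absent from the result; reporting them as 0 would need an outer join from the list of dates.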

How to subtract a separate count from one grouping

I have a Postgres query like this:
select application.status as status, count(*) as "current_month" from application
where to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
group by application.status
It returns the table below, which has the number of applications grouped by status for the current month. However, I want to subtract the total count of a separate but related query from the internal_review number only: I want to count the rows with type = abc within the same table and for the same date range, and then subtract that amount from the internal_review number (type is a separate field). current_month_desired shows how it should look.
status           current_month   current_month_desired
fail             22              22
internal_review  95              22
pass             146             146
UNTESTED, but maybe...
The intent here is to use a CASE expression inside an aggregate to sum conditionally. This way the subtraction is not needed in the first place, because you only "count" the rows you want. A plain sum() is enough here; adding an OVER clause on top of GROUP BY would fail, because type is not a grouped column.
SELECT application.status AS status
     , sum(CASE WHEN application.type = 'abc'
                 AND application.status = 'internal_review' THEN 0
                ELSE 1 END) AS "current_month"
FROM application
WHERE to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
GROUP BY application.status
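A quick way to check the conditional sum is to run it on a few invented rows. This sketch uses Python's sqlite3 in place of PostgreSQL and models only the status and type columns (the date and job_status filters are left out as irrelevant to the counting). It puts the plain count next to the conditional sum so the difference for internal_review is visible.

```python
import sqlite3

# SQLite stand-in; application(status, type) with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (status TEXT, type TEXT);
INSERT INTO application VALUES
  ('fail', 'abc'), ('fail', 'xyz'),
  ('internal_review', 'abc'), ('internal_review', 'abc'),
  ('internal_review', 'xyz'),
  ('pass', 'abc'), ('pass', 'xyz'), ('pass', NULL);
""")

# Plain aggregate, no OVER clause: every row contributes 1, except
# internal_review rows of type 'abc', which contribute 0.
query = """
SELECT status,
       COUNT(*) AS current_month,
       SUM(CASE WHEN type = 'abc' AND status = 'internal_review' THEN 0 ELSE 1 END)
           AS current_month_desired
FROM application
GROUP BY status
ORDER BY status
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('fail', 2, 2), ('internal_review', 3, 1), ('pass', 3, 3)]
```

Note that a row with a NULL type makes the WHEN condition NULL rather than true, so it falls into ELSE and is counted.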

Create a list of objects in plsql - postgresql

I need to create a list of objects in PL/SQL - postgres and return it as table to user.
Here is the scenario. I have two tables:
create table ProcessDetails(
processName varchar,
processstartdate timestamp,
processenddate timestamp);
create table processSLA(
processName varchar,
sla numeric);
Now I need to loop over all the records in the ProcessDetails table and check, for each process type, which records have breached the SLA, which are within the SLA, and which have used more than 80% of the SLA.
I need help understanding how to loop over the records and build a collection that holds the required details for each process type.
Sample data from the ProcessDetails table:
ProcessName processstartdate processenddate
-----------------------------------------------------
"Create" "2018-12-24 13:11:05.122694" null
"Delete" "2018-12-24 12:12:24.269266" null
"Delete" "2018-12-23 13:12:31.89164" null
"Create" "2018-12-22 13:12:37.505486" null
Sample data from the processSLA table:
ProcessName sla(in hrs)
---------------------------------
Create 1
Delete 10
And the output will look something like this:
ProcessName WithinSLA(Count) BreachedSLA(Count) Exceeded80%SLA(Count)
---------------------------------------------------------------------
Create 1 1 3
Delete 1 2 1
For each SLA, you can look up all corresponding process details with a join. The link between the two joined tables is specified in a join condition; for your example, using (processName) works.
To find processes that have exceeded the SLA, check whether the allowed end time (start plus sla hours) is earlier than the actual end time, using now() for processes that have not finished yet:
select processName
-- the sla column holds the allowed duration in hours
, count(case when det.processstartdate + interval '1 hour' * sla.sla >=
coalesce(det.processenddate, now()) then 1 end) as InSLA
, count(case when det.processstartdate + interval '1 hour' * sla.sla <
coalesce(det.processenddate, now()) then 1 end) as BreachedSLA
-- an identifier starting with a digit must be double-quoted
, count(case when det.processstartdate + interval '1 hour' * 0.8 * sla.sla <
coalesce(det.processenddate, now()) then 1 end) as "80PercentSLA"
from processSLA sla
left join
ProcessDetails det
using (processName)
group by
processName
You can join both tables and use conditional aggregation based on the difference between the timestamps, converted to hours.
Something like this:
SELECT ps.processname,
       -- open processes (NULL end date) are measured up to now()
       count(CASE
               WHEN extract(EPOCH FROM coalesce(pd.processenddate, now()) - pd.processstartdate) / 3600 < ps.sla * .8 THEN
                 1
             END) "less than 80%",
       count(CASE
               WHEN extract(EPOCH FROM coalesce(pd.processenddate, now()) - pd.processstartdate) / 3600 >= ps.sla * .8
                    AND extract(EPOCH FROM coalesce(pd.processenddate, now()) - pd.processstartdate) / 3600 <= ps.sla THEN
                 1
             END) "80% to 100%",
       count(CASE
               WHEN extract(EPOCH FROM coalesce(pd.processenddate, now()) - pd.processstartdate) / 3600 > ps.sla THEN
                 1
             END) "more than 100%"
FROM processdetails pd
INNER JOIN processsla ps
           ON ps.processname = pd.processname
GROUP BY ps.processname;
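The conditional-aggregation idea can be checked on a handful of invented rows. This sketch uses Python's sqlite3 as a stand-in for PostgreSQL: strftime('%s', ...) (Unix seconds) takes the place of extract(EPOCH FROM ...), the elapsed hours are computed once in a derived table, and a fixed :now parameter replaces now() so the result is reproducible. The bucket names under_80, from_80_to_100 and breached are made up for the example.

```python
import sqlite3

# SQLite stand-in for the two tables in the question, with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProcessDetails (processName TEXT, processstartdate TEXT, processenddate TEXT);
CREATE TABLE processSLA (processName TEXT, sla REAL);  -- sla in hours
INSERT INTO processSLA VALUES ('Create', 1), ('Delete', 10);
INSERT INTO ProcessDetails VALUES
  ('Create', '2018-12-24 10:00:00', '2018-12-24 10:30:00'),  -- 0.5 h
  ('Create', '2018-12-24 10:00:00', '2018-12-24 10:54:00'),  -- 0.9 h
  ('Create', '2018-12-24 10:00:00', '2018-12-24 12:00:00'),  -- 2 h
  ('Delete', '2018-12-23 00:00:00', '2018-12-23 05:00:00'),  -- 5 h
  ('Delete', '2018-12-23 00:00:00', '2018-12-23 20:00:00'),  -- 20 h
  ('Delete', '2018-12-24 12:00:00', NULL);                   -- still running
""")

# Elapsed hours per process; an open process is measured up to :now.
query = """
SELECT ps.processName,
       COUNT(CASE WHEN pd.hrs < ps.sla * 0.8 THEN 1 END)                       AS under_80,
       COUNT(CASE WHEN pd.hrs >= ps.sla * 0.8 AND pd.hrs <= ps.sla THEN 1 END) AS from_80_to_100,
       COUNT(CASE WHEN pd.hrs > ps.sla THEN 1 END)                             AS breached
FROM (SELECT processName,
             (strftime('%s', COALESCE(processenddate, :now))
              - strftime('%s', processstartdate)) / 3600.0 AS hrs
      FROM ProcessDetails) AS pd
JOIN processSLA AS ps ON ps.processName = pd.processName
GROUP BY ps.processName
ORDER BY ps.processName
"""
rows = conn.execute(query, {"now": "2018-12-24 13:00:00"}).fetchall()
print(rows)  # [('Create', 1, 1, 1), ('Delete', 2, 0, 1)]
```

Computing the duration once in a derived table avoids repeating the same extract expression in every CASE branch; in PostgreSQL the same shape works with extract(EPOCH FROM ...) / 3600.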

Adding rows to SQL query result

I have a custom query in my Java application that looks like this:
select
to_char(searches.timestamp,'Mon') as mon,
COUNT(DISTINCT(searches.ip_address))
from
searches
WHERE
searches.city = 1
group by 1;
which should return all months that occur within the database, and the number of distinct IP addresses within each month. However, at this point some months do not have any entries, and they are missing from the SQL query result. How can I make sure that all of the months are displayed, even if their count is 0?
Got it working with:
select
to_char (gs.m,'Mon') as mon,
count (distinct search.ip_address)
from
generate_series (
date_trunc('month', current_date - interval '11 month'),
current_date,
'1 month'
) gs (m)
left join searches search
on date_trunc('month', search.timestamp) = gs.m AND search.city = 1
group by gs.m
order by gs.m;
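A variant with a RIGHT JOIN; here too the city filter has to be part of the join condition, because a WHERE clause on searches would drop the months that have no matching rows:
select
to_char (gs.m,'Mon') as mon,
count (distinct(search.ip_address))
from
searches search
right join
generate_series (
date_trunc('month', current_date - interval '1 year'),
current_date,
'1 month'
) gs (m) on date_trunc('month', search.timestamp) = gs.m
and search.city = 1
group by gs.m
order by gs.m;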
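The difference between filtering in the join condition and filtering in WHERE can be reproduced on invented data. This sketch uses Python's sqlite3 as a stand-in for PostgreSQL; SQLite has no generate_series() by default, so a recursive CTE produces the list of months, and months are labelled 'YYYY-MM' because SQLite's strftime has no month-name format. The helper monthly_ips is hypothetical and only switches where the city filter goes.

```python
import sqlite3

# SQLite stand-in with invented rows: January and April have city-1 searches,
# February only has a city-2 search, March has nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE searches (timestamp TEXT, ip_address TEXT, city INTEGER);
INSERT INTO searches VALUES
  ('2014-01-03 10:00:00', '10.0.0.1', 1),
  ('2014-01-20 11:00:00', '10.0.0.2', 1),
  ('2014-01-21 11:00:00', '10.0.0.2', 1),
  ('2014-02-05 09:00:00', '10.0.0.3', 2),
  ('2014-04-02 08:00:00', '10.0.0.4', 1),
  ('2014-04-09 08:00:00', '10.0.0.4', 1);
""")

def monthly_ips(city_filter_in_where):
    # Put the city filter either into the ON clause or into WHERE.
    on_extra = "" if city_filter_in_where else "AND s.city = 1"
    where = "WHERE s.city = 1" if city_filter_in_where else ""
    sql = f"""
    WITH RECURSIVE months(m) AS (
        SELECT '2014-01-01'
        UNION ALL
        SELECT date(m, '+1 month') FROM months WHERE m < '2014-04-01'
    )
    SELECT strftime('%Y-%m', months.m) AS mon,
           COUNT(DISTINCT s.ip_address) AS distinct_ips
    FROM months
    LEFT JOIN searches AS s
           ON strftime('%Y-%m', s.timestamp) = strftime('%Y-%m', months.m)
          {on_extra}
    {where}
    GROUP BY months.m
    ORDER BY months.m
    """
    return conn.execute(sql).fetchall()

print(monthly_ips(city_filter_in_where=False))  # every month, zeros included
print(monthly_ips(city_filter_in_where=True))   # empty months disappear
```

With the filter in ON, February and March come back with 0; with the filter in WHERE, their NULL-extended (or city-2) rows are removed and only January and April remain.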
Something like this (untested):
select
months.mon
, COUNT(DISTINCT(searches.ip_address))
from
(select
to_char(searches.timestamp,'Mon') as mon
from
searches
group by 1
) months
left join searches
on to_char(searches.timestamp,'Mon') = months.mon
and searches.city = 1
group by 1;
And if you want the years in there too, try something like this (untested):
select
months.yrmon
, COUNT(DISTINCT(searches.ip_address))
from
(select
to_char(searches.timestamp,'YYYY Mon') as yrmon
from
searches
group by 1
) months
left join searches
on to_char(searches.timestamp,'YYYY Mon') = months.yrmon
and searches.city = 1
group by 1;

Monthly retention in Amazon redshift

I'm trying to calculate monthly retention rate in Amazon Redshift and have come up with the following query:
Query 1
SELECT EXTRACT(year FROM activity.created_at) AS Year,
EXTRACT(month FROM activity.created_at) AS Month,
COUNT(DISTINCT activity.member_id) AS active_users,
COUNT(DISTINCT future_activity.member_id) AS retained_users,
COUNT(DISTINCT future_activity.member_id) / COUNT(DISTINCT activity.member_id)::float AS retention
FROM ads.fbs_page_view_staging activity
LEFT JOIN ads.fbs_page_view_staging AS future_activity
ON activity.mongo_id = future_activity.mongo_id
AND datediff ('month',activity.created_at,future_activity.created_at) = 1
GROUP BY Year,
Month
ORDER BY Year,
Month
For some reason this query returns zero retained_users and zero retention. I'd appreciate any help regarding why this may be happening or maybe a completely different query for monthly retention would work.
I modified the query as per another SO post and here it goes:
Query 2
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('month', created_at)) OVER (PARTITION BY member_id
ORDER BY date_trunc('month', created_at))
= date_trunc('month', created_at) - interval '1 month'
OR NULL AS repeat_transaction
FROM ads.fbs_page_view_staging
WHERE created_at >= '2016-01-01'::date
AND created_at < '2016-04-01'::date -- time range of interest.
GROUP BY 1, 2
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
This query gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT member_id
,date_trunc('month', created_at) AS month
,count(*) AS item_transactions
,lag(date_trunc('m...
[Amazon](500310) Invalid operation: Interval values with month or year parts are not supported
Details:
-----------------------------------------------
error: Interval values with month or year parts are not supported
code: 8001
context: interval months: "1"
query: 616822
location: cg_constmanager.cpp:145
process: padbmaster [pid=15116]
-----------------------------------------------;
I have a feeling that Query 2 would fare better than Query 1, so I'd prefer to fix the error on that.
Any help would be much appreciated.
Query 1 looks good; I tried a similar one (see below). However, you are self-joining ads.fbs_page_view_staging and comparing the same column (created_at) on both sides. Assuming mongo_id is unique, each row is joined only to itself, so datediff('month', activity.created_at, future_activity.created_at) always returns 0 and the condition = 1 is always false. Joining on member_id instead is probably what you intended.
-- Count distinct events of join_col_id that have lapsed for one month.
SELECT count(distinct E.join_col_id) dist_ct
FROM public.fact_events E
JOIN public.dim_table Z
ON E.join_col_id = Z.join_col_id
WHERE datediff('month', event_time, sysdate) = 1;
-- 2771654 -- dist_ct
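The diagnosis can be checked on a toy table. This sketch uses Python's sqlite3 as a stand-in for Redshift: a "months since year 0" expression built with strftime replaces datediff('month', ...), and the table page_views and its rows are invented. Joining each view to itself by a unique id (mongo_id) never finds a one-month gap, while joining by member_id does.

```python
import sqlite3

# SQLite stand-in for Redshift with invented page views; mongo_id is unique.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_views (mongo_id INTEGER PRIMARY KEY, member_id TEXT, created_at TEXT);
INSERT INTO page_views VALUES
  (1, 'a', '2016-01-05'), (2, 'a', '2016-02-07'),
  (3, 'b', '2016-01-10'),
  (4, 'c', '2016-02-01'), (5, 'c', '2016-03-03');
""")

def month_index(alias):
    # Months since year 0, so consecutive calendar months differ by exactly 1
    # (standing in for Redshift's datediff('month', ...)).
    return (f"(CAST(strftime('%Y', {alias}.created_at) AS INTEGER) * 12"
            f" + CAST(strftime('%m', {alias}.created_at) AS INTEGER))")

def retention(join_col):
    # join_col is one of two fixed column names, so formatting it in is safe here.
    sql = f"""
    SELECT strftime('%Y-%m', a.created_at) AS month,
           COUNT(DISTINCT a.member_id) AS active_users,
           COUNT(DISTINCT f.member_id) AS retained_users
    FROM page_views a
    LEFT JOIN page_views f
           ON a.{join_col} = f.{join_col}
          AND {month_index('f')} - {month_index('a')} = 1
    GROUP BY month
    ORDER BY month
    """
    return conn.execute(sql).fetchall()

by_mongo = retention("mongo_id")    # each row only matches itself: gap is 0
by_member = retention("member_id")  # a member's later views can match
print(by_mongo)   # [('2016-01', 2, 0), ('2016-02', 2, 0), ('2016-03', 1, 0)]
print(by_member)  # [('2016-01', 2, 1), ('2016-02', 2, 1), ('2016-03', 1, 0)]
```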