How to turn string values in a column into column names in Redshift - amazon-redshift

I have built the query below to get the total volume by route, sla_min, and sla_status.
The sla_status is calculated with a CASE WHEN expression that classifies each row as OVER SLA or MEET SLA.
with data_manifest as (
select no,
concat(concat(origin,'-'),destination) as route_city,
sla_min,
case
when status>0 and datediff(day, sla_max_date_time_internal, last_valid_tracking_date_time) > 0 then 'OVER SLA'
when status=0 and datediff(day, sla_max_date_time_internal, current_date) > 0 then 'OVER SLA' else 'MEET SLA'
end as status_sla
from data
where trunc(tgltransaksi::date) between ('30 January,2023') and ('9 February,2023')
), data_vol as (
select
route_city,
count(distinct no) as volume,
status_sla,
sla_min
from data_manifest
group by route_city, status_sla, sla_min
)
The query returns this:
route_city  vol  status_sla  sla_min
A - B       20   MEET SLA    2
A - B       40   OVER SLA    2
B - C       30   MEET SLA    1
B - C       30   OVER SLA    1
My question is: how can I split MEET SLA and OVER SLA out into column names, so the structure would be like this:
route_city  MEET SLA  OVER SLA  total_vol  sla_min
A - B       20        40        60         2
B - C       30        30        60         1
How should I write the query to get the desired result in Redshift?
Thank you in advance.

Not seeing your input data, it isn't clear exactly what you need, but here's a shot.
You need to stop grouping by status_sla and count the number for each value of status_sla.
with data_manifest as (
select no,
concat(concat(origin,'-'),destination) as route_city,
sla_min,
case
when status>0 and datediff(day, sla_max_date_time_internal, last_valid_tracking_date_time) > 0 then 'OVER SLA'
when status=0 and datediff(day, sla_max_date_time_internal, current_date) > 0 then 'OVER SLA' else 'MEET SLA'
end as status_sla
from data
where trunc(tgltransaksi::date) between ('30 January,2023') and ('9 February,2023')
), data_vol as (
select
route_city,
count(distinct no) as volume,
count(distinct decode(status_sla, 'MEET SLA', no, NULL)) as meet_sla,
count(distinct decode(status_sla, 'OVER SLA', no, NULL)) as over_sla,
sla_min
from data_manifest
group by route_city, sla_min
)
There are other ways of doing this that might work better for the edge cases. Not knowing what those are led to this minimal-change approach.
Above code is untested.
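To see the shape of the result, here's the same conditional-count pivot as a runnable sketch against SQLite from Python. The table name and sample rows are invented, and CASE stands in for Redshift's DECODE (both engines accept CASE):

```python
import sqlite3

# Invented stand-in for the data_manifest CTE from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_manifest (no TEXT, route_city TEXT, status_sla TEXT, sla_min INT);
INSERT INTO data_manifest VALUES
  ('1','A - B','MEET SLA',2), ('2','A - B','MEET SLA',2),
  ('3','A - B','OVER SLA',2), ('4','B - C','MEET SLA',1),
  ('5','B - C','OVER SLA',1);
""")

# Pivot: count each status into its own column instead of grouping by it.
rows = conn.execute("""
SELECT route_city,
       COUNT(DISTINCT CASE WHEN status_sla = 'MEET SLA' THEN no END) AS meet_sla,
       COUNT(DISTINCT CASE WHEN status_sla = 'OVER SLA' THEN no END) AS over_sla,
       COUNT(DISTINCT no) AS total_vol,
       sla_min
FROM data_manifest
GROUP BY route_city, sla_min
ORDER BY route_city
""").fetchall()
print(rows)  # [('A - B', 2, 1, 3, 2), ('B - C', 1, 1, 2, 1)]
```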

Related

How to subtract a separate count from one grouping

I have a postgres query like this
select application.status as status, count(*) as "current_month" from application
where to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
group by application.status
It returns the table below, which has the number of applications grouped by status for the current month. However, I want to subtract the total count of a separate but related query from the internal_review number only. I want to count the number of rows with type = abc within the same table and for the same date range, and then subtract that amount from the internal_review number (type is a separate field). current_month_desired is how it should look.
status           current_month  current_month_desired
fail             22             22
internal_review  95             22
pass             146            146
UNTESTED: but maybe...
The intent here is to use a case expression to conditionally sum. This way, the subtraction is not needed in the first place, as you are only "counting" the values needed.
SELECT application.status as status
, sum(case when type = 'abc'
and application.status = 'internal_review' then 0
else 1 end) as "current_month"
FROM application
WHERE to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
GROUP BY application.status
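The conditional-sum idea can be checked with a small SQLite sketch (table and rows invented, the date filters dropped for brevity): rows of type 'abc' in internal_review contribute 0 to the sum, everything else contributes 1, so the subtraction happens inside the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application (status TEXT, type TEXT);
INSERT INTO application VALUES
  ('internal_review','abc'), ('internal_review','abc'),
  ('internal_review','xyz'), ('fail','xyz'),
  ('pass','xyz'), ('pass','xyz');
""")

# 'abc' rows in internal_review count as 0, so they are excluded up front.
rows = conn.execute("""
SELECT status,
       SUM(CASE WHEN type = 'abc' AND status = 'internal_review'
                THEN 0 ELSE 1 END) AS current_month
FROM application
GROUP BY status
ORDER BY status
""").fetchall()
print(rows)  # [('fail', 1), ('internal_review', 1), ('pass', 2)]
```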

Create a list of objects in plsql - postgresql

I need to create a list of objects in PL/pgSQL (Postgres) and return it as a table to the user.
Here is the scenario. I have two tables:
create table ProcessDetails(
processName varchar,
processstartdate timestamp,
processenddate timestamp);
create table processSLA(
processName varchar,
sla numeric);
Now I need to loop over all the records in the processDetails table and check which records for each process type have breached the SLA, are within the SLA, and are over 80% of the SLA.
I would need help understanding how to loop over the records and create a collection with the required details for each process type.
Sample data from the processdetails table:
ProcessName  processstartdate              processenddate
-----------------------------------------------------
"Create"     "2018-12-24 13:11:05.122694"  null
"Delete"     "2018-12-24 12:12:24.269266"  null
"Delete"     "2018-12-23 13:12:31.89164"   null
"Create"     "2018-12-22 13:12:37.505486"  null
processSLA:
ProcessName  sla (in hrs)
---------------------------------
Create       1
Delete       10
And the output will look something like this:
ProcessName  WithinSLA(Count)  BreachedSLA(Count)  Exceeded80%SLA(Count)
---------------------------------------------------------------------
Create       1                 1                   3
Delete       1                 2                   1
For each SLA, you can look up all corresponding process details with a join. The link between the two joined tables is specified in a join condition; for your example, using (processName) would work.
To find processes that have exceeded the SLA, check that the allowed end date is earlier than the actual end date:
select processName
, count(case when det.processstartdate + interval '1 hour' * sla.sla >=
coalesce(det.processenddate, now()) then 1 end) as InSLA
, count(case when det.processstartdate + interval '1 hour' * sla.sla <
coalesce(det.processenddate, now()) then 1 end) as BreachedSLA
, count(case when det.processstartdate + interval '1 hour' * 0.8 * sla.sla <
coalesce(det.processenddate, now()) then 1 end) as "80PercentSLA"
from processSLA sla
left join
ProcessDetails det
using (processName)
group by
processName
You can join both tables and use conditional aggregation based on the difference between the timestamps.
Something like this:
SELECT count(CASE
WHEN extract(EPOCH FROM pd.processenddate - pd.processstartdate) / 3600 < ps.sla * .8 THEN
1
END) "less than 80%",
count(CASE
WHEN extract(EPOCH FROM pd.processenddate - pd.processstartdate) / 3600 >= ps.sla * .8
AND extract(EPOCH FROM pd.processenddate - pd.processstartdate) / 3600 <= ps.sla THEN
1
END) "80% to 100%",
count(CASE
WHEN extract(EPOCH FROM pd.processenddate - pd.processstartdate) / 3600 > ps.sla THEN
1
END) "more than 100%"
FROM processdetails pd
INNER JOIN processsla ps
ON ps.processname = pd.processname;
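Here's a runnable sketch of that bucketing in SQLite from Python (sample data invented; elapsed hours are derived with julianday() since SQLite has no EXTRACT(EPOCH FROM ...)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processdetails (processname TEXT, processstartdate TEXT, processenddate TEXT);
CREATE TABLE processsla (processname TEXT, sla REAL);
INSERT INTO processsla VALUES ('Create', 1.0), ('Delete', 10.0);
INSERT INTO processdetails VALUES
  ('Create', '2018-12-24 13:00:00', '2018-12-24 13:30:00'),  -- 0.5 h: under 80% of 1 h
  ('Create', '2018-12-24 13:00:00', '2018-12-24 13:55:00'),  -- ~0.92 h: 80% to 100%
  ('Delete', '2018-12-24 00:00:00', '2018-12-24 12:00:00');  -- 12 h: over the 10 h SLA
""")

# (julianday(end) - julianday(start)) * 24 gives elapsed hours.
rows = conn.execute("""
SELECT pd.processname,
       COUNT(CASE WHEN (julianday(pd.processenddate) - julianday(pd.processstartdate)) * 24
                       < ps.sla * 0.8 THEN 1 END) AS under_80,
       COUNT(CASE WHEN (julianday(pd.processenddate) - julianday(pd.processstartdate)) * 24
                       BETWEEN ps.sla * 0.8 AND ps.sla THEN 1 END) AS pct_80_to_100,
       COUNT(CASE WHEN (julianday(pd.processenddate) - julianday(pd.processstartdate)) * 24
                       > ps.sla THEN 1 END) AS over_sla
FROM processdetails pd
JOIN processsla ps ON ps.processname = pd.processname
GROUP BY pd.processname
ORDER BY pd.processname
""").fetchall()
print(rows)  # [('Create', 1, 1, 0), ('Delete', 0, 0, 1)]
```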

How to create buckets and groups within those buckets using PostgreSQL

How do I find the distribution of credit cards by year and completed transactions, grouping the credit cards into three buckets: fewer than 10 transactions, between 10 and 30 transactions, and more than 30 transactions?
The first method I tried was the width_bucket function in PostgreSQL, but the documentation says that it only creates equidistant buckets, which is not what I want in this case. Because of that, I turned to CASE statements. However, I'm not sure how to use a CASE statement with a GROUP BY.
This is the data I am working with:
table 1 - credit_cards table
credit_card_id
year_opened
table 2 - transactions table
transaction_id
credit_card_id - matches credit_cards.credit_card_id
transaction_status ("complete" or "incomplete")
This is what I have gotten so far:
SELECT
CASE WHEN transaction_count < 10 THEN 'Less than 10'
WHEN transaction_count >= 10 and transaction_count < 30 THEN '10 <= transaction count < 30'
ELSE transaction_count>=30 THEN 'Greater than or equal to 30'
END as buckets
count(*) as ct.transaction_count
FROM credit_cards c
INNER JOIN transactions t
ON c.credit_card_id = t.credit_card_id
WHERE t.status = 'completed'
GROUP BY v.year_opened
GROUP BY buckets
ORDER BY buckets
Expected output
credit card count | year opened | transaction count bucket
23421 | 2002 | Less than 10
etc
You can specify the bin sizes in width_bucket by specifying a sorted array of the lower bound of each bin.
In your case, it would be array[10,30]: anything less than 10 gets bin 0, between 10 and 29 gets bin 1, and 30 or more gets bin 2.
WITH a AS (select generate_series(5,35) cnt)
SELECT cnt, width_bucket(cnt, array[10,30])
FROM a;
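The rule width_bucket follows for an array argument is "count how many thresholds are <= the value", which in Python is exactly bisect_right over the sorted thresholds. A tiny sketch (the function name mirrors Postgres but is our own):

```python
from bisect import bisect_right

def width_bucket(x, thresholds=(10, 30)):
    # Number of thresholds <= x: 0 below 10, 1 for [10, 30), 2 for >= 30.
    return bisect_right(thresholds, x)

print([(n, width_bucket(n)) for n in (5, 10, 29, 30, 35)])
# [(5, 0), (10, 1), (29, 1), (30, 2), (35, 2)]
```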
To figure this out you need to count transactions per credit card in order to figure out the right bucket, then you need to count the credit cards per bucket per year. There are a couple of different ways to get the final result. One way is to first join up all your data and compute the first level of aggregate values. Then compute the final level of aggregate values:
with t1 as (
select year_opened
, c.credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from credit_cards c
join transactions t
on t.credit_card_id = c.credit_card_id
where t.transaction_status = 'complete'
group by year_opened
, c.credit_card_id
)
select count(*) credit_card_count
, year_opened
, buckets
from t1
group by year_opened
, buckets;
However, it may be more performant to first calculate the first level of aggregate data on the transactions table before joining it to the credit_cards table:
select count(*) credit_card_count
, year_opened
, buckets
from credit_cards c
join (select credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from transactions
where transaction_status = 'complete'
group by credit_card_id) t
on t.credit_card_id = c.credit_card_id
group by year_opened
, buckets;
If you prefer to unroll the above query and use common table expressions, you can do that too (I find this easier to read/follow along):
with bkt as (
select credit_card_id
, case when count(*) < 10 then 'Less than 10'
when count(*) < 30 then 'Between [10 and 30)'
else 'Greater than or equal to 30'
end buckets
from transactions
where transaction_status = 'complete'
group by credit_card_id
)
select count(*) credit_card_count
, year_opened
, buckets
from credit_cards c
join bkt t
on t.credit_card_id = c.credit_card_id
group by year_opened
, buckets;
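The bucket-then-count pattern can be run end to end in SQLite from Python (sample cards and transaction counts invented): first aggregate transactions per card into a bucket label, then count cards per bucket and year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE credit_cards (credit_card_id INT, year_opened INT);
CREATE TABLE transactions (credit_card_id INT, transaction_status TEXT);
INSERT INTO credit_cards VALUES (1, 2002), (2, 2002), (3, 2003);
""")
# card 1: 2 complete transactions, card 2: 12, card 3: 35
for card, n in ((1, 2), (2, 12), (3, 35)):
    conn.executemany("INSERT INTO transactions VALUES (?, 'complete')", [(card,)] * n)

rows = conn.execute("""
WITH bkt AS (
  SELECT credit_card_id,
         CASE WHEN COUNT(*) < 10 THEN 'Less than 10'
              WHEN COUNT(*) < 30 THEN 'Between [10 and 30)'
              ELSE 'Greater than or equal to 30' END AS buckets
  FROM transactions
  WHERE transaction_status = 'complete'
  GROUP BY credit_card_id
)
SELECT COUNT(*) AS credit_card_count, year_opened, buckets
FROM credit_cards c
JOIN bkt USING (credit_card_id)
GROUP BY year_opened, buckets
ORDER BY year_opened, buckets
""").fetchall()
print(rows)
```

Each card lands in exactly one bucket, so the outer COUNT(*) counts cards, not transactions.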
Not sure if this is what you are looking for.
WITH cte
AS (
SELECT c.year_opened
,c.credit_card_id
,count(*) AS transaction_count
FROM credit_cards c
INNER JOIN transactions t ON c.credit_card_id = t.credit_card_id
WHERE t.transaction_status = 'complete'
GROUP BY c.year_opened
,c.credit_card_id
)
SELECT cte.year_opened AS "year opened"
,SUM(CASE
WHEN transaction_count < 10
THEN 1
ELSE 0
END) AS "Less than 10"
,SUM(CASE
WHEN transaction_count >= 10
AND transaction_count < 30
THEN 1
ELSE 0
END) AS "10 <= transaction count < 30"
,SUM(CASE
WHEN transaction_count >= 30
THEN 1
ELSE 0
END) AS "Greater than or equal to 30"
FROM CTE
GROUP BY cte.year_opened
and the output would be as below.
year opened | Less than 10 | 10 <= transaction count < 30 | Greater than or equal to 30
2002 | 23421 | |

need to capture NULL in a query

I have a query that shows the success rate for staff, and it works splendidly except: if staff member "Bob" has not had any activity in the date range, he will not appear in the results. If he had at least one code in the query, it would result in 0% or 100%; if there are no codes attached to his name, he does not show in the results at all. I have seen an example of
ISNULL(s.code, 'No Entry') AS NoContact
but I guess I am not using it correctly, and I just cannot figure out how to add it into the query. Can someone assist?
Here is the current query that works great (but omits any staff who do not have any of the codes):
SELECT st.staff_id
,round((count(s.code IN ('10401','10402','10403') OR NULL) * 100.0)
/ count(*), 1) AS successes
-- unsuccessful code is 10405
FROM notes n
JOIN services s ON s.zzud_service = n.zrud_service
JOIN staff st ON st.zzud_staff = n.zrud_staff
WHERE n.date_service >= DATE '07/01/2014' AND n.date_service <= CURRENT_DATE
-- n.date_service BETWEEN (now() - '30 days'::interval) AND now()
AND s.code IN ('10401','10402','10403','10405')
GROUP BY st.staff_id;
Here is a sample result:
Staff  SuccessRate    Explanation
Sam    100%           (has 1 successful and 0 unsuccessful)
Joe    50%            (has 1 successful and 1 unsuccessful)
Amy    0%             (has 1 unsuccessful)
Bob    does not show  (no discharges in the date range)
Since you place the staff table at the end, you need to RIGHT JOIN it and move the filter conditions into the join conditions:
select
st.staff_id,
round(
count(s.code in ('10401','10402','10403') or null) * 100.0
/
count(*)
, 1) as successes
-- unsuccessful code is 10405
from
notes n
inner join
services s on
s.zzud_service = n.zrud_service and
n.date_service >= date '07/01/2014' and
n.date_service <= current_date
right join
staff st on
st.zzud_staff = n.zrud_staff
-- n.date_service between (now() - '30 days'::interval) and now()
and s.code in ('10401','10402','10403','10405')
group by st.staff_id;
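The effect of moving the filters into the join can be demonstrated with a simplified SQLite sketch (schema boiled down to two invented tables, and the join written as staff LEFT JOIN, which is equivalent to the RIGHT JOIN above): Bob's unmatched row of NULLs still gives a denominator of 1 and a success count of 0, so he shows up at 0%.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staff_id TEXT);
CREATE TABLE notes (staff_id TEXT, code TEXT);
INSERT INTO staff VALUES ('Sam'), ('Joe'), ('Amy'), ('Bob');
INSERT INTO notes VALUES ('Sam','10401'), ('Joe','10401'), ('Joe','10405'), ('Amy','10405');
""")

# The code filter lives in the join condition, not WHERE, so staff with no
# matching notes survive as a single all-NULL row instead of being dropped.
rows = conn.execute("""
SELECT st.staff_id,
       ROUND(COUNT(CASE WHEN n.code IN ('10401','10402','10403') THEN 1 END) * 100.0
             / COUNT(*), 1) AS successes
FROM staff st
LEFT JOIN notes n ON n.staff_id = st.staff_id
  AND n.code IN ('10401','10402','10403','10405')
GROUP BY st.staff_id
ORDER BY st.staff_id
""").fetchall()
print(rows)  # [('Amy', 0.0), ('Bob', 0.0), ('Joe', 50.0), ('Sam', 100.0)]
```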

postgres complicated query

I wonder whether it is possible to build such a query. The problem is that I have a table with some numbers per date.
Let's say I have 3 columns: Date, Value, Good/Bad
i.e.:
2014-03-03  100  Good
2014-03-03  15   Bad
2014-03-04  120  Good
2014-03-04  10   Bad
And I want to select and subtract Good - Bad:
2014-03-03  85
2014-03-04  110
Is it possible? I have been thinking a lot and don't have an idea yet. It would be rather simple if I had the Good and Bad values in separate tables.
The trick is to join your table back to itself as shown below. myTable as a will read only the Good rows and myTable as b will read only the Bad rows. Those rows then get joined into a single row based on date.
SQL Fiddle Demo
select
a.date
,a.count as Good_count
,b.count as bad_count
,a.count-b.count as diff_count
from myTable as a
inner join myTable as b
on a.date = b.date and b.type = 'Bad'
where a.type = 'Good'
Output returned:
DATE                          GOOD_COUNT  BAD_COUNT  DIFF_COUNT
March, 03 2014 00:00:00+0000  100         15         85
March, 04 2014 00:00:00+0000  120         10         110
Another approach would be to use GROUP BY instead of the inner join:
select
a.date
,sum(case when type = 'Good' then a.count else 0 end) as Good_count
,sum(case when type = 'Bad' then a.count else 0 end) as Bad_count
,sum(case when type = 'Good' then a.count else 0 end) -
sum(case when type = 'Bad' then a.count else 0 end) as Diff_count
from myTable as a
group by a.date
order by a.date
Both approaches produce the same result.
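The GROUP BY variant can be run directly in SQLite from Python with the rows from the question (column names mirror the question's myTable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (date TEXT, count INT, type TEXT);
INSERT INTO myTable VALUES
  ('2014-03-03', 100, 'Good'), ('2014-03-03', 15, 'Bad'),
  ('2014-03-04', 120, 'Good'), ('2014-03-04', 10, 'Bad');
""")

# Conditional sums split Good and Bad within one pass over the table,
# so the subtraction needs no self-join.
rows = conn.execute("""
SELECT date,
       SUM(CASE WHEN type = 'Good' THEN count ELSE 0 END) -
       SUM(CASE WHEN type = 'Bad'  THEN count ELSE 0 END) AS diff_count
FROM myTable
GROUP BY date
ORDER BY date
""").fetchall()
print(rows)  # [('2014-03-03', 85), ('2014-03-04', 110)]
```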