How can I find the percentage of records in a table in DB2?

I have a single table - drivers
I want to know what percentage of drivers were terminated this month, relative to all active drivers. I think I have made this very complicated: the query does return a result, but it is always 0. I tried casting to DECIMAL, but that doesn't work for me either, as the calculation still comes out as 0.
WITH X AS
(
SELECT
CAST(COUNT(*) AS DECIMAL (5,2)) TERMINATED,
0 ACTIVE
FROM DRIVER WHERE
MONTH(TERMINATION_DATE) = MONTH(CURRENT TIMESTAMP) AND YEAR(TERMINATION_DATE) = YEAR(CURRENT TIMESTAMP)
UNION ALL
SELECT
0 TERMINATED,
CAST(COUNT(*) AS FLOAT) ACTIVE
FROM DRIVER WHERE
ACTIVE_IN_DISP = 'True'
)
SELECT
CAST(SUM(TERMINATED)/(SUM(ACTIVE) + SUM(TERMINATED)*100) AS DECIMAL (10,2))
FROM X

Your parentheses are a bit off: instead of
SUM(TERMINATED)/(SUM(ACTIVE) + SUM(TERMINATED)*100)
it should read
SUM(TERMINATED)/(SUM(ACTIVE) + SUM(TERMINATED))*100
You got a number too small to fit into two decimal places, which is why the result showed as 0.
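The effect of the misplaced parenthesis is easy to check outside the database; with hypothetical counts of 5 terminated and 95 active drivers:

```python
terminated = 5.0   # hypothetical counts
active = 95.0

# Buggy grouping: the *100 ends up inside the denominator.
wrong = terminated / (active + terminated * 100)

# Corrected grouping: divide first, then scale to a percentage.
right = terminated / (active + terminated) * 100

print(round(wrong, 4))  # 0.0084 -- far too small, shows as 0.00 after a DECIMAL(10,2) cast
print(round(right, 2))  # 5.0
```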
I would rewrite your query like this:
WITH begin_of_month(begin_of_month) AS (VALUES CURRENT DATE - (DAY(CURRENT DATE)-1) DAYS),
X AS (
SELECT COUNT(*) AS terminated, 0 AS active
FROM driver
WHERE termination_date >= begin_of_month
AND termination_date < begin_of_month + 1 MONTH --index, if present, can be used!
UNION ALL
SELECT 0 AS terminated, COUNT(*) AS active
FROM driver
WHERE active_in_disp = 'True'
)
SELECT CAST(FLOAT(SUM(TERMINATED))/(SUM(ACTIVE) + SUM(TERMINATED))*100 AS DECIMAL (10,2)) --CAST only if You wish to display with 2 decimal places
FROM X
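The begin_of_month trick and the half-open date range translate directly into ordinary date arithmetic; a quick sketch in Python (the dates are arbitrary examples):

```python
from datetime import date, timedelta

today = date(2017, 6, 21)  # stand-in for CURRENT DATE
begin_of_month = today - timedelta(days=today.day - 1)  # CURRENT DATE - (DAY(CURRENT DATE)-1) DAYS

# begin_of_month + 1 MONTH: jump safely past month end, then snap back to day 1.
next_month = (begin_of_month + timedelta(days=32)).replace(day=1)

# Half-open range test, equivalent to the WHERE clause above.
termination_date = date(2017, 6, 30)
terminated_this_month = begin_of_month <= termination_date < next_month
print(begin_of_month, next_month, terminated_this_month)  # 2017-06-01 2017-07-01 True
```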


write a query to calculate cumulative performance based on daily percent change in postgresql?

I have daily changes in a table like below.
Table: performance

date        percent_change
2022/12/01   2
2022/12/02  -1
2022/12/03   3

I want to assume an initial value of 100 and show the cumulative value to date, like below.
expected output:

date        percent_change  cumulative value
2022/12/01   2              102
2022/12/02  -1              100.98
2022/12/03   3              104.0094
A product of values, like the one you want to make, is nothing more than EXP(SUM(LN(...))). It results in a slightly verbose query but does not require new functions to be coded and can be ported as is to other DBMS.
In your case, as long as none of your percentages is below -100%:
SELECT date,
percent_change,
100 * EXP(SUM(LN(1+percent_change/100)) OVER (ORDER BY Date)) AS cumulative_value
FROM T
The SUM(...) OVER (ORDER BY ...) is what makes it a cumulative sum.
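The identity behind this trick (a running product equals EXP of a running SUM of LN values) can be verified outside SQL with the sample data:

```python
from math import exp, log

changes = [2, -1, 3]  # daily percent_change values

cumulative = []
log_sum = 0.0
for pct in changes:
    log_sum += log(1 + pct / 100)       # SUM(LN(1 + percent_change/100)) OVER (ORDER BY date)
    cumulative.append(100 * exp(log_sum))

print([round(v, 4) for v in cumulative])  # [102.0, 100.98, 104.0094]
```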
If you need to account for percentages lower than -100%, you need a bit more complexity.
SELECT date,
percent_change,
100 * -1 ^ SUM(CASE WHEN percent_change < -100 THEN 1 ELSE 0 END) OVER (ORDER BY Date)
* EXP(SUM(LN(ABS(1+percent_change/100))) OVER (ORDER BY Date))
AS cumulative_value
FROM T
WHERE NOT EXISTS (SELECT FROM T T2 WHERE T2.percent_change = -100 AND T2.date <= T.date)
UNION ALL
SELECT Date, percent_change, 0
FROM T
WHERE EXISTS (SELECT FROM T T2 WHERE T2.percent_change = -100 AND T2.date <= T.date)
Explanation:
An ABS(...) has been added to account for the values not supported by the previous query. It effectively strips the sign of 1 + percent_change / 100.
Before the EXP(SUM(LN(ABS(...)))), the -1 ^ SUM(...) is where the sign is put back into the calculation. Read it as: -1 to the power of how many times we encountered a negative value.
The WHERE EXISTS(...) / WHERE NOT EXISTS(...) pair handles the special case of percent_change = -100%. When we encounter -100, we cannot compute the logarithm even with a call to ABS(...). However, this does not matter much, as the products you want to calculate are going to be 0 from this point onward.
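As a sanity check, here is the same sign-tracking idea in Python, on a made-up series whose second change is below -100%:

```python
from math import exp, log

changes = [2, -150, 3]  # second factor, 1 + (-150)/100, is negative

log_sum, neg_count, cumulative = 0.0, 0, []
for pct in changes:
    factor = 1 + pct / 100
    neg_count += factor < 0            # the CASE WHEN ... THEN 1 ELSE 0 END counter
    log_sum += log(abs(factor))        # LN(ABS(...)) works once the sign is stripped
    cumulative.append(100 * (-1) ** neg_count * exp(log_sum))

print([round(v, 2) for v in cumulative])  # [102.0, -51.0, -52.53]
```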
Side note:
You can save yourself some of the complexity of the above queries by changing how you store the changes.
Storing 0.02 to represent 2% removes the multiplications/divisions by 100.
Storing 0.0198026272961797 (LN(1 + 0.02)) removes the need to call for a logarithm in your query.
I assume that the date in the 3rd row is 2022/12/03. Otherwise you need to add an id or some other column to order the percent changes that occurred on the same day.
Solution
To calculate the value after percent_change, you need to multiply your current value by (100 + percent_change) / 100.
For day n, the cumulative value is 100 multiplied by the product of the coefficients (100 + percent_change) / 100 up to day n.
In PostgreSQL, "up to day n" can be implemented with window functions.
Since there is no built-in aggregate function for multiplication, let's create one.
CREATE AGGREGATE PRODUCT(DOUBLE PRECISION) (
SFUNC = float8mul,
STYPE = FLOAT8
);
Final query will look like this:
SELECT
date,
percent_change,
100 * product((100 + percent_change)::float / 100) OVER (ORDER BY date) cumulative_value
FROM performance;
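The running PRODUCT is just the recurrence value = value * (100 + percent_change) / 100 applied row by row; a quick check against the expected output:

```python
changes = [2, -1, 3]  # daily percent_change values

value, cumulative = 100.0, []
for pct in changes:
    value *= (100 + pct) / 100   # one step of the running PRODUCT
    cumulative.append(value)

print([round(v, 4) for v in cumulative])  # [102.0, 100.98, 104.0094]
```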

How to subtract a separate count from one grouping

I have a postgres query like this
select application.status as status, count(*) as "current_month" from application
where to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
and date_part('year',application.created) = date_part('year', CURRENT_DATE)
and application.job_status != 'expired'
group by application.status
It returns the table below, which has the number of applications grouped by status for the current month. However, I want to subtract the total count of a separate but related query from the internal_review number only: I want to count the number of rows with type = abc within the same table and date range, and then subtract that amount from the internal_review number (type is a separate field). current_month_desired shows how it should look.
status           current_month  current_month_desired
fail              22             22
internal_review   95             22
pass             146            146
UNTESTED: but maybe...
The intent here is to use a CASE expression inside SUM to count conditionally. This way, the subtraction is not needed in the first place, as you are only "counting" the rows you want.
SELECT application.status as status
     , sum(case when type = 'abc'
                 and application.status = 'internal_review' then 0
            else 1 end) as "current_month"
FROM application
WHERE to_char(application.created, 'mon') = to_char('now'::timestamp - '1 month'::interval, 'mon')
  and date_part('year', application.created) = date_part('year', CURRENT_DATE)
  and application.job_status != 'expired'
GROUP BY application.status
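The conditional count is easy to sketch in plain Python on made-up rows (the status and type values here are assumed from the question, not real data):

```python
from collections import defaultdict

# Hypothetical application rows: (status, type)
rows = [
    ("fail", "xyz"), ("pass", "xyz"),
    ("internal_review", "abc"),   # excluded from the count
    ("internal_review", "xyz"),
]

current_month = defaultdict(int)
for status, typ in rows:
    # CASE WHEN type='abc' AND status='internal_review' THEN 0 ELSE 1 END
    current_month[status] += 0 if (typ == "abc" and status == "internal_review") else 1

print(dict(current_month))  # {'fail': 1, 'pass': 1, 'internal_review': 1}
```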

How can I query concurrent events, i.e. usage, in postgres?

From my first table of events below, what query would give me my second table of usage?
start   end
08:42   08:47
08:44   08:50

start   end     count
08:42   08:44   1
08:44   08:47   2
08:47   08:50   1
What if any indexes should I create to speed this up?
The main thing I often need is the peak usage and when it is (i.e. max count row from above), so also is there a quicker way to get one/both of these?
Also, is it quicker to query for each second (which I can imagine how to do), e.g:
time    count
08:42   1
08:43   1
08:44   2
08:45   2
08:46   2
08:47   1
08:48   1
08:49   1
NB my actual starts/ends are timestamp(6) with time zone and I have thousands of records, but I hope my example above is useful.
step-by-step demo: db<>fiddle
SELECT
t as start,
lead as "end",
sum as count
FROM (
SELECT
t,
lead(t) OVER (ORDER BY t), -- 2a
type,
SUM(type) OVER (ORDER BY t) -- 2b
FROM (
SELECT -- 1
start as t,
1 as type
FROM mytable
UNION
SELECT
stop,
-1 as type
FROM mytable
) s
) s
WHERE sum > 0 -- 3
1. Put all time values into one column. Attach the value 1 to former start values and -1 to former end values.
2. a) Pull the next time value into the current record. b) Take a cumulative SUM() over the newly added 1/-1 column: each start point increases the count, each end point decreases it. This is your expected count.
3. Remove all records that do not span an interval.
The above only works properly if your interval borders are distinct. If you have borders at the same time point, you have to change the UNION into a UNION ALL (which keeps duplicate values) and group the result afterwards, to generate, for example, -2 from two -1 values in the same time slot:
step-by-step demo: db<>fiddle
SELECT
t as start,
lead as "end",
sum as count
FROM (
SELECT
t,
lead(t) OVER (ORDER BY t),
type,
SUM(type) OVER (ORDER BY t)
FROM (
SELECT
t,
SUM(type) AS type
FROM (
SELECT
start as t,
1 as type
FROM mytable
UNION ALL
SELECT
stop,
-1 as type
FROM mytable
) s
GROUP BY t
) s
) s
WHERE sum > 0
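Both variants implement a classic sweep line: each event contributes +1 at its start and -1 at its end, and a running total over the sorted time points gives the concurrency. A sketch with the sample data (duplicate borders are merged per time point, as in the UNION ALL variant):

```python
events = [("08:42", "08:47"), ("08:44", "08:50")]

# +1 when an event starts, -1 when it ends, merged per time point.
deltas = {}
for start, end in events:
    deltas[start] = deltas.get(start, 0) + 1
    deltas[end] = deltas.get(end, 0) - 1

times = sorted(deltas)  # HH:MM strings sort chronologically
usage, running = [], 0
for t, nxt in zip(times, times[1:]):
    running += deltas[t]         # cumulative SUM(type) OVER (ORDER BY t)
    if running > 0:              # WHERE sum > 0: drop gaps with no usage
        usage.append((t, nxt, running))

print(usage)  # [('08:42', '08:44', 1), ('08:44', '08:47', 2), ('08:47', '08:50', 1)]
```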

Postgres: difference between two timestamps (hours:minutes:seconds)

I'm creating a select that calculates the difference between two timestamps.
Here is the code (you don't need to understand the tables below; just follow the thread):
((select value from demo.data where id=q.id and key='timestampend')::timestamp
- (select value from demo.data where id=q.id and key='timestampstart')::timestamp) as durata
Or look at this simpler example, if you prefer:
select timestamp_end::timestamp - timestamp_start as duration
Here is the result:
// "durata" is duration
The problem is that the first timestamp is 2017-06-21 and the second is 2017-06-22, so we have 1 day and some hours of difference.
How can I show the result not as "1 day 02:06:41.993657" but as "26:06:41.993657", and without milliseconds (26:06:41)?
Update
I'm testing this query:
select id as ticketid,
(select value from demo.data where id=q.id and key = 'timestampstart')::timestamp as TEnd,
(select value from demo.data where id=q.id and key = 'timestampend')::timestamp as TStart,
(select
make_interval
(
0,0,0,0, -- years, months, weeks, days
extract(days from duration1)::int * 24 + extract(hours from duration1)::int, -- calculated hours (days * 24 + hours)
extract(mins from duration1)::int, -- minutes
floor(extract(secs from duration1))::int -- seconds, without miliseconds, thus FLOOR()
) as duration1
from
(
(select value from demo.data where id=q.id and key='timestampstart')::timestamp - (select value from demo.data where id=q.id and key='timestampend')::timestamp
) t(duration) as dur
from (select distinct id from demo.data) q
error is the same: [Err] ERROR: syntax error at or near "::"
there is an error on id = q.id
data table is like this:
You could use the EXTRACT function and wrap it up with MAKE_INTERVAL and some math. It's pretty straightforward, since you pass each part of the timestamp to it:
select
make_interval(
0,0,0,0, -- years, months, weeks, days
extract(days from durdata)::int * 24 + extract(hours from durdata)::int, -- calculated hours (days * 24 + hours)
extract(mins from durdata)::int, -- minutes
floor(extract(secs from durdata))::int -- seconds, without miliseconds, thus FLOOR()
) as durdata
from (
select '2017-06-22 02:06:41.993657'::timestamp - '2017-06-21'::timestamp
) t(durdata);
Output:
durdata
----------
26:06:41
You could wrap it up within a function to make it easy to work with.
There is no need to worry about timestamp - timestamp losing information by returning output with units larger than days, because even a calculation across different years still returns days plus the time part.
Example:
postgres=# select ('2019-06-22 01:03:05.993657'::timestamp - '2017-06-21'::timestamp) as durdata;
durdata
------------------------
731 days 01:03:05.993657
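The days-into-hours folding can be double-checked in Python with the example timestamps:

```python
from datetime import datetime

delta = datetime(2017, 6, 22, 2, 6, 41, 993657) - datetime(2017, 6, 21)

# Fold days into hours; truncating to int drops the sub-second part (the FLOOR() above).
total_seconds = int(delta.total_seconds())
hours, rem = divmod(total_seconds, 3600)
minutes, seconds = divmod(rem, 60)

print(f"{hours:02d}:{minutes:02d}:{seconds:02d}")  # 26:06:41
```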
In Postgres, although the interval data type allows hour values greater than 23 (see https://www.postgresql.org/docs/9.6/static/functions-formatting.html), the to_char() function will cut out the days and take only the "hours within a day" if you pass it a delta value and ask for 'HH24'.
So I ended up with this trick, combining to_char(...) with extract('epoch' from ...) and then putting the concatenated value into another to_char():
with timestamps(ts1, ts2) as (
select
'2017-06-21'::timestamptz,
'2017-06-22 01:03:05.1212'::timestamptz
), res as (
select
floor(extract('epoch' from ts2 - ts1) / 3600) as hours,
to_char(ts2 - ts1, 'MI:SS') as min_sec
from timestamps
)
select hours, min_sec, to_char(format('%s:%s', hours, min_sec)::interval, 'HH24:MI:SS')
from res;
The result is:
hours | min_sec | to_char
-------+---------+----------
25 | 03:05 | 25:03:05
(1 row)
You can define an SQL function to make using it easier:
create or replace function extract_hhmmss(timestamptz, timestamptz) returns interval as $$
with delta(i) as (
select
case when $2 > $1 then $2 - $1
else $1 - $2
end
), res as (
select
floor(extract('epoch' from i) / 3600) as hours,
to_char(i, 'MI:SS') as min_sec
from delta
)
select
(
case when $2 < $1 then '-' else '' end
|| to_char(format('%s:%s', hours, min_sec)::interval, 'HH24:MI:SS')
)::interval
from res;
$$ language sql stable;
Example of usage:
[local]:5432 nikolay#test=# select extract_hhmmss('2017-06-21'::timestamptz, '2017-06-22 01:03:05.1212'::timestamptz);
extract_hhmmss
----------------
25:03:05
(1 row)
Time: 0.882 ms
[local]:5432 nikolay#test=# select extract_hhmmss('2017-06-22 01:03:05.1212'::timestamptz, '2017-06-21'::timestamptz);
extract_hhmmss
----------------
-25:03:05
(1 row)
Notice that it will give an error if the timestamps are provided in reverse order, but it's not really hard to fix. // Update: already fixed.

Column of counts for time intervals

I want to construct a table with a column that tracks how many times an id appears in a given week: if the id appears once it is given 1, if it appears twice it is given 2, but if it appears more than two times it is given 0.
id date
a 2015-11-10
a 2015-11-25
a 2015-11-09
b 2015-11-10
b 2015-11-09
a 2015-11-05
b 2015-11-23
b 2015-11-28
b 2015-12-04
a 2015-11-10
b 2015-12-04
a 2015-12-07
a 2015-12-09
c 2015-11-30
a 2015-12-06
c 2015-10-31
c 2015-11-04
b 2015-12-01
a 2015-10-30
a 2015-12-14
the one week intervals are given as follows
1 - 2015-10-30 to 2015-11-05
2 - 2015-11-06 to 2015-11-12
3 - 2015-11-13 to 2015-11-19
4 - 2015-11-20 to 2015-11-26
5 - 2015-11-27 to 2015-12-03
6 - 2015-12-04 to 2015-12-10
7 - 2015-12-11 to 2015-12-17
The table should look like this.
id interval count
a 1 2
b 1 0
c 1 2
a 2 0
b 2 2
c 2 0
a 3 0
b 3 0
c 3 0
a 4 1
b 4 1
c 4 0
a 5 0
b 5 2
c 5 1
a 6 0
b 6 2
c 6 0
a 7 1
b 7 0
c 7 0
The interval column doesn't have to be there, I simply added it for clarity.
I am new to SQL and am unsure how to break the dates into intervals. The only thing I have is grouping by date and counting.
Select id ,date, count (*) as frequency
from data_1
group by id, date having frequency <= 2;
Looking at just the data you provided, this does the trick:
SELECT v.id,
i.interval,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM (VALUES('a'), ('b'), ('c')) v(id)
CROSS JOIN generate_series(1, 7) i(interval)
LEFT JOIN (
SELECT id, ((date - '2015-10-30')/7 + 1)::int AS interval, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, interval)
ORDER BY 2, 1;
A few words of explanation:
You have three id values, which are recreated here with a VALUES clause. If you have many more, or don't know beforehand which ids to enumerate, you can always replace the VALUES clause with a sub-query.
You provide a specific date range spanning 7 weeks. Since there might be weeks where a certain id is not present, you need to generate a series of the interval values and CROSS JOIN it to the id values above. This yields the 21 rows you are looking for.
Then you calculate the occurrences of ids within intervals. You can subtract one date from another, which gives you the number of days in between. So subtract the earliest date from the date of the row, divide by 7 to get the interval period, add 1 to make the interval 1-based, and convert to integer. You can then convert counts > 2 to 0, and NULL to 0, with a combination of CASE and coalesce().
The query outputs the interval too, since otherwise you would have no clue what the data refers to. Optionally, you can turn this into a column which shows the date range of the interval.
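The interval arithmetic from the subquery can be sanity-checked in Python against the sample data (week 1 starting 2015-10-30):

```python
from datetime import date

start = date(2015, 10, 30)

def week_interval(d: date) -> int:
    # ((date - '2015-10-30') / 7 + 1)::int -- integer division of the day gap
    return (d - start).days // 7 + 1

print(week_interval(date(2015, 11, 10)))  # 2  (falls in 2015-11-06 .. 2015-11-12)
print(week_interval(date(2015, 10, 30)))  # 1
print(week_interval(date(2015, 12, 14)))  # 7
```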
More flexible solution
If you have more ids and a larger date range, you can use the version below, which first determines the distinct ids and the date range. Note that the interval is now 0-based to make the calculations easier. Not that it matters much, because instead of the interval number the corresponding date range is displayed.
WITH mi AS (
SELECT min(date) AS min, ((max(date) - min(date))/7)::int AS intv FROM my_table)
SELECT v.id,
to_char((mi.min + i.intv * 7)::timestamp, 'YYYY-mm-dd') || ' - ' ||
to_char((mi.min + i.intv * 7 + 6)::timestamp, 'YYYY-mm-dd') AS period,
coalesce((CASE WHEN sub.cnt < 3 THEN sub.cnt ELSE 0 END), 0) AS count
FROM mi,
(SELECT DISTINCT id FROM my_table) v
CROSS JOIN LATERAL generate_series(0, mi.intv) i(intv)
LEFT JOIN LATERAL (
SELECT id, ((date - mi.min)/7)::int AS intv, count(*) AS cnt
FROM my_table
GROUP BY 1, 2) sub USING (id, intv)
ORDER BY 2, 1;
SQLFiddle with both solutions.
Assuming you have a table of all users, this will do the trick.
select
users.id,
interval_table.id,
CASE
WHEN count(log_table.user_id)>2 THEN 0
ELSE count(log_table.user_id)
END
from users
cross join interval_table
left outer join log_table
on users.id = log_table.user_id
and log_table.event_date >= interval_table.start_interval
and log_table.event_date < interval_table.stop_interval
group by users.id, interval_table.id
order by interval_table.id, users.id
Check it out: http://sqlfiddle.com/#!15/1a822/21