Salary accumulation by manager (DB2 equivalent of Oracle CONNECT BY) - db2

We have this data from the scott.emp table:
select empno, ename,mgr, sal
from emp
order by empno
;
EMPNO|ENAME | MGR| SAL
-----|----------|-----|----------
7369|SMITH | 7902| 800
7499|ALLEN | 7698| 1600
7521|WARD | 7698| 1250
7566|JONES | 7839| 2975
7654|MARTIN | 7698| 1250
7698|BLAKE | 7839| 2850
7782|CLARK | 7839| 2450
7788|SCOTT | 7566| 3000
7839|KING | | 5000
7844|TURNER | 7698| 1500
7876|ADAMS | 7788| 1100
7900|JAMES | 7698| 950
7902|FORD | 7566| 3000
7934|MILLER | 7782| 1300
14 rows selected.
I want to list the employee hierarchy together with the accumulated salary of each employee's subtree (the employee plus everyone below them), returning rows like this:
DESKRIPSI | EMPNO| MGR| AMOUNT
----------|------|------|--------
KING | 7839| | 29.025
.JONES | 7566| 7839| 10.875
..SCOTT | 7788| 7566| 4.100
...ADAMS | 7876| 7788| 1.100
..FORD | 7902| 7566| 3.800
...SMITH | 7369| 7902| 800
.BLAKE | 7698| 7839| 9.400
..ALLEN | 7499| 7698| 1.600
..WARD | 7521| 7698| 1.250
..MARTIN | 7654| 7698| 1.250
..TURNER | 7844| 7698| 1.500
..JAMES | 7900| 7698| 950
.CLARK | 7782| 7839| 3.750
..MILLER | 7934| 7782| 1.300
14 rows selected.
With Oracle RDBMS, the approach is like this:
WITH pohon
AS (SELECT
DISTINCT CONNECT_BY_ROOT empno parent_id, empno AS id
FROM
emp
CONNECT BY
PRIOR empno = mgr),
trx
AS (SELECT
pohon.parent_id, SUM (tx.sal) AS amount
FROM
pohon JOIN emp tx ON pohon.id = tx.empno
GROUP BY
pohon.parent_id)
SELECT
LPAD (r0.ename, LENGTH (r0.ename) + LEVEL * 1 - 1, '.') AS deskripsi,empno, mgr,
trx.amount
FROM
emp r0 JOIN trx ON r0.empno = trx.parent_id
START WITH
r0.mgr IS NULL
CONNECT BY
r0.mgr = PRIOR r0.empno
;
How can I get the same result in DB2 RDBMS?
Regards.

There are two ways: either enable the Oracle-compatible CONNECT BY clause,
or use SQL-standard recursive SQL (a recursive common table expression), as in this example from the documentation:
Example 2: Summarized explosion
The second example is a summarized explosion. The question posed here is, what is the total quantity of each part required to build part '01'. The main difference from the single level explosion is the requirement to aggregate the quantities. The first example indicates the quantity of subparts required for the part whenever it is required. It does not indicate how many of the subparts are needed to build part '01'.
WITH RPL (PART, SUBPART, QUANTITY) AS
(
SELECT ROOT.PART, ROOT.SUBPART, ROOT.QUANTITY
FROM PARTLIST ROOT
WHERE ROOT.PART = '01'
UNION ALL
SELECT PARENT.PART, CHILD.SUBPART, PARENT.QUANTITY*CHILD.QUANTITY
FROM RPL PARENT, PARTLIST CHILD
WHERE PARENT.SUBPART = CHILD.PART
)
SELECT PART, SUBPART, SUM(QUANTITY) AS "Total QTY Used"
FROM RPL
GROUP BY PART, SUBPART
ORDER BY PART, SUBPART;

The following statement returns the results desired. Run it as is.
WITH EMP (EMPNO, ENAME, MGR, SAL) AS
(
VALUES
(7369, 'SMITH ', 7902, 800)
, (7499, 'ALLEN ', 7698, 1600)
, (7521, 'WARD ', 7698, 1250)
, (7566, 'JONES ', 7839, 2975)
, (7654, 'MARTIN ', 7698, 1250)
, (7698, 'BLAKE ', 7839, 2850)
, (7782, 'CLARK ', 7839, 2450)
, (7788, 'SCOTT ', 7566, 3000)
, (7839, 'KING ', NULL, 5000)
, (7844, 'TURNER ', 7698, 1500)
, (7876, 'ADAMS ', 7788, 1100)
, (7900, 'JAMES ', 7698, 950)
, (7902, 'FORD ', 7566, 3000)
, (7934, 'MILLER ', 7782, 1300)
)
, C (LVL, EMPNO, MGR, ENAME, SAL, CHAIN) AS
(
SELECT
0 AS LVL, EMPNO, MGR, ENAME, SAL
, CAST('|' || TRIM(CHAR(EMPNO)) || '|' AS VARCHAR(1024)) AS CHAIN
FROM EMP C
WHERE NOT EXISTS (SELECT 1 FROM EMP P WHERE P.EMPNO = C.MGR)
UNION ALL
SELECT C.LVL + 1 AS LVL, E.EMPNO, E.MGR, E.ENAME, E.SAL
, C.CHAIN || TRIM(CHAR(E.EMPNO)) || '|' AS CHAIN
FROM C, EMP E
WHERE E.MGR = C.EMPNO
)
SELECT
REPEAT('.', LVL) || ENAME AS DESKRIPSI
, EMPNO
, MGR
, (SELECT SUM(SAL) FROM C C2 WHERE C2.CHAIN LIKE C1.CHAIN || '%') AS AMOUNT
FROM C C1
ORDER BY CHAIN;
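For anyone who wants to experiment without a DB2 instance, here is a minimal sketch of the same rollup as a recursive CTE in SQLite, driven from Python's sqlite3 module. SQLite is only a stand-in here (an assumption, not part of the answers above), and the names salary_rollup, closure, totals and tree are made up for illustration. It builds every (ancestor, descendant) pair, much like the CONNECT_BY_ROOT step in the Oracle query, sums the salaries per ancestor, and walks the tree once more to get the indentation level and the ordering.

```python
import sqlite3

# Sample data copied from the question (scott.emp).
EMP_ROWS = [
    (7369, "SMITH", 7902, 800), (7499, "ALLEN", 7698, 1600),
    (7521, "WARD", 7698, 1250), (7566, "JONES", 7839, 2975),
    (7654, "MARTIN", 7698, 1250), (7698, "BLAKE", 7839, 2850),
    (7782, "CLARK", 7839, 2450), (7788, "SCOTT", 7566, 3000),
    (7839, "KING", None, 5000), (7844, "TURNER", 7698, 1500),
    (7876, "ADAMS", 7788, 1100), (7900, "JAMES", 7698, 950),
    (7902, "FORD", 7566, 3000), (7934, "MILLER", 7782, 1300),
]

ROLLUP_SQL = """
WITH RECURSIVE
-- every (ancestor, descendant) pair, each employee also paired with itself
closure (root_id, id) AS (
    SELECT empno, empno FROM emp
    UNION ALL
    SELECT c.root_id, e.empno
    FROM closure c JOIN emp e ON e.mgr = c.id
),
-- total salary of each employee's whole subtree
totals (empno, amount) AS (
    SELECT c.root_id, SUM(e.sal)
    FROM closure c JOIN emp e ON e.empno = c.id
    GROUP BY c.root_id
),
-- top-down walk that records the depth and a sortable path
tree (empno, ename, mgr, lvl, path) AS (
    SELECT empno, ename, mgr, 0, '|' || empno || '|'
    FROM emp WHERE mgr IS NULL
    UNION ALL
    SELECT e.empno, e.ename, e.mgr, t.lvl + 1, t.path || e.empno || '|'
    FROM tree t JOIN emp e ON e.mgr = t.empno
)
SELECT t.lvl, t.ename, t.empno, t.mgr, s.amount
FROM tree t JOIN totals s ON s.empno = t.empno
ORDER BY t.path
"""


def salary_rollup():
    """Return (deskripsi, empno, mgr, amount) rows in hierarchy order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, mgr INTEGER, sal INTEGER)"
    )
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", EMP_ROWS)
    rows = conn.execute(ROLLUP_SQL).fetchall()
    conn.close()
    # Indent with one dot per level, as in the expected output of the question.
    return [("." * lvl + ename, empno, mgr, amount)
            for lvl, ename, empno, mgr, amount in rows]


if __name__ == "__main__":
    for row in salary_rollup():
        print(row)
```

The path column sorts correctly here only because all employee numbers have the same number of digits; with mixed lengths you would need to pad them.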

How to drop rows if a variable is less than x, in SQL

I have the following query code:
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, let's say, 0.02, and then group, order, etc. I imagine this should be simple. Any help will be appreciated!
We can delete in a CTE and use RETURNING to get the ids of the rows being deleted. Within that same statement the rows are still visible in the table; they are gone once the statement has completed.
CREATE TABLE t (
id serial,
variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
5 rows affected
with del as
(delete from t
where variale < 3
returning id)
select
t.id,
t.variale,
del.id ids_being_deleted
from t
left join del
on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
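If the goal is only to leave low balances out of the result rather than delete anything from a table, a HAVING clause on the aggregate is the usual tool. Below is a minimal sketch that runs against SQLite through Python's sqlite3 module, purely as a stand-in for BigQuery. The function name balances_at_least and the three sample addresses are hypothetical, and the table double_entry_book here is a tiny made-up substitute for the CTE of the same name in the question.

```python
import sqlite3

# Hypothetical ledger rows: (address, value in wei, signed).
SAMPLE_ENTRIES = [
    ("0xaaa", 5e16), ("0xaaa", -4e16),   # net 0.01, below the threshold
    ("0xbbb", 3e18), ("0xbbb", -1e18),   # net 2.0
    ("0xccc", 5e16),                     # net 0.05
]

BALANCE_SQL = """
SELECT address,
       SUM(value) / 1000000000000000000.0 AS balance
FROM double_entry_book
GROUP BY address
-- HAVING filters groups after aggregation; WHERE cannot see SUM(value)
HAVING SUM(value) / 1000000000000000000.0 >= ?
ORDER BY balance DESC
"""


def balances_at_least(min_balance):
    """Return (address, balance) rows whose balance is >= min_balance."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE double_entry_book (address TEXT, value REAL)")
    conn.executemany("INSERT INTO double_entry_book VALUES (?, ?)", SAMPLE_ENTRIES)
    rows = conn.execute(BALANCE_SQL, (min_balance,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for address, balance in balances_at_least(0.02):
        print(address, balance)
```

In the original query the same idea would mean adding a HAVING line between GROUP BY and ORDER BY; the exact syntax accepted there has not been checked against BigQuery in this sketch.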

PostgreSQL: Find percentages of total_films_rented

The code below gives me the following results:
Early: 7738
Late: 6586
On Time: 1720
How would I take this a step further and add a third column that finds the percentages?
Here is a link to the ERD and database set-up: https://www.postgresqltutorial.com/postgresql-sample-database/
WITH
t1
AS
(
SELECT *, DATE_PART('day', return_date - rental_date) AS days_rented
FROM rental
),
t2
AS
(
SELECT rental_duration, days_rented,
CASE WHEN rental_duration > days_rented THEN 'Early'
WHEN rental_duration = days_rented THEN 'On Time'
ELSE 'Late'
END AS rental_return_status
FROM film f, inventory i, t1
WHERE f.film_id = i.film_id AND t1.inventory_id = i.inventory_id
)
SELECT rental_return_status, COUNT(*) AS total_films_rented
FROM t2
GROUP BY 1
ORDER BY 2 DESC;
You can use a window function with one CTE table (instead of 2):
WITH raw_status AS (
SELECT rental_duration - DATE_PART('day', return_date - rental_date) AS days_remaining
FROM rental r
JOIN inventory i ON r.inventory_id=i.inventory_id
JOIN film f on f.film_id=i.film_id
)
SELECT CASE WHEN days_remaining > 0 THEN 'Early'
WHEN days_remaining = 0 THEN 'On Time'
ELSE 'Late' END AS rental_status,
count(*),
(100*count(*))/sum(count(*)) OVER () AS percentage
FROM raw_status
GROUP BY 1;
rental_status | count | percentage
---------------+-------+---------------------
Early | 7738 | 48.2298678633757168
On Time | 1720 | 10.7205185739217153
Late | 6586 | 41.0496135627025679
(3 rows)
Disclosure: I work for EnterpriseDB (EDB)
Use a window function to get the sum of the count column (sum(count(*)) over ()), then just divide the count by that (count(*)/sum(count(*)) over ()). Multiply by 100 to make it a percentage.
psql (12.1 (Debian 12.1-1))
Type "help" for help.
testdb=# CREATE TABLE faket2 AS (
SELECT 'early' AS rental_return_status UNION ALL
SELECT 'early' UNION ALL
SELECT 'ontime' UNION ALL
SELECT 'late');
SELECT 4
testdb=# SELECT
rental_return_status,
COUNT(*) as total_films_rented,
(100*count(*))/sum(count(*)) over () AS percentage
FROM faket2
GROUP BY 1
ORDER BY 2 DESC;
rental_return_status | total_films_rented | percentage
----------------------+--------------------+---------------------
early | 2 | 50.0000000000000000
late | 1 | 25.0000000000000000
ontime | 1 | 25.0000000000000000
(3 rows)

How to force query to return only first row from window?

I have data:
id | price | date
1 | 25 | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
Is it possible to write a query that returns only the first row of each window? Something like LIMIT 1, but for the window OVER( date )?
I expect the following result:
id | price | date
1 | 25 | 2019-01-01
1 | 27 | 2019-02-01
Or ignore the whole window if its first row has a NULL price:
id | price | date
1 | NULL | 2019-01-01
2 | 35 | 2019-01-01
1 | 27 | 2019-02-01
2 | 37 | 2019-02-01
result:
1 | 27 | 2019-02-01
Order the rows by date and id, and take only the first row per date.
Then remove those where the price is NULL.
SELECT *
FROM (SELECT DISTINCT ON (date)
id, price, date
FROM mytable
ORDER BY date, id
) AS q
WHERE price IS NOT NULL;
@Laurenz, let me provide a bit more explanation.
select distinct on (<fldlist>) * from <table> order by <fldlist+>;
is roughly equivalent to this more complex query:
select * from (
select row_number() over (partition by <fldlist> order by <fldlist+>) as rn,*
from <table>) as t
where rn = 1;
Here <fldlist> must be the leading part of (or equal to) <fldlist+>.
As Myon on IRC said:
if you want to use a window function in WHERE, you need to put it into a subselect first
So the target query is:
select * from (
select
*,
agg_function( my_field ) OVER( PARTITION BY other_field ) as agg_field
from sometable
) x
WHERE agg_field <condition>
In my case I have the following query:
SELECT * FROM (
SELECT *,
FIRST_VALUE( p.price ) over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS first_price,
ROW_NUMBER() over( PARTITION BY crate.app_period ORDER BY st.DEPTH ) AS row_number
FROM st
LEFT JOIN price p ON <COND>
LEFT JOIN currency_rate crate ON <COND>
) p
WHERE p.row_number = 1 AND p.first_price IS NOT null
Here I select only the first row of each group, and only when its price IS NOT NULL.
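To check the pattern against the sample data from the question, here is a minimal sketch using ROW_NUMBER() in SQLite via Python's sqlite3 module. SQLite is a stand-in for PostgreSQL (it has no DISTINCT ON, and window functions need SQLite 3.25 or later), and the function name first_row_per_date is made up for this example.

```python
import sqlite3

FIRST_ROW_SQL = """
SELECT id, price, date
FROM (
    SELECT id, price, date,
           -- number the rows inside each date, lowest id first
           ROW_NUMBER() OVER (PARTITION BY date ORDER BY id) AS rn
    FROM mytable
) AS ranked
-- keep the first row of each date, and drop it if its price is NULL
WHERE rn = 1 AND price IS NOT NULL
ORDER BY date
"""


def first_row_per_date(rows):
    """rows: iterable of (id, price, date); returns the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, price INTEGER, date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(FIRST_ROW_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, None, "2019-01-01"), (2, 35, "2019-01-01"),
        (1, 27, "2019-02-01"), (2, 37, "2019-02-01"),
    ]
    print(first_row_per_date(sample))
```

With the second data set from the question (NULL price on the first row of 2019-01-01), only the 2019-02-01 row with id 1 and price 27 remains.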

How do I join multiple select results into a single table?

I have a query which returns monthly averages from the same table, but for different pressure_level values:
SELECT some_id, avg(exposure_value) monthly_avg_1000
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
I then have the same query, but for a different pressure_level:
SELECT some_id, avg(exposure_value) monthly_avg_925
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
Both queries return 12 rows (1 per month) with the ID and the average value for the month:
some_id | monthly_avg_1000
--------------------------
1 | 0.000023
1 | 0.000051
1 | 0.000009
some_id | monthly_avg_925
--------------------------
1 | 0.000014
1 | 0.000007
1 | 0.000131
I would like to combine the two queries so that the monthly_avg_* columns all appear in the final table:
some_id | monthly_avg_1000 | monthly_avg_925
--------------------------
1 | 0.000023 | 0.000014
1 | 0.000051 | 0.000007
1 | 0.000009 | 0.000131
How can I do this?
If both queries return the same id, you can join them on the id and the month:
with a as (
SELECT some_id, avg(exposure_value) monthly_avg_1000,date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 1000
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
, b as (
SELECT some_id, avg(exposure_value) monthly_avg_925, date_trunc('month', measurement_time) d
FROM mytable
WHERE pressure_level = 925
AND some_id = 7
GROUP BY some_id, date_trunc('month', measurement_time)
)
select distinct a.some_id, monthly_avg_1000,monthly_avg_925
from a
join b on a.some_id = b.some_id and a.d = b.d
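A different technique that avoids the self-join is conditional aggregation: scan the table once and put each pressure level into its own AVG(CASE ...) column. The sketch below is an assumption-laden illustration, not the answer's method. It runs on SQLite through Python's sqlite3 module, uses strftime('%Y-%m', ...) in place of PostgreSQL's date_trunc('month', ...), and the function name monthly_averages and all sample values are invented.

```python
import sqlite3

# Invented measurements: (some_id, pressure_level, exposure_value, measurement_time)
SAMPLE_ROWS = [
    (7, 1000, 1.0, "2020-01-05 00:00:00"),
    (7, 1000, 3.0, "2020-01-20 00:00:00"),
    (7, 925, 10.0, "2020-01-07 00:00:00"),
    (7, 1000, 5.0, "2020-02-03 00:00:00"),
    (7, 925, 20.0, "2020-02-10 00:00:00"),
    (7, 925, 40.0, "2020-02-25 00:00:00"),
    (8, 1000, 99.0, "2020-01-05 00:00:00"),  # other id, filtered out
]

PIVOT_SQL = """
SELECT some_id,
       strftime('%Y-%m', measurement_time) AS month,
       -- CASE yields NULL for the other level, and AVG ignores NULLs
       AVG(CASE WHEN pressure_level = 1000 THEN exposure_value END) AS monthly_avg_1000,
       AVG(CASE WHEN pressure_level = 925 THEN exposure_value END) AS monthly_avg_925
FROM mytable
WHERE some_id = 7
  AND pressure_level IN (1000, 925)
GROUP BY some_id, strftime('%Y-%m', measurement_time)
ORDER BY month
"""


def monthly_averages():
    """Return (some_id, month, avg at 1000, avg at 925) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (some_id INTEGER, pressure_level INTEGER, "
        "exposure_value REAL, measurement_time TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    rows = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in monthly_averages():
        print(row)
```

One difference from the join: a month that has data for only one pressure level still appears, with NULL in the other column, whereas the inner join drops it.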

Joining many tables on same data and returning all rows

UPDATE:
My original attempt to use FULL OUTER JOIN did not work correctly. I have updated the question to reflect the true issue. Sorry for presenting a classic XY problem.
I'm trying to retrieve a dataset from multiple tables, all in one query that is grouped by the year and month of the data.
The final result should look like this:
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | 14 |
Coming from data that looks like this:
Table 1:
| Year | Month | Data |
|------+-------+------|
| 2012 | 11 | 231 |
| 2012 | 12 | 534 |
Table 2:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 12 |
| 2013 | 1 | 22 |
Table 3:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 13 |
| 2013 | 1 | 14 |
I tried using FULL OUTER JOIN, but this doesn't quite work: no matter which table I select 'Year' and 'Month' from in my SELECT clause, there are null values.
SELECT
COALESCE(t1.year,t2.year,t3.year)
,COALESCE(t1.month,t2.month,t3.month)
,t1.data as col1
,t2.data as col2
,t3.data as col3
From t1
FULL OUTER JOIN t2
on t1.year = t2.year and t1.month = t2.month
FULL OUTER JOIN t3
on t1.year = t3.year and t1.month = t3.month
The result is something like this (it is too confusing to repeat exactly what I would get using this demo data):
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | - |
| - | 1 | - | - | 14 |
If your data allows it (not 100 columns), this is usually a clean way of doing it:
select year, month, sum(col1) as col1, sum(col2) as col2, sum(col3) as col3
from (
SELECT t1.year, t1.month, t1.data as col1, 0 as col2, 0 as col3
From t1
union all
SELECT t2.year, t2.month, 0 as col1, t2.data as col2, 0 as col3
From t2
union all
SELECT t3.year, t3.month, 0 as col1, 0 as col2, t3.data as col3
From t3
) as data
group by year, month
If you are using SQL Server 2005 or later version, you could also try this PIVOT solution:
SELECT
Year,
Month,
Col1,
Col2,
Col3
FROM (
SELECT Year, Month, 'Col1' AS Col, Data FROM t1
UNION ALL
SELECT Year, Month, 'Col2' AS Col, Data FROM t2
UNION ALL
SELECT Year, Month, 'Col3' AS Col, Data FROM t3
) f
PIVOT (
SUM(Data) FOR Col IN (Col1, Col2, Col3)
) p
;
This query can be tested and played with at SQL Fiddle.
Perhaps you are looking for the COALESCE function? It takes a list of arguments and returns the first one that is NOT NULL, or NULL if all arguments are null. In your example, you would do something like this:
SELECT COALESCE(t1.data, t2.data)
You would still need to join tables in this case. It would just cut down on the case statements.
You could derive the complete list of years and months from all the tables, then join every table to that list (using a left join):
SELECT
f.Year,
f.Month,
t1.data AS col1,
t2.data AS col2,
t3.data AS col3
FROM (
SELECT Year, Month FROM t1
UNION
SELECT Year, Month FROM t2
UNION
SELECT Year, Month FROM t3
) f
LEFT JOIN t1 ON f.year = t1.year and f.month = t1.month
LEFT JOIN t2 ON f.year = t2.year and f.month = t2.month
LEFT JOIN t3 ON f.year = t3.year and f.month = t3.month
;
You can see a live demonstration of this query at SQL Fiddle.
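As a quick local check of the UNION plus LEFT JOIN query on the demo tables from the question, here is a minimal sketch that runs it in SQLite via Python's sqlite3 module. SQLite is only a stand-in for the asker's database, the function name combined_by_month is made up, and an ORDER BY was added so the row order is deterministic.

```python
import sqlite3

# Demo data from the question, one list per table: (year, month, data)
TABLES = {
    "t1": [(2012, 11, 231), (2012, 12, 534)],
    "t2": [(2012, 12, 12), (2013, 1, 22)],
    "t3": [(2012, 12, 13), (2013, 1, 14)],
}

COMBINE_SQL = """
SELECT f.year, f.month,
       t1.data AS col1, t2.data AS col2, t3.data AS col3
FROM (
    -- UNION (not UNION ALL) removes duplicate year/month pairs
    SELECT year, month FROM t1
    UNION
    SELECT year, month FROM t2
    UNION
    SELECT year, month FROM t3
) f
LEFT JOIN t1 ON f.year = t1.year AND f.month = t1.month
LEFT JOIN t2 ON f.year = t2.year AND f.month = t2.month
LEFT JOIN t3 ON f.year = t3.year AND f.month = t3.month
ORDER BY f.year, f.month
"""


def combined_by_month():
    """Return (year, month, col1, col2, col3); missing values come back as None."""
    conn = sqlite3.connect(":memory:")
    for name, rows in TABLES.items():
        conn.execute(f"CREATE TABLE {name} (year INTEGER, month INTEGER, data INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = conn.execute(COMBINE_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in combined_by_month():
        print(row)
```

Each year/month pair appears exactly once, and a table with no row for that month contributes NULL (None in Python), which corresponds to the "-" cells in the desired result.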
If you are looking for the non-null values from either table, then you will have to add t1.data IS NOT NULL as well. I hope that I understand your question.
CREATE VIEW joined_SALES
AS SELECT t1.year, t1.month, t1.data AS data1, t2.data AS data2
FROM table1 t1, table2 t2
WHERE
t1.year = t2.year
and t1.month = t2.month
and t1.data IS NOT NULL;
This might be a better way, especially if you are going to do something with the data before returning it. Basically you are translating the table the data came from into a typeId.
declare @temp table
([year] int,
[month] int,
typeId int,
data decimal)
-- one aggregated row per (year, month) from each source table, tagged with a typeId
insert into @temp
SELECT t1.year, t1.month, 1, sum(t1.data)
From t1
group by t1.year, t1.month
insert into @temp
SELECT t2.year, t2.month, 2, sum(t2.data)
From t2
group by t2.year, t2.month
insert into @temp
SELECT t3.year, t3.month, 3, sum(t3.data)
From t3
group by t3.year, t3.month
-- turn each typeId back into its own column
select t.year, t.month,
sum(case when t.typeId = 1 then t.data end) as col1,
sum(case when t.typeId = 2 then t.data end) as col2,
sum(case when t.typeId = 3 then t.data end) as col3
from @temp t
group by t.year, t.month