SQL Query in DB2

I have a table; below is an example of the columns I am interested in and how the data looks:
Ref | Date
Ref 1 | 2016-05-01
Ref 1 | 2016-05-01
Ref 2 | 2017-02-20
Ref 2 | 2017-02-20
Ref 2 | 2017-02-20
Ref 2 | 2015-12-10
I want a query that will return each reference and date along with a count of how many times that duplicate date occurs. So for the above it should return:
Ref 1 | 2016-05-01 | Count 2
Ref 2 | 2017-02-20 | Count 3
Ref 2 | 2015-12-10 | Count 1
Cheers

You need to group by the columns you want to be unique; aggregate functions (like count()) then apply to each group:
select ref, date, count(*)
from your_table
group by ref, date
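If you only want the groups that actually occur more than once, add a HAVING clause (a small variant of the query above):
select ref, date, count(*)
from your_table
group by ref, date
having count(*) > 1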

Related

PostgreSQL: for each row, generate new rows and merge

I have a table called example that looks as follows:
ID | MIN | MAX |
1 | 1 | 5 |
2 | 34 | 38 |
I need to take each ID and loop from its min to max, incrementing by 2, and thus get the following WITHOUT using INSERT statements, i.e. in a SELECT:
ID | INDEX | VALUE
1 | 1 | 1
1 | 2 | 3
1 | 3 | 5
2 | 1 | 34
2 | 2 | 36
2 | 3 | 38
Any ideas of how to do this?
The set-returning function generate_series does exactly that:
SELECT
  id,
  generate_series(1, (max - min) / 2 + 1) AS index,
  generate_series(min, max, 2) AS value
FROM example;
The index can alternatively be generated with RANK() (see a_horse_with_no_name's answer below) if you don't want to rely on the parallel sets.
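A note on those parallel sets: since PostgreSQL 10, multiple set-returning functions in the SELECT list advance in lockstep, and a shorter result is padded with NULLs (older versions cycle to the least common multiple of the lengths), which is why the index expression is computed to match the length of the value series exactly. A quick standalone check:
select generate_series(1, 2) AS a, generate_series(1, 4) AS b;
-- on PostgreSQL 10+ this yields (1,1), (2,2), (NULL,3), (NULL,4)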
Use generate_series() to generate the numbers and a window function to calculate the index:
select e.id,
       row_number() over (partition by e.id order by g.value) as index,
       g.value
from example e
  cross join generate_series(e.min, e.max, 2) as g(value);
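On PostgreSQL 9.4 or later, WITH ORDINALITY can produce the index directly and avoids the window function; a sketch of the same idea:
select e.id, g.index, g.value
from example e
  cross join lateral generate_series(e.min, e.max, 2)
    with ordinality as g(value, index);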

How to count rows using a variable date range provided by a table in PostgreSQL

I suspect I require some sort of windowing function to do this. I have the following item data as an example:
count | date
------+-----------
3 | 2017-09-15
9 | 2017-09-18
2 | 2017-09-19
6 | 2017-09-20
3 | 2017-09-21
First off, there are gaps in my data. I also have another query:
select until_date, until_date - (lag(until_date) over ()) as delta_days from ranges
That query generated the following data:
until_date | delta_days
-----------+-----------
2017-09-08 |
2017-09-11 | 3
2017-09-13 | 2
2017-09-18 | 5
2017-09-21 | 3
2017-09-22 | 1
So I'd like my final query to produce this result:
start_date | ending_date | total_items
-----------+-------------+--------------
2017-09-08 | 2017-09-10 | 0
2017-09-11 | 2017-09-12 | 0
2017-09-13 | 2017-09-17 | 3
2017-09-18 | 2017-09-20 | 15
2017-09-21 | 2017-09-22 | 3
This tells me the total count of items from the first table, per range, based on the custom ranges from the second table.
In this particular example, I would be summing up total_items BETWEEN start AND end (since there would be overlap on the dates, I'd subtract 1 from the end date to avoid counting duplicates).
Anyone know how to do this?
Thanks!
Use the daterange type. Note that you do not have to calculate delta_days; just convert the ranges to dateranges and use the operator <@ (element is contained by).
with counts(count, date) as (
  values
    (3, '2017-09-15'::date),
    (9, '2017-09-18'),
    (2, '2017-09-19'),
    (6, '2017-09-20'),
    (3, '2017-09-21')
),
ranges(until_date) as (
  values
    ('2017-09-08'::date),
    ('2017-09-11'),
    ('2017-09-13'),
    ('2017-09-18'),
    ('2017-09-21'),
    ('2017-09-22')
)
select daterange, coalesce(sum(count), 0) as total_items
from (
  select daterange(lag(until_date) over (order by until_date), until_date)
  from ranges
) s
left join counts on date <@ daterange
where not lower_inf(daterange)
group by 1
order by 1;
daterange | total_items
-------------------------+-------------
[2017-09-08,2017-09-11) | 0
[2017-09-11,2017-09-13) | 0
[2017-09-13,2017-09-18) | 3
[2017-09-18,2017-09-21) | 17
[2017-09-21,2017-09-22) | 3
(5 rows)
Note that in the dateranges above, lower bounds are inclusive while upper bounds are exclusive.
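For illustration, a quick standalone check of those bounds:
select '2017-09-08'::date <@ daterange('2017-09-08', '2017-09-11');  -- true: lower bound inclusive
select '2017-09-11'::date <@ daterange('2017-09-08', '2017-09-11');  -- false: upper bound exclusive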
If you want to calculate items per day in the dateranges:
select
  daterange, total_items,
  round(total_items::dec / (upper(daterange) - lower(daterange)), 2) as items_per_day
from (
  select daterange, coalesce(sum(count), 0) as total_items
  from (
    select daterange(lag(until_date) over (order by until_date), until_date)
    from ranges
  ) s
  left join counts on date <@ daterange
  where not lower_inf(daterange)
  group by 1
) s
order by 1;
daterange | total_items | items_per_day
-------------------------+-------------+---------------
[2017-09-08,2017-09-11) | 0 | 0.00
[2017-09-11,2017-09-13) | 0 | 0.00
[2017-09-13,2017-09-18) | 3 | 0.60
[2017-09-18,2017-09-21) | 17 | 5.67
[2017-09-21,2017-09-22) | 3 | 3.00
(5 rows)

DB2 SQL to aggregate value for months with no gaps

I have 2 tables which I need to join, along with a table that is generated inline using WITH. The WITH is a date range, and I need to display all rows from one table for all months, even where no data exists in the second table.
This is the data within the tables:
Table REFERRAL_GROUPINGS
referral_group
--------------
VER
FRD
FCC
Table DATA_VALUES
referral_group | task_date | task_id | over_threshold
---------------+------------+---------+---------------
VER | 2015-10-01 | 10 | 0
FRD | 2015-11-04 | 20 | 1
The date range will need to select 3 months:
Oct-2015
Nov-2015
Dec-2015
The data I expect to end up with will be :
MonthYear | referral_group | count_of_group | total_over_threshold
----------+----------------+----------------+---------------------
Oct-2015 | VER | 1 | 0
Oct-2015 | FRD | 0 | 0
Oct-2015 | FCC | 0 | 0
Nov-2015 | VER | 0 | 0
Nov-2015 | FRD | 1 | 1
Nov-2015 | FCC | 0 | 0
Dec-2015 | VER | 0 | 0
Dec-2015 | FRD | 0 | 0
Dec-2015 | FCC | 0 | 0
DDL to create the 2 tables and populate them with data is below:
CREATE TABLE test_data (
  referral_group char(3),
  task_date date,
  task_id integer,
  over_threshold integer);
insert into test_data values
  ('VER','2015-10-01',10,0),
  ('FRD','2015-11-04',20,1);
CREATE TABLE referral_grouper (
  referral_group char(3));
insert into referral_grouper values
  ('FRD'),
  ('VER'),
  ('FCC');
This is a very cut-down example using the minimal tables/columns needed, which is why I have no primary keys/indexes.
I can get this running under LUW with no problem by using NOT EXISTS in the joins, as per this SQL:
WITH daterange(from_dte, yyyymm, to_dte) AS
(
  SELECT DATE('2015-10-01'), YEAR('2015-10-01')*100 + MONTH('2015-10-01'), '2015-12-31'
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT from_dte + 1 DAY, YEAR(from_dte + 1 DAY)*100 + MONTH(from_dte + 1 DAY), to_dte
  FROM daterange
  WHERE from_dte < to_dte
)
SELECT
  referral_grouper.referral_group,
  daterange.yyyymm,
  COUNT(test_data.task_id) AS total_count,
  COALESCE(SUM(over_threshold), 0) AS total_over_threshold
FROM test_data
  RIGHT OUTER JOIN daterange
    ON (daterange.from_dte = test_data.task_date
        OR NOT EXISTS (SELECT 1 FROM daterange d2 WHERE d2.from_dte = test_data.task_date))
  RIGHT OUTER JOIN referral_grouper
    ON (referral_grouper.referral_group = test_data.referral_group
        OR NOT EXISTS (SELECT 1 FROM referral_grouper g2 WHERE g2.referral_group = test_data.referral_group))
GROUP BY
  referral_grouper.referral_group,
  daterange.yyyymm
However, this needs to work on z/OS, and under z/OS you cannot use EXISTS subqueries in a join clause. Removing the NOT EXISTS means the non-existing rows no longer show up.
There must be a way to write the SQL to return all rows from the 2 linking tables without using NOT EXISTS, but I just cannot seem to find it. Any help with this would be much appreciated, as it has me stumped.
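For what it's worth, one approach that avoids EXISTS entirely (a sketch against the test tables above, not verified on z/OS): build the full month-by-group matrix with a plain CROSS JOIN first, then LEFT JOIN the data onto that matrix, so combinations with no data survive as zero rows.
WITH daterange(from_dte, yyyymm, to_dte) AS
(
  SELECT DATE('2015-10-01'), YEAR('2015-10-01')*100 + MONTH('2015-10-01'), '2015-12-31'
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT from_dte + 1 DAY, YEAR(from_dte + 1 DAY)*100 + MONTH(from_dte + 1 DAY), to_dte
  FROM daterange
  WHERE from_dte < to_dte
)
SELECT m.yyyymm,
       g.referral_group,
       COUNT(t.task_id) AS count_of_group,
       COALESCE(SUM(t.over_threshold), 0) AS total_over_threshold
FROM (SELECT DISTINCT yyyymm FROM daterange) m
CROSS JOIN referral_grouper g
LEFT JOIN test_data t
       ON YEAR(t.task_date)*100 + MONTH(t.task_date) = m.yyyymm
      AND t.referral_group = g.referral_group
GROUP BY m.yyyymm, g.referral_group
If your z/OS version rejects the CROSS JOIN keyword, an INNER JOIN ... ON 1=1 (or an old-style comma join) expresses the same Cartesian product.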

Traversing Gaps in sequential data

I have a table [ContactCallDetail] which stores call data for each leg of a call from our phone system. The data is stored with a 4-part primary key: ([SessionID], [SessionSeqNum], [NodeID], [ProfileID]). The [NodeID], [ProfileID], and [SessionID] together make up a call, and the [SessionSeqNum] defines each leg of the call as the caller is transferred from one department/rep to the next.
I need to look at each leg of a call and, if a transfer occurred, find the next leg of the call so I can report on where the transferred call went.
The problems I am facing are: 1) the session sequence does not always start with the same number; 2) there can be gaps in the sequence numbers; 3) the table has 15,000,000 rows and is added to via a data import every night, so I need a non-cursor-based solution.
Sample data
| sessionid | sessionseqnum | nodeid | profileid |
| 170000459184 | 0 | 1 | 1 |
| 170000459184 | 1 | 1 | 1 |
| 170000459184 | 3 | 1 | 1 |
| 170000229594 | 1 | 1 | 1 |
| 170000229594 | 2 | 1 | 1 |
| 170000229598 | 0 | 1 | 1 |
| 170000229598 | 2 | 1 | 1 |
| 170000229600 | 0 | 1 | 1 |
| 170000229600 | 1 | 1 | 1 |
| 170000229600 | 3 | 1 | 1 |
| 170000229600 | 5 | 1 | 1 |
I think what I need to do is create a lookup table, using an identity column or row_number() or the like, to get a new gap-free sequence number for the call legs. How would I do this? Or if there is a different, best-practices solution you could point me to, that would be great.
You can use the lead() analytic function to identify the next session sequence number.
SELECT sessionid,
       nodeid,
       profileid,
       sessionseqnum,
       lead(sessionseqnum) OVER (PARTITION BY sessionid, nodeid, profileid
                                 ORDER BY sessionseqnum) AS next_seq_num
FROM ContactCallDetail
ORDER BY sessionid, nodeid, profileid, sessionseqnum;
sessionid     nodeid  profileid  sessionseqnum  next_seq_num
------------  ------  ---------  -------------  ------------
170000229594       1          1              1             2
170000229594       1          1              2
170000229598       1          1              0             2
170000229598       1          1              2
170000229600       1          1              0             1
170000229600       1          1              1             3
170000229600       1          1              3             5
170000229600       1          1              5
170000459184       1          1              0             1
170000459184       1          1              1             3
170000459184       1          1              3
The ORDER BY clause isn't strictly necessary; it just makes it easier for humans to read the output.
Now you can join against the original table to produce rows that pair each leg with the next one. There are several different ways to express that in standard SQL; here, I'm using a common table expression.
WITH next_seq_nums AS
( SELECT *,
         lead(sessionseqnum) OVER (PARTITION BY sessionid, nodeid, profileid
                                   ORDER BY sessionseqnum) AS next_seq_num
  FROM ContactCallDetail
)
SELECT t1.sessionid,
       t1.nodeid,
       t1.profileid,
       t1.sessionseqnum,
       t2.sessionseqnum AS next_sessionseqnum,
       t2.nodeid AS next_nodeid,
       t2.profileid AS next_profileid
FROM next_seq_nums t1
LEFT JOIN ContactCallDetail t2
       ON t1.sessionid = t2.sessionid
      AND t1.nodeid = t2.nodeid
      AND t1.profileid = t2.profileid
      AND t1.next_seq_num = t2.sessionseqnum
ORDER BY t1.sessionid, t1.nodeid, t1.profileid, t1.sessionseqnum;
The LEFT JOIN will leave NULLs in the rows for the last session sequence number in each session. That makes sense: on the last row there isn't a "next leg of the call". But it's easy enough to exclude those rows if you need to.
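For example, adding a filter just before the ORDER BY (a trivial variant of the query above) keeps only legs that actually have a successor:
WHERE t2.sessionseqnum IS NOT NULL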
If your DBMS doesn't support the lead() analytic function, you can replace the common table expression above with this one.
WITH next_seq_nums AS
( SELECT t1.*,
         (SELECT MIN(sessionseqnum)
          FROM contactcalldetail
          WHERE sessionid = t1.sessionid
            AND nodeid = t1.nodeid
            AND profileid = t1.profileid
            AND sessionseqnum > t1.sessionseqnum) AS next_seq_num
  FROM contactcalldetail t1
)
...
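Be aware that the correlated MIN() subquery runs once per outer row, so on a 15,000,000-row table it will likely be noticeably slower than lead(); an index on (sessionid, nodeid, profileid, sessionseqnum) should help either version.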
Alternatively, number the legs with rank() in a CTE and self-join on adjacent ranks:
with cte as
( SELECT *,
         rank() OVER (PARTITION BY sessionid, profileid, nodeid
                      ORDER BY sessionseqnum) AS Rank
  FROM dbo.Table_1
)
SELECT cte.sessionid, cte.nodeid, cte.profileid,
       cte.sessionseqnum, cte_1.sessionseqnum AS next_sessionseqnum
FROM cte
LEFT JOIN cte AS cte_1
       ON cte.sessionid = cte_1.sessionid
      AND cte.profileid = cte_1.profileid
      AND cte.nodeid = cte_1.nodeid
      AND cte.rank = cte_1.rank - 1;

Count number of rows with distinct value in column in T-SQL

In T-SQL, how can I query this table to show me record counts based on how many times a distinct value appears in a column?
For example, I have a table defined as:
ControlSystemHierarchy
----------------------
ParentDeviceID int
ChildDeviceID int
Instrument bit
I want to display the number of records that match each distinct ParentDeviceID in the table, so that this table:
ParentDeviceID | ChildDeviceID | Instrument
1 | 1 | 0
1 | 2 | 0
1 | 2 | 1
2 | 3 | 0
would return
ParentDeviceID | Count
1 | 3
2 | 1
select ParentDeviceID, count(*) as [Count]
from ControlSystemHierarchy
group by ParentDeviceID
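If you need the count alongside every detail row rather than one row per group, a windowed count is a handy variant (same idea, sketched):
select ParentDeviceID, ChildDeviceID, Instrument,
       count(*) over (partition by ParentDeviceID) as [Count]
from ControlSystemHierarchy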