Need help in Group by in DB2 - group-by

This is Parent_Child table.
PARENT CHILD EFF_DATE
22716 2528 3/8/2011
22716 5696 3/8/2011
22716 3698 3/8/2011
22716 5698 3/18/2010
37091 4569 10/22/2013
37091 6931 9/17/2014
Query result should look like this:
PARENT CHILD EFF_DATE
22716 2528 3/8/2011
22716 5696 3/8/2011
22716 3698 3/8/2011
37091 6931 9/17/2014
Query tried:
SELECT DISTINCT P.PARENT,P.CHILD,C.MAX_DATE
FROM parent_child P
INNER JOIN
(SELECT CHILD,MAX(EFF_DT) AS MAX_DATE
FROM parent_child
GROUP BY CHILD) C
ON P.CHILD=C.CHILD AND P.EFF_DT=C.MAX_DATE
ORDER BY P.PARENT
But I end up getting both the values of parent 37091.
Appreciate any help on this.

Changes required
change inner query to group on parent instead of child
change join condition to match on parent
query
SELECT DISTINCT P.PARENT,P.CHILD,C.MAX_DATE
FROM parent_child P
INNER JOIN
(SELECT PARENT,MAX(EFF_DT) AS MAX_DATE
FROM parent_child
GROUP BY PARENT) C
ON P.PARENT=C.PARENT
AND P.EFF_DT=C.MAX_DATE
ORDER BY P.PARENT
;
output
PARENT CHILD MAX_DATE
22716 2528 March, 08 2011 00:00:00
22716 5696 March, 08 2011 00:00:00
22716 3698 March, 08 2011 00:00:00
37091 6931 September, 17 2014 00:00:00
sqlfiddle

Related

How to join on condition count

I have two tables. First one has id with versions:
Fid start_dt end_dt text
1 2021-09-01 00:00:00 2021-10-09 23:59:59 first_row
1 2021-10-10 00:00:00 2999-12-12 23:59:59 second_row
2 2021-10-02 00:00:00 2999-12-12 23:59:59 third_row
3 2021-09-05 00:00:00 2021-09-06 23:59:59 fourth_row
3 2021-09-07 00:00:00 2999-12-12 23:59:59 fifth_row
And second table with calls:
id dt Fid
1 2021-09-01 05:00:00 1
2 2021-10-01 18:00:00 2
3 2021-10-11 05:00:00 1
4 2021-10-01 16:50:00 2
The desired result is
id text
1 first_row
2 third_row
3 second_row
4 third_row
I want this script for rows in t1 which have several versions
select id,text
from t2
left join t1 on t1.Fid = t2.Fid and t2.dt between t1.start_dt and t1.end_dt
And this script for rows in t1 which have one version
select id,text
from t2
left join t1 on t1.Fid = t2.Fid and t2.dt between t1.start_dt and t1.end_dt
How to do this? I thought about setting to the row flag depending on the number of versions, but maybe there is an easier way?
You can do (Fiddle)
with x as (select fid,count(*) as qty from tb1 group by fid)
select tb2.id, tb1.text
from tb2 join tb1 on tb2.fid = tb1.fid and tb2.dt between tb1.start_dt and end_dt
join x on tb2.fid = x.fid and x.qty > 1
union
select tb2.id, tb1.text
from tb2 join tb1 on tb2.fid = tb1.fid
join x on tb2.fid = x.fid and x.qty = 1
order by id;

Group query results by month and year in postgresql with emply month sum

Based on this answer by Burak Arslan
SELECT date_trunc('month', txn_date) AS txn_month, sum(amount) as monthly_sum
FROM yourtable
GROUP BY txn_month
Is there a way to get months that have no results to show in the query?
So let's say I have :
id transDate Product Qty
1234 04/12/2019 ABCD 2
1245 04/05/2019 ABCD 1
1231 02/07/2019 ABCD 6
I also need to the the third Month returns with a 0 value
MonthYear totalQty
02/2019 6
03/2019 0
04/2019 3
Thanks,
---- UPDATE ---
Here is the final query that that gets last 24 months from the current date. with year and month ready for any charts.
Thanks to #a_horse_with_no_name
SELECT
--ONLY USE THE NEXT LINE IF YOU NEED TO HAVE THE ID IN YOUR RESULT
CASE WHEN t."ItemId" IS NULL THEN 10607 ELSE t."ItemId" END AS "ItemId",
TO_CHAR(y."transactionDate", 'yyyy-mm-dd') AS txn_month,
TO_CHAR(y."transactionDate", 'yyyy') AS "Year",
TO_CHAR(y."transactionDate", 'Mon') AS "Month",
-coalesce(SUM(t."transactionQty"),0) AS "TotalSold"
FROM generate_series(
TO_CHAR(CURRENT_DATE - INTERVAL '24 month', 'yyyy-mm-01')::date ,
TO_CHAR(CURRENT_DATE, 'yyyy-mm-01')::date,
INTERVAL '1 month') as y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
ON date_trunc('month', t."transactionDate") = y."transactionDate"
AND t."ItemTransactionTypeId" = 1
AND t."ItemId" = 10607
GROUP BY txn_month, "Year", "Month", t."ItemId"
ORDER BY txn_month ASC;
EXEMPLE OUTPUT
ItemId txn_month Year Month TotalSold
10607 2018-03-01 2018 Mar 2
10607 2018-04-01 2018 Apr 0
10607 2018-05-01 2018 May 8
10607 2018-06-01 2018 Jun 12
10607 2018-07-01 2018 Jul 6
10607 2018-08-01 2018 Aug 4
10607 2018-09-01 2018 Sep 6
10607 2018-10-01 2018 Oct 8
10607 2018-11-01 2018 Nov 4
10607 2018-12-01 2018 Dec 0
10607 2019-01-01 2019 Jan 2
10607 2019-02-01 2019 Feb 3
10607 2019-03-01 2019 Mar 4
10607 2019-04-01 2019 Apr 1
10607 2019-05-01 2019 May 4
10607 2019-06-01 2019 Jun 3
10607 2019-07-01 2019 Jul 5
10607 2019-08-01 2019 Aug 6
10607 2019-09-01 2019 Sep 6
10607 2019-10-01 2019 Oct 6
10607 2019-11-01 2019 Nov 3
10607 2019-12-01 2019 Dec 0
10607 2020-01-01 2020 Jan 4
10607 2020-02-01 2020 Feb 2
10607 2020-03-01 2020 Mar 0
Left join to a list of months:
SELECT t.txn_month,
coalesce(sum(yt.amount),0) as monthly_sum
FROM generate_series(date '2019-02-01', date '2019-04-01', interval '1 month') as t(txn_month)
left join yourtable yt on date_trunc('month', yt.transdate) = t.txn_month
GROUP BY t.txn_month
Online example
In your actual query you need to move the conditions from the WHERE clause to the JOIN condition. Putting them into the WHERE clause turns the outer join back into an inner join:
SELECT t."ItemId",
y."transactionDate" AS txn_month,
-coalesce(SUM(t."transactionQty"),0) AS "TotalSold"
FROM generate_series(date '2018-01-01', date '2020-04-01', INTERVAL '1 month') as y("transactionDate")
LEFT JOIN "ItemTransactions" AS t
ON date_trunc('month', t."transactionDate") = y."transactionDate"
AND t."ItemTransactionTypeId" = 1
AND t."ItemId" = 10606
-- this WHERE clause isn't really needed because of the date values provided to generate_series()
WHERE AND y."transactionDate" >= NOW() - INTERVAL '2 year'
GROUP BY txn_month, t."ItemId"
ORDER BY txn_month DESC;

How to sort attendance date along with the month?

Attendance is sorting according to date, that is fine, but I want to sort date along with the month name January should come at the bottom, and December at the top.
Table
Attendance Date
---------------
26 Feb 2018
19 Dec 2018
18 Dec 2018
14 Dec 2018
12 June 2018
7 Dec 2018
5 Feb 2018
Query
select distinct
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t1.l_time,'HH12:mi AM')]::text), ',')
from
(select (al1.create_time AT TIME ZONE 'UTC+5:30')::time as l_time
from users.access_log as al1
where al1.user_id = al.user_id
and al1.login_status = 1
and al1.create_time::date = al.create_time::date
order by al1.create_time::time ASC
) as t1
) as login_time,
(select ARRAY_TO_STRING(ARRAY_AGG(ARRAY[to_char(t2.o_time,'HH:mi AM')]::text), ',')
from
(select (al2.create_time AT TIME ZONE 'UTC+5:30')::time as o_time
from users.access_log as al2
where al2.user_id = al.user_id
and al2.login_status = 0
and al2.create_time::date = al.create_time::date
order by al2.create_time::time ASC
) as t2
) as logout_time,
al.create_time::date
from users.access_log as al
where al.user_id = ?;
Attendance is sorting according to date, that is fine, but I want to sort date along with the month name January should come at the bottom, and December at the top.

Select first non-null value for multiple columns and different rows in PostgreSQL

I'm trying to create a view based off a table. I want to get a set of rows where there is an existing tax_id_no, with each row having the most recent information. So I'm ordering by timestamps descending. However, each tax_id_no can have multiple rows, and not every row will have all the information. So I want to get the first valid piece of information for each column. Right now I've got this:
SELECT * FROM
(
SELECT DISTINCT ON (store_id, tax_id_no)
event_id,
event_tstamp,
owner_id,
store_id,
tax_id_no,
first_value(year_built) OVER (ORDER BY year_built IS NULL, event_tstamp) AS year_built, --New
first_value(roof_replaced_year) OVER (ORDER BY roof_replaced_year IS NULL, event_tstamp) AS roof_replaced_year, --New
first_value(number_of_rooms) OVER (ORDER BY number_of_rooms IS NULL, event_tstamp) AS number_of_rooms, --New
FROM MySchema.Event
WHERE tax_id_no IS NOT NULL AND tax_id_no != ''
order by store_id, tax_id_no, event_tstamp DESC
) t1
WHERE owner_id IS NOT NULL OR owner_id != '';
This is getting the same first valid information for every row though. So instead of getting results like this, which is what I want:
event_id event_tstamp owner_id store_id tax_id_no year_built roof_replaced_year number_of_rooms
04 2016-05-12 123 02 12345 1996 2009 6
05 2017-02-02 245 02 23456 1970 1999 8
08 2017-03-03 578 03 34567 2002 2016 10
I'm getting this, which all the rows looking the same in the first_value columns:
event_id event_tstamp owner_id store_id tax_id_no year_built roof_replaced_year number_of_rooms
04 2016-05-12 123 02 12345 1996 2009 6
05 2017-02-02 245 02 23456 1996 2009 6
08 2017-03-03 578 03 34567 1996 2009 6
Is it possible to select a different first_value for each row? I was thinking I could do some kind of a join across multiple selects from the same table, but I'm not sure that would actually give me unique values for each row instead of just having the same problem again. There's also the length of time for such queries to consider, which so far have been prohibitively expensive.
You can use a partition in your window functions to group the rows before applying the function. That will generate a distinct result for each partition.
For example:
first_value(number_of_rooms) OVER (
PARTION BY tax_id_no
ORDER BY number_of_rooms IS NULL, event_tstamp
) AS number_of_rooms,

PostgreSQL - GROUP subsequent rows

I have a table which contains some records ordered by date.
And I want to get start and end dates for each subsequent group (grouped by some criteria e.g.position).
Example:
create table tbl (id int, date timestamp without time zone,
position int);
insert into tbl values
( 1 , '2013-12-01', 1),
( 2 , '2013-12-02', 2),
( 3 , '2013-12-03', 2),
( 4 , '2013-12-04', 2),
( 5 , '2013-12-05', 3),
( 6 , '2013-12-06', 3),
( 7 , '2013-12-07', 2),
( 8 , '2013-12-08', 2)
Of course if I simply group by position I will get wrong result as positions could be the same for different groups:
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tbl GROUP BY POSITION
I will get:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
But I want:
POSITION MIN MAX
1 December, 01 2013 00:00:00+0000 December, 01 2013 00:00:00+0000
2 December, 02 2013 00:00:00+0000 December, 04 2013 00:00:00+0000
3 December, 05 2013 00:00:00+0000 December, 06 2013 00:00:00+0000
2 December, 07 2013 00:00:00+0000 December, 08 2013 00:00:00+0000
I found a solution for MySql which uses variables and I could port it but I believe PostgreSQL can do it in some smarter way using its advanced features like window functions.
I'm using PostgreSQL 9.2
There is probably more elegant solution but try this:
WITH tmp_tbl AS (
SELECT *,
CASE WHEN lag(position,1) OVER(ORDER BY id)=position
THEN position
ELSE ROW_NUMBER() OVER(ORDER BY id)
END AS grouping_col
FROM tbl
)
, tmp_tbl2 AS(
SELECT position,date,
CASE WHEN lag(position,1)OVER(ORDER BY id)=position
THEN lag(grouping_col,1) OVER(ORDER BY id)
ELSE ROW_NUMBER() OVER(ORDER BY id)
END AS grouping_col
FROM tmp_tbl
)
SELECT POSITION, min(date) MIN, max(date) MAX
FROM tmp_tbl2 GROUP BY grouping_col,position
There are some complete answers on Stackoverflow for that, so I'll not repeat them in detail, but the principle of it is to group the records according to the difference between:
The row number when ordered by the date (via a window function)
The difference between the dates and a static date of reference.
So you have a series such as:
rownum datediff diff
1 1 0 ^
2 2 0 | first group
3 3 0 v
4 5 1 ^
5 6 1 | second group
6 7 1 v
7 9 2 ^
8 10 2 v third group