lag function over multiple rows in Postgresql

lag function over multiple rows in Postgresql - postgresql

I have a table like this
b.b_id fp.fp_id b.name
1 10 Dan
1
1
1
2 15 Michelle
2
3 20 Steve
3
3
Im trying to get this output
b.b_id fp.fp_id b.name
1 10 Dan
1 Dan
1 Dan
1 Dan
2 15 Michelle
2 Michelle
3 20 Steve
3 Steve
3 Steve
My idea was using the lag function but with this code i am only able to fill 1 row below.
select b.b_id,fp.fp_id,
case
when fp.fp_id is null then lag(b.name,1) over (partition by b.b_id order by b.b_id,fp.fp_id)
else b.name
end as name
from b
left join fp on fp.id = b.fp_id
Output at the moment
b.b_id fp.fp_id b.name
1 10 Dan
1 Dan
1
1
2 15 Michelle
2 Michelle
3 20 Steve
3 Steve
3
Is there some easy way to solve this?

CREATE temp TABLE test101 (
id bigint,
fp_id bigint,
name text
);
INSERT INTO test101
VALUES (1, 10, 'dan'),
(1, NULL, NULL),
(1, NULL, NULL),
(1, NULL, NULL),
(2, 15, 'Michelle'),
(2, NULL, NULL),
(3, 20, 'Steve'),
(3, NULL, NULL),
(3, NULL, NULL);
SELECT
id,
fp_id,
first_value(name) OVER (PARTITION BY id ORDER BY id ASC nulls LAST, fp_id NULLS LAST)
FROM
test101;

Related

Recursive sum for each row

Consider this testing data:
CREATE TABLE IF NOT EXISTS area (
id integer,
parent_id integer,
name text,
population integer
);
INSERT INTO area VALUES
(1, NULL, 'North America', 0),
(2, 1, 'United States', 0),
(3, 1, 'Canada', 39),
(4, 1, 'Mexico', 129),
(5, 2, 'Contiguous States', 331),
(6, 2, 'Non-contiguous States', 2);
id
parent_id
name
population
1
NULL
North America
0
2
1
United States
0
3
1
Canada
39
4
1
Mexico
129
5
2
Contiguous States
331
6
2
Non-contiguous States
2
Note that population (in millions) means here the additional population, excluding area's children.
How do I query the recursive sum for each row? I need to get something like this:
name
sum
North America
501
United States
333
Canada
39
Mexico
129
Contiguous States
331
Non-contiguous States
2

Here is how I solved it:
WITH RECURSIVE parent AS (
SELECT
*,
id AS root -- Emit the root ID
FROM area
-- No WHERE clause, since we treat every row as a possible root
UNION ALL
SELECT
area.*,
parent.root -- Pass the root ID to the children
FROM area
JOIN parent ON parent.id = area.parent_id
)
SELECT
name,
sum
FROM (
SELECT
root,
SUM(population)
FROM parent
GROUP BY root -- Here is where we need the root IDs
) cumulative
JOIN area ON area.id = cumulative.root; -- Use the nice names instead of IDs

[postgresql - generate months from start_date and end_date base on total_x]

I have three columns in postgresql
No
total_car_sales
start_date
end_date
1
5
Jan-01-2022
Aug-03-2022
2
1
April-01-2022
July-03-2022
3
3
March-01-2022
May-03-2022
4
7
Jan-01-2022
July-03-2022
5
56
April-01-2022
April-25-2022
6
3
April-01-2022
Aug-04-2022
Here example from start_date No.1: 'Jan-01-2022' to 'August-03-2022': I will count only for August-2022 so the result for August-2022 is 5.
No.6 the result Aug-2022 is 3.
Result I wanna generate total_car_sales for whole table like this:
Months
total_car_sales
Jan-2022
0
Feb-2022
0
March-2022
0
April-2022
56
May-2022
3
June-2022
0
July-2022
8
August-2022
8
I have tried to use trunc_cate() but it is not works for it
Any help for suggestion for me really appreciate it
Thank you

Make a list of months (generate_series) and calculate total sales for each of them.
with the_table (no,total_car_sales,start_date,end_date) as
(
values
(1, 5, 'Jan-01-2022'::date, 'Aug-03-2022'::date),
(2, 1, 'April-01-2022', 'July-03-2022'),
(3, 3, 'March-01-2022', 'May-03-2022'),
(4, 7, 'Jan-01-2022', 'July-03-2022'),
(5, 56, 'April-01-2022', 'April-25-2022'),
(6, 3, 'April-01-2022', 'Aug-04-2022')
)
select
to_char(m, 'mon-yyyy') "month",
coalesce
(
(select sum(total_car_sales) from the_table where m = date_trunc('month', end_date)),
0
) total_car_sales
from generate_series ('2022-01-01', '2022-08-01', interval '1 month') m;

Improve performance on CTE with sub-queries

I have a table with this structure:
WorkerID Value GroupID Sequence Validity
1 '20%' 1 1 2018-01-01
1 '10%' 1 1 2017-06-01
1 'Yes' 1 2 2017-06-01
1 '2018-01-01' 2 1 2017-06-01
1 '17.2' 2 2 2017-06-01
2 '10%' 1 1 2017-06-01
2 'No' 1 2 2017-06-01
2 '2016-03-01' 2 1 2017-06-01
2 '15.9' 2 2 2017-06-01
This structure was created so that the client can create customized data for a worker. For example Group 1 can be something like "Salary" and Sequence is one value that belongs to that Group like "Overtime Compensation". The column Value is a VARCHAR(150) field and the correct validation and conversation is done in another part of the application.
The Validity column exist mainly for historical reasons.
Now I would like to show, for the different workers, the information in a grid where each row should be one worker (displaying the one with the most recent Validity):
Worker 1_1 1_2 2_1 2_2
1 20% Yes 2018-01-01 17.2
2 10% No 2016-03-01 15.9
To accomplish this I created a CTE that looks like this:
WITH CTE_worker_grid
AS
(
SELECT
worker,
/* 1 */
(
SELECT top 1 w.Value
FROM worker_values AS w
WHERE w.GroupID = 1
AND w.Sequence = 1
ORDER BY w.Validity DESC
) AS 1_1,
(
SELECT top 1 w.Value
FROM worker_values AS w
WHERE w.GroupID = 1
AND w.Sequence = 2
ORDER BY w.Validity DESC
) AS 1_2,
/* 2 */
(
SELECT top 1 w.Value
FROM worker_values AS w
WHERE w.GroupID = 2
AND w.Sequence = 1
ORDER BY w.Validity DESC
) AS 2_1,
(
SELECT top 1 w.Value
FROM worker_values AS w
WHERE w.GroupID = 2
AND w.Sequence = 2
ORDER BY w.Validity DESC
) AS 2_2
)
GO
This produces the correct result but it's very slow as it creates this grid for over 18'000 worker with almost 30 Groups and up to 20 Sequences in each Group.
How could one speed up the process of a CTE of this magnitude? Should CTE even be used? Can the sub-queries be changed or re-factored out to speed up the execution?

Use a PIVOT!
+----------+---------+---------+------------+---------+
| WorkerId | 001_001 | 001_002 | 002_001 | 002_002 |
+----------+---------+---------+------------+---------+
| 1 | 20% | Yes | 2018-01-01 | 17.2 |
| 2 | 10% | No | 2016-03-01 | 15.9 |
+----------+---------+---------+------------+---------+
SQL Fiddle: http://sqlfiddle.com/#!18/6e768/1
CREATE TABLE WorkerAttributes
(
WorkerID INT NOT NULL
, [Value] VARCHAR(50) NOT NULL
, GroupID INT NOT NULL
, [Sequence] INT NOT NULL
, Validity DATE NOT NULL
)
INSERT INTO WorkerAttributes
(WorkerID, Value, GroupID, Sequence, Validity)
VALUES
(1, '20%', 1, 1, '2018-01-01')
, (1, '10%', 1, 1, '2017-06-01')
, (1, 'Yes', 1, 2, '2017-06-01')
, (1, '2018-01-01', 2, 1, '2017-06-01')
, (1, '17.2', 2, 2, '2017-06-01')
, (2, '10%', 1, 1, '2017-06-01')
, (2, 'No', 1, 2, '2017-06-01')
, (2, '2016-03-01', 2, 1, '2017-06-01')
, (2, '15.9', 2, 2, '2017-06-01')
;WITH CTE_WA_RANK
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY WorkerID, GroupID, [Sequence] ORDER BY Validity DESC) AS VersionNumber
, WA.WorkerID
, WA.GroupID
, WA.[Sequence]
, WA.[Value]
FROM
WorkerAttributes AS WA
),
CTE_WA
AS
(
SELECT
WA_RANK.WorkerID
, RIGHT('000' + CAST(WA_RANK.GroupID AS VARCHAR(3)), 3)
+ '_'
+ RIGHT('000' + CAST(WA_RANK.[Sequence] AS VARCHAR(3)), 3) AS SMART_KEY
, WA_RANK.[Value]
FROM
CTE_WA_RANK AS WA_RANK
WHERE
WA_RANK.VersionNumber = 1
)
SELECT
WorkerId
, [001_001] AS [001_001]
, [001_002] AS [001_002]
, [002_001] AS [002_001]
, [002_002] AS [002_002]
FROM
(
SELECT
CTE_WA.WorkerId
, CTE_WA.SMART_KEY
, CTE_WA.[Value]
FROM
CTE_WA
) AS WA
PIVOT
(
MAX([Value])
FOR
SMART_KEY IN
(
[001_001]
, [001_002]
, [002_001]
, [002_002]
)
) AS PVT

Selecting specific row from a sub query depending on lowest priority

I have a table with Clients and their Insurance Providers. There is a column called Priority that ranges from 1-8. I want to be able to select the lowest priority insurance into my 'master table' I have a query that provides Fees, Dates, Doctors etc. and I need a subquery that I can join to the Main query on Client_ID The priority doesn't always start with 1. The Insurance Table is the Many side of the relationship
Row# Client_id Insurance_id Priority active?
1 333 A 1 Y
2 333 B 2 Y
3 333 C 1 N
4 222 D 6 Y
5 222 A 8 Y
6 444 C 4 Y
7 444 A 5 Y
8 444 B 6 Y
Answer should be
Client_id Insurance_id Priority
333 A 1
222 D 6
444 C 4

I was able to achieve the results I think you're asking for pretty easily utilizing SQL's ROW_NUMBER() function:
declare #tbl table
(
Id int identity,
ClientId int,
InsuranceId char(1),
[Priority] int,
Active bit
)
insert into #tbl (ClientId, InsuranceId, [Priority], Active)
values (1, 'A', 1, 1),
(1, 'A', 2, 1),
(1, 'B', 3, 1),
(1, 'B', 4, 1),
(1, 'C', 1, 1),
(1, 'C', 2, 0),
(2, 'C', 1, 1),
(2, 'C', 2, 1)
select Id, ClientId, InsuranceId, [Priority]
from
(
select Id,
ClientId,
InsuranceId,
[Priority],
ROW_NUMBER() OVER (PARTITION BY ClientId, InsuranceId ORDER BY [Priority] desc) as RowNum
from #tbl
where Active = 1
) x
where x.RowNum = 1
Results:
(8 row(s) affected)
Id ClientId InsuranceId Priority
----------- ----------- ----------- -----------
2 1 A 2
4 1 B 4
5 1 C 1
8 2 C 2
(4 row(s) affected)

TSQL Select Max

Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2005
1 Dan Kramer 1/1/2007
1 Dan Kramer 1/1/2009
2 Pamella Slattery 1/1/2005
2 Pam Slattery 1/1/2006
2 Pam Slattery 1/1/2008
3 Samamantha Cohen 1/1/2008
3 Sam Cohen 1/1/2009
I need to extract the latest updated for all these users, basically here's what I'm looking for:
Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2009
2 Pam Slattery 1/1/2008
3 Sam Cohen 1/1/2009
Now when I run the following:
SELECT Userid, FirstName, LastName, Max(UserUpdate) AS MaxDate
FROM Table
GROUP BY Userid, FirstName, LastName
I still get duplicates, something like this:
Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2009
2 Pamella Slattery 1/1/2005
2 Pam Slattery 1/1/2008
3 Samamantha Cohen 1/1/2008
3 Sam Cohen 1/1/2009

try:
declare #Table table (userid int,firstname varchar(10),lastname varchar(20), userupdate datetime)
INSERT #Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2005')
INSERT #Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2007')
INSERT #Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2009')
INSERT #Table VALUES (2, 'Pamella' ,'Slattery' ,'1/1/2005')
INSERT #Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2006')
INSERT #Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2008')
INSERT #Table VALUES (3, 'Samamantha' ,'Cohen' ,'1/1/2008')
INSERT #Table VALUES (3, 'Sam' ,'Cohen' ,'1/1/2009')
SELECT
dt.Userid,dt.MaxDate
,MIN(a.FirstName) AS FirstName, MIN(a.LastName) AS LastName
FROM (SELECT
Userid, Max(UserUpdate) AS MaxDate
FROM #Table GROUP BY Userid
) dt
INNER JOIN #Table a ON dt.Userid=a.Userid and dt.MaxDate =a.UserUpdate
GROUP BY dt.Userid,dt.MaxDate
OUTPUT:
Userid MaxDate FirstName LastName
----------- ----------------------- ---------- --------------------
1 2009-01-01 00:00:00.000 Dan Kramer
2 2008-01-01 00:00:00.000 Pam Slattery
3 2009-01-01 00:00:00.000 Sam Cohen

You aren't getting duplicates. 'Pam' is not equal to 'Pamella' from the perspective of the database; the fact that one is a colloquial shortening of the other doesn't mean anything to the database engine. There really is no reliable, universal way to do this (since there are names that have multiple abbreviations, like "Rob" or "Bob" for "Robert", as well as abbreviations that can suit multiple names like "Kel" for "Kelly" or "Kelsie", let alone the fact that names can have alternate spellings).
For your simple example, you could simply select and group by SUBSTRING(FirstName, 1, 3) instead of FirstName, but that's just a coincidence based upon your sample data; other name abbreviations would not fit this pattern.

Or use a subquery...
SELECT
a.userID,
a.FirstName,
a.LastName,
b.MaxDate
FROM
myTable a
INNER JOIN
( SELECT
UserID,
Max(ISNULL(UserUpdate,GETDATE())) as MaxDate
FROM
myTable
GROUP BY
UserID
) b
ON
a.UserID = b.UserID
AND a.UserUpdate = b.MaxDate
The subquery (named "b") returns the following:
Userid UserUpdate
1 1/1/2009
2 1/1/2008
3 1/1/2009
The INNER JOIN between the subquery and the original table causes the original table to be filtered for matching records only -- i.e., only records with a UserID/UserUpdate pair that matches a UserID/MaxDate pair from the subquery will be returned, giving you the unduplicated result set you were looking for:
Userid FirstName LastName UserUpdate
1 Dan Kramer 1/1/2009
2 Pam Slattery 1/1/2008
3 Sam Cohen 1/1/2009
Of course, this is just a work-around. If you really want to solve the problem for the long-term, you should normalize your original table by splitting it into two.
Table1:
Userid FirstName LastName
1 Dan Kramer
2 Pam Slattery
3 Sam Cohen
Table2:
Userid UserUpdate
1 1/1/2007
2 1/1/2007
3 1/1/2007
1 1/1/2008
2 1/1/2008
3 1/1/2008
1 1/1/2009
2 1/1/2009
3 1/1/2009
This would be a more standard way to store data, and would be much easier to query (without having to resort to a subquery). In that case, the query would look like this:
SELECT
T1.UserID,
T1.FirstName,
T1.LastName,
MAX(ISNULL(T2.UserUpdate,GETDATE()))
FROM
Table1 T1
LEFT JOIN
Table2 T2
ON
T1.UserID = T2.UserID
GROUP BY
T1.UserID,
T1.FirstName,
T1.LastName

Another alternative if you have SQL 2005(I think ?) or later would be to use a Common Table Expression and pull out the user id and max date from the table then join against that to get the matching firstname and lastname on the max date. NOTE - this assumes that userid + date would always be unique, the query will break if you get 2 rows with same userid and date. As others have already pointed out this is pretty awful database design - but sometimes thats life, the problem must still be solved. e.g.
declare #Table table (userid int,firstname varchar(10),lastname varchar(20), userupdate datetime)
INSERT #Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2005')
INSERT #Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2007')
INSERT #Table VALUES (1, 'Dan' ,'Kramer' ,'1/1/2009')
INSERT #Table VALUES (2, 'Pamella' ,'Slattery' ,'1/1/2005')
INSERT #Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2006')
INSERT #Table VALUES (2, 'Pam' ,'Slattery' ,'1/1/2008')
INSERT #Table VALUES (3, 'Samamantha' ,'Cohen' ,'1/1/2008')
INSERT #Table VALUES (3, 'Sam' ,'Cohen' ,'1/1/2009');
with cte ( userid , maxdt ) as
(select userid,
max(userupdate)
from #table
group by userid)
SELECT dt.Userid,
dt.firstname,
dt.lastname,
cte.maxdt
FROM
#Table dt
join cte on cte.userid = dt.userid and dt.userupdate = cte.maxdt
Output
Userid firstname lastname maxdt
----------- ---------- -------------------- -----------------------
3 Sam Cohen 2009-01-01 00:00:00.000
2 Pam Slattery 2008-01-01 00:00:00.000
1 Dan Kramer 2009-01-01 00:00:00.000