Flattening rows into a single row based on rules? - tsql

I have a result set that looks something like this:
customer | flag | date_from  | date_to
ABC123   | Y    | 22/01/2020 | 21/02/2021
ABC123   | N    | 22/02/2021 | 31/03/2021
ABC123   | Y    | 01/04/2021 | 30/09/2021
ABC123   | Y    | 01/10/2021 | 31/03/2022
ABC123   | Y    | 01/04/2022 | 30/09/2022
ABC123   | Y    | 01/10/2022 | 01/01/9999
I want to 'flatten' it so that it outputs this:
customer | flag | date_from  | date_to
ABC123   | Y    | 22/01/2020 | 21/02/2021
ABC123   | N    | 22/02/2021 | 31/03/2021
ABC123   | Y    | 01/04/2021 | 01/01/9999
Is this possible?

This is a gaps and islands problem. One approach uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer ORDER BY date_from) rn1,
ROW_NUMBER() OVER (PARTITION BY customer, flag ORDER BY date_from) rn2
FROM yourTable
)
SELECT
customer,
flag,
MIN(date_from) AS date_from,
MAX(date_to) AS date_to
FROM cte
GROUP BY
customer,
flag,
rn1 - rn2
ORDER BY
MIN(date_from);
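
To see why grouping on rn1 - rn2 works, here is a minimal Python sketch of the same bookkeeping on the sample rows. It is only an illustration, not part of the SQL answer: the function name flatten is made up, and the dates are rewritten as ISO strings (an assumption) so that they sort correctly as plain text.

```python
from collections import defaultdict

# Sample rows from the question: (customer, flag, date_from, date_to).
# Assumption: dates are converted to ISO format so string order equals date order.
rows = [
    ("ABC123", "Y", "2020-01-22", "2021-02-21"),
    ("ABC123", "N", "2021-02-22", "2021-03-31"),
    ("ABC123", "Y", "2021-04-01", "2021-09-30"),
    ("ABC123", "Y", "2021-10-01", "2022-03-31"),
    ("ABC123", "Y", "2022-04-01", "2022-09-30"),
    ("ABC123", "Y", "2022-10-01", "9999-01-01"),
]


def flatten(rows):
    """Hypothetical helper mirroring the rn1 - rn2 grouping of the query."""
    rn1 = defaultdict(int)  # row number per customer
    rn2 = defaultdict(int)  # row number per (customer, flag)
    islands = {}            # (customer, flag, rn1 - rn2) -> [min_from, max_to]
    for customer, flag, date_from, date_to in sorted(rows, key=lambda r: (r[0], r[2])):
        rn1[customer] += 1
        rn2[(customer, flag)] += 1
        key = (customer, flag, rn1[customer] - rn2[(customer, flag)])
        if key not in islands:
            islands[key] = [date_from, date_to]
        else:
            islands[key][0] = min(islands[key][0], date_from)
            islands[key][1] = max(islands[key][1], date_to)
    result = [(c, f, lo, hi) for (c, f, _), (lo, hi) in islands.items()]
    return sorted(result, key=lambda r: (r[0], r[2]))


for row in flatten(rows):
    print(row)
```

Within a run of rows that share a flag, both counters advance together, so their difference stays constant. It changes as soon as a row with another flag intervenes, which is what separates the first Y row from the later Y rows.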

Related

Remove duplicates based on condition and keep oldest element

The table is currently sorted in ascending order by row_number. I need help removing duplicates based on two conditions:
If an org_id has a row whose stage is online, I want to keep that row. If there are several, it doesn't matter which one is kept.
If an org_id has no row with stage online, I want to keep the row with row_number = 1, which is the oldest element.
sales_id | org_id | stage     | row_number
ccc_123  | ccc    | off-line  | 1
ccc_123  | ccc    | off-line  | 2
ccc_123  | ccc    | online    | 3
abc_123  | abc    | off-line  | 1
abc_123  | abc    | power-off | 2
zzz_123  | zzz    | power-off | 1
So the table should look like this afterwards:
sales_id | org_id | stage
ccc_123  | ccc    | online
abc_123  | abc    | off-line
zzz_123  | zzz    | power-off
I would use a CASE expression to override the row number of records with stage='online', and then use ROW_NUMBER to keep only the row with the lowest value in each group.
http://sqlfiddle.com/#!17/1421b/5
create table sales_stage (
sales_id varchar,
org_id varchar,
stage varchar,
row_num int);
insert into sales_stage (sales_id, org_id, stage, row_num) values
('ccc_123', 'ccc', 'off-line', 1),
('ccc_123', 'ccc', 'off-line', 2),
('ccc_123', 'ccc', 'online', 3),
('abc_123', 'abc', 'off-line', 1),
('abc_123', 'abc', 'power-off', 2),
('zzz_123', 'zzz', 'power-off', 1);
SELECT
sales_id, org_id, stage
FROM
(
SELECT
sales_id, org_id, stage,
ROW_NUMBER() OVER(PARTITION BY sales_id, org_id ORDER BY row_num) as rn
FROM (
SELECT sales_id, org_id, stage,
CASE WHEN stage='online' THEN -999 ELSE row_num END as row_num
FROM sales_stage
) x
) y
WHERE rn = 1
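
The selection rule in the query above can be sketched in Python as a rough cross-check. This is an illustration only: pick_rows is a hypothetical helper, and it reuses the -999 sentinel from the CASE expression, which assumes real row numbers are never that low.

```python
# Same sample data as the INSERT above: (sales_id, org_id, stage, row_num).
sales_stage = [
    ("ccc_123", "ccc", "off-line", 1),
    ("ccc_123", "ccc", "off-line", 2),
    ("ccc_123", "ccc", "online", 3),
    ("abc_123", "abc", "off-line", 1),
    ("abc_123", "abc", "power-off", 2),
    ("zzz_123", "zzz", "power-off", 1),
]


def pick_rows(rows):
    """Hypothetical helper: keep one row per (sales_id, org_id)."""
    best = {}  # (sales_id, org_id) -> (sort_key, stage)
    for sales_id, org_id, stage, row_num in rows:
        # An 'online' row gets a sort key that beats every real row number.
        sort_key = -999 if stage == "online" else row_num
        key = (sales_id, org_id)
        if key not in best or sort_key < best[key][0]:
            best[key] = (sort_key, stage)
    return [(s, o, stage) for (s, o), (_, stage) in best.items()]


for row in pick_rows(sales_stage):
    print(row)
```

If a group contains several online rows, they all share the same sort key, so which one survives is arbitrary. That matches the requirement that any online row will do.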

select first order for each customer from two tables

I have two tables, dbo.Sales (customer_id, order_date, product_id) and dbo.Menu (Product_id, product_name, price). The question is:
What was the first item from the menu purchased by each customer?
My solution is:
select A.customer_id,m.product_id, m.product_name
from dbo.menu m
cross apply
(select top 1 * from dbo.sales s
where s.product_id=m.product_id
group by s.customer_id,s.order_date, s.product_id
order by s.order_date) A
customer_id product_id product_name
A 1 sushi
A 2 curry
C 3 ramen
Customer B is missing. Instead of B, the query returns a second "first order" for A.
I need the first order for each customer.
Murat
You could use a ROW_NUMBER() window function to get the earliest product_id per customer and then join to the Menu table to get your product details.
Edit: Updated ORDER to ASC.
;with cte
as (
select customer_id, product_id, row_number() over (partition by customer_id order by order_date asc) RN
from dbo.Sales)
select c.customer_id, c.product_id, m.product_name
from cte c
join dbo.menu m on c.product_id=m.product_id
where RN = 1
SELECT distinct s.customer_id,
FIRST_VALUE(m.product_name) OVER (partition by s.customer_id order by order_date )
as FirstItem_Customer
FROM [dbo].[sales] S
join [dbo].[menu] M on M.product_id=s.product_id

postgres - get top category purchased by customer

I have a denormalized table with the columns:
buyer_id
order_id
item_id
item_price
item_category
I would like a query that returns 1 row per buyer_id:
buyer_id, sum(item_price), item_category
-- but ONLY for the category with the highest rank of sales along that specific buyer_id.
I can't get row_number() or partition to work because I need to order by the sum of item_price relative to item_category relative to buyer. Am I overlooking anything obvious?
You need a few layers of fudging here:
SELECT buyer_id, item_sum, item_category
FROM (
SELECT buyer_id,
rank() OVER (PARTITION BY buyer_id ORDER BY item_sum DESC) AS rnk,
item_sum, item_category
FROM (
SELECT buyer_id, sum(item_price) AS item_sum, item_category
FROM my_table
GROUP BY 1, 3) AS sub2) AS sub
WHERE rnk = 1;
In sub2 you calculate the sum of 'item_price' for each 'item_category' for each 'buyer_id'. In sub you rank these with a window function by 'buyer_id', ordering by 'item_sum' in descending order (so the highest 'item_sum' comes first). In the main query you select those rows where rnk = 1.
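
The three layers can be mirrored step by step in a small Python sketch. Everything here is illustrative: top_categories is a hypothetical helper and the sample rows are made up, since the question gives only the column names.

```python
from collections import defaultdict

# Made-up sample rows: (buyer_id, order_id, item_id, item_price, item_category).
items = [
    (1, 10, 100, 5, "books"),
    (1, 10, 101, 7, "toys"),
    (1, 11, 102, 4, "books"),
    (2, 12, 103, 3, "toys"),
    (2, 13, 104, 3, "games"),
]


def top_categories(items):
    """Hypothetical helper following the sub2 / sub / main query layers."""
    # sub2: sum of item_price per (buyer_id, item_category).
    sums = defaultdict(int)
    for buyer_id, _order_id, _item_id, item_price, item_category in items:
        sums[(buyer_id, item_category)] += item_price
    # sub: the highest sum per buyer, i.e. the value that gets rank 1.
    best = {}
    for (buyer_id, _category), total in sums.items():
        best[buyer_id] = max(best.get(buyer_id, total), total)
    # main query: keep only the rows whose sum has rank 1.
    return sorted(
        (buyer_id, total, category)
        for (buyer_id, category), total in sums.items()
        if total == best[buyer_id]
    )


for row in top_categories(items):
    print(row)
```

Note that rank() gives tied sums the same rank, so a buyer whose top two categories have equal totals (buyer 2 in the sample) comes back with two rows. Using row_number() instead would return exactly one row per buyer, picking arbitrarily among ties unless the ORDER BY has a tie-breaker.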

Make a column values header for rest of columns using TSQL

I have following table
ID | Group    | Type   | Product
1  | Dairy    | Milk   | Fresh Milk
2  | Dairy    | Butter | Butter Cream
3  | Beverage | Coke   | Coca cola
4  | Beverage | Diet   | Dew
5  | Beverage | Juice  | Fresh Juice
I need following output/query result:
ID | Group    | Type   | Product
1  | Dairy    |        |
1  |          | Milk   | Fresh Milk
2  |          | Butter | Butter Cream
2  | Beverage |        |
1  |          | Coke   | Coca cola
2  |          | Diet   | Dew
3  |          | Juice  | Fresh Juice
For the sample above a hard-coded script can do the job, but I am looking for a dynamic script that works for any number of groups. I have no idea how this can be done, so I do not have a sample query yet. I need ideas or examples that at least point me in the right direction. PIVOT looks like a close option, but it does not seem to fit this case fully.
Here's a possible way. It basically unions the "Group-Headers" and the "Group-Items". The difficulty was to order them correctly.
WITH CTE AS
(
SELECT ID,[Group],Type,Product,
ROW_NUMBER() OVER (PARTITION BY [Group] Order By ID)AS RN
FROM Drink
)
SELECT ID,[Group],Type,Product
FROM(
SELECT RN AS ID,[Group],[Id]AS OriginalId,'' As Type,'' As Product, 0 AS RN, 'Group' As RowType
FROM CTE WHERE RN = 1
UNION ALL
SELECT RN AS ID,'' AS [Group],[Id]AS OriginalId,Type,Product, RN, 'Item' As RowType
FROM CTE
)X
ORDER BY OriginalId ASC
, CASE WHEN RowType='Group' THEN 0 ELSE 1 END ASC
, RN ASC
Here's a demo-fiddle: http://sqlfiddle.com/#!6/ed6ca/2/0
A slightly simplified approach:
With Groups As
(
Select Distinct Min(Id) As Id, [Group], '' As [Type], '' As Product
From dbo.Source
Group By [Group]
)
Select Coalesce(Cast(Z.Id As varchar(10)),'') As Id
, Coalesce(Z.[Group],'') As [Group]
, Z.[Type], Z.Product
From (
Select Id As Sort, Id, [Group], [Type], Product
From Groups
Union All
Select G.Id, Null, Null, S.[Type], S.Product
From dbo.Source As S
Join Groups As G
On G.[Group] = S.[Group]
) As Z
Order By Sort
It should be noted that the use of Coalesce is purely for aesthetic reasons. You could simply return null in these cases.
And an approach with ROW_NUMBER:
IF OBJECT_ID('dbo.grouprows') IS NOT NULL DROP TABLE dbo.grouprows;
CREATE TABLE dbo.grouprows(
ID INT,
Grp NVARCHAR(MAX),
Type NVARCHAR(MAX),
Product NVARCHAR(MAX)
);
INSERT INTO dbo.grouprows VALUES
(1,'Dairy','Milk','Fresh Milk'),
(2,'Dairy','Butter','Butter Cream'),
(3,'Beverage','Coke','Coca cola'),
(4,'Beverage','Diet','Dew'),
(5,'Beverage','Juice','Fresh Juice');
SELECT
CASE WHEN gg = 0 THEN dr1 END GrpId,
CASE WHEN gg = 1 THEN rn1 END TypeId,
ISNULL(Grp,'')Grp,
CASE WHEN gg = 1 THEN Type ELSE '' END Type,
CASE WHEN gg = 1 THEN Product ELSE '' END Product
FROM(
SELECT *,
DENSE_RANK()OVER(ORDER BY Grp DESC) dr1
FROM(
SELECT *,
ROW_NUMBER()OVER(PARTITION BY Grp ORDER BY type,gg) rn1,
ROW_NUMBER()OVER(ORDER BY type,gg) rn0
FROM(
SELECT Grp,Type,Product, GROUPING(Grp) gg, GROUPING(type) tg FROM dbo.grouprows
GROUP BY Product, Type, Grp
WITH ROLLUP
)X1
WHERE tg = 0
)X2
WHERE gg=1 OR rn1 = 1
)X3
ORDER BY rn0

Grouping SQL results by continuous time intervals (oracle sql)

I have following data in the table as below and I am looking for a way to group the continuous time intervals for each id to return:
CREATE TABLE DUMMY
(
ID VARCHAR2(10 BYTE),
TIME_STAMP VARCHAR2(8 BYTE),
NAME VARCHAR2(255 BYTE)
);
SELECT ID, min(TIME_STAMP) "startDate", max(TIME_STAMP) "endDate", NAME
FROM DUMMY
GROUP BY ID , NAME
something like
100 20011128 20011203 David
100 20011204 20011207 Unknown
100 20011208 20011215 David
100 20011216 20011220 Sara
and so on ...
PS: I have a sample script, but I don't know how to attach my file.
Here is more input:
There is only one record per time_stamp for a specific ID.
The user can differ from day to day, for example day 1 David, day 2 Unknown, day 3 David, and so on.
So there is one row for every day of the year for each ID, but with different users.
I want to see the break points: for a specific ID, going in day order from the first day to the last, each continuous run of days with the same user should become one interval.
The query result should be:
ID NAME MIN_DATE MAX_DATE
100 David 20011128 20050407
100 Sara 20050408 20050417
100 David 20050418 20080416
100 Unknown 20080417 20080507
100 David 20080508 20080508
100 Unknown 20080509 20080607
100 David 20080608 20080608
100 Unknown 20080609 20080921
100 David 20080922 20080922
100 Unknown 20080923 20081231
100 David 20090101 20090405
thanks
Hi again, many thanks to everyone. I have solved the problem; here is the solution:
select id, min(time_stamp), max(time_stamp), name
from ( select id, time_stamp, name,
max(rn) over (order by time_stamp) grp
from ( select id, time_stamp, name,
case
when lag(name) over (order by time_stamp) <> name or
row_number() over (order by time_stamp) = 1
then row_number() over (order by time_stamp)
end rn
from dummy
)
)
group by id, grp, name
order by 1
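
What the query above computes can be sketched in Python: a new group starts whenever the name differs from the previous row, and each group reports its first and last time_stamp. This is a rough illustration under assumptions: group_runs is a hypothetical helper, the five sample rows are made up, and, like the query (which has no PARTITION BY id), it assumes the data holds a single ID.

```python
# Made-up sample rows: (id, time_stamp, name), one row per day.
dummy = [
    ("100", "20011128", "David"),
    ("100", "20011129", "David"),
    ("100", "20011130", "Unknown"),
    ("100", "20011201", "David"),
    ("100", "20011202", "David"),
]


def group_runs(rows):
    """Hypothetical helper: collapse consecutive rows with the same name."""
    runs = []  # each entry: [id, min_time_stamp, max_time_stamp, name]
    for id_, time_stamp, name in sorted(rows, key=lambda r: r[1]):
        if runs and runs[-1][3] == name:
            # Same name as the previous row: extend the current run.
            runs[-1][2] = time_stamp
        else:
            # Name changed (or first row): start a new run.
            runs.append([id_, time_stamp, time_stamp, name])
    return [tuple(run) for run in runs]


for run in group_runs(dummy):
    print(run)
```

The yyyymmdd strings sort chronologically as text, which is why ordering by time_stamp works even though the column is a VARCHAR2.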
Select
ID,
Name,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id,
Name
That should work.
If you want the date range for each Id, but with all the names, you can do:
Select
d.Id,
d.Name,
dr.min_date,
dr.max_date
from
Dummy d
JOIN
(Select
Id,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id
) dr
on ( dr.Id = d.Id)