T-SQL Grouping with LESS THAN {date} that breaks off on each occurrence of date

I am struggling with creating a grouping using LESS THAN that breaks off on each date for the parent row. I have created a contrived example to explain the data and what I would like out as a result:
CREATE TABLE [dbo].[CustomerOrderPoints](
[CustomerID] [int] NOT NULL,
[OrderPoints] [int] NOT NULL,
[OrderPointsExpiry] [date] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerOrderPointsUsed](
[CustomerID] [int] NOT NULL,
[OrderPointsUsed] [int] NOT NULL,
[OrderPointsUseDate] [date] NOT NULL
) ON [PRIMARY]
GO
INSERT [dbo].[CustomerOrderPoints] ([CustomerID], [OrderPoints], [OrderPointsExpiry])
VALUES (10, 200, CAST(N'2018-03-18' AS Date)),
       (10, 100, CAST(N'2018-04-18' AS Date)),
       (20, 120, CAST(N'2018-05-10' AS Date)),
       (30,  75, CAST(N'2018-02-10' AS Date)),
       (30,  60, CAST(N'2018-04-24' AS Date)),
       (30,  90, CAST(N'2018-06-25' AS Date)),
       (40, 100, CAST(N'2018-06-13' AS Date))
GO
INSERT [dbo].[CustomerOrderPointsUsed] ([CustomerID], [OrderPointsUsed], [OrderPointsUseDate])
VALUES (10,  15, CAST(N'2018-02-10' AS Date)),
       (10,  30, CAST(N'2018-02-17' AS Date)),
       (10,  25, CAST(N'2018-03-16' AS Date)),
       (10,  45, CAST(N'2018-04-10' AS Date)),
       (20,  10, CAST(N'2018-02-08' AS Date)),
       (20,  70, CAST(N'2018-04-29' AS Date)),
       (20,  25, CAST(N'2018-05-29' AS Date)),
       (30,  60, CAST(N'2018-02-05' AS Date)),
       (30,  30, CAST(N'2018-03-13' AS Date)),
       (40, 120, CAST(N'2018-06-10' AS Date))
GO
Customers gain points, which have an expiry. We have a CustomerOrderPoints table which shows OrderPoints for customers together with the Expiry date for the points. A Customer may have many rows in this table.
We then also have the CustomerOrderPointsUsed table which shows the points that have been used and when they were used by a Customer.
I am trying to get a grouping of customer data which will show the OrderPoints used against each customer, broken out on each expiry date.
We have working code, developed with a recursive row-by-row (RBAR) method, but it is very slow. I have tried a number of SET-based grouping approaches, but cannot get the final "less than" grouping that takes the previous expiry dates into account.
This DB is on SQL Server 2008 R2. Ideally I am looking for a solution that works on SQL Server 2008 R2, but I welcome options for later versions, as we may need to move this particular DB to solve this problem.
I have tried combinations of RANK, DENSE_RANK and ROW_NUMBER, and LAG (for later versions), but have not been able to get anything working that can be built upon.
Is there a way to achieve this with SET-based T-SQL?

A caveat first: this ignores the question I raised in the comments above, and simply allocates every usage row to the expiry date on or after the use date. You would need to rethink this if one usage must be split among multiple expiry dates.
First, allocate an expiry date to each points-used row. This is done by joining to all OrderPoints rows with an expiry date on or after the use date, then taking the minimum such date.
The second query then reports all OrderPoints rows, joining to the first query on the allocated expiry date, which provides all the data needed.
WITH allocatedPoints AS
(
    SELECT U.CustomerID, U.OrderPointsUsed, MIN(P.OrderPointsExpiry) AS OrderPointsExpiry
    FROM CustomerOrderPointsUsed U
    INNER JOIN CustomerOrderPoints P
        ON P.CustomerID = U.CustomerID
       AND P.OrderPointsExpiry >= U.OrderPointsUseDate
    GROUP BY U.CustomerID, U.OrderPointsUseDate, U.OrderPointsUsed
)
SELECT P.CustomerID, P.OrderPoints, P.OrderPointsExpiry,
       ISNULL(SUM(AP.OrderPointsUsed), 0) AS used,
       P.OrderPoints - ISNULL(SUM(AP.OrderPointsUsed), 0) AS remaining
FROM CustomerOrderPoints P
LEFT OUTER JOIN allocatedPoints AP
    ON AP.CustomerID = P.CustomerID
   AND AP.OrderPointsExpiry = P.OrderPointsExpiry
GROUP BY P.CustomerID, P.OrderPoints, P.OrderPointsExpiry
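Against the sample data above, this should produce something along these lines. Note that customer 20's 25 points used on 2018-05-29 fall after that customer's last expiry date and are dropped by the inner join, and customer 40 ends up with a negative remainder:
CustomerID  OrderPoints  OrderPointsExpiry  used  remaining
----------  -----------  -----------------  ----  ---------
10          200          2018-03-18           70        130
10          100          2018-04-18           45         55
20          120          2018-05-10           80         40
30           75          2018-02-10           60         15
30           60          2018-04-24           30         30
30           90          2018-06-25            0         90
40          100          2018-06-13          120        -20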

Related

How to add numbers on the next row based on a condition (consecutive hours calculation)

I am trying to calculate consecutive hours based on the conditions below.
If an employee works continuously, with less than 1.5 hours (90 minutes) of interval between each punch out and the next punch in, those punch hours are added together as consecutive hours.
However, if there is more than a 90-minute interval between punches, those punch hours are not added up.
There is an illustration in the screenshot below.
Here is the dataset:
select *
into #temp
from
(values
(1, 100001, '2021-12-12 23:31', '2021-12-12 23:59', '2021-12-13 00:00', 1, 0.47, 'solo/add'),
(2, 100001, '2021-12-13 00:00', '2021-12-13 03:07', '2021-12-13 03:37', 30, 3.12, 'solo/add'),
(3, 100001, '2021-12-13 03:37', '2021-12-13 07:07', '2021-12-13 23:17', 970, 3.5, 'no add'),
(4, 100001, '2021-12-13 23:17', '2021-12-13 23:59', NULL, NULL, 0.7, 'solo/add'),
(5, 100003, '2021-12-12 05:50', '2021-12-12 11:00', '2021-12-12 11:30', 30, 5.17, 'solo/add'),
(6, 100003, '2021-12-12 11:30', '2021-12-12 14:25', '2021-12-13 05:51', 926, 2.92, 'no add'),
(7, 100003, '2021-12-13 05:51', '2021-12-13 11:05', '2021-12-13 11:36', 31, 5.23, 'solo/add'),
(8, 100003, '2021-12-13 11:36', '2021-12-13 14:25', NULL, NULL, 2.81, 'solo/add')
) t1 (id, EmployeeID, punch_start, punch_end, next_punch_start, MinuteDiff, punch_hr, Decide)
The Excel screenshot shows the expected output in the "ConsecutiveHours" column.
In this example there are two cases where two punch_hr values are added together (illustrated in green and bold):
0.47 + 3.12 = 3.59
5.23 + 2.81 = 8.04
There are two different employees here, and id was generated ordered by EmployeeID and punch_start ascending.
How do we go about writing this logic in T-SQL?
You need to group those consecutive rows together. You can use the window function LAG() to identify where a new group starts; once you have that, perform a cumulative sum partitioned by employee to number the groups, then sum the hours within each group.
with cte as
(
    select *,
           g = case when Decide <> lag(Decide, 1, '') over (partition by EmployeeID
                                                            order by punch_start)
                    then 1
                    else 0
               end
    from #temp
),
cte2 as
(
    select *,
           grp = sum(g) over (partition by EmployeeID order by punch_start)
    from cte
)
select *,
       Hours = sum(punch_hr) over (partition by EmployeeID, grp order by punch_start)
from cte2
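Running this against the sample rows should yield a running Hours total per group like the following; the last row of each group carries the group total, matching the expected 3.59 and 8.04:
id  EmployeeID  Decide    grp  punch_hr  Hours
1   100001      solo/add  1    0.47      0.47
2   100001      solo/add  1    3.12      3.59
3   100001      no add    2    3.5       3.5
4   100001      solo/add  3    0.7       0.7
5   100003      solo/add  1    5.17      5.17
6   100003      no add    2    2.92      2.92
7   100003      solo/add  3    5.23      5.23
8   100003      solo/add  3    2.81      8.04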

Concatenate multiple contiguous rows to a single row

I have a huge table with IoT data from a lot of IoT devices. Every device sends data once per minute, but only if its counter input got some signals; if not, no data is sent. So in my database the data looks like this:
Today I load all this data into my application and aggregate it down to 3 rows by iterating and checking row by row for contiguous rows. Contiguous rows are all rows where the next row is one minute later. It works, but it does not feel smart or nice.
Does it make sense to do this aggregation on SQL Server, especially to increase performance?
How would you start?
This is a classic Islands and Gaps problem. I'm still mastering Islands and Gaps so I'd love any feedback on my solution from others in the know (please be gentle). There are at least a couple different ways to solve Islands and Gaps but this is the one that is easiest on my brain. Here's how I got it to work:
DDL to set up data:
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
DROP TABLE #tmp;
CREATE TABLE #tmp
(IoT_Device INT,
Count INT,
TimeStamp DATETIME);
INSERT INTO #tmp
VALUES
(1, 5, '2021-10-27 14:03'),
(1, 4, '2021-10-27 14:04'),
(1, 7, '2021-10-27 14:05'),
(1, 8, '2021-10-27 14:06'),
(1, 5, '2021-10-27 14:07'),
(1, 4, '2021-10-27 14:08'),
(1, 7, '2021-10-27 14:12'),
(1, 8, '2021-10-27 14:13'),
(1, 5, '2021-10-27 14:14'),
(1, 4, '2021-10-27 14:15'),
(1, 5, '2021-10-27 14:21'),
(1, 4, '2021-10-27 14:22'),
(1, 7, '2021-10-27 14:23');
Islands and Gaps Solution:
;WITH CTE_TIMESTAMP_DATA AS (
SELECT
IoT_Device,
Count,
TimeStamp,
LAG(TimeStamp) OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS previous_timestamp,
LEAD(TimeStamp) OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS next_timestamp,
ROW_NUMBER() OVER
(PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_location
FROM #tmp
)
,CTE_ISLAND_START AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_number,
IoT_Device,
TimeStamp AS island_start_timestamp,
island_location AS island_start_location
FROM CTE_TIMESTAMP_DATA
WHERE DATEDIFF(MINUTE, previous_timestamp, TimeStamp) > 1
OR previous_timestamp IS NULL
)
,CTE_ISLAND_END AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY IoT_Device ORDER BY TimeStamp) AS island_number,
IoT_Device,
TimeStamp AS island_end_timestamp,
island_location AS island_end_location
FROM CTE_TIMESTAMP_DATA
WHERE DATEDIFF(MINUTE, TimeStamp, next_timestamp) > 1
OR next_timestamp IS NULL
)
SELECT
S.IoT_Device,
(SELECT SUM(Count)
FROM CTE_TIMESTAMP_DATA
WHERE IoT_Device = S.IoT_Device
AND TimeStamp BETWEEN S.island_start_timestamp AND E.island_end_timestamp) AS Count,
S.island_start_timestamp,
E.island_end_timestamp
FROM CTE_ISLAND_START AS S
INNER JOIN CTE_ISLAND_END AS E
ON E.IoT_Device = S.IoT_Device
AND E.island_number = S.island_number;
The CTE_TIMESTAMP_DATA query pulls the IoT_Device, Count, and TimeStamp along with the TimeStamp before and after each record using LAG and LEAD, and assigns a row number to each record ordered by TimeStamp.
The CTE_ISLAND_START query gets the start of each island.
The CTE_ISLAND_END query gets the end of each island.
The main SELECT at the bottom then uses this data to sum the Count within each island.
This will work with multiple IoT_Devices.
You can read more about Islands and Gaps in numerous places online.
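For the sample data, the query should return three islands for device 1, along these lines:
IoT_Device  Count  island_start_timestamp  island_end_timestamp
1           33     2021-10-27 14:03        2021-10-27 14:08
1           24     2021-10-27 14:12        2021-10-27 14:15
1           16     2021-10-27 14:21        2021-10-27 14:23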

How to filter a query based on a jsonb data?

I'm not even sure if it's possible to do this kind of query in Postgres; at least I'm stuck.
I have two tables: a product recommendation list, containing multiple products to be recommended to a particular customer; and a transaction table indicating the product bought by customer and transaction details.
I'm trying to track the performance of my recommendation by plotting all the transaction that match the recommendations (both customer and product).
Below is my test case.
Kindly help
create table if not exists productRec( --Product Recommendation list
task_id int,
customer_id int,
detail jsonb);
truncate productRec;
insert into productRec values (1, 2, '{"1":{"score":5, "name":"KitKat"},
"4":{"score":2, "name":"Yuppi"}
}'),
(1, 3, '{"1":{"score":3, "name":"Yuppi"},
"4":{"score":2, "name":"GoldenSnack"}
}'),
(1, 4, '{"1":{"score":3, "name":"Chickies"},
"4":{"score":2, "name":"Kitkat"}
}');
drop table if exists txn;
create table if not exists txn( --Transaction table
customer_id int,
item_id text,
txn_value numeric,
txn_date date);
truncate txn;
insert into txn values (1, 'Yuppi', 500, DATE '2001-01-01'), (2, 'Kitkat', 2000, DATE '2001-01-01'),
(3, 'Kitkat', 2000, DATE '2001-02-01'), (4, 'Chickies', 200, DATE '2001-09-01');
--> Query must plot:
--Transaction value vs date where the item_id is inside the recommendation for that customer
--ex: (2000, 2001-01-01), (200, 2001-09-01)
We can get each recommendation as its own row with jsonb_each. I don't know what to do with the keys so I just take the value (still jsonb) and then the name inside it (the ->> outputs text).
select
customer_id,
(jsonb_each(detail)).value->>'name' as name
from productrec
So now we have a list of customer_ids and the item names they were recommended. Now we can just join this with the transactions.
select
txn.txn_value,
txn.txn_date
from txn
join (
select
customer_id,
(jsonb_each(detail)).value->>'name' as name
from productrec
) p ON (
txn.customer_id = p.customer_id AND
lower(txn.item_id) = lower(p.name)
);
In your example data you spelled Kitkat differently in the recommendation table for customer 2. I added lowercasing in the join condition to counter that but it might not be the right solution.
txn_value | txn_date
-----------+------------
2000 | 2001-01-01
200 | 2001-09-01
(2 rows)
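On the PostgreSQL versions that support jsonb (9.4+), the same unnesting can also be written with a lateral join rather than a set-returning function in the select list; some find this easier to read. A sketch over the same tables:
select
  txn.txn_value,
  txn.txn_date
from txn
join (
  select
    customer_id,
    rec.value->>'name' as name   -- rec has columns (key, value); value is still jsonb
  from productrec
  cross join lateral jsonb_each(detail) as rec
) p ON (
  txn.customer_id = p.customer_id AND
  lower(txn.item_id) = lower(p.name)
);
The result is the same; the lateral form just makes the row expansion explicit in the FROM clause.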

How can I find distinct accountid's with no CEO contact?

I have a contact table which has unique contactid as primary key, but may or may not have multiple records for the same account. My task is to return a list of accountid's and account_name's that do not have any contact with a CEO designation.
I agree this should be simple, and I freely admit to being dumb, so what I did was create a temp table with all unique accountid's, then flag the ones that did have the CEO job title, then do a select distinct accountid, account_name where flag is null group by, etc., which worked quickly and correctly, but is pretty lame. I frequently write lame scripts which work great, but are shamefully elementary, namely because that's how I think.
There must be a nice, elegant way to do this, so maybe I can learn something. Can someone help out? Thanks heaps in advance for your help! (p.s. using SQL Server 2014)
Sample data below, in which companies 2,3,5 do not have a CEO:
create table contact (
contactid int,
accountid int,
account_name varchar(10),
designation varchar(5));
insert into contact
values
(1, 100, 'COMPANY1', 'MGR'),
(2, 100, 'COMPANY1', 'MGR'),
(3, 100, 'COMPANY1', 'VP'),
(4, 100, 'COMPANY1', 'CEO'),
(5, 200, 'COMPANY2', 'COO'),
(6, 200, 'COMPANY2', 'CIO'),
(7, 200, 'COMPANY2', 'VP'),
(8, 200, 'COMPANY2', 'VP'),
(9, 300, 'COMPANY3', 'MGR'),
(10, 400, 'COMPANY4', 'MGR'),
(11, 400, 'COMPANY4', 'MGR'),
(12, 400, 'COMPANY4', 'CEO'),
(13, 500, 'COMPANY5', 'VP'),
(14, 500, 'COMPANY5', 'VP'),
(15, 500, 'COMPANY5', 'VP'),
(16, 500, 'COMPANY5', 'VP');
For something like this, I usually just go with a self-join where null, like this:
SELECT DISTINCT
C.accountid
FROM contact C
LEFT JOIN contact CEO
ON CEO.accountid = C.accountid
AND CEO.designation = 'CEO'
WHERE
CEO.contactid IS NULL
Something like this?
WITH CEO_IDs AS
(
SELECT DISTINCT accountID
FROM contact
WHERE designation='CEO'
)
SELECT DISTINCT accountID
FROM contact
WHERE accountid NOT IN(SELECT x.accountID FROM CEO_IDs AS x)
The CTE finds all accountIDs that do have a CEO and uses this as a negative filter to get all accountIDs that do not have a CEO...
You'd get the same with a sub-select:
SELECT DISTINCT accountID
FROM contact
WHERE accountid NOT IN
(SELECT x.accountID
FROM contact AS x
WHERE x.designation='CEO')
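A NOT EXISTS correlated subquery is another common formulation; a sketch, which also carries through the account_name the question asked for:
SELECT DISTINCT
    c.accountid,
    c.account_name
FROM contact AS c
WHERE NOT EXISTS
    (SELECT 1
     FROM contact AS ceo
     WHERE ceo.accountid = c.accountid   -- any CEO row for this account disqualifies it
       AND ceo.designation = 'CEO');
Unlike NOT IN, NOT EXISTS also behaves predictably if the subquery's column can contain NULLs.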

PostgreSQL Get holes in index column

I suppose it is not easy to query a table for data that doesn't exist, but maybe there is some trick to find the holes in one integer column (rowindex).
Here is small table for illustrating concrete situation:
DROP TABLE IF EXISTS examtable1;
CREATE TABLE examtable1
(rowindex integer primary key, mydate timestamp, num1 integer);
INSERT INTO examtable1 (rowindex, mydate, num1)
VALUES (1, '2015-03-09 07:12:45', 1),
(3, '2015-03-09 07:17:12', 4),
(5, '2015-03-09 07:22:43', 1),
(6, '2015-03-09 07:25:15', 3),
(7, '2015-03-09 07:41:46', 2),
(10, '2015-03-09 07:42:05', 1),
(11, '2015-03-09 07:45:16', 4),
(14, '2015-03-09 07:48:38', 5),
(15, '2015-03-09 08:15:44', 2);
SELECT rowindex FROM examtable1;
The query shown lists all the indexes in use.
But I would like to get (say) the first five missing indexes, so I can use them to insert new data at the desired rowindex.
In this concrete example the result would be: 2, 4, 8, 9, 12, the indexes which are not used.
Is there any trick to build a query which will give n missing indexes?
In reality such a table may contain many rows, and "holes" can be anywhere.
You can do this by generating a list of all numbers using generate_series() and then check which numbers don't exist in your table.
This can either be done using an outer join:
select nr.i as missing_index
from (
select i
from generate_series(1, (select max(rowindex) from examtable1)) i
) nr
left join examtable1 t1 on nr.i = t1.rowindex
where t1.rowindex is null;
or a NOT EXISTS query:
select i
from generate_series(1, (select max(rowindex) from examtable1)) i
where not exists (select 1
from examtable1 t1
where t1.rowindex = i.i);
I have used a hardcoded lower bound for generate_series() so that you would also detect a missing rowindex that is smaller than the lowest number.
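To return only the first five missing indexes, as the question asks, order the series and add a limit; for example:
select i
from generate_series(1, (select max(rowindex) from examtable1)) i
where not exists (select 1
                  from examtable1 t1
                  where t1.rowindex = i.i)
order by i
limit 5;
For the sample data this returns 2, 4, 8, 9, 12.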