I am using SQL Server 2008 and trying to process the data I have in a table in batches; however, there is a catch. The data is broken into groups and, as I do my processing, I have to make sure that a group will always be contained within a batch or, in other words, that a group will never be split across different batches. It's assumed that the batch size will always be much larger than the group size. Here is the setup to illustrate what I mean (the code uses Jeff Moden's data generation logic: http://www.sqlservercentral.com/articles/Data+Generation/87901)
DECLARE @NumberOfRows INT = 1000,
        @StartValue INT = 1,
        @EndValue INT = 500,
        @Range INT
SET @Range = @EndValue - @StartValue + 1
IF OBJECT_ID('tempdb..#SomeTestTable','U') IS NOT NULL
    DROP TABLE #SomeTestTable;
SELECT TOP (@NumberOfRows)
       GroupID = ABS(CHECKSUM(NEWID())) % @Range + @StartValue
INTO #SomeTestTable
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
This will create a table with about 435 groups of records containing between 1 and 7 records each. Now, let's say I want to process these records in batches of 100 records per batch. How can I make sure that my GroupIDs don't get split between different batches? I am fine if each batch is not exactly 100 records; it could be a little more or a little less.
I appreciate any suggestions!
This will result in batches of slightly fewer than 100 entries; it removes all groups that aren't entirely in the selection:
WITH cte AS (
    SELECT TOP 100 *
    FROM (SELECT GroupID,
                 ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY GroupID) AS r
          FROM #SomeTestTable) a
    ORDER BY GroupID, r DESC)
SELECT c1.GroupID
FROM cte c1
JOIN cte c2
  ON c1.GroupID = c2.GroupID
 AND c2.r = 1
It selects the groups with the lowest GroupIDs, limited to 100 entries, into a common table expression along with the row number; it then uses the row number to throw away any group that isn't entirely in the selection (row number 1 must be in the selection for the group to qualify, since the rows are ordered by descending row number before being cut with TOP).
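If you need to actually walk the whole table batch by batch, one way to turn the CTE above into a loop is sketched below. This is a minimal sketch, assuming every group is smaller than the batch size (as the question states) and using a hypothetical #Processed staging table to stand in for the real processing step:
IF OBJECT_ID('tempdb..#Processed','U') IS NOT NULL
    DROP TABLE #Processed;
CREATE TABLE #Processed (GroupID INT);   -- hypothetical staging table

WHILE EXISTS (SELECT 1 FROM #SomeTestTable)
BEGIN
    ;WITH cte AS (
        SELECT TOP 100 *
        FROM (SELECT GroupID,
                     ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY GroupID) AS r
              FROM #SomeTestTable) a
        ORDER BY GroupID, r DESC)
    DELETE t
    OUTPUT DELETED.GroupID INTO #Processed (GroupID)   -- "process" the batch here
    FROM #SomeTestTable t
    WHERE t.GroupID IN (SELECT c1.GroupID
                        FROM cte c1
                        JOIN cte c2
                          ON c1.GroupID = c2.GroupID
                         AND c2.r = 1);
END
Each pass removes the complete groups that fit in the next ~100 rows (the sketch consumes #SomeTestTable as it goes), so the loop terminates as long as no single group exceeds the batch size.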
Goal: Create a query to pull the closest cycle count event (Table C) for a product ID, based on the inventory adjustment results sourced from another table (Table A).
All records from Table A will be used, but they are not guaranteed to have a match in Table C.
The ID column is present in both tables, but is not unique in either, so the pair of ID and timestamp together is needed for each table.
Current simplified SQL
SELECT
A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM A
LEFT JOIN C
  ON A.LPID = C.LPID
WHERE
A.facility = 'FACID'
AND A.WHENOCCURRED > '23-DEC-22'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC
;
This is currently pulling the first hit on C.WHENOCCURRED for the LPID matches. I want to see if there is a simpler JOIN solution before going in a direction that creates two temp tables based on WHENOCCURRED.
I have a functioning INDEX/MATCH/MIN solution in Excel, but that requires exporting a couple of system reports first and is extremely slow with X,XXX-row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
A.LPID,
A.ITEM,
A.ADJQTY,
C.WHENOCCURRED,
C.LPID,
C.LOCATION,
C.ITEM,
C.QUANTITY,
C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
SELECT *
FROM C
WHERE A.LPID = C.LPID
AND A.whenoccurred <= c.whenoccurred
ORDER BY c.whenoccurred
FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- The join condition is inside the lateral join
WHERE A.facility = 'FACID'
AND A.WHENOCCURRED > DATE '2022-12-23'
AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;
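If you are on an Oracle version before 12c (no LATERAL or FETCH FIRST), a hedged alternative sketch is to rank the candidate C rows per A row with ROW_NUMBER and keep only the first one. Table and column names come from the question; partitioning by A.ROWID is an assumption in place of A's real key, and C.LPID/C.ITEM are omitted or aliased to avoid duplicate column names in the inline view:
SELECT *
FROM (
    SELECT A.WHENOCCURRED,
           A.LPID,
           A.ITEM,
           A.ADJQTY,
           C.WHENOCCURRED AS C_WHENOCCURRED,
           C.LOCATION,
           C.QUANTITY,
           C.ENTQUANTITY,
           ROW_NUMBER() OVER (PARTITION BY A.ROWID
                              ORDER BY C.WHENOCCURRED) AS rn   -- earliest C row at or after A
    FROM A
    LEFT JOIN C
           ON A.LPID = C.LPID
          AND C.WHENOCCURRED >= A.WHENOCCURRED
    WHERE A.facility = 'FACID'
      AND A.WHENOCCURRED > DATE '2022-12-23'
      AND A.ADJREASONABBREV = 'CYCLE COUNTS'
)
WHERE rn = 1                 -- one row per A record (NULLs sort last, so a real match wins)
ORDER BY WHENOCCURRED DESC;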
How do I retrieve only one row from a query which returns several?
Let's say I want only the 3rd row.
This is the query, but I want only the 3rd result:
SELECT (journeys.id, j_starting_channel)
AS JER FROM JOURNEYS
WHERE j_starting_channel = 'channel_name' ORDER BY journeys.id;
The following should get you there:
SELECT (journeys.id, j_starting_channel)
AS JER FROM JOURNEYS
WHERE j_starting_channel = 'channel_name' ORDER BY journeys.id
LIMIT 1
OFFSET 2
LIMIT n will return the first n results. OFFSET m skips the first m rows and returns everything thereafter.
LIMIT n OFFSET m thus returns rows m+1 to m+n.
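For instance, with a hypothetical table t ordered by id, rows 3 through 5 would be:
SELECT * FROM t ORDER BY id LIMIT 3 OFFSET 2;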
See the PostgreSQL documentation for more details:
https://www.postgresql.org/docs/9.5/sql-select.html
If you just need to skip some rows, you can use OFFSET to skip the rows at the top and then use LIMIT to return just one row, like this:
SELECT (journeys.id, j_starting_channel)
AS JER FROM JOURNEYS
WHERE j_starting_channel = 'channel_name' ORDER BY journeys.id LIMIT 1 OFFSET 2
Here is a step-by-step tutorial on those clauses:
https://www.postgresqltutorial.com/postgresql-limit/
And you can always refer to the documentation too
By using OFFSET and LIMIT you can get the needed portion of rows from the result set:
SELECT (journeys.id, j_starting_channel)
AS JER FROM JOURNEYS
WHERE j_starting_channel = 'channel_name' ORDER BY journeys.id OFFSET 2 LIMIT 1;
I have two tables: EMPL, a historical employee table that tracks changes in an employee's tax rate, and PAYROLL, also a historical table, filled with employee pay over a number of periods.
From EMPL, based upon EMPL.effect_pd <= PAYROLL.payroll_pd, only one record should be joined from EMPL to PAYROLL.
Below are the two tables, the query, and the result set. However, I only want one record for each employee per pay period, matching the relevant employee record based upon payroll_pd and effect_pd.
First of all - welcome!
You wrote "...FROM EMPL, based upon the EMPL.effect_pd <= PAYROLL.payroll_pd ..." but you start your SQL with PAYROLL and not with EMPL.
Please test this statement first:
SELECT
E.rec_id
,E.empl_id
,E.empl_name
,E.tax_rate
,E.effect_pd
,P.rec_id
,P.payroll_pd
,P.empl_id
,P.pd_pay
FROM
empl AS E
LEFT OUTER JOIN
payroll AS P
ON E.empl_id = P.empl_id
AND E.effect_pd < P.payroll_pd
After that you get 7 records, which are unique.
I think that's it.
Best regards
After 3 days of messing around with the code, I finally arrived at the solution, which is:
SELECT * FROM PAYROLL p
LEFT JOIN EMPL e on p.empl_id = e.empl_id
WHERE e.rec_id = ( SELECT TOP 1 c.rec_id
FROM EMPL c
WHERE c.empl_id = p.empl_id
AND p.payroll_pd >= c.effect_pd
ORDER BY c.effect_pd DESC );
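Note that the WHERE clause above effectively turns the LEFT JOIN into an inner join for payroll rows that have no qualifying employee record. A hedged alternative sketch (SQL Server syntax, since the question uses TOP 1; table and column names from the question) that keeps such rows is OUTER APPLY:
SELECT p.*, e.*
FROM PAYROLL AS p
OUTER APPLY (
    SELECT TOP (1) c.*
    FROM EMPL AS c
    WHERE c.empl_id = p.empl_id
      AND c.effect_pd <= p.payroll_pd
    ORDER BY c.effect_pd DESC        -- latest employee record in effect for that period
) AS e;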
All,
I am an iOS developer. We currently have about 250,000 records (2.5 lakh) stored in the database, and we have implemented search functionality on it. Below is the query we are using.
select CustomerMaster.CustomerName ,CustomerMaster.CustomerNumber,
CallActivityList.CallActivityID,CallActivityList.CustomerID,CallActivityList.UserID,
CallActivityList.ActivityType,CallActivityList.Objective,CallActivityList.Result,
CallActivityList.Comments,CallActivityList.CreatedDate,CallActivityList.UpdateDate,
CallActivityList.CallDate,CallActivityList.OrderID,CallActivityList.SalesPerson,
CallActivityList.GratisProduct,CallActivityList.CallActivityDeviceID,
CallActivityList.IsExported,CallActivityList.isDeleted,CallActivityList.TerritoryID,
CallActivityList.TerritoryName,CallActivityList.Hours,UserMaster.UserName,
(FirstName ||' '||LastName) as UserNameFull,UserMaster.TerritoryID as UserTerritory
from
CallActivityList
inner join CustomerMaster
ON CustomerMaster.DeviceCustomerID = CallActivityList.CustomerID
inner Join UserMaster
On UserMaster.UserID = CallActivityList.UserID
where
(CustomerMaster.CustomerName like '%T%' or
CustomerMaster.CustomerNumber like '%T%' or
CallActivityList.ActivityType like '%T%' or
CallActivityList.TerritoryName like '%T%' or
CallActivityList.SalesPerson like '%T%' )
and CallActivityList.IsExported!='2' and CallActivityList.isDeleted != '1'
order by
CustomerMaster.CustomerName
limit 50 offset 0
Without 'order by', the query returns results in 0.5 seconds. But when I attach 'order by', the time increases to 2 seconds.
I have tried indexing, but it is not making any noticeable change. Can anyone please help? If it can't be done through the query, how else can we make it fast?
Thanks in advance.
This is due to the limit. Without ORDER BY, only 50 records have to be processed and any 50 will be returned. With ORDER BY, all the records have to be processed in order to determine which ones are the first 50 (in order).
The problem is that the ORDER BY is performed on a joined table. Otherwise you could apply the limit on the main table (I assume it is CallActivityList) first and then join.
SELECT ...
FROM
(SELECT ... FROM CallActivityList ORDER BY ... LIMIT 50 OFFSET 0) AS CAL
INNER JOIN CustomerMaster ON ...
INNER JOIN UserMaster ON ...
ORDER BY ...
This would reduce the costs for joining the tables. If this is not possible, try at least to join CallActivityList with CustomerMaster. Apply the limit to those and finally join with UserMaster.
SELECT ...
FROM
(SELECT ...
FROM
CallActivityList
INNER JOIN CustomerMaster ON ...
ORDER BY CustomerMaster.CustomerName
LIMIT 50 OFFSET 0) AS ActCust
INNER JOIN UserMaster ON ...
ORDER BY ...
Also, in order to make the ordering unambiguous, I would include more columns in the ORDER BY, such as the call date and call ID. Otherwise this could result in inconsistent paging.
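For reference, here is a hedged, filled-in version of the second variant, using the column names from the question and including such a tie-breaker (FirstName and LastName are assumed to come from UserMaster, as the original select list suggests):
SELECT ActCust.*,
       UserMaster.UserName,
       (UserMaster.FirstName || ' ' || UserMaster.LastName) AS UserNameFull,
       UserMaster.TerritoryID AS UserTerritory
FROM
    (SELECT CustomerMaster.CustomerName, CustomerMaster.CustomerNumber,
            CallActivityList.*
     FROM CallActivityList
     INNER JOIN CustomerMaster
             ON CustomerMaster.DeviceCustomerID = CallActivityList.CustomerID
     WHERE (CustomerMaster.CustomerName    LIKE '%T%' OR
            CustomerMaster.CustomerNumber  LIKE '%T%' OR
            CallActivityList.ActivityType  LIKE '%T%' OR
            CallActivityList.TerritoryName LIKE '%T%' OR
            CallActivityList.SalesPerson   LIKE '%T%')
       AND CallActivityList.IsExported != '2'
       AND CallActivityList.isDeleted  != '1'
     ORDER BY CustomerMaster.CustomerName, CallActivityList.CallActivityID
     LIMIT 50 OFFSET 0) AS ActCust
INNER JOIN UserMaster
        ON UserMaster.UserID = ActCust.UserID
ORDER BY ActCust.CustomerName, ActCust.CallActivityID;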
We have a requirement wherein we need to display the sum of the line costs for all the labor, material, service, and tools associated with a work order in Maximo. I have written the query; however, the sum of the material line cost gets doubled if there is more than one service line cost.
For example
wonum - 1234
material line cost - 10
service line cost - 5 and 6 (2 service lines)
total material line cost - 20 (should be 10)
total service line cost - 11
The total for the material line cost is wrong. I have used the query below; please let me know how to fix it.
select a.wonum,a.description,a.location,a.crewid,a.worktype,a.wopriority,a.supervisor,a.actstart,a.siteid,sum(d.linecost) as totalmaterialcost,
sum(b.loadedcost)as totalservicecost
from workorder a
left outer join matusetrans d
on a.wonum=d.refwo and a.siteid=d.siteid
left outer join servrectrans b
on a.wonum=b.refwo and a.siteid=b.siteid
where a.wonum='1234' and a.siteid='ABC'
group by a.wonum,a.description,a.location,a.crewid,a.worktype,a.wopriority,a.supervisor,a.actstart,a.siteid
The problem is that the joins are all at the top level of your statement. This leads to multiple lines/records per work order.
One solution would be to calculate the sums of matusetrans and servrectrans in two separate sub-select statements.
Example:
select a.wonum,
b.sum as totalservicecost,
d.sum as totalmaterialcost
from workorder a
left join (
select sum(b.loadedcost) as sum, b.siteid, b.refwo
from servrectrans b
group by b.siteid, b.refwo
) b on a.wonum = b.refwo and a.siteid = b.siteid
left join (
-- second sum-select goes here
) d on -- second join condition goes here
As a second approach, check the workorder table for columns already containing this data (possibly there is some de-normalization to boost performance).
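For completeness, a hedged, filled-in version of the sub-select approach above, using the tables, columns, and join keys from the question's query:
SELECT a.wonum, a.description, a.location, a.crewid, a.worktype, a.wopriority,
       a.supervisor, a.actstart, a.siteid,
       d.totalmaterialcost,
       b.totalservicecost
FROM workorder a
LEFT OUTER JOIN (
    SELECT refwo, siteid, SUM(linecost) AS totalmaterialcost
    FROM matusetrans
    GROUP BY refwo, siteid
) d ON a.wonum = d.refwo AND a.siteid = d.siteid
LEFT OUTER JOIN (
    SELECT refwo, siteid, SUM(loadedcost) AS totalservicecost
    FROM servrectrans
    GROUP BY refwo, siteid
) b ON a.wonum = b.refwo AND a.siteid = b.siteid
WHERE a.wonum = '1234' AND a.siteid = 'ABC';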
The WORKORDER table will calculate these values for you already. Are the following fields not providing the data you need? (A sample query follows the list.)
ACTINTLABCOST
ACTLABCOST
ACTMATCOST
ACTOUTLABCOST
ACTSERVCOST
ACTTOOLCOST
ESTATAPPRINTLABCOST
ESTATAPPRLABCOST
ESTATAPPRMATCOST
ESTATAPPROUTLABCOST
ESTATAPPRSERVCOST
ESTATAPPRTOOLCOST
ESTINTLABCOST
ESTLABCOST
ESTMATCOST
ESTOUTLABCOST
ESTSERVCOST
ESTTOOLCOST
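For example, a minimal sketch (column names from the list above, wonum and siteid from the question) that reads the pre-aggregated actual costs straight off WORKORDER:
SELECT wonum, actlabcost, actmatcost, actservcost, acttoolcost
FROM workorder
WHERE wonum = '1234' AND siteid = 'ABC';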