Replace subselect with something more efficient - tsql

I have this query which takes a long time, partly because the table has more than 500,000 records, but I believe the join I have to use also slows it down quite a lot.
SELECT TOP (10) PERCENT H1.DateCompteur, CASE WHEN (h1.cSortie - h2.cSortie > 0)
THEN h1.cSortie - h2.cSortie ELSE 0 END AS Compte, H1.IdMachine
FROM dbo.T_HistoriqueCompteur AS H1 INNER JOIN
dbo.T_HistoriqueCompteur AS H2 ON H1.IdMachine = H2.IdMachine AND H2.DateCompteur =
(SELECT MAX(DateCompteur) AS Expr1
FROM dbo.T_HistoriqueCompteur AS HS
WHERE (DateCompteur < H1.DateCompteur) AND (H1.IdMachine = IdMachine))
ORDER BY H1.DateCompteur DESC
The ORDER BY is important since I only need the most recent information. I tried using the ID field in my subselect, since the IDs are ordered by date, but could not detect any significant improvement.
SELECT TOP (10) PERCENT H1.DateCompteur, CASE WHEN (h1.cSortie - h2.cSortie > 0)
THEN h1.cSortie - h2.cSortie ELSE 0 END AS Compte, H1.IdMachine
FROM dbo.T_HistoriqueCompteur AS H1 INNER JOIN
dbo.T_HistoriqueCompteur AS H2 ON H1.IdMachine = H2.IdMachine AND H2.ID =
(SELECT MAX(ID) AS Expr1
FROM dbo.T_HistoriqueCompteur AS HS
WHERE (ID < H1.ID) AND (H1.IdMachine = IdMachine))
ORDER BY H1.DateCompteur DESC
The table I use looks a little like this (it has many more columns, but they are unused in this query).
ID bigint
IdMachine bigint
cSortie bigint
DateCompteur datetime
I think that if I could get rid of the subselect, my query would run much faster, but I can't find a way to do so. What I really want is to find the previous row with the same IdMachine so that I can calculate the difference between the two cSortie values. The CASE in the query is there because the counter is sometimes reset to 0, and in that case I want to return 0 instead of a negative value.
So my question is this: can I do better than what I already have? I plan to put this in a view, if that makes a difference.

Try this query
WITH T as
(
SELECT TOP (10) PERCENT H1.DateCompteur, H1.cSortie as cSortie1, H1.IdMachine,
(
SELECT TOP 1 H2.cSortie
FROM dbo.T_HistoriqueCompteur H2
WHERE (H2.DateCompteur < H1.DateCompteur) AND (H1.IdMachine = H2.IdMachine)
ORDER BY H2.DateCompteur DESC
) as cSortie2
FROM dbo.T_HistoriqueCompteur AS H1
ORDER BY H1.DateCompteur DESC
)
select DateCompteur,
CASE WHEN (cSortie1 - cSortie2 > 0)
THEN cSortie1 - cSortie2
ELSE 0 END
AS Compte,
IdMachine
FROM T

You could also try CTE's (common table expressions) with windowing functions (ROW_NUMBER):
;WITH CTE AS
(
SELECT ID,IdMachine,cSortie,ROW_NUMBER() OVER(PARTITION BY h.IdMachine ORDER BY ID ASC) AS [ROW]
FROM T_HistoriqueCompteur h
)
SELECT
TOP (10) PERCENT
H1.DateCompteur,
CASE WHEN (h1.cSortie - h2.cSortie > 0) THEN h1.cSortie - h2.cSortie
ELSE 0
END AS Compte,
H1.IdMachine
FROM dbo.T_HistoriqueCompteur AS H1
INNER JOIN CTE cte on cte.idmachine = h1.idmachine and cte.id = h1.id
INNER JOIN CTE h2 on h2.idmachine = cte.idmachine and h2.row + 1 = cte.row
ORDER BY H1.DateCompteur DESC
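For what it's worth, on engines that have the LAG() window function (SQL Server 2012 and later, as far as I know), the "previous row with the same IdMachine" can be read without any self-join. Below is a minimal sketch, not a drop-in replacement: it runs the idea against an in-memory SQLite database from Python, the sample rows are made up, and TOP (10) PERCENT is left out because SQLite has no equivalent.

```python
import sqlite3

# Hypothetical sample data; only the columns used by the query are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_HistoriqueCompteur (
    ID INTEGER PRIMARY KEY,
    IdMachine INTEGER,
    cSortie INTEGER,
    DateCompteur TEXT
);
INSERT INTO T_HistoriqueCompteur (IdMachine, cSortie, DateCompteur) VALUES
    (1, 100, '2012-01-01'),
    (1, 130, '2012-01-02'),
    (1,   0, '2012-01-03'),  -- counter reset
    (2,  50, '2012-01-01'),
    (2,  75, '2012-01-03');
""")

rows = conn.execute("""
SELECT DateCompteur, IdMachine,
       CASE WHEN cSortie - prev > 0 THEN cSortie - prev ELSE 0 END AS Compte
FROM (
    SELECT DateCompteur, IdMachine, cSortie,
           -- previous cSortie of the same machine, in date order
           LAG(cSortie) OVER (PARTITION BY IdMachine ORDER BY DateCompteur) AS prev
    FROM T_HistoriqueCompteur
)
WHERE prev IS NOT NULL  -- like the INNER JOIN, drop each machine's first row
ORDER BY DateCompteur DESC, IdMachine
""").fetchall()

for row in rows:
    print(row)
```

The reset on 2012-01-03 comes back as 0 rather than -130, which is the behaviour the CASE in the question is meant to give.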

Related

In PostgreSQL, how can I optimize a query with which I obtain the differences between the current column and the immediately previous one?

I have this audit table
User | date       | text  | text 2
u1   | 2023-01-01 | hi    | yes
u1   | 2022-12-20 | hi    | no
u1   | 2022-12-01 | hello | maybe
And I need as a result, something like this:
User | date       | text | text 2
u1   | 2023-01-01 | null | x
u1   | 2022-12-20 | x    | x
u1   | 2022-12-01 | null | null
That way I can see which columns changed since the previous entry.
Something like this works, but I think there may be a way to optimize it, or at least to write an easier-to-read query? (I need the information for almost 20 columns, not only 3.)
SELECT
ta.audit_date,
ta.audit_user,
CASE
WHEN ta.audit_operation = 'I' THEN 'Insert'
WHEN ta.audit_operation = 'U' THEN 'Update'
END AS action,
CASE WHEN ta.column1 <> (SELECT column1
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column1,
CASE WHEN ta.column2 <> (SELECT column2
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column2,
CASE WHEN ta.column3 <> (SELECT column3
FROM audit_table ta1
WHERE ta1.id = 9207 AND ta1.audit_date < ta.audit_date
ORDER BY ta1.audit_date DESC
LIMIT 1)
THEN 'X' ELSE null END column3
FROM
audit_table ta
WHERE
ta.id = 9207
ORDER BY
audit_date DESC
Thank you!
I think you can just use the LAG() analytic function here. If I understand correctly:
SELECT *, CASE WHEN text != LAG(text) OVER (ORDER BY date) THEN 'x' END AS text_label,
CASE WHEN text2 != LAG(text2) OVER (ORDER BY date) THEN 'x' END AS text2_label
FROM yourTable
ORDER BY date;
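Since the question mentions almost 20 columns, a named window keeps the repeated OVER clause short. Here is a rough sketch of that idea, run against in-memory SQLite from Python so it can be tried quickly; the table and column names follow the question, the rows are made up, and I am assuming the same WINDOW syntax carries over to PostgreSQL unchanged.

```python
import sqlite3

# Hypothetical audit rows for a single id, mirroring the example in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_table (id INTEGER, audit_date TEXT, column1 TEXT, column2 TEXT);
INSERT INTO audit_table VALUES
    (9207, '2022-12-01', 'hello', 'maybe'),
    (9207, '2022-12-20', 'hi',    'no'),
    (9207, '2023-01-01', 'hi',    'yes');
""")

rows = conn.execute("""
SELECT audit_date,
       -- each column is compared with its own value in the previous audit row
       CASE WHEN column1 <> LAG(column1) OVER w THEN 'X' END AS column1,
       CASE WHEN column2 <> LAG(column2) OVER w THEN 'X' END AS column2
FROM audit_table
WHERE id = 9207
WINDOW w AS (ORDER BY audit_date)
ORDER BY audit_date DESC
""").fetchall()

for row in rows:
    print(row)
```

Note that `<>` yields NULL when either side is NULL, so a change from or to NULL is not marked here; a NULL-safe comparison would be needed if that matters.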

Cancelled amount and a corresponding entry - Postgres

I have the payment table:
There could be erroneous entries when a payment was made by mistake - see row 5 and then, this payment gets cancelled out - see row 6. I cannot figure out the query where I don't only cancel the negative amounts but also the corresponding pair. Here is the desired outcome:
You could also see the cases when several wrong payments were made and then, I need to cancel out all payments which if summed up give the cancelled amount.
The desired outcome:
I found Remove Rows That Sum Zero For A Given Key, Selecting positive aggregate value and ignoring negative in Postgres SQL and https://www.sqlservercentral.com/forums/topic/select-all-negative-values-that-have-a-positive-value but it is not exactly what I need
I can live without handling cases like case 2. At the least, I need a reliable way to exclude pairs like 5;-5.
You can try this to delete the rows from the table:
WITH RECURSIVE cancel_list (id, total_cancel, sum_cancel, index_to_cancel) AS
( SELECT p.id, abs(p.amount), 0, array[p.index]
FROM payment_table AS p
WHERE p.amount < 0
AND p.id = id_to_check_and_cancel -- this condition can be suppressed in order to go through the full table payment
UNION ALL
SELECT DISTINCT ON (l.id) l.id, l.total_cancel, l.sum_cancel + p.amount, l.index_to_cancel || p.index
FROM cancel_list AS l
INNER JOIN payment_table AS p
ON p.id = l.id
WHERE l.sum_cancel + p.amount <= l.total_cancel
AND NOT l.index_to_cancel @> array[p.index] -- this condition is to avoid loops
)
DELETE FROM payment_table AS p
USING (SELECT DISTINCT ON (c.id) c.id, unnest(c.index_to_cancel) AS index_to_cancel
FROM cancel_list AS c
ORDER BY c.id, array_length(c.index_to_cancel, 1) DESC
) AS c
WHERE p.index = c.index_to_cancel;
You can try this to just query the table without the cancelled rows:
WITH RECURSIVE cancel_list (id, total_cancel, sum_cancel, index_to_cancel) AS
( SELECT p.id, abs(p.amount), 0, array[p.index]
FROM payment_table AS p
WHERE p.amount < 0
AND p.id = id_to_check_and_cancel -- this condition can be suppressed in order to go through the full table payment
UNION ALL
SELECT DISTINCT ON (l.id) l.id, l.total_cancel, l.sum_cancel + p.amount, l.index_to_cancel || p.index
FROM cancel_list AS l
INNER JOIN payment_table AS p
ON p.id = l.id
WHERE l.sum_cancel + p.amount <= l.total_cancel
AND NOT l.index_to_cancel @> array[p.index] -- this condition is to avoid loops
)
SELECT *
FROM payment_table AS p
LEFT JOIN (SELECT DISTINCT ON (c.id) c.id, c.index_to_cancel
FROM cancel_list AS c
ORDER BY c.id, array_length(c.index_to_cancel, 1) DESC
) AS c
ON c.index_to_cancel @> array[p.index]
WHERE c.index_to_cancel IS NULL ;

Check for equal amounts of negative numbers as positive numbers

I have a table with two columns: intGroupID, decAmount
I want to have a query that can basically return the intGroupID as a result if for every positive(+) decAmount, there is an equal and opposite negative(-) decAmount.
So a table of (id=1,amount=1.0),(1,2.0),(1,-1.0),(1,-2.0) would return back the intGroupID of 1, because for each positive number there exists a negative number to match.
What I know so far is that there must be an equal number of decAmounts (so I enforce a count(*) % 2 = 0) and the sum of all rows must = 0.0. However, some cases that get by that logic are:
ID | Amount
1 | 1.0
1 | -1.0
1 | 2.0
1 | -2.0
1 | 3.0
1 | 2.0
1 | -4.0
1 | -1.0
This has a sum of 0.0 and has an even number of rows, but there is not a 1-for-1 relationship of positives to negatives. I need a query that can basically tell me if there is a negative amount for each positive amount, without reusing any of the rows.
I tried counting the distinct absolute values of the numbers and enforcing that it is less than the count of all rows, but it's not catching everything.
The code I have so far:
DECLARE @tblTest TABLE(
intGroupID INT
,decAmount DECIMAL(19,2)
);
INSERT INTO @tblTest (intGroupID ,decAmount)
VALUES (1,-1.0),(1,1.0),(1,2.0),(1,-2.0),(1,3.0),(1,2.0),(1,-4.0),(1,-1.0);
DECLARE @intABSCount INT = 0
,@intFullCount INT = 0;
SELECT @intFullCount = COUNT(*) FROM @tblTest;
SELECT @intABSCount = COUNT(*) FROM (
SELECT DISTINCT ABS(decAmount) AS absCount FROM @tblTest GROUP BY ABS(decAmount)
) AS absCount
SELECT t1.intGroupID
FROM @tblTest AS t1
/* Make Sure Even Number Of Rows */
INNER JOIN
(SELECT COUNT(*) AS intCount FROM @tblTest
)
AS t2 ON t2.intCount % 2 = 0
/* Make Sure Sum = 0.0 */
INNER JOIN
(SELECT SUM(decAmount) AS decSum FROM @tblTest)
AS t3 ON decSum = 0.0
/* Make Sure Count of Absolute Values < Count of Values */
WHERE
@intABSCount < @intFullCount
GROUP BY t1.intGroupID
I think there is probably a better way to check this table, possibly by finding pairs and removing them from the table and seeing if there's anything left in the table once there are no more positive/negative matches, but I'd rather not have to use recursion/cursors.
Create TABLE #tblTest (
intA INT
,decA DECIMAL(19,2)
);
INSERT INTO #tblTest (intA,decA)
VALUES (1,-1.0),(1,1.0),(1,2.0),(1,-2.0),(1,3.0),(1,2.0),(1,-4.0),(1,-1.0), (5,-5.0),(5,5.0) ;
SELECT * FROM #tblTest;
SELECT
intA
, MIN(Result) as IsBalanced
FROM
(
SELECT intA, X,Result =
CASE
WHEN count(*)%2 = 0 THEN 1
ELSE 0
END
FROM
(
---- Start thinking here --- inside-out
SELECT
intA
, x =
CASE
WHEN decA < 0 THEN
-1 * decA
ELSE
decA
END
FROM #tblTest
) t1
Group by intA, X
)t2
GROUP BY intA
Not tested but I think you can get the idea
This returns the ids that do not conform.
The negated case is easier to test and debug.
select pos.*, neg.*
from
( select id, amount, count(*) as ccount
from tbl
where amount > 0
group by id, amount ) pos
full outer join
( select id, amount, count(*) as ccount
from tbl
where amount < 0
group by id, amount ) neg
on pos.id = neg.id
and pos.amount = -neg.amount
and pos.ccount = neg.ccount
where pos.id is null
or neg.id is null
I think this will return a list of id that do conform
select distinct(id) from tbl
except
select distinct(isnull(pos.id, neg.id))
from
( select id, amount, count(*) as ccount
from tbl
where amount > 0
group by id, amount ) pos
full outer join
( select id, amount, count(*) as ccount
from tbl
where amount < 0
group by id, amount ) neg
on pos.id = neg.id
and pos.amount = -neg.amount
and pos.ccount = neg.ccount
where pos.id is null
or neg.id is null
Boy, I found a simpler way to do this than my previous answers. I hope all my crazy edits are saved for posterity.
This works by grouping all numbers for an id by their absolute value (1, -1 grouped by 1).
The sum of the group determines if there are an equal number of pairs. If it is 0 then it is equal, any other value for the sum means there is an imbalance.
The detection of evenness by the COUNT aggregate is only necessary to detect an even number of zeros. I assumed that 0's could exist and they should occur an even number of times. Remove it if this isn't a concern, as 0 will always pass the first test.
I rewrote the query a bunch of different ways to get the best execution plan. The final result below only has one big heap sort which was unavoidable given the lack of an index.
Query
WITH tt AS (
SELECT intGroupID,
CASE WHEN SUM(decAmount) <> 0 OR COUNT(*) % 2 = 1 THEN 1 ELSE 0 END unequal
FROM #tblTest
GROUP BY intGroupID, ABS(decAmount)
)
SELECT tt.intGroupID,
CASE WHEN SUM(unequal) != 0 THEN 'not equal' ELSE 'equals' END [pair]
FROM tt
GROUP BY intGroupID;
Tested Values
(1,-1.0),(1,1.0),(1,2),(1,-2), -- should work
(2,-1.0),(2,1.0),(2,2),(2,2), -- fail, two positive twos
(3,1.0),(3,1.0),(3,-1.0), -- fail two 1's , one -1
(4,1),(4,2),(4,-.5),(4,-2.5), -- fail: adds up the same sum, but different values
(5,1),(5,-1),(5,0),(5,0), -- work, test zeros
(6,1),(6,-1),(6,0), -- fail, test zeros
(7,1),(7,-1),(7,-1),(7,1),(7,1) -- fail, 3 x 1
Results
A pairs
_ _____
1 equal
2 not equal
3 not equal
4 not equal
5 equal
6 not equal
7 not equal
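To make the grouping idea easy to try out, here is a small sketch that runs the same kind of query against in-memory SQLite from Python. The table name `tblTest` without a prefix, the REAL column type and group 8 are my own assumptions for the demo; this version compares the per-bucket sum with `<> 0`, so a surplus of negative values (group 8, three -1 against one 1) is flagged as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTest (intGroupID INTEGER, decAmount REAL)")
conn.executemany("INSERT INTO tblTest VALUES (?, ?)", [
    (1, -1.0), (1, 1.0), (1, 2.0), (1, -2.0),    # balanced
    (2, -1.0), (2, 1.0), (2, 2.0), (2, 2.0),     # two positive twos
    (4, 1.0), (4, 2.0), (4, -0.5), (4, -2.5),    # sums to 0, values differ
    (8, -1.0), (8, -1.0), (8, -1.0), (8, 1.0),   # hypothetical: negative surplus
])

rows = conn.execute("""
WITH tt AS (
    -- one row per (group, absolute value); a balanced bucket sums to zero
    SELECT intGroupID,
           CASE WHEN SUM(decAmount) <> 0 OR COUNT(*) % 2 = 1 THEN 1 ELSE 0 END AS unequal
    FROM tblTest
    GROUP BY intGroupID, ABS(decAmount)
)
SELECT intGroupID,
       CASE WHEN SUM(unequal) <> 0 THEN 'not equal' ELSE 'equal' END AS pair
FROM tt
GROUP BY intGroupID
ORDER BY intGroupID
""").fetchall()

for row in rows:
    print(row)
```

The amounts were chosen to be exactly representable as floats; with real decimal data the sum test should be done on a DECIMAL column as in the T-SQL version.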
The following should return "disbalanced" groups:
;with pos as (
select intGroupID, ABS(decAmount) m
from TableName
where decAmount > 0
), neg as (
select intGroupID, ABS(decAmount) m
from TableName
where decAmount < 0
)
select distinct IsNull(p.intGroupID, n.intGroupID) as intGroupID
from pos p
full join neg n on n.intGroupID = p.intGroupID and abs(n.m - p.m) < 1e-8
where p.m is NULL or n.m is NULL
To get the unpaired elements, the select statement can be changed to the following:
select IsNull(p.intGroupID, n.intGroupID) as intGroupID, IsNull(p.m, -n.m) as decAmount
from pos p
full join neg n on n.intGroupID = p.intGroupID and abs(n.m - p.m) < 1e-8
where p.m is NULL or n.m is NULL
Does this help?
-- Expected result - group 1 and 3
declare @matches table (groupid int, value decimal(5,2))
insert into @matches select 1, 1.0
insert into @matches select 1, -1.0
insert into @matches select 2, 2.0
insert into @matches select 2, -2.0
insert into @matches select 2, -2.0
insert into @matches select 3, 3.0
insert into @matches select 3, 3.5
insert into @matches select 3, -3.0
insert into @matches select 3, -3.5
insert into @matches select 4, 4.0
insert into @matches select 4, 4.0
insert into @matches select 4, -4.0
-- Get groups where we have matching positive/negatives, with the same number of each
select mat.groupid, min(case when pos.PositiveCount = neg.NegativeCount then 1 else 0 end) as 'Match'
from @matches mat
LEFT JOIN (select groupid, SUM(1) as 'PositiveCount', Value
from @matches where value > 0 group by groupid, value) pos
on pos.groupid = mat.groupid and pos.value = ABS(mat.value)
LEFT JOIN (select groupid, SUM(1) as 'NegativeCount', Value
from @matches where value < 0 group by groupid, value) neg
on neg.groupid = mat.groupid and neg.value = case when mat.value < 0 then mat.value else mat.value * -1 end
group by mat.groupid
-- If at least one pair within a group doesn't match, reject
having min(case when pos.PositiveCount = neg.NegativeCount then 1 else 0 end) = 1
You can compare your values this way:
declare @t table(id int, amount decimal(4,1))
insert @t values(1,1.0),(1,-1.0),(1,2.0),(1,-2.0),(1,3.0),(1,2.0),(1,-4.0),(1,-1.0),(2,-1.0),(2,1.0)
;with a as
(
select count(*) cnt, id, amount
from @t
group by id, amount
)
select id from @t
except
select b.id from a
full join a b
on a.id = b.id and a.cnt = b.cnt and a.amount = -b.amount
where a.id is null
For some reason I can't write comments; however, Daniel's comment is not correct, and my solution does accept (6,1),(6,-1),(6,0), which can be correct. Zero is not specified in the question, and since it is a 0 value it can be handled either way. My answer does NOT accept (3,1.0),(3,1.0),(3,-1.0).
To Blam: No I am not missing
or b.id is null
My solution is like yours, but not exactly identical

TSQL - COUNT number of rows in a different state than current row

It's kind of hard to explain, but from this example it should be clear.
Table TABLE:
Name State Time
--------------------
A 1 1/4/2012
B 0 1/3/2012
C 0 1/2/2012
D 1 1/1/2012
Would like to
select * from TABLE where state=1 order by Time desc
plus an additional column 'Skipped' containing the number of rows after one where state=1 in state 0, in other words the output should look like this:
Name State Time Skipped
A 1 1/4/2012 2 -- 2 rows after A where State != 1
D 1 1/1/2012 0 -- 0 rows after D where State != 1
0 should also be reported in case of 2 consecutive rows are in state = 1, i.e. there is nothing between these rows in a state other than 1.
It seems like a CTE is a must here, but I can't figure out how to count rows where state != 1.
Any help will be appreciated.
(MS Sql Server 2008)
I've used a CTE to establish RowNo, so that you're not dependent on consecutive dates:
WITH CTE_Rows as
(
select name,state,time,
rowno = ROW_NUMBER() over (order by [time])
from MyTable
)
select name,state,time,
gap = isnull(r.rowno - x.rowno - 1,0)
from
CTE_Rows r
outer apply (
select top 1 rowno
from CTE_Rows sub
where sub.rowno < r.rowno and sub.state = 1
order by sub.rowno desc) x
where r.state = 1
If you just want to do it by date, then it's simpler; you only need an outer apply:
select name,state,r.time,
gap = convert(int,isnull(r.time - x.time - 1,0))
from
MyTable r
outer apply (
select top 1 time
from MyTable sub
where sub.time < r.time and sub.state = 1
order by sub.time desc) x
where r.state = 1
FYI, the test data I used was created as follows:
create table MyTable
(Name char(1), [state] tinyint, [Time] datetime)
insert MyTable
values
('E',1,'2012-01-05'),
('A',1,'2012-01-04'),
('B',0,'2012-01-03'),
('C',0,'2012-01-02'),
('D',1,'2012-01-01')
Okay, here you go (it gets a little messy):
SELECT U.CurrentTime,
(SELECT COUNT(*)
FROM StateTable AS T3
WHERE T3.State=0
AND T3.Time BETWEEN U.LastTime AND U.CurrentTime) AS Skipped
FROM (SELECT T1.Time AS CurrentTime,
(SELECT TOP 1 T2.Time
FROM StateTable AS T2
WHERE T2.Time < T1.Time AND T2.State=1
ORDER BY T2.Time DESC) AS LastTime
FROM StateTable AS T1 WHERE T1.State = 1) AS U
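If it helps to see the counting rule spelled out, here is a minimal sketch of one reading of it: for each row in state 1, count the earlier rows in another state that have no state-1 row between them and the current row. It runs on in-memory SQLite from Python purely for illustration; the table name `MyTable` and ISO-formatted dates are assumptions, and under this reading any non-1 rows older than the oldest state-1 row would be counted for that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Name TEXT, State INTEGER, Time TEXT);
INSERT INTO MyTable VALUES
    ('A', 1, '2012-01-04'),
    ('B', 0, '2012-01-03'),
    ('C', 0, '2012-01-02'),
    ('D', 1, '2012-01-01');
""")

rows = conn.execute("""
SELECT r.Name, r.State, r.Time,
       (SELECT COUNT(*)
        FROM MyTable AS s
        WHERE s.State <> 1
          AND s.Time < r.Time
          -- no state-1 row may sit between the skipped row and the current row
          AND NOT EXISTS (SELECT 1
                          FROM MyTable AS p
                          WHERE p.State = 1
                            AND p.Time > s.Time
                            AND p.Time < r.Time)
       ) AS Skipped
FROM MyTable AS r
WHERE r.State = 1
ORDER BY r.Time DESC
""").fetchall()

for row in rows:
    print(row)
```

With the four rows from the question this gives 2 for A and 0 for D.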

How to use parameters in a SQL query with NOT EXISTS?

How can I change following query, so that I'm able to parameterize the SparePartNames?
It returns all ID's of repairs where not all mandatory spareparts were changed, in other words where at least one part is missing.
Note that the number of spare parts might change in future, not only the names. Is it possible without using a stored procedure with dynamic SQL? If not, what could this SP look like?
Edit: Note that I do not need to know how to pass a list/array as a parameter; this has been asked myriads of times on SO. I also already have a Split table-valued function. I'm just wondering how I could rewrite the query to be able to join (or whatever) with a list of mandatory parts, so that I find all records where at least one part is missing. So is it possible to use a varchar parameter like '1264-3212,1254-2975' instead of a list of NOT EXISTS? Sorry for the confusion if it was not clear in the first place.
SELECT d.idData
FROM tabData d
INNER JOIN modModel AS m ON d.fiModel = m.idModel
WHERE (m.ModelName = 'MT27I')
AND (d.fiMaxServiceLevel >= 2)
AND (d.Manufacture_Date < '20120511')
AND (NOT EXISTS
(SELECT NULL
FROM tabDataDetail AS td
INNER JOIN tabSparePart AS sp ON sp.idSparePart = td.fiSparePart
WHERE (td.fiData = d.idData)
AND (sp.SparePartName = '1264-3212'))
OR (NOT EXISTS
(SELECT NULL
FROM tabDataDetail AS td
INNER JOIN tabSparePart AS sp ON sp.idSparePart = td.fiSparePart
WHERE (td.fiData = d.idData)
AND (sp.SparePartName = '1254-2975'))
)
)
Unfortunately I don't see how I could use sp.SparePartName IN/NOT IN(@sparePartNames) here.
One way to do it is to create a function to split delimited strings:
CREATE FUNCTION [dbo].[Split]
(
@Delimiter char(1),
@StringToSplit varchar(512)
)
RETURNS table
AS
RETURN
(
WITH Pieces(pieceNumber, startIndex, delimiterIndex)
AS
(
SELECT 1, 1, CHARINDEX(@Delimiter, @StringToSplit)
UNION ALL
SELECT pieceNumber + 1, delimiterIndex + 1, CHARINDEX(@Delimiter, @StringToSplit, delimiterIndex + 1)
FROM Pieces
WHERE delimiterIndex > 0
)
SELECT
SUBSTRING(@StringToSplit, startIndex, CASE WHEN delimiterIndex > 0 THEN delimiterIndex - startIndex ELSE 512 END) AS Value
FROM Pieces
)
populate a table variable with the spare part names:
DECLARE @SpareParts TABLE
(
SparePartName varchar(50) PRIMARY KEY CLUSTERED
);
INSERT INTO @SpareParts
SELECT Value FROM dbo.Split(',', '1264-3212,1254-2975');
and then join to the table variable:
SELECT d.idData
FROM tabData d
INNER JOIN modModel AS m ON d.fiModel = m.idModel
WHERE (m.ModelName = 'MT27I')
AND (d.fiMaxServiceLevel >= 2)
AND (d.Manufacture_Date < '20120511')
AND EXISTS (
SELECT 1
FROM tabDataDetail AS td
INNER JOIN tabSparePart AS sp ON sp.idSparePart = td.fiSparePart
LEFT JOIN @SpareParts AS s ON s.SparePartName = sp.SparePartName
WHERE td.fiData = d.idData
AND s.SparePartName IS NULL
)
Assuming there is (or will be) a table or view of mandatory spare parts, a list of exists can be replaced with a left join to tabDataDetail / tabSparePart pair on SparePartName; non-matches are reported back using td.fiSparePart is null.
; with mandatorySpareParts (SparePartName) as (
select '1264-3212'
union all
select '1254-2975'
)
SELECT d.idData
FROM tabData d
INNER JOIN modModel AS m ON d.fiModel = m.idModel
WHERE (m.ModelName = 'MT27I')
AND (d.fiMaxServiceLevel >= 2)
AND (d.Manufacture_Date < '20120511')
AND exists
(
SELECT null
from mandatorySpareParts msp
left join ( tabDataDetail AS td
INNER JOIN tabSparePart AS sp
ON sp.idSparePart = td.fiSparePart
AND td.fiData = d.idData
)
ON msp.SparePartName = sp.SparePartName
WHERE td.fiSparePart is null
)
Part names should be replaced by their IDs, which would simplify the left join and speed the query up.
EDIT: I erroneously left the filtering of td in the WHERE clause, which invalidated the left join. It is now in the ON clause where it belongs.
Use a table-variable and join on that.
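As a rough illustration of the "list of mandatory parts in a table" idea, the sketch below loads the parameter list into a temporary table and asks for repairs where some mandatory part has no matching detail row. It is only a sketch under assumptions: it uses in-memory SQLite from Python, the sample rows and the helper `repairs_missing_parts` are made up, Python's `split(",")` stands in for the Split function, and repairs are taken from tabDataDetail alone, so a repair with no detail rows at all would not show up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tabSparePart (idSparePart INTEGER PRIMARY KEY, SparePartName TEXT);
CREATE TABLE tabDataDetail (fiData INTEGER, fiSparePart INTEGER);
INSERT INTO tabSparePart VALUES (1, '1264-3212'), (2, '1254-2975'), (3, '9999-0000');
INSERT INTO tabDataDetail VALUES
    (10, 1), (10, 2),   -- repair 10: both mandatory parts changed
    (11, 1), (11, 3),   -- repair 11: '1254-2975' missing
    (12, 3);            -- repair 12: both mandatory parts missing
""")

def repairs_missing_parts(conn, mandatory_names):
    """Return ids of repairs lacking at least one of the mandatory part names."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS mandatory (SparePartName TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM mandatory")
    conn.executemany("INSERT INTO mandatory VALUES (?)", [(n,) for n in mandatory_names])
    query = """
        SELECT DISTINCT d.fiData
        FROM tabDataDetail AS d
        WHERE EXISTS (
            SELECT 1 FROM mandatory AS m
            WHERE NOT EXISTS (
                SELECT 1
                FROM tabDataDetail AS td
                JOIN tabSparePart AS sp ON sp.idSparePart = td.fiSparePart
                WHERE td.fiData = d.fiData
                  AND sp.SparePartName = m.SparePartName))
        ORDER BY d.fiData
    """
    return [row[0] for row in conn.execute(query)]

missing = repairs_missing_parts(conn, "1264-3212,1254-2975".split(","))
print(missing)
```

Because the list lives in a table, adding or removing mandatory parts only changes the parameter string, not the query text.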