I've got a TVF that makes use of table variables inside it.
One of the last steps is to delete several thousand rows from one of them.
This query alone increases the runtime dramatically.
Rewriting the code to use temp tables reduces the runtime again, but unfortunately temp tables can't be used in a TVF.
I can't change the TVF to a stored procedure.
Any ideas how to improve performance?
I've included only the part of the TVF that slows everything down. Prior to this, timestamps are collected and preprocessed in #Result; #c contains the ids that have to be modified at this point.
UPDATE R
SET starttime = CASE R."myFunction"
                    WHEN 1 THEN Date1
                    WHEN 0 THEN Date2
                END
FROM #Result AS R
WHERE EXISTS (
    SELECT NULL
    FROM #c AS c
    WHERE c."id" = R."id"
)
If the above query is run without the UPDATE, it completes nearly instantly, so I don't think the function on the right-hand side is the bottleneck. Even if I change SET starttime ... to a fixed value, the runtime stays nearly the same.
#Result holds about 250,000 rows, #c about 20,000.
I've already added indexes to the table variables, without much success.
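For reference, an index on a table variable can only be declared inline, as a PRIMARY KEY or UNIQUE constraint; a simplified sketch, with the column name taken from the query above:

DECLARE @c TABLE (
    id INT NOT NULL PRIMARY KEY  -- gives the optimizer a clustered index on the join column
);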
Try rewriting your query to use a join instead of WHERE EXISTS (which is known to perform worse in some cases):
UPDATE R
. . .
FROM #Result AS R
INNER JOIN #c AS c ON c."id" = R."id"
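Spelled out with the SET clause from the question, the rewritten statement would look roughly like this (an untested sketch):

UPDATE R
SET starttime = CASE R."myFunction"
                    WHEN 1 THEN Date1
                    WHEN 0 THEN Date2
                END
FROM #Result AS R
INNER JOIN #c AS c ON c."id" = R."id"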
I have a table that looks like the following:
CREATE TABLE tmp (
id uuid primary key,
other_id uuid,
...
);
This table has millions of entries, and I need to loop through them all, compare the values of some of its fields with the values of another table, and correct the values.
I did not want to use the standard ORDER BY ... LIMIT ... OFFSET ... approach, as its performance suffers greatly for big offsets. Hence, I tried to use the "seek index" approach, example here.
My problem is that I am getting off-by-one errors, and I am not sure (conceptually) how to solve these in PL/pgSQL code. My code looks something like this:
-- Get initial offset
SELECT id INTO _id_offset
FROM tmp
WHERE ...
ORDER BY id DESC
LIMIT 1;

WHILE ... LOOP  -- loop until some fixed high value to prevent an infinite loop, just in case
    SELECT id, other_id, ... INTO rows_to_update
    FROM tmp
    WHERE id < _id_offset AND (...)  -- latter part is the same condition as above
    ORDER BY id DESC
    FETCH NEXT _batch_size ROWS ONLY;

    -- Get next offset
    SELECT id INTO _id_offset
    FROM rows_to_update
    ORDER BY id ASC  -- ASC to get the "last" id from above; cannot simply use a
                     -- _batch_size offset as there may be fewer entries left
    LIMIT 1;

    -- Update relevant records, check # of updated records to see
    -- if we can terminate the loop early, update loop condition
    ...
END LOOP;
Unsurprisingly, the first and last entries are skipped due to the < condition. It would have been rather simple to correct this behaviour in application code, but I'm not sure what it should look like in PL/pgSQL.
Is there a simpler way to loop over an entire table in an efficient manner using PL/pgSQL?
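For what it's worth, here is a minimal, untested sketch of one way to avoid the off-by-one: treat a NULL offset as "no upper bound", so the first batch is inclusive, and take the next offset from the batch itself. The UPDATE here is a placeholder, and the question's extra WHERE filter would go inside the inner query:

DO $$
DECLARE
    _ids uuid[];
    _id_offset uuid;             -- NULL on the first pass = no upper bound yet
    _batch_size int := 1000;
BEGIN
    LOOP
        -- Grab the next batch of keys; the NULL check makes the first
        -- batch inclusive, which avoids skipping the top row.
        SELECT array_agg(id ORDER BY id DESC) INTO _ids
        FROM (SELECT id
              FROM tmp
              WHERE _id_offset IS NULL OR id < _id_offset
              ORDER BY id DESC
              LIMIT _batch_size) AS b;

        EXIT WHEN _ids IS NULL;  -- no rows left

        UPDATE tmp
        SET other_id = other_id  -- placeholder for the real correction
        WHERE id = ANY(_ids);

        _id_offset := _ids[cardinality(_ids)];  -- smallest id in this batch
    END LOOP;
END $$;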
I want to update two columns in my table; one of them depends on the calculated value of the other updated column. The calculation is rather complex, so I don't want to repeat it every time; I just want to use the newly updated value.
CREATE TABLE test (
A int,
B int,
C int,
D int
)
INSERT INTO test VALUES (0, 0, 5, 10)
UPDATE test
SET
B = C*D * 100,
A = B / 100
So my question: is it even possible to get 50 as the value for column A in just one query?
Another option would be to use persisted computed columns, but will that work when I have dependencies on another computed column?
You can't achieve what you are trying to do in a single query. This is due to a concept called "all-at-once operations", which translates to: in SQL Server, operations that appear in the same logical phase are evaluated at the same time.
The operations below won't yield the result you are expecting:

INSERT INTO table1
VALUES (t1, t1 + 100, t1 + 200)  -- SQL won't use a new, incremented value of t1

The same goes for an update:

UPDATE t1
SET t1 = t1 * 100,
    t2 = t1  -- SQL won't use the updated (* 100) value of t1
Reference:
T-SQL Querying by Itzik Ben-Gan
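A common workaround (not part of the answer above, but a standard technique) is to compute the expression once in a CROSS APPLY and reference the result in both assignments; a minimal sketch against the table from the question:

UPDATE t
SET B = calc.NewB,
    A = calc.NewB / 100
FROM test AS t
CROSS APPLY (SELECT t.C * t.D * 100 AS NewB) AS calc;

Both assignments read the same pre-computed NewB, so all-at-once semantics are respected and A ends up as 50. As for the persisted computed-column idea: SQL Server does not allow a computed column to reference another computed column in the same table, so the expression would have to be repeated in each definition.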
I recently asked a question regarding CTEs and data with no true root records (i.e. instead of the root record having a NULL Parent_Id, it is parented to itself).
The question is here: Creating a recursive CTE with no root record
That question has been answered and I now have the data I require; however, I am interested in the difference between the two approaches that I THINK are available to me.
The approach that yielded the data I required was to create a temp table with cleaned-up parenting data and then run a recursive CTE against it. It looked like this:
Select CASE
WHEN Parent_Id = Party_Id THEN NULL
ELSE Parent_Id
END AS Act_Parent_Id
, Party_Id
, PARTY_CODE
, PARTY_NAME
INTO #Parties
FROM DIMENSION_PARTIES
WHERE CURRENT_RECORD = 1;
WITH linkedParties
AS
(
Select Act_Parent_Id, Party_Id, PARTY_CODE, PARTY_NAME, 0 AS LEVEL
FROM #Parties
WHERE Act_Parent_Id IS NULL
UNION ALL
Select p.Act_Parent_Id, p.Party_Id, p.PARTY_CODE, p.PARTY_NAME, Level + 1
FROM #Parties p
inner join
linkedParties t on p.Act_Parent_Id = t.Party_Id
)
Select *
FROM linkedParties
Order By Level
I also attempted to retrieve the same data by defining two CTEs: one to emulate the creation of the temp table above, and the other to do the same recursive work but referencing the initial CTE rather than a temp table:
WITH Parties
AS
(Select CASE
WHEN Parent_Id = Party_Id THEN NULL
ELSE Parent_Id
END AS Act_Parent_Id
, Party_Id
, PARTY_CODE
, PARTY_NAME
FROM DIMENSION_PARTIES
WHERE CURRENT_RECORD = 1),
linkedParties
AS
(
Select Act_Parent_Id, Party_Id, PARTY_CODE, PARTY_NAME, 0 AS LEVEL
FROM Parties
WHERE Act_Parent_Id IS NULL
UNION ALL
Select p.Act_Parent_Id, p.Party_Id, p.PARTY_CODE, p.PARTY_NAME, Level + 1
FROM Parties p
inner join
linkedParties t on p.Act_Parent_Id = t.Party_Id
)
Select *
FROM linkedParties
Order By Level
Both scripts are run on the same server, yet the temp table approach yields results in approximately 15 seconds.
The multiple-CTE approach takes upwards of 5 minutes (so long, in fact, that I have never waited for the results to return).
Is there a reason why the temp table approach would be so much quicker?
For what it is worth, I believe it is to do with the record counts. The base table has 200k records in it, and from memory CTE performance degrades severely with large data sets, but I cannot seem to prove that, so I thought I'd check with the experts.
Many thanks.
Well, as there appears to be no clear answer for this, some further research into the general subject turned up a number of other threads with similar problems.
This one seems to cover many of the differences between temp tables and CTEs, so it is most useful for people looking to read around their issues:
Which are more performant, CTE or temporary tables?
In my case it would appear that the large amount of data in my CTEs caused the issue: a CTE is not materialized anywhere, so recreating it each time it is referenced later has a large impact.
This might not be exactly the same issue you experienced, but I came across a similar one just a few days ago, and the queries did not even process that many records (a few thousand).
And yesterday my colleague had a similar problem.
Just to be clear, we are using SQL Server 2008 R2.
The pattern that I identified, and that seems to throw the SQL Server optimizer off the rails, is using temporary tables in CTEs that are joined with other temporary tables in the main SELECT statement.
In my case I ended up creating an extra temporary table.
Here is a sample.
I ended up doing this:
SELECT DISTINCT st.field1, st.field2
into #Temp1
FROM SomeTable st
WHERE st.field3 <> 0
select x.field1, x.field2
FROM #Temp1 x inner join #Temp2 o
on x.field1 = o.field1
order by 1, 2
I tried the following query but it was a lot slower, if you can believe it.
with temp1 as (
SELECT DISTINCT st.field1, st.field2
FROM SomeTable st
WHERE st.field3 <> 0
)
select x.field1, x.field2
FROM temp1 x inner join #Temp2 o
on x.field1 = o.field1
order by 1, 2
I also tried to inline the first query in the second one, and the performance was the same, i.e. VERY BAD.
SQL Server never ceases to amaze me. Once in a while I come across issues like this one that remind me it is a Microsoft product after all, but in the end you could say that other database systems have their own quirks.
I have a query which runs extremely slowly when checking IS_MEMBER, in comparison to just loading the whole dataset. This view acts as a security check: it checks whether you are a member of a particular group (e.g. group 1), and the next column states what access that group has (e.g. division 2).
This view is then joined with the fact table, so that it will only retrieve division 2 rows.
The question is: does IS_MEMBER execute for each line of fact data? That is just my theory, because the query runs 1000 times faster without this view. Can anyone suggest an alternative structure?
WITH group_security AS (
    SELECT DISTINCT division_code
    FROM dbo.dim_group_security_division AS gsd
    WHERE IS_MEMBER(group_name) = 1
)
SELECT dbo.dim_division.dim_division_key,
       dbo.dim_division.division_ID,
       dbo.dim_division.division_code,
       dbo.dim_division.division_name
FROM dbo.dim_division
INNER JOIN group_security
    ON dbo.dim_division.division_code = group_security.division_code
    OR group_security.division_code = 'ALL'
Since you JOIN on dbo.dim_division.division_code, do you have an index on this column?
Alternatively you could give this a try:
SELECT dim.dim_division_key,
dim.division_ID,
dim.division_code,
dim.division_name
FROM dim_division dim
WHERE EXISTS ( SELECT *
FROM dbo.dim_group_security_division gsd
WHERE gsd.division_code IN ('ALL', dim.division_code)
AND IS_MEMBER(gsd.group_name) = 1 )
This way the system can stop at the first 'match' in dim_group_security_division, instead of having to find all matches and then aggregate the result because of the DISTINCT.
In this case, it might also be useful to have an index on gsd.division_code to speed things up a bit.
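For example, a covering index along these lines (a sketch; adjust to the real schema):

CREATE INDEX IX_gsd_division_code
    ON dbo.dim_group_security_division (division_code)
    INCLUDE (group_name);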
I need to insert a certain number of rows into some table, with values taken from variables. I could certainly loop, inserting a single row at a time, but that's too straightforward; I am looking for a more elegant solution. My current thoughts are around an INSERT INTO ... SELECT ... statement, but now I need a query that will generate the required number of rows. I tried to write a recursive CTE to do it:
CREATE FUNCTION ufGenerateRows(@numRows INT = 1)
RETURNS @RtnValue TABLE
(
    RowID INT NOT NULL
)
AS
BEGIN
    WITH numbers AS
    (
        SELECT 1 AS N
        UNION ALL
        SELECT N + 1
        FROM numbers
        WHERE N + 1 <= @numRows
    )
    INSERT INTO @RtnValue
    SELECT N
    FROM numbers

    RETURN
END
GO
It works, but has a recursion depth limit of 100, which is too low for my purposes. Can you suggest alternatives?
Always use the dbo. schema prefix when creating or referencing objects, especially functions.
You should strive to create inline table-valued functions, as opposed to multi-statement table-valued functions, when possible.
Recursive CTEs are about the least efficient way to generate a set (see this three-part series for much better examples):
http://www.sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
http://www.sqlperformance.com/2013/01/t-sql-queries/generate-a-set-2
http://www.sqlperformance.com/2013/01/t-sql-queries/generate-a-set-3
Here is one example:
CREATE FUNCTION dbo.GenerateRows(#numRows INT = 1)
RETURNS TABLE
AS
RETURN
(
SELECT TOP (#numRows) RowID = ROW_NUMBER() OVER (ORDER BY s1.[number])
FROM master.dbo.spt_values AS s1
-- CROSS JOIN master.dbo.spt_values AS s2
ORDER BY s1.[number]
);
If you need more than ~2,500 rows, you can cross join spt_values with itself, or with another table.
Even better would be to create your own numbers table (again, see the links above for examples).
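For instance, a minimal sketch of a persisted numbers table (the name and row count here are arbitrary):

CREATE TABLE dbo.Numbers (n INT NOT NULL PRIMARY KEY);

INSERT INTO dbo.Numbers (n)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY s1.[number])
FROM master.dbo.spt_values AS s1
CROSS JOIN master.dbo.spt_values AS s2;

Generating @numRows rows then becomes a plain SELECT TOP (@numRows) n FROM dbo.Numbers ORDER BY n.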
Don't think iteratively (looping) but set-based (all at once).
An INSERT INTO ... SELECT TOP (x) ... should do what you need without repeated inserts.
I will follow up with an example when I'm not bound to my phone.
UPDATE:
What @AaronBertrand said. :} A CROSS JOIN in the SELECT is spot-on.