SQL Server Performance option(recompile) and Renaming variables - sqlperformance

We have several procedures in an enterprise application that are exhibiting weird behavior. Some of the procedures get 'stuck' and never return, running for hours on the large databases. We were able to replicate this behavior on some of our test databases. What is weird is that we can get the procedure to run and return almost instantly by renaming some of the variables passed in, or we discovered using option(recompile) has the same effect. What I find weird is that after doing that, databases that are very small seem to get stuck and never return. So it seems we can get them to work on the small but not the large, or on the large but not the small. I need some suggestions as to what to look for. This is happening in several stored procedures.
Procedure layout is:
CREATE procedure [dbo].[proc_sample_proc]
#var1 guid
, #var2 guid = null
, #var3 guid
, #var4 guid
, #var5 varchar(20) = null
, #var6 varchar(20) = null
as
if #var1 = #var2
set #var2 = null
DECLARE #var7
SELECT #var7 = dbo.udf_TTTT_GetSOMEUDF(#var4)
SELECT
//somefields
FROM
atable
INNER JOIN btable ON a=b
LEFT JOIN ctable on (nolock) b=#var3
LEFT JOIN dtable on c=d
LEFT JOIN etable on d=e
LEFT JOIN ftable on (nolock) e=#var4
INNER JOIN gtable on f=g
LEFT JOIN htable on g=h
WHERE #var1=somevalue AND #var5=anothervalue
ORDER BY somefields
OPTION(RECOMPILE) <-- I Know this is BAD

Related

PostgreSQL: prevent lock on self table update with left join

I'm on PostgreSQL 9.3. I'm the only one working on the database, and my code run queries sequentially for unit tests.
Most of the times the following UPDATE query run without problem, but sometimes it makes locks on the PostgreSQL server. And then the query seems to never ends, while it takes only 3 sec normally.
I must precise that the query run in a unit test context, i.e. data is exactly the same whereas the lock happens or not. The code is the only process that updates the data.
I know there may be lock problems with PostgreSQL when using update query for a self updating table. And most over when a LEFT JOIN is used.
I also know that a LEFT JOIN query can be replaced with a NOT EXISTS query for an UPDATE but in my case the LEFT JOIN is much faster because there is few data to update, while a NOT EXISTS should visit quite all row candidates.
So my question is: what PostgreSQL commands (like Explicit Locking LOCK on table) or options (like SELECT FOR UPDATE) I should use in order to ensure to run my query without never-ending lock.
Query:
-- for each places of scenario #1 update all owners that
-- are different from scenario #0
UPDATE t_territories AS upt
SET id_owner = diff.id_owner
FROM (
-- list of owners in the source that are different from target
SELECT trg.id_place, src.id_owner
FROM t_territories AS trg
LEFT JOIN t_territories AS src
ON (src.id_scenario = 0)
AND (src.id_place = trg.id_place)
WHERE (trg.id_scenario = 1)
AND (trg.id_owner IS DISTINCT FROM src.id_owner)
-- FOR UPDATE -- bug SQL : FOR UPDATE cannot be applied to the nullable side of an outer join
) AS diff
WHERE (upt.id_scenario = 1)
AND (upt.id_place = diff.id_place)
Table structure:
CREATE TABLE t_territories
(
id_scenario integer NOT NULL,
id_place integer NOT NULL,
id_owner integer,
CONSTRAINT t_territories_pk PRIMARY KEY (id_scenario, id_place),
CONSTRAINT t_territories_fkey_owner FOREIGN KEY (id_owner)
REFERENCES t_owner (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE RESTRICT
)
I think that your query was locked by another query. You can find this query by
SELECT
COALESCE(blockingl.relation::regclass::text,blockingl.locktype) as locked_item,
now() - blockeda.query_start AS waiting_duration, blockeda.pid AS blocked_pid,
blockeda.query as blocked_query, blockedl.mode as blocked_mode,
blockinga.pid AS blocking_pid, blockinga.query as blocking_query,
blockingl.mode as blocking_mode
FROM pg_catalog.pg_locks blockedl
JOIN pg_stat_activity blockeda ON blockedl.pid = blockeda.pid
JOIN pg_catalog.pg_locks blockingl ON(
( (blockingl.transactionid=blockedl.transactionid) OR
(blockingl.relation=blockedl.relation AND blockingl.locktype=blockedl.locktype)
) AND blockedl.pid != blockingl.pid)
JOIN pg_stat_activity blockinga ON blockingl.pid = blockinga.pid
AND blockinga.datid = blockeda.datid
WHERE NOT blockedl.granted
AND blockinga.datname = current_database()
This query I've found here http://big-elephants.com/2013-09/exploring-query-locks-in-postgres/
Also can use ACCESS EXCLUSIVE LOCK to prevent any query to read and write table t_territories
LOCK t_territories IN ACCESS EXCLUSIVE MODE;
More info about locks here https://www.postgresql.org/docs/9.1/static/explicit-locking.html

Why is performance of CTE worse than temporary table in this example

I recently asked a question regarding CTE's and using data with no true root records (i.e Instead of the root record having a NULL parent_Id it is parented to itself)
The question link is here; Creating a recursive CTE with no rootrecord
The answer has been provided to that question and I now have the data I require however I am interested in the difference between the two approaches that I THINK are available to me.
The approach that yielded the data I required was to create a temp table with cleaned up parenting data and then run a recursive CTE against. This looked like below;
Select CASE
WHEN Parent_Id = Party_Id THEN NULL
ELSE Parent_Id
END AS Act_Parent_Id
, Party_Id
, PARTY_CODE
, PARTY_NAME
INTO #Parties
FROM DIMENSION_PARTIES
WHERE CURRENT_RECORD = 1),
WITH linkedParties
AS
(
Select Act_Parent_Id, Party_Id, PARTY_CODE, PARTY_NAME, 0 AS LEVEL
FROM #Parties
WHERE Act_Parent_Id IS NULL
UNION ALL
Select p.Act_Parent_Id, p.Party_Id, p.PARTY_CODE, p.PARTY_NAME, Level + 1
FROM #Parties p
inner join
linkedParties t on p.Act_Parent_Id = t.Party_Id
)
Select *
FROM linkedParties
Order By Level
I also attempted to retrieve the same data by defining two CTE's. One to emulate the creation of the temp table above and the other to do the same recursive work but referencing the initial CTE rather than a temp table;
WITH Parties
AS
(Select CASE
WHEN Parent_Id = Party_Id THEN NULL
ELSE Parent_Id
END AS Act_Parent_Id
, Party_Id
, PARTY_CODE
, PARTY_NAME
FROM DIMENSION_PARTIES
WHERE CURRENT_RECORD = 1),
linkedParties
AS
(
Select Act_Parent_Id, Party_Id, PARTY_CODE, PARTY_NAME, 0 AS LEVEL
FROM Parties
WHERE Act_Parent_Id IS NULL
UNION ALL
Select p.Act_Parent_Id, p.Party_Id, p.PARTY_CODE, p.PARTY_NAME, Level + 1
FROM Parties p
inner join
linkedParties t on p.Act_Parent_Id = t.Party_Id
)
Select *
FROM linkedParties
Order By Level
Now these two scripts are run on the same server however the temp table approach yields the results in approximately 15 seconds.
The multiple CTE approach takes upwards of 5 minutes (so long in fact that I have never waited for the results to return).
Is there a reason why the temp table approach would be so much quicker?
For what it is worth I believe it is to do with the record counts. The base table has 200k records in it and from memory CTE performance is severely degraded when dealing with large data sets but I cannot seem to prove that so thought I'd check with the experts.
Many Thanks
Well as there appears to be no clear answer for this some further research into the generics of the subject threw up a number of other threads with similar problems.
This one seems to cover many of the variations between temp table and CTEs so is most useful for people looking to read around their issues;
Which are more performant, CTE or temporary tables?
In my case it would appear that the large amount of data in my CTEs would cause issue as it is not cached anywhere and therefore recreating it each time it is referenced later would have a large impact.
This might not be exactly the same issue you experienced, but I just came across a few days ago a similar one and the queries did not even process that many records (a few thousands of records).
And yesterday my colleague had a similar problem.
Just to be clear we are using SQL Server 2008 R2.
The pattern that I identified and seems to throw the sql server optimizer off the rails is using temporary tables in CTEs that are joined with other temporary tables in the main select statement.
In my case I ended up creating an extra temporary table.
Here is a sample.
I ended up doing this:
SELECT DISTINCT st.field1, st.field2
into #Temp1
FROM SomeTable st
WHERE st.field3 <> 0
select x.field1, x.field2
FROM #Temp1 x inner join #Temp2 o
on x.field1 = o.field1
order by 1, 2
I tried the following query but it was a lot slower, if you can believe it.
with temp1 as (
DISTINCT st.field1, st.field2
FROM SomeTable st
WHERE st.field3 <> 0
)
select x.field1, x.field2
FROM temp1 x inner join #Temp2 o
on x.field1 = o.field1
order by 1, 2
I also tried to inline the first query in the second one and the performance was the same, i.e. VERY BAD.
SQL Server never ceases to amaze me. Once in a while I come across issues like this one that reminds me it is a microsoft product after all, but in the end you can say that other database systems have their own quirks.

When is a Deadlock not a Deadlock?

I'm asking this question because I'm getting a deadlock from time to time that I don't understand.
This is the scenario:
Stored Procedure that updates table A:
UPDATE A
SET A.Column = #SomeValue
WHERE A.ID = #ID
Stored Procedure that inserts into a temp table #temp:
INSERT INTO #temp (Column1,Column2)
SELECT B.Column1, A.Column2
FROM B
INNER JOIN A
ON A.ID = B.ID
WHERE B.Code IN ('Something','SomethingElse')
I see that there could possibly be a lock wait but I fail to see how a deadlock would occur, am I missing something obvious?
EDIT:
The SPs that I typed here are obviously simplified versions but I'm using the columns involved. The structure of both tables would be:
CREATE TABLE A (ID IDENTITY
CONSTRAINT PRIMARY KEY,
Column VARCHAR (100))
CREATE TABLE B (ID IDENTITY
CONSTRAINT PRIMARY KEY,
Code VARCHAR (100))
Try this since its causeing locks specify for the tables name the table hint and keyword:
WITH(NOLOCK)
So some thing like this for your scenario:
INSERT INTO #temp (Column1,Column2)
SELECT B.Column1, A.Column2
FROM B WITH(NOLCOK)
INNER JOIN A WITH(NOLOCK)
ON A.ID = B.ID
WHERE B.Code IN ('Something','SomethingElse')
See how you go then.
You can lookup table hint also for tsql, sql server to see which one suits you best. The one I specified NOLCOK will not cause locks and also it will skip locked rows as some other process is using them, so if you dont care you can use it.
I am not sure with temp tables but you can also use table hints with INSERT, INSERT INTO WITH(TABLE_HINT).

Nested select statement in FROM clause? Inner Join statements? or just table name?

I'm building a query that needs data from 5 tables.
I've been told by a DBA in the past that specifying a list of columns vs getting all columns (*) is preferred from some performance/memory aspect.
I've also been told that the database performs a JOIN operation behind the scenes when there's a list of tables in the FROM clause, to create one table (or view).
The existing database has very little data at the moment, as we're at a very initial point. So not sure I can measure the performance hit in practice.
I am not a database pro. I can get what data I need. The dillema is, at what price.
Added: At the moment I'm working with MS SQL Server 2008 R2.
My questions are:
Is there a performance difference and why, between the following:
a. SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel that this might be a performance hit)
b. SELECT ... FROM tbl1 inner join tbl2 on ... inner join tbl3 on ... etc (would this be more explicit to the server and save on performance/memory)?
c. SELECT ... FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this save anythig? or is it just extra select statements that create more work for the server and for us)?
Is there yet a better way to do this?
Below are two queries that both get the slice of data that I need. One includes more nested select statements.
I apologize if they are not written in a standard form or helplessly overcomplicated - hopefully you can decipher. I try to keep them organized as much as possible.
Insights would be most appreciated as well.
Thanks for checking this out.
5 tables: devicepool, users, trips, TripTracker, and order
Query 1 (more select statements):
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
(
SELECT
tripid,
[status] tripstatus,
stops,
devid,
groupid
FROM
trips
)
AS [base2]
ON base.devid = base2.devid AND base2.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
(
SELECT
[status] [orderstatus],
[address] [destaddress],
[tripid],
stopnumber orderstopnumber
FROM [order]
)
AS [orders]
ON orders.orderstopnumber = tracker.stopnumber)
Query 2:
SELECT
username,
base.devid devid,
tripstatus,
stops,
stopnumber,
[time],
[orderstatus],
[destaddress]
FROM
((
( SELECT
username,
devicepool.devid devid,
groupid
FROM
devicepool INNER JOIN users
ON devicepool.userid = users.userid
WHERE devicepool.groupid = 1
)
AS [base]
INNER JOIN
trips
ON base.devid = trips.devid AND trips.groupid = base.groupid
INNER JOIN
(
SELECT
stopnumber,
devid,
[time],
MAX([time]) OVER (PARTITION BY devid) latesttime
FROM
TripTracker
)
AS [tracker]
ON tracker.devid = base.devid AND [time] = latesttime)
INNER JOIN
[order]
ON [order].stopnumber = tracker.stopnumber)
Is there a performance difference and why, between the following: a.
SELECT ... FROM tbl1, tbl2, tbl3 etc for simplicity? (somehow I feel
that this might be a performance hit) b. SELECT ... FROM tbl1 inner
join tbl2 on ... inner join tbl3 on ... etc (would this be more
explicit to the server and save on performance/memory)? c. SELECT ...
FROM (select x,y,z from tbl1) as t1 inner join ... etc (would this
save anythig? or is it just extra select statements that create more
work for the server and for us)?
a) and b) should result in the same query plan (although this is db-specific). b) is much preferred for portability and readability over a). c) is a horrible idea, that hurts readability and if anything will result in worse peformance. Let us never speak of it again.
Is there yet a better way to do this?
b) is the standard approach. In general, writing the plainest ANSI SQL will result in the best performance, as it allows the query parser to easily understand what you are trying to do. Trying to outsmart the compiler with tricks may work in a given situation, but does not mean that it will still work when the cardinality or amount of data changes, or the database engine is upgraded. So, avoid doing that unless you are absolutely forced to.

Why does this query deadlock?

I have an application that reads the structure of an existing PostgreSQL 9.1 database, compares it against a "should be" state and updates the database accordingly. That works fine, most of the time. However, I had several instances now when reading the current database structure deadlocked. The query responsible reads the existing foreign keys:
SELECT tc.table_schema, tc.table_name, tc.constraint_name, kcu.column_name,
ccu.table_schema, ccu.table_name, ccu.column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY'
Viewing the server status in pgAdmin shows this to be the only active query/transaction that's running on the server. Still, the query doesn't return.
The error is reproducible in a way: When I find a database that produces the error, it will produce the error every time. But not all databases produce the error. This is one mysterious bug, and I'm running out of options and ideas on what else to try or how to work around this. So any input or ideas are highly appreciated!
PS: A colleague of mine just reported he produced the same error using PostgreSQL 8.4.
I tested and found your query very slow, too. The root of this problem is that "tables" in information_schema are in fact complicated views to provide catalogs according to the SQL standard. In this particular case, matters are further complicated as foreign keys can be built on multiple columns. Your query yields duplicate rows for those cases which, I suspect, may be an undesired.
Correlated subqueries with unnest, fed to ARRAY constructors avoid the problem in my query.
This query yields the same information, just without duplicate rows and 100x faster. Also, I would venture to guarantee, without deadlocks.
Only works for PostgreSQL, not portable to other RDBMSes.
SELECT c.conrelid::regclass AS table_name
, c.conname AS fk_name
, ARRAY(SELECT a.attname
FROM unnest(c.conkey) x
JOIN pg_attribute a
ON a.attrelid = c.conrelid AND a.attnum = x) AS fk_columns
, c.confrelid::regclass AS ref_table
, ARRAY(SELECT a.attname
FROM unnest(c.confkey) x
JOIN pg_attribute a
ON a.attrelid = c.confrelid AND a.attnum = x) AS ref_columns
FROM pg_catalog.pg_constraint c
WHERE c.contype = 'f';
-- ORDER BY c.conrelid::regclass::text,2
The cast to ::regclass yields table names as seen with your current search_path. May or may not be what you want. For this query to include the absolute path (schema) for every table name you can set the search_path like this:
SET search_path = pg_catalog;
SELECT ...
To continue your session with your default search_path:
RESET search_path;
Related:
Get column names and data types of a query, table or view