Getting a query to index seek (rather than scan)

When I run the following query (SQL Server 2000), the execution plan shows an index seek, and Profiler reports 71 reads with a duration of 0.
select top 1 id from [table] where name = '0010000546163' order by id desc
Contrast that with the following, which uses an index scan with 8500 reads and a duration of about a second.
declare @p varchar(20)
select @p = '0010000546163'
select top 1 id from [table] where name = @p order by id desc
Why is the execution plan different? Is there a way to change the second method to seek?
thanks
EDIT
Table looks like
CREATE TABLE [table] (
[Id] [int] IDENTITY (1, 1) NOT NULL ,
[Name] [varchar] (13) COLLATE Latin1_General_CI_AS NOT NULL)
Id is the clustered primary key
There is a non-unique index on Name and a unique composite index on id/name
There are other columns - left them out for brevity

Now that you've added the schema, please try this. SQL Server treats length differences as different data types and will convert the varchar(13) column to match the varchar(20) variable:
declare @p varchar(13)
If not, what about collation coercion? Is the database or server collation different from the column's? A variable cannot be declared with COLLATE or NOT NULL, so apply the collation in the comparison instead:
select top 1 id from [table] where name = @p COLLATE Latin1_General_CI_AS order by id desc
If not, add this before the query and post the results:
SET SHOWPLAN_TEXT ON
GO
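To check the type and collation theories above, a query along these lines compares the column's declared type and collation with the database and server defaults (a local variable takes the current database's default collation). It is a sketch that assumes the table and column are literally named table and Name, as in the question.

```sql
-- Compare the Name column's type/collation with the database and server defaults.
-- Assumes the table is called [table] and the column Name, as in the question.
SELECT c.DATA_TYPE,
       c.CHARACTER_MAXIMUM_LENGTH,
       c.COLLATION_NAME                           AS column_collation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS database_collation,
       SERVERPROPERTY('Collation')                AS server_collation
FROM INFORMATION_SCHEMA.COLUMNS AS c
WHERE c.TABLE_NAME = 'table'
  AND c.COLUMN_NAME = 'Name';
```

If column_collation differs from database_collation, the variable and the column do not share a collation, which is the coercion case described above.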

If the name column is NVARCHAR, your variable needs to be of the same type. SQL Server should then use an index seek.
declare @p nvarchar(20)
select @p = N'0010000546163'
select top 1 id from [table] where name = @p order by id desc

Related

PostgreSQL Transaction to Use Results from Query to Insert and Query another Table then Return Original Query Results

I am writing an application that stores data on file samples and YARA signatures. Essentially, in a single transaction, I need to execute a query, reference those results in an insert and another query, then return the original results. I have three tables that are relevant to this discussion:
samples - this is the table that stores information on files that need to be scanned with the associated YARA signatures.
yararules - the table that stores information on the YARA rules.
yaratracker - a table that tracks the sample/rule pairs that have been processed thus far.
In a single transaction, the application needs to:
1. Get a batch of unique sample/rule pairs that have not yet been processed. Preferably, this query will get all unprocessed rules associated with a single sample (i.e. if I'm going to run the YARA rules on a sample, I want to run all of the YARA rules not yet processed on that sample so that I only have to load the sample into memory once).
2. Get a unique list of id, sha256 from the batch found in step 1.
3. Insert the batch from step 1 into the yaraqueue with the matchcount column equal to 0 and the complete column set to false.
I can accomplish Step 1 with the query below, but I don't know how to reference those results to accomplish step 2. I've tried looking into variables, but apparently there isn't one that can hold multiple rows. I've looked into using a cursor, but I can't seem to use the cursor with a subsequent command and then return the cursor.
SELECT s.id,r.id
FROM sample s CROSS JOIN yararules r
WHERE r.status = 'Disabled' AND NOT EXISTS(
SELECT 1 FROM yaratracker q
WHERE q.sample_id = s.id AND q.rule_id = r.id
)
ORDER BY s.id
LIMIT 1000;
The relevant database schema looks like this.
CREATE TYPE samplelist AS ENUM ('Whitelist', 'Blacklist', 'Greylist', 'Unknown');
CREATE TABLE samples (
id SERIAL PRIMARY KEY,
md5 CHAR(32) NOT NULL,
sha1 CHAR(40) NOT NULL,
sha256 CHAR(64) NOT NULL,
total INT NOT NULL,
positives INT NOT NULL,
list SAMPLELIST NOT NULL,
filetype VARCHAR(16) NOT NULL,
submitted TIMESTAMP WITH TIME ZONE NOT NULL,
user_id SERIAL REFERENCES users
);
CREATE UNIQUE INDEX md5_idx ON {0} (md5);
CREATE UNIQUE INDEX sha1_idx ON {0} (sha1);
CREATE UNIQUE INDEX sha256_idx ON {0} (sha256);
CREATE TYPE rulestatus AS ENUM ('Enabled', 'Disabled');
CREATE TABLE yararules (
id SERIAL PRIMARY KEY,
name VARCHAR(32) NOT NULL UNIQUE,
description TEXT NOT NULL,
rules TEXT NOT NULL,
lastmodified TIMESTAMP WITH TIME ZONE NOT NULL,
status rulestatus NOT NULL,
user_id SERIAL REFERENCES users ON DELETE CASCADE
);
CREATE TABLE yaratracker (
id SERIAL PRIMARY KEY,
rule_id SERIAL REFERENCES yararules ON DELETE CASCADE,
sample_id SERIAL REFERENCES sample ON DELETE CASCADE,
matchcount INT NOT NULL,
complete BOOL NOT NULL
);
CREATE INDEX composite_idx ON yaratracker (rule_id, sample_id);
CREATE INDEX complete_idx ON yaratracker (complete);

INSERT INTO target_table(a,b,c,...)
SELECT sid, rid, sha, ...
FROM (
SELECT s.id AS sid
,r.id AS rid
, s.sha256 AS sha
, ...
, ROW_NUMBER() OVER (PARTITION BY s.id) AS rn -- <<<--- HERE
FROM sample s CROSS JOIN yararules r
WHERE r.status = 'Disabled' AND NOT EXISTS(
SELECT 1 FROM yaratracker q
WHERE q.sample_id = s.id
AND q.rule_id = r.id
)
ORDER BY s.id
LIMIT 1000
) src
WHERE src.rn = 1; -- <<<--- HERE
The WHERE src.rn = 1 will restrict the cross-join to deliver only one tuple per sample.id (both id and sha256 are unique in the sample table, so picking a unique id has the same effect as picking a unique sha256)
The complete cross-join result will never be generated; the optimiser is smart enough to push down the WHERE rn=1 condition into the subquery.
Note: the LIMIT 1000 should probably be removed (or pulled up to a higher level)
If you REALLY need to save the results from the CROSS JOIN, you could use a chain of CTEs (expect a performance degradation ...)
WITH big AS (
SELECT s.id AS sample_id
,r.id AS rule_id
, s.sha256
-- , ...
, ROW_NUMBER() OVER (PARTITION BY s.id) AS rn -- <<<--- HERE
FROM sample s
CROSS JOIN yararules r
WHERE r.status = 'Disabled' AND NOT EXISTS(
SELECT 1 FROM yaratracker q
WHERE q.sample_id = s.id AND q.rule_id = r.id
)
)
, ins AS (
INSERT INTO target_table(a,b,c,...)
SELECT b.sample_id, b.rule_id, b.sha256 , ...
FROM big b
WHERE b.rn = 1 -- <<<--- HERE
RETURNING *
)
INSERT INTO yaratracker (rule_id, sample_id, matchcount, complete )
SELECT b.rule_id, b.sample_id, 0, False
FROM big b
-- LEFT JOIN ins i ON i.a = b.sample_id AND i.b= b.rule_id
;
NOTE: the yaratracker(rule_id,sample_id) should not be serials but just plain integers, referencing yararules(id) and sample(id)
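For the three steps in the question itself, a single statement with data-modifying CTEs (PostgreSQL 9.1+) can do all of it in one transaction. This is a sketch using the table names from the question's query (sample, yararules, yaratracker) and its status filter. The batch CTE is referenced twice, so PostgreSQL computes it once and both uses see the same rows; the INSERT in tracked runs even though nothing reads its output, because PostgreSQL always executes data-modifying statements in WITH to completion. It does not stop two workers from claiming the same pairs concurrently; that would need extra locking.

```sql
BEGIN;

WITH batch AS (
    -- Step 1: unprocessed sample/rule pairs (same filter as the question's query)
    SELECT s.id AS sample_id, r.id AS rule_id, s.sha256
    FROM sample s
    CROSS JOIN yararules r
    WHERE r.status = 'Disabled'
      AND NOT EXISTS (
          SELECT 1 FROM yaratracker q
          WHERE q.sample_id = s.id AND q.rule_id = r.id
      )
    ORDER BY s.id
    LIMIT 1000
), tracked AS (
    -- Step 3: record the batch with matchcount = 0 and complete = false
    INSERT INTO yaratracker (rule_id, sample_id, matchcount, complete)
    SELECT rule_id, sample_id, 0, false
    FROM batch
)
-- Step 2: one row per sample (id, sha256) plus the rules to run on it
SELECT sample_id, sha256, array_agg(rule_id ORDER BY rule_id) AS rule_ids
FROM batch
GROUP BY sample_id, sha256
ORDER BY sample_id;

COMMIT;
```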

Create unique constraint initially disabled

This is my table :
CREATE TABLE [dbo].[TestTable]
(
[Name1] varchar(50) COLLATE French_CI_AS NOT NULL,
[Name2] varchar(255) COLLATE French_CI_AS NULL,
CONSTRAINT [TestTable_uniqueName1] UNIQUE ([Name1]),
CONSTRAINT [TestTable_uniqueName1Name2] UNIQUE ([Name1], [Name2])
)
ALTER TABLE [dbo].[TestTable]
ADD CONSTRAINT [TestTable_uniqueName1]
UNIQUE NONCLUSTERED ([Name1])
ALTER TABLE [dbo].[TestTable]
ADD CONSTRAINT [TestTable_uniqueName1Name2]
UNIQUE NONCLUSTERED ([Name1], [Name2])
GO
ALTER INDEX [TestTable_uniqueName1]
ON [dbo].[TestTable]
DISABLE
GO
My idea is to enable or disable one unique constraint or the other depending on the customer's application. This way, I can catch the exception thrown in my C# code and display a specific error message in the GUI.
Now I need to change the collation of columns Name1 and Name2 to make them case sensitive (French_CS_AS). To alter these columns, I have to drop the two constraints and recreate them. With the schema described above, I cannot create an enabled constraint and then disable it, because some customers have duplicate keys for one constraint or the other.
For my update script, my first idea was:
1. Save the names of the enabled constraints in a temp table.
2. Drop the constraints.
3. Alter the columns.
4. Create DISABLED unique constraints.
5. Enable specific constraints according to the values saved in step 1.
My problem is step 4: I can't find how to create a disabled unique constraint with an ALTER TABLE statement. Is it possible to create it directly in the sys.indexes table?
My second idea was:
1. Rename TestTable to TestTableCopy.
2. Recreate TestTable with the new column collation and otherwise the same schema (indexes, FKs, triggers, ...).
3. Disable the specific unique constraints in TestTable.
4. Migrate the data from TestTableCopy to TestTable.
5. Drop TestTableCopy.
With this approach, I'm afraid of losing links to other tables/dependencies, because it is a central table in my database.
Is there any other way to achieve my goal?
If necessary, I can use unique indexes instead of unique constraints.

It looks like it is impossible to create a unique index on a column that already has duplicate values.
So, rather than having a disabled unique index, either:
have no index at all (which, from the query processor's point of view, is the same as having a disabled index),
or create a non-unique index.
For those instances where your client has unique data, create a unique index. For those instances where your client has non-unique data, create a non-unique index.
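A sketch of that approach for the update script: change the collation, then create each index as unique only when the customer's data allows it. The index names are hypothetical, and it assumes the old constraints on Name1 and Name2 were already dropped, since SQL Server will not change the collation of a column that an index references.

```sql
-- Assumes the old unique constraints/indexes on Name1 and Name2 were dropped first.
ALTER TABLE dbo.TestTable ALTER COLUMN Name1 varchar(50) COLLATE French_CS_AS NOT NULL;
ALTER TABLE dbo.TestTable ALTER COLUMN Name2 varchar(255) COLLATE French_CS_AS NULL;
GO

-- Name1 alone: unique only if this customer's data has no duplicates
-- (GROUP BY now uses the new case-sensitive collation).
IF EXISTS (SELECT Name1 FROM dbo.TestTable GROUP BY Name1 HAVING COUNT(*) > 1)
    CREATE NONCLUSTERED INDEX IX_TestTable_Name1 ON dbo.TestTable (Name1);
ELSE
    CREATE UNIQUE NONCLUSTERED INDEX IX_TestTable_Name1 ON dbo.TestTable (Name1);

-- Same decision for the (Name1, Name2) pair.
IF EXISTS (SELECT Name1, Name2 FROM dbo.TestTable GROUP BY Name1, Name2 HAVING COUNT(*) > 1)
    CREATE NONCLUSTERED INDEX IX_TestTable_Name1Name2 ON dbo.TestTable (Name1, Name2);
ELSE
    CREATE UNIQUE NONCLUSTERED INDEX IX_TestTable_Name1Name2 ON dbo.TestTable (Name1, Name2);
GO
```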

CREATE PROCEDURE [dbo].[spUsers_AddUsers]
@Name1 varchar(50) ,
@Name2 varchar(50) ,
@Unique bit
AS
declare @err int
begin tran
if @Unique = 1 begin
-- @Unique = 1: the (Name1, Name2) pair must be unique
if not exists (SELECT * FROM Users WHERE Name1 = @Name1 and Name2 = @Name2)
begin
INSERT INTO Users (Name1,Name2)
VALUES (@Name1,@Name2)
set @err = @@ERROR
end else
begin
UPDATE Users
set Name1 = @Name1,
Name2 = @Name2
where Name1 = @Name1 and Name2 = @Name2
set @err = @@ERROR
end
end else begin
-- @Unique = 0: Name1 alone must be unique
if not exists ( SELECT * FROM Users WHERE Name1 = @Name1 )
begin
INSERT INTO Users (Name1,Name2)
VALUES (@Name1,@Name2)
set @err = @@ERROR
end else
begin
UPDATE Users
set Name1 = @Name1,
Name2 = @Name2
where Name1 = @Name1
set @err = @@ERROR
end
end -- closes the @Unique = 0 branch
if @err = 0 commit tran
else rollback tran
The procedure first checks whether uniqueness applies to Name1 and Name2 together or to Name1 alone, then does an insert or an update according to the constraint in effect.

Common Table Expression Select where last observation was at a location

I have the following tables
Location table
[ID] [int] IDENTITY(1,1) NOT NULL
Package table
[ID] [int] IDENTITY(1,1) NOT NULL
PackageObservation table
[PackageID] int
[LocationID] int
[Date] datetime
[Quantity] int
For a given location I want to select packages where the last observation of the package was at the location
What is the Transact SQL?
I think it involves a common table expression, but I can't figure it out.
More information.
The following almost does it, but I don't really want to assume that the identity field is in date order
select max(id) ,packageid
from packageobservation o1
where not exists (
select o2.id from packageobservation o2
where o2.[date] > o1.[date] )
group by packageid

You can use the following SQL statement:
DECLARE @locationID int = 1
SELECT po.PackageID, MAX(po.[Date]) AS DateAtLocation
FROM PackageObservation po
WHERE po.LocationID=@locationID
AND NOT EXISTS (SELECT * FROM PackageObservation po2
WHERE po2.PackageID = po.PackageID AND
po2.LocationID <> po.LocationID AND
po2.[Date] >= po.[Date] )
GROUP BY po.PackageID
For better speed, you can also add a composite index on [LocationID], [PackageID] and [Date].
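That index could be created like this; the index name is made up, and the column order simply follows the sentence above (LocationID first, so the outer filter on @locationID can seek).

```sql
-- Hypothetical index name; key columns in the order suggested above.
CREATE NONCLUSTERED INDEX IX_PackageObservation_Location_Package_Date
    ON PackageObservation (LocationID, PackageID, [Date]);
```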
It seems to me that a CTE is not necessary here.
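Since the question asked about a common table expression, the same "last observation is at this location" test can also be written with ROW_NUMBER() in a CTE (SQL Server 2005 or later). A sketch against the PackageObservation table from the question: rn = 1 marks each package's newest observation, and the outer filter keeps only those at the given location. Unlike the NOT EXISTS version, which drops a package when a tie on [Date] exists at another location, ties here are broken arbitrarily.

```sql
DECLARE @locationID int;
SET @locationID = 1;

-- Rank each package's observations, newest first; rn = 1 is the last one.
WITH LastObservation AS (
    SELECT PackageID,
           LocationID,
           [Date],
           ROW_NUMBER() OVER (PARTITION BY PackageID ORDER BY [Date] DESC) AS rn
    FROM PackageObservation
)
SELECT PackageID, [Date] AS DateAtLocation
FROM LastObservation
WHERE rn = 1
  AND LocationID = @locationID;
```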

Get row to swap tables on a certain condition

I currently have a parent table:
CREATE TABLE members (
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
first_name varchar(20)
last_name varchar(20)
address address (composite type)
contact_numbers varchar(11)[3]
date_joined date
type varchar(5)
);
and two related tables:
CREATE TABLE basic_member (
activities varchar[3])
INHERITS (members)
);
CREATE TABLE full_member (
activities varchar[])
INHERITS (members)
);
If the type is full, the details are entered into the full_member table; if the type is basic, into the basic_member table. What I want is that if I run an update that changes the type to basic or full, the tuple moves into the corresponding table.
I was wondering if I could do this with a rule like:
CREATE RULE tuple_swap_full
AS ON UPDATE TO full_member
WHERE new.type = 'basic'
INSERT INTO basic_member VALUES (old.member_id, old.first_name, old.last_name,
old.address, old.contact_numbers, old.date_joined, new.type, old.activities);
... then delete the record from the full_member
I'm wondering whether my rule is anywhere near right, or whether there is a better way.

You don't need
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
A PRIMARY KEY implies UNIQUE NOT NULL automatically:
member_id SERIAL PRIMARY KEY
I wouldn't use a hard-coded maximum length like varchar(20). Just use text and add a CHECK constraint if you really must enforce a maximum length; that is easier to change later.
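For example, a text column with a named CHECK constraint; relaxing the limit later only means swapping the constraint, not changing the column type. The table and constraint names are hypothetical.

```sql
-- Hypothetical table: text columns with length limits enforced by CHECK constraints.
CREATE TABLE member_example (
    member_id  serial PRIMARY KEY,
    first_name text CONSTRAINT first_name_len CHECK (length(first_name) <= 20),
    last_name  text CONSTRAINT last_name_len  CHECK (length(last_name)  <= 20)
);

-- Later, to relax the limit, replace the constraint:
ALTER TABLE member_example DROP CONSTRAINT first_name_len;
ALTER TABLE member_example ADD CONSTRAINT first_name_len CHECK (length(first_name) <= 40);
```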
The syntax for INHERITS is mangled: the keyword goes outside the parentheses around the column list.
CREATE TABLE full_member (
activities text[]
) INHERITS (members);
Table names are inconsistent (members <-> member). I use the singular form everywhere in my test case.
Finally, I would not use a RULE for the task. A trigger AFTER UPDATE seems preferable.
Consider the following
Test case:
Tables:
CREATE SCHEMA x; -- I put everything in a test schema named "x".
-- DROP TABLE x.members CASCADE;
CREATE TABLE x.member (
member_id SERIAL PRIMARY KEY
,first_name text
-- more columns ...
,type text);
CREATE TABLE x.basic_member (
activities text[3]
) INHERITS (x.member);
CREATE TABLE x.full_member (
activities text[]
) INHERITS (x.member);
Trigger function:
Data-modifying CTEs (WITH x AS (DELETE ...)) are the best tool for the purpose. They require PostgreSQL 9.1 or later.
For older versions, first INSERT then DELETE.
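On such older versions the same trigger function could be written as a plain INSERT followed by a DELETE. A sketch with the same abbreviated column list as the function below; it copies the values from NEW instead of re-reading the row and relies on the PL/pgSQL CASE statement (8.4+). The ONLY in the DELETE matters: the copy just inserted into the child table also matches member_id, and a DELETE on the parent without ONLY would remove it too.

```sql
-- Pre-9.1 variant of x.trg_move_member(): INSERT first, then DELETE.
CREATE OR REPLACE FUNCTION x.trg_move_member()
  RETURNS trigger AS
$BODY$
BEGIN
    CASE NEW.type
    WHEN 'basic' THEN
        INSERT INTO x.basic_member (member_id, first_name, type)  -- more columns
        VALUES (NEW.member_id, NEW.first_name, NEW.type);
    WHEN 'full' THEN
        INSERT INTO x.full_member (member_id, first_name, type)   -- more columns
        VALUES (NEW.member_id, NEW.first_name, NEW.type);
    ELSE
        RETURN NULL;  -- any other type: leave the row where it is
    END CASE;

    -- ONLY restricts the DELETE to the parent table, so the child copy survives.
    DELETE FROM ONLY x.member WHERE member_id = NEW.member_id;
    RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
```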
CREATE OR REPLACE FUNCTION x.trg_move_member()
RETURNS trigger AS
$BODY$
BEGIN
CASE NEW.type
WHEN 'basic' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.basic_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
WHEN 'full' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.full_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
ELSE
NULL; -- any other type: nothing to move
END CASE;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Trigger:
Note that it is an AFTER trigger and has a WHEN condition.
The WHEN condition requires PostgreSQL 9.0 or later. For earlier versions, you can simply leave it out; the CASE statement in the trigger function takes care of it.
CREATE TRIGGER up_aft
AFTER UPDATE
ON x.member
FOR EACH ROW
WHEN (NEW.type IN ('basic','full')) -- OLD.type cannot be IN ('basic','full')
EXECUTE PROCEDURE x.trg_move_member();
Test:
INSERT INTO x.member (first_name, type) VALUES ('peter', NULL);
UPDATE x.member SET type = 'full' WHERE first_name = 'peter';
SELECT * FROM ONLY x.member;
SELECT * FROM x.basic_member;
SELECT * FROM x.full_member;

SELECT or INSERT a row in one command

I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:
SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'
Then, if no match:
INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id
The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id
... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.

Have you tried a UNION?
Edit - this requires Postgres 9.1:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';
results in:
id | other_key
----+-----------
1 | SOMETHING
(1 row)

No, there is no special SQL syntax that lets you select or insert in one step. You can do what Ilia mentions and create a stored procedure, which avoids the round trip from the client to the server, but it will still result in two queries (three, actually, if you count the procedure call itself).
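Such a function might look like this in PL/pgSQL (works on 9.0). The function name is hypothetical and mytable is the table from the question. The loop follows the retry pattern of the merge_db example in the PL/pgSQL error-trapping documentation: if another session inserts the same key between the SELECT and the INSERT, the unique_violation is caught and the SELECT is retried. It assumes the default READ COMMITTED isolation level, where the retried SELECT sees the other session's committed row.

```sql
-- Hypothetical helper: return the id for p_key, inserting the row if needed.
CREATE OR REPLACE FUNCTION get_or_create_id(p_key varchar)
  RETURNS int AS
$$
DECLARE
    v_id int;
BEGIN
    LOOP
        -- Fast path: the key already exists.
        SELECT id INTO v_id FROM mytable WHERE other_key = p_key;
        IF FOUND THEN
            RETURN v_id;
        END IF;

        -- Not there yet: try to insert it.
        BEGIN
            INSERT INTO mytable (other_key) VALUES (p_key)
            RETURNING id INTO v_id;
            RETURN v_id;
        EXCEPTION WHEN unique_violation THEN
            -- Another session inserted the same key first; loop and select it.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

Usage: SELECT get_or_create_id('SOMETHING');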

Using 9.5, I successfully tried this,
based on Denis de Bernardy's answer:
only 1 parameter
no union
no stored procedure
atomic, thus no concurrency problems (I think...)
The Query:
WITH neworexisting AS (
INSERT INTO mytable(other_key) VALUES('hello 1')
ON CONFLICT(other_key) DO UPDATE SET existed=true -- some update is needed so RETURNING returns the existing row
RETURNING *
)
SELECT * FROM neworexisting
first call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|false |
second call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|true |
First create your table ;-)
CREATE TABLE mytable (
id serial NOT NULL,
other_key text NOT NULL,
created timestamptz NOT NULL DEFAULT now(),
existed bool NOT NULL DEFAULT false,
CONSTRAINT mytable_pk PRIMARY KEY (id),
CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);
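If the extra existed column isn't wanted, a common variant is to "update" the key to its own value, which still makes RETURNING emit the id whether the row was inserted or already there. A sketch against the mytable definition above; like SET existed=true, every call on an existing key still writes a new row version and fires any UPDATE triggers.

```sql
-- No extra column: a no-op update of the conflicting key is enough for RETURNING.
INSERT INTO mytable (other_key)
VALUES ('hello 1')
ON CONFLICT (other_key) DO UPDATE
    SET other_key = EXCLUDED.other_key
RETURNING id;
```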

You can use a stored procedure (a PL/pgSQL function or DO block) that checks for the row first:
IF NOT EXISTS (SELECT 1 FROM mytable WHERE other_key = 'SOMETHING') THEN
INSERT INTO mytable (other_key) VALUES ('SOMETHING');
END IF;

I have an alternative to Denis's answer that I think is less database-intensive, although a bit more complex:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
SELECT id
FROM mytable
WHERE other_key = 'test'
UNION
SELECT NULL AS id
ORDER BY id NULLS LAST
LIMIT 1
), table_ins AS (
INSERT INTO mytable (id, other_key)
SELECT
COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
'test'
FROM table_sel
ON CONFLICT (id) DO NOTHING
RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;
In the table_sel CTE I look for the right row. If I don't find it, the UNION with SELECT NULL ensures that table_sel still returns a row.
In the table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) says: if id was found, use it; if id is null, advance the sequence on id and use the new value to insert a row. The ON CONFLICT clause ensures that if id is already in mytable, nothing is inserted.
At the end, a UNION ALL between table_ins and table_sel returns my id value and makes sure both CTEs are executed.
This query needs to search for the other_key value only once, and that is a "find this value" lookup rather than a "check that this value does not exist in the table" lookup, which is expensive; in Denis's alternative, other_key is used in both kinds of search. In my query, the "check that a value does not exist" test is done only on id, an integer primary key, which is fast by construction.

Minor tweak a decade late to Denis's excellent answer:
-- Create the table with a unique constraint
CREATE TABLE mytable (
id serial PRIMARY KEY
, other_key varchar NOT NULL UNIQUE
);
WITH new_row AS (
-- Only insert when nothing is found, so the existing-key case never
-- attempts an INSERT that would violate the unique constraint.
INSERT INTO mytable ( other_key )
SELECT 'SOMETHING'
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
)
RETURNING *
)
(
-- This comes first in the UNION ALL since the row will almost certainly
-- be in the buffer cache. Marginally slower for the insert case, but also
-- marginally faster for the much more common read-only case.
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
-- Don't check for duplicates to be removed
UNION ALL
-- If we reach this point in iteration, we needed to do the INSERT and
-- lock after all.
SELECT *
FROM new_row
) LIMIT 1 -- Just return whatever comes first in the results and let
-- the executor stop reading once it has a row.
;
The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end lets the executor stop as soon as a row is available. The data-modifying CTE (new_row) is still executed to completion either way, because PostgreSQL always runs data-modifying statements in WITH exactly once, whether or not their output is read.
NOTE: There is a race condition here and in the original answer. If two transactions look up the same missing key at the same time, both try to INSERT it, and one fails with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but that query then returns an empty set instead of the new row. This is a difficult problem, because getting that row from the other transaction would violate the I in ACID.