In our company, the following logic is used in most of SP's. I couldn't understand how a variable is used with the IN Clause like the following query.
Can anyone explain this?
WHERE ( ( #EMP_ID ) in ( select distinct(EMP_ID)
from Table2(nolock)
where SID = T1.SID and status='A' and client_id=T1.Client_Id ) )
order by EMP_ID
The variable is simply replaced in the query. If it helps, break down the query and think of it if you put in a number for the variable (for example, 45) and a list in place of the inner query:
SELECT *
FROM Table
WHERE 45 IN (42, 46, 47, 90, 45)
This will return rows that contain an employee ID of 45.
Does this make sense?
To answer your question: Yes, someone can explain it. Probably not me.
The SELECT gets a collection of EMP_ID values from Table2. The first WHERE clause then checks if the value of variable #EMP_ID is in the set of selected values. If so, the WHERE clause causes its parent statement to process the row.
The SELECT is a correlated subquery. It uses a couple of values, SID and Client_Id, from a table (aliased as) T1. (Said table is not included in your code snippet.) For each row processed from T1 the correlated subquery is evaluated.
Related
Edit: Answer is to use MIN. it works on both strings & numbers. Credit to #cadet down below.
Original question:
I've been reading through similar questions around this for the last half an hour and cannot understand the responses so let me try to get a simple easy to follow answer.
What is the PostgresSQL equivalent to this code which I would write if I were using SQL Server, to bring back the first value in field2 when aggregating:
Select field1, first(field2) from table group by field1?
I have read that DISTINCT ON is the right thing to use? In that case would it be:
Select field1, DISTINCT ON(field2) from table group by field1? because that gives me a syntax error
Edit:
Here is the error stating that the FIRST function does not exist in PostGresSQL:
ERROR: function first(asset32type) does not exist
LINE 1: Select policy, first (name) from multi_asset group by policy...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
SQL state: 42883
Character: 16
And in case it isn't already clear when I say that in SQL Server the first() function brings back the first value in field2 when aggregating, I mean if you had data like this:
field1
field2
Tom
32
Tom
53
Then select field1, first(field2) group by field1 would give you back:
Tom, 32 - i.e. it picks the first value from field2
Maybe this one, using DISTINCT ON():
SELECT DISTINCT ON (field1)
field1
, field2
FROM table
ORDER BY
field1
, field2;
But without any data or any example, it's just a wild guess.
If first is related with specific order
select distinct field1,
first_value(field2)
over (partition by field1 order by field2) from
(
values (1,10),(1,11),(1,12),(2,23),(2,24)
) as a(field1,field2)
If first is just minimum or maximum
select field1,
min(field2)
from
(
values (1,10),(1,11),(1,12),(2,23),(2,24)
) as a(field1,field2)
group by field1
This question already has answers here:
sql server 2008 management studio not checking the syntax of my query
(2 answers)
Closed 1 year ago.
I have the following select:
SELECT DISTINCT pl
FROM [dbo].[VendorPriceList] h
WHERE PartNumber IN (SELECT DISTINCT PartNumber
FROM [dbo].InvoiceData
WHERE amount > 10
AND invoiceDate > DATEADD(yyyy, -1, CURRENT_TIMESTAMP)
UNION
SELECT DISTINCT PartNumber
FROM [dbo].VendorDeals)
The issue here is that the table [dbo].VendorDeals has NO column PartNumber, however no error is detected and the query works with the first part of the union.
Even more, IntelliSense also allows and recognize PartNumber. This fails only when inside a complex statement.
It is pretty obvious that if you qualify column names, the mistake will be evident.
This isn't a bug in SQL Server/the T-SQL dialect parsing, no, this is working exactly as intended. The problem, or bug, is in your T-SQL; specifically because you haven't qualified your columns. As I don't have the definition of your table, I'm going to provide sample DDL first:
CREATE TABLE dbo.Table1 (MyColumn varchar(10), OtherColumn int);
CREATE TABLE dbo.Table2 (YourColumn varchar(10) OtherColumn int);
And then an example that is similar to your query:
SELECT MyColumn
FROM dbo.Table1
WHERE MyColumn IN (SELECT MyColumn FROM dbo.Table2);
This, firstly, will parse; it is a valid query. Secondly, provided that dbo.Table2 contains at least one row, then every row from table dbo.Table1 will be returned where MyColumn has a non-NULL value. Why? Well, let's qualify the column with table's name as SQL Server would parse them:
SELECT Table1.MyColumn
FROM dbo.Table1
WHERE Table1.MyColumn IN (SELECT Table1.MyColumn FROM dbo.Table2);
Notice that the column inside the IN is also referencing Table1, not Table2. By default if a column has it's alias omitted in a subquery it will be assumed to be referencing the table(s) defined in that subquery. If, however, none of the tables in the sub query have a column by that name, then it will be assumed to reference a table where that column does exist; in this case Table1.
Let's, instead, take a different example, using the other column in the tables:
SELECT OtherColumn
FROM dbo.Table1
WHERE OtherColumn IN (SELECT OtherColumn FROM dbo.Table2);
This would be parsed as the following:
SELECT Table1.OtherColumn
FROM dbo.Table1
WHERE Table1.OtherColumn IN (SELECT Table2.OtherColumn FROM dbo.Table2);
This is because OtherColumn exists in both tables. As, in the subquery, OtherColumn isn't qualified it is assumed the column wanted is the one in the table defined in the same scope, Table2.
So what is the solution? Alias and qualify your columns:
SELECT T1.MyColumn
FROM dbo.Table1 T1
WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM dbo.Table2 T2);
This will, unsurprisingly, error as Table2 has no column MyColumn.
Personally, I suggest that unless you have only one table being referenced in a query, you alias and qualify all your columns. This not only ensures that the wrong column can't be referenced (such as in a subquery) but also means that other readers know exactly what columns are being referenced. It also stops failures in the future. I have honestly lost count how many times over years I have had a process fall over due to the "ambiguous column" error, due to a table's definition being changed and a query referencing the table wasn't properly qualified by the developer...
In my research, I want to see diseases correlated to diabetes by listing all diseases which co-occurs with diabetes, i.e. when there is at least one patient who has medical record of both diabetes and that disease. At first I try this query:
use [ng_data]
select distinct [disease]
from [dbo].[Final_View_2]
where [encode_id] in
(
select [encode_id]
where [disease] like '%diabetes%'
)
Where encode_id is the id of patient. But this query only returns diseases whose name contain 'diabetes'. It looks like the condition in subquery affects results in main query.
Then when I try this query:
use [ng_data]
select distinct [disease]
from [dbo].[Final_View_2]
where [encode_id] in
(
select [encode_id]
from [dbo].[Final_View_2]
where [disease] like '%diabetes%'
)
it works correctly. It seems that adding from clause in the subquery can resolve the effect of where clause in subquery on the main query. Could someone explain how the queries are carried out and why it produces such result? I'm confused by the dependence of main query on subquery.
Your first query is acting like a correlated sub-query, i.e. the rows it is referencing come from the result set in the outer query. It is in effect just saying "is the column value in this row like '%diabetes%'"; it is therefore no different to putting that WHERE clause in the outer query.
By add the FROM in your sub-query you are creating a secondary resultset of encodeids that have diabetes in the disease column and then selecting all rows that have that encodeid in the first resultset without reference to the disease column.
Take a look at the execution plan for each query to see what it is doing
I'd like to check if a "couple" of attributes is in a the result of another request.
I tried the following query but the syntax isn't good.
SELECT ID
FROM Table1
WHERE (Col_01, Col_02) IN
(
SELECT Col_01, Col_02
FROM Table2
)
Is-it possible to do something like that in T-SQL ?
You can use EXISTS and a correlated subquery:
SELECT ID
FROM Table1 t1
WHERE EXISTS
(
SELECT *
FROM Table2 t2
WHERE t2.Col_01 = t1.Col_01 AND
t2.Col_02 = t1.Col_02
)
You initial attempt was a good one though - some database systems do allow us to use rowset constructors to create arbitrary tuples, and the syntax is quite similar to what you showed, but they're not supported in T-SQL in this part of the syntax, so you have to go this slightly more verbose route.
I thought I understood how I can do a SELECT from the results of another SELECT statement, but there seems to be some sort of blurring of scope that I don't understand. I am using SQL Server 2008R2.
It is easiest to explain with an example.
Create a table with a single nvarchar column - load the table with a single text value and a couple of numbers:
CREATE TABLE #temptable( a nvarchar(30) );
INSERT INTO #temptable( a )
VALUES('apple');
INSERT INTO #temptable( a )
VALUES(1);
INSERT INTO #temptable( a )
VALUES(2);
select * from #temptable;
This will return: apple, 1, 2
Use IsNumeric to get only the rows of the table that can be cast to numeric - this will leave the text value apple behind. This works fine.
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1 ;
This returns: 1, 2
However, if I use that exact same query as an inner select, and try to do a numeric WHERE clause, it fails saying cannot convert nvarchar value 'apple' to data type int. How has it got the value 'apple' back??
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
where x.NumA > 1
;
Note that the failing query works just fine without the WHERE clause:
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
;
I find this very surprising. What am I not getting? TIA
If you take a look at the estimated execution plan you'll find that it has optimized the inner query into the outer and combined the WHERE clauses.
Using a CTE to isolate the operations works (in SQL Server 2008 R2):
declare #temptable as table ( a nvarchar(30) );
INSERT INTO #temptable( a )
VALUES ('apple'), ('1'), ('2');
with Numbers as (
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
)
select * from Numbers
The reason you are getting this is fair and simple. When a query is executed there are some steps that are being followed. This is a parse, algebrize, optimize and compile.
The algebrize part in this case will get all the objects you need for this query. The optimize will use these objects to create a best query plan which will be compiled and executed...
So, when you look into that part you will see it will do a table scan on #temptable. And #temptable is defined as the way you created your table. That you will do some compute on it is a different thing..... The column still has the nvarchar datatype..
To know how this works you have to know how to read a query. First all the objects are retrieved (from table, inner join table), then the predicates (where, on), then the grouping and such, then the select of the columns (with the cast) and then the orderby.
So with that in mind, when you have a combination of selects, the optimizer will still process it that way.. since your select is subordinate to the from and join parts of your query, it will be a reason for getting this error.
I hope i made it a little clear?
The optimizer is free to move expressions in the query plan in order to produce the most cost efficient plan for retrieving the data (the evaluation order of the predicates is not guaranteed). I think using the case expression like bellow produces a NULL in absence of the ELSE clause and thus takes the APPLE out
select a from #temptable where case when isnumeric(a) = 1 then a end > 1