Is a subquery able to select columns from outer query? [duplicate] - tsql

I have the following select:
SELECT DISTINCT pl
FROM [dbo].[VendorPriceList] h
WHERE PartNumber IN (SELECT DISTINCT PartNumber
FROM [dbo].InvoiceData
WHERE amount > 10
AND invoiceDate > DATEADD(yyyy, -1, CURRENT_TIMESTAMP)
UNION
SELECT DISTINCT PartNumber
FROM [dbo].VendorDeals)
The issue here is that the table [dbo].VendorDeals has NO PartNumber column; however, no error is detected and the query works using only the first part of the union.
What's more, IntelliSense also accepts and recognizes PartNumber. The mistake goes undetected only when the select is inside a complex statement.
Obviously, if you qualify the column names, the mistake becomes evident.

This isn't a bug in SQL Server or in how it parses T-SQL; it is working exactly as intended. The problem, or bug, is in your T-SQL: specifically, you haven't qualified your columns. As I don't have the definitions of your tables, I'm going to provide sample DDL first:
CREATE TABLE dbo.Table1 (MyColumn varchar(10), OtherColumn int);
CREATE TABLE dbo.Table2 (YourColumn varchar(10), OtherColumn int);
And then an example that is similar to your query:
SELECT MyColumn
FROM dbo.Table1
WHERE MyColumn IN (SELECT MyColumn FROM dbo.Table2);
This, firstly, will parse; it is a valid query. Secondly, provided that dbo.Table2 contains at least one row, then every row from table dbo.Table1 will be returned where MyColumn has a non-NULL value. Why? Well, let's qualify the column with table's name as SQL Server would parse them:
SELECT Table1.MyColumn
FROM dbo.Table1
WHERE Table1.MyColumn IN (SELECT Table1.MyColumn FROM dbo.Table2);
Notice that the column inside the IN is also referencing Table1, not Table2. By default, if a column has its qualifier omitted in a subquery, it is assumed to reference the table(s) defined in that subquery. If, however, none of the tables in the subquery have a column by that name, then it is assumed to reference a table in an outer scope where that column does exist; in this case Table1.
Let's, instead, take a different example, using the other column in the tables:
SELECT OtherColumn
FROM dbo.Table1
WHERE OtherColumn IN (SELECT OtherColumn FROM dbo.Table2);
This would be parsed as the following:
SELECT Table1.OtherColumn
FROM dbo.Table1
WHERE Table1.OtherColumn IN (SELECT Table2.OtherColumn FROM dbo.Table2);
This is because OtherColumn exists in both tables. As, in the subquery, OtherColumn isn't qualified it is assumed the column wanted is the one in the table defined in the same scope, Table2.
So what is the solution? Alias and qualify your columns:
SELECT T1.MyColumn
FROM dbo.Table1 T1
WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM dbo.Table2 T2);
This will, unsurprisingly, error as Table2 has no column MyColumn.
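The same resolution rule can be reproduced outside SQL Server. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in (an assumption: SQLite is not T-SQL, but it resolves unqualified subquery columns against the outer scope in the same way). The function name and the sample rows are made up for illustration.

```python
import sqlite3


def run_demo():
    """Return (rows from the unqualified query, error text from the qualified one)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (MyColumn TEXT, OtherColumn INTEGER);
        CREATE TABLE Table2 (YourColumn TEXT, OtherColumn INTEGER);
        INSERT INTO Table1 VALUES ('a', 1), ('b', 2), (NULL, 3);
        INSERT INTO Table2 VALUES ('z', 99);
    """)

    # Unqualified: the inner MyColumn silently binds to Table1, so every
    # Table1 row with a non-NULL MyColumn is returned.
    unqualified = conn.execute(
        "SELECT MyColumn FROM Table1 "
        "WHERE MyColumn IN (SELECT MyColumn FROM Table2) "
        "ORDER BY MyColumn"
    ).fetchall()

    # Qualified: the engine can no longer fall back to the outer table,
    # so the missing column is reported as an error.
    try:
        conn.execute(
            "SELECT T1.MyColumn FROM Table1 T1 "
            "WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM Table2 T2)"
        )
        qualified_error = None
    except sqlite3.OperationalError as exc:
        qualified_error = str(exc)

    conn.close()
    return unqualified, qualified_error
```

Calling run_demo() returns the two non-NULL rows for the unqualified query and an error message for the qualified one, which mirrors the behaviour described above.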
Personally, I suggest that unless you have only one table being referenced in a query, you alias and qualify all your columns. This not only ensures that the wrong column can't be referenced (such as in a subquery) but also means that other readers know exactly what columns are being referenced. It also stops failures in the future. I have honestly lost count of how many times over the years I have had a process fall over with the "ambiguous column" error because a table's definition was changed and a query referencing the table wasn't properly qualified by the developer...

Related

Postgres: insert rows to table with multiple records from other join tables

I am trying to insert multiple records obtained from a join into another table, user_to_property. In the user_to_property table, user_to_property_id is the primary key and NOT NULL, but it is not auto-incrementing. So I am trying to set user_to_property_id manually by incrementing it by 1.
WITH selectedData AS
( -- selection of the data that needs to be inserted
SELECT t2.user_id as userId
FROM property_lines t1
INNER JOIN user t2 ON t1.account_id = t2.account_id
)
INSERT INTO user_to_property (user_to_property_id, user_id, property_id, created_date)
VALUES ((SELECT MAX( user_to_property_id )+1 FROM user_to_property),(SELECT
selectedData.userId
FROM selectedData),3,now());
The above query gives me the below error:
ERROR: more than one row returned by a subquery used as an expression
How can I insert multiple records into a table from a join of other tables? The user_to_property table should contain only one record for each combination of user_id and property_id.
Typically, for INSERT you use either VALUES or SELECT. The structure VALUES (SELECT ...) often (generally?) causes more trouble than it is worth, and it is never necessary: you can always select a constant or an expression. In this case, convert to just SELECT. To generate your ID, get the max value from your table and then add the row_number of each row you are inserting:
insert into user_to_property(user_to_property_id
, user_id
, property_id
, created_date
)
with start_with(current_max_id) as
( select max(user_to_property_id) from user_to_property )
select current_max_id + id_incr, user_id, 3, now()
from (
select t2.user_id, row_number() over() id_incr
from property_lines t1
join users t2 on t1.account_id = t2.account_id
) js
join start_with on true;
A couple notes:
DO NOT use user as a table name, or as any other object name. It is a
documented reserved word in both Postgres and the SQL standard (and has
been since Postgres v7.1 and the SQL-92 standard at least).
You really should create another column, or change the type of the
user_to_property_id column, so that it is auto-generated. Using Max()+1, or
anything based on that idea, is a virtual guarantee you will generate
duplicate keys, much to the amusement of users and developers alike.
What happens under MVCC when 2 users run the query concurrently?
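The max-plus-row_number idea from the query above can be tried without a Postgres server. This is a rough sketch that uses Python's sqlite3 module as a stand-in, so it assumes SQLite 3.25 or newer for window functions; the sample rows, the COALESCE for an empty target table, and the function name are illustrative additions, and created_date is left out for brevity. It does nothing about the concurrency problem mentioned in the notes.

```python
import sqlite3


def insert_with_offsets():
    """Insert one row per joined user, numbering ids from the current max."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER, account_id INTEGER);
        CREATE TABLE property_lines (account_id INTEGER);
        CREATE TABLE user_to_property (
            user_to_property_id INTEGER PRIMARY KEY,
            user_id INTEGER,
            property_id INTEGER
        );
        INSERT INTO users VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO property_lines VALUES (1), (2);
        INSERT INTO user_to_property VALUES (5, 99, 3);
    """)

    # INSERT ... SELECT: every joined row gets current max + its row number.
    conn.execute("""
        INSERT INTO user_to_property (user_to_property_id, user_id, property_id)
        SELECT m.current_max_id + js.id_incr, js.user_id, 3
        FROM (
            SELECT t2.user_id,
                   row_number() OVER (ORDER BY t2.user_id) AS id_incr
            FROM property_lines t1
            JOIN users t2 ON t1.account_id = t2.account_id
        ) js
        CROSS JOIN (
            SELECT COALESCE(MAX(user_to_property_id), 0) AS current_max_id
            FROM user_to_property
        ) m
    """)

    rows = conn.execute(
        "SELECT user_to_property_id, user_id, property_id "
        "FROM user_to_property ORDER BY user_to_property_id"
    ).fetchall()
    conn.close()
    return rows
```

With the sample data, the existing row keeps id 5 and the three joined users receive ids 6, 7 and 8.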

Db2 convert rows to columns

I need the below results ..
Table :
Order postcode qnty
123 2234 1
Expected result:
Order 123
Postcode 2234
Qnty 1
SQL server:
Select pvt.element_name
,pvt.element_value
from (select [order], postcode
from tablename) up
unpivot (element_value for element_name in ([order], postcode)) as pvt
How to achieve this in db2?
Db2 for IBM i doesn't have a built-in unpivot function. AFAIK, it's not available on any Db2 platform, unless it's been added recently.
The straightforward method:
select 'ORDER' as key, order as value
from mytable
UNION ALL
select 'POSTCODE', postcode
from mytable
UNION ALL
select 'QNTY', char(qnty)
from mytable;
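The UNION ALL approach is plain SQL, so it can be checked in any engine. Below is a minimal sketch using Python's sqlite3 module rather than Db2 (an assumption made only so the example runs anywhere). The column name order_no, the output aliases, and the explicit casts to text are illustrative choices, not part of the original table.

```python
import sqlite3


def unpivot_rows():
    """Turn each column of the single sample row into a (name, value) row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mytable (order_no INTEGER, postcode INTEGER, qnty INTEGER);
        INSERT INTO mytable VALUES (123, 2234, 1);
    """)

    # One SELECT per column; values are cast to text so that every branch
    # of the union yields the same type.
    rows = conn.execute("""
        SELECT 'ORDER' AS element_name, CAST(order_no AS TEXT) AS element_value
        FROM mytable
        UNION ALL
        SELECT 'POSTCODE', CAST(postcode AS TEXT) FROM mytable
        UNION ALL
        SELECT 'QNTY', CAST(qnty AS TEXT) FROM mytable
    """).fetchall()
    conn.close()
    return rows
```

Each source row is scanned once per unpivoted column, which is why this form tends to be the slower option on large tables.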
A better performing method is to do a cross join between the source table and a correlated VALUES of as many rows as columns that need to be unpivoted.
select
Key, value
from mytable T,
lateral (values ('ORDER', t.order)
, ('POSTCODE', t.postcode)
, ('QNTY', varchar(t.qnty))
) as unpivot(key, value);
However, you'll need to know ahead of time which columns you're unpivoting.
If you don't know the values, there are some ways to unpivot with the XMLTABLE (possibly JSON_TABLE) that might work. I've never used them, and I'm out of time to spend answering this question. You can find some examples via google.
I have created a stored procedure for LUW that rotates a table:
https://github.com/angoca/db2tools/blob/master/pivot.sql
You just need to call the stored procedure by passing the tablename as parameter, and it will return a cursor with the headers of the column in the first column.

Why is "select table_name from table_name" valid [duplicate]

This syntax is valid for PostgreSQL:
select T from table_name as T
T seems to become a CSV list of values from all columns in table_name. select T from table_name as T works and, for that matter, so does select table_name from table_name. Where is this syntax documented, and what is the datatype of T?
This syntax is not in SQL Server, and (AFAIK) does not exist in any other SQL variant.
If you create a table, Postgres creates a type with the same name in the background. The table is then essentially a "list of that type".
Postgres also allows referencing a complete row as a single "record": a value built from multiple columns. Those records can be created dynamically through a row constructor.
Each row in the result of a SELECT statement is implicitly assigned a type. If the row comes from a single table, it's the table's type. Otherwise it's an anonymous type.
When you use the table name in a place where a column would be allowed it references the full row as a single record. If the table is aliased in the select, the type of that record is still the table's type.
So the statement:
select T
from table_name as T;
returns a result with a single column which is a record (of the table's type) containing each column of the table as a field. The default output format of a record is a comma separated list of the values enclosed in parentheses.
Assuming table_name has three columns c1, c2 and c3 the following would essentially do the same thing:
select row(c1, c2, c3)
from table_name;
Note that a record reference can also be used in comparisons, e.g. finding rows that are different between two tables can be done in the following manner
select *
from table_one t1
full outer join table_two t2 on t1.id = t2.id
where t1 <> t2;

Postgres subquery has access to column in a higher level table. Is this a bug? or a feature I don't understand?

I don't understand why the following doesn't fail. How does the subquery have access to a column from a different table at the higher level?
drop table if exists temp_a;
create temp table temp_a as
(
select 1 as col_a
);
drop table if exists temp_b;
create temp table temp_b as
(
select 2 as col_b
);
select col_a from temp_a where col_a in (select col_a from temp_b);
/*why doesn't this fail?*/
The following fail, as I would expect them to.
select col_a from temp_b;
/*ERROR: column "col_a" does not exist*/
select * from temp_a cross join (select col_a from temp_b) as sq;
/*ERROR: column "col_a" does not exist
*HINT: There is a column named "col_a" in table "temp_a", but it cannot be referenced from this part of the query.*/
I know about the LATERAL keyword, but I'm not using LATERAL here. Also, this query succeeds even in versions of Postgres before 9.3, the release that introduced the LATERAL keyword.
Here's a sqlfiddle: http://sqlfiddle.com/#!10/09f62/5/0
Thank you for any insights.
Although this feature might be confusing, without it several types of queries would be more difficult, slower, or impossible to write in SQL. This feature is called a "correlated subquery", and the correlation can serve a similar function to a join.
For example: Consider this statement
select first_name, last_name from users u
where exists (select * from orders o where o.user_id=u.user_id)
This query gets the names of all the users who have ever placed an order. You could get that info using a join to the orders table, but you'd also have to use a "distinct", which would internally require a sort and would likely perform a tad worse than this query. You could also produce a similar query with a group by.
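To make the comparison concrete, here is a small sketch that runs the EXISTS form and the join-plus-distinct form side by side. It uses Python's sqlite3 module instead of Postgres (an assumption for portability), and the sample users and orders are invented. It only shows that the two forms return the same rows and says nothing about their relative speed.

```python
import sqlite3


def users_with_orders():
    """Return the same user list computed via EXISTS and via JOIN + DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER, first_name TEXT, last_name TEXT);
        CREATE TABLE orders (order_id INTEGER, user_id INTEGER);
        INSERT INTO users VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Fox');
        INSERT INTO orders VALUES (100, 1), (101, 1), (102, 3);
    """)

    # Correlated subquery: u.user_id comes from the outer query.
    via_exists = conn.execute("""
        SELECT first_name, last_name FROM users u
        WHERE EXISTS (SELECT * FROM orders o WHERE o.user_id = u.user_id)
        ORDER BY first_name
    """).fetchall()

    # Join version: DISTINCT is needed because Ann has two orders.
    via_join = conn.execute("""
        SELECT DISTINCT u.first_name, u.last_name FROM users u
        JOIN orders o ON o.user_id = u.user_id
        ORDER BY u.first_name
    """).fetchall()

    conn.close()
    return via_exists, via_join
```

Without the DISTINCT, the join version would list Ann twice, once per order.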
Here's a better example that's pretty practical, and not just for performance reasons. Suppose you want to delete all users who have no orders and no tickets.
delete from users u where
not exists (select * from orders o where o.user_id = u.user_id)
and not exists (select * from tickets t where t.user_id = u.user_id)
One very important thing to note is that you should fully qualify or alias your table names when doing this or you might wind up with a typo that completely messes up the query and silently "just works" while returning bad data.
The following is an example of what NOT to do.
select * from users
where exists (select * from product where last_updated_by=user_id)
This looks just fine until you look at the tables and realize that the table "product" has no "last_updated_by" field and the users table does, so the column silently resolves to users and the query returns the wrong data. Add the alias and the query will fail because no "last_updated_by" column exists in product.
I hope this has given you some examples that show you how to use this feature. I use them all the time in update and delete statements (as well as in selects-- but I find an absolute need for them in updates and deletes often)

Create a new table out of an existing one

I have a table of words from multiple files. I want to count how many files each word appears in. I can do that with the first piece of code below. But when I nest it in a CREATE TABLE statement, it won't work. The second piece of code below is the one that produces the error.
SELECT WORD, COUNT(*) FROM (select DISTINCT ABSTRACTID, WORD FROM NSFABSTRACTS)
GROUP BY WORD ORDER BY COUNT(*) DESC
CREATE TABLE DOC_FREQ (WORD, TOTALCOUNT) AS
(
SELECT WORD, COUNT(*) FROM (select DISTINCT ABSTRACTID, WORD FROM NSFABSTRACTS)
GROUP BY WORD ORDER BY COUNT(*));
Here is the error message:
SQL Error: ORA-00907: missing right parenthesis
00907. 00000 - "missing right parenthesis"
*Cause:
*Action:
Can anyone suggest how to create this table? Thanks.
You cannot use order by when you have the query enclosed in parentheses; at least if that clause is also within the parentheses:
create table t42 as (select * from dual order by dummy);
SQL Error: ORA-00907: missing right parenthesis
It is allowed outside:
create table t42 as (select * from dual) order by dummy;
table T42 created.
You can remove the parentheses as they aren't needed at all here:
create table t42 as select * from dual order by dummy;
table T42 created.
Or remove the order by, since an order by in the create statement usually makes little difference and doesn't affect how the data is retrieved:
create table t42 as (select * from dual);
table T42 created.
Or preferably for my tastes, both:
create table t42 as select * from dual;
table T42 created.
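Applied to the word-count query from the question, the "no parentheses, no order by" form looks like the sketch below. It runs against SQLite through Python's sqlite3 module, not Oracle (an assumption: SQLite does not reproduce ORA-00907, so this only illustrates the shape of the working statement). The sample rows and the function name are invented.

```python
import sqlite3


def build_doc_freq():
    """Create doc_freq from the distinct (abstract, word) pairs and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nsfabstracts (abstractid INTEGER, word TEXT);
        INSERT INTO nsfabstracts VALUES
            (1, 'data'), (1, 'data'), (1, 'model'),
            (2, 'data'), (3, 'model'), (3, 'data');

        -- No enclosing parentheses and no ORDER BY in the CREATE statement.
        CREATE TABLE doc_freq AS
            SELECT word, COUNT(*) AS totalcount
            FROM (SELECT DISTINCT abstractid, word FROM nsfabstracts)
            GROUP BY word;
    """)

    # Ordering is applied when reading the table, not when creating it.
    rows = conn.execute(
        "SELECT word, totalcount FROM doc_freq "
        "ORDER BY totalcount DESC, word"
    ).fetchall()
    conn.close()
    return rows
```

Naming the count with a column alias in the select list takes the place of the (WORD, TOTALCOUNT) column list used in the question.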