Are Unused Portions of Table Expressions Calculated? - tsql

Suppose I have a SQL Server view defined as follows:
CREATE VIEW V_Test1 AS
SELECT ColumnA, ColumnB
FROM Table1
and then I have another view defined as:
CREATE VIEW V_Test2 AS
SELECT ColumnA
FROM Table1
Would the two statements below be equally optimized, considering that a view is merely a table expression and ColumnB is never referenced? What if ColumnB weren't a column of Table1 but rather the result of complicated logic or a call to another function?
SELECT ColumnA FROM V_Test1
SELECT ColumnA FROM V_Test2

Yes, in Microsoft SQL Server they will have identical execution plans. You can even put an expression that would raise an error into ColumnB (for example "1/0"); it won't be evaluated.

Is a subquery able to select columns from the outer query?

I have the following select:
SELECT DISTINCT pl
FROM [dbo].[VendorPriceList] h
WHERE PartNumber IN (SELECT DISTINCT PartNumber
FROM [dbo].InvoiceData
WHERE amount > 10
AND invoiceDate > DATEADD(yyyy, -1, CURRENT_TIMESTAMP)
UNION
SELECT DISTINCT PartNumber
FROM [dbo].VendorDeals)
The issue here is that the table [dbo].VendorDeals has no PartNumber column, yet no error is raised and the query runs using the first part of the union.
Moreover, IntelliSense also accepts and recognizes PartNumber. The error goes unnoticed only when the reference is inside a complex statement.
Obviously, if you qualify the column names, the mistake becomes evident.
This isn't a bug in SQL Server/the T-SQL dialect parsing, no, this is working exactly as intended. The problem, or bug, is in your T-SQL; specifically because you haven't qualified your columns. As I don't have the definition of your table, I'm going to provide sample DDL first:
CREATE TABLE dbo.Table1 (MyColumn varchar(10), OtherColumn int);
CREATE TABLE dbo.Table2 (YourColumn varchar(10), OtherColumn int);
And then an example that is similar to your query:
SELECT MyColumn
FROM dbo.Table1
WHERE MyColumn IN (SELECT MyColumn FROM dbo.Table2);
This, firstly, will parse; it is a valid query. Secondly, provided that dbo.Table2 contains at least one row, every row from dbo.Table1 where MyColumn has a non-NULL value will be returned. Why? Well, let's qualify the columns with their table names, as SQL Server would resolve them:
SELECT Table1.MyColumn
FROM dbo.Table1
WHERE Table1.MyColumn IN (SELECT Table1.MyColumn FROM dbo.Table2);
Notice that the column inside the IN is also referencing Table1, not Table2. By default, if a column in a subquery is not qualified with a table name or alias, it is assumed to reference the table(s) defined in that subquery. If, however, none of the tables in the subquery have a column by that name, it is assumed to reference a table in the outer scope where that column does exist; in this case, Table1.
Let's, instead, take a different example, using the other column in the tables:
SELECT OtherColumn
FROM dbo.Table1
WHERE OtherColumn IN (SELECT OtherColumn FROM dbo.Table2);
This would be parsed as the following:
SELECT Table1.OtherColumn
FROM dbo.Table1
WHERE Table1.OtherColumn IN (SELECT Table2.OtherColumn FROM dbo.Table2);
This is because OtherColumn exists in both tables. Since OtherColumn isn't qualified in the subquery, it is assumed to be the column from the table defined in the same scope, Table2.
So what is the solution? Alias and qualify your columns:
SELECT T1.MyColumn
FROM dbo.Table1 T1
WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM dbo.Table2 T2);
This will, unsurprisingly, error as Table2 has no column MyColumn.
Personally, I suggest that unless you have only one table being referenced in a query, you alias and qualify all your columns. This not only ensures that the wrong column can't be referenced (such as in a subquery) but also means that other readers know exactly which columns are being referenced. It also prevents failures in the future. I have honestly lost count of how many times over the years I have had a process fall over with the "ambiguous column" error because a table's definition was changed and a query referencing that table wasn't properly qualified by the developer...
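The resolution rule described above can be tried out with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite happens to resolve unqualified column names in subqueries the same way), and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption of this
# sketch); the tables mirror the sample DDL above, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (MyColumn TEXT, OtherColumn INTEGER);
    CREATE TABLE Table2 (YourColumn TEXT, OtherColumn INTEGER);
    INSERT INTO Table1 VALUES ('a', 1), ('b', 2), (NULL, 3);
    INSERT INTO Table2 VALUES ('z', 9);
""")

# Unqualified: MyColumn inside the subquery silently binds to Table1,
# so every non-NULL row of Table1 comes back.
unqualified = conn.execute(
    "SELECT MyColumn FROM Table1 "
    "WHERE MyColumn IN (SELECT MyColumn FROM Table2) "
    "ORDER BY MyColumn"
).fetchall()

# Qualified: the same mistake is now rejected, because Table2 (alias T2)
# has no column called MyColumn.
try:
    conn.execute(
        "SELECT T1.MyColumn FROM Table1 T1 "
        "WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM Table2 T2)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)
print(qualified_error)
```

The first query returns both non-NULL rows even though Table2 shares no values with Table1, while the aliased version fails immediately, which is the behaviour you want.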

Is it a function or a table in this LATERAL join case?

It is often particularly handy to LEFT JOIN to a LATERAL subquery, so that source rows will appear in the result even if the LATERAL subquery produces no rows for them. For example, if get_product_names() returns the names of products made by a manufacturer, but some manufacturers in our table currently produce no products, we could find out which ones those are like this:
SELECT m.name
FROM manufacturers m LEFT JOIN LATERAL get_product_names(m.id) pname ON true
WHERE pname IS NULL;
All of the above is quoted from the PostgreSQL manual.
I think I finally understand what LATERAL means. In this case, however, I am not sure whether get_product_names is a table or a function. These are my two readings:
A: get_product_names(m.id) is a function that takes m.id as an input parameter and returns a table, which is aliased as pname. Overall, table m is joined to a table that the WHERE condition requires to be null.
B: get_product_names is a table, and table m is left joined to it on m.id, with pname as its alias. Overall, table m is joined to a table that the WHERE condition requires to be null.
get_product_names is a table function (also known as set returning function or SRF in PostgreSQL slang). Such a function does not necessarily return a single result row, but arbitrarily many rows.
Since the result of such a function is a table, you typically use it in SQL statements where you would use a table, that is in the FROM clause.
A simple example is
SELECT * FROM generate_series(1, 5);
generate_series
-----------------
1
2
3
4
5
(5 rows)
You can also use normal functions in this way; they are then treated as table functions that return exactly one row.
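To make the semantics of the manual's LEFT JOIN LATERAL example more concrete, here is a rough Python model of it. It is a sketch only, not how PostgreSQL executes the query: the sample data is invented, and the Python function get_product_names is a hypothetical stand-in for the set-returning function of the same name.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Invented sample data: (id, name) rows and a product lookup keyed by id.
manufacturers = [(1, "Acme"), (2, "Globex"), (3, "Initech")]
products = {1: ["anvil", "rocket"], 3: ["stapler"]}


def get_product_names(manufacturer_id: int) -> Iterator[str]:
    """Stand-in for the set-returning function: yields zero or more rows."""
    yield from products.get(manufacturer_id, [])


def left_join_lateral(
    rows: Iterable[Tuple[int, str]]
) -> Iterator[Tuple[str, Optional[str]]]:
    """Call the function once per outer row, as LATERAL does.

    If the function yields nothing, the LEFT JOIN still emits the outer row,
    padded with None (SQL NULL).
    """
    for m_id, m_name in rows:
        matched = False
        for pname in get_product_names(m_id):
            matched = True
            yield m_name, pname
        if not matched:
            yield m_name, None


joined: List[Tuple[str, Optional[str]]] = list(left_join_lateral(manufacturers))
# Equivalent of: WHERE pname IS NULL
without_products = [name for name, pname in joined if pname is None]
print(without_products)
```

In this model the function is evaluated per manufacturer row and its result is used like a table, which matches reading A in the question.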

How to describe columns (get their names, data types, etc.) of a SQL query in PostgreSQL

I need a way to get a "description" of the columns from a SELECT query (cursor), such as their names, data types, precision, scale, etc., in PostgreSQL (or better yet PL/pgSQL).
I'm transitioning from Oracle PL/SQL, where I can get such description using a built-in procedure dbms_sql.describe_columns. It returns an array of records, one for each column of a given (parsed) cursor.
EDB has it implemented too (https://www.enterprisedb.com/docs/en/9.0/oracompat/Postgres_Plus_Advanced_Server_Oracle_Compatibility_Guide-127.htm#P13324_681237)
An example of such a query:
select col1 from tab where col2 = :a
I need an API (or a workaround) that could be called like this (hopefully):
select query_column_description('select col1 from tab where col2 = :a');
that will return something similar to:
{{"col1","numeric"}}
Why? We build views where these queries become individual columns. For example, view's query would look like the following:
select (select col1 from tab where col2 = t.colA)::numeric as col1
from tab_main t
http://sqlfiddle.com/#!17/21c7a/2
You can use the information schema:
First, create a throwaway view from your query (without the WHERE clause):
create or replace view a_view as
select col1 from tab
Then select its column metadata:
select
row_to_json(t.*)
from (
select
column_name,
data_type
from
information_schema.columns
where
table_schema = 'public' and
table_name = 'a_view'
) as t

Grouping data in postgresql

I have a table with multiple rows that share the same name. I want to show every row in the table, but each name should appear only once: the first row of each group shows the name, and the remaining rows of that group leave it blank while still showing their other columns:
table           expected result
------------    ---------------
col1  col2      col1  col2
a     5         a     5
a     6               6
a     8               8
b     3         b     3
b     4               4
I'm using PostgreSQL 9.2.
You could use row_number to determine the first occurrence of each group, and from there, it's just a case away from not displaying it:
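SELECT CASE rn WHEN 1 THEN col1 ELSE NULL END AS col1, col2
FROM (SELECT col1,
             col2,
             ROW_NUMBER() OVER (PARTITION BY col1
                                ORDER BY col2 ASC) AS rn
      FROM my_table) t
-- sort in the outer query; an ORDER BY inside the subquery is not guaranteed to carry over
ORDER BY t.col1, t.col2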
First, I should say that I have no experience with PostgreSQL, only some basic SQL knowledge. You should not change the data in the original table itself; what you want is a 'view' of the data. This kind of thing is usually done after the data set is returned to the client: it is a question of how to display the data (a presentation matter), so it belongs on the client side rather than in the SQL query. If you do want to make the server handle it, I would create a copy of the table (it can be a temp table) and then clear the col1 values in every row that is not the first of its group when ordered by col2. Note that your table has no primary key, so this will be hard to implement, because you cannot identify the parent record within the subsequent select.
So the idea of achieving what you need on the client side (via a data cursor), simply traversing the records one by one, has even more going for it.
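As a rough illustration of the client-side approach, the following Python sketch blanks out repeated names after the rows have been fetched. The rows are the sample data from the question; the helper name blank_repeats is made up for this example, and an empty string is assumed to be an acceptable "blank".

```python
from itertools import groupby
from operator import itemgetter

# Rows as they might come back from "SELECT col1, col2 FROM my_table".
rows = [("a", 5), ("a", 6), ("a", 8), ("b", 3), ("b", 4)]


def blank_repeats(sorted_rows):
    """Keep col1 only on the first row of each group; blank it on the rest.

    Expects the rows to be sorted by col1 already, because groupby only
    groups adjacent items.
    """
    out = []
    for _, group in groupby(sorted_rows, key=itemgetter(0)):
        for i, (name, value) in enumerate(group):
            out.append((name if i == 0 else "", value))
    return out


display = blank_repeats(sorted(rows))
for name, value in display:
    print(f"{name:<5}{value}")
```

This keeps the query itself a plain SELECT and leaves the formatting decision to whatever code renders the result.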

New table with all columns as differences between two base tables columns

I am using Postgres 9.1 and have two tables (tab1 and tab2). I wish to create a third table (tab3) where each column is the difference between the respective columns in these tables, i.e. tab3.col1 = (tab1.col1 - tab2.col1). However, my tables tab1 and tab2 have a large number of columns. Is there an efficient way to create table tab3?
If I were to hard code my desired output I would plan to use the code below. However I wish to avoid this as I have over 60 columns to create and want to avoid any hard-coding errors. The columns may not be in the same order across the two tables, however the naming is consistent across tables.
CREATE TABLE tab3 AS
SELECT a.col1_01 - b.col2_01 AS col3_01,
a.col1_02 - b.col2_02 AS col3_02,
...
...
FROM tab1 a FULL JOIN tab2 b USING (permno, datadate);
You can build the whole statement from information in the system catalog (or information schema) and execute it dynamically with a DO command. That's what I would do.
DO
$do$
BEGIN
EXECUTE (
SELECT 'CREATE TABLE tab3 AS
SELECT '
|| string_agg(format('a.%1$I - b.%1$I AS %1$I', attname)
, E'\n , ' ORDER BY attnum)
|| '
FROM tab1 a
FULL JOIN tab2 b USING (permno, datadate)'
FROM pg_attribute
WHERE attrelid = 'tab1'::regclass
AND attnum > 0 -- exclude system columns (neg. attnum)
AND NOT attisdropped -- no dropped (dead) columns
);
END
$do$;
This assumes that tab1 and tab2 are visible in your search_path.
It produces and executes exactly what you requested. Just replace the dummy table and column names with your real names.
Read about format() and string_agg() in the manual.
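To see what kind of statement the DO block assembles, the string-building step can be mimicked outside the database. The Python sketch below is only an approximation: quote_ident and build_diff_statement are hypothetical helpers, the column names are invented, and unlike format('%I') in PostgreSQL (which quotes an identifier only when needed) this helper always double-quotes.

```python
def quote_ident(name: str) -> str:
    """Always double-quote an identifier, doubling any embedded quotes.

    Simplification: PostgreSQL's format('%I') quotes only when necessary.
    """
    return '"' + name.replace('"', '""') + '"'


def build_diff_statement(columns):
    """Build the CREATE TABLE ... AS statement for the given column names."""
    # Plays the role of string_agg(format('a.%1$I - b.%1$I AS %1$I', attname), ...)
    select_list = "\n     , ".join(
        "a.{0} - b.{0} AS {0}".format(quote_ident(col)) for col in columns
    )
    return (
        "CREATE TABLE tab3 AS\nSELECT "
        + select_list
        + "\nFROM tab1 a\nFULL JOIN tab2 b USING (permno, datadate)"
    )


# Hypothetical column names; in the DO block they come from pg_attribute.
statement = build_diff_statement(["col_01", "col_02"])
print(statement)
```

Printing the statement before executing it is a cheap way to check the generated column list against what you expect.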
More information in these related answers:
Table name as a PostgreSQL function parameter
Prepend table name to each column in a result set in SQL? (Postgres specifically)