Filter union result - tsql

I'm making select with a union.
SELECT * FROM table_1
UNION
SELECT * FROM table_2
Is it possible to filter query results by column values?

Yes, you can enclose your entire union inside another select:
select * from (
select * from table_1 union select * from table_2) as t
where t.column = 'y'
You have to introduce the alias for the table ("as t"). Also, if the data from the tables is disjoint, you might want to consider switching to UNION ALL - UNION by itself works to eliminate duplicates in the result set. This is frequently not necessary.

A simple to read solution is to use a CTE (common table expression). This takes the form:
WITH foobar AS (
SELECT foo, bar FROM table_1
UNION
SELECT foo, bar FROM table_2
)
Then you can refer to the CTE in subsequent queries by name, as if it were a normal table:
SELECT foo,bar FROM foobar WHERE foo = 'value'
CTEs are quite powerful, I recommend further reading here
One tip that you will not find in that MS article is; if you require more than one CTE put a comma between the expression statements. eg:
WITH foo AS (
SELECT thing FROM place WHERE field = 'Value'
),
bar AS (
SELECT otherthing FROM otherplace WHERE otherfield = 'Other Value'
)

If you want to filter the query based on some criteria then you could do this -
Select * from table_1 where table_1.col1 = <some value>
UNION
Select * from table_2 where table_2.col1 = <some value>
But, I would say if you want to filter result to find the common values then you can use joins instead
Select * from table_1 inner join table_2 on table_1.col1 = table_2.col1

Related

How to query two tables with same schema but different time ranges in a date column

I have two tables:
main_products
old_products
They have the same info and schema with only one difference:
main_products has min(date) = 2022-01 and max(date) = 2022-05
and
old_products has min(date) = 2020-01 and max(date) = 2020-12
How can I query to get all records from old_products + all records from main_products to get products from 2020-01 to 2022-05 ?
The product on both tables has and product_id field.
I tried to join both tables on product_id but the output is a table with twice number of columns.
select t1.*, t2.* from t1
inner join t2
one t1.product_id = t2.product_id
I think you are looking for a UNION or UNION ALL:
SELECT *
FROM t1
WHERE ...
UNION ALL
SELECT *
FROM t2
WHERE ...
If the columns in t1 and t2 are the same (same number of columns and same types), this will pull the data from both of them. Use UNION if you want duplicates removed or UNION ALL to include duplicates. (In your case it won't make a functional difference since the tables don't overlap by date, but UNION ALL will be faster.)
In the above example, you can put your condition (to only get 2022-01 to 2022-05) in both WHERE conditions. If you don't like repeating the condition, you can use the UNION ALL query in a subquery with the condition outside:
SELECT *
FROM
(
SELECT *
FROM t1
UNION ALL
SELECT *
FROM t2
) sq
WHERE ...

Using an array variable in WHERE NOT IN -like clause

I have declared an array variable as
select array(select "account_id" from [some where clause]) as negateArray;
Note that I return a single column. Now I need to use it in another WHERE-clause that should mean: do NOT include data if account_id matches any of found in negateArray.
I write something like
select * from [some other where clause] WHERE (account_id NOT IN ANY(negateArray));
but I am getting a syntax error. What would be the correct way to write the condition for PostgreSQL?
In order to let the optimizer do its best - convert the not in condition to left [outer] join. Here is an equivalent SQL-only version rewrite:
select t2.*
from outer_table t2
left join (select account_id from inner_table where [some where clause on inner_table]) t1
on t2.account_id = t1.account_id
where [some other where clause on outer_table]
AND t1.account_id IS NULL;
t1.account_id IS NULL does the not in job.
Edit
Equivalent but shorter (and probably more efficient) using [inner] join and inverted condition:
select t2.*
from outer_table t2
join (select account_id from inner_table where NOT [some where clause on inner_table]) t1
on t2.account_id = t1.account_id
where [some other where clause on outer_table];
If you want to use variables, then you need to do this using procedure/function.
First, define variable as negateArray int[]
Then try:
select array(select "account_id" from [some where clause]) INTO negateArray;
and then:
select * from [some other where clause] WHERE (account_id <> ANY(negateArray));
EDIT
For "pure" SQL, you don't need variable, just use query like this:
select * from [some other where clause] -- your second query
WHERE account_id NOT IN(
select "account_id" from [some where clause] -- your first query
);

SQL left join on maximum date

I have two tables: contracts and contract_descriptions.
On contract_descriptions there is a column named contract_id which is equal on contracts table records.
I am trying to join the latest record on contract_descriptions:
SELECT *
FROM contracts c
LEFT JOIN contract_descriptions d ON d.contract_id = c.contract_id
AND d.date_description =
(SELECT MAX(date_description)
FROM contract_descriptions t
WHERE t.contract_id = c.contract_id)
It works, but is it the performant way to do it? Is there a way to avoid the second SELECT?
You could also alternatively use DISTINCT ON:
SELECT * FROM contracts c LEFT JOIN (
SELECT DISTINCT ON (cd.contract_id) cd.* FROM contract_descriptions cd
ORDER BY cd.contract_id, cd.date_description DESC
) d ON d.contract_id = c.contract_id
DISTINCT ON selects only one row per contract_id while the sort clause cd.date_description DESC ensures that it is always the last description.
Performance depends on many values (for example, table size). In any case, you should compare both approaches with EXPLAIN.
Your query looks okay to me. One typical way to join only n rows by some order from the other table is a lateral join:
SELECT *
FROM contracts c
CROSS JOIN LATERAL
(
SELECT *
FROM contract_descriptions cd
WHERE cd.contract_id = c.contract_id
ORDER BY cd.date_description DESC
FETCH FIRST 1 ROW ONLY
) cdlast;

How Dynamicaly columns in UNPIVOT operator

I currently have the following query:
WITH History AS (
SELECT
kz.*,
kz.__$operation AS operation,
map.tran_begin_time as beginT,
map.tran_end_time as endT
FROM cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all') AS kz
INNER JOIN [cdc].[lsn_time_mapping] map
ON kz.[__$start_lsn] = map.start_lsn
where kz.GUID_BalanceHC_Zalezh = 'DDA9AB3A-A0AF-4623-9362-0000C8C83D63'
),
UnpivotedValues AS(
SELECT guid, GUID_another, field, val, operation, beginT, endT
FROM History
UNPIVOT ( [val] FOR field IN
(
area,
oilwidthmin,
oilwidthmax,
efectivwidthmin,
efectivwidthmax,
etc...
))t
),
UnpivotedWithLastValue AS (
SELECT
*,
--Use LAG() to get the last value for the same field
LAG(val, 1) OVER (PARTITION BY guid, GUID_another, field ORDER BY BeginT) LastVal
FROM UnpivotedValues
)
SELECT * FROM UnpivotedWithLastValue WHERE val <> LastVal OR LastVal IS NULL ORDER BY guid
This query returns the changed values for a single table that has CDC (Change Data Capture) enabled.
I want to create a stored procedure that receives the columns to be unpivoted, and the cdc function (e.g. cdc.fn_cdc_get_all_...) as parameters and returns the result set.
The result for this tables must be joined in one report.
In my case parameter 1 is cdc.fn_cdc_get_all_changes_dbo_EXT_GeolObject_KategZalezh(sys.fn_cdc_get_min_lsn('dbo_EXT_GeolObject_KategZalezh'), sys.fn_cdc_get_max_lsn(), 'all'). This is the CDC function.
How should I send the list of fields that i want in the result? How's the string?
Also, is there a way to do without dynamic SQL? Dynamic SQL it is not better solution for performance.
As you know SQL Server is declarative by design and does not support macro substitution.
UNPIVOT would clearly be more performant, but here is a simplified example of a UNPIVOT which does not require Dynamic SQL, but only a little XML.
Example
Let's assume your table/results looks like this:
You may notice that I only we only specify key fields to EXCLUDE in the final WHERE
Declare #YourData table (ID int,Active bit,First_Name varchar(50),Last_Name varchar(50),EMail varchar(50),Salary decimal(10,2))
Insert into #YourData values
(1,1,'John','Smith','john.smith#email.com',85600),
(2,0,'Jane','Doe' ,'jane.doe#email.com',83200)
;with cte as (
-- Replace with your Complex Query
Select * from #YourData
)
Select A.ID
,A.Active
,C.*
From cte A
Cross Apply (Select XMLData=cast((Select A.* for XML RAW) as xml)) B
Cross Apply (
Select Item = attr.value('local-name(.)','varchar(100)')
,Value = attr.value('.','varchar(max)')
From XMLData.nodes('/row') C1(n)
Cross Apply C1.n.nodes('./#*') C2(attr)
Where attr.value('local-name(.)','varchar(100)') not in ('ID','Active')
) C
Returns

How to perform "a UNION b" when a and b are CTEs?

If I try to UNION (or INTERSECT or EXCEPT) a common table expression I get a syntax error near the UNION. If instead of using the CTE I put the query into the union directly, everything works as expected.
I can work around this but for some more complicated queries using CTEs makes things much more readable. I also just don't like not knowing why something is failing.
As an example, the following query works:
SELECT *
FROM
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.start_point_oid
UNION
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.end_point_oid
) AS allpoints
;
But this one fails with:
ERROR: syntax error at or near "UNION"
LINE 20: UNION
WITH
startpoints AS
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.start_point_oid
),
endpoints AS
(
SELECT oid, route_group
FROM runs, gpspoints
WHERE gpspoints.oid = runs.end_point_oid
)
SELECT *
FROM
(
startpoints
UNION
endpoints
) AS allpoints
;
The data being UNIONed together is identical but one query fails and the other does not.
I'm running PostgreSQL 9.3 on Windows 7.
The problem is because CTEs are not direct text-substitutions and a UNION b is invalid SELECT syntax. The SELECT keyword is a mandatory part of the parsing and the syntax error is raised before the CTEs are even taken into account.
This is why
SELECT * FROM a
UNION
SELECT * FROM b
works; the syntax is valid, and then the CTEs (represented by a and b) are then used at the table-position (via with_query_name).
At least in SQL Server, I can easily do this - create two CTE's, and do a SELECT from each, combined with a UNION:
WITH FirstNames AS
(
SELECT DISTINCT FirstName FROM Person
), LastNames AS
(
SELECT DISTINCT LastName FROM Person
)
SELECT * FROM FirstNames
UNION
SELECT * FROM LastNames
Not sure if this works in Postgres, too - give it a try!