Correlated subqueries vs. immutable functions - postgresql

create table t1 as
select 100 col1, '{A,B,C}'::character varying[] col2
union all
select 200, '{A,B,C}'::character varying[]
union all
select 150, '{X,Y,Z}'::character varying[]
union all
select 250, '{X,Y,Z}'::character varying[];
create table t2 as
select 'A' col1, 10 col2
union all
select 'B', 20
union all
select 'C', 25
union all
select 'X', 15
union all
select 'Y', 10
union all
select 'Z', 20;
Consider this query:
select t1.col1,
(select sum(col2)
from t2
where t2.col1 = any(t1.col2))
from t1;
My understanding is that if I implement that subquery as a function call instead and declare the function IMMUTABLE, it will be executed only twice (once per distinct argument value) instead of four times.
Is the same true for correlated subqueries?
Does the planner compare the contents of the input arrays for this purpose?
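For concreteness, such a function might look something like this (a sketch; the function name is made up, and note that a function reading from t2 is not truly IMMUTABLE, so declaring it as such misinforms the planner):
create function t2_sum(keys character varying[]) returns bigint
language sql immutable as $$
  select sum(col2) from t2 where col1 = any(keys)
$$;
select col1, t2_sum(col2) from t1;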

PostgreSQL will not cache the results in either case, neither with a function nor with a subquery; run EXPLAIN to convince yourself.
That's why, for this kind of query, you should use a join rather than a subselect, to avoid a nested loop.
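A join version of the query above could look roughly like this (untested sketch; col2_sum is just a label, and it assumes t1.col1 is unique, as in the sample data):
select t1.col1, sum(t2.col2) as col2_sum
from t1
left join t2 on t2.col1 = any(t1.col2)
group by t1.col1;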

Related

Sort two csv fields by removing duplicates and without row-by-row processing

I am trying to combine two csv fields, eliminate duplicates, sort the values, and store the result in a new field.
I was able to achieve this. However, I encountered a scenario where the values are like abc and abc*; I need to keep abc* and remove the other.
Could this be achieved without row-by-row processing?
Here is what I have.
CREATE TABLE csv_test
(
Col1 VARCHAR(100),
Col2 VARCHAR(100),
Col3 VARCHAR(500)
);
INSERT dbo.csv_test (Col1, Col2)
VALUES ('xyz,def,abc', 'abc*,tuv,def,xyz*,abc'), ('qwe,bca,a23', 'qwe,bca,a23*,abc')
--It is assumed that there are no spaces around commas
SELECT Col1, Col2, Col1 + ',' + Col2 AS Combined_NonUnique_Unsorted,
STUFF((
SELECT ',' + Item
FROM (SELECT DISTINCT Item FROM dbo.DelimitedSplit8K(Col1 + ',' + Col2,',')) t
ORDER BY Item
FOR XML PATH('')
),1,1,'') Combined_Unique_Sorted
, ExpectedResult = 'Keep the one with * and make it unique'
FROM dbo.csv_test;
--Expected results: if there are values like abc and abc*, I need to keep abc* and remove abc
--How can I achieve this without looping or using temp tables?
abc,abc*,def,tuv,xyz,xyz* -> abc*,def,tuv,xyz*
a23,a23*,abc,bca,qwe -> a23*,abc,bca,qwe
Well, since you agree that normalizing the database is the correct thing to do, I decided to try to come up with a solution for you.
I ended up with quite a cumbersome solution involving 4(!) common table expressions - cumbersome, but it works.
The first cte is to add a row identifier missing from your table - I've used ROW_NUMBER() OVER(ORDER BY Col1, Col2) for that.
The second cte is to get a unique set of values from combining both csv columns. Note that this does not handle the * part yet.
The third cte is handling the * issue.
And finally, the fourth cte puts all the unique items back into a single csv. (I could have done it in the third cte, but I wanted each cte to be responsible for a single part of the solution - it's much more readable.)
Now all that's left is to update the first cte's Col3 with the fourth cte's Combined_Unique_Sorted:
;WITH cte1 as
(
SELECT Col1,
Col2,
Col3,
ROW_NUMBER() OVER(ORDER BY Col1, Col2) As rn
FROM dbo.csv_test
), cte2 as
(
SELECT rn, Item
FROM cte1
CROSS APPLY
(
SELECT DISTINCT Item
FROM dbo.DelimitedSplit8K(Col1 +','+ Col2, ',')
) x
), cte3 AS
(
SELECT rn, Item
FROM cte2 t0
WHERE NOT EXISTS
(
SELECT 1
FROM cte2 t1
WHERE t0.Item + '*' = t1.Item
AND t0.rn = t1.rn
)
), cte4 AS
(
SELECT rn,
STUFF
((
SELECT ',' + Item
FROM cte3 t1
WHERE t1.rn = t0.rn
ORDER BY Item
FOR XML PATH('')
), 1, 1, '') Combined_Unique_Sorted
FROM cte3 t0
)
UPDATE t0
SET Col3 = Combined_Unique_Sorted
FROM cte1 t0
INNER JOIN cte4 t1 ON t0.rn = t1.rn
To verify the results:
SELECT *
FROM csv_test
ORDER BY Col1, Col2
Results:
Col1         Col2                   Col3
qwe,bca,a23  qwe,bca,a23*,abc       a23*,abc,bca,qwe
xyz,def,abc  abc*,tuv,def,xyz*,abc  abc*,def,tuv,xyz*
You can see a live demo on rextester.

(impala) AnalysisException: Subqueries are not supported in the select list

I have a query like this, and apparently Impala doesn't support subqueries in the select list. How can I neatly rewrite it in Impala?
SELECT
col1,
col2,
...
CASE
WHEN (SELECT 1
FROM
table1 x,
table2 y
WHERE
x.id = y.id
LIMIT 1) = 1
THEN
'A'
ELSE
'B'
END
coln
FROM
...
Your query has the following error(s):
AnalysisException: Subqueries are not supported in the select list.
You could try
SELECT col1, col2, ... 'A' coln
FROM ...
WHERE EXISTS (SELECT 1 FROM table1 x, table2 y WHERE x.id = y.id LIMIT 1)
UNION ALL
SELECT col1, col2, ... 'B' coln
FROM ...
WHERE NOT EXISTS (SELECT 1 FROM table1 x, table2 y WHERE x.id = y.id LIMIT 1)
No guarantees, haven't tried it myself.
In general, a cleaner solution is to place the subqueries in the FROM clause and link them to the main table through inner or left joins. I usually do this when dealing with complex types in Impala.
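As a rough sketch of that pattern (main_table and the id/match_count names here are placeholders, not taken from the question):
SELECT t.col1, t.col2, s.match_count
FROM main_table t
LEFT JOIN (
  SELECT x.id, COUNT(*) AS match_count
  FROM table1 x
  JOIN table2 y ON x.id = y.id
  GROUP BY x.id
) s ON s.id = t.id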
However, in your specific example you are trying to do a left join, defining a field for each row which indicates whether the join was successful ('A') or not ('B'). In this case you could do the following:
SELECT
x.id, x.col2, x.col3, ...
CASE
WHEN y.id IS NOT NULL THEN 'A'
ELSE 'B'
END
coln
FROM table1 x LEFT JOIN
table2 y USING (id)
...

Postgresql rows to columns (UNION ALL to JOIN)

Hello, with this query I'm getting one result set with four rows. How can I change it so that I get four named columns, each with its own result?
SELECT COUNT(*) FROM vehicles WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM user WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM vehicle_events WHERE cus=1
UNION ALL
SELECT COUNT(*) FROM vehicle_alerts WHERE cus=1
Thanks in advance.
SELECT a.ct veh_count, b.ct user_count, c.ct event_count, d.ct alert_count
FROM
( SELECT COUNT(*) ct FROM vehicles WHERE cus=1 ) a,
( SELECT COUNT(*) ct FROM user WHERE cus=1 ) b,
( SELECT COUNT(*) ct FROM vehicle_events WHERE cus=1 ) c,
( SELECT COUNT(*) ct FROM vehicle_alerts WHERE cus=1 ) d;
UNION only adds rows; it has no effect on the columns.
Columns, which define the "shape" of the row tuples, must appear as selected columns¹.
For example:
SELECT
(SELECT COUNT(*) FROM vehicles WHERE cus=1) as veh_count
,(SELECT COUNT(*) FROM users WHERE cus=1) as user_count
..
¹ There are other constructs that can allow this, see crosstab for example - but the columns are fixed by the query command. It takes dynamic SQL to get a variable number of columns.
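To illustrate the footnote, a minimal crosstab sketch might look like this (it assumes the tablefunc extension is available; source_data and its columns are made up). The point is that the output columns still have to be spelled out by hand in the AS clause:
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- pivot rows of (row_name, category, value) into one output row per row_name
SELECT *
FROM crosstab(
  'SELECT row_name, category, value FROM source_data ORDER BY 1, 2'
) AS ct(row_name text, cat_a int, cat_b int, cat_c int);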

PostgreSQL - How to get distinct on two columns separately?

I've a table like this:
Source table "tab"
column1 column2
x 1
x 2
y 1
y 2
y 3
z 3
How can I build a query to get a result with unique values in each of the two columns separately? For example, I'd like to get a result like one of these sets:
column1 column2
x 1
y 2
z 3
or
column1 column2
x 2
y 1
z 3
or ...
Thanks.
What you're asking for is difficult because it's unusual: SQL treats a row as a set of related fields, but you're asking to make two separate lists (the distinct values of col1 and the distinct values of col2) and then display them in one output table without caring how the rows match up.
You can do this by writing the SQL along those lines: write a separate SELECT DISTINCT for each column, give each row in each result set a row number, then join the two results on that row number.
It's not clear what you want null to mean. Does it mean there's a null in one of the columns, or that there isn't the same number of distinct values in each column? This is one of the problems that comes from asking for things that don't match up with typical relational logic.
Here's an example. I've removed the null value from the data since it confuses the issue, used different data values so that rowNumber isn't confused with the data, and arranged things so there are 3 distinct values in one column and 4 in the other. This works for SQL Server; presumably there's a variation for PostgreSQL.
if object_id('mytable') is not null drop table mytable;
create table mytable ( col1 nvarchar(10) null, col2 nvarchar(10) null)
insert into mytable
select 'x', 'a'
union all select 'x', 'b'
union all select 'y', 'c'
union all select 'y', 'b'
union all select 'y', 'd'
union all select 'z', 'a'
select c1.col1, c2.col2
from
-- derived table giving distinct values of col1 and a rownumber column
( select col1
, row_number() over (order by col1) as rowNumber
from ( select distinct col1 from mytable ) x ) as c1
full outer join
-- derived table giving distinct values of col2 and a rownumber column
( select col2
, row_number() over (order by col2) as rowNumber
from ( select distinct col2 from mytable ) x ) as c2
on c1.rowNumber = c2.rowNumber
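A PostgreSQL version of the same query might look like this (untested sketch; only the setup statements differ, the SELECT itself is standard SQL):
DROP TABLE IF EXISTS mytable;
CREATE TABLE mytable (col1 varchar(10) NULL, col2 varchar(10) NULL);

INSERT INTO mytable (col1, col2) VALUES
  ('x', 'a'), ('x', 'b'), ('y', 'c'),
  ('y', 'b'), ('y', 'd'), ('z', 'a');

SELECT c1.col1, c2.col2
FROM
  -- derived table giving distinct values of col1 and a rownumber column
  (SELECT col1, row_number() OVER (ORDER BY col1) AS rowNumber
   FROM (SELECT DISTINCT col1 FROM mytable) x) AS c1
FULL OUTER JOIN
  -- derived table giving distinct values of col2 and a rownumber column
  (SELECT col2, row_number() OVER (ORDER BY col2) AS rowNumber
   FROM (SELECT DISTINCT col2 FROM mytable) x) AS c2
  ON c1.rowNumber = c2.rowNumber;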

postgres output query within with clause

I'm trying to get the output of the queries within the WITH clause of my final query as csv or some sort of text files. I only have query access; I'm not allowed to create tables in this database. I have a set of queries that do some calculations on a data set, another set of queries that compute on the previous set, and yet another that calculates on the final set. I don't want to run all of it as three separate queries, because the results from the first two are already part of the last one.
WITH
Q1 AS(
SELECT col1, col2, col3, col4, col5, col6, col7
FROM table1
),
Q2 AS(
SELECT AVG(col1) as col1Avg, MAX(col1) as col1Max, col2, col3, col4
FROM Q1
GROUP BY col2, col3, col4
)
SELECT
AVG(col1AVG), col3
FROM
Q2
GROUP BY col3
I would like the results from Q1, Q2 and the final select statement as preferably 3 csv files but I could live with all of it in one csv file. Is this possible?
Thanks!
Edit: Just to clarify, the columns from the queries are very different. I'm definitely pulling more columns from my first query than my second. I've edited the above code a bit to make this more clear.
To combine all the results together you'd use UNION ALL, but the number and data types of the columns must match.
select col1, col2, col3
from blah
union all
select col1, col2, col3
from blah2
union all
... etc
You can reference CTEs in there, of course ...
with
cte_1 as (
select ... from ...),
cte_2 as (
select ... from ... cte_1),
cte_3 as (
select ... from ... cte_2)
select col1, col2, col3
from cte_1
union all
select col1, col2, col3
from cte_2
union all
select col1, col2, col3
from cte_3
If your final output is a csv then it looks like you have multiple row formats in there -- checksums? If so, in the queries that you union all together you might like to combine all the columns from each query into one string ...
with
cte_1 as (
select ... from ...),
cte_2 as (
select ... from ... cte_1),
cte_3 as (
select ... from ... cte_2)
select col1||','||col2||','||col3
from cte_1
union all
select col1||','||col2
from cte_2
union all
select col1
from cte_3
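Applied to the query in the question, that could look roughly like this (untested sketch; concat_ws is used instead of || so the mismatched column lists and types don't need explicit casts, and the src label is just there to show which CTE each line came from):
WITH
Q1 AS (
  SELECT col1, col2, col3, col4, col5, col6, col7
  FROM table1
),
Q2 AS (
  SELECT AVG(col1) AS col1Avg, MAX(col1) AS col1Max, col2, col3, col4
  FROM Q1
  GROUP BY col2, col3, col4
)
SELECT 'Q1' AS src, concat_ws(',', col1, col2, col3, col4, col5, col6, col7) AS csv_line
FROM Q1
UNION ALL
SELECT 'Q2', concat_ws(',', col1Avg, col1Max, col2, col3, col4)
FROM Q2
UNION ALL
SELECT 'final', concat_ws(',', AVG(col1Avg), col3)
FROM Q2
GROUP BY col3;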