PostgreSQL - How to get distinct on two columns separately?

I have a table like this:
Source table "tab"
column1 column2
x 1
x 2
y 1
y 2
y 3
z 3
How can I build a query that returns the unique values of each of the two columns separately? For example, I'd like to get a result like one of these sets:
column1 column2
x 1
y 2
z 3
or
column1 column2
x 2
y 1
z 3
or ...
Thanks.

What you're asking for is difficult because it's unusual: SQL treats a row as a set of related fields, but you're asking to build two separate lists (the distinct values of col1 and the distinct values of col2) and then display them in one output table without caring how the rows line up.
You can do this by writing the SQL along exactly those lines: write a separate SELECT DISTINCT for each column, then put them together. I'd put them together by giving each row in each result set a row number, then joining the two result sets on that number.
It's not clear what you want NULL to mean. Does it mean there's a NULL in one of the columns, or that there isn't the same number of distinct values in each column? This is one of the problems that comes from asking for things that don't match up with typical relational logic.
Here's an example. I've removed the NULL value from the data since it confuses the issue, used different data values so the rowNumber column isn't confused with the data, and arranged for 3 distinct values in one column and 4 in the other. This is written for SQL Server; a PostgreSQL variation follows the query.
if object_id('mytable') is not null drop table mytable;
create table mytable ( col1 nvarchar(10) null, col2 nvarchar(10) null)
insert into mytable
select 'x', 'a'
union all select 'x', 'b'
union all select 'y', 'c'
union all select 'y', 'b'
union all select 'y', 'd'
union all select 'z', 'a'
select c1.col1, c2.col2
from
-- derived table giving distinct values of col1 and a rownumber column
( select col1
, row_number() over (order by col1) as rowNumber
from ( select distinct col1 from mytable ) x ) as c1
full outer join
-- derived table giving distinct values of col2 and a rownumber column
( select col2
, row_number() over (order by col2) as rowNumber
from ( select distinct col2 from mytable ) x ) as c2
on c1.rowNumber = c2.rowNumber
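Since the original question is about PostgreSQL, here is a sketch of the same approach there (the window-function part carries over unchanged; only the setup DDL differs):
drop table if exists mytable;
create table mytable ( col1 text, col2 text );
insert into mytable values
('x', 'a'), ('x', 'b'), ('y', 'c'), ('y', 'b'), ('y', 'd'), ('z', 'a');
select c1.col1, c2.col2
from
-- derived table giving distinct values of col1 and a rownumber column
( select col1
, row_number() over (order by col1) as rowNumber
from ( select distinct col1 from mytable ) x ) as c1
full outer join
-- derived table giving distinct values of col2 and a rownumber column
( select col2
, row_number() over (order by col2) as rowNumber
from ( select distinct col2 from mytable ) x ) as c2
on c1.rowNumber = c2.rowNumber;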

Related

In Sql Server 2008, can I INSERT multiple rows with some fixed column values and some from a SELECT statement that uses one of the fixed values?

I’m building an insert statement dynamically to add multiple rows of data to a table in a batch, as I believe it is more efficient than inserting one at a time. However, I need the last couple of columns in each inserted row to be set with the results of querying another table using a value from the new row. This is my imaginary pseudocode version:
INSERT INTO TableA (column1, column2, column3, column4, column5)
VALUES (SELECT {value1a}, {value1b}, {value1c}, b.column1, b.column2 FROM TableB b WHERE b.column3 = {value1c}),
(SELECT {value2a}, {value2b}, {value2c}, b.column1, b.column2 FROM TableB b WHERE b.column3 = {value2c}),
…
Now here is another wrinkle: I have a unique index on TableA with an ignore clause, and there is a lot of redundant data to process, so only about 15% of the rows in any given batch insert will actually be added to the database. Does this mean it would be more efficient to insert the rows with values for columns 1 – 3, then query for the rows that were inserted, and update column 4 and 5? If so, would the following be the most efficient way to do that for all the inserted rows?
UPDATE a SET a.column4 = b.column1, a.column5 = b.column2
FROM TableA a INNER JOIN TableB b ON b.column3 = a.column3
WHERE a.CreatedAt >= {BatchInsertTime}
(assuming no other processes are adding rows to the table)
For better efficiency and a simpler way to join TableB, send all the TableA rows in a JSON doc, e.g.
insert into TableA (column1, column2, column3, column4, column5) …
select d.*, b.column1 column4, b.column2 column5
from openjson(@json)
with
(
column1 varchar(20),
column2 int,
column3 varchar(20)
) as d
left join TableB b
on b.column3 = d.column3
where @json is an NVARCHAR(MAX) parameter that looks like
[
{"column1":"foo", "column2":3,"column3":"bar" },
{"column1":"foo2", "column2":4,"column3":"bar2" },
. . .
]
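One caveat with this approach: OPENJSON requires the database compatibility level to be 130 or higher.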

select distinct values in multiple column and save in common column with column tags

I have a table in postgres with two columns:
col1 col2
a a
b c
d e
f f
I would like to take the distinct values across the two columns, combine them into a single column, and tag each value with the name of the column (or columns) it comes from. The desired output is:
col source
a col1, col2
b col1
c col2
d col1
e col2
f col1, col2
I am able to find the distinct values in each individual column, but I am not able to combine them into a single column and add the source label.
Below is the query I am using:
select distinct on (col1, col2) col1, col2 from table
Any suggestions would be really helpful.
You can un-pivot the columns and then aggregate them back:
select u.value, string_agg(distinct u.source, ',' order by u.source)
from data
cross join lateral (
values('col1', col1), ('col2', col2)
) as u(source, value)
group by u.value
order by u.value;
Alternatively, if you don't want to list each column, you can convert the row to a JSON value and then un-pivot that:
select x.value, string_agg(distinct x.source, ',' order by x.source)
from data d
cross join lateral jsonb_each_text(to_jsonb(d)) as x(source, value)
group by x.value
order by x.value;
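On the sample data above, either query produces exactly the desired output: one row per distinct value, with a comma-separated list of the columns it appears in.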

PostgreSQL join to denormalize a table with generate_series

I have this table:
CREATE TABLE "mytable"
( name text, count integer );
INSERT INTO mytable VALUES ('john', 4),('mark',2),('albert',3);
and I would like to "denormalize" the rows in this way:
SELECT name FROM mytable JOIN generate_series(1,4) tmp(a) ON (a<=count)
so that I have a number of rows for each name equal to the count column: 4 rows with john, 2 with mark and 3 with albert.
But I can't use the generate_series() function if I don't know the highest count (in this case 4). Is there a way to do this without knowing MAX(count)?
select name,
generate_series(1,count)
from mytable;
Set-returning functions can be used in the select list and will do a cross join with the row retrieved from the base table.
I think this is an undocumented behaviour that might go away in the future, but I'm not sure about that (I recall some discussion regarding it on the mailing list).
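For what it's worth, later releases pinned this down: since PostgreSQL 10 a set-returning function in the select list is evaluated as if in a LATERAL join, and since 9.3 you can write the join explicitly and not rely on the select-list behaviour at all:
select m.name
from mytable m
cross join lateral generate_series(1, m.count) as g(i);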
DROP TABLE IF EXISTS ztable;
CREATE TABLE ztable (zname varchar, zvalue INTEGER NOT NULL);
INSERT INTO ztable(zname, zvalue) VALUES ( 'one', 1), ( 'two', 2 ), ( 'three', 3), ( 'four', 4 );
WITH expand AS (
WITH RECURSIVE zzz AS (
-- base case: rank 1 for every name
SELECT 1::integer AS rnk , t0.zname
FROM ztable t0
UNION
-- recursive step: keep incrementing the rank while it stays below zvalue;
-- UNION (not UNION ALL) deduplicates, preventing a combinatorial explosion of rows
SELECT 1+rr.rnk , t1.zname
FROM ztable t1
JOIN zzz rr ON rr.rnk < t1.zvalue
)
SELECT zzz.zname
FROM zzz
)
SELECT x.*
FROM expand x
;

how to convert single line SQL result into multiple rows?

I am developing a T-SQL query in SSMS 2008 R2 which returns only one row. The problem is that this one row contains four fields which I instead want returned as separate rows. For example, my output line looks like:
Col. 1 Col. 2 Col. 3 Col. 4
xxxx yyyy zzzz aaaa
Instead, I want this to look like:
Question Answer
Col. 1 xxxx
Col. 2 yyyy
Col. 3 zzzz
Col. 4 aaaa
I have tried using the UNPIVOT operator for this, but it is not doing the above. How can I achieve this?
You should be able to use UNPIVOT for this.
Here is a static unpivot, where you hard-code the column names:
create table t1
(
col1 varchar(5),
col2 varchar(5),
col3 varchar(5),
col4 varchar(5)
)
insert into t1 values ('xxxx', 'yyyy', 'zzzz', 'aaaa')
select question, answer
FROM t1
unpivot
(
answer
for question in (col1, col2, col3, col4)
) u
drop table t1
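One caveat: UNPIVOT requires every column in the IN list to have the same data type (here they are all varchar(5)); if your columns differ, CAST them to a common type in a derived table first.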
But you can also use a dynamic unpivot:
DECLARE @cols AS NVARCHAR(MAX),
@query AS NVARCHAR(MAX);
select @cols = stuff((select ','+quotename(C.name)
from sys.columns as C
where C.object_id = object_id('t1') and
C.name like 'Col%'
for xml path('')), 1, 1, '')
set @query = 'SELECT question, answer
from t1
unpivot
(
answer
for question in (' + @cols + ')
) p '
execute(@query)
This uses my own data and column names, but it is tested:
select 'sID', sID as 'val'
from [CSdemo01].[dbo].[docSVsys]
where sID = 247
union
select 'sParID', sParID as 'val'
from [CSdemo01].[dbo].[docSVsys]
where sID = 247 ;
But UNPIVOT should work.
UNION-ing together your four questions (against the t1 table from above) would look like:
SELECT 'column1' AS Question, MAX(column1) AS Answer FROM t1 UNION
SELECT 'column2' , MAX(column2) FROM t1 UNION
SELECT 'column3' , MAX(column3) FROM t1 UNION
SELECT 'column4' , MAX(column4) FROM t1
(obviously just using MAX as an example)

Aggregate GREATEST in T-SQL

My SQL is rusty -- I have a simple requirement to calculate the sum of the greater of two column values:
CREATE TABLE [dbo].[Test]
(
column1 int NOT NULL,
column2 int NOT NULL
);
insert into Test (column1, column2) values (2,3)
insert into Test (column1, column2) values (6,3)
insert into Test (column1, column2) values (4,6)
insert into Test (column1, column2) values (9,1)
insert into Test (column1, column2) values (5,8)
In the absence of the GREATEST function in SQL Server, I can get the larger of the two columns with this:
select column1, column2, (select max(c)
from (select column1 as c
union all
select column2) as cs) Greatest
from test
And I was hoping that I could simply sum them thus:
select sum((select max(c)
from (select column1 as c
union all
select column2) as cs))
from test
But no dice:
Msg 130, Level 15, State 1, Line 7
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
Is this possible in T-SQL without resorting to a procedure/temp table?
UPDATE: Eran, thanks - I used this approach. My final expression is a little more complicated, however, and I'm wondering about performance in this case:
SUM(CASE WHEN ABS(column1 * column2) > ABS(column3 * column4)
THEN column5 * ABS(column1 * column2) * column6
ELSE column5 * ABS(column3 * column4) * column6 END)
Try this:
SELECT SUM(CASE WHEN column1 > column2
THEN column1
ELSE column2 END)
FROM test
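As a side note, if you're on SQL Server 2022 or later, GREATEST is now built in, so the workaround above collapses to:
SELECT SUM(GREATEST(column1, column2))
FROM Test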
Try this... It's not the best-performing option, but should work. (Note, though, that it returns the larger of the two column sums, which is not the same thing as the sum of the row-wise larger values.)
SELECT
'LargerValue' = CASE
WHEN SUM(column1) >= SUM(column2) THEN SUM(column1)
ELSE SUM(column2)
END
FROM Test
SELECT
SUM(MaximumValue)
FROM (
SELECT
CASE WHEN column1 > column2
THEN
column1
ELSE
column2
END AS MaximumValue
FROM
Test
) A
FYI, the more complicated case should be fine, as long as all of those columns are part of the same table. It's still reading the same number of rows, so performance should be very similar to the simpler case (SQL Server performance is usually I/O-bound).
How to find the max from single-row data
-- eg (emplid, date1, date2, date3)
select tmp.emplid, max(tmp.a)
from
(select emplid, date1 as a from table
union
select emplid, date2 from table
union
select emplid, date3 from table
) tmp
group by tmp.emplid
select sum(id) from (
select (select max(c)
from (select column1 as c
union all
select column2) as cs) id
from test
) t
The best answer to this is, simply put:
;With Greatest_CTE As
(
Select ( Select Max(ValueField) From ( Values (column1), (column2) ) ValueTable(ValueField) ) Greatest
From Test
)
Select Sum(Greatest)
From Greatest_CTE
It scales a lot better than the other answers with more than two value columns.
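With a third value column, for example, the inner list simply becomes ( Values (column1), (column2), (column3) ) ValueTable(ValueField).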