PostgreSQL grouping - postgresql

I would like to group values according to values in over columns.
This is an example:
I would like to get the output:
{{-30,-50,20},{-20,30,60},{-30,NULL or other value, 20}}
I managed to arrive to:
SELECT array_agg("val")
FROM my_table
WHERE "t_id" = 1
GROUP BY "m_id";
{{-30,-50,20},{-20,30,60},{-30,20}}
What would be the best approach?

create table my_table (
t_id int,
m_id int,
s_id int,
val int
);
insert into my_table (t_id, m_id, s_id, val) values
(1,1,1,-30),
(1,1,2,-50),
(1,1,3,20),
(1,2,1,-20),
(1,2,2,30),
(1,2,3,60),
(1,3,1,-30),
(1,3,3,20);
select array_agg(val order by s_id)
from
my_table t
right join
(
(
select distinct t_id, m_id
from my_table
) a
cross join
(
select distinct s_id
from my_table
) b
) s using (t_id, m_id, s_id)
where t_id = 1
group by m_id
order by m_id
;
array_agg
---------------
{-30,-50,20}
{-20,30,60}
{-30,NULL,20}

Related

Is it possible to find duplicating records in two columns simultaneously in PostgreSQL?

I have the following database schema (oversimplified):
create sequence partners_partner_id_seq;
create table partners
(
partner_id integer default nextval('partners_partner_id_seq'::regclass) not null primary key,
name varchar(255) default NULL::character varying,
company_id varchar(20) default NULL::character varying,
vat_id varchar(50) default NULL::character varying,
is_deleted boolean default false not null
);
INSERT INTO partners(name, company_id, vat_id) VALUES('test1','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test2','1010109191191', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test3','3214567890102', 'BG1010109191192');
INSERT INTO partners(name, company_id, vat_id) VALUES('test4','9999999999999', 'GE9999999999999');
I am trying to figure out how to return test1, test2 (because the company_id column value duplicates vertically) and test3 (because the vat_id column value duplicates vertically as well).
To put it in other words - I need to find duplicating company_id and vat_id records and group them together, so that test1, test2 and test3 would be together, because they duplicate by company_id and vat_id.
So far I have the following query:
SELECT *
FROM (
SELECT *, LEAD(row, 1) OVER () AS nextrow
FROM (
SELECT *, ROW_NUMBER() OVER (w) AS row
FROM partners
WHERE is_deleted = false
AND ((company_id != '' AND company_id IS NOT null) OR (vat_id != '' AND vat_id IS NOT NULL))
WINDOW w AS (PARTITION BY company_id, vat_id ORDER BY partner_id DESC)
) x
) y
WHERE (row > 1 OR nextrow > 1)
AND is_deleted = false
This successfully shows all company_id duplicates, but does not appear to show vat_id ones - test3 row is missing. Is this possible to be done within one query?
Here is a db-fiddle with the schema, data and predefined query reproducing my result.
You can do this with recursion, but depending on the size of your data you may want to iterate, instead.
The trick is to make the name just another match key instead of treating it differently than the company_id and vat_id:
create table partners (
partner_id integer generated always as identity primary key,
name text,
company_id text,
vat_id text,
is_deleted boolean not null default false
);
insert into partners (name, company_id, vat_id) values
('test1','1010109191191', 'BG1010109191192'),
('test2','1010109191191', 'BG1010109191192'),
('test3','3214567890102', 'BG1010109191192'),
('test4','9999999999999', 'GE9999999999999'),
('test5','3214567890102', 'BG8888888888888'),
('test6','2983489023408', 'BG8888888888888')
;
I added a couple of test cases and left in the lone partner.
with recursive keys as (
select partner_id,
array['n_'||name, 'c_'||company_id, 'v_'||vat_id] as matcher,
array[partner_id] as matchlist,
1 as size
from partners
), matchers as (
select *
from keys
union all
select p.partner_id, c.matcher,
p.matchlist||c.partner_id as matchlist,
p.size + 1
from matchers p
join keys c
on c.matcher && p.matcher
and not p.matchlist #> array[c.partner_id]
), largest as (
select distinct sort(matchlist) as matchlist
from matchers m
where not exists (select 1
from matchers
where matchlist #> m.matchlist
and size > m.size)
-- and size > 1
)
select *
from largest
;
matchlist
{1,2,3,5,6}
{4}
fiddle
EDIT UPDATE
Since recursion did not perform, here is an iterative example in plpgsql that uses a temporary table:
create temporary table match1 (
partner_id int not null,
group_id int not null,
matchkey uuid not null
);
create index on match1 (matchkey);
create index on match1 (group_id);
insert into match1
select partner_id, partner_id, md5('n_'||name)::uuid from partners
union all
select partner_id, partner_id, md5('c_'||company_id)::uuid from partners
union all
select partner_id, partner_id, md5('v_'||vat_id)::uuid from partners;
do $$
declare _cnt bigint;
begin
loop
with consolidate as (
select group_id,
min(group_id) over (partition by matchkey) as new_group_id
from match1
), minimize as (
select group_id, min(new_group_id) as new_group_id
from consolidate
group by group_id
), doupdate as (
update match1
set group_id = m.new_group_id
from minimize m
where m.group_id = match1.group_id
and m.new_group_id != match1.group_id
returning *
)
select count(*) into _cnt from doupdate;
if _cnt = 0 then
exit;
end if;
end loop;
end;
$$;
updated fiddle

Delete duplicate rows with different values in columns

I didn't find my case on the Internet. Tell me how i can delete duplicates if the values are in different columns.
I have a table with a lot of values, for example:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
|89417978|89417980|
|89417979|89417980|
I need to exclude duplicates and leave in the answer only:
|Id1|Id2|
|89417980|89417978|
|89417980|89417979|
min/max does not work here, as the values may be different.
I tried to union/join tables on a table/exclude results with temporary tables, but in the end I come to the beginning.
Assuming id1 and id2 are primary keys columns you could try this
DECLARE #tbl table (id1 int, id2 int )
INSERT INTO #tbl
SELECT 89417980, 89417978
UNION SELECT 89417980, 89417979
UNION SELECT 89417978, 89417980
UNION SELECT 89417979, 89417980
SELECT * FROM #tbl
;WITH CTE AS (--Get comparable value as "cs"
SELECT
IIF(id1 > id2, CHECKSUM(id1, id2), CHECKSUM(id2,id1)) as cs
, id1
, id2
, ROW_NUMBER() OVER (order by id1, id2) as rn
FROM #tbl
)
, CTE2 AS ( --Get rows to keep
SELECT MAX (rn) as rn
FROM CTE
GROUP BY cs
HAVING COUNT(*) > 1
)
DELETE tbl -- Delete all except the rows to keep
FROM #tbl tbl
WHERE NOT EXISTS(SELECT 1
FROM CTE2
JOIN CTE ON CTE.rn = CTE2.rn
WHERE CTE.id1 = tbl.id1
AND CTE.id2 = tbl.id2
)
SELECT * FROM #tbl

SQL Server : group by with corresponding row values

I need to write a T-SQL group by query for a table with multiple dates and seq columns:
DROP TABLE #temp
CREATE TABLE #temp(
id char(1),
dt DateTime,
seq int)
Insert into #temp values('A','2015-03-31 10:00:00',1)
Insert into #temp values('A','2015-08-31 10:00:00',2)
Insert into #temp values('A','2015-03-31 10:00:00',5)
Insert into #temp values('B','2015-09-01 10:00:00',1)
Insert into #temp values('B','2015-09-01 10:00:00',2)
I want the results to contains only the items A,B with their latest date and the corresponding seq number, like:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 2
B 2015-09-01 10:00:00.000 2
I am trying with (the obviously wrong!):
select id, max(dt) as MaxDate, max(seq) as CorrespondentSeq
from #temp
group by id
which returns:
id MaxDate CorrespondentSeq
A 2015-08-31 10:00:00.000 5 <-- 5 is wrong
B 2015-09-01 10:00:00.000 2
How can I achieve that?
EDIT
The dt datetime column has duplicated values (exactly same date!)
I am using SQL Server 2005
You can use a ranking subselect to get only the highest ranked entries for an id:
select id, dt, seq
from (
select id, dt, seq, rank() over (partition by id order by dt desc, seq desc) as r
from #temp
) ranked
where r=1;
SELECT ID, DT, SEQ
FROM (
SELECT ID, DT, SEQ, Row_Number()
OVER (PARTITION BY id ORDER BY dt DESC, seq DESC) AS row_number
FROM temp
) cte
WHERE row_number = 1;
Demo : http://www.sqlfiddle.com/#!3/3e3d5/5
With trial and errors maybe I have found a solution, but I'm not completely sure this is correct:
select A.id, B.dt, max(B.seq)
from (select id, max(dt) as maxDt
from #temp
group by id) as A
inner join #temp as B on A.id = B.id AND A.maxDt = B.dt
group by A.id, B.dt
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
If there are duplicate rows, then you also need to specify what the query processor should use to determine which of the duplicates to return. Say you want the lowest value of seq,
Then you could write:
Select id, dt, seq
From #temp t
where dt = (Select Max(dt) from #temp
Where id = t.Id)
and seq = (Select Min(Seq) from #temp
where id = t.Id
and dt = t.dt)

How to use Common Table Expression with parameters?

I have a stored procedure with 2 CTEs. The second CTE has a parameter
WITH path_sequences
AS
(
),
WITH categories
AS
(
... WHERE CategoryId = #CategoryId
// I dont know how to get this initial parameter inside the CTE
)
SELECT * FROM path_sequences p
JOIN categories c
ON p.CategoryId = c.CategoryId
The initial parameter that I need to get inside the second TCE is p.CategoryId. How do I do that without having to create another stored procedure to contain the second CTE?
Thanks for helping
You can create table valued function
create function ftCategories
(
#CategoryID int
)
returns table
as return
with categories as (
... WHERE CategoryId = #CategoryId
)
select Col1, Col2 ...
from categories
and use it as
SELECT *
FROM path_sequences p
cross apply ftCategories(p.CategoryId) c
I have created simple query using your code. You can use it like -
DECLARE #CategoryId INT
SET #CategoryId = 1
;WITH path_sequences
AS
(
SELECT 1 CategoryId
),
categories
AS
(
SELECT 1 CategoryId WHERE 1 = #CategoryId
)
SELECT * FROM path_sequences p
JOIN categories c
ON p.CategoryId = c.CategoryId
This syntax is for External Aliases:
-- CTES With External Aliases:
WITH Sales_CTE (SalesPersonID, SalesOrderID, SalesYear)
AS
-- Define the CTE query.
(
SELECT SalesPersonID, SalesOrderID, YEAR(OrderDate) AS SalesYear
FROM Sales.SalesOrderHeader
WHERE SalesPersonID IS NOT NULL
)
The only way to add parameters is to use scope variables like so:
--Declare a variable:
DECLARE #category INT
WITH
MyCTE1 (exName1, exName2)
AS
(
SELECT <SELECT LIST>
FROM <TABLE LIST>
--Use the variable as 'a parameter'
WHERE CategoryId = #CategoryId
)
First remove the second WITH, separate each cte with just a comma. Next you can add parameters like this:
DECLARE #category INT; -- <~~ Parameter outside of CTEs
WITH
MyCTE1 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
SELECT blah blah
FROM blah
WHERE CategoryId = #CategoryId
),
MyCTE2 (col1, col2) -- <~~ were poorly named param1 and param2 previously
AS
(
)
SELECT *
FROM MyCTE2
INNER JOIN MyCTE1 ON ...etc....
EDIT (and CLARIFICATION):
I have renamed the columns from param1 and param2 to col1 and col2 (which is what I meant originally).
My example assumes that each SELECT has exactly two columns. The columns are optional if you want to return all of the columns from the underlying query AND those names are unique. If you have more or less columns than what is being SELECTed you will need to specify names.
Here is another example:
Table:
CREATE TABLE Employee
(
Id INT NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50) NOT NULL,
ManagerId INT NULL
)
Fill table with some rows:
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Donald', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Micky', 'Mouse', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daisy', 'Duck', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Fred', 'Flintstone', 5)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Darth', 'Vader', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Bugs', 'Bunny', null)
INSERT INTO Employee
(FirstName, LastName, ManagerId)
VALUES
('Daffy', 'Duck', null)
CTEs:
DECLARE #ManagerId INT = 5;
WITH
MyCTE1 (col1, col2, col3, col4)
AS
(
SELECT *
FROM Employee e
WHERE 1=1
AND e.Id = #ManagerId
),
MyCTE2 (colx, coly, colz, cola)
AS
(
SELECT e.*
FROM Employee e
INNER JOIN MyCTE1 mgr ON mgr.col1 = e.ManagerId
WHERE 1=1
)
SELECT
empsWithMgrs.colx,
empsWithMgrs.coly,
empsWithMgrs.colz,
empsWithMgrs.cola
FROM MyCTE2 empsWithMgrs
Notice in the CTEs the columns are being aliased. MyCTE1 exposes columns as col1, col2, col3, col4 and MyCTE2 references MyCTE1.col1 when it references it. Notice the final select uses MyCTE2's column names.
Results:
For anyone still struggling with this, the only thing you need to is terminate your declaration of variables with a semicolon before the CTE. Nothing else is required.
DECLARE #test AS INT = 42;
WITH x
AS (SELECT #test AS 'Column')
SELECT *
FROM x
Results:
Column
-----------
42
(1 row affected)

PostgreSQL join to denormalize a table with generate_series

I've this table:
CREATE TABLE "mytable"
( name text, count integer );
INSERT INTO mytable VALUES ('john', 4),('mark',2),('albert',3);
and I would like "denormlize" the rows in this way:
SELECT name FROM mytable JOIN generate_series(1,4) tmp(a) ON (a<=count)
so I've a number of rows for each name equals to the count column: I've 4 rows with john, 2 with mark and 3 with albert.
But i can't use the generate_series() function if I don't know the highest count (in this case 4). There is a way to do this without knowing the MAX(count) ?
select name,
generate_series(1,count)
from mytable;
Set returning functions can be used in the select list and will do a cross join with the row retrieved from the base table.
I think this is an undocumented behaviour that might go away in the future, but I'm not sure about that (I recall some discussion regarding this on the mailing list)
SQLFiddle example
DROP TABLE ztable ;
CREATE TABLE ztable (zname varchar, zvalue INTEGER NOT NULL);
INSERT INTO ztable(zname, zvalue) VALUES( 'one', 1), ( 'two', 2 ), ( 'three', 3) , ( 'four', 4 );
WITH expand AS (
WITH RECURSIVE zzz AS (
SELECT 1::integer AS rnk , t0.zname
FROM ztable t0
UNION
SELECT 1+rr.rnk , t1.zname
FROM ztable t1
JOIN zzz rr ON rr.rnk < t1.zvalue
)
SELECT zzz.zname
FROM zzz
)
SELECT x.*
FROM expand x
;