Select bitmask in Postgresql

I have a table with columns "one" and "two":
a | x
a | y
a | z
b | x
b | z
c | y
I want to write a query to complement it with missing nested values
b | null | y
c | null | x
c | null | z
Then I will select it with array_agg(two) group by one, such that
a {1 1 1}
b {1 0 1}
c {0 1 0}
And eventually export it to a CSV file with a COPY query.
What query should I write for the first step?

You can use a CROSS JOIN to build all the possible pairs of elements then a LEFT JOIN to check if each pair of elements exists:
SELECT
T1.one,
T2.two,
CASE WHEN your_table.one IS NULL THEN 0 ELSE 1 END AS is_present
FROM (SELECT DISTINCT one FROM your_table) T1
CROSS JOIN (SELECT DISTINCT two FROM your_table) T2
LEFT JOIN your_table
ON T1.one = your_table.one AND T2.two = your_table.two
You can then add a GROUP BY T1.one and an ARRAY_AGG(...) to this query.
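As a rough check of the approach, the sketch below runs the same CROSS JOIN / LEFT JOIN query on the sample data. It assumes Python's built-in sqlite3 module rather than PostgreSQL, purely so the example is self-contained, and it does the final grouping in Python where PostgreSQL would use GROUP BY T1.one with ARRAY_AGG(... ORDER BY T2.two). The function name bitmask_rows is hypothetical.

```python
import sqlite3
from itertools import groupby

# Same query as above, with an ORDER BY so the flags line up per "two" value.
QUERY = """
SELECT T1.one,
       T2.two,
       CASE WHEN your_table.one IS NULL THEN 0 ELSE 1 END AS is_present
FROM (SELECT DISTINCT one FROM your_table) T1
CROSS JOIN (SELECT DISTINCT two FROM your_table) T2
LEFT JOIN your_table
       ON T1.one = your_table.one AND T2.two = your_table.two
ORDER BY T1.one, T2.two
"""


def bitmask_rows(pairs):
    """Return {one: [0/1 flags ordered by two]} for the given (one, two) pairs."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
    conn.execute("CREATE TABLE your_table (one TEXT, two TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", pairs)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # Group in Python; in PostgreSQL this step would be GROUP BY + ARRAY_AGG.
    return {
        one: [flag for _, _, flag in group]
        for one, group in groupby(rows, key=lambda row: row[0])
    }


sample = [("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("b", "z"), ("c", "y")]
print(bitmask_rows(sample))
```

Ordering by T2.two matters: without it, the position of each flag in the aggregated array is not guaranteed to mean the same thing from one group to the next.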

Related

Postgres: how to find rows having duplicate values in fields

How can I find if any value exists more than once in one row? An example:
id | c1 | c2 | c3
----+----+----+----
1 | a | b | c
2 | a | a | b
3 | b | b | b
The query should return rows 2 and 3 since they have the same value more than once. The solution I'm looking for is not 'where c1 = c2 or c1 = c3 or c2 = c3' since there can be any number of columns in tables I need to test. All values are text but can be any length.

One way to do that is to convert the columns to rows:
select *
from the_table tt
where exists (select 1
from ( values (c1), (c2), (c3) ) as t(v)
group by v
having count(*) > 1)
If you want a dynamic solution where you don't have to list each column, you can do that by converting the row to a JSON value:
select *
from the_table tt
where exists (select 1
from jsonb_each_text(to_jsonb(tt)) as j(k,v)
group by v
having count(*) > 1)
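For experimenting without a PostgreSQL server, here is a small sketch of the same "columns to rows" idea. It assumes Python's sqlite3 module and swaps in a slightly different technique: a plain UNION ALL unpivot followed by GROUP BY, because jsonb_each_text is PostgreSQL-specific. The function name rows_with_repeats and the three fixed columns are assumptions for illustration.

```python
import sqlite3


def rows_with_repeats(rows):
    """Return the ids of rows in which some value occurs more than once."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of PostgreSQL
    conn.execute("CREATE TABLE the_table (id INTEGER, c1 TEXT, c2 TEXT, c3 TEXT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", rows)
    query = """
    SELECT DISTINCT id
    FROM (SELECT id, c1 AS v FROM the_table
          UNION ALL
          SELECT id, c2 FROM the_table
          UNION ALL
          SELECT id, c3 FROM the_table) AS unpivoted
    GROUP BY id, v            -- one group per (row, value)
    HAVING count(*) > 1       -- keep values that repeat within a row
    ORDER BY id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


sample = [(1, "a", "b", "c"), (2, "a", "a", "b"), (3, "b", "b", "b")]
print(rows_with_repeats(sample))
```

Unlike the JSON variant, this form still has to list every column, so it only illustrates the grouping logic, not the dynamic part.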

Calculate rank without using rank or rownums functions, using a single column

Do not use any functions like rank or rownums.
Hint: Formulate matrix operation using sql. A rank of an item indicates how many items are less than or equal to it.
A matrix can be simulated by cross join and rank can be derived by
counting items smaller than the current item.
Table A:-
x
----
d
b
a
g
c
k
k
g
Expected output:
x1 | rank
----+------
a | 1
b | 2
d | 3
g | 4
c | 5
k | 6
select x as x1, count(x) as rank
from (select DISTINCT x from A order by x) as sub

Your current query is on the right track in using a distinct subquery. For a working version, add a correlated subquery in the select clause that counts the distinct values less than or equal to the current one:
SELECT
x AS x1,
(SELECT COUNT(DISTINCT x) FROM A t WHERE t.x <= sub.x) rank
FROM (SELECT DISTINCT x FROM A) AS sub
ORDER BY
x;
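To try the correlated-subquery idea quickly, the sketch below runs it on the sample data. It assumes Python's sqlite3 module as a stand-in database, and the function name dense_ranks and the alias rnk are hypothetical.

```python
import sqlite3


def dense_ranks(values):
    """Rank each distinct value by counting the distinct values <= it."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
    conn.execute("CREATE TABLE A (x TEXT)")
    conn.executemany("INSERT INTO A VALUES (?)", [(value,) for value in values])
    query = """
    SELECT x AS x1,
           (SELECT COUNT(DISTINCT t.x) FROM A t WHERE t.x <= sub.x) AS rnk
    FROM (SELECT DISTINCT x FROM A) AS sub
    ORDER BY x
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = ["d", "b", "a", "g", "c", "k", "k", "g"]
print(dense_ranks(sample))
```

Because the count is over distinct values, duplicates such as the two k rows share one rank and leave no gaps. Note that c sorts before d, so with this data c gets rank 3.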

Creating clusters of related columns

I have a table named Stores with columns:
StoreCode NVARCHAR(10),
OldStoreCode NVARCHAR(10)
Here is a sample of my data:
| StoreCode | OldStoreCode |
|-----------|--------------|
| A | B |
| B | A |
| D | E |
| E | F |
| M | K |
| J | K |
| K | L |
|-----------|--------------|
I want to create clusters of related Stores. Related store means there is a one way relation between StoreCodes and OldStoreCodes.
Expected result table:
| StoreCode | ClusterId |
|-----------|-----------|
| A | 1 |
| B | 1 |
| D | 2 |
| E | 2 |
| F | 2 |
| M | 3 |
| K | 3 |
| J | 3 |
| L | 3 |
|-----------|-----------|
There is no maximum number of hops. There may be a StoreCode A which has an OldStoreCode B, which has an OldStoreCode C, which has an OldStoreCode D, etc.
How can I cluster stores like this?

Try it like this:
EDIT: With changes by OP taken from comment
DECLARE @tbl TABLE(ID INT IDENTITY, StoreCode VARCHAR(100),OldStoreCode VARCHAR(100));
INSERT INTO @tbl VALUES
('A','B'),('B','A'),('D','E'),('E','F'),('M','K'),('J','K'),('K','L');
WITH Related AS
(
SELECT DISTINCT t1.ID,Val
FROM @tbl AS t1
INNER JOIN @tbl AS t2 ON t1.StoreCode=t2.StoreCode
OR t1.OldStoreCode=t2.OldStoreCode
OR t1.OldStoreCode=t2.StoreCode
OR t1.StoreCode=t2.OldStoreCode
CROSS APPLY(SELECT DISTINCT Val
FROM
(VALUES(t1.StoreCode),(t2.StoreCode),(t1.OldStoreCode),(t2.OldStoreCode)) AS A(Val)
) AS valsInCols
)
,ClusterKeys AS
(
SELECT r1.ID
,(
SELECT r2.Val AS [*]
FROM Related AS r2
WHERE r2.ID=r1.ID
ORDER BY r2.Val
FOR XML PATH('')
) AS ClusterKey
FROM Related AS r1
GROUP BY r1.ID
)
,ClusterIds AS
(
SELECT ClusterKey
,MIN(ID) AS ID
FROM ClusterKeys
GROUP BY ClusterKey
)
SELECT r.ID
,r.Val
FROM ClusterIds c
INNER JOIN Related r ON c.ID = r.ID
The result
ID Val
1 A
1 B
3 D
3 E
3 F
5 J
5 K
5 L
5 M
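Since the question allows any number of hops, the clustering is in effect a connected-components problem. As a cross-check for the output of a SQL solution, here is a minimal union-find sketch. It is written in Python, not T-SQL, so it is a different technique from the query above, and the function name cluster_stores is hypothetical.

```python
def cluster_stores(pairs):
    """Assign a cluster id to every store code linked by (StoreCode, OldStoreCode)."""
    parent = {}  # store code -> parent code in the union-find forest

    def find(code):
        parent.setdefault(code, code)
        while parent[code] != code:
            parent[code] = parent[parent[code]]  # path halving keeps trees shallow
            code = parent[code]
        return code

    for store, old_store in pairs:
        root_new, root_old = find(store), find(old_store)
        if root_new != root_old:
            parent[root_old] = root_new  # merge the two clusters

    cluster_ids = {}  # root code -> sequential cluster id
    result = []
    for code in parent:  # dicts keep insertion order, i.e. first appearance
        root = find(code)
        cluster_id = cluster_ids.setdefault(root, len(cluster_ids) + 1)
        result.append((code, cluster_id))
    return result


sample = [("A", "B"), ("B", "A"), ("D", "E"), ("E", "F"),
          ("M", "K"), ("J", "K"), ("K", "L")]
for store_code, cluster_id in cluster_stores(sample):
    print(store_code, cluster_id)
```

Cluster ids are numbered in order of first appearance, which for the sample data matches the expected result table in the question.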

This should do it:
SAMPLE DATA:
IF OBJECT_ID('tempdb..#Temp1') IS NOT NULL
BEGIN
DROP TABLE #Temp1;
END;
CREATE TABLE #Temp1(StoreCode NVARCHAR(10)
, OldStoreCode NVARCHAR(10));
INSERT INTO #Temp1(StoreCode
, OldStoreCode)
VALUES
('A'
, 'B'),
('B'
, 'A'),
('D'
, 'E'),
('E'
, 'F'),
('M'
, 'K'),
('J'
, 'K'),
('K'
, 'L');
QUERY:
;WITH A -- get all distinct new and old storecodes
AS (
SELECT StoreCode
FROM #Temp1
UNION
SELECT OldStoreCode
FROM #Temp1),
B -- give a unique number id to each store code
AS (SELECT rn = RANK() OVER(ORDER BY StoreCode)
, StoreCode
FROM A),
C -- combine the store codes and the unique number id's in one table
AS (SELECT b2.rn AS StoreCodeID
, t.StoreCode
, b1.rn AS OldStoreCodeId
, t.OldStoreCode
FROM #Temp1 AS t
LEFT OUTER JOIN B AS b1 ON t.OldStoreCode = b1.StoreCode
LEFT OUTER JOIN B AS b2 ON t.StoreCode = b2.StoreCode),
D -- assign a row number for each entry in the data set
AS (SELECT rn = RANK() OVER(ORDER BY StoreCode)
, *
FROM C),
E -- derive first and last store in the path
AS (SELECT FirstStore = d2.StoreCode
, LastStore = d1.OldStoreCode
, GroupID = d1.OldStoreCodeId
FROM D AS d1
RIGHT OUTER JOIN D AS d2 ON d1.StoreCodeID = d2.OldStoreCodeId
AND d1.rn - 1 = d2.rn
WHERE d1.OldStoreCode IS NOT NULL) ,
F -- get the stores which led to the last store with one hop
AS (SELECT C.StoreCode
, E.GroupID
FROM E
INNER JOIN C ON E.LastStore = C.OldStoreCode)
-- combine to get the full grouping
SELECT A.StoreCode, ClusterID = DENSE_RANK() OVER (ORDER BY A.GroupID) FROM (
SELECT C.StoreCode,F.GroupID FROM C INNER JOIN F ON C.OldStoreCode = F.StoreCode
UNION
SELECT * FROM F
UNION
SELECT E.LastStore,E.GroupID FROM E) AS A ORDER BY StoreCode, ClusterID

Comparing tables and getting non matching values

I'm pretty new to SQL and I can't get this to work. I've got the two tables below:
Table A             Table B
_________________   _________________
| A | 2015-10-4     | B | 2015-11-6
| B | 2015-11-4     | C | 2015-05-4
| C | 2015-05-6     | D | 2015-05-8
| D | 2015-05-7     | C | 2015-05-5
I'm trying to write a stored procedure that will get all letters from table B that has a date less than table A and any letter that doesn't exist in table B.
This is what I have so far
SELECT *
FROM A q JOIN
B c ON q.Letter = c.Letter AND q.Date > c.Date OR c.Letter IS NULL
This returns C but I can't have it return A also. It's confusing to me trying to join and compare tables still.
I do not want duplicate rows, the results I would be expecting would return
| A | 2015-10-4
| C | 2015-05-6
EDIT
I'm running into an issue now where if I have a case like this
Table A             Table B
_________________   _________________
| A | 2015-10-4     | B | 2015-11-6
| B | 2015-11-4     | C | 2015-05-4
| C | 2015-05-6     | D | 2015-05-8
| D | 2015-05-7     | C | 2015-05-5
                    | C | 2015-05-7
It will still return C for some reason. Using a.date > max(b.date) doesn't work because max can't be used that way. And I want to assume the max date can be anywhere in table B.
So now my new results would be
| A | 2015-10-4
But I am getting A and C still.

You should use a LEFT JOIN:
SELECT DISTINCT A.letter, A.[Date]
FROM dbo.TableA A
LEFT JOIN dbo.TableB B
ON A.letter = B.letter
WHERE B.[Date] < A.[Date] OR B.letter IS NULL;
UPDATE
You should have explained your requirements as: "get all letters from table B in which every date is less than...."
SELECT DISTINCT A.letter, A.[Date]
FROM dbo.TableA A
LEFT JOIN (SELECT letter, MAX([Date]) [Date]
FROM dbo.TableB
GROUP BY letter) B
ON A.letter = B.letter
WHERE B.[Date] < A.[Date] OR B.letter IS NULL;
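To see how the MAX-per-letter version behaves on both data sets from the question, here is a small sketch. It assumes Python's sqlite3 module instead of SQL Server, a column named my_date, and dates stored as zero-padded ISO strings so that text comparison matches date order. The function name newer_or_missing is hypothetical.

```python
import sqlite3


def newer_or_missing(rows_a, rows_b):
    """Rows of A that are newer than every matching B row, or have no match in B."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of SQL Server
    conn.execute("CREATE TABLE TableA (letter TEXT, my_date TEXT)")
    conn.execute("CREATE TABLE TableB (letter TEXT, my_date TEXT)")
    conn.executemany("INSERT INTO TableA VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO TableB VALUES (?, ?)", rows_b)
    query = """
    SELECT DISTINCT A.letter, A.my_date
    FROM TableA A
    LEFT JOIN (SELECT letter, MAX(my_date) AS my_date
               FROM TableB
               GROUP BY letter) B
           ON A.letter = B.letter
    WHERE B.my_date < A.my_date OR B.letter IS NULL
    ORDER BY A.letter
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


table_a = [("A", "2015-10-04"), ("B", "2015-11-04"),
           ("C", "2015-05-06"), ("D", "2015-05-07")]
table_b = [("B", "2015-11-06"), ("C", "2015-05-04"),
           ("D", "2015-05-08"), ("C", "2015-05-05")]

print(newer_or_missing(table_a, table_b))
print(newer_or_missing(table_a, table_b + [("C", "2015-05-07")]))
```

Aggregating table B first means each letter is compared against its latest date only, so one later row for C is enough to drop C from the result.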

I would go for a UNION / UNION ALL, so that you get the result subset for the first condition + the ones for the second one.
Something similar to this should do the job:
sqlite> create table A (letter, my_date);
sqlite> create table B (letter, my_date);
sqlite> insert into A values ('A', '2015-10-04');
sqlite> insert into A values ('B', '2015-11-04');
sqlite> insert into A values ('C', '2015-05-06');
sqlite> insert into A values ('D', '2015-05-07');
sqlite> insert into B values ('B', '2015-11-06');
sqlite> insert into B values ('C', '2015-05-04');
sqlite> insert into B values ('D', '2015-05-08');
sqlite> insert into B values ('C', '2015-05-05');
sqlite> select B.* from A, B where A.letter = B.letter and B.my_date < A.my_date UNION ALL select A.* from A where not exists (select 1 from B where B.letter=A.letter);
letter my_date
---------- ----------
C 2015-05-04
C 2015-05-05
A 2015-10-04

Window functions and more "local" aggregation

Suppose I have this table:
select * from window_test;
k | v
---+---
a | 1
a | 2
b | 3
a | 4
Ultimately I want to get:
k | min_v | max_v
---+-------+-------
a | 1 | 2
b | 3 | 3
a | 4 | 4
But I would be just as happy to get this (since I can easily filter it with distinct):
k | min_v | max_v
---+-------+-------
a | 1 | 2
a | 1 | 2
b | 3 | 3
a | 4 | 4
Is it possible to achieve this with PostgreSQL 9.1+ window functions? I'm trying to understand if I can get it to use separate partition for the first and last occurrence of k=a in this sample (ordered by v).

This returns your desired result with the sample data. Not sure if it will work for real world data:
select k,
min(v) over (partition by group_nr) as min_v,
max(v) over (partition by group_nr) as max_v
from (
select *,
sum(group_flag) over (order by v,k) as group_nr
from (
select *,
case
when lag(k) over (order by v) = k then null
else 1
end as group_flag
from window_test
) t1
) t2
order by min_v;
I left out the DISTINCT though.
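The lag-plus-running-sum pattern can be tried on the sample data with the sketch below. It assumes Python's sqlite3 module linked against SQLite 3.25 or later (the first version with window functions) rather than PostgreSQL, and it assumes v is unique. It also collapses each group with GROUP BY instead of DISTINCT, and the function name island_ranges is hypothetical.

```python
import sqlite3


def island_ranges(rows):
    """Return (k, min_v, max_v) for each run of consecutive rows with the same k."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite >= 3.25
    conn.execute("CREATE TABLE window_test (k TEXT, v INTEGER)")
    conn.executemany("INSERT INTO window_test VALUES (?, ?)", rows)
    query = """
    SELECT k, min(v) AS min_v, max(v) AS max_v
    FROM (
        SELECT k, v,
               -- running total of the flags numbers the runs 1, 2, 3, ...
               sum(group_flag) OVER (ORDER BY v) AS group_nr
        FROM (
            SELECT k, v,
                   -- flag a row with 1 when k differs from the previous row
                   CASE WHEN lag(k) OVER (ORDER BY v) = k THEN 0 ELSE 1 END
                       AS group_flag
            FROM window_test
        ) t1
    ) t2
    GROUP BY group_nr, k
    ORDER BY min_v
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]
print(island_ranges(sample))
```

The first row has no predecessor, so lag returns NULL, the comparison is not true, and the row is flagged as the start of a new run.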

EDIT: I came up with the following query, which uses no window functions at all:
WITH RECURSIVE tree AS (
SELECT k, v, ''::text as next_k, 0 as next_v, 0 AS level FROM window_test
UNION ALL
SELECT c.k, c.v, t.k, t.v + level, t.level + 1
FROM tree t JOIN window_test c ON c.k = t.k AND c.v + 1 = t.v),
partitions AS (
SELECT t.k, t.v, t.next_k,
coalesce(nullif(t.next_v, 0), t.v) AS next_v, t.level
FROM tree t
WHERE NOT EXISTS (SELECT 1 FROM tree WHERE next_k = t.k AND next_v = t.v))
SELECT min(k) AS k, v AS min_v, max(next_v) AS max_v
FROM partitions p
GROUP BY v
ORDER BY 2;
I've provided two working queries now; I hope one of them will suit you.
Another way to achieve this is to use a supporting sequence.
Create a support sequence:
CREATE SEQUENCE wt_rank START WITH 1;
The query:
WITH source AS (
SELECT k, v,
coalesce(lag(k) OVER (ORDER BY v), k) AS prev_k
FROM window_test
CROSS JOIN (SELECT setval('wt_rank', 1)) AS ri),
ranking AS (
SELECT k, v, prev_k,
CASE WHEN k = prev_k THEN currval('wt_rank')
ELSE nextval('wt_rank') END AS rank
FROM source)
SELECT r.k, min(s.v) AS min_v, max(s.v) AS max_v
FROM ranking r
JOIN source s ON r.v = s.v
GROUP BY r.rank, r.k
ORDER BY 2;

Would this not do the job for you, without the need for windows, partitions or coalescing? It just uses a traditional SQL trick for finding nearest tuples via a self join and a min on the difference:
SELECT k, min(v), max(v) FROM (
SELECT k, v, v + min(d) lim FROM (
SELECT x.*, y.k n, y.v - x.v d FROM window_test x
LEFT JOIN window_test y ON x.k <> y.k AND y.v - x.v > 0)
z GROUP BY k, v, n)
w GROUP BY k, lim ORDER BY 2;
I think this is probably a more 'relational' solution, but I'm not sure about its efficiency.