Modeling mutual exclusion & correlation in a relational database schema - postgresql

Suppose A, B & C are attributes with the following possible values:
A -> {1}
B -> {2,5,9}
C -> {11,12}
A & B are correlated (A cannot exist without B).
When A = 1, B can be 5 or 9, B cannot be 2.
B & C are correlated, when B is 5, C can be 11, C cannot be 12.
For example, when C = 11, then B = 5 and A = 1.
How do I model this relationship in a relational schema, or is there a better way to represent it?
What I have so far is an attribute table:
ID | Attribute | value
----------------------
1 | A | 1
2 | B | 2
3 | B | 5
4 | B | 9
5 | C | 11
6 | C | 12
And the correlation table, where ID1 & ID2 are foreign keys to the attribute table and together form a composite primary key:
ID1 | ID2
---------
1 | 3
1 | 4
3 | 5
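A minimal Python sketch of how the two tables can be read, using in-memory stand-ins for the attribute and correlation tables. It assumes the last correlation row points at attribute ID 5 (C = 11), since ID2 is a foreign key to the attribute table; the helper names `is_allowed` and `implied_values` are hypothetical. A combination is valid only if a correlation row exists for it, and walking the rows backwards from C = 11 recovers B = 5 and A = 1.

```python
# In-memory stand-ins for the question's tables (illustrative only).
# Attribute table: ID -> (attribute, value).
ATTRIBUTES = {
    1: ("A", 1),
    2: ("B", 2),
    3: ("B", 5),
    4: ("B", 9),
    5: ("C", 11),
    6: ("C", 12),
}

# Correlation table as (ID1, ID2) pairs.
# Assumption: the row written as "3 | 11" refers to attribute ID 5 (C = 11).
CORRELATIONS = {(1, 3), (1, 4), (3, 5)}


def is_allowed(id1: int, id2: int) -> bool:
    """A pairing is permitted only if an explicit correlation row exists."""
    return (id1, id2) in CORRELATIONS


def implied_values(attr_id: int) -> dict[str, int]:
    """Follow correlation rows from child (ID2) to parent (ID1), collecting values."""
    seen = {attr_id}
    stack = [attr_id]
    while stack:
        current = stack.pop()
        for parent, child in CORRELATIONS:
            if child == current and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return {ATTRIBUTES[i][0]: ATTRIBUTES[i][1] for i in seen}
```

With this data, `implied_values(5)` yields A = 1, B = 5, C = 11, and `is_allowed(1, 2)` is false because no row pairs A = 1 with B = 2, which is how the mutual exclusion is expressed.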

Related

KSQL return top-n rows

I'd like to have a table that contains only the top n rows per group. Consider this table:
Field | Type
--------------------------------------------
a | VARCHAR(STRING) (primary key)
b | VARCHAR(STRING) (primary key)
c | VARCHAR(STRING) (primary key)
d | INTEGER
For every group (denoted by the primary key), I need, e.g., the 10 rows with the highest value in column d. d is an aggregation over {a, b, c}, which sums up column c. This is fairly easy in standard SQL with ROW_NUMBER(), as described here: How do I use ROW_NUMBER()?, where you simply assign a number to every row that reflects its position in descending order of column d.
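For reference, the `ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d DESC)` pattern keeps the first n rows of each {a, b} group after sorting by d. Below is a minimal Python sketch of those semantics, not ksql code; the function name `top_n_per_group` is hypothetical. It shows that c stays attached to its d value, which is exactly what a TOPK array of bare d values loses.

```python
from collections import defaultdict
from typing import Iterable

# One row of the question's table: (a, b, c, d).
Row = tuple[str, str, str, int]


def top_n_per_group(rows: Iterable[Row], n: int) -> list[Row]:
    """Keep the n rows with the largest d for every (a, b) group.

    Mirrors ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d DESC) <= n.
    Python's sort is stable (also with reverse=True), so rows with equal d
    keep their input order.
    """
    groups: dict[tuple[str, str], list[Row]] = defaultdict(list)
    for row in rows:
        groups[(row[0], row[1])].append(row)

    result: list[Row] = []
    for key in sorted(groups):
        ranked = sorted(groups[key], key=lambda r: r[3], reverse=True)
        result.extend(ranked[:n])
    return result
```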
Unfortunately, ksql doesn't support subqueries yet, which ROW_NUMBER() requires: https://github.com/confluentinc/ksql/issues/745 I'm also not sure whether ksql supports ROW_NUMBER() at all. I think not, since I haven't found it in the documentation and couldn't get it to run myself.
I also found the TOPK function in ksql, but that doesn't seem to work as expected. https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/aggregate-functions/#topk
There is an issue about it on GitHub: https://github.com/confluentinc/ksql/issues/403 . When I run it myself, I get a column containing an array of the top n values for {a, b}, which I can't map back to the corresponding values of column c, so that doesn't work either. Example of what TOPK yields for k = 5:
a | b | d
--------------------------------------------
val1 | val2 | [10, 8, 6, 4, 2]
val1 | val4 | [7, 3, 3, 1, 0]
val1 | val6 | [5, 4, 3, 2, 1]
Here is an example of what I actually need, assuming that {10, 8, 6, 4, 2} are the 5 biggest values of d. For {a, b}, I want the values of c with the 5 biggest values of d.
a | b | c | d
--------------------------------------------
val1 | val2 | val3 | 10
val1 | val2 | val8 | 8
val1 | val2 | val9 | 6
val1 | val2 | val10 | 4
val1 | val2 | val11 | 2
Is there any way to do a top-n query in ksql? Or is it on the roadmap for future releases? Thanks.

KDB: transpose a dual keyed table to a matrix

How can I transpose a dual-keyed dictionary (x, y)
x y | z
- - | -
1 a | data
2 a | data
3 a | data
4 a | data
5 a | data
1 b | data
2 b | data
3 b | data
4 b | data
5 b | data
1 c | data
2 c | data
3 c | data
4 c | data
5 c | data
into a matrix-style structure:
y\x 1 2 3 4 5
------------
a |data.......
b |data.......
c |data.......
I'm not sure how to start. My idea is to use flip and group twice.
Can anyone help?
I believe you want a pivot table. http://code.kx.com/q/cookbook/pivoting-tables/
Nick Psaris has a nice pivot function on his GitHub, from the Q Tips book:
pivot:{[t]
u:`$string asc distinct last f:flip key t;
pf:{x#(`$string y)!z};
p:?[t;();g!g:-1_ k;(pf;`u;last k:key f;last key flip value t)];
p}
q)t:2!ungroup ([]x:1+til 5;y:5#enlist `a`b`c;z:`data)
q)pivot 2!`y`x`z xcols 0!t
y| 1 2 3 4 5
-| ------------------------
a| data data data data data
b| data data data data data
c| data data data data data
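The same reshaping, sketched in Python for readers without a q session: a dictionary keyed by (x, y) becomes one row per y with one column per x. This only illustrates what the pivot computes and is not a translation of the q function above; the name `pivot` and the `None` filler for missing (x, y) combinations are assumptions.

```python
from typing import Any


def pivot(table: dict[tuple[int, str], Any]) -> tuple[list[int], dict[str, list[Any]]]:
    """Turn {(x, y): z} into (column headers, {y: [z for each x]})."""
    xs = sorted({x for x, _ in table})  # distinct x values become the columns
    ys = sorted({y for _, y in table})  # distinct y values become the rows
    # Missing (x, y) combinations become None instead of raising KeyError.
    rows = {y: [table.get((x, y)) for x in xs] for y in ys}
    return xs, rows
```

For the question's data, `pivot` returns the headers `[1, 2, 3, 4, 5]` and one list of five values for each of `a`, `b` and `c`.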

postgresql find similar word groups

I have a table1 containing a column A, where ~100,000 strings (varchar) are stored. Unfortunately, each string has multiple words separated by spaces, and the strings differ in length, i.e. one string can consist of 3 words while another contains 7 words.
Then I have a column B stored in a second table2, which contains only 100 strings in the same format: multiple words per string, separated by spaces.
The goal is to measure how likely a record of column B is to match (possibly multiple) records of column A, based on the words they share. The result should also be ranked. I was thinking of using full-text search in a loop, but I don't know how to do this, or whether there is a proper way to achieve it.
I don't know whether you can turn the table into a dictionary to use full-text search for ranking here, but you can query it with some primitive ranking quite easily, e.g.:
t=# with a(a) as (values('a b c'),('a c d'),('b e f'),('r b t'),('q w'))
, b(i,b) as (values(1,'a b'), (2,'e'), (3,'b'))
, p as (select unnest(string_to_array(b.b,' ')) arr,i from b)
select a phrases, arr match_words,
  count(1) over (partition by arr) words_in_matches,
  count(1) over (partition by i) matches, i
from a left join p on a.a like '%'||arr||'%';
phrases | match_words | words_in_matches | matches | i
---------+-------------+------------------+---------+---
r b t | b | 6 | 5 | 1
a b c | b | 6 | 5 | 1
b e f | b | 6 | 5 | 1
a b c | a | 2 | 5 | 1
a c d | a | 2 | 5 | 1
b e f | e | 1 | 1 | 2
r b t | b | 6 | 3 | 3
a b c | b | 6 | 3 | 3
b e f | b | 6 | 3 | 3
q w | | 1 | 1 |
(10 rows)
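phrases are rows from your big table.
match_words are tokens from your small table (split on spaces).
words_in_matches is the number of matches for that token, counted across all small-table phrases.
matches is the number of (big-table phrase, token) matches for that small-table phrase.
i is the index of the phrase in the small table.
So you can order by the third or fourth column to get some sort of ranking. Note that like '%'||arr||'%' matches substrings, so a token such as 'a' would also match a word like 'cat'.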
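If whole-word matching is preferred over the substring match that `like` performs, the same idea can be sketched in Python by splitting both columns on spaces, counting shared words per pair, and sorting by that count. This is a swapped-in technique (plain token overlap), not PostgreSQL full-text search, and the function name `rank_matches` is hypothetical.

```python
def rank_matches(big: list[str], small: list[str]) -> dict[str, list[tuple[str, int]]]:
    """For each phrase in `small`, list the phrases in `big` that share at least
    one whole word, ranked by the number of shared words.

    Ties keep the order of `big`, because list.sort is stable.
    """
    big_tokens = [(phrase, set(phrase.split())) for phrase in big]
    ranking: dict[str, list[tuple[str, int]]] = {}
    for query in small:
        wanted = set(query.split())
        scored = [(phrase, len(wanted & tokens)) for phrase, tokens in big_tokens]
        hits = [pair for pair in scored if pair[1] > 0]
        hits.sort(key=lambda pair: pair[1], reverse=True)
        ranking[query] = hits
    return ranking
```

With the sample data from the query above, `rank_matches` puts 'a b c' first for 'a b' (two shared words), followed by the phrases sharing one word.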

Unions of intersecting sets

I have a table representing sets of records like this:
set_id | record_id
a | 1
a | 2
a | 3
b | 2
b | 4
b | 5
c | 6
c | 7
d | 9
d | 11
e | 10
f | 11
f | 12
I want output like this:
output
{1, 2, 3, 4, 5}
{6, 7}
{9, 11, 12}
{10}
Where intersecting sets are combined (notice set a and set b have been combined; d and f have also been combined).
Is there a good way to do this in plain SQL, without a stored procedure? I know that I'm looking for a kind of union-find procedure.
Setup:
so=> create table so75(set_id text, record_id int);
CREATE TABLE
so=> copy so75 from stdin delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> a | 1
a | 2
a | 3
b | 2
b | 4
b | 5
c | 6
c | 7
d | 9
d | 11
e | 10
f | 11
f | 12
>> \.
COPY 14
Query:
so=> with keys as (
with a as (
select *,count(1) over (partition by record_id) c, array_agg(set_id) over(partition by record_id) cc
from so75
)
select set_id, cc
from a where c > 1
)
select distinct array_agg(distinct record_id)
from so75
left outer join keys on keys.set_id = so75.set_id
group by case when array_length(cc,1) > 1 then cc::text else so75.set_id end;
array_agg
-------------
{6,7}
{10}
{1,2,3,4,5}
{9,11,12}
(4 rows)
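For comparison, the union-find (disjoint-set) idea named in the question can be sketched in Python. Every set id is unioned with the first set that owned each shared record, so chains of overlaps merge transitively. The query above appears to merge only sets that share a record directly, so a set overlapping two different sets may end up in two output groups; the sketch avoids that. The function name `merge_intersecting` is hypothetical.

```python
from collections import defaultdict


def merge_intersecting(pairs: list[tuple[str, int]]) -> list[set[int]]:
    """Union-find over set ids: sets sharing any record end up in one group."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    first_owner: dict[int, str] = {}  # record_id -> first set seen containing it
    for set_id, record_id in pairs:
        parent.setdefault(set_id, set_id)
        if record_id in first_owner:
            union(first_owner[record_id], set_id)
        else:
            first_owner[record_id] = set_id

    # Collect the records of every set under its root.
    groups: dict[str, set[int]] = defaultdict(set)
    for set_id, record_id in pairs:
        groups[find(set_id)].add(record_id)
    return list(groups.values())
```

On the question's data this yields the four groups {1, 2, 3, 4, 5}, {6, 7}, {9, 11, 12} and {10}.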

postgres counting one record twice if it meets certain criteria

I expected the query below to do what I describe, but apparently it doesn't.
My table looks like this:
id | name | g | partner | g2
1 | John | M | Sam | M
2 | Devon | M | Mike | M
3 | Kurt | M | Susan | F
4 | Stacy | F | Bob | M
5 | Rosa | F | Rita | F
I'm trying to get the id of every row where g or g2 equals 'M', but a row where both g and g2 are 'M' should be returned twice, not once.
$q = pg_query("SELECT id FROM mytable WHERE ( g = 'M' OR g2 = 'M' )");
So, for the sample data above, I'm trying to return:
1
1
2
2
3
4
But it always returns:
1
2
3
4
Your query doesn't work because each row is returned only once whether it matches one or both of the conditions. To get what you want use two queries and use UNION ALL to combine the results:
SELECT id FROM mytable WHERE g = 'M'
UNION ALL
SELECT id FROM mytable WHERE g2 = 'M'
ORDER BY id
Result:
1
1
2
2
3
4
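The difference can be reproduced in Python over the sample rows: a single filter with OR yields each row at most once, concatenating two separately filtered lists behaves like UNION ALL, and de-duplicating that concatenation behaves like a plain UNION. `ROWS` is simply the question's sample table as tuples.

```python
# The question's sample table as (id, g, g2) tuples.
ROWS = [
    (1, "M", "M"),
    (2, "M", "M"),
    (3, "M", "F"),
    (4, "F", "M"),
    (5, "F", "F"),
]

# WHERE g = 'M' OR g2 = 'M': each row is tested once, so it appears at most once.
with_or = [row_id for row_id, g, g2 in ROWS if g == "M" or g2 == "M"]

# UNION ALL: run both filters and concatenate, keeping duplicates.
union_all = sorted(
    [row_id for row_id, g, _ in ROWS if g == "M"]
    + [row_id for row_id, _, g2 in ROWS if g2 == "M"]
)

# UNION: like UNION ALL, but duplicate rows are removed.
union = sorted(set(union_all))
```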
You might try a UNION ALL along these lines (a plain UNION would remove the duplicate ids again):
"SELECT id FROM mytable WHERE ( g = 'M') UNION ALL SELECT id FROM mytable WHERE ( g2 = 'M')"
Hope this helps, Martin