Enforcing a unique relationship over multiple columns where one column is nullable - db2

Given the table
ID  PERSON_ID  PLAN  EMPLOYER_ID  TERMINATION_DATE
1   123        ABC   321          2020-01-01
2   123        DEF   321          (null)
3   123        ABC   321          (null)
4   123        ABC   321          (null)
I want to disallow the 4th entry. (The 3rd entry shows the person was re-hired and is therefore a new relationship. I'm only showing the relevant fields.)
My first attempt was to simply create a unique index over PERSON_ID / PLAN / EMPLOYER_ID / TERMINATION_DATE, thinking that Db2 for IBM i considered nulls equal in a unique index. I was evidently wrong...
Is there a way to enforce uniqueness over these columns, or,
is there a better way to approach the value of termination date? (null is not technically correct; I'm thinking of it as more true/false, but the business logic needs a date)
Edit
According to the docs for 7.3:
UNIQUE
Prevents the table from containing two or more rows with the same value of the index key. When UNIQUE is used, all null values for a column are considered equal. For example, if the key is a single column that can contain null values, that column can contain only one null value. The constraint is enforced when rows of the table are updated or new rows are inserted.
The constraint is also checked during the execution of the CREATE INDEX statement. If the table already contains rows with duplicate key values, the index is not created.
UNIQUE WHERE NOT NULL
Prevents the table from containing two or more rows with the same value of the index key, where all null values for a column are not considered equal. Multiple null values in a column are allowed. Otherwise, this is identical to UNIQUE.
So, the behavior I'm seeing looks more like UNIQUE WHERE NOT NULL. When I generate SQL for this table, I see
ADD CONSTRAINT TERMEMPPLANSSN
UNIQUE( TERMINATION_DATE , EMPLOYERID , PLAN_CODE , SSN ) ;
(note this is showing the real field names, not the ones I used in my example)
Edit 2
Bottom line, Constraint !== Index. When I went back and created an actual index, I got the desired behavior.

CREATE TABLE PERSON
(
ID INT NOT NULL
, PERSON_ID INT NOT NULL
, PLAN CHAR(3) NOT NULL
, EMPLOYER_ID INT
, TERMINATION_DATE DATE
);
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE)
VALUES
(1, 123, 'ABC', 321, DATE('2020-01-01'))
, (2, 123, 'DEF', 321, CAST(NULL AS DATE))
, (3, 123, 'ABC', 321, CAST(NULL AS DATE))
WITH NC;
--- To not allow: ---
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE) VALUES
(4, 123, 'ABC', 321, CAST(NULL AS DATE))
or
(4, 123, 'ABC', 321, DATE('2020-01-01'))
You may:
CREATE UNIQUE INDEX PERSON_U1 ON PERSON
(PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE);
--- To not allow: ---
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE) VALUES
(4, 123, 'ABC', 321, DATE('2020-01-01'))
but allow multiple:
(X, 123, 'ABC', 321, CAST(NULL AS DATE))
(Y, 123, 'ABC', 321, CAST(NULL AS DATE))
...
You may:
CREATE UNIQUE WHERE NOT NULL INDEX PERSON_U2 ON PERSON
(PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE);
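A quick way to see which behavior you have is to retry the duplicate insert; this is just a sketch against the example table above (the exact message text varies by release):
-- With PERSON_U1 (plain UNIQUE) in place, this insert is rejected with a
-- duplicate-key error (SQLSTATE 23505), because the NULL termination dates
-- are treated as equal. With PERSON_U2 (UNIQUE WHERE NOT NULL) instead, it succeeds.
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE)
VALUES (4, 123, 'ABC', 321, CAST(NULL AS DATE));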

Related

When JOINing 3 tables in Postgres, can the results be sorted by values in the 2 joined tables?

I have a database with three tables. Ultimately I want to JOIN the three tables and sort them by a column shared by two of the tables.
A main items table with a foreign key (product_id) into the two sub-tables:
items
CREATE TABLE items (
id INT NOT NULL,
product_id varchar(40) NOT NULL,
type CHAR NOT NULL
);
and then a table corresponding to each type, a and b. They have differing columns, but for the sake of this exercise I'm only including the columns they have in common:
CREATE TABLE products_a (
id varchar(40) NOT NULL,
name varchar(40) NOT NULL,
price INT NOT NULL
);
CREATE TABLE products_b (
id varchar(40) NOT NULL,
name varchar(40) NOT NULL,
price INT NOT NULL
);
Some example rows:
INSERT INTO items VALUES
( 1, 'abc', 'a' ),
( 2, 'def', 'b' ),
( 3, 'ghi', 'a' ),
( 4, 'jkl', 'b' );
INSERT INTO products_a VALUES
( 'abc', 'product 1', 10 ),
( 'ghi', 'product 2', 50 );
INSERT INTO products_b VALUES
( 'def', 'product 3', 20 ),
( 'jkl', 'product 4', 100 );
I have a JOIN working, but my sorting is not interleaving the rows as I would expect.
Query:
SELECT
items.id AS item_id,
products_a.name AS product_a_name,
products_a.price AS product_a_price,
products_b.name AS product_b_name,
products_b.price AS product_b_price
FROM items
FULL JOIN products_a ON items.product_id = products_a.id
FULL JOIN products_b ON items.product_id = products_b.id
ORDER BY 3, 5 ASC;
Actual result:
item_id  product_a_name  product_a_price  product_b_name  product_b_price
1        product 1       10               NULL            NULL
3        product 2       50               NULL            NULL
2        NULL            NULL             product 3       20
4        NULL            NULL             product 4       100
Desired result:
item_id  product_a_name  product_a_price  product_b_name  product_b_price
1        product 1       10               NULL            NULL
2        NULL            NULL             product 3       20
3        product 2       50               NULL            NULL
4        NULL            NULL             product 4       100
I realize this is a weird table setup, but simplified this way it looks more contrived than it is. Ultimately the sorting matches the real use case, though, and changing the DB schema is not an option. I feel like I am missing something simple here, just sorting by either one column or the other. Any help is appreciated.
Use COALESCE in the ORDER BY clause to always sort by the first non-NULL price. Positional references such as 3 and 5 only work as bare numbers in ORDER BY, not inside an expression, so the columns have to be named explicitly:
SELECT
items.id AS item_id,
products_a.name AS product_a_name,
products_a.price AS product_a_price,
products_b.name AS product_b_name,
products_b.price AS product_b_price
FROM items
FULL JOIN products_a ON items.product_id = products_a.id
FULL JOIN products_b ON items.product_id = products_b.id
ORDER BY
COALESCE(products_a.price, products_b.price);
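Equivalently, you can expose the combined price as its own output column and sort by that alias; this is just a sketch of the same idea (it adds a sort_price column to the output):
SELECT
items.id AS item_id,
products_a.name AS product_a_name,
products_a.price AS product_a_price,
products_b.name AS product_b_name,
products_b.price AS product_b_price,
-- whichever price is present becomes the sort key
COALESCE(products_a.price, products_b.price) AS sort_price
FROM items
FULL JOIN products_a ON items.product_id = products_a.id
FULL JOIN products_b ON items.product_id = products_b.id
ORDER BY sort_price;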

How to understand the use of the SELECT 1 FROM ... expression in SQL

Task:
Find the employee with the highest salary in each department
Data:
CREATE SEQUENCE employee_id_seq;
create table Employee
(
id_emp int DEFAULT nextval('employee_id_seq')
NOT NULL PRIMARY KEY UNIQUE,
name_emp varchar(255) NOT NULL,
mgr_id_fk int not null,
job_emp text NOT NULL,
salary int NOT NULL,
date_emp date NOT NULL,
dep_ID_fk int NOT NULL
);
ALTER SEQUENCE employee_id_seq
OWNED BY employee.id_emp;
create table Manager
(
id_mgr int not null primary key unique,
type_mgr varchar(255)
);
ALTER table Employee
add FOREIGN KEY (mgr_id_fk) REFERENCES Manager (id_mgr)
on update cascade
on delete set null;
create table Department
(
id_depart int NOT NULL PRIMARY KEY unique,
name_depart varchar(255) not null,
address text,
phone text
);
insert into Manager (id_mgr, type_mgr)
VALUES
(1006, 'juniormgr'),
(1004, 'middlemgr'),
(1005, 'seniormgr');
insert into Department (id_depart, name_depart, address, phone)
values (1, 'Sales', 'Sydney', '0425 198 053'),
(2, 'Accounts', 'Melbourne', '0429 198 955'),
(3, 'Admin', 'Melbourne', '0428 198 758'),
(4, 'Marketing', 'Sydney', '0427 198 757');
insert into Employee(id_emp, name_emp, mgr_id_fk, job_emp, salary, date_emp, dep_ID_fk)
values (nextval('employee_id_seq'), 'ken Adams', 1006, 'Salesman', 70000, '2008-04-12', 1),
(nextval('employee_id_seq'), 'Ru Jones', 1004, 'Salesman', 65000, '2010-01-18', 1),
(nextval('employee_id_seq'), 'Dhal Sim', 1006, 'Accountant', 88000, '2001-03-07', 2),
(nextval('employee_id_seq'), 'Ellen Honda', 1006, 'Manager', 118000, '2001-03-17', 1),
(nextval('employee_id_seq'), 'Mike Bal', 1005, 'Receptionist', 68000, '2006-06-21', 3),
(nextval('employee_id_seq'), 'Martin Bison', 1005, 'CEO', 210000, '2010-07-12', 3),
(nextval('employee_id_seq'), 'Shen Li', 1004, 'Salesman', 86000, '2014-09-18', 1),
(nextval('employee_id_seq'), 'Zang Ross', 1004, 'Salesman', 65000, '2017-02-02', 1),
(nextval('employee_id_seq'), 'Sagar Kahn', 1005, 'Salesman', 70000, '2016-03-01', 1);
This query will give you the necessary information:
select * from employee e
where not exists (select 1 from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
My reasoning:
First, the subquery will be executed and Postgres will save this temporary result,
then the rest of the query will be executed:
select * from employee e
where not exists ...
where not exists excludes all matches found in the subquery.
Question:
I don't understand how this code actually works,
because everything here seems illogical to me.
For example, how does this work
select 1 from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary
select 1 from - what does this expression even do?
e2.dep_id_fk = e.dep_id_fk - this compares the same table (in the subquery and in the main query), but why?
e2.salary > e.salary - and why this?
There are two things you need to understand:
What a correlated subquery is
How the (NOT) EXISTS operator works
A correlated subquery is run once for each row that is returned from the outer query. You can imagine this as a kind of nested loop, where for each row returned by the outer query, the sub-query is executed using the values of the columns from the outer query.
So if the outer query processes the row for 'ken Adams' it will take the value 1 from dep_id_fk and the value 70000 of the column salary and essentially run:
select 1
from employee e2
where e2.dep_id_fk = 1 --<< this value is from the outer query
and e2.salary > 70000 --<< this value is from the outer query
If that query returns no rows, the row from the outer query is included in the result. Then the database proceeds with the next row in the outer query and does the same again until all rows from the outer query are processed and either included in or excluded from the result.
However, the NOT EXISTS and EXISTS operators only check for the presence of rows from the sub-query. The actual value(s) returned from the sub-query are completely irrelevant.
A lot of people incorrectly assume that select 1 is somehow cheaper than select * in the sub-query - but this is a totally wrong assumption. The expression is never even evaluated, so it's completely irrelevant what is selected there. If you think select * is more logical than select 1, then use that.
To prove that the expression is never evaluated (or even looked at), you can use one that would otherwise throw an exception.
select *
from employee e
where not exists (select 1/0
from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
If you run select 1/0 outside of a sub-query used for an EXISTS or NOT EXISTS condition, it would result in an error. But the EXISTS operator never even looks at the expressions in the SELECT list.
Let's try
select *
from employee e
where not exists(select 'foo'
from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
or
select *
from employee e
where not exists(select 42
from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
Both queries return the same result.
Therefore, the 1 has no meaning of its own; it merely supports the existence check.

Grouping user id columns together with string_agg on PostgreSQL 13

This is my emails table
create table emails (
id bigint not null primary key generated by default as identity,
name text not null
);
And contacts table:
create table contacts (
id bigint not null primary key generated by default as identity,
email_id bigint not null,
user_id bigint not null,
full_name text not null,
ordering int not null
);
As you can see I have a user_id field here. The same user ID can appear multiple times in my result, so I want to join them with a comma (,).
Insert some data into the tables:
insert into emails (name)
values
('dennis1'),
('dennis2');
insert into contacts (id, email_id, user_id, full_name, ordering)
values
(5, 1, 1, 'dennis1', 9),
(6, 2, 1, 'dennis1', 5),
(7, 2, 1, 'dennis1', 1),
(8, 1, 3, 'john', 2),
(9, 2, 4, 'dennis7', 1),
(10, 2, 4, 'dennis7', 1);
My query is:
select em.name,
c.user_ids
from emails em
join (
select email_id, string_agg(user_id::text, ',' order by ordering desc) as user_ids
from contacts
group by email_id
) c on c.email_id = em.id
order by em.name;
Actual Result
name user_ids
dennis1 1,3
dennis2 1,1,4,4
Expected Result
name user_ids
dennis1 1,3
dennis2 1,4
On my real-world data, I get the same user id around 50 times, when it should appear only once. In the example above, you can see that users 1 and 4 each appear twice for dennis2.
How can I deduplicate them?
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=2e957b52eb46742f3ddea27ec36effb1
P.S.: I tried adding user_id to the group by, but then I get duplicate rows...
demo:db<>fiddle
SELECT
name,
string_agg(user_id::text, ',' order by ordering desc)
FROM (
SELECT DISTINCT ON (em.id, c.user_id)
*
FROM emails em
JOIN contacts c ON c.email_id = em.id
) s
GROUP BY name
Join the tables
DISTINCT ON the email id and the user_id, so that for each email record there are no duplicate users
Aggregate
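A note in case you try the shortcut of deduplicating inside the aggregate: string_agg(DISTINCT ...) also removes duplicates, but PostgreSQL then requires any ORDER BY inside the aggregate to use the aggregated expression itself, so you lose the order by ordering desc. A sketch of that trade-off, assuming the same tables as above:
-- Deduplicates, but can only order by user_id itself, not by the ordering column:
SELECT em.name,
string_agg(DISTINCT c.user_id::text, ',' ORDER BY c.user_id::text) AS user_ids
FROM emails em
JOIN contacts c ON c.email_id = em.id
GROUP BY em.name
ORDER BY em.name;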

How to filter a query based on jsonb data?

I'm not even sure if it's possible to do this kind of query in Postgres. At least, I'm stuck.
I have two tables: a product recommendation list, containing multiple products to be recommended to a particular customer; and a transaction table indicating the product bought by a customer and the transaction details.
I'm trying to track the performance of my recommendations by plotting all the transactions that match the recommendations (both customer and product).
Below is my test case.
Kindly help
create table if not exists productRec( --Product Recommendation list
task_id int,
customer_id int,
detail jsonb);
truncate productRec;
insert into productRec values (1, 2, '{"1":{"score":5, "name":"KitKat"},
"4":{"score":2, "name":"Yuppi"}
}'),
(1, 3, '{"1":{"score":3, "name":"Yuppi"},
"4":{"score":2, "name":"GoldenSnack"}
}'),
(1, 4, '{"1":{"score":3, "name":"Chickies"},
"4":{"score":2, "name":"Kitkat"}
}');
drop table txn;
create table if not exists txn( --Transaction table
customer_id int,
item_id text,
txn_value numeric,
txn_date date);
truncate txn;
insert into txn values (1, 'Yuppi', 500, DATE '2001-01-01'), (2, 'Kitkat', 2000, DATE '2001-01-01'),
(3, 'Kitkat', 2000, DATE '2001-02-01'), (4, 'Chickies', 200, DATE '2001-09-01');
--> Query must plot:
--Transaction value vs date where the item_id is inside the recommendation for that customer
--ex: (2000, 2001-01-01), (200, 2001-09-01)
We can get each recommendation as its own row with jsonb_each. I don't know what to do with the keys so I just take the value (still jsonb) and then the name inside it (the ->> outputs text).
select
customer_id,
(jsonb_each(detail)).value->>'name' as name
from productrec
So now we have a list of customer_ids and item_ids they were recommended. Now we can just join this with the transactions.
select
txn.txn_value,
txn.txn_date
from txn
join (
select
customer_id,
(jsonb_each(detail)).value->>'name' as name
from productrec
) p ON (
txn.customer_id = p.customer_id AND
lower(txn.item_id) = lower(p.name)
);
In your example data you spelled Kitkat differently in the recommendation table for customer 2. I added lowercasing in the join condition to counter that but it might not be the right solution.
txn_value | txn_date
-----------+------------
2000 | 2001-01-01
200 | 2001-09-01
(2 rows)

Transpose row values to columns (join a table with a bag of key-value pairs in T-SQL)

I have two tables:
table1 (Id, date, info1)
table2 (Id, date, Key nvarchar(50), Value nvarchar(50))
I would like to join these tables and obtain rows where each value in the Key column becomes a new column, and the values in the Value column are the data in the rows.
Example:
table1 row:
1, 2010-01-01, 234
table2 row:
1, 2010-01-01, 'TimeToProcess', '15'
1, 2010-01-01, 'ProcessingStatus', 'Complete'
The desired result is a row like:
1, 2010-01-01, 234, '15', 'Complete'
Where the column headers are (Id, date, info1, TimeToProcess, ProcessingStatus)
This transposition looks like a job for PIVOT, but I could not get it to work, I suspect due to the nvarchar(50) type of the Key and Value columns and the fact that I am forced to use an aggregate function when I do not actually need one.
I could use inner joins to achieve this, but the only way I know how would take one inner join per Key, which in my case amounts to 6 inner joins, as that's how many metrics I have.
How can I do this?
You're on the right track with PIVOT. You can PIVOT on a string value. Use the MIN or MAX aggregate function.
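A minimal sketch of that approach, using the table and column names from the question (adjust to your real schema); since there is only one Value per Id/date/Key, MAX simply passes the string value through:
SELECT t1.Id, t1.[date], t1.info1,
p.TimeToProcess, p.ProcessingStatus
FROM table1 AS t1
JOIN (
-- turn the key/value rows into one row per Id/date with a column per key
SELECT Id, [date], [TimeToProcess], [ProcessingStatus]
FROM (SELECT Id, [date], [Key], [Value] FROM table2) AS src
PIVOT (MAX([Value]) FOR [Key] IN ([TimeToProcess], [ProcessingStatus])) AS pvt
) AS p
ON p.Id = t1.Id AND p.[date] = t1.[date];
With 6 metrics you would simply list all six key names in the IN (...) list.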