How to create this Hierarchy - tsql

I have 3 tables: structure, accountrange and accountvalue. From the parent and child IDs in table structure, I want to create a hierarchy like this:
Level 1   Level 2   Level 3   Level 4   account   valueusd
111       112       113       114       100       1000
111       112       113       114       101       2000
The structure table links to the accountrange table on the key financialitem.
The accountrange table links to the accountvalue table where accountnumber falls between accountfrom and accountto.
Can you please help me with how to do this?
CREATE TABLE [dbo].[structure](
[Financialitem] [nvarchar](3) NULL,
[ID] [int] NULL,
[ParentID] [int] NULL,
[ChildID] [int] NULL,
[NextID] [int] NULL,
[Level] [int] NULL
) ON [PRIMARY]
INSERT INTO [dbo].[structure]
VALUES
(111,1,null,2,null,1),
(112,2,1,3,null,2),
(113,3,2,4,null,3),
(114,4,3,null,null,4),
(221,5,2,6,null,3),
(222,6,5,null,7,4),
(223,7,5,null,null,4)
CREATE TABLE [dbo].[accountrange](
[Financialitem] [nvarchar](3) NULL,
[Accountfrom] [int] NULL,
[Accountto] [int] NULL
) ON [PRIMARY]
INSERT INTO [dbo].[accountrange]
VALUES
(114,100,105),
(222,200,205),
(223,300,305)
CREATE TABLE [dbo].[accountvalue](
[accountnumber] [int] NULL,
[valuesUSD] [int] NULL,
) ON [PRIMARY]
INSERT INTO [dbo].[accountvalue]
VALUES
(100,1000),
(101,2000),
(301,1500),
(201,1400)

Using the data you provided, this query gives the output you specified. It's hard-coded to 4 levels, so something more dynamic would need further thought.
I've also assumed that the 221 branch doesn't appear because each row has to match on both the parent and child ID columns.
select L1.Financialitem as [Level 1]
, L2.Financialitem as [Level 2]
, L3.Financialitem as [Level 3]
, ar.Financialitem as [Level 4]
, av.accountnumber, av.valuesUSD
from accountrange ar
inner join accountvalue av on av.accountnumber between ar.Accountfrom and ar.Accountto
inner join structure as L4 on L4.Financialitem = ar.Financialitem
inner join structure as L3 on L3.ID = L4.ParentID and L3.ChildID = L4.ID
inner join structure as L2 on L2.ID = L3.ParentID and L2.ChildID = L3.ID
inner join structure as L1 on L1.ID = L2.ParentID and L1.ChildID = L2.ID
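
For the "something more dynamic" case, one possible direction is a recursive CTE that walks ParentID upward from each leaf, so the depth is not fixed. The sketch below runs through Python's sqlite3 module only to keep it self-contained; SQL Server also supports recursive CTEs, but writes them as plain WITH and concatenates strings with + instead of ||. The CTE name walk and the path column are my own illustration, and the sketch assumes that following ParentID alone is acceptable, which means it also returns the 221 branch that the fixed query above drops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE structure (Financialitem TEXT, ID INT, ParentID INT,
                        ChildID INT, NextID INT, "Level" INT);
INSERT INTO structure VALUES
 ('111',1,NULL,2,NULL,1), ('112',2,1,3,NULL,2), ('113',3,2,4,NULL,3),
 ('114',4,3,NULL,NULL,4), ('221',5,2,6,NULL,3), ('222',6,5,NULL,7,4),
 ('223',7,5,NULL,NULL,4);
CREATE TABLE accountrange (Financialitem TEXT, Accountfrom INT, Accountto INT);
INSERT INTO accountrange VALUES ('114',100,105), ('222',200,205), ('223',300,305);
CREATE TABLE accountvalue (accountnumber INT, valuesUSD INT);
INSERT INTO accountvalue VALUES (100,1000), (101,2000), (301,1500), (201,1400);
""")

query = """
WITH RECURSIVE walk(leaf, parent_id, path) AS (
    -- anchor: every financial item that owns an account range
    SELECT s.Financialitem, s.ParentID, s.Financialitem
    FROM structure AS s
    JOIN accountrange AS ar ON ar.Financialitem = s.Financialitem
    UNION ALL
    -- step: prepend the parent and keep climbing
    SELECT w.leaf, p.ParentID, p.Financialitem || ' > ' || w.path
    FROM walk AS w
    JOIN structure AS p ON p.ID = w.parent_id
)
SELECT w.path, av.accountnumber, av.valuesUSD
FROM walk AS w
JOIN accountrange AS ar ON ar.Financialitem = w.leaf
JOIN accountvalue AS av
  ON av.accountnumber BETWEEN ar.Accountfrom AND ar.Accountto
WHERE w.parent_id IS NULL          -- keep only paths that reached the root
ORDER BY av.accountnumber
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The path comes back as one string per row; splitting it into Level 1 to Level N columns would still need a pivot or client-side formatting.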

Related

JOIN with array of ids returns duplicate root records instead of just one

I'm trying to join several tables and pull out each DISTINCT root record (from table_a), but for some reason I keep getting duplicates. Here is my select query:
select
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
This returns the following:
[
{
"id": 2,
"tableName": "Root record 2"
},
{
"id": 2,
"tableName": "Root record 2"
}
]
But I am only expecting, in this case,
[
{
"id": 2,
"tableName": "Root record 2"
}
]
What am I doing wrong here?
Here's the fiddle and, just in case, the create and insert statements below:
create schema if not exists my_schema;
create table if not exists my_schema.table_a (
id serial primary key,
table_a_name varchar (255) not null
);
create table if not exists my_schema.table_b (
id serial primary key,
table_a_id bigint not null references my_schema.table_a (id)
);
create table if not exists my_schema.table_d (
id serial primary key
);
create table if not exists my_schema.table_c (
id serial primary key,
table_b_id bigint not null references my_schema.table_b (id),
table_d_ids bigint[] not null
);
insert into my_schema.table_a values
(1, 'Root record 1'),
(2, 'Root record 2'),
(3, 'Root record 3');
insert into my_schema.table_b values
(10, 2),
(11, 2),
(12, 3);
insert into my_schema.table_d values
(100),
(101),
(102),
(103),
(104);
insert into my_schema.table_c values
(1000, 10, array[]::int[]),
(1001, 10, array[100]),
(1002, 11, array[100, 101]),
(1003, 12, array[102]),
(1004, 12, array[103]);
The short answer is to use distinct, which will get the results you want:
select distinct
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
That said, this doesn't sit well with me because I assume this is not the end of your query.
The root issue is that you have two sets of records from table_b through table_d that match these criteria. If you follow the breadcrumbs back, you will see there really are two matches:
select
ta.id,
ta.table_a_name as "tableName", tb.*, tc.*, td.*
from my_schema.table_a ta
left join my_schema.table_b tb
on (tb.table_a_id = ta.id)
left join my_schema.table_c tc
on (tc.table_b_id = tb.id)
left join my_schema.table_d td
on (td.id = any(tc.table_d_ids))
where td.id = any(array[100]);
So 'distinct' is just a lazy fix to say if there are dupes, limit it to one...
My next question is, is there more to it than this? What's supposed to happen next? Do you really just want candidates from table_a, or is this part 1 of a longer issue? If there is more to it, then there is likely a better solution than a simple select distinct.
-- edit 10/1/2022 --
Based on your comment, I have one final suggestion. Because this really is all there is to your output AND you don't actually need the data from the b/c/d tables, I think a semi-join is a better solution.
It's slightly more code (not going to win any golf or de-obfuscation contests), but it's much more efficient than a distinct or group by all columns. The reason is that a distinct pulls every result row and then has to order them and remove dupes. A semi-join, by contrast, will "stop looking" once it finds a match. It also scales very well. Almost every time I see a distinct misused, it's better served by a semi-join.
select
ta.id,
ta.table_a_name as "tableName"
from my_schema.table_a ta
where exists (
select null
from
my_schema.table_b tb,
my_schema.table_c tc,
my_schema.table_d td
where
tb.table_a_id = ta.id and
tc.table_b_id = tb.id and
td.id = any(tc.table_d_ids) and
td.id = any(array[100])
)
I didn't suggest this initially because I was unclear on the "what next."
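
To make the difference between the plain join and the semi-join concrete, here is a small sketch in plain Python rather than SQL, using the sample rows from the question. It is only a model of the logic, not of how Postgres executes either query, and it skips the lookup in table_d because id 100 exists there. The variable names joined and semi are mine.

```python
# Rows copied from the question's insert statements
table_a = [(1, "Root record 1"), (2, "Root record 2"), (3, "Root record 3")]
table_b = [(10, 2), (11, 2), (12, 3)]                    # (id, table_a_id)
table_c = [(1000, 10, []), (1001, 10, [100]), (1002, 11, [100, 101]),
           (1003, 12, [102]), (1004, 12, [103])]         # (id, table_b_id, table_d_ids)
wanted = 100

# Join style: one output row per matching (a, b, c) combination,
# so a root with two matching chains shows up twice.
joined = [(a_id, name)
          for a_id, name in table_a
          for b_id, b_a_id in table_b if b_a_id == a_id
          for c_id, c_b_id, d_ids in table_c
          if c_b_id == b_id and wanted in d_ids]

# Semi-join style: any() stops at the first matching chain,
# so each root appears at most once.
semi = [(a_id, name)
        for a_id, name in table_a
        if any(wanted in d_ids
               for b_id, b_a_id in table_b if b_a_id == a_id
               for c_id, c_b_id, d_ids in table_c if c_b_id == b_id)]

print(joined)
print(semi)
```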

When JOINing 3 tables in Postgres, can the results be sorted by values in the 2 joined tables?

I have a database with three tables. Ultimately I want to JOIN the three tables and sort them by a column shared by two of the tables.
A main item table with foreign keys (product_id) to the two sub-tables:
items
CREATE TABLE items (
id INT NOT NULL,
product_id varchar(40) NOT NULL,
type CHAR NOT NULL
);
and then a table corresponding to each typeA and typeB. They have differing columns, but for the sake of this exercise I'm only including the columns they have in common:
CREATE TABLE products_a (
id varchar(40) NOT NULL,
name varchar(40) NOT NULL,
price INT NOT NULL
);
CREATE TABLE products_b (
id varchar(40) NOT NULL,
name varchar(40) NOT NULL,
price INT NOT NULL
);
Some example rows:
INSERT INTO items VALUES
( 1, 'abc', 'a' ),
( 2, 'def', 'b' ),
( 3, 'ghi', 'a' ),
( 4, 'jkl', 'b' );
INSERT INTO products_a VALUES
( 'abc', 'product 1', 10 ),
( 'ghi', 'product 2', 50 );
INSERT INTO products_b VALUES
( 'def', 'product 3', 20 ),
( 'jkl', 'product 4', 100 );
I have a JOIN working, but my sorting is not interleaving the rows as I would expect.
Query:
SELECT
items.id AS item_id,
products_a.name AS product_a_name,
products_a.price AS product_a_price,
products_b.name AS product_b_name,
products_b.price AS product_b_price
FROM items
FULL JOIN products_a ON items.product_id = products_a.id
FULL JOIN products_b ON items.product_id = products_b.id
ORDER BY 3, 5 ASC;
Actual result:
item_id | product_a_name | product_a_price | product_b_name | product_b_price
1       | product 1      | 10              | NULL           | NULL
3       | product 2      | 50              | NULL           | NULL
2       | NULL           | NULL            | product 3      | 20
4       | NULL           | NULL            | product 4      | 100
Desired result:
item_id | product_a_name | product_a_price | product_b_name | product_b_price
1       | product 1      | 10              | NULL           | NULL
2       | NULL           | NULL            | product 3      | 20
3       | product 2      | 50              | NULL           | NULL
4       | NULL           | NULL            | product 4      | 100
I realize this is a weird table setup, but simplified this way looks more contrived than it is. Ultimately the sorting matches the real use case, though, and changing the DB schema is not an option. I feel like I am missing something simple here, just sorting by either one column or another. Any help is appreciated.
Use COALESCE in the ORDER BY clause to always sort by the first non NULL price:
SELECT
items.id AS item_id,
products_a.name AS product_a_name,
products_a.price AS product_a_price,
products_b.name AS product_b_name,
products_b.price AS product_b_price
FROM items
FULL JOIN products_a ON items.product_id = products_a.id
FULL JOIN products_b ON items.product_id = products_b.id
ORDER BY
COALESCE(products_a.price, products_b.price);
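
One detail worth spelling out: COALESCE has to be given the two price columns themselves. Integer literals only act as output-column positions when they stand alone in ORDER BY; inside a function call they are ordinary constants. Below is a minimal check of the COALESCE ordering on the sample data, assuming SQLite through Python's sqlite3 as a stand-in for Postgres and LEFT JOINs in place of FULL JOINs (enough here, since every product has an item).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INT NOT NULL, product_id VARCHAR(40) NOT NULL, type CHAR NOT NULL);
CREATE TABLE products_a (id VARCHAR(40) NOT NULL, name VARCHAR(40) NOT NULL, price INT NOT NULL);
CREATE TABLE products_b (id VARCHAR(40) NOT NULL, name VARCHAR(40) NOT NULL, price INT NOT NULL);
INSERT INTO items VALUES (1,'abc','a'), (2,'def','b'), (3,'ghi','a'), (4,'jkl','b');
INSERT INTO products_a VALUES ('abc','product 1',10), ('ghi','product 2',50);
INSERT INTO products_b VALUES ('def','product 3',20), ('jkl','product 4',100);
""")

rows = conn.execute("""
SELECT items.id, products_a.price, products_b.price
FROM items
LEFT JOIN products_a ON items.product_id = products_a.id
LEFT JOIN products_b ON items.product_id = products_b.id
-- sort by whichever of the two prices is present
ORDER BY COALESCE(products_a.price, products_b.price)
""").fetchall()

for row in rows:
    print(row)
```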

How to understand the use of the ... Select 1 from ... an expression in SQL

Task:
Find the employee with the highest salary among each department
Data:
CREATE SEQUENCE employee_id_seq;
create table Employee
(
id_emp int DEFAULT nextval('employee_id_seq')
NOT NULL PRIMARY KEY UNIQUE,
name_emp varchar(255) NOT NULL,
mgr_id_fk int not null,
job_emp text NOT NULL,
salary int NOT NULL,
date_emp date NOT NULL,
dep_ID_fk int NOT NULL
);
ALTER SEQUENCE employee_id_seq
OWNED BY employee.id_emp;
create table Manager
(
id_mgr int not null primary key unique,
type_mgr varchar(255)
);
ALTER table Employee
add FOREIGN KEY (mgr_id_fk) REFERENCES Manager (id_mgr)
on update cascade
on delete set null;
create table Department
(
id_depart int NOT NULL PRIMARY KEY unique,
name_depart varchar(255) not null,
address text,
phone text
);
insert into Manager (id_mgr, type_mgr)
VALUES
(1006, 'juniormgr'),
(1004, 'middlemgr'),
(1005, 'seniormgr');
insert into Department (id_depart, name_depart, address, phone)
values (1, 'Sales', 'Sydney', '0425 198 053'),
(2, 'Accounts', 'Melbourne', '0429 198 955'),
(3, 'Admin', 'Melbourne', '0428 198 758'),
(4, 'Marketing', 'Sydney', '0427 198 757');
insert into Employee(id_emp, name_emp, mgr_id_fk, job_emp, salary, date_emp, dep_ID_fk)
values (nextval('employee_id_seq'), 'ken Adams', 1006, 'Salesman', 70000, '2008-04-12', 1),
(nextval('employee_id_seq'), 'Ru Jones', 1004, 'Salesman', 65000, '2010-01-18', 1),
(nextval('employee_id_seq'), 'Dhal Sim', 1006, 'Accountant', 88000, '2001-03-07', 2),
(nextval('employee_id_seq'), 'Ellen Honda', 1006, 'Manager', 118000, '2001-03-17', 1),
(nextval('employee_id_seq'), 'Mike Bal', 1005, 'Receptionist', 68000, '2006-06-21', 3),
(nextval('employee_id_seq'), 'Martin Bison', 1005, 'CEO', 210000, '2010-07-12', 3),
(nextval('employee_id_seq'), 'Shen Li', 1004, 'Salesman', 86000, '2014-09-18', 1),
(nextval('employee_id_seq'), 'Zang Ross', 1004, 'Salesman', 65000, '2017-02-02', 1),
(nextval('employee_id_seq'), 'Sagar Kahn', 1005, 'Salesman', 70000, '2016-03-01', 1);
A query like this returns the required information:
select * from employee e
where not exists (select 1 from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
My reasoning:
First, the subquery is executed and Postgres saves its output as a temporary result;
then the rest of the query is executed:
select * from employee e
where not exists ...
where not exists then excludes all matches found in the subquery.
Question:
I don't understand how this code actually works,
because everything here seems illogical to me.
For example, how does this work?
select 1 from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary
select 1 from: what does this expression even do?
e2.dep_id_fk = e.dep_id_fk: this compares the same table with itself (in the subquery and in the main query), but why?
e2.salary > e.salary: and why this?
There are two things you need to understand:
What a correlated subquery is
How the (NOT) EXISTS operator works
A correlated subquery is (conceptually) run once for each row that is returned from the outer query. You can imagine this as a kind of nested loop, where for each row returned by the outer query, the sub-query is executed using the values of the columns from the outer query.
So if the outer query processes the row for 'ken Adams' it will take the value 1 from dep_id_fk and the value 70000 of the column salary and essentially run:
select 1
from employee e2
where e2.dep_id_fk = 1 --<< this value is from the outer query
and e2.salary > 70000 --<< this value is from the outer query
If that query returns no rows, the row from the outer query is included in the result. Then the database proceeds with the next row in the outer query and does the same again until all rows from the outer query are processed and either included in or excluded from the result.
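
The nested-loop picture can be written out literally. The sketch below is only a mental model in plain Python, using (name, salary, department) from the question's sample rows; a real planner is free to execute the query in a different way. The variable names are mine.

```python
# (name_emp, salary, dep_id_fk) taken from the question's insert statement
employees = [
    ("ken Adams", 70000, 1), ("Ru Jones", 65000, 1), ("Dhal Sim", 88000, 2),
    ("Ellen Honda", 118000, 1), ("Mike Bal", 68000, 3), ("Martin Bison", 210000, 3),
    ("Shen Li", 86000, 1), ("Zang Ross", 65000, 1), ("Sagar Kahn", 70000, 1),
]

result = []
for name, salary, dep in employees:        # outer query: one pass per row of e
    better_paid_exists = False
    for _, salary2, dep2 in employees:     # sub-query, with e's values plugged in
        if dep2 == dep and salary2 > salary:
            better_paid_exists = True
            break                          # a single matching row is enough
    if not better_paid_exists:             # NOT EXISTS keeps the outer row
        result.append(name)

print(result)
```

Note that the inner loop never builds or returns a value; it only records whether some row matched.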
However, the NOT EXISTS and EXISTS operators only check for the presence of rows from the sub-query. The actual value(s) returned from the sub-query are completely irrelevant.
A lot of people incorrectly assume that select 1 is somehow cheaper than select * in the sub-query, but this is a totally wrong assumption. The expression is never even evaluated, so it's completely irrelevant what is selected there. If you think select * is more logical than select 1, then use that.
To prove that the expression is never evaluated (or even looked at), you can use one that would otherwise throw an exception.
select *
from employee e
where not exists (select 1/0
from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
If you run select 1/0 outside of a sub-query used for an EXISTS or NOT EXISTS condition, it would result in an error. But the EXISTS operator never even looks at the expressions in the SELECT list.
Let's try
select *
from employee e
where not exists(select 'foo'
from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
or
select *
from employee e
where not exists(select 42
from employee e2
where e2.dep_id_fk = e.dep_id_fk
and e2.salary > e.salary);
Both queries return the same result.
Therefore the 1 is only a placeholder: EXISTS just checks whether any row exists.
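
The same comparison can be scripted. This is a minimal sketch, assuming SQLite through Python's sqlite3 as a stand-in for Postgres and a cut-down employee table holding only the three columns the query touches; top_earners is a hypothetical helper written for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name_emp TEXT, salary INT, dep_id_fk INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("ken Adams", 70000, 1), ("Ru Jones", 65000, 1), ("Dhal Sim", 88000, 2),
    ("Ellen Honda", 118000, 1), ("Mike Bal", 68000, 3), ("Martin Bison", 210000, 3),
    ("Shen Li", 86000, 1), ("Zang Ross", 65000, 1), ("Sagar Kahn", 70000, 1),
])

def top_earners(select_list):
    # select_list is one of the fixed literals below, never user input
    sql = f"""
        SELECT name_emp FROM employee e
        WHERE NOT EXISTS (SELECT {select_list} FROM employee e2
                          WHERE e2.dep_id_fk = e.dep_id_fk
                            AND e2.salary > e.salary)
        ORDER BY name_emp"""
    return [row[0] for row in conn.execute(sql)]

# Same query, four different select lists inside the sub-query
results = {s: top_earners(s) for s in ("1", "'foo'", "42", "*")}
for select_list, names in results.items():
    print(select_list, names)
```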

Enforcing a unique relationship over multiple columns where one column is nullable

Given the table
ID PERSON_ID PLAN EMPLOYER_ID TERMINATION_DATE
1 123 ABC 321 2020-01-01
2 123 DEF 321 (null)
3 123 ABC 321 (null)
4 123 ABC 321 (null)
I want to exclude the 4th entry. (The 3rd entry shows the person was re-hired and therefore is a new relationship. I'm only showing relevant fields)
My first attempt was to simply create a unique index over PERSON_ID / PLAN / EMPLOYER_ID / TERMINATION_DATE, thinking that DB2 for IBMi considered nulls equal in a unique index. I was evidently wrong...
Is there a way to enforce uniqueness over these columns, or,
is there a better way to approach the value of termination date? (null is not technically correct; I'm thinking of it as more true/false, but the business logic needs a date)
Edit
According to the docs for 7.3:
UNIQUE
Prevents the table from containing two or more rows with the same value of the index key. When UNIQUE is used, all null values for a column are considered equal. For example, if the key is a single column that can contain null values, that column can contain only one null value. The constraint is enforced when rows of the table are updated or new rows are inserted.
The constraint is also checked during the execution of the CREATE INDEX statement. If the table already contains rows with duplicate key values, the index is not created.
UNIQUE WHERE NOT NULL
Prevents the table from containing two or more rows with the same value of the index key, where all null values for a column are not considered equal. Multiple null values in a column are allowed. Otherwise, this is identical to UNIQUE.
So, the behavior I'm seeing looks more like UNIQUE WHERE NOT NULL. When I generate SQL for this table, I see
ADD CONSTRAINT TERMEMPPLANSSN
UNIQUE( TERMINATION_DATE , EMPLOYERID , PLAN_CODE , SSN ) ;
(note this is showing the real field names, not the ones I used in my example)
Edit 2
Bottom line, Constraint !== Index. When I went back and created an actual index, I got the desired behavior.
CREATE TABLE PERSON
(
ID INT NOT NULL
, PERSON_ID INT NOT NULL
, PLAN CHAR(3) NOT NULL
, EMPLOYER_ID INT
, TERMINATION_DATE DATE
);
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE)
VALUES
(1, 123, 'ABC', 321, DATE('2020-01-01'))
, (2, 123, 'DEF', 321, CAST(NULL AS DATE))
, (3, 123, 'ABC', 321, CAST(NULL AS DATE))
WITH NC;
To disallow either of these inserts:
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE) VALUES
(4, 123, 'ABC', 321, CAST(NULL AS DATE))
or
(4, 123, 'ABC', 321, DATE('2020-01-01'))
you may use:
CREATE UNIQUE INDEX PERSON_U1 ON PERSON
(PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE);
To disallow this insert:
INSERT INTO PERSON (ID, PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE) VALUES
(4, 123, 'ABC', 321, DATE('2020-01-01'))
but still allow multiple rows with a null termination date:
(X, 123, 'ABC', 321, CAST(NULL AS DATE))
(Y, 123, 'ABC', 321, CAST(NULL AS DATE))
...
you may use:
CREATE UNIQUE WHERE NOT NULL INDEX PERSON_U2 ON PERSON
(PERSON_ID, PLAN, EMPLOYER_ID, TERMINATION_DATE);
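
The "nulls are not equal" semantics can be tried out without a Db2 system. SQLite, used here purely as a convenient stand-in through Python's sqlite3, treats NULLs in a unique index as distinct from each other, which corresponds to the UNIQUE WHERE NOT NULL behaviour described above; it does not reproduce the plain UNIQUE index behaviour of Db2 for i. The column name plan_code and the helper try_insert are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INT NOT NULL, person_id INT NOT NULL,
                     plan_code CHAR(3) NOT NULL, employer_id INT,
                     termination_date DATE);
CREATE UNIQUE INDEX person_u
    ON person (person_id, plan_code, employer_id, termination_date);
INSERT INTO person VALUES (1, 123, 'ABC', 321, '2020-01-01'),
                          (2, 123, 'DEF', 321, NULL),
                          (3, 123, 'ABC', 321, NULL);
""")

def try_insert(row):
    """Return 'accepted' or 'rejected' depending on the unique index."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# Second open-ended (NULL) row for the same person/plan/employer
null_dup = try_insert((4, 123, "ABC", 321, None))
# Exact duplicate of row 1, including the termination date
date_dup = try_insert((5, 123, "ABC", 321, "2020-01-01"))
print(null_dup, date_dup)
```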

Copy records into a table and update parent record id relationship

I need to duplicate records in a table that have a parent/child relationship. Since the new records will have new record IDs, the parent/child relationship needs to be preserved and updated to reflect the new record IDs (primary keys).
Let's say I have the following table:
asset_id parent_asset_id
1 NULL
5 1
23 1
25 23
When I copy the records into the table, they look like this:
asset_id parent_asset_id
42 NULL
43 1
44 1
45 23
I need the new asset-to-parent relationship to be as follows:
asset_id parent_asset_id
42 NULL
43 42
44 42
45 44
I am trying to use a CTE (but maybe this is not the best approach). I update the current asset.asset table with the new records and then want to update the parent/child relationship, but can't figure out how to join the tables to get the relationship correct.
parent_asset_id is the column that needs to have the correct relationship; all other fields stay the same.
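
The remapping being asked for can be stated independently of SQL: remember which new id each old id was copied to, then translate every parent through that mapping. Below is a minimal sketch in plain Python using the example rows; the starting value 42 stands in for whatever the real sequence would hand out, and id_map is a hypothetical name.

```python
# (asset_id, parent_asset_id) rows from the example table
originals = [(1, None), (5, 1), (23, 1), (25, 23)]

next_id = 42   # assumption: stand-in for the sequence that assigns new asset_ids
id_map = {}    # old asset_id -> new asset_id
for old_id, _ in originals:
    id_map[old_id] = next_id
    next_id += 1

# Translate each parent through the mapping; a NULL parent stays NULL
copies = [(id_map[old_id], id_map.get(parent)) for old_id, parent in originals]

for row in copies:
    print(row)
```

In SQL terms, the mapping is what a join between the copied rows and their source rows would have to provide, so each copy needs to carry a reference to the row it was copied from.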
CREATE FUNCTION asset.copy_asset_store(
IN _username VARCHAR(100),
IN _src_asset_store_id VARCHAR,
IN _dest_asset_store_id VARCHAR
)
DECLARE
v_user_id INT;
BEGIN
WITH copy_into_study AS(
UPDATE
asset.asset
SET
origin_asset_id = original.origin_asset_id,
asset_type_id = original.asset_type_id,
asset_store_id = _dest_asset_store_id,
parent_asset_id = original.parent_asset_id,
asset_name = original.asset_name,
asset_desc = original.asset_desc,
is_signature_required = original.is_signature_required,
s3_bucket_id = original.s3_bucket_id,
color_id = original.color_id,
asset_expiration = original.asset_expiration,
is_deleted = original.is_deleted,
rec_created_by_user_id = v_user_id,
rec_updated_by_user_id = v_user_id
FROM
asset.asset AS original
WHERE
original.asset_store_id = _src_asset_store_id::INT
RETURNING *
),update_parent AS (
UPDATE
asset.asset
SET
parent_asset_id = copy_into_study.parent_asset_id
FROM
copy_into_study
WHERE
)