Select rows with and without match of join - postgresql

I currently cannot solve this supposedly easy task.
SQL Fiddle
http://sqlfiddle.com/#!17/90dce/1
Schema
Given this schema and data
CREATE TABLE asset (
"id" BIGINT NULL DEFAULT NULL,
"name" TEXT NULL DEFAULT NULL,
PRIMARY KEY ("id")
);
CREATE INDEX IF NOT EXISTS "IDX_id" ON asset (id);
CREATE TABLE category (
"id" BIGINT NULL DEFAULT NULL,
"ctype" TEXT NULL DEFAULT NULL,
"name" TEXT NULL DEFAULT NULL,
PRIMARY KEY ("id")
);
CREATE INDEX IF NOT EXISTS "IDX_id" ON category (id);
CREATE TABLE asset_category (
"asset_id" BIGINT NULL DEFAULT NULL,
"category_id" BIGINT NULL DEFAULT NULL,
CONSTRAINT "FK_asset_id" FOREIGN KEY ("asset_id") REFERENCES "asset" ("id") ON UPDATE CASCADE ON DELETE SET NULL,
CONSTRAINT "FK_category_id" FOREIGN KEY ("category_id") REFERENCES "category" ("id") ON UPDATE CASCADE ON DELETE SET NULL,
UNIQUE (asset_id, category_id)
);
INSERT INTO asset (id, "name") VALUES(1, 'Awesome Asset with a hit');
INSERT INTO asset (id, "name") VALUES(2, 'Great Asset without a hit');
INSERT INTO category (id, "name", "ctype") VALUES(1, 'First Category', NULL);
INSERT INTO category (id, "name", "ctype") VALUES(2, 'Second Category', 'directory');
INSERT INTO asset_category ("asset_id", "category_id") VALUES(1, 1);
INSERT INTO asset_category ("asset_id", "category_id") VALUES(1, 2);
INSERT INTO asset_category ("asset_id", "category_id") VALUES(2, 1);
Task
I want to get all assets with their category id if they have a category of type "directory"; otherwise the category should be NULL.
See my queries below. I wrote two joins so that I can limit the results in the ON clause. However, because the first JOIN also matches the asset's other categories, it prevents me from getting a clean result.
What I tried
Query A:
SELECT a.id "assetId", c.id "categoryId"
FROM asset a
LEFT JOIN asset_category ac ON ac.asset_id = a.id
left join category c on (
c.id = ac.category_id
AND
c.ctype = 'directory'
)
resulting in:
assetId categoryId
1 (null)
1 2
2 (null)
That is almost right, except that assetId 1 appears twice. This is probably due to the first JOIN, which also matches the asset_category row for the other category that is not of type 'directory'. The same happens for assetId 2, which is where its NULL row comes from.
Query B uses inner join:
SELECT a.id "assetId", c.id "categoryId"
FROM asset a
LEFT JOIN asset_category ac ON ac.asset_id = a.id
inner join category c on (
c.id = ac.category_id
AND
c.ctype = 'directory'
)
resulting in
assetId categoryId
1 2
However, the problem here is that it hides the asset with id 2, because the inner join finds no matching category for it.
Desired output
assetId | categoryId
1 | 2
2 | null
I would be grateful for help with this seemingly simple task.

demo:db<>fiddle
Your first query is a good approach. It seems you want only one record per id. This is what DISTINCT ON is for:
SELECT DISTINCT ON (a.id)
a.id, c.id
FROM asset a
LEFT JOIN asset_category ac ON a.id = ac.asset_id
LEFT JOIN category c ON c.id = ac.category_id AND c."ctype" = 'directory'
ORDER BY a.id, ctype NULLS LAST
So, order the joined result by id first and sort the records where ctype IS NULL to the bottom, so that the 'directory' row comes first for each id. DISTINCT ON then keeps the first record per id, which is the one you expect.
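As a cross-check of the expected result, here is a minimal sketch of an alternative that does not rely on DISTINCT ON: group by asset and take MAX of the category id. It is written in Python with the standard sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained), and the column types are simplified. The SELECT itself uses only standard syntax, so it should carry over to PostgreSQL.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL schema (simplified types).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE category (id INTEGER PRIMARY KEY, ctype TEXT, name TEXT);
CREATE TABLE asset_category (
    asset_id INTEGER,
    category_id INTEGER,
    UNIQUE (asset_id, category_id)
);
INSERT INTO asset VALUES (1, 'Awesome Asset with a hit'), (2, 'Great Asset without a hit');
INSERT INTO category VALUES (1, NULL, 'First Category'), (2, 'directory', 'Second Category');
INSERT INTO asset_category VALUES (1, 1), (1, 2), (2, 1);
""")

# MAX ignores NULLs, so an asset gets its 'directory' category id if one exists
# and NULL otherwise. GROUP BY collapses the duplicate rows per asset.
rows = conn.execute("""
    SELECT a.id, MAX(c.id)
    FROM asset a
    LEFT JOIN asset_category ac ON ac.asset_id = a.id
    LEFT JOIN category c ON c.id = ac.category_id AND c.ctype = 'directory'
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

print(rows)
```

If an asset could belong to several 'directory' categories, MAX would pick the highest id, whereas DISTINCT ON lets you choose the winner through ORDER BY.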

Related

Avoid duplicates when migrating from one table to another

I need to migrate data from an old Books table:
create table dbo.Books_OLD (
Id int identity not null constraint PK_Books_OLD_Id primary key (Id),
Title nvarchar (200) not null,
Image varbinary (max) null,
Preview varbinary (max) null
)
To a new table structure:
create table dbo.Books (
Id int identity not null constraint PK_Books_Id primary key (Id),
Title nvarchar (200) not null
)
create table dbo.Files (
Id int identity not null constraint PK_Files_Id primary key (Id),
Content varbinary (max) null,
Name nvarchar (280) null
)
create table dbo.BookFiles (
BookId int not null,
FileId int not null,
constraint PK_BookFiles_Id primary key (BookId, FileId)
)
alter table dbo.BookFiles
add constraint FK_BookFiles_BookId foreign key (BookId) references Books(Id) on delete cascade on update cascade,
constraint FK_BookFiles_FileId foreign key (FileId) references Files(Id) on delete cascade on update cascade;
The migration should run as follows:
Books_OLD.Title => Create new Book with given Title value
Books_OLD.Image => Create new File with Image content.
Create new BookFile to associate File to Book.
Books_OLD.Preview => Create new File with Preview content.
Create new BookFile to associate File to Book.
I was able to migrate the data but somehow when I run this:
select FileId
from BookFiles
group by FileId
having count(*) > 1;
I have duplicates. I should not have duplicate FileIds. What am I missing?
The migration code I have is:
DECLARE @BOOKS table (
BookId int,
Image varbinary(max),
Preview varbinary(max)
)
MERGE Books AS d
USING Books_OLD AS s
ON 0 = 1
WHEN NOT MATCHED
THEN INSERT (Title)
VALUES (s.Title)
OUTPUT INSERTED.Id, s.Image, s.Preview
INTO @BOOKS;
INSERT Files (Content, Created)
SELECT t.Content, GETUTCDATE()
FROM @BOOKS i
CROSS APPLY (VALUES (Preview, 'Preview'), (Image, 'Image')) t(Content, ContentType)
WHERE Content IS NOT NULL
INSERT BookFiles (BookId, FileId)
SELECT i.BookId, f.Id
FROM @BOOKS i
JOIN Files f
ON f.Content = i.Image
UNION ALL
SELECT i.BookId, f.Id
FROM @BOOKS i
JOIN Files f
ON f.Content = i.Preview
Some Books can have two files (Image and Preview), so BookId can appear more than once in BookFiles.
But each file (Image or Preview) in the Books_OLD table should be associated with only one Book. So it is strange that I have duplicate FileIds in BookFiles.
What am I missing?
If the same image or preview appears for different books in Books_OLD, then this part of your original code:
INSERT BookFiles (BookId, FileId)
SELECT i.BookId, f.Id
FROM @BOOKS i
JOIN Files f
ON f.Content = i.Image
returns extra rows, because the INNER JOIN matches on content alone, so an image or preview from one book also joins to every other book with identical content. Each duplicate FileId is actually a bad record: its BookId does not correspond to that particular Image or Preview, even though the contents are the same.
What you could do is declare another table variable called @Files, similar in structure to the Files table but with one more column, BookId. Then:
INSERT BookFiles (BookId, FileId)
SELECT i.BookId, f.Id
FROM @BOOKS i
JOIN @Files f
ON f.Content = i.Image
AND f.BookId = i.BookId --added joining condition
--assume code before has inserted bookId into `@Files`
Finally, pick the needed columns from @Files and insert them into Files.
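The effect of joining on content alone can be reproduced with a tiny data set. The following is a minimal sketch in Python with the standard sqlite3 module standing in for SQL Server (an assumption for the sake of a self-contained example); the table and column names are simplified and hypothetical, not the ones from the question.

```python
import sqlite3

# Two books that happen to share byte-identical image content.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INTEGER PRIMARY KEY, image BLOB);
CREATE TABLE files (id INTEGER PRIMARY KEY, book_id INTEGER, content BLOB);
INSERT INTO books VALUES (1, x'AA'), (2, x'AA');
INSERT INTO files (book_id, content)
    SELECT book_id, image FROM books ORDER BY book_id;
""")

# Joining on content alone pairs every book with every identical file.
by_content = conn.execute("""
    SELECT b.book_id, f.id
    FROM books b
    JOIN files f ON f.content = b.image
    ORDER BY b.book_id, f.id
""").fetchall()

# Adding the book id to the join keeps only the file created for that book.
by_content_and_book = conn.execute("""
    SELECT b.book_id, f.id
    FROM books b
    JOIN files f ON f.content = b.image AND f.book_id = b.book_id
    ORDER BY b.book_id, f.id
""").fetchall()

print(by_content)
print(by_content_and_book)
```

With two books sharing one image, the content-only join yields four pairs, so each file id shows up twice, while the stricter join yields one pair per book.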
UPDATE: here is the full code:
DECLARE @BOOKS table (
BookId int,
Image varbinary(max),
Preview varbinary(max)
)
--Added @Files variable
DECLARE @Files table
(
BookId int,
Content varbinary (max) null,
Created nvarchar (280) null,
Id int identity(1,1) not null primary key
)
MERGE Books AS d
USING Books_OLD AS s
ON 0 = 1
WHEN NOT MATCHED
THEN INSERT (Title)
VALUES (s.Title)
OUTPUT INSERTED.Id, s.Image, s.Preview
INTO @BOOKS;
INSERT @Files (BookId, Content, Created)
SELECT i.BookId, t.Content, GETUTCDATE()
FROM @BOOKS i
CROSS APPLY (VALUES (Preview, 'Preview'), (Image, 'Image')) t(Content, ContentType)
WHERE Content IS NOT NULL
--Insert all needed columns from @Files into Files first, so that the
--foreign key FK_BookFiles_FileId is satisfied by the insert that follows.
--This relies on Files.Id receiving the same identity values as @Files.Id
--(for example, Files is empty and its identity starts at 1).
INSERT INTO Files (Content, Created)
SELECT Content, Created
FROM @Files
ORDER BY Id
INSERT BookFiles (BookId, FileId)
SELECT i.BookId, f.Id
FROM @BOOKS i
JOIN @Files f
ON f.Content = i.[Image]
AND f.BookId = i.BookId --added joining condition
UNION ALL
SELECT i.BookId, f.Id
FROM @BOOKS i
JOIN @Files f
ON f.Content = i.Preview
AND f.BookId = i.BookId --added joining condition
PS: Not sure whether there is a typo for dbo.Files: you have Name in your table definition, but when inserting, it's Created.

How to load data as nested JSONB from non-JSONB postgres tables

I'm trying to construct an object for use from my postgres backend. The tables in question look something like this:
We have some Things that essentially act as rows for a matrix where the columns are Field_Columns. Field_Values are filled cells.
Create Table Platform_User (
serial id PRIMARY KEY
)
Create Table Things (
serial id PRIMARY KEY,
INTEGER user_id REFERENCES Platform_User(id)
)
Create Table Field_Columns (
serial id PRIMARY KEY,
TEXT name,
)
Create Table Field_Values (
INTEGER field_column_id REFERENCES Field_Columns(id),
INTEGER thing_id REFERENCES Things(id)
TEXT content,
PRIMARY_KEY(field_column_id, thing_id)
)
This would be simple if I were trying to load just the Field_Values for a single Thing as JSON, which would look like this:
SELECT JSONB_OBJECT(
ARRAY(
SELECT name
FROM Field_Columns
ORDER BY Field_Columns.id
),
ARRAY(
SELECT Field_Values.content
FROM Field_Columns
LEFT JOIN Field_Values ON Field_Values.field_column_id = Field_Columns.id
AND Field_Values.thing_id = Things.id
ORDER BY Field_Columns.id)
)
FROM Things
WHERE Things.id = $1
However, I'd like the returned JSON object to look like the one below: a single object containing the Fields:Field_Values objects for all the Things that a user owns.
{
14:
{
'first field':'asdf',
'other field':''
}
25:
{
'first field':'qwer',
'other field':'dfgdsfg'
}
43:
{
'first field':'',
'other field':''
}
}
My attempt at this query looks like this, but I'm running into the problem that the JSONB_OBJECT function won't construct an object whose values are themselves objects:
SELECT (
JSONB_OBJECT(
ARRAY(SELECT Things.id::TEXT
FROM Things
WHERE Things.user_id = $2
ORDER BY Things.id
),
ARRAY(SELECT JSONB_OBJECT(
ARRAY(
SELECT name
FROM Field_Columns
ORDER BY Field_Columns.id),
ARRAY(
SELECT Field_Values.content
FROM Field_Columns
LEFT JOIN Field_Values ON Field_Values.field_column_Id = Field_Columns.id
AND Field_Values.thing_id = Things.id
ORDER BY Field_Columns.id)
)
FROM Things
WHERE Things.user_id = $2
ORDER BY Things.id
)
)
) AS thing_fields
The specific error I get is: function jsonb_object(text[], jsonb[]) does not exist. Is there a way to do this that doesn't involve copious text conversions and nonsense like that? Or will I just need to abandon trying to sort my data in the query and do it in my code instead?
Your DDL scripts are syntactically incorrect so I created these for you:
create table platform_users (
id int8 PRIMARY KEY
);
create table things (
id int8 PRIMARY KEY,
user_id int8 REFERENCES platform_users(id)
);
create table field_columns (
id int8 PRIMARY KEY,
name text
);
create table field_values (
field_column_id int8 REFERENCES field_columns(id),
thing_id int8 REFERENCES things(id),
content text,
PRIMARY KEY(field_column_id, thing_id)
);
I also created some scripts to populate the db:
insert into platform_users(id) values (1);
insert into platform_users(id) values (2);
insert into platform_users(id) values (3);
insert into platform_users(id) values (4);
insert into platform_users(id) values (5);
insert into things(id, user_id) values(1, 1);
insert into things(id, user_id) values(2, 1);
insert into things(id, user_id) values(3, 2);
insert into things(id, user_id) values(4, 2);
insert into field_columns(id, name) values(1, 'col1');
insert into field_columns(id, name) values(2, 'col2');
insert into field_values(field_column_id, thing_id, content) values(1, 1, 'thing1 val1');
insert into field_values(field_column_id, thing_id, content) values(2, 1, 'thing1 val2');
insert into field_values(field_column_id, thing_id, content) values(1, 2, 'thing2 val1');
insert into field_values(field_column_id, thing_id, content) values(2, 2, 'thing2 val2');
Please include such scripts next time when you ask for help, and make sure that your scripts are correct. This will reduce the work needed to answer your question.
You can get your jsonb value by aggregating the key-value pairs with jsonb_object_agg:
select
t.id,
jsonb_object_agg(fc.name, fv.content)
from
things t inner join
field_values fv on fv.thing_id = t.id inner join
field_columns fc on fv.field_column_id = fc.id
group by 1
The results look like this:
thing_id;jsonb_value
1;"{"col1": "thing1 val1", "col2": "thing1 val2"}"
2;"{"col1": "thing2 val1", "col2": "thing2 val2"}"
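The question also raises the fallback of shaping the data in application code. A minimal sketch of that route follows, written in Python with the standard sqlite3 module standing in for PostgreSQL (an assumption so the example is self-contained). The function name fields_for_user and the sample rows are hypothetical. Unlike the inner joins above, it pairs every Thing with every column, so missing cells come back as empty strings, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE things (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE field_columns (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE field_values (
    field_column_id INTEGER,
    thing_id INTEGER,
    content TEXT,
    PRIMARY KEY (field_column_id, thing_id)
);
INSERT INTO things VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO field_columns VALUES (1, 'col1'), (2, 'col2');
INSERT INTO field_values VALUES
    (1, 1, 'thing1 val1'), (2, 1, 'thing1 val2'), (1, 2, 'thing2 val1');
""")

def fields_for_user(user_id):
    """Return {thing_id: {column_name: content}} for one user's things."""
    rows = conn.execute("""
        SELECT t.id, fc.name, COALESCE(fv.content, '')
        FROM things t
        CROSS JOIN field_columns fc
        LEFT JOIN field_values fv
               ON fv.thing_id = t.id AND fv.field_column_id = fc.id
        WHERE t.user_id = ?
        ORDER BY t.id, fc.id
    """, (user_id,)).fetchall()
    nested = {}
    for thing_id, name, content in rows:
        # One inner dict per thing, filled column by column.
        nested.setdefault(thing_id, {})[name] = content
    return nested

result = fields_for_user(1)
print(result)
```

The nesting happens in a single pass over flat rows, so the query stays simple and the client decides the final shape.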

Merging columns from 2 different tables to apply aggregate function

I have below 3 Tables
Create table products(
prod_id character(20) NOT NULL,
name character varying(100) NOT NULL,
CONSTRAINT prod_pkey PRIMARY KEY (prod_id)
)
Create table dress_Sales(
prod_id character(20) NOT NULL,
dress_amount numeric(7,2) NOT NULL,
CONSTRAINT prod_pkey PRIMARY KEY (prod_id),
CONSTRAINT prod_id_fkey FOREIGN KEY (prod_id)
REFERENCES products (prod_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
Create table sports_Sales(
prod_id character(20) NOT NULL,
sports_amount numeric(7,2) NOT NULL,
CONSTRAINT prod_pkey PRIMARY KEY (prod_id),
CONSTRAINT prod_id_fkey FOREIGN KEY (prod_id)
REFERENCES products (prod_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
I want to get the sum and average sales amount from both tables (only for the selected prod_id values). I have tried the code below, but it does not return any value.
select sum(coalesce(b.dress_amount, c.sports_amount)) as total_Amount
from products a JOIN dress_sales b on a.prod_id = b.prod_id
JOIN sports_sales c on a.prod_id = c.prod_id and a.prod_id = ANY('{"123456","456789"}')
Here 1000038923 is in dress_sales table and 8002265822 is in sports_sales.
It looks like each of your products exists in only one of the tables (dress_sales or sports_sales).
In this case you should use a left join:
select
sum(coalesce(b.dress_amount, c.sports_amount)) as total_amount,
avg(coalesce(b.dress_amount, c.sports_amount)) as avg_amount
from products a
left join dress_sales b using(prod_id)
left join sports_sales c using(prod_id)
where
a.prod_id in ('1', '2');
If you use an inner join (which is the default), the product row will not appear in the result set, because it has to match in both dress_sales and sports_sales and it is missing from one of them.
If you have a product that appears in both tables you can use a subquery that can handle both dress_amount and sports_amount values.
select sum(combined.amount), avg(combined.amount)
from
(select prod_id, dress_amount as amount from dress_sales
union all
select prod_id, sports_amount as amount from sports_sales) combined
where
combined.prod_id in ('1','2');
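To see the three variants side by side, here is a minimal sketch in Python with the standard sqlite3 module standing in for PostgreSQL (an assumption so the example runs on its own). The sample rows and simplified column types are made up for illustration: product '1' has only a dress sale and product '2' only a sports sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (prod_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE dress_sales (prod_id TEXT PRIMARY KEY, dress_amount REAL NOT NULL);
CREATE TABLE sports_sales (prod_id TEXT PRIMARY KEY, sports_amount REAL NOT NULL);
INSERT INTO products VALUES ('1', 'dress'), ('2', 'ball');
INSERT INTO dress_sales VALUES ('1', 10.0);
INSERT INTO sports_sales VALUES ('2', 30.0);
""")

# Inner joins: no product is in both sales tables, so no row survives.
inner_total = conn.execute("""
    SELECT SUM(COALESCE(b.dress_amount, c.sports_amount))
    FROM products a
    JOIN dress_sales b ON a.prod_id = b.prod_id
    JOIN sports_sales c ON a.prod_id = c.prod_id
    WHERE a.prod_id IN ('1', '2')
""").fetchone()[0]

# Left joins: every product survives and COALESCE picks whichever amount exists.
left_total, left_avg = conn.execute("""
    SELECT SUM(COALESCE(b.dress_amount, c.sports_amount)),
           AVG(COALESCE(b.dress_amount, c.sports_amount))
    FROM products a
    LEFT JOIN dress_sales b ON a.prod_id = b.prod_id
    LEFT JOIN sports_sales c ON a.prod_id = c.prod_id
    WHERE a.prod_id IN ('1', '2')
""").fetchone()

# UNION ALL: stack both sales tables, then aggregate once.
union_total, union_avg = conn.execute("""
    SELECT SUM(combined.amount), AVG(combined.amount)
    FROM (SELECT prod_id, dress_amount AS amount FROM dress_sales
          UNION ALL
          SELECT prod_id, sports_amount AS amount FROM sports_sales) combined
    WHERE combined.prod_id IN ('1', '2')
""").fetchone()

print(inner_total, (left_total, left_avg), (union_total, union_avg))
```

The two working variants agree here. They would differ for a product that appears in both tables, because COALESCE keeps only the dress amount while UNION ALL counts both rows.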

Selecting one specific data row (required), and 3 others (specific data row must be included)

I need to select a specific row and 2 other rows that are not that specific row (a total of 3). The specific row must always be included in the 3 results. How should I go about it? I think it can be done with a UNION ALL, but do I have another choice? Thanks all! :)
Here are my scripts to create the sample tables:
create table users (
user_id serial primary key,
user_name varchar(20) not null
);
create table result_table1 (
result_id serial primary key,
user_id int4 references users(user_id),
result_1 int4 not null
);
create table result_table2 (
result_id serial primary key,
user_id int4 references users(user_id),
result_2 int4 not null
);
insert into users (user_name) values ('Kevin'),('John'),('Batman'),('Someguy');
insert into result_table1 (user_id, result_1) values (1, 20),(2, 40),(3, 70),(4, 42);
insert into result_table2 (user_id, result_2) values (1, 4),(2, 3),(3, 7),(4, 5);
Here is my UNION query:
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id = 1
UNION ALL
SELECT result_table1.user_id,
result_1,
result_2
FROM result_table1
INNER JOIN (
SELECT user_id
FROM users
) users
ON users.user_id = result_table1.user_id
INNER JOIN (
SELECT result_table2.user_id,
result_2
FROM result_table2
) result_table2
ON result_table2.user_id = result_table1.user_id
WHERE users.user_id != 1
LIMIT 3;
Are there any options other than a UNION? The query works and does what I want for now, but will it always include user_id = 1 if I had a larger set of rows (assume that user_id = 1 will always be there)? :(
Thank you all! :)
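One possible alternative to the UNION, offered only as a sketch: sort the required row to the front and then apply the limit. Without an ORDER BY, SQL does not guarantee which rows a LIMIT keeps, so an explicit sort is what makes the required row reliable. The example is in Python with the standard sqlite3 module standing in for PostgreSQL (an assumption so it runs on its own). The function name pick_three is hypothetical, and the redundant join to users is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE result_table1 (
    result_id INTEGER PRIMARY KEY, user_id INTEGER, result_1 INTEGER NOT NULL);
CREATE TABLE result_table2 (
    result_id INTEGER PRIMARY KEY, user_id INTEGER, result_2 INTEGER NOT NULL);
INSERT INTO users (user_name) VALUES ('Kevin'), ('John'), ('Batman'), ('Someguy');
INSERT INTO result_table1 (user_id, result_1) VALUES (1, 20), (2, 40), (3, 70), (4, 42);
INSERT INTO result_table2 (user_id, result_2) VALUES (1, 4), (2, 3), (3, 7), (4, 5);
""")

def pick_three(required_user_id):
    """Return the required user's row first, followed by two other rows."""
    return conn.execute("""
        SELECT r1.user_id, r1.result_1, r2.result_2
        FROM result_table1 r1
        JOIN result_table2 r2 ON r2.user_id = r1.user_id
        ORDER BY (r1.user_id = ?) DESC, r1.user_id
        LIMIT 3
    """, (required_user_id,)).fetchall()

first = pick_three(1)
last = pick_three(4)
print(first)
print(last)
```

The comparison in the ORDER BY evaluates to true only for the required user, so sorting it in descending order places that row first regardless of how many other rows exist.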

Copy content in TSQL

I need to copy content from one table to itself and related tables... Let me schematize the problem. Let's say I have two tables:
Order
OrderID : int
CustomerID : int
OrderName : nvarchar(32)
OrderItem
OrderItemID : int
OrderID : int
Quantity : int
The PKs are auto-incrementing.
Let's say I want to duplicate the content of one customer to another. How do I do that efficiently?
The problem is the PKs. I would need to map the OrderID values from the original set of data to the copy in order to create proper references in OrderItem. If I just do a SELECT-INSERT, I won't be able to create that map.
Suggestions?
For duplicating one parent and many children with identities as the keys, I think the OUTPUT clause can make things pretty clean (SqlFiddle here):
-- Make a duplicate of parent 1, including children
-- Setup some test data
create table Parents (
ID int not null primary key identity
, Col1 varchar(10) not null
, Col2 varchar(10) not null
)
insert into Parents (Col1, Col2) select 'A', 'B'
insert into Parents (Col1, Col2) select 'C', 'D'
insert into Parents (Col1, Col2) select 'E', 'F'
create table Children (
ID int not null primary key identity
, ParentID int not null references Parents (ID)
, Col1 varchar(10) not null
, Col2 varchar(10) not null
)
insert into Children (ParentID, Col1, Col2) select 1, 'g', 'h'
insert into Children (ParentID, Col1, Col2) select 1, 'i', 'j'
insert into Children (ParentID, Col1, Col2) select 2, 'k', 'l'
insert into Children (ParentID, Col1, Col2) select 3, 'm', 'n'
-- Get one parent to copy
declare @oldID int = 1
-- Create a place to store new ParentID
declare @newID table (
ID int not null primary key
)
-- Create new parent
insert into Parents (Col1, Col2)
output inserted.ID into @newID -- Capturing the new ParentID
select Col1, Col2
from Parents
where ID = @oldID -- Only one parent
-- Create new children using the new ParentID
insert into Children (ParentID, Col1, Col2)
select n.ID, c.Col1, c.Col2
from Children c
cross join @newID n
where c.ParentID = @oldID -- Only one parent
-- Show some output
select * from Parents
select * from Children
Do you have to have the primary keys from table A as primary keys in table B? If not, you can do an INSERT INTO with a SELECT statement. Primary keys are usually ints generated from an ever-increasing seed (identity). Working around this and inserting the same key values programmatically has the disadvantage that someone may think they are a distinct key set on this table and not a relationship or foreign key value.
You can select primary keys for inserts into other tables, just not into the identity column of their own table, unless you turn on SET IDENTITY_INSERT for it. Do not do this unless you know what it does, as you can create more problems than it's worth if you don't understand the ramifications.
I would just do the usual:
insert into TableB
select *
from TableA
where (criteria)
Simple example (this assumes SQL Server 2008 or higher). My bad, I did not notice that you did not specify the TSQL framework. Not sure if this will run on Oracle or MySQL.
declare @Order Table ( OrderID int identity primary key, person varchar(8));
insert into @Order values ('Brett'),('John'),('Peter');
declare @OrderItem Table (orderItemID int identity primary key, OrderID int, OrderInfo varchar(16));
insert into @OrderItem
select
OrderID -- I can insert a primary key just fine
, person + 'Stuff'
from @Order
select *
from @Order
Select *
from @OrderItem
Add an extra helper column to Order called OldOrderID.
Copy all the Orders from @OldCustomerID to @NewCustomerID.
Copy all of the OrderItems, using the OldOrderID column to relate them to the new Orders.
Remove the extra helper column from Order.
-- Order is a reserved word, so the table name is bracketed.
-- Run the ALTER TABLE in its own batch; statements that reference the new
-- column cannot be compiled in the same batch that adds it.
ALTER TABLE [Order] ADD OldOrderID INT NULL
INSERT INTO [Order] (CustomerID, OrderName, OldOrderID)
SELECT @NewCustomerID, OrderName, OrderID
FROM [Order]
WHERE CustomerID = @OldCustomerID
INSERT INTO OrderItem (OrderID, Quantity)
SELECT o.OrderID, i.Quantity
FROM [Order] o INNER JOIN OrderItem i ON o.OldOrderID = i.OrderID
WHERE o.CustomerID = @NewCustomerID
UPDATE [Order] SET OldOrderID = null WHERE OldOrderID IS NOT NULL
ALTER TABLE [Order] DROP COLUMN OldOrderID
If the OrderName is unique per customer, you could simply do:
INSERT INTO [Order] ([CustomerID], [OrderName])
SELECT
2 AS [CustomerID],
[OrderName]
FROM [Order]
WHERE [CustomerID] = 1
INSERT INTO [OrderItem] ([OrderID], [Quantity])
SELECT
[o2].[OrderID],
[oi1].[Quantity]
FROM [OrderItem] [oi1]
INNER JOIN [Order] [o1] ON [oi1].[OrderID] = [o1].[OrderID]
INNER JOIN [Order] [o2] ON [o1].[OrderName] = [o2].[OrderName]
WHERE [o1].[CustomerID] = 1 AND [o2].[CustomerID] = 2
Otherwise, you will have to use a temporary table or alter the existing Order table like @LastCoder suggested.
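The old-to-new key map that the question asks about can also be built on the client side. Here is a minimal sketch in Python with the standard sqlite3 module standing in for SQL Server (an assumption so the example is self-contained). The snake_case table names, the sample rows, and the function copy_orders are hypothetical. Each order is inserted individually so that the generated key can be recorded before the items are copied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_name TEXT);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY, order_id INTEGER, quantity INTEGER);
INSERT INTO orders (customer_id, order_name) VALUES (1, 'first'), (1, 'second');
INSERT INTO order_items (order_id, quantity) VALUES (1, 5), (1, 7), (2, 9);
""")

def copy_orders(old_customer_id, new_customer_id):
    """Duplicate one customer's orders and items; return {old_id: new_id}."""
    id_map = {}
    old_orders = conn.execute(
        "SELECT order_id, order_name FROM orders"
        " WHERE customer_id = ? ORDER BY order_id",
        (old_customer_id,),
    ).fetchall()
    for old_id, name in old_orders:
        cur = conn.execute(
            "INSERT INTO orders (customer_id, order_name) VALUES (?, ?)",
            (new_customer_id, name),
        )
        id_map[old_id] = cur.lastrowid  # key generated for the copied order
    for old_id, new_id in id_map.items():
        # Copy the items of the old order, pointing them at the new order.
        conn.execute(
            "INSERT INTO order_items (order_id, quantity)"
            " SELECT ?, quantity FROM order_items WHERE order_id = ?",
            (new_id, old_id),
        )
    return id_map

id_map = copy_orders(1, 2)
copied = conn.execute(
    "SELECT order_id, quantity FROM order_items"
    " WHERE order_id IN (3, 4) ORDER BY order_id, quantity"
).fetchall()
print(id_map, copied)
```

This trades one round trip per order for not having to alter the table. For large copies, a set-based approach such as the helper column above is likely to be faster.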