TSQL: How to return two rows if Column = Null - tsql

I have to build a procedure that returns a table at the end, which contains a list of fields where specific substances were applied. I need to return one row for each field and the applied substance.
This works great for all fields where something was actually applied, but I also need to display the same number of rows for those fields where nothing was applied.
At the moment I get a table like this:
Field 1 | Substance 1 | 12345 kg
Field 1 | Substance 2 | 23423 kg
Field 2 | Substance 1 | 23236 kg
Field 2 | Substance 2 | 12312 kg
Field 3 | NULL | NULL
I know that I could swap the NULL value with at least one Substance by making a Case-Condition, but I need two rows (one for Substance 1 and one for Substance 2) containing the names of each substance.
Is there any way to achieve this?

Or maybe you have something like this:
CREATE TABLE Fields (
FieldID INT PRIMARY KEY,
FieldName VARCHAR(50) NOT NULL UNIQUE
)
INSERT INTO dbo.Fields (FieldID, FieldName) VALUES
(1, 'Field 1'),
(2, 'Field 2'),
(3, 'Field 3')
CREATE TABLE dbo.Substances (
SubstanceID INT PRIMARY KEY,
Substance VARCHAR(50) NOT NULL UNIQUE
)
INSERT INTO dbo.Substances (SubstanceID, Substance) VALUES
(1, 'Substance 1'),
(2, 'Substance 2')
CREATE TABLE AppliedSubstances (
FieldID INT NOT NULL REFERENCES dbo.Fields,
SubstanceID INT NOT NULL REFERENCES dbo.Substances,
Quantity INT NOT NULL
)
INSERT INTO dbo.AppliedSubstances (FieldID, SubstanceID, Quantity) VALUES
(1, 1, 12345),
(1, 2, 23423),
(2, 1, 23236),
(2, 2, 12312)
Then you can use the following query:
SELECT f.FieldName, s.Substance, a.Quantity
FROM dbo.AppliedSubstances a
INNER JOIN dbo.Fields f ON f.FieldID = a.FieldID
INNER JOIN dbo.Substances s ON s.SubstanceID = a.SubstanceID
UNION ALL
SELECT f.FieldName, s.Substance, NULL AS Quantity
FROM dbo.Fields f
CROSS JOIN dbo.Substances s
WHERE NOT EXISTS (
SELECT * FROM dbo.AppliedSubstances a
WHERE a.FieldID=f.FieldID AND a.SubstanceID=s.SubstanceID
)
Or a shorter, stranger version (it gives different results if a field received only some of the substances: the substances that were not applied are not listed for that field):
SELECT f.FieldName, s.Substance, a.Quantity
FROM dbo.AppliedSubstances a
RIGHT JOIN dbo.Fields f ON f.FieldID = a.FieldID
INNER JOIN dbo.Substances s ON s.SubstanceID = ISNULL(a.SubstanceID,s.SubstanceID)
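For comparison, the UNION ALL query above can probably also be written as a single statement: build every field/substance pair first and then look up the applied quantity, if there is one. This is only a sketch against the sample tables defined above; the ORDER BY is an addition for readable output.

```sql
-- Every field paired with every substance; Quantity stays NULL
-- where no matching row exists in AppliedSubstances.
SELECT f.FieldName, s.Substance, a.Quantity
FROM dbo.Fields f
CROSS JOIN dbo.Substances s
LEFT JOIN dbo.AppliedSubstances a
    ON a.FieldID = f.FieldID
   AND a.SubstanceID = s.SubstanceID
ORDER BY f.FieldName, s.Substance;
```

With the sample data this should return six rows, including two rows for Field 3 (one per substance) with a NULL quantity.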

I'm not sure if I understand your question correctly, but try this:
CREATE TABLE SourceData (
FieldName VARCHAR(50),
Substance VARCHAR(50),
Quantity INT
)
INSERT INTO dbo.SourceData (FieldName, Substance, Quantity) VALUES
('Field 1', 'Substance 1', 12345),
('Field 1', 'Substance 2', 23423),
('Field 2', 'Substance 1', 23236),
('Field 2', 'Substance 2', 12312),
('Field 3', NULL, NULL)
SELECT FieldName, Substance, Quantity
FROM dbo.SourceData WHERE Substance IS NOT NULL
UNION ALL
SELECT s1.FieldName, x.Substance, NULL AS Quantity
FROM dbo.SourceData s1 CROSS JOIN (
SELECT DISTINCT s2.Substance
FROM dbo.SourceData s2
WHERE s2.Substance IS NOT NULL
) x
WHERE s1.Substance IS NULL

Related

When JOINing 3 tables in Postgres, can the results be sorted by values in the 2 joined tables?

I have a database with three tables. Ultimately I want to JOIN the three tables and sort them by a column shared by two of the tables.
A main item table with foreign keys (product_id) to the two sub-tables:
items
CREATE TABLE items (
id INT NOT NULL,
product_id varchar(40) NOT NULL,
type CHAR NOT NULL
);
and then a table corresponding to each typeA and typeB. They have differing columns, but for the sake of this exercise I'm only including the columns they have in common:
CREATE TABLE products_a (
id varchar(40) NOT NULL,
name varchar(40) NOT NULL,
price INT NOT NULL
);
CREATE TABLE products_b (
id varchar(40) NOT NULL,
name varchar(40) NOT NULL,
price INT NOT NULL
);
Some example rows:
INSERT INTO items VALUES
( 1, 'abc', 'a' ),
( 2, 'def', 'b' ),
( 3, 'ghi', 'a' ),
( 4, 'jkl', 'b' );
INSERT INTO products_a VALUES
( 'abc', 'product 1', 10 ),
( 'ghi', 'product 2', 50 );
INSERT INTO products_b VALUES
( 'def', 'product 3', 20 ),
( 'jkl', 'product 4', 100 );
I have a JOIN working, but my sorting is not interleaving the rows as I would expect.
Query:
SELECT
items.id AS item_id,
products_a.name AS product_a_name,
products_a.price AS product_a_price,
products_b.name AS product_b_name,
products_b.price AS product_b_price
FROM items
FULL JOIN products_a ON items.product_id = products_a.id
FULL JOIN products_b ON items.product_id = products_b.id
ORDER BY 3, 5 ASC;
Actual result:
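item_id | product_a_name | product_a_price | product_b_name | product_b_price
--------+----------------+-----------------+----------------+----------------
1       | product 1      | 10              | NULL           | NULL
3       | product 2      | 50              | NULL           | NULL
2       | NULL           | NULL            | product 3      | 20
4       | NULL           | NULL            | product 4      | 100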
Desired result:
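item_id | product_a_name | product_a_price | product_b_name | product_b_price
--------+----------------+-----------------+----------------+----------------
1       | product 1      | 10              | NULL           | NULL
2       | NULL           | NULL            | product 3      | 20
3       | product 2      | 50              | NULL           | NULL
4       | NULL           | NULL            | product 4      | 100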
I realize this is a weird table setup, but simplified this way looks more contrived than it is. Ultimately the sorting matches the real use case, though, and changing the DB schema is not an option. I feel like I am missing something simple here, just sorting by either one column or another. Any help is appreciated.
Use COALESCE in the ORDER BY clause to always sort by the first non-NULL price:
SELECT
items.id AS item_id,
products_a.name AS product_a_name,
products_a.price AS product_a_price,
products_b.name AS product_b_name,
products_b.price AS product_b_price
FROM items
FULL JOIN products_a ON items.product_id = products_a.id
FULL JOIN products_b ON items.product_id = products_b.id
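ORDER BY
-- use the column expressions here: inside a function call, 3 and 5 are plain integers, not column positions
COALESCE(products_a.price, products_b.price);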
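To see which value each row is actually sorted by, the sort key can be selected as its own column. A minimal sketch against the sample tables from the question; the alias sort_price is made up for illustration, and LEFT JOINs are used on the assumption that every product belongs to an item.

```sql
-- sort_price is the first non-NULL price of the two product tables
SELECT
    items.id AS item_id,
    COALESCE(products_a.price, products_b.price) AS sort_price
FROM items
LEFT JOIN products_a ON items.product_id = products_a.id
LEFT JOIN products_b ON items.product_id = products_b.id
ORDER BY sort_price;
```

With the sample rows, sort_price comes out as 10, 20, 50, 100 for items 1, 2, 3, 4, which is the desired order.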

Grouping user id columns together with string_agg on PostgreSQL 13

This is my emails table
create table emails (
id bigint not null primary key generated by default as identity,
name text not null
);
And contacts table:
create table contacts (
id bigint not null primary key generated by default as identity,
email_id bigint not null,
user_id bigint not null,
full_name text not null,
ordering int not null
);
As you can see, I have a user_id field here. The same user ID can appear several times in my result, so I want to join the IDs into a comma-separated list.
Insert some data to the tables:
insert into emails (name)
values
('dennis1'),
('dennis2');
insert into contacts (id, email_id, user_id, full_name, ordering)
values
(5, 1, 1, 'dennis1', 9),
(6, 2, 1, 'dennis1', 5),
(7, 2, 1, 'dennis1', 1),
(8, 1, 3, 'john', 2),
(9, 2, 4, 'dennis7', 1),
(10, 2, 4, 'dennis7', 1);
My query is:
select em.name,
c.user_ids
from emails em
join (
select email_id, string_agg(user_id::text, ',' order by ordering desc) as user_ids
from contacts
group by email_id
) c on c.email_id = em.id
order by em.name;
Actual Result
name user_ids
dennis1 1,3
dennis2 1,1,4,4
Expected Result
name user_ids
dennis1 1,3
dennis2 1,4
On my real-world data, I get the same user id around 50 times. It should appear only once. In the example above, users 1 and 4 each appear twice for dennis2.
How can I make them unique?
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=2e957b52eb46742f3ddea27ec36effb1
P.S.: I tried adding user_id to the GROUP BY, but then I get duplicate rows...
demo:db<>fiddle
SELECT
name,
string_agg(user_id::text, ',' order by ordering desc)
FROM (
SELECT DISTINCT ON (em.id, c.user_id)
*
FROM emails em
JOIN contacts c ON c.email_id = em.id
) s
GROUP BY name
1. Join the tables.
2. DISTINCT ON the email and the user_id, so that no user appears twice for the same email record.
3. Aggregate.
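If the list does not have to follow the ordering column, a shorter variant might be string_agg with DISTINCT. This is a sketch, not the approach from the answer above: PostgreSQL only accepts an ORDER BY inside a DISTINCT aggregate when it sorts by the aggregated expression itself, so here the ids are sorted as text.

```sql
-- DISTINCT removes repeated user ids within each email group
select em.name,
       string_agg(distinct c.user_id::text, ',' order by c.user_id::text) as user_ids
from emails em
join contacts c on c.email_id = em.id
group by em.id, em.name
order by em.name;
```

With the sample data this should give 1,3 for dennis1 and 1,4 for dennis2.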

Copy value from one row to another row in PostgreSQL

I have a table like this:
id product amount
1 A 6
1 A 8
1 A
1 B 1
1 B
2 C 2
2 C
2 C 4
2 C
2 C
and I need to make it like this:
id product amount
1 A 6
1 A 8
1 A 8
1 B 1
1 B 1
2 C 2
2 C 2
2 C 4
2 C 4
2 C 4
Copy amount by previous non-missing value.
I tried to use the lag() function. However, window functions such as lag() are not allowed in UPDATE.
update tableA set amount = lag(amount);
What can I do using PostgreSQL?
You can SELECT what you want to UPDATE, but there is no (easy) way to actually do the UPDATE, because the table fox does not have a primary key (yet).
CREATE TABLE fox (
id integer NOT NULL,
product text NOT NULL,
amount integer
);
Populate fox with some data:
INSERT INTO fox VALUES
(1, 'A', 6),
(1, 'A', 8),
(1, 'A', NULL),
(1, 'B', 1),
(1, 'B', NULL),
(2, 'C', 2),
(2, 'C', NULL),
(2, 'C', 4),
(2, 'C', NULL),
(2, 'C', NULL),
(3, 'What does the fox say?', 5);
The query.
WITH ranks (rank, id, product, amount) AS (
SELECT ROW_NUMBER() OVER (), id, product, amount FROM fox
)
SELECT r.id, r.product,
(SELECT amount FROM ranks
WHERE id = r.id AND product = r.product
AND rank < r.rank AND amount IS NOT NULL
ORDER BY rank DESC LIMIT 1 -- nearest preceding non-NULL row
)
FROM ranks r WHERE r.amount IS NULL ORDER BY 1, 2, 3;
Yields the rows which previously had a NULL and now have the appropriate amount.
id | product | amount
----+---------+--------
1 | A | 8
1 | B | 1
2 | C | 2
2 | C | 4
2 | C | 4
But you cannot use this data to update, because rows are still not uniquely identified by (id, product) - which means you cannot write a WHERE condition identifying your rows uniquely. How would the WHERE clause know whether to change the amount to 2 or 4 in the UPDATE? The multiple rows with (id, product) = (2, 'C') are indistinguishable in the WHERE of the UPDATE.
Let's give the fox a primary key.
ALTER TABLE fox ADD COLUMN IF NOT EXISTS pkey serial ;
ALTER TABLE fox ADD PRIMARY KEY (pkey) ;
Now we can identify the rows by the PRIMARY KEY pkey.
WITH nulls AS (
SELECT pkey, id, product
FROM fox
WHERE amount IS NULL
)
SELECT pkey,
id, product, -- you can leave these out in your UPDATE: pkey is UNIQUE
(SELECT amount FROM fox
WHERE id = n.id AND product = n.product
AND n.pkey > pkey AND amount IS NOT NULL
ORDER BY pkey DESC LIMIT 1)
FROM nulls n ORDER BY 1, 2, 3, 4;
to display the changes to be made
pkey | id | product | amount
------+----+---------+--------
3 | 1 | A | 8
5 | 1 | B | 1
7 | 2 | C | 2
9 | 2 | C | 4
10 | 2 | C | 4
And we can use pkey in the UPDATE.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
WITH nulls AS (
SELECT pkey, id, product
FROM fox
WHERE amount IS NULL
), changes AS (
SELECT pkey,
(SELECT amount FROM fox
WHERE id = n.id AND product = n.product
AND n.pkey > pkey AND amount IS NOT NULL
ORDER BY pkey DESC LIMIT 1)
FROM nulls n
) UPDATE fox f SET amount = c.amount FROM changes c WHERE f.pkey = c.pkey ;
Check the result is okay:
SELECT * FROM fox ORDER BY 1, 2, 3, 4;
And accept using COMMIT or ROLLBACK accordingly.
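Once pkey exists, the same fill-forward could probably also be written with window functions instead of a correlated subquery. This is only a sketch: it assumes the pkey column added above defines what "previous" means, and the names grouped, filled, grp and filled_amount are made up for illustration.

```sql
WITH grouped AS (
    -- count(amount) ignores NULLs, so the running count only increases
    -- on rows that have an amount; each NULL row therefore lands in the
    -- same group as the last non-NULL row before it
    SELECT pkey, id, product, amount,
           count(amount) OVER (PARTITION BY id, product ORDER BY pkey) AS grp
    FROM fox
), filled AS (
    -- each group holds at most one non-NULL amount, so max() returns it
    SELECT pkey,
           max(amount) OVER (PARTITION BY id, product, grp) AS filled_amount
    FROM grouped
)
UPDATE fox f
SET amount = fl.filled_amount
FROM filled fl
WHERE f.pkey = fl.pkey
  AND f.amount IS NULL
  AND fl.filled_amount IS NOT NULL;
```

Rows that have no earlier non-NULL amount within their (id, product) group keep their NULL.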
Alternative to adding a PRIMARY KEY
Every table should always have a primary key.
If you insist on not having one, you could instead compute the rows with their now non-NULL amounts and, rather than UPDATEing them, INSERT them into the table and then remove the rows that had no amount with DELETE FROM fox WHERE amount IS NULL. This way you get around adding a unique primary key. Of course the INSERT and DELETE must be packaged into one TRANSACTION so that they do not interfere with other transactions running concurrently. For example, another transaction might add rows with a NULL amount after you have calculated the data to be INSERTed using SELECT but before you DELETE all NULL amounts. In that case the DELETE would also remove the concurrently added row, which never received an amount (data loss due to concurrency; think ACID).
But a missing primary key will probably bite you later on, anyway.
Without knowing what defines the "previous row", everything is a guess. But you can use an anonymous block to do what you want; just adapt it to your table:
CREATE TEMPORARY TABLE test_lag AS
SELECT column1 AS id, column2 AS product, column3 AS amount FROM (
VALUES (1, 'A', 6),
(1, 'A', 8),
(1, 'A', NULL),
(1, 'B', 1),
(1, 'B', NULL),
(2, 'C', 2),
(2, 'C', NULL),
(2, 'C', 4),
(2, 'C', NULL),
(2, 'C', NULL)) AS tmp;
DO $$
BEGIN
--Loop until all NULL amounts have been updated
--Why do we need this? Because PostgreSQL does not support the IGNORE NULLS clause on lag()
LOOP
WITH tmp AS (
SELECT ctid, lag(amount) OVER (ORDER BY id, product) AS last_amount FROM test_lag -- You MUST change this ORDER BY to the right columns (what is the "previous row"?)
)
UPDATE test_lag SET amount = tmp.last_amount FROM tmp WHERE test_lag.ctid = tmp.ctid AND test_lag.amount IS NULL AND tmp.last_amount IS NOT NULL; -- skipping NULL predecessors keeps the loop from running forever
IF NOT FOUND THEN
EXIT;
END IF;
END LOOP;
END $$;
SELECT * FROM test_lag ORDER BY id, product, amount;

How to query the data in a join table by two sets of joined records?

I've got three tables: users, courses, and grades, the latter of which joins users and courses with some metadata like the user's score for the course. I've created a SQLFiddle, though the site doesn't appear to be working at the moment. The schema looks like this:
CREATE TABLE users(
id INT,
name VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO users VALUES
(1, 'Beth'),
(2, 'Alice'),
(3, 'Charles'),
(4, 'Dave');
CREATE TABLE courses(
id INT,
title VARCHAR,
PRIMARY KEY (ID)
);
INSERT INTO courses VALUES
(1, 'Biology'),
(2, 'Algebra'),
(3, 'Chemistry'),
(4, 'Data Science');
CREATE TABLE grades(
id INT,
user_id INT,
course_id INT,
score INT,
PRIMARY KEY (ID)
);
INSERT INTO grades VALUES
(1, 2, 2, 89),
(2, 2, 1, 92),
(3, 1, 1, 93),
(4, 1, 3, 88);
I'd like to know how (if possible) to construct a query which specifies some users.id values (1, 2, 3) and courses.id values (1, 2, 3) and returns those users' grades.score values for those courses
| name | Algebra | Biology | Chemistry |
|---------|---------|---------|-----------|
| Alice | 89 | 92 | |
| Beth | | 93 | 88 |
| Charles | | | |
In my application logic, I'll be receiving an array of user_ids and course_ids, so the query needs to select those users and courses dynamically by primary key. (The actual data set contains millions of users and tens of thousands of courses—the examples above are just a sample to work with.)
Ideally, the query would:
- use the course titles as dynamic attributes/column headers for the users' score data
- sort the row and column headers alphabetically
- include empty/NULL cells if the user-course pair has no grades relationship
I suspect I may need some combination of JOINs and Postgresql's crosstab, but I can't quite wrap my head around it.
Update: learning that the terminology for this is "dynamic pivot", I found this SO answer which appears to be trying to solve a related problem in Postgres with crosstab()
I think a simple pivot query should work here, since you only have 4 courses in your data set to pivot.
SELECT t1.name,
MAX(CASE WHEN t3.title = 'Biology' THEN t2.score ELSE NULL END) AS Biology,
MAX(CASE WHEN t3.title = 'Algebra' THEN t2.score ELSE NULL END) AS Algebra,
MAX(CASE WHEN t3.title = 'Chemistry' THEN t2.score ELSE NULL END) AS Chemistry,
MAX(CASE WHEN t3.title = 'Data Science' THEN t2.score ELSE NULL END) AS Data_Science
FROM users t1
LEFT JOIN grades t2
ON t1.id = t2.user_id
LEFT JOIN courses t3
ON t2.course_id = t3.id
GROUP BY t1.name
Follow the link below for a running demo. I used MySQL because, as you have noticed, SQLFiddle seems to be perpetually busted for the other databases.
SQLFiddle
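To restrict the pivot to the requested users and courses, the id lists could go into the join and the WHERE clause. A sketch for PostgreSQL, assuming the ids (1, 2, 3) from the question; the column list is still written out by hand, so this is not a truly dynamic pivot.

```sql
SELECT u.name,
       MAX(g.score) FILTER (WHERE c.title = 'Algebra')   AS "Algebra",
       MAX(g.score) FILTER (WHERE c.title = 'Biology')   AS "Biology",
       MAX(g.score) FILTER (WHERE c.title = 'Chemistry') AS "Chemistry"
FROM users u
-- the course filter lives in the join so that users without
-- any matching grade (Charles) still produce a row
LEFT JOIN grades g
       ON g.user_id = u.id
      AND g.course_id IN (1, 2, 3)
LEFT JOIN courses c
       ON c.id = g.course_id
WHERE u.id IN (1, 2, 3)
GROUP BY u.id, u.name
ORDER BY u.name;
```

With the sample data this should list Alice (89, 92, NULL), Beth (NULL, 93, 88) and Charles (all NULL), in that order.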

Find all records NOT in any blocked range where blocked ranges are in a table

I have a table TaggedData with the following fields and data
ID GroupID Tag MyData
** ******* *** ******
1 Texas AA01 Peanut Butter
2 Texas AA15 Cereal
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
I have a second table of BlockedTags as follows:
ID StartTag EndTag
** ******** ******
1 AA00 AA04
2 AA15 AA15
How do I select from this to return all data matching a given GroupId but NOT in any blocked range (inclusive)? For the data given if the GroupId is Texas, I don't want to return Cereal because it matches the second range. It should only return Bread.
I did try queries based on left joins, but I'm not even close.
Thanks
create table TaggedData (
ID int,
GroupID varchar(16),
Tag char(4),
MyData varchar(50))
create table BlockedTags (
ID int,
StartTag char(4),
EndTag char(4)
)
insert into TaggedData(ID, GroupID, Tag, MyData)
values (1, 'Texas', 'AA01', 'Peanut Butter')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (2, 'Texas' , 'AA15', 'Cereal')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (3, 'Ohio ', 'AA05', 'Potato Chips')
insert into TaggedData(ID, GroupID, Tag, MyData)
values (4, 'Texas', 'AA08', 'Bread')
insert into BlockedTags(ID, StartTag, EndTag)
values (1, 'AA00', 'AA04')
insert into BlockedTags(ID, StartTag, EndTag)
values (2, 'AA15', 'AA15')
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null
Returns:
ID GroupID Tag MyData
----------- ---------------- ---- --------------------------------------------------
3 Ohio AA05 Potato Chips
4 Texas AA08 Bread
(2 row(s) affected)
So, to match on a given GroupID, change the query like this:
select t.* from TaggedData t
left join BlockedTags b on t.Tag between b.StartTag and b.EndTag
where b.ID is null and t.GroupID = @GivenGroupID
I prefer NOT EXISTS because it is more readable and usually performs better on large data (in several cases it gets a better execution plan).
It would look like this:
SELECT t.* FROM TaggedData t
WHERE t.GroupID = @GivenGroupID
AND NOT EXISTS (SELECT 1 FROM BlockedTags b WHERE t.Tag BETWEEN b.StartTag AND b.EndTag)