Why does Postgres group null values?

CREATE TEMP TABLE wirednull (
id bigint NOT NULL,
value bigint,
CONSTRAINT wirednull_pkey PRIMARY KEY (id)
);
INSERT INTO wirednull (id,value) VALUES (1,null);
INSERT INTO wirednull (id,value) VALUES (2,null);
SELECT value FROM wirednull GROUP BY value;
This returns one row, but I would expect two rows, since
SELECT *
FROM wirednull a
LEFT JOIN wirednull b
ON (a.value = b.value)
does not find any matches, because null != null in Postgres.

According to Wikipedia's article on SQL nulls:
When two nulls are equal: grouping, sorting, and some set operations
Because SQL:2003 defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct".[20] This definition of not distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other keywords that perform grouping) are used.
This wasn't the question:
Because null = null, or something = null, returns unknown rather than true or false.
So:
ON (a.value = b.value)
Doesn't match.
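To see both behaviours side by side, here is a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in so that it runs without a server; the grouping and comparison rules it shows are the standard SQL ones quoted above, which PostgreSQL follows as well. The table mirrors wirednull from the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wirednull (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO wirednull (id, value) VALUES (?, ?)",
    [(1, None), (2, None)],
)

# GROUP BY treats the two NULLs as "not distinct", so they form one group.
groups = conn.execute(
    "SELECT value, COUNT(*) FROM wirednull GROUP BY value"
).fetchall()

# The join condition uses "=", which yields unknown (NULL) for NULL operands,
# so no row of b matches and the LEFT JOIN pads b's columns with NULL.
joined = conn.execute(
    "SELECT a.id, b.id FROM wirednull a "
    "LEFT JOIN wirednull b ON a.value = b.value ORDER BY a.id"
).fetchall()

# The comparison itself evaluates to NULL, not to true or false.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

print(groups)        # [(None, 2)]
print(joined)        # [(1, None), (2, None)]
print(null_eq_null)  # None
```

In PostgreSQL, a join that should pair up nulls the way GROUP BY does can use ON (a.value IS NOT DISTINCT FROM b.value) instead of the plain equality.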

Related

How to do a select inside an insert using Postgresql?

I am a beginner in PostgreSQL and databases in general. I have a table with a column product_id. Some of the values in that column are null. I need to change those null values to the values from another table.
I want to do something like this:
insert into a(product_id) (select product_id from b where product_name='foo') where product_id = null;
I realize that this syntax doesn't work but I just need help figuring it out.
Assuming your table is named "a" and some rows have a null product_id while the other columns do contain data,
you need to UPDATE, not INSERT.
Your query will be something like this:
UPDATE a
SET product_id = (SELECT product_id FROM b WHERE b.product_name = 'foo')
WHERE product_id IS NULL;
Make sure that your subquery (select .. from b) returns a single value.
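A small runnable sketch of that UPDATE, again assuming SQLite via Python's sqlite3 module in place of PostgreSQL. The id column on table a and the sample rows are hypothetical, added only for illustration. It also shows why the original where product_id = null filter can never match anything.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; only product_id / product_name come from the question.
    CREATE TABLE a (id INTEGER PRIMARY KEY, product_id INTEGER);
    CREATE TABLE b (product_id INTEGER, product_name TEXT);
    INSERT INTO a (id, product_id) VALUES (1, 10), (2, NULL), (3, NULL);
    INSERT INTO b (product_id, product_name) VALUES (42, 'foo'), (7, 'bar');
""")

# "product_id = NULL" evaluates to unknown for every row, so nothing is updated.
wrong = conn.execute(
    "UPDATE a SET product_id = -1 WHERE product_id = NULL"
).rowcount

# IS NULL is the correct test; the scalar subquery must be parenthesised.
fixed = conn.execute(
    "UPDATE a SET product_id = "
    "(SELECT product_id FROM b WHERE product_name = 'foo') "
    "WHERE product_id IS NULL"
).rowcount

rows = conn.execute("SELECT id, product_id FROM a ORDER BY id").fetchall()

print(wrong)  # 0
print(fixed)  # 2
print(rows)   # [(1, 10), (2, 42), (3, 42)]
```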
Try the following:
INSERT INTO a (product_id)
select product_id from b where product_name='foo';
Your WHERE condition after the closing bracket is wrong, i.e. where product_id = null;

Output Inserted.id equivalent in Postgres

I am new to PostgreSQL and am trying to convert MSSQL scripts to Postgres.
For the MERGE statement, we can use INSERT ... ON CONFLICT DO UPDATE or DO NOTHING, but I am using the statement below and am not sure whether it is the correct way.
MSSQL code:
Declare #tab2(New_Id int not null, Old_Id int not null)
MERGE Tab1 as Target
USING (select * from Tab1
WHERE ColumnId = #ID) as Source on 0 = 1
when not matched by Target then
INSERT
(ColumnId
,Col1
,Col2
,Col3
)
VALUES (Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
)
OUTPUT INSERTED.Id, Source.Id into #tab2(New_Id, Old_Id);
Postgres Code:
Create temp table tab2(New_Id int not null, Old_Id int not null)
With source as( select * from Tab1
WHERE ColumnId = ID)
Insert into Tab1(ColumnId
,Col1
,Col2
,Col3
)
select Source.ColumnId
,Source.Col1
,Source.Col2
,Source.Col3
from source
My question is how to convert OUTPUT INSERTED.Id to Postgres. I need this id to insert records into another table (let's say child tables based on the values inserted into Tab1).
In PostgreSQL's INSERT statements you can choose what the query should return. From the docs on INSERT:
The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used). This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table's columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned.
Example (shortened form of your query):
WITH [...] INSERT INTO Tab1 ([...]) SELECT [...] FROM [...] RETURNING Tab1.id

CTE based insert of multiple rows into "one-per-group" table violates unique index

I have a table where only one row per group can be true.
This is enforced by a partial unique index (which can't be deferred).
CREATE TABLE test
(
id SERIAL PRIMARY KEY,
my_group INTEGER,
last BOOLEAN DEFAULT TRUE
);
CREATE UNIQUE INDEX "test.last" ON test (my_group) WHERE last;
INSERT INTO test (my_group)
VALUES (1), (2);
I'm trying to insert a new row into this table that shall replace the "last" element of the corresponding group. I also want to accomplish this in a single statement.
With some CTE trickery I'm able to do this:
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1)
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
So far so good, the insert happens... no conflicts.
I don't quite understand why this works, since from my understanding all CTEs should read the same initial DB state and can't see the changes made by other CTEs.
The problem now is that I get a unique violation when I try to do the same with multiple rows at once:
-- the statement is structured this way to closely resemble my actual usecase
WITH
new_data AS (
VALUES (1), (2) -- <- difference to above query
),
uncheck_old_last AS (
UPDATE test
SET last = FALSE
WHERE last AND my_group in (SELECT * FROM new_data)
RETURNING TRUE
)
INSERT INTO test (my_group)
SELECT *
FROM new_data
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true);
-- Schema Error: error: duplicate key value violates unique constraint "test.last"
Is there any way to insert multiple rows with one statement? Can someone explain to me why the first query works and the second doesn't?
This was caused by PostgreSQL simplifying my always true clause:
WHERE COALESCE((SELECT * FROM uncheck_old_last LIMIT 1), true)
was supposed to create a dependency between the main query and the CTE to enforce execution order from the main query's point of view.
It broke with more than one entry because the limit 1 allowed PostgreSQL to ignore the second row, as only one was required for evaluation.
I fixed it by comparing COUNT(*) > -1 instead:
COALESCE((SELECT COUNT(*) FROM uncheck_old_last) > -1, true)

Looking up values from many tables based on value in each column

I have several tables containing key-value pairs for different fields in my database. I also have a table that contains the keys of these different tables, and each key should be replaced by the value it refers to. However, I can't figure out how to select these values from the multiple tables.
The tables
CREATE TABLE CHARACTERS(
ID INTEGER PRIMARY KEY,
NAME VARCHAR(64)
);
CREATE TABLE MEDIA(
ID INTEGER PRIMARY KEY,
NAME VARCHAR(64)
);
CREATE TABLE EPISODES(
ID INTEGER PRIMARY KEY,
MEDIAID INTEGER,
NAME VARCHAR(64)
);
-- Selecting from this table
CREATE TABLE APPS(
ID INTEGER PRIMARY KEY,
CHARID INTEGER,
EPISODEID INTEGER,
MEDIAID INTEGER
);
I am selecting from the APPS table, and I want to replace the value of each *ID column with the value of the accompanying table's NAME column. I want this done for each row in the APPS table. Like so...
CHARID -> CHARACTERS.NAME
EPISODEID -> EPISODES.NAME
MEDIAID -> MEDIA.NAME
I have tried to use joins, but they don't return one row for each row in the APPS table. I have 18 rows in the APPS table, but I get back either far fewer or far more rows than that. So how can I get exactly one result row for each row in the APPS table?
You do this by JOINing the tables together and selecting the desired columns from the individual tables:
SELECT c.name AS character_name, e.name AS episode, m.name AS media
FROM apps a
LEFT JOIN episodes e ON e.id = a.episodeid
LEFT JOIN media m ON m.id = a.mediaid
LEFT JOIN characters c ON c.id = a.charid;
If you want to present the rows in a specific order, you can specify that too as a final clause in the SELECT statement. You can use any field from the included tables; that field is not necessarily part of the columns selected:
ORDER BY a.id -- order by apps.id
or
ORDER BY e.id, c.name -- order first by episode id, then by character name
etc
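As a rough illustration of why LEFT JOIN gives one row per APPS row while an inner join can return fewer, here is a sketch using SQLite through Python's sqlite3 module as a stand-in for PostgreSQL. The sample names ('Alice', 'Pilot', and so on) and the dangling episode id 99 are made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE characters (id INTEGER PRIMARY KEY, name VARCHAR(64));
    CREATE TABLE media (id INTEGER PRIMARY KEY, name VARCHAR(64));
    CREATE TABLE episodes (id INTEGER PRIMARY KEY, mediaid INTEGER, name VARCHAR(64));
    CREATE TABLE apps (id INTEGER PRIMARY KEY, charid INTEGER,
                       episodeid INTEGER, mediaid INTEGER);
    -- Hypothetical sample data.
    INSERT INTO characters VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO media VALUES (1, 'Show A');
    INSERT INTO episodes VALUES (1, 1, 'Pilot');
    -- apps row 3 points at an episode that does not exist
    INSERT INTO apps VALUES (1, 1, 1, 1), (2, 2, 1, 1), (3, 1, 99, 1);
""")

# Same query as in the answer, with the join type left as a placeholder.
query = """
    SELECT c.name, e.name, m.name
    FROM apps a
    {join} episodes e ON e.id = a.episodeid
    {join} media m ON m.id = a.mediaid
    {join} characters c ON c.id = a.charid
    ORDER BY a.id
"""
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

print(len(left_rows))   # 3: every apps row is kept
print(left_rows[2])     # ('Alice', None, 'Show A')
print(len(inner_rows))  # 2: the row with the missing episode is dropped
```

Because each lookup table is joined on its primary key, a LEFT JOIN can match at most one row per lookup, so the result has exactly as many rows as APPS. Getting more rows than APPS usually points to a join condition on a column that is not unique.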

Ambiguous column in PostgreSQL UPSERT (writeable CTE) using one table to update another

I have a table called users_import into which I am parsing and importing a CSV file. Using that table I want to UPDATE my users table if the user already exists, or INSERT if it does not already exist. (This is actually a very simplified example of something much more complicated I'm trying to do.)
I am trying to do something very similar to this:
https://stackoverflow.com/a/8702291/912717
Here are the table definitions and query:
CREATE TABLE users (
id INTEGER NOT NULL UNIQUE PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE users_import (
id INTEGER NOT NULL UNIQUE PRIMARY KEY,
name TEXT NOT NULL
);
WITH upsert AS (
UPDATE users AS u
SET
name = i.name
FROM users_import AS i
WHERE u.id = i.id
RETURNING *
)
INSERT INTO users (name)
SELECT id, name
FROM users_import
WHERE NOT EXISTS (SELECT 1 FROM upsert WHERE upsert.id = users_import.id);
That query gives this error:
psql:test.sql:23: ERROR: column reference "id" is ambiguous
LINE 11: WHERE NOT EXISTS (SELECT 1 FROM upsert WHERE upsert.id = us...
^
Why is id ambiguous and what is causing it?
The RETURNING * in the WITH upsert... clause returns all columns from users and all columns from the joined table users_import. So the result has two columns named id and two columns named name, hence the ambiguity when referring to upsert.id.
To avoid that, use RETURNING u.id if you don't need the rest of the columns.