PostgreSQL 14 and Unique Index Issue

I have a database I am importing, and it has a unique index on a field that is null. In PostgreSQL 13 this was not an issue, but in 14 the import no longer works because null is apparently no longer treated as null but as a value.
Is there a setting so that null is treated the way it should be instead of as a value?

The behavior has not changed in PostgreSQL v14. If the import doesn't work in the database, the only possible explanation is that you have defined the column NOT NULL in one database, but not in the other one (or used a similar check constraint).
PostgreSQL v15 introduces this standard conforming additional clause for unique constraints:
UNIQUE NULLS [NOT] DISTINCT
If you define a unique constraint with NULLS NOT DISTINCT in v15, it will behave differently from prior versions. However, the default is still UNIQUE NULLS DISTINCT.
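For illustration, a minimal sketch of the v15 clause (the table and column names are made up):
-- PostgreSQL 15+: treat NULLs as equal for uniqueness purposes.
CREATE TABLE demo (
    id  int PRIMARY KEY,
    ssn text UNIQUE NULLS NOT DISTINCT
);
INSERT INTO demo VALUES (1, NULL);  -- succeeds
INSERT INTO demo VALUES (2, NULL);  -- fails with a unique violation
-- The default, UNIQUE NULLS DISTINCT, keeps the old behavior and allows both inserts.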

When I query the table, the null values are being set to '' during the import, so it fails after the first row. Not sure what changed (other than upgrading to 14.5). Going to reach out to the importer company; I can insert multiple null values myself, so something's up on their end.
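As a quick sanity check on the import (the table and column names below are placeholders, not from the original post), empty strings and real NULLs can be told apart directly:
-- Count rows where the column holds an empty string vs. a real NULL.
SELECT count(*) FILTER (WHERE ssn = '')    AS empty_strings,
       count(*) FILTER (WHERE ssn IS NULL) AS real_nulls
FROM imported_table;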


How to add a column to a table on production PostgreSQL with zero downtime?

Here
https://stackoverflow.com/a/53016193/10894456
is an answer provided for Oracle 11g. My question is the same, but for PostgreSQL:
What is the best approach to add a NOT NULL column with a default value to a production database when the table contains one million records and is live? Does it create any locks if we do the column creation, the default value, and the NOT NULL constraint in a single statement?
This prior answer essentially answers your query.
Cross referencing the relevant PostgreSQL doc with the PostgreSQL sourcecode for AlterTableGetLockLevel mentioned in the above answer shows that ALTER TABLE ... ADD COLUMN will always obtain an ACCESS EXCLUSIVE table lock, precluding any other transaction from accessing the table for the duration of the ADD COLUMN operation.
This same exclusive lock is obtained for any ADD COLUMN variation; i.e. it doesn't matter whether you add a nullable column (with or without DEFAULT) or a NOT NULL column with a default.
However, as mentioned in the linked answer above, adding a NULL column with no DEFAULT should be very quick as this operation simply updates the catalog.
In contrast, adding a column with a DEFAULT specifier necessitates a rewrite of the entire table in PostgreSQL 10 or earlier.
This operation is likely to take a considerable time on your 1M record table.
According to the linked answer, PostgreSQL >= 11 does not require such a rewrite for adding such a column, so it should perform similarly to the no-DEFAULT case.
I should add that for PostgreSQL 11 and above, the ALTER TABLE docs note that table rewrites are only avoided for non-volatile DEFAULT specifiers:
When a column is added with ADD COLUMN and a non-volatile DEFAULT is specified, the default is evaluated at the time of the statement and the result stored in the table's metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. In neither case is a rewrite of the table required.
Adding a column with a volatile DEFAULT [...] will require the entire table and its indexes to be rewritten. [...] Table and/or index rebuilds may take a significant amount of time for a large table; and will temporarily require as much as double the disk space.
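To make the distinction concrete, a hedged sketch (the table and column names are invented; clock_timestamp() is simply a convenient volatile function):
-- Non-volatile default: evaluated once and stored in the catalog; no table rewrite on v11+.
ALTER TABLE big_table ADD COLUMN status text NOT NULL DEFAULT 'new';
-- Volatile default: must be evaluated per row, so the whole table (and its indexes) is rewritten.
ALTER TABLE big_table ADD COLUMN loaded_at timestamptz DEFAULT clock_timestamp();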

Aggregate function behaves differently in versions 8.4 and 9.5?

I have a query like this.
SELECT companies.id, companies.code, MAX(disclosures.filed_at) disclosure_filed_at
FROM "companies" INNER JOIN "disclosures" ON "disclosures"."company_id" = "companies"."id"
GROUP BY companies.id
This query works in PostgreSQL 9.5.2, but it fails in version 8.4.20 with an error.
PG::GroupingError: ERROR: column "companies.code" must appear in the GROUP BY clause or be used in an aggregate function
If I add companies.code to the GROUP BY clause, it works. But if I want to select companies.*, I can't write GROUP BY companies.*.
Do I have to list every column in the GROUP BY clause in version 8.4 in order to select with *?
The Postgres behavior is supported by the ANSI standard. The reason is that id not only uniquely identifies each row in companies, it is declared to do so (via a unique or primary key constraint, although I'm not sure whether Postgres honors this for a plain unique constraint).
Hence, the database knows that it can safely refer to any other column from the same row. This is called "functional dependency".
This feature has also now been added to MySQL (documented here). You might find that documentation easier to follow than the Postgres description:
When GROUP BY is present, or any aggregate functions are present, it
is not valid for the SELECT list expressions to refer to ungrouped
columns except within aggregate functions or when the ungrouped column
is functionally dependent on the grouped columns, since there would
otherwise be more than one possible value to return for an ungrouped
column. A functional dependency exists if the grouped columns (or a
subset thereof) are the primary key of the table containing the
ungrouped column.
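In practical terms, based on the query in the question (and assuming companies.id is the primary key), the two versions can be written like this:
-- PostgreSQL 9.1+: grouping by the primary key is enough; the other companies
-- columns are functionally dependent on it.
SELECT companies.id, companies.code, MAX(disclosures.filed_at) AS disclosure_filed_at
FROM companies
JOIN disclosures ON disclosures.company_id = companies.id
GROUP BY companies.id;
-- PostgreSQL 8.4: every non-aggregated column in the SELECT list must be grouped explicitly.
SELECT companies.id, companies.code, MAX(disclosures.filed_at) AS disclosure_filed_at
FROM companies
JOIN disclosures ON disclosures.company_id = companies.id
GROUP BY companies.id, companies.code;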

Make a column NOT NULL in a large table without locking issues?

I want to change a column to NOT NULL:
ALTER TABLE "foos" ALTER "bar_id" SET NOT NULL
The "foos" table has almost 1 000 000 records. It does fairly low volumes of writes, but quite constantly. There are a lot of reads.
In my experience, changing a column in a big table to NOT NULL like this can cause downtime in the app, presumably because it leads to (b)locks.
I've yet to find a good explanation corroborating this, though.
And if it is true, what can I do to avoid it?
EDIT: The docs (via this comment) say:
Adding a column with a DEFAULT clause or changing the type of an existing column will require the entire table and its indexes to be rewritten.
I'm not sure if changing NULL counts as "changing the type of an existing column", but I believe I did have an index on the column the last time I saw this issue.
Perhaps removing the index, making the column NOT NULL, and then adding the index back would improve things?
I think you can do that using a check constraint rather than SET NOT NULL.
ALTER TABLE foos
add constraint id_not_null check (bar_id is not null) not valid;
This will still require an ACCESS EXCLUSIVE lock on the table, but it is very quick because Postgres doesn't validate the constraint (so it doesn't have to scan the entire table). It already ensures that new or changed rows cannot put a null value into that column.
Then (after committing the alter table!) you can do:
alter table foos validate constraint id_not_null;
This does not require an ACCESS EXCLUSIVE lock and still allows access to the table.
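If the end goal is a real NOT NULL constraint, it is worth noting that on PostgreSQL 12 and later the already-validated check constraint serves as proof that the column contains no nulls, so SET NOT NULL needs only a brief lock and no table scan. A sketch continuing the example above:
-- PostgreSQL 12+: the validated check constraint proves there are no NULLs,
-- so SET NOT NULL does not rescan the table (it still takes a brief ACCESS EXCLUSIVE lock).
ALTER TABLE foos ALTER COLUMN bar_id SET NOT NULL;
-- The helper constraint is now redundant and can be dropped.
ALTER TABLE foos DROP CONSTRAINT id_not_null;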

Records not matching without LTRIM, RTRIM and Upper/Lower function

I have been facing this issue for a long time. I have two tables in different databases with the same columns and exactly the same data types. But when doing a join or any other matching query I get only a few results. I noticed that when I use
LTRIM(RTRIM(UPPER(SourceTable.Column))) =
LTRIM(RTRIM(UPPER(DestinationTable.Column)))
it works fine. Surprisingly, I have seen the same issue on bit and integer columns, and they also match fine when I wrap them in LTRIM, RTRIM and UPPER/LOWER.
Below are the collation of the two databases:
Source: SQL_Latin1_General_CP1_CI_AS
Destination: SQL_Latin1_General_CP1_CI_AS
As you can see, they have the same collation, yet I am still getting this issue. Is there a permanent solution to this?
If the datatypes are exactly the same, it could be that you have a different collation on the columns themselves: a collation can be specified at the column level, overriding the database default. My first port of call would be to check that.
MSDN resource, quote:
Column-level collations
When you create or alter a table, you can specify collations for each
character-string column by using the COLLATE clause. If no collation is
specified, the column is assigned the default collation of the database.
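One quick way to check the column-level collations (the table name below is a placeholder):
-- SQL Server: list per-column collations for a given table.
SELECT c.name, c.collation_name
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.SourceTable')
  AND c.collation_name IS NOT NULL;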

Why does Postgres handle NULLs inconsistently where unique constraints are involved?

I recently noticed an inconsistency in how Postgres handles NULLs in columns with a unique constraint.
Consider a table of people:
create table People (
  pid int not null,
  name text not null,
  SSN text unique,
  primary key (pid)
);
The SSN column should be kept unique. We can check that:
-- Add a row.
insert into People(pid, name, SSN)
values(0, 'Bob', '123');
-- Test the unique constraint.
insert into People(pid, name, SSN)
values(1, 'Carol', '123');
The second insert fails because it violates the unique constraint on SSN. So far, so good. But let's try a NULL:
insert into People(pid, name, SSN)
values(1, 'Carol', null);
That works.
select *
from People;
0;"Bob";"123"
1;"Carol";"<NULL>"
A unique column will take a null. Interesting. How can Postgres assert that null is in any way unique, or not unique for that matter?
I wonder if I can add two rows with null in a unique column.
insert into People(pid, name, SSN)
values(2, 'Ted', null);
select *
from People;
0;"Bob";"123"
1;"Carol";"<NULL>"
2;"Ted";"<NULL>"
Yes I can. Now there are two rows with NULL in the SSN column even though SSN is supposed to be unique.
The Postgres documentation says, "For the purpose of a unique constraint, null values are not considered equal."
Okay. I can see the point of this. It's a nice subtlety in null-handling: By considering all NULLs in a unique-constrained column to be disjoint, we delay the unique constraint enforcement until there is an actual non-null value on which to base that enforcement.
That's pretty cool. But here's where Postgres loses me. If all NULLs in a unique-constrained column are not equal, as the documentation says, then we should see all of the nulls in a select distinct query.
select distinct SSN
from People;
"<NULL>"
"123"
Nope. There's only a single null there. It seems like Postgres has this wrong. But I wonder: Is there another explanation?
Edit:
The Postgres docs do specify that "Null values are considered equal in this comparison." in the section on SELECT DISTINCT. While I do not understand that notion, I'm glad it's spelled out in the docs.
It is almost always a mistake when dealing with null to say:
"nulls behave like so-and-so here, so they should behave like such-and-such here"
Here is an excellent essay on the subject from a Postgres perspective. Briefly summed up: nulls are treated differently depending on the context, so don't make the mistake of assuming anything about them.
The bottom line is, PostgreSQL does what it does with nulls because the SQL standard says so.
Nulls are obviously tricky and can be interpreted in multiple ways (unknown value, absent value, etc.), and so when the SQL standard was initially written, the authors had to make some calls at certain places. I'd say time has proved them more or less right, but that doesn't mean that there couldn't be another database language that handles unknown and absent values slightly (or wildly) differently. But PostgreSQL implements SQL, so that's that.
As was already mentioned in a different answer, Jeff Davis has written some good articles and presentations on dealing with nulls.
NULL is considered to be unique because NULL doesn't represent the absence of a value. A NULL in a column is an unknown value. When you compare two unknowns, you don't know whether or not they are equal because you don't know what they are.
Imagine that you have two boxes marked A and B. If you don't open the boxes and you can't see inside, you never know what the contents are. If you're asked "Are the contents of these two boxes the same?" you can only answer "I don't know".
In this case, PostgreSQL will do the same thing. When asked to compare two NULLs, it says "I don't know." This has a lot to do with the crazy semantics around NULL in SQL databases. The article linked to in the other answer is an excellent starting point to understanding how NULLs behave. Just beware: it varies by vendor.
Multiple NULL values in a unique index are okay because x = NULL is never true for any x (it evaluates to NULL, i.e. unknown), in particular when x is itself NULL. You'll also run into this behavior in WHERE clauses, where you have to write WHERE x IS NULL and WHERE x IS NOT NULL rather than WHERE x = NULL and WHERE x <> NULL.
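A few one-line queries make the comparison semantics concrete:
SELECT NULL = NULL;                     -- NULL (unknown), not true
SELECT NULL IS NULL;                    -- true
SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
-- SELECT DISTINCT and GROUP BY use the "not distinct" notion, which is why the
-- SELECT DISTINCT in the question collapses the two NULLs into one row.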