Postgres 12 case-insensitive compare

I'm attempting to move a SQL Server database, which is used by a C# application (+EF6), to Postgres 12, but I'm not having much luck getting case-insensitive string comparisons to work. The existing SQL Server database uses the SQL_Latin1_General_CP1_CI_AS collation, which means no WHERE clause has to worry about case.
I understand that citext was the way to do this previously, but it is now superseded by non-deterministic collations.
I created such a collation:
CREATE COLLATION ci (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
and when this is applied to the CREATE TABLE on a per-column basis it does work - case is ignored.
CREATE TABLE casetest (
    id serial NOT NULL,
    code varchar(10) NULL COLLATE "ci",
    CONSTRAINT "PK_id" PRIMARY KEY ("id"));
But from what I have read it must be applied to every varchar column and can't be set globally across the whole db.
Is this correct?
I don't want to use .ToLower() everywhere: it clutters the code, and any index on the column is then not used.
I tried modifying the pre-existing 'default' collation in pg_collation to match the settings of 'ci' collation but it has no effect.
Thanks in advance.
PG

You got it right. From PostgreSQL v15 on, ICU collations can be used as database collations, but only deterministic ones (that don't compare different strings as equal). So your case-insensitive collation wouldn't work there either. Since you are using v12, you cannot use ICU collations as database default collation at all, but have to use them in column definitions.
This limitation is annoying and not in the nature of things. It will probably be lifted in some future version.
You can use a DO statement to change the collation of all string columns:
DO
$$DECLARE
   v_table  regclass;
   v_column name;
   v_type   oid;
   v_typmod integer;
BEGIN
   FOR v_table, v_column, v_type, v_typmod IN
      SELECT a.attrelid::regclass,
             a.attname,
             a.atttypid,
             a.atttypmod
      FROM pg_attribute AS a
         JOIN pg_class AS c ON a.attrelid = c.oid
      WHERE a.atttypid IN (25, 1042, 1043)  -- text, character(n), character varying(n)
        AND c.relkind = 'r'                 -- ordinary tables only; skip indexes, views, composite types
        AND c.relnamespace::regnamespace::name
            NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
   LOOP
      EXECUTE
         format('ALTER TABLE %s ALTER %I SET DATA TYPE %s COLLATE ci',
                v_table,
                v_column,
                format_type(v_type, v_typmod)
         );
   END LOOP;
END;$$;
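To check the result afterwards, a query along these lines lists every character column together with the collation it now uses. It is only a sketch: it assumes the collation is named ci as above and relies on the standard information_schema.columns view, which also lists view columns.

```sql
-- List character columns and their explicit collation.
-- collation_name is NULL when the column still uses the database default.
SELECT table_schema,
       table_name,
       column_name,
       data_type,
       collation_name
FROM information_schema.columns
WHERE data_type IN ('text', 'character', 'character varying')
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name, ordinal_position;
```

Any row that still shows NULL in collation_name was not touched by the DO statement.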

Related

ERROR: operator does not exist: varchar >= integer when changing column type int to varchar in PostgreSQL

I have a task to create a Liquibase migration that changes the column affext in table trp_order_sold, which is currently int8, to varchar (or any other text type, if that is easier).
The script I wrote is the following:
ALTER TABLE public.trp_order_sold
    ALTER COLUMN affext SET DATA TYPE VARCHAR
    USING affext::varchar;
I expected the USING affext::varchar part to act as the converter, but with or without it I get this error:
ERROR: operator does not exist: varchar >= integer
Hint: No operator matches the given name and argument types. You might need to add explicit type casts.
Any hints on what I'm doing wrong? Also I am writing a PostgreSQL script but a working XML equivalent would be fine for me as well.
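The cast itself is fine. The error comes from some other object that depends on the column and still compares it to an integer; that expression has to be re-evaluated against the new type. The objects that most typically use or depend on a column are: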
a generated column
a trigger
a trigger's when condition
a view or a rule
a check constraint
In my test (online demo) only the last one leads to the error you showed:
create table test_table(col1 int);
--CREATE TABLE
alter table test_table add constraint test_constraint check (col1 >= 1);
--ALTER TABLE
alter table test_table alter column col1 type text using col1::text;
--ERROR: operator does not exist: text >= integer
--HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
You'll have to check the constraints on your table with \d+ command in psql, or by querying the system tables:
SELECT con.*
FROM pg_catalog.pg_constraint con
    INNER JOIN pg_catalog.pg_class rel
        ON rel.oid = con.conrelid
    INNER JOIN pg_catalog.pg_namespace nsp
        ON nsp.oid = con.connamespace
WHERE nsp.nspname = 'your_table_schema'
  AND rel.relname = 'your_table_name';
Then you will need to drop the constraint causing the problem and build a new one to work with your new data type.
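Using the test_table and test_constraint from the demo above, the drop-and-rebuild sequence might look like the sketch below. The replacement check is an assumption about what you want to keep: it casts the text back to bigint, so any non-numeric value would be rejected with a cast error.

```sql
BEGIN;

-- Remove the constraint that still compares the column to an integer.
ALTER TABLE test_table DROP CONSTRAINT test_constraint;

-- With the dependent constraint gone, the type change goes through.
ALTER TABLE test_table ALTER COLUMN col1 TYPE text USING col1::text;

-- Rebuild the rule so that it works with the new data type.
ALTER TABLE test_table
    ADD CONSTRAINT test_constraint CHECK (col1::bigint >= 1);

COMMIT;
```

Running all three statements in one transaction means the table is never visible without its constraint.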
Since integer 20 sorts before integer 100, but text '20' sorts after text '100', keeping the old ordering behaviour requires a zero-padded cast like this:
case when affext<0 then '-' else '0' end||lpad(ltrim(affext::text,'-'),19,'0')
The pad width is 19 because lpad truncates anything longer than the given length, and an int8 can have up to 19 digits. You would then have to make sure that new incoming affext values are padded the same way in an insert and update trigger. Alternatively, use an ICU collation with numeric ordering.
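A numeric ICU collation can be sketched as follows. The locale string 'en-u-kn-true' is the one the PostgreSQL documentation uses for numeric ordering; the collation name numeric_order and the sample values are made up for illustration. The intent is that digit sequences are compared by their numeric value, so '20' sorts before '100'. A leading minus sign is not treated as part of the number, so this is mainly useful for non-negative values.

```sql
-- Requires a PostgreSQL build with ICU support.
CREATE COLLATION numeric_order (provider = icu, locale = 'en-u-kn-true');

-- Compare the default text ordering with the numeric-aware ordering.
SELECT v
FROM (VALUES ('100'), ('20'), ('3')) AS t(v)
ORDER BY v COLLATE numeric_order;

-- The collation can also be attached to the column during the type change:
-- ALTER TABLE public.trp_order_sold
--     ALTER COLUMN affext SET DATA TYPE varchar COLLATE numeric_order
--     USING affext::varchar;
```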

PostgreSQL rename a column only if it exists

I couldn't find in the PostgreSQL documentation if there is a way to run an: ALTER TABLE tablename RENAME COLUMN IF EXISTS colname TO newcolname; statement.
I would be glad if we could, because I'm facing an error that depends on who wrote the SQL script I was given: in some cases everything is fine (the column has the wrong name, which the RENAME statement then changes), and in other cases it fails (the column already has the right name).
Hence the idea of using IF EXISTS on the column name when renaming it. If the column already has the right name (here cust_date_mean), the rename command, which must only be applied to the wrong name, should be skipped without raising the following error:
db_1 | [223] ERROR: column "cust_mean" does not exist
db_1 | [223] STATEMENT: ALTER TABLE tablename RENAME COLUMN cust_mean TO cust_date_mean;
db_1 | ERROR: column "cust_mean" does not exist
(In the meantime I will clarify things with the team so it's not a big deal if such command doesn't exist but I think it could help).
While there is no built-in feature, you can use a DO statement:
DO
$$
DECLARE
   _tbl         regclass := 'public.tbl';      -- not case sensitive unless double-quoted
   _colname     name     := 'cust_mean';       -- exact, case sensitive, no double-quoting
   _new_colname text     := 'cust_date_mean';  -- exact, case sensitive, no double-quoting
BEGIN
   IF EXISTS (SELECT FROM pg_attribute
              WHERE  attrelid = _tbl
              AND    attname  = _colname
              AND    attnum > 0
              AND    NOT attisdropped) THEN
      EXECUTE format('ALTER TABLE %s RENAME COLUMN %I TO %I', _tbl, _colname, _new_colname);
   ELSE
      RAISE NOTICE 'Column % of table % not found!', quote_ident(_colname), _tbl;
   END IF;
END
$$;
Does exactly what you ask for. Enter table and column names in the DECLARE section.
The NOTICE is optional.
For repeated use, I would create a function and pass parameters instead of the variables.
Variables are handled safely (no SQL injection). The table name can optionally be schema-qualified. If it's not, it's resolved according to the current search_path, just like ALTER TABLE would.
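For the repeated-use case, the same logic could be wrapped in a function roughly like this. The function name rename_column_if_exists and the boolean return value are illustrative choices, not part of PostgreSQL.

```sql
CREATE OR REPLACE FUNCTION rename_column_if_exists(
    _tbl         regclass,   -- table, optionally schema-qualified
    _colname     name,       -- current column name, exact spelling
    _new_colname name        -- new column name, exact spelling
) RETURNS boolean
LANGUAGE plpgsql AS
$func$
BEGIN
   IF EXISTS (SELECT FROM pg_attribute
              WHERE  attrelid = _tbl
              AND    attname  = _colname
              AND    attnum > 0
              AND    NOT attisdropped) THEN
      EXECUTE format('ALTER TABLE %s RENAME COLUMN %I TO %I',
                     _tbl, _colname, _new_colname);
      RETURN true;    -- column was found and renamed
   END IF;

   RETURN false;      -- nothing to rename
END
$func$;

-- Usage with the names from the question:
SELECT rename_column_if_exists('public.tbl', 'cust_mean', 'cust_date_mean');
```

Passing a table that does not exist still raises an error, because the cast to regclass fails before the function body runs.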
Related:
PostgreSQL create table if not exists
Table name as a PostgreSQL function parameter

In Postgres, running ANALYZE changes behaviour of ILIKE clause with COLLATE

We are migrating an application from SQL Server to Postgres and attempting to emulate various aspects of the case insensitivity of SQL Server. We have created a non-deterministic collation to support case-insensitive matching of foreign keys and equality comparisons.
But we are seeing some weird behaviour when using ILIKE which we can't explain, and would appreciate some assistance.
To see the behaviour, run the following on a fresh database:
CREATE COLLATION IF NOT EXISTS public.ci (provider = icu, locale = 'und-u-ks-level2', deterministic = false);
DROP TABLE IF EXISTS sort_test;
CREATE TABLE sort_test (a text COLLATE public.ci);
INSERT INTO sort_test SELECT md5(n::text) FROM generate_series(1, 10000) n;
-- Removing the following line fixes the issue
ANALYZE sort_test;
-- This line throws "nondeterministic collations are not supported for ILIKE"
SELECT * FROM sort_test WHERE a ILIKE 'c4ca4238a0%' COLLATE "und-x-icu";
Why does running the ANALYZE statement break the ILIKE statement?
That behavior is a PostgreSQL bug.
The reason why it works without the ANALYZE is that the error is thrown when applying the operator to the “histogram bounds” in the statistics. Before ANALYZE there are no statistics, so no error is thrown.
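One way to see this for yourself is to look at what ANALYZE stored. The sketch below uses the standard pg_stats view with the table and column from the question. Before ANALYZE it should return no row for the column; afterwards it should return a row whose histogram_bounds are the values the planner feeds to the operator during selectivity estimation.

```sql
-- Inspect the planner statistics for column a of sort_test.
SELECT attname,
       n_distinct,
       histogram_bounds
FROM pg_stats
WHERE tablename = 'sort_test'
  AND attname = 'a';
```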

Create function with temporary tables that return a select query using these temp tables

I need to create a function, which returns results of a SELECT query. This SELECT query is a JOIN of few temporary tables created inside this function. Is there any way to create such function? Here is an example (it is very simplified, in reality there are multiple temp tables with long queries):
CREATE OR REPLACE FUNCTION myfunction () RETURNS TABLE (column_a TEXT, column_b TEXT) AS $$
BEGIN
CREATE TEMPORARY TABLE raw_data ON COMMIT DROP
AS
SELECT d.column_a, d2.column_b FROM dummy_data d JOIN dummy_data_2 d2 using (id);
RETURN QUERY (select distinct column_a, column_b from raw_data limit 100);
END;
$$
LANGUAGE 'plpgsql' SECURITY DEFINER
I get error:
[Error] Script lines: 1-19 -------------------------
ERROR: RETURN cannot have a parameter in function returning set;
use RETURN NEXT at or near "QUERY"Position: 237
I apologize in advance for any obvious mistakes, I'm new to this.
Psql version is PostgreSQL 8.2.15 (Greenplum Database 4.3.12.0 build 1)
The most recent version of Greenplum Database (5.0) is based on PostgreSQL 8.3, and it supports the RETURN QUERY syntax. Just tested your function on:
PostgreSQL 8.4devel (Greenplum Database 5.0.0-beta.10+dev.726.gd4a707c762 build dev)
The most probable error this could raise in Postgres:
ERROR: column "foo" specified more than once
Meaning, there is at least one more column name (other than id which is folded to one instance with the USING clause) included in both tables. This would not raise an exception in a plain SQL SELECT which tolerates duplicate output column names. But you cannot create a table with duplicate names.
The problem also applies for Greenplum (like you later declared), which is not Postgres. It was forked from PostgreSQL in 2005 and developed separately. The current Postgres manual hardly applies at all any more. Look to the Greenplum documentation.
And psql is just the standard PostgreSQL interactive terminal program. Obviously you are using the one shipped with PostgreSQL 8.2.15, but the RDBMS is still Greenplum, not Postgres.
Syntax fix (for Postgres, like you first tagged, still relevant):
CREATE OR REPLACE FUNCTION myfunction()
  RETURNS TABLE (column_a text, column_b text) AS
$func$
BEGIN
   CREATE TEMPORARY TABLE raw_data ON COMMIT DROP AS
   SELECT d.column_a, d2.column_b  -- explicit SELECT list avoids duplicate column names
   FROM   dummy_data d
   JOIN   dummy_data_2 d2 using (id);

   RETURN QUERY
   SELECT DISTINCT column_a, column_b
   FROM   raw_data
   LIMIT  100;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER;
The example wouldn't need a temp table - unless you access the temp table after the function call in the same transaction (ON COMMIT DROP). Else, a plain SQL function is better in every way. Syntax for Postgres and Greenplum:
CREATE OR REPLACE FUNCTION myfunction(OUT column_a text, OUT column_b text)
  RETURNS SETOF record AS
$func$
   SELECT DISTINCT d.column_a, d2.column_b
   FROM   dummy_data d
   JOIN   dummy_data_2 d2 using (id)
   LIMIT  100;
$func$ LANGUAGE sql SECURITY DEFINER;
Not least, it should also work for Greenplum.
The only remaining reason for this function is SECURITY DEFINER. Else you could just use the simple SQL statement (possibly as prepared statement) instead.
RETURN QUERY was added to PL/pgSQL with version 8.3 in 2008, some years after the fork of Greenplum. Might explain your error msg:
ERROR: RETURN cannot have a parameter in function returning set;
use RETURN NEXT at or near "QUERY" Position: 237
Aside: LIMIT without ORDER BY produces arbitrary results. I assume you are aware of that.
If for some reason you actually need temp tables and cannot upgrade to Greenplum 5.0 like A. Scherbaum suggested, you can still make it work in Greenplum 4.3.x (like in Postgres 8.2). Use a FOR loop in combination with RETURN NEXT.
Examples:
plpgsql error "RETURN NEXT cannot have a parameter in function with OUT parameters" in table-returning function
How to use `RETURN NEXT`in PL/pgSQL correctly?
Use of custom return types in a FOR loop in plpgsql
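The FOR loop with RETURN NEXT suggested above for Greenplum 4.3.x could look roughly like this. Treat it as an untested sketch: the syntax is valid in current PostgreSQL, but it has not been verified against Greenplum 4.3 or PostgreSQL 8.2. The function name myfunction_next and the OUT parameter names out_a and out_b are made up; they differ from the column names on purpose, so that PL/pgSQL variables cannot be confused with columns inside the queries.

```sql
CREATE OR REPLACE FUNCTION myfunction_next(OUT out_a text, OUT out_b text)
  RETURNS SETOF record AS
$func$
DECLARE
   r record;   -- holds one row of the query result per iteration
BEGIN
   CREATE TEMPORARY TABLE raw_data ON COMMIT DROP AS
   SELECT d.column_a, d2.column_b
   FROM   dummy_data d
   JOIN   dummy_data_2 d2 USING (id);

   FOR r IN
      SELECT DISTINCT column_a, column_b
      FROM   raw_data
      LIMIT  100
   LOOP
      out_a := r.column_a;
      out_b := r.column_b;
      RETURN NEXT;   -- with OUT parameters, RETURN NEXT takes no argument
   END LOOP;

   RETURN;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER;
```

As with the original, the temp table is dropped at commit, so calling the function twice in the same transaction would fail on the second CREATE.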

How to use PostgreSQL Foreign Data Wrapper to join 2 different postgresql databases

Can anyone provide an example (with the various SQL statements involved) on how to use foreign data wrappers in postgresql to enable a table from a postgresql database A to be joined to a table from a postgresql database B?
It is unclear from the docs to what degree FDW functionality is available in PostgreSQL 9.0 versus 9.1. The docs also have no examples showing how to join between 2 different postgresql databases (with WHERE qualifier push-down) using FDW.
http://www.postgresql.org/docs/9.0/static/sql-createforeigndatawrapper.html
http://www.postgresql.org/docs/9.1/static/ddl-foreign-data.html
http://www.depesz.com/index.php/2011/03/14/waiting-for-9-1-foreign-data-wrapper/
You manipulate it just like any table. Per Depesz' post:
CREATE FOREIGN TABLE passwd (
    username text,
    pass text,
    uid int4,
    gid int4,
    gecos text,
    home text,
    shell text
) SERVER file_server
OPTIONS (format 'text', filename '/etc/passwd', delimiter ':', null '');
select * from passwd;
The docs have no examples of joining tables for a good reason: it's plain old SQL...
The join push-down currently is the subject of a GSOC:
http://wiki.postgresql.org/wiki/Enhancing_FDW_functionality_for_PostgreSQL_GSoC2011
http://nine-chapters.com/?p=5
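For readers on PostgreSQL 9.3 or later, the postgres_fdw extension covers the case asked about in the question. The following is a minimal sketch, not a tested setup: the server name db_b and the local table local_values are hypothetical, and the connection details and the remote table variablen are borrowed from the dblink example in this thread.

```sql
-- Run in database A. Requires PostgreSQL 9.3+.
CREATE EXTENSION postgres_fdw;

-- Describe how to reach database B.
CREATE SERVER db_b
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'localhost', port '5452', dbname 'mydb');

-- Credentials the current local user uses on the remote side.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER db_b
    OPTIONS (user 'myuser', password 'xxx');

-- Local definition of the remote table.
CREATE FOREIGN TABLE variablen_remote (
    id          int,
    spaltenname varchar(20)
) SERVER db_b
  OPTIONS (schema_name 'public', table_name 'variablen');

-- Join it like any other table; local_values is a hypothetical local table.
SELECT l.id, r.spaltenname
FROM   local_values l
JOIN   variablen_remote r ON r.id = l.id
WHERE  r.id < 100;
```

Running EXPLAIN VERBOSE on the final query shows the statement that is actually sent to the remote server, which is how you can check whether the WHERE condition was pushed down.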
The simplest solution I found is the dblink extension. I tested it on PostgreSQL 9.1:
create extension dblink;
select * from dblink('port=5452 host=localhost dbname=mydb user=myuser password=xxx',
    'select id,spaltenname from variablen') as v (a int, b varchar(20));
http://www.postgresql.org/docs/9.1/static/dblink.html
A simple join would be:
with a as (select * from dblink('port=5452 host=localhost dbname=mydb user=myuser password=xxx', 'select id,spaltenname from variablen') as v (a int, b varchar(20)))
select * from a join (select 1) b on (true);
The example above lets you join with a table on another postgresql server, but it simply copies the remote rows and then joins them locally. There is no automatic "WHERE qualifier push-down", as you called it. You can of course put the WHERE clause into the remote query string yourself, so that only the rows you need are fetched.
If you want to join 2 different postgresql databases, I recommend using dblink:
select datos.*
FROM dblink('hostaddr=192.168.0.10 port=5432 dbname=my_dbname user=my_user password=my_pass'::text, '
    select field_1, field_2
    from my_table order by field_1
    '
    ::text)
    datos(field_1 integer, field_2 character varying(10));
(I tested it on PostgreSQL 9.1.3)
http://www.postgresql.org/docs/9.2/static/contrib-dblink-function.html