Generated column of tsvector for text array in Postgres (Supabase)

In Postgres (Supabase) I am trying to automatically generate a column from another column that contains a text array of short title variants.
Computing the tsvector itself works as expected. The alternative, array_to_tsvector, is not an option because the short title text array contains not just single words but whole short-title variants (sentences).
alter table "MOVIES"
add column fts_short_title
tsvector GENERATED ALWAYS AS (
to_tsvector('simple',
array_to_string( title_short,' '::text))
) STORED;
But I get this error:
Failed to run sql query: generation expression is not immutable
On the other hand I was successful when adding such a column for JSONB of full titles for different languages
alter table "MOVIES"
add column fts
tsvector GENERATED ALWAYS AS (
to_tsvector('simple',
coalesce(title->>'en', '') || ' ' ||
coalesce(title->>'de', '') || ' ' ||
coalesce(title->>'it', '') || ' ' ||
coalesce(title->>'fr', ''))
) STORED;
Thank you very much for any tips and help.
SQL is rather new to me; I have only used MongoDB previously, so sorry if this is a basic question.

You could define immutable wrappers for otherwise non-immutable functions:
create or replace function array_to_string_immutable (
arg text[],
separator text,
null_string text default null)
returns text immutable parallel safe language sql as $$
select array_to_string(arg,separator,null_string) $$;
alter table "MOVIES"
add column fts_short_title
tsvector GENERATED ALWAYS AS (
to_tsvector('simple',
array_to_string_immutable( title_short,' '::text))
) STORED;
table "MOVIES";
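While the textcat() function behind the || operator is immutable, I'm pretty sure array_to_string() is only stable for the same reason concat() is: both can call type output functions whose results depend on session settings (for example, datestyle for dates). That does not affect a plain text[], but you need to be reasonably careful about where you use this workaround.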
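To check the volatility labels yourself, a quick look at the system catalog is enough. This is only a sketch: it lists every overload of the functions mentioned here, and provolatile is 'i' for immutable, 's' for stable and 'v' for volatile.

```sql
-- List the volatility of the functions discussed above.
-- provolatile: 'i' = immutable, 's' = stable, 'v' = volatile
SELECT p.oid::regprocedure AS function_signature,
       p.provolatile
FROM   pg_proc p
WHERE  p.proname IN ('textcat', 'concat', 'concat_ws', 'array_to_string')
ORDER  BY p.proname;
```

Only functions labelled 'i' are accepted directly in a generation expression, which is why the wrappers are needed.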
You could do the same for the other column, using concat_ws() to avoid the repeated ||' '||coalesce():
create or replace function concat_ws_immutable (
separator text,
variadic arg text[])
returns text immutable parallel safe language sql as $$
select concat_ws(separator,variadic arg) $$;
alter table "MOVIES"
add column fts
tsvector GENERATED ALWAYS AS (
to_tsvector('simple',concat_ws_immutable(' ',title->>'en',title->>'de',title->>'it',title->>'fr'))
) STORED;
You are also free to compute the column however you like in a plpgsql function called by a trigger that fires on insert or update of "MOVIES".
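A minimal sketch of the trigger route, under a few assumptions: fts_short_title is added as a plain tsvector column (not a generated one), the function and trigger names are made up for illustration, and a BEFORE row trigger is used so the function can assign to NEW directly. It needs Postgres 11 or later for EXECUTE FUNCTION.

```sql
-- Assumption: a plain (non-generated) column holds the tsvector.
alter table "MOVIES"
  add column fts_short_title tsvector;

-- Hypothetical trigger function: recomputes the tsvector from the text array.
create or replace function movies_fts_short_title_trg()
  returns trigger
  language plpgsql as
$$
begin
  -- No immutability requirement inside a trigger function.
  new.fts_short_title :=
    to_tsvector('simple', array_to_string(new.title_short, ' '));
  return new;
end
$$;

-- Fire on every insert, and on updates that touch title_short.
create trigger movies_fts_short_title
  before insert or update of title_short on "MOVIES"
  for each row
  execute function movies_fts_short_title_trg();
```

Unlike a generated column, nothing stops a later statement from writing to fts_short_title directly, so this trades some safety for flexibility.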

Related

Syntax for parameterized PostgreSQL function using dynamic SQL

This code:
ALTER TABLE myschema.mytable add column geom geometry (point,4326);
CREATE INDEX mytable_idx on myschema.mytable using GIST(geom);
UPDATE myschema.mytable set geom = st_setsrid(st_point(mytable.long, mytable.lat), 4326);
This works fine when updating a single table. How would you convert it into a dynamic SQL function, with schema and table as parameters?
Since the function input must be an existing table, the simplest safe way would be to use a regclass input parameter like demonstrated here:
Table name as a PostgreSQL function parameter
However, you also need the bare table name for the concatenated index name, so I'll stick with taking text for schema and table separately:
CREATE OR REPLACE FUNCTION create_geom(_sch text, _tab text)
RETURNS void
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format(
'ALTER TABLE %1$I.%2$I ADD COLUMN geom geometry(POINT,4326);
UPDATE %1$I.%2$I SET geom = st_setsrid(st_point(long, lat), 4326);
CREATE INDEX %3$I ON %1$I.%2$I USING gist(geom);'
, _sch, _tab
, _tab || '_geom_gist_idx');
END
$func$;
Call:
SELECT create_geom('myschema', 'mytable');
Use a single EXECUTE, no need for multiple calls.
Just omit table-qualification for columns in the UPDATE. While not joining additional tables, column names are unambiguous. Else, use a table alias, which can be constant. Like:
UPDATE %1$I.%2$I AS x SET geom = st_setsrid(st_point(x.long, x.lat), 4326);
But it's smarter to populate the column before you build the index. That's a lot faster and produces a balanced index without bloat. So I switched the commands.
Note how I concatenate the index name first (_tab || '_geom_gist_idx'), and then double-quote as required with %3$I. That's the safe way. Something like %I_idx fails with non-standard names.
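The difference between the two quoting orders is easy to see with a table name that needs double quotes. 'My Table' is a made-up name for illustration:

```sql
-- Concatenate first, then quote: one valid identifier.
-- Quote first, then append: a quoted name followed by stray text.
SELECT format('%I', 'My Table' || '_idx') AS safe_name,
       format('%I_idx', 'My Table')       AS broken_name;
-- safe_name:   "My Table_idx"
-- broken_name: "My Table"_idx
```

The second form is not a valid identifier, so a CREATE INDEX built from it raises a syntax error.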
That said, it's typically a mistake to add columns with redundant information to a table. (What keeps you from changing one or the other? Why bloat the table?) Either just use an expression index instead of all of the above:
CREATE INDEX ON myschema.mytable USING gist (st_setsrid(st_point(long, lat), 4326));
Or drop the now redundant long & lat from the table. Those can be extracted from the new geom cheaply on the fly.
Or, if you need all columns (for special performance reasons?), consider a generated column instead. See:
Computed / calculated / virtual / derived columns in PostgreSQL
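A sketch of the generated-column variant, assuming Postgres 12 or later and the same long and lat columns as above. It relies on st_setsrid() and st_point() being immutable, which the expression index above already requires:

```sql
-- geom is derived from long/lat and kept in sync by Postgres itself,
-- so no separate UPDATE is needed.
ALTER TABLE myschema.mytable
  ADD COLUMN geom geometry(Point, 4326)
  GENERATED ALWAYS AS (st_setsrid(st_point(long, lat), 4326)) STORED;

CREATE INDEX ON myschema.mytable USING gist (geom);
```

The column still takes up space in the table, but it can no longer drift out of sync with long and lat.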
Alternatively, keep the queries as SQL templates and use the format() function for identifiers:
CREATE OR REPLACE FUNCTION public.create_geom(sch text, tab text)
RETURNS void language plpgsql AS $body$
DECLARE
DYNSQLA constant text := 'ALTER TABLE %I.%I add column geom geometry (point,4326)';
DYNSQLB constant text := 'CREATE INDEX %I on %I.%I using GIST(geom);'; -- index name is concatenated before quoting
DYNSQLC constant text := 'UPDATE %I.%I set geom = st_setsrid(st_point(%I.long, %I.lat), 4326)';
BEGIN
execute format(DYNSQLA, sch, tab);
execute format(DYNSQLB, tab || '_idx', sch, tab);
execute format(DYNSQLC, sch, tab, tab, tab);
END;
$body$;
SELECT create_geom('myschema','mytable');

Undefined column error when using the output of a function as the table name

I have a multi-tenant database where each tenant gets their own schema. Each schema has a set of materialized views used in full-text searches.
The following function takes a schema name and a table name and concatenates them into schema.table_name format:
CREATE OR REPLACE FUNCTION create_table_name(_schema text, _tbl text, OUT result text)
AS 'select $1 || ''.'' || $2'
LANGUAGE SQL;
It works as expected in pgAdmin.
I'm trying to use this function in a prepared statement, like this:
SELECT p.id AS id,
ts_rank(
p.document, plainto_tsquery(unaccent(?))
) AS rank
FROM create_table_name(?, 'project_search') AS p
WHERE p.document @@ plainto_tsquery(unaccent(?))
OR p.name ILIKE ?
However, when I run it, I get the following error:
ERROR 42703 (undefined_column) column p.id does not exist
If I "hard-code" the schema and table name though, it works.
Why am I getting this error?
P.S. I should note that I am aware of the dangers of this approach, but the schema name always comes from inside my application so I'm not worried about SQL injection.
You want to use the function result as table name in a query, but what you are actually doing is using the function as a table function. This “table” has only one row and one column called result, which explains the error message.
You need dynamic SQL for that, for example by using PL/pgSQL code in a DO statement:
DO
$$DECLARE
...
BEGIN
EXECUTE
format(
E'SELECT p.id AS id,\n'
' ts_rank(\n'
' p.document,\n'
' plainto_tsquery(unaccent($1))\n'
' ) AS rank\n'
'FROM %I.project_search AS p\n'
'WHERE p.document @@ plainto_tsquery(unaccent($1))\n'
'OR p.name ILIKE $2',
schema_name
)
USING fts_query, like_pattern
INTO var1, ...;
...
$$;
To handle more than one result row, you'd use a FOR loop — this is just a simple example to show the principle.
Note how I use format with the %I pattern to avoid SQL injection. Your function is vulnerable.
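For more than one result row, the same dynamic query can drive a FOR ... IN EXECUTE loop. A sketch, where the variable values are placeholders for illustration and each row is simply reported with RAISE NOTICE:

```sql
DO
$$
DECLARE
   schema_name  text := 'tenant1';     -- hypothetical tenant schema
   fts_query    text := 'some words';  -- hypothetical search input
   like_pattern text := '%some%';      -- hypothetical ILIKE pattern
   r            record;
BEGIN
   FOR r IN EXECUTE
      format(
         'SELECT p.id,
                 ts_rank(p.document, plainto_tsquery(unaccent($1))) AS rank
          FROM   %I.project_search AS p
          WHERE  p.document @@ plainto_tsquery(unaccent($1))
             OR  p.name ILIKE $2',
         schema_name)
      USING fts_query, like_pattern
   LOOP
      -- Replace with whatever should happen per row.
      RAISE NOTICE 'id = %, rank = %', r.id, r.rank;
   END LOOP;
END
$$;
```

The schema name goes through %I as an identifier, while the search values are passed with USING and never become part of the SQL string.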

Dynamic SELECT INTO in PL/pgSQL function

How can I write a dynamic SELECT INTO query inside a PL/pgSQL function in Postgres?
Say I have a variable called tb_name which is filled in a FOR loop from information_schema.tables. Now I have a variable called tc which will be taking the row count for each table. I want something like the following:
FOR tb_name in select table_name from information_schema.tables where table_schema='some_schema' and table_name like '%1%'
LOOP
EXECUTE FORMAT('select count(*) into' || tc 'from' || tb_name);
END LOOP
What should be the data type of tb_name and tc in this case?
CREATE OR REPLACE FUNCTION myfunc(_tbl_pattern text, _schema text = 'public')
RETURNS void AS -- or whatever you want to return
$func$
DECLARE
_tb_name information_schema.tables.table_name%TYPE; -- currently varchar
_tc bigint; -- count() returns bigint
BEGIN
FOR _tb_name IN
SELECT table_name
FROM information_schema.tables
WHERE table_schema = _schema
AND table_name ~ _tbl_pattern -- see below!
LOOP
EXECUTE format('SELECT count(*) FROM %I.%I', _schema, _tb_name)
INTO _tc;
-- do something with _tc
END LOOP;
END
$func$ LANGUAGE plpgsql;
Notes
I prepended all parameters and variables with an underscore (_) to avoid naming collisions with table columns. Just a useful convention.
_tc should be bigint, since that's what the aggregate function count() returns.
The data type of _tb_name is derived from its parent column dynamically: information_schema.tables.table_name%TYPE. See the chapter Copying Types in the manual.
Are you sure you only want tables listed in information_schema.tables? Makes sense, but be aware of implications. See:
How to check if a table exists in a given schema
a_horse already pointed to the manual and Andy provided a code example. This is how you assign a single row or value returned from a dynamic query with EXECUTE to a (row) variable. A single column (like count in the example) is decomposed from the row type automatically, so we can assign to the scalar variable _tc directly, in the same way we would assign a whole row to a record or row variable. Related:
How to get the value of a dynamically generated field name in PL/pgSQL
Schema-qualify the table name in the dynamic query. There may be other tables of the same name in the current search_path, which would result in completely wrong (and very confusing!) results without schema-qualification. Sneaky bug! Or this schema is not in the search_path at all, which would make the function raise an exception immediately.
How does the search_path influence identifier resolution and the "current schema"
Always quote identifiers properly to defend against SQL injection and random errors. Schema and table have to be quoted separately! See:
Table name as a PostgreSQL function parameter
Truncating all tables in a Postgres database
I use the regular expression operator ~ in table_name ~ _tbl_pattern instead of table_name LIKE ('%' || _tbl_pattern || '%'), that's simpler. Be wary of special characters in the pattern parameter either way! See:
PostgreSQL Reverse LIKE
Escape function for regular expression or LIKE patterns
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
I set a default for the schema name in the function call: _schema text = 'public'. Just for convenience, you may or may not want that. See:
Assigning default value for type
Addressing your comment: to pass values, use the USING clause like:
EXECUTE format('SELECT count(*) FROM %I.%I
WHERE some_column = $1', _schema, _tb_name)
USING user_def_variable;
Related:
INSERT with dynamic table name in trigger function
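If the goal is to get the counts back rather than just compute them, one option is a set-returning variant of the function above. This is a sketch only: the function name table_counts and the output column names are made up for illustration.

```sql
CREATE OR REPLACE FUNCTION table_counts(_tbl_pattern text, _schema text = 'public')
  RETURNS TABLE (tbl text, row_count bigint)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _tb_name text;
BEGIN
   FOR _tb_name IN
      SELECT t.table_name
      FROM   information_schema.tables t
      WHERE  t.table_schema = _schema
      AND    t.table_name ~ _tbl_pattern
   LOOP
      -- Identifiers are quoted by %I; the count lands in the output column.
      EXECUTE format('SELECT count(*) FROM %I.%I', _schema, _tb_name)
      INTO row_count;

      tbl := _tb_name;
      RETURN NEXT;  -- emit one (tbl, row_count) row per table
   END LOOP;
END
$func$;

-- Call:
SELECT * FROM table_counts('1', 'some_schema');
```

Table-qualifying the columns in the inner query (t.table_name) keeps them from clashing with the output column names.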
It looks like you want the %I placeholder for format() so that it treats your variable as an identifier. Also, the INTO clause belongs outside the query string, after the call to format().
FOR tb_name in select table_name from information_schema.tables where table_schema='some_schema' and table_name like '%1%'
LOOP
EXECUTE FORMAT('select count(*) from %I', tb_name) INTO tc;
END LOOP;

Error: column does not exist in PostgreSQL

Trying to create a function that will return multiple rows from a table if a searchTerm exists anywhere inside one of the columns. (I am new to Postgres.)
CREATE OR REPLACE FUNCTION dts_getProjects(searchTerm TEXT) RETURNS SETOF project
AS $$
SELECT credit_br AS Branch, status FROM job_project
WHERE credit_br LIKE '%'||searchTerm||'%'
$$
language 'sql';
I get this error:
ERROR: column "searchTerm" does not exist
LINE 3: ...status FROM job_project WHERE credit_br LIKE '%'||searchTerm||'...
It should work like this:
CREATE OR REPLACE FUNCTION dts_get_projects(_search_term text)
RETURNS SETOF job_project AS
$func$
SELECT j.*
FROM job_project j
WHERE j.credit_br ILIKE '%' || _search_term || '%'
$func$ LANGUAGE sql;
I am using the table type to return whole rows. That's the safe fallback since you did not disclose any data types or table definitions.
I also use ILIKE to make the search case-insensitive (just a guess, you decide).
This only searches the one column credit_br. Your description sounds like you'd want to search all columns (anywhere inside one of the columns). Again, most of the essential information is missing. A very quick and slightly dirty way would be to search the whole row expression converted to text:
...
WHERE j::text ILIKE '%' || _search_term || '%';
...
Related:
Check a whole table for a single value
Asides:
Don't use mixed-case identifiers in Postgres if you can avoid it.
Are PostgreSQL column names case-sensitive?
Don't quote the language name of functions. It's an identifier.

PL/pgSQL INSERT INTO table

I have a plpgsql function that creates an execute statement and I want it to execute it into a table. The standard EXECUTE ... INTO table_name didn't work, so I was looking for an alternative. Basically, the select statement returns three columns which I want to save to a table.
Here's some sample code of the execute:
query = 'SELECT name, work, phone FROM info WHERE name = '
|| quote_literal(inName) || ' ORDER BY phone';
Ideally, if I was just running the query myself I would just put a SELECT INTO tablename, but that didn't work with the execute.
Any ideas?
Use CREATE TABLE AS for that:
EXECUTE 'CREATE TABLE foo AS
SELECT name, work, phone
FROM info WHERE name = ' || quote_literal(in_name) || ' ORDER BY phone';
SELECT INTO is discouraged for that purpose:
Combine two tables into a new one so that select rows from the other one are ignored
SELECT / EXECUTE .. INTO .. is meant for single rows, not for whole tables in plpgsql.
And the standard assignment operator in plpgsql is := (a plain = is also accepted, but := is the documented PL/SQL-compatible form).
And a single quote was missing.
And don't use unquoted mixed case identifiers in Postgres.
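If the target table already exists, a dynamic INSERT ... SELECT is another option, and the value can be passed with USING instead of being concatenated into the string. A sketch, assuming a table foo with matching columns already exists and using a made-up value for in_name:

```sql
DO
$$
DECLARE
   in_name text := 'Alice';  -- hypothetical input value
BEGIN
   -- $1 is bound from USING, so no quote_literal() is needed.
   EXECUTE 'INSERT INTO foo (name, work, phone)
            SELECT name, work, phone
            FROM   info
            WHERE  name = $1
            ORDER  BY phone'
   USING in_name;
END
$$;
```

With nothing dynamic but the value, a plain INSERT ... SELECT inside the function would work just as well; EXECUTE only pays off when the table or column names vary.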