Postgres: How can I include weights in a generated column?

I have the following schema:
CREATE TABLE books (
title VARCHAR(255),
subtitle TEXT
);
Adding a generated column without weights works fine:
ALTER TABLE books ADD COLUMN full_text_search TSVECTOR
GENERATED ALWAYS AS (to_tsvector('english',
coalesce(title, '') ||' '||
coalesce(subtitle, '')
)) STORED; -- ✅ Working
Now I want to add weights, but it is not working:
ALTER TABLE books ADD COLUMN full_text_search_weighted TSVECTOR
GENERATED ALWAYS AS (to_tsvector('english',
setweight(coalesce(title, ''), 'A') ||' '||
setweight(coalesce(subtitle, ''), 'B')
)) STORED; -- ❌ Not working
Is there a way to include weights with a generated column in Postgres?
Reproduction Link: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/1385

After reading the docs I found out that setweight operates on a tsvector (it takes one and returns one), so it has to be applied to the result of to_tsvector rather than to the raw text. I had to modify it like this:
ALTER TABLE books ADD COLUMN full_text_search_weighted TSVECTOR
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||' '||
setweight(to_tsvector('english', coalesce(subtitle, '')), 'B')
) STORED; -- ✅ Working
Now we can order the results with the following query:
SELECT ts_rank(full_text_search_weighted , plainto_tsquery('english', 'book')), title, subtitle
FROM "books"
WHERE full_text_search_weighted @@ plainto_tsquery('english', 'book')
ORDER BY ts_rank(full_text_search_weighted , plainto_tsquery('english', 'book')) DESC;
Reproduction link: https://www.db-fiddle.com/f/sffBR96NJtWej9c1Pcfg2H/0
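To sanity-check the weighting (the sample rows below are my own, not from the original post), a match in the weight-A title should now outrank a match in the weight-B subtitle:
INSERT INTO books (title, subtitle) VALUES
('A book about databases', 'Indexing basics'),
('Cooking for beginners', 'Not a book about databases');
SELECT title,
       ts_rank(full_text_search_weighted, plainto_tsquery('english', 'databases')) AS rank
FROM books
WHERE full_text_search_weighted @@ plainto_tsquery('english', 'databases')
ORDER BY rank DESC;
-- 'A book about databases' (title match, weight A) should come first,
-- 'Cooking for beginners' (subtitle match, weight B) second.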

Related

How to find all foreign key constraints that are broken in Postgresql

We recently discovered an issue with our database where a foreign key constraint was not working correctly. Basically the primary table did not have any primary ids that matched the foreign key in the child table. When we dropped the foreign key constraint and tried to recreate it, it then threw an error that the foreign key constraint could not be created because there were foreign keys with no matching id in the parent table. Once those were cleaned up, it allowed us to recreate the foreign key.
Of course we are wondering how this happened to begin with. I worked with Oracle for 15 years and never saw a foreign key failing in this way. But our concern right now is how many other foreign keys are not working correctly. This is a problem because we have some BEFORE DELETE triggers that fail silently when the calling function returns a null because of a foreign_key_violation (this is how we discovered the issue to begin with).
EXCEPTION
WHEN foreign_key_violation
THEN RETURN NULL;
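For context, here is a minimal sketch (the function and audit table names are hypothetical, not our real schema) of the kind of BEFORE DELETE trigger involved; returning NULL from a BEFORE trigger silently skips the DELETE:
CREATE OR REPLACE FUNCTION archive_before_delete() RETURNS trigger AS $$
BEGIN
    -- work that can raise a FK error, e.g. copying the row into an audit table
    INSERT INTO audit_child_table SELECT OLD.*;
    RETURN OLD;
EXCEPTION
    WHEN foreign_key_violation
    THEN RETURN NULL; -- swallows the error and the DELETE is silently skipped
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER child_table_before_delete
BEFORE DELETE ON child_table
FOR EACH ROW EXECUTE FUNCTION archive_before_delete();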
What we want to do is get all of the foreign keys in the database (probably a few thousand), loop over them all, and check every single one against its parent table to see if any are "broken".
Basically:
Select all foreign keys using Postgres system tables.
Loop over all of them and do something like:
select count(parent_id) from child_table
where foreign_key_id not in (
select parent_id as foreign_key_id
from parent_table
);
For all the ones that are not 0, drop the foreign key constraint, fix the orphaned data, and recreate the foreign key constraint.
Does this sound reasonable? Has anyone done something like this before? What is the best way to get the foreign key constraints from Postgres?
Edit:
What we realized is that if we dropped and recreated the foreign keys, it would tell us which ones were problematic.
SELECT 'Alter table ' || conrelid::regclass || ' drop constraint ' || conname || '; alter table ' || conrelid::regclass || ' add constraint ' || conname || ' ' || pg_get_constraintdef(oid) || ';'
FROM pg_constraint
WHERE contype = 'f'
AND connamespace = 'public'::regnamespace
ORDER BY conrelid::regclass::text, contype DESC;
Next, run all the commands. Any foreign key violations will show up when the alter table ... add constraint command runs. Make a note of which ones had a problem and keep running the remaining commands until you have a list of all the issues.
The following is an example of SQL commands to fix the issues.
delete from child_table where id_child_table in (
select id_child_table from child_table
where id_parent_id not in (
select id_parent_id
from parent_table
)
);
alter table child_table drop constraint child_table_fk1;
alter table child_table add constraint child_table_fk1 FOREIGN KEY (id_parent_table) REFERENCES parent_table(id_parent_table) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE;
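As an alternative to running the generated statements by hand, the drop/re-add check can be wrapped in a DO block with an exception handler (my own sketch, not part of the workflow above); because the handler rolls back the subtransaction, a constraint that fails validation is left in place and only reported:
DO $$
DECLARE
    fk record;
BEGIN
    FOR fk IN
        SELECT conrelid::regclass AS tbl, conname, pg_get_constraintdef(oid) AS def
        FROM pg_constraint
        WHERE contype = 'f'
          AND connamespace = 'public'::regnamespace
    LOOP
        BEGIN
            EXECUTE format('ALTER TABLE %s DROP CONSTRAINT %I', fk.tbl, fk.conname);
            EXECUTE format('ALTER TABLE %s ADD CONSTRAINT %I %s', fk.tbl, fk.conname, fk.def);
        EXCEPTION WHEN foreign_key_violation THEN
            -- the drop is rolled back too, so the original constraint survives
            RAISE NOTICE 'Broken foreign key: % on %', fk.conname, fk.tbl;
        END;
    END LOOP;
END $$;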
Using a basic data sample:
create table test1 (id1 serial, seq1 int, constraint pk1 primary key (id1, seq1));
create table test2 (id2 int, seq2 int, descr2 text, constraint pk2 primary key (id2, seq2), constraint fk1 foreign key (id2, seq2) references test1 (id1, seq1) ON DELETE RESTRICT ON UPDATE RESTRICT) ;
insert into test1 (seq1) values (1), (2), (3), (4), (5);
insert into test2 (id2, seq2, descr2) values (1,1,'A'), (2,2,'B'), (3,3,'C'), (4,4,'D'), (5,5,'E');
Consider this query to get the list of foreign keys (source: internet):
SELECT conrelid::regclass AS table_name,
conname AS foreign_key,
pg_get_constraintdef(oid)
FROM pg_constraint
WHERE contype = 'f'
AND connamespace = 'public'::regnamespace
ORDER BY conrelid::regclass::text, contype DESC;
Result:

 table_name | foreign_key | pg_get_constraintdef
------------+-------------+--------------------------------------------------------------------------------------------
 test2      | fk1         | FOREIGN KEY (id2, seq2) REFERENCES test1(id1, seq1) ON UPDATE RESTRICT ON DELETE RESTRICT
With some changes to get a result list that can be processed programmatically:
SELECT conrelid::regclass AS ReferencingTable
, a.ReferencingKey
, b.ReferencedTable
, b.ReferencedKey
FROM pg_constraint AS pg
CROSS JOIN LATERAL
( SELECT '(' || string_agg(conrelid::regclass || '.' || fkey, ',' ORDER BY fkey.ORDINALITY) || ')' AS ReferencingKey
FROM pg_get_constraintdef(pg.oid) AS fk
CROSS JOIN LATERAL string_to_table(translate(regexp_substr(fk, '\([^\)]*\)', 1, 1), '() ', ''), ',') WITH ORDINALITY AS fkey
) AS a
CROSS JOIN LATERAL
( SELECT '(' || string_agg(ReferencedTable || '.' || rkey, ',' ORDER BY rkey.ORDINALITY) || ')' AS ReferencedKey
, ReferencedTable
FROM pg_get_constraintdef(pg.oid) AS fk
CROSS JOIN LATERAL split_part((regexp_match(fk, 'REFERENCES\s\w+'))[1], ' ', 2) AS ReferencedTable
CROSS JOIN LATERAL string_to_table(translate(regexp_substr(fk, '\([^\)]*\)', 1, 2), '() ', ''), ',') WITH ORDINALITY AS rkey
GROUP BY ReferencedTable
) AS b
WHERE contype = 'f'
AND connamespace = 'public' ::regnamespace
Result:

 referencingtable | referencingkey         | referencedtable | referencedkey
------------------+------------------------+-----------------+------------------------
 test2            | (test2.id2,test2.seq2) | test1           | (test1.id1,test1.seq1)
Creating a plpgsql function that takes schema_name as input and runs a dynamic query:
CREATE OR REPLACE FUNCTION test(IN schema_name text, OUT Referencing_Table text, OUT Referencing_Key text, OUT Referenced_Table text)
RETURNS setof record LANGUAGE plpgsql AS
$$
DECLARE
_row record ;
BEGIN
FOR _row IN
( SELECT conrelid::regclass AS ReferencingTable
, a.ReferencingKey
, b.ReferencedTable
, b.ReferencedKey
FROM pg_constraint AS pg
CROSS JOIN LATERAL
( SELECT '(' || string_agg(conrelid::regclass || '.' || fkey, ',' ORDER BY fkey.ORDINALITY) || ')' AS ReferencingKey
FROM pg_get_constraintdef(pg.oid) AS fk
CROSS JOIN LATERAL string_to_table(translate(regexp_substr(fk, '\([^\)]*\)', 1, 1), '() ', ''), ',') WITH ORDINALITY AS fkey
) AS a
CROSS JOIN LATERAL
( SELECT '(' || string_agg(ReferencedTable || '.' || rkey, ',' ORDER BY rkey.ORDINALITY) || ')' AS ReferencedKey
, ReferencedTable
FROM pg_get_constraintdef(pg.oid) AS fk
CROSS JOIN LATERAL split_part((regexp_match(fk, 'REFERENCES\s\w+'))[1], ' ', 2) AS ReferencedTable
CROSS JOIN LATERAL string_to_table(translate(regexp_substr(fk, '\([^\)]*\)', 1, 2), '() ', ''), ',') WITH ORDINALITY AS rkey
GROUP BY ReferencedTable
) AS b
WHERE contype = 'f'
AND connamespace = schema_name ::regnamespace )
LOOP
RETURN QUERY EXECUTE FORMAT (
'SELECT %L :: text AS Referencing_Table, %s :: text AS Referencing_Key, %L :: text AS Referenced_Table
FROM %I LEFT JOIN %I ON %s = %s
WHERE %s IS NULL'
, _row.ReferencingTable
, _row.ReferencingKey
, _row.ReferencedTable
, _row.ReferencingTable
, _row.ReferencedTable
, _row.ReferencedKey
, _row.ReferencingKey
, _row.ReferencedKey
) ;
END LOOP ;
END ;
$$ ;
SELECT * FROM test('public') should return, for every foreign key in schema schema_name, the rows of Referencing_Table that have no corresponding row in Referenced_Table.
I was not able to test the result in dbfiddle because I could not break a foreign key there (that requires superuser privileges).
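For the sample tables above, the dynamic query the function builds and executes should look roughly like this (my reconstruction from the FORMAT call; it returns one row per orphaned row in test2 and nothing when the foreign key is intact):
SELECT 'test2'::text AS Referencing_Table,
       (test2.id2,test2.seq2)::text AS Referencing_Key,
       'test1'::text AS Referenced_Table
FROM test2 LEFT JOIN test1 ON (test1.id1,test1.seq1) = (test2.id2,test2.seq2)
WHERE (test1.id1,test1.seq1) IS NULL;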

How to perform a database-wide full text search in PostgreSQL?

I have a PostgreSQL database with about 500 tables. Each table has a unique ID column named id and a user ID column named user_id. I would like to perform a full-text search of all varchar columns across all of these tables for a particular user. I do this today with ElasticSearch but I'd like to simplify my architecture. I understand that I can add full text search columns to all of the tables with things like stored generated columns and then add indices for fast full text search:
ALTER TABLE pgweb
ADD COLUMN textsearchable_index_col tsvector
GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;
CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);
However, I'm not familiar with how to do cross-table searches efficiently. Maybe a view across all textsearchable_index_col columns? I'd like the result to be something like the table name and id of the matching row. For example:
table_name | id
-------------+-------
table1 | 492
table42 | 20
If it matters, I'm using Ruby on Rails as the client with ActiveRecord. I'm using a managed PostgreSQL 13 database at Digital Ocean, so I won't be able to install custom extensions.
Maybe it is not the answer you are looking for, because I am not sure whether there is a better approach, but first I would try to automate the process.
I would build two dynamic queries: the first one creates the textsearchable_index_col columns (in each table with at least one varchar column) and the other creates an index on those columns (one index per table).
You could add a textsearchable_index_col column for each "character varying" column instead of a single one concatenating all of them, but in this case I will create one textsearchable_index_col column per table, as you propose.
I assume table schema "public" but you can use the real one.
-- Create columns textsearchable_index_col:
SELECT 'ALTER TABLE ' || table_schema || '.' || table_name || E' ADD COLUMN textsearchable_index_col tsvector GENERATED ALWAYS AS (to_tsvector(\'english\', coalesce(' ||
string_agg(column_name, E', \'\') || \' \' || coalesce(') || E', \'\'))) STORED;'
FROM information_schema.columns
WHERE table_schema = 'public' AND data_type IN ('character varying')
GROUP BY table_schema, table_name;
-- Create indexes on textsearchable_index_col columns:
SELECT 'CREATE INDEX ' || table_name || '_textsearch_idx ON ' || table_schema || '.' || table_name || ' USING GIN (textsearchable_index_col);'
FROM information_schema.columns
WHERE table_schema = 'public' AND data_type IN ('character varying')
GROUP BY table_schema, table_name;
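If you are working in psql, the generated DDL can be executed in the same step with \gexec (a psql feature, not part of the original answer); end the generating query without a semicolon and issue \gexec, e.g. for the index statements:
SELECT 'CREATE INDEX ' || table_name || '_textsearch_idx ON ' || table_schema || '.' || table_name || ' USING GIN (textsearchable_index_col)'
FROM information_schema.columns
WHERE table_schema = 'public' AND data_type IN ('character varying')
GROUP BY table_schema, table_name
\gexec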
Then I would use a dynamic query to build a query (using UNION ALL) that searches all of those textsearchable_index_col columns.
You need to replace the question marks with parameters (user_id and the searched text) and take out the last "UNION ALL":
SELECT E'SELECT \'' || table_name || E'\' AS table_name, id FROM ' || table_schema || '.' || table_name || E' WHERE user_id = ? AND textsearchable_index_col' || ' @@ to_tsquery(?) UNION ALL'
FROM information_schema.columns
WHERE table_schema = 'public' AND data_type IN ('character varying')
GROUP BY table_schema, table_name;
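After substituting the parameters, the assembled search query ends up looking roughly like this (the table names table1/table42 and user_id = 42 are made up for illustration):
SELECT 'table1' AS table_name, id FROM public.table1
WHERE user_id = 42 AND textsearchable_index_col @@ to_tsquery('english', 'search & term')
UNION ALL
SELECT 'table42' AS table_name, id FROM public.table42
WHERE user_id = 42 AND textsearchable_index_col @@ to_tsquery('english', 'search & term');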

How to check content loading status on a static database?

We have a static database we constantly update with loader scripts. These loader scripts get current information from third-party sources, clean it, and upload it to the database.
I have already made some SQL scripts to ensure the required schemas and tables exist. Now I'd like to check that each table has the expected row count.
I did something like this:
select case when count(*) = <someNumber>
then 'someSchema.someTable OK'
else 'someSchema.someTable BAD row count' end
from someSchema.someTable;
But writing these kinds of queries for ~300 tables is cumbersome.
Now I was thinking maybe there's a way to have a table like:
create table expected_row_count (
schema_name varchar,
table_name varchar,
row_count bigint
);
And somehow test all listed tables and only output the ones that fail the count check. But I'm kind of lost now... Should I try to write a function? Can a table like this be used to build queries and execute them?
Full credit goes to @a_horse_with_no_name; I'm posting a reply for completeness.
Check row count
First let's create some data to test the query:
create schema if not exists data;
create table if not exists data.test1 (nothing int);
create table if not exists data.test2 (nothing int);
insert into data.test1 (nothing)
(select random() from generate_series(1, 28));
insert into data.test2 (nothing)
(select random() from generate_series(1, 55));
create table if not exists public.expected_row_count (
table_schema varchar not null default '',
table_name varchar not null default '',
row_count bigint not null default 0
);
insert into public.expected_row_count (table_schema, table_name, row_count) values
('data', 'test1', (select count(*) from data.test1)),
('data', 'test2', (select count(*) from data.test2))
;
Now the query to check the data:
select * from (
select
table_schema,
table_name,
(xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
select
table_schema,
table_name,
query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
from information_schema.tables
where table_schema = 'data' --<< change here for the schema you want
) infs ) as r
inner join expected_row_count erc
on r.table_schema = erc.table_schema
and r.table_name = erc.table_name
and r.row_count != erc.row_count
;
The previous query should give an empty result if all counts are OK, and return the
tables with mismatched counts if not. To check it, update the count for some
table in expected_row_count and re-run the query. For example:
update expected_row_count set row_count = 666 where table_name = 'test1';
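With that change, the check query should return something along these lines (28 is the actual count from the sample data, 666 the edited expectation):
 table_schema | table_name | row_count | table_schema | table_name | row_count
--------------+------------+-----------+--------------+------------+-----------
 data         | test1      |        28 | data         | test1      |       666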

Make a query use an index via a computed column (T-SQL)

We have one query which runs very frequently, but because we apply a function to both sides of the column comparison, it does not do an index seek, and it has turned out to be one of our most expensive queries. Is there any way I can turn this expression into a computed column?
SELECT TOP 1 column1
FROM table1
WHERE replace(replace(replace(column2, char(10), ''), char(13), ''), ' ', '') =
replace(replace(replace(@var1, char(10), ''), char(13), ''), ' ', '')
To expand on dfundako's comment, you can create an index on a persisted computed column like this:
CREATE TABLE table1 (
column1 INT,
column2 VARCHAR(50),
column3 AS REPLACE(replace(replace(column2, char(10), ''), char(13), ''), ' ', '') PERSISTED
)
CREATE INDEX index1 ON table1 (column3) INCLUDE (column1)
DECLARE @var1 VARCHAR(50)
SELECT column1 FROM dbo.table1 WHERE column3 = replace(replace(replace(@var1, char(10), ''), char(13), ''), ' ', '')
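Since table1 presumably already exists, the computed column can also be added in place instead of being part of CREATE TABLE; a sketch, assuming the name column3 is free:
ALTER TABLE dbo.table1
ADD column3 AS REPLACE(REPLACE(REPLACE(column2, CHAR(10), ''), CHAR(13), ''), ' ', '') PERSISTED;
CREATE INDEX index1 ON dbo.table1 (column3) INCLUDE (column1);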

Postgresql, select a "fake" row

In Postgres 8.4 or higher, what is the most efficient way to get a row of data populated by defaults without actually creating the row? E.g., as a transaction (pseudocode):
create table "mytable"
(
id serial PRIMARY KEY NOT NULL,
parent_id integer NOT NULL DEFAULT 1,
random_id integer NOT NULL DEFAULT random(),
)
begin transaction
fake_row = insert into mytable (id) values (0) returning *;
delete from mytable where id=0;
return fake_row;
end transaction
Basically I'd expect a query returning a single row where parent_id is 1 and random_id is a random number (or another function's return value), but I don't want this record to persist in the table or affect the primary key sequence serial_id_seq.
My options seem to be using a transaction like above or creating views which are copies of the table with the fake row added but I don't know all the pros and cons of each or whether a better way exists.
I'm looking for an answer that assumes no prior knowledge of the datatypes or default values of any column except id or the number or ordering of the columns. Only the table name will be known and that a record with id 0 should not exist in the table.
In the past I created the fake record 0 as a permanent record but I've come to consider this record a type of pollution (since I typically have to filter it out of future queries).
You can copy the table definition and defaults to a temp table with:
CREATE TEMP TABLE table_name_rt (LIKE table_name INCLUDING DEFAULTS);
And use this temp table to generate dummy rows. Such a table is dropped at the end of the session (or transaction) and is only visible to the current session.
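A minimal usage sketch with the question's mytable (note that INCLUDING DEFAULTS copies the serial default, which still points at mytable's sequence, so supply id explicitly to leave the sequence untouched):
CREATE TEMP TABLE mytable_rt (LIKE mytable INCLUDING DEFAULTS);
-- parent_id and random_id are filled from the copied defaults; mytable itself is never touched
INSERT INTO mytable_rt (id) VALUES (0) RETURNING *;
DROP TABLE mytable_rt; -- optional, it disappears at the end of the session anyway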
You can query the catalog and build a dynamic query
Say we have this table:
create table test10(
id serial primary key,
first_name varchar( 100 ),
last_name varchar( 100 ) default 'Tom',
age int not null default 38,
salary float default 100.22
);
When you run the following query:
SELECT string_agg( txt, ' ' order by id )
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, -9999 || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
you will get this string as a result:
"SELECT -9999 as id , null::character varying as first_name ,
'Tom'::character varying as last_name , 38 as age , 100.22 as salary"
Then execute this query and you will get the "phantom row".
We can build a function that builds and executes the query and returns our row as a result:
CREATE OR REPLACE FUNCTION get_phantom_rec (p_i test10.id%type )
returns test10 as $$
DECLARE
v_sql text;
myrow test10%rowtype;
begin
SELECT string_agg( txt, ' ' order by id )
INTO v_sql
FROM (
select 1 id, 'SELECT ' txt
union all
select 2, p_i || ' as id '
union all
select 3, ', '
|| coalesce( column_default, 'null'||'::'||c.data_type )
|| ' as ' || c.column_name
from information_schema.columns c
where table_schema = 'public'
and table_name = 'test10'
and ordinal_position > 1
) xx
;
EXECUTE v_sql INTO myrow;
RETURN myrow;
END$$ LANGUAGE plpgsql ;
and then this simple query gives you what you want:
select * from get_phantom_rec ( -9999 );
id | first_name | last_name | age | salary
-------+------------+-----------+-----+--------
-9999 | | Tom | 38 | 100.22
I would just select the fake values as literals:
select 1 id, 1 parent_id, 1 user_id
The returned row will be (virtually) indistinguishable from a real row.
To get the values from the catalog:
select
0 as id, -- special case for serial type, just return 0
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'parent_id') as parent_id,
(select column_default::int -- Cast to int, because we know the column is int
from INFORMATION_SCHEMA.COLUMNS
where table_name = 'mytable'
and column_name = 'user_id') as user_id;
Note that you must know the columns and their types, but this is reasonable. If you change the table schema (other than a default value), you would need to tweak the query.
See the above as a SQLFiddle.