Force index to be updated - postgresql

I'm handling ownerships as "Project -> Ownership -> User" relations, and the following function gets the project owners' names as text:
CREATE FUNCTION owners_as_text(projects) RETURNS TEXT AS $$
SELECT trim(both concat_ws(' ', screen_name, first_name, last_name)) FROM users
INNER JOIN ownerships ON users.id = ownerships.user_id
WHERE deleted_at IS NULL AND ownable_id = $1.id AND ownable_type = 'Project'
$$ LANGUAGE SQL IMMUTABLE SET search_path = public, pg_temp;
This is then used to build an index which ignores accents:
CREATE INDEX index_projects_on_owners_as_text
ON projects
USING GIN(immutable_unaccent(owners_as_text(projects)) gin_trgm_ops)
When the project is updated, this index is updated as well. However, when e.g. an owner name changes, this index won't be touched, right?
How can I force the index to be updated on a regular basis to catch up in that case?
(REINDEX is not an option since it's locking and will cause deadlocks should write actions happen at the same time.)

The premise is mistaken, because the index is built on a function that is in fact not immutable. From the documentation:
IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list.
The problems you are now facing arise from that incorrect assumption.

Since you lied to PostgreSQL by saying that the function was IMMUTABLE when it is actually STABLE, it is unsurprising that the index becomes corrupted when the database changes.
The solution is not to create such an index.
It would be better not to use that function, but a view that has the expression you want to search for as a column. Then a query that uses the view can be optimized, and an index on immutable_unaccent(btrim(concat_ws(' ', screen_name, first_name, last_name))) can be used.
It is probably OK to cheat about unaccent's volatility...
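A minimal sketch of that approach, reusing the tables and the immutable_unaccent wrapper from the question (the view and index names are made up):
CREATE VIEW project_owners AS
SELECT o.ownable_id AS project_id
     , btrim(concat_ws(' ', u.screen_name, u.first_name, u.last_name)) AS owner_as_text
FROM   users u
JOIN   ownerships o ON o.user_id = u.id AND o.ownable_type = 'Project'
WHERE  u.deleted_at IS NULL;

-- The index now lives on users, where the underlying data actually changes:
CREATE INDEX users_owner_name_trgm_idx ON users
USING gin (immutable_unaccent(btrim(concat_ws(' ', screen_name, first_name, last_name))) gin_trgm_ops);

-- A search through the view can then use the index, because the predicate
-- expands to exactly the indexed expression:
SELECT project_id
FROM   project_owners
WHERE  immutable_unaccent(owner_as_text) LIKE '%' || immutable_unaccent('rene') || '%';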

Related

can't btree index function that returns text

I've created a function to index a certain value from another table.
Basically, I'm querying these activities, often filtered on the context of the activityplans table.
This is my function:
CREATE OR REPLACE FUNCTION get_activity_context(parentact uuid, parentplan uuid)
  RETURNS TEXT
  LANGUAGE sql IMMUTABLE AS
$$
SELECT CASE
          WHEN $2 IS NOT NULL THEN
             (SELECT lower((("context")::json ->> 'href')::text)
              FROM   activityplans ap
              WHERE  ap.key = $2)
          WHEN $1 IS NOT NULL THEN
             (SELECT lower((("context")::json ->> 'href')::text)
              FROM   activityplans ap, activities act
              WHERE  act."parentPlan" = ap.key
              AND    act.key = $1)
       END
$$;
The function works when I use it, for example like select get_activity_context("parentActivity", "parentPlan") from activities limit 10;
But when I try to create an index:
create index on activities (get_activity_context("parentActivity", "parentPlan"));
I get this:
ERROR: could not read block 0 in file "base/16402/60840": read only 0 of 8192 bytes
CONTEXT: SQL function "get_activity_context" during startup
SQL state: XX001
Googling this error only brings me to database data issues etc., but I don't think that is the case here. My guess is that something is wrong with the function, but I can't seem to figure out what.
I don't know which relation 60840 is in your database, but it sure has a problem. Find out with
SELECT relname,relnamespace::regnamespace
FROM pg_class
WHERE relfilenode = 60840;
Anyway, that index will never work, because the function is not IMMUTABLE, no matter how you declared it. It may return a different result tomorrow. This would lead to data corruption.
An index on one table can never refer to data from another table.
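If you need the looked-up value to be indexable from activities, the usual workaround (a sketch, not part of the original answer; the column name context_href is made up, only the parentPlan lookup is handled for brevity, and EXECUTE FUNCTION needs Postgres 11+, EXECUTE PROCEDURE on older versions) is to store the value in activities itself and keep it current with a trigger:
ALTER TABLE activities ADD COLUMN context_href text;

CREATE FUNCTION trg_set_context_href() RETURNS trigger
  LANGUAGE plpgsql AS
$$BEGIN
   SELECT lower((ap."context")::json ->> 'href')
   INTO   NEW.context_href
   FROM   activityplans ap
   WHERE  ap.key = NEW."parentPlan";
   RETURN NEW;
END;$$;

CREATE TRIGGER set_context_href
BEFORE INSERT OR UPDATE OF "parentPlan" ON activities
FOR EACH ROW EXECUTE FUNCTION trg_set_context_href();

CREATE INDEX ON activities (context_href);
Note that a later change to activityplans.context still does not propagate automatically; that would need a second trigger on activityplans.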
My first thought:
Usually, indexes are created like this:
CREATE INDEX id_column_idx
ON public.naleznosc USING btree
(id_column)
TABLESPACE pg_default;
but you are trying to create them this way:
CREATE INDEX .......... on activities ...........(get_activity_context("parentActivity", "parentPlan")) ...........;
The dots show the places where you left some things out :)

Postgresql row level security does not throw errors

I have a Postgresql DB that I want to enable the Row-Level-Security on one of its tables.
Everything is working fine except for one thing: I want an error to be thrown when a user tries to perform an update on a record that he doesn't have privileges on.
According to the docs:
check_expression: Any SQL conditional expression (returning boolean). The conditional expression cannot contain any aggregate or window functions. This expression will be used in INSERT and UPDATE queries against the table if row level security is enabled. Only rows for which the expression evaluates to true will be allowed. An error will be thrown if the expression evaluates to false or null for any of the records inserted or any of the records that result from the update. Note that the check_expression is evaluated against the proposed new contents of the row, not the original contents.
So I tried the following:
CREATE POLICY update_policy ON my_table FOR UPDATE TO editors
USING (has_edit_privilege(user_name))
WITH CHECK (has_edit_privilege(user_name));
I have also another policy for SELECT
CREATE POLICY select_policy ON my_table FOR SELECT TO editors
USING (has_select_privilege(user_name));
According to the docs, this should create a policy that prevents anyone in the editors role from performing an update on any record of my_table, and that throws an error when an update is attempted. The update is indeed prevented, but no error is thrown.
What's my problem?
Please help.
First, let me explain how row level security works when reading from the table:
You would not even need to define a policy: if there is no policy for a user on a table with row level security enabled, the default is that the user can see nothing.
No error will be thrown when reading from a table.
If you want an error to be thrown, you could write a function
CREATE FUNCTION test_row(my_table) RETURNS boolean
   LANGUAGE plpgsql COST 10000 AS
$$BEGIN
   IF /* the user is not allowed */
   THEN
      RAISE EXCEPTION ...;
   END IF;
   RETURN TRUE;
END;$$;
Then use that function in your policy:
CREATE POLICY update_policy ON my_table FOR UPDATE TO editors
USING (test_row(my_table));
I used COST 10000 for the function to tell PostgreSQL to test that condition after all other conditions, if possible.
This is not a fool-proof technique, but it will work for simple queries. What could happen in the general case is that some conditions get checked after the condition from the policy, and this could lead to errors with rows that wouldn't even be returned from the query.
But I think it is the best you can get when abusing the concept of row level security.
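For illustration, a filled-in version of that sketch, reusing the has_edit_privilege function and user_name column from the question (the error message is made up):
CREATE FUNCTION test_row(r my_table) RETURNS boolean
   LANGUAGE plpgsql COST 10000 AS
$$BEGIN
   IF NOT has_edit_privilege(r.user_name) THEN
      RAISE EXCEPTION 'current user has no edit privilege for this row';
   END IF;
   RETURN TRUE;
END;$$;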
Now let me explain about writing to the table:
Any attempt to write a row to the table that does not satisfy the CHECK clause will cause an error as documented.
Now let's put it together:
Assuming that you define the policy like in your question:
Any INSERT will cause an error.
Any SELECT will return an empty result.
Any UPDATE or DELETE will be successful, but affect no row. This is because these operations have to read (scan) the table before they modify data, and that scan will return no rows like in the SELECT case. Since no rows are affected by the data modification, no error is thrown.
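A compact sketch of those three outcomes, assuming my_table has a single text column data (made up for illustration) and the session user is a member of editors for whom both privilege functions return false:
INSERT INTO my_table VALUES ('x');  -- ERROR: no INSERT policy exists,
                                    -- so the default deny raises an error
SELECT count(*) FROM my_table;      -- 0: rows are filtered out silently
UPDATE my_table SET data = 'y';     -- UPDATE 0: the scan sees no rows,
                                    -- so WITH CHECK is never evaluated
DELETE FROM my_table;               -- DELETE 0, for the same reason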

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there are myriad places I had in mind for storing these queries, either a variable of some sort or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable and then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement, which is the approach employed here (see right pane). But Postgres won't let me declare a variable outside a function, and I've been scratching my head over how this would fit together, whether that's even the direction to follow, or whether there's something simpler.
Would you have any pointers? I'm currently trying something like this, inspired by this other question, but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
  RETURNS void AS
$$
DECLARE
   dyn_sql text;
BEGIN
   dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
              FROM information_schema.columns
              WHERE table_name = 'salesdata';
   execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond to tables the user has permission to read, and therefore it is safe to allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE (
      SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
                  , string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
                  , pg_typeof(_tbl)::text)
      FROM   pg_attribute
      WHERE  attrelid = pg_typeof(_tbl)::text::regclass
      AND    NOT attisdropped  -- no dropped (dead) columns
      AND    attnum > 0        -- no system columns
      );
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
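A minimal illustration of the effect, using a throwaway table (names made up):
CREATE TABLE tbl (id int, name varchar(10));

SELECT pg_typeof(min(name)) FROM tbl;  -- text, not character varying

-- Casting the whole row back to tbl restores the declared column types:
SELECT pg_typeof((t::tbl).name)
FROM  (SELECT min(id) AS id, min(name) AS name FROM tbl) t;  -- character varying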
This might fail for some aggregate functions. sum() returns numeric for sum(bigint_column) to accommodate a sum overflowing the base data type. Casting back to bigint might fail ...
@Erwin Brandstetter, many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things: min and max values, counts, counts of nulls, means etc., so a bunch of queries have to be run for each column, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types; I was sort of expecting this to throw a spanner in the works at some point. My main concern was with how to automate the query generation and, above all, its execution.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get an error at the SELECT (t::tbl):
ERROR: type "tbl" does not exist
By the way, what is the (t::abc) notation referred to as? In Python this would be a list slice, but that's probably not the case in PL/pgSQL.

Postgres Rules Preventing CTE Queries

Using Postgres 9.3:
I am attempting to automatically populate a table when an insert is performed on another table. This seems like a good use for rules, but after adding the rule to the first table, I am no longer able to perform inserts into the second table using the writable CTE. Here is an example:
CREATE TABLE foo (
id INT PRIMARY KEY
);
CREATE TABLE bar (
id INT PRIMARY KEY REFERENCES foo
);
CREATE RULE insertFoo AS ON INSERT TO foo DO INSERT INTO bar VALUES (NEW.id);
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
INSERT INTO foo SELECT * FROM a
When this is run, I get the error "ERROR: WITH cannot be used in a query that is rewritten by rules into multiple queries".
I have searched for that error string, but am only able to find links to the source code. I know that I can perform the above using row-level triggers instead, but it seems like I should be able to do this at the statement level. Why can I not use the writable CTE, when queries like this can (in this case) be easily re-written as:
INSERT INTO foo SELECT * FROM (VALUES (1), (2)) a
Does anyone know of another way that would accomplish what I am attempting to do, other than 1) using rules, which prevents the use of "with" queries, or 2) using row-level triggers?
TL;DR: use triggers, not rules.
Generally speaking, prefer triggers over rules, unless rules are absolutely necessary. (Which, in practice, they never are.)
Using rules introduces heaps of problems which will needlessly complicate your life down the road. You've run into one here. Another (major) one is, for instance, that the number of affected rows will correspond to that of the very last query -- if you're relying on FOUND somewhere and your query is incorrectly reporting that no rows were affected by a query, you'll be in for painful bugs.
Moreover, there's occasional talk of deprecating Postgres rules outright:
http://postgresql.nabble.com/Deprecating-RULES-td5727689.html
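For the tables in the question, the trigger-based replacement is short. A sketch (the function name is made up; on the question's Postgres 9.3, triggers are attached with EXECUTE PROCEDURE):
DROP RULE insertFoo ON foo;

CREATE FUNCTION foo_to_bar() RETURNS trigger
  LANGUAGE plpgsql AS
$$BEGIN
   INSERT INTO bar VALUES (NEW.id);
   RETURN NEW;
END;$$;

CREATE TRIGGER insert_foo
AFTER INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_to_bar();

-- The writable CTE from the question now works unchanged:
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
INSERT INTO foo SELECT * FROM a;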
As in the other answer, I definitely recommend using INSTEAD OF triggers rather than RULEs.
However, if for some reason you don't want to change existing VIEW RULEs and still want to use WITH, you can do so by wrapping the VIEW in a stored procedure:
create function insert_foo(int) returns void as $$
insert into foo values ($1)
$$ language sql;
WITH a AS (SELECT * FROM (VALUES (1), (2)) b)
SELECT insert_foo(a.column1) from a;
This could be useful when using some legacy db through some system that wraps statements with CTEs.

Exclusion constraint on a bitstring column with bitwise AND operator

I read about Exclusion Constraints in PostgreSQL but can't seem to find a way to use bitwise operators on bitstrings.
I have two columns (name text, value bit(8)). And I want to create a constraint that basically says this:
ADD CONSTRAINT route_method_overlap
EXCLUDE USING gist(name WITH =, value WITH &)
But this doesn't work since:
operator &(bit,bit) is not a member of operator family "gist_bit_ops"
I assume this is because the bit_ops operator & doesn't return boolean. But is there a way to do what I'm trying to do? Is there a way to coerce operator & to cast its return value as a boolean?
Currently using Postgres 9.1.4 with the "btree_gist" extension installed, all from the Ubuntu 12.04 repos. But the version doesn't matter. If there's fixes/updates upstream, I can install from the repos. I'm still in the design phase.
You installed the extension btree_gist. Without it, the example would already fail at name WITH =.
CREATE EXTENSION btree_gist;
The operator classes installed by btree_gist cover many operators. Unfortunately, the & operator is not among them, obviously because it does not return a boolean, which is required for an operator to qualify.
Alternative solution
I would use a combination of a b-tree multi-column index (for speed) and a trigger instead. Consider this demo, tested on PostgreSQL 9.1:
CREATE TABLE t (
name text
, value bit(8)
);
INSERT INTO t VALUES ('a', B'10101010');
CREATE INDEX t_name_value_idx ON t (name, value);
CREATE OR REPLACE FUNCTION trg_t_name_value_inversion_prohibited()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF EXISTS (
      SELECT FROM t
      WHERE  (name, value) = (NEW.name, ~ NEW.value)  -- example: exclude inversion
      ) THEN
      RAISE EXCEPTION 'Your text here!';
   END IF;
   RETURN NEW;
END
$func$;

CREATE TRIGGER insup_bef_t_name_value_inversion_prohibited
BEFORE INSERT OR UPDATE OF name, value  -- only involved columns relevant!
ON t
FOR EACH ROW
EXECUTE FUNCTION trg_t_name_value_inversion_prohibited();
INSERT INTO t VALUES ('a', ~ B'10101010'); -- fails with your error msg.
In Postgres 10 or older use instead:
...
EXECUTE PROCEDURE trg_t_name_value_inversion_prohibited();
See:
Trigger function does not exist, but I am pretty sure it does
~ is the inversion operator.
The extension btree_gist is not required in this scenario.
I restricted the trigger to INSERT OR UPDATE OF relevant columns for efficiency.
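To illustrate the inversion operator mentioned above:
SELECT ~ B'10101010';  -- 01010101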
A check constraint wouldn't work. I quote the manual on CREATE TABLE:
Currently, CHECK expressions cannot contain subqueries nor refer to variables other than columns of the current row.
Bold emphasis mine.
Should perform very well, actually better than the exclusion constraint, because maintenance of a b-tree index is cheaper than that of a GiST index. And the look-up with basic = operators should be faster than hypothetical look-ups with the & operator.
This solution is not as bullet-proof as an exclusion constraint, because triggers can more easily be circumvented - in a subsequent trigger on the same event for instance, or if the trigger is disabled temporarily. Be prepared to run extra checks on the whole table if such conditions apply.
More complex condition
The example trigger only catches the inversion of value. As you clarified in your comment, you actually need a condition like this instead:
IF EXISTS (
   SELECT FROM t
   WHERE  name = NEW.name
   AND    value & NEW.value <> B'00000000'::bit(8)
   ) THEN
This condition is slightly more expensive, but can still use an index. The multi-column index from above would work - if you have use for it anyway. Or, more efficiently, a simple index on name:
CREATE INDEX t_name_idx ON t (name);
You commented that there can only be a maximum of 8 distinct rows per name, fewer in practice. So this should still be fast.
Ultimate INSERT performance
If INSERT performance is paramount, especially if many attempted INSERTs fail the condition, you could do more: create a materialized view that pre-aggregates value per name:
CREATE TABLE mv_t AS
SELECT name, bit_or(value) AS value
FROM t
GROUP BY 1
ORDER BY 1;
name is guaranteed to be unique here. I'd use a PRIMARY KEY on name to provide the index we're after:
ALTER TABLE mv_t SET (FILLFACTOR=90);
ALTER TABLE mv_t
ADD CONSTRAINT mv_t_pkey PRIMARY KEY(name);
Then your INSERT could look like this:
WITH i(n,v) AS (SELECT 'a'::text, B'10101010'::bit(8))
INSERT INTO t (name, value)
SELECT n, v
FROM   i
LEFT   JOIN mv_t m ON m.name = i.n
                  AND m.value & i.v <> B'00000000'::bit(8)
WHERE  m.name IS NULL;  -- anti-join: alternative syntax for NOT EXISTS (...)
The fillfactor is only useful if your table gets a lot of updates.
Update rows in the materialized view in a TRIGGER AFTER INSERT OR UPDATE OF name, value OR DELETE to keep it current. Cost and gain of additional objects have to be weighed carefully. Largely depends on your typical load.
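A sketch of such a maintenance trigger for the INSERT case (names are made up; ON CONFLICT requires Postgres 9.5+, and the UPDATE / DELETE cases, which need to re-aggregate the affected name, are omitted):
CREATE FUNCTION trg_t_maintain_mv() RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO mv_t (name, value)
   VALUES (NEW.name, NEW.value)
   ON CONFLICT (name)                                  -- PK on name is the arbiter
   DO UPDATE SET value = mv_t.value | EXCLUDED.value;  -- bitwise OR in the new bits
   RETURN NEW;
END
$func$;

CREATE TRIGGER aft_ins_t_maintain_mv
AFTER INSERT ON t
FOR EACH ROW EXECUTE FUNCTION trg_t_maintain_mv();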