Delete row despite missing SELECT right on a column - PostgreSQL

In this example, the second column should not be visible to a member (role) of the group 'user_group', because this column is only required internally to regulate the row-level security. However, records can only be deleted if this column is also visible. How can you get around that?
Options that come to mind would be:
- just make the second column visible (i.e. selectable), which would actually be completely superfluous and I want to hide internally as much as possible
- write a function that is called with elevated rights (SECURITY DEFINER), which I want even less.
Are there any other options?
(and especially with deletions I want to use nice things like 'ON DELETE SET NULL' for foreign keys in other tables, instead of having to unnecessarily program triggers for them)
create table test (
internal_id serial primary key,
user_id int not null default session_user_id(),
info text default null
);
grant
select(internal_id, info),
insert(info),
update(info),
delete
on test to user_group;
create policy test_policy on test for all to public using (
user_id = session_user_id());

RLS just implicitly adds unavoidable WHERE clauses to all queries; it doesn't mess with the roles under which code is evaluated. From the docs:
"Since policy expressions are added to the user's query directly, they will be run with the rights of the user running the overall query. Therefore, users who are using a given policy must be able to access any tables or functions referenced in the expression or they will simply receive a permission denied error when attempting to query the table that has row-level security enabled."
This feature is orthogonal to the granted column permissions. So the public role must be able to view the user_id column, otherwise evaluating user_id = session_user_id() leads to an error. There's really no way around making the column visible.
"completely superfluous and I want to hide internally as much as possible"
The solution for that would be a VIEW that doesn't include the column. It will even be updatable!
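A minimal sketch of that, assuming the table and session_user_id() function from the question (the view name is illustrative). Note that user_id still has to be selectable on the base table, as explained above; the view merely keeps it out of the application's sight:

grant select (user_id) on test to user_group; -- still required so the policy can be evaluated

create view test_visible as
select internal_id, info
from test;

grant
select (internal_id, info),
insert (info),
update (info),
delete
on test_visible to user_group;

Since this is a simple single-table view, it is automatically updatable. One caveat: before PostgreSQL 15, queries through a view are checked with the view owner's rights, so if the owner also owns the table you may want ALTER TABLE test FORCE ROW LEVEL SECURITY to keep the policy in force (from v15 on, the view can instead be created WITH (security_invoker = true)).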

Related

What is the most efficient/recommended way to lock multiple transactions that share a concern in postgres?

I am developing inventory management software. The database uses RLS to let multiple clients use the same database; as such, every table has a "Client ID" column. When inventory is scanned, it updates the numbers for the "active", "reserved" or "incoming" states.
- Active = ready to use
- Reserved = temporarily reserved as an order is potentially incoming (but may be cancelled)
- Incoming = ordered stock
These states are adjusted at various points: sales, goods received, stock count updates, etc. I want to know how best to avoid data inconsistencies. What I want to do is basically tell Postgres "lock this table for client X with rows matching SKU XXX while I run this entire transaction", but I can't figure out how to do it.
It sounds like you need a table of customer/SKU combinations, and then everyone who does the things that need to be serialized must lock the relevant row in that table first. Or if you don't want to permanently enumerate all possible client/SKU combinations, you could do it on the fly:
create table customer_sku_lock (customer_id int, sku text, primary key (customer_id, sku));
Then
BEGIN;
insert into customer_sku_lock values (4, 'DEADBEEF'); -- acquire the lock
delete from customer_sku_lock where customer_id=4 and sku='DEADBEEF'; -- set up to release the lock upon commit
-- do stuff here
COMMIT;
It might seem counterintuitive to do the DELETE right after the INSERT, but it does work to serialize things, and that way the DELETE is less likely to be forgotten. For extra robustness, you could even package the INSERT and DELETE up into one function call, make the function SECURITY DEFINER, and make the lock table accessible only to the function's owner.
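One way to package that up, sketched under the assumption of the customer_sku_lock table above (the function name is illustrative, and the function must be created by the lock table's owner):

create function take_customer_sku_lock(p_customer_id int, p_sku text)
returns void
language sql
security definer
as $$
insert into customer_sku_lock values (p_customer_id, p_sku); -- blocks if a competing transaction holds the lock
delete from customer_sku_lock
where customer_id = p_customer_id and sku = p_sku; -- the row vanishes at commit, releasing the lock
$$;

revoke all on customer_sku_lock from public; -- only the function's owner touches the table directly

Callers then just run select take_customer_sku_lock(4, 'DEADBEEF'); at the start of the transaction that needs to be serialized.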

Storing duplicate data as a column in Postgres?

In a database project, I have a users table which has a computed value avg_service_rating. And there is another table called services with all the services associated with the user and the ratings for each service. Is there a computationally light way in which I can maintain the avg_service_rating value without updating it every time an INSERT is done on the services table? Perhaps something like a generated column, but with a function call instead? Any direct advice or links to resources will be greatly appreciated as well!
CREATE TABLE users (
username VARCHAR PRIMARY KEY,
avg_service_ratings NUMERIC, -- is it possible to store some function call for this column?
...
);
CREATE TABLE service (
username VARCHAR NOT NULL REFERENCES users (username),
service_date DATE NOT NULL,
rating INTEGER,
PRIMARY KEY (username, service_date)
);
If the values should be consistent, a generated column won't fit the bill, since it is only recomputed if the row itself is modified.
I see two solutions:
- Have a trigger on the services table that updates the users table whenever a rating is added or modified. That slows down data modifications, but not your queries.
- Turn users into a view. The original users table would be renamed, and it loses the avg_service_rating column, which is computed on the fly by the view.
To make the illusion perfect, create an INSTEAD OF INSERT OR UPDATE OR DELETE trigger on the view that modifies the underlying table. Then your application does not need to be changed.
With this solution you pay a certain price both on SELECT and on data modifications, but the latter price will be lower, since you don't have to modify two tables (and users might receive fewer modifications than services). An added advantage is that you avoid data duplication.
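A sketch of the view variant, assuming the original table has been renamed to users_base and its stored avg_service_ratings column dropped (names are illustrative); the INSTEAD OF trigger mentioned above is only needed if the application also writes to users:

CREATE VIEW users AS
SELECT b.username,
       (SELECT avg(s.rating)
        FROM service AS s
        WHERE s.username = b.username) AS avg_service_ratings
FROM users_base AS b;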
A generated column would only be useful if the source data is in the same table row.
Otherwise your options are a view (where you could call a function or calculate the value via a subquery), or an AFTER UPDATE OR INSERT trigger on the service table, which updates users.avg_service_ratings. With a trigger, if you get a lot of updates on the service table you'd need to consider possible concurrency issues, but it would mean the figure doesn't need to be calculated every time a row in the users table is accessed.
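A minimal sketch of such a trigger, assuming the tables from the question and PostgreSQL 11 or later (function and trigger names are illustrative):

CREATE FUNCTION refresh_avg_rating() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
UPDATE users
SET avg_service_ratings = (SELECT avg(rating)
                           FROM service
                           WHERE username = NEW.username)
WHERE username = NEW.username;
RETURN NULL; -- the return value is ignored for AFTER row triggers
END;$$;

CREATE TRIGGER service_rating_refresh
AFTER INSERT OR UPDATE OF rating ON service
FOR EACH ROW EXECUTE FUNCTION refresh_avg_rating();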

Postgres Unique Sequences in one table based on owner/foreign key

I am creating a web application that will store all user information in one database using permissions, roles, and FKs to restrict data access. One of the tables in this application tracks work orders created by each user (i.e. the work order table has an FK to the user table).
I want to ensure that each user has their own uninterrupted sequence of 'work order IDs' that are assigned when the work order is scheduled. That is, if user 1 creates his first work order, it will be assigned #1; if user 2 creates his fifth work order, it will be assigned #5.
The work order table has a UUID primary key, so each record is distinguishable, and the user FK has a not-null constraint.
Based on my research so far, it seems like Postgres Sequences would likely be my best answer. I would need to create a sequence for each user, and incorporate it into a trigger to stamp the work order record with the next appropriate ID. However, this seems like it would be very performance intensive, and creating a new sequence for every user would have its own set of challenges.
A second approach could be to create a second table that tracks each user's latest sequence, query it, increment it, and update both the work order table and the number tracking table. However, in this scenario, I think it would be susceptible to race conditions if two users were to convert records at exactly the same time.
I'm unsure what the best way to solve the problem would be. Is there another way that would provide better performance?
Sequences won't work for you, because they are not transactional by design: if an insert with a generated number fails, that number is consumed even after a ROLLBACK.
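A quick demonstration of that behaviour:

CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq'); -- returns 1
ROLLBACK;

SELECT nextval('demo_seq'); -- returns 2; the rolled-back value leaves a gap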
You should create a second table
CREATE TABLE counters (
user_id bigint PRIMARY KEY REFERENCES users ON DELETE CASCADE,
work_order_id bigint NOT NULL DEFAULT 0
);
Then you get the next number with
UPDATE counters
SET work_order_id = work_order_id + 1
WHERE user_id = $1 -- the user creating the work order
RETURNING work_order_id;
That is atomic and safe from race conditions. Just make sure you run that update and the insert in the same database transaction, then they will either both succeed or both fail and be undone.
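For example, assuming a work_orders table with a UUID primary key as described in the question (table and column names here are illustrative):

BEGIN;

UPDATE counters
SET work_order_id = work_order_id + 1
WHERE user_id = 42
RETURNING work_order_id; -- suppose this returns 7

INSERT INTO work_orders (id, user_id, work_order_id)
VALUES (gen_random_uuid(), 42, 7); -- gen_random_uuid() is built in from PostgreSQL 13 on

COMMIT;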
This will serialize inserts into the work orders table per user, but gap-less sequences are always a performance problem.

What is the scope of Postgres policies?

I am trying to wrap my head around row level security in Postgres. Unfortunately the documentation is not very verbose on the matter. My problem is the following:
I have two tables: locations and locations_owners. There is a TRIGGER set on INSERT for locations, which will automatically add a new row to the locations_owners table including the request.jwt.claim.sub variable.
This works all just fine, however when I want to create a policy for DELETE like this:
CREATE POLICY location_delete ON eventzimmer.locations FOR DELETE TO organizer USING(
(SELECT EXISTS (SELECT name FROM protected.locations_owners AS owners WHERE owners.name = name AND owners.sub = (SELECT current_setting('request.jwt.claim.sub', true))))
);
It will always evaluate to true, no matter the actual content. I know that I can call a custom procedure with SELECT here; however, I ended up with the following questions:
- What is the scope of a policy? Can I access tables? Can I access procedures? The documentation says "any SQL conditional expression", so SELECT EXISTS should be fine.
- How are the column names of the rows mapped to the policy? The examples just magically use the column names (which I adopted by using the name variable), but I have not found any documentation about what this actually does.
- What is the magic user_name variable? Where does it come from? I believe it is the current role which is executing the query, but how can I know?
- Why is there no WITH CHECK expression available for DELETE? If I understand correctly, WITH CHECK will fail any row with an invalid constraint, which is the behaviour I would prefer (because otherwise PostgREST will always return 204).
I am a little confused by the astonishing lack of information in the (otherwise) very good documentation of PostgreSQL. Where is this information? How can I find it?
For the sake of completeness I have also attached the column definitions below:
CREATE TABLE eventzimmer.locations (
name varchar PRIMARY KEY NOT NULL,
latitude float NOT NULL,
longitude float NOT NULL
);
CREATE TABLE IF NOT EXISTS protected.locations_owners (
name varchar NOT NULL REFERENCES eventzimmer.locations(name) ON DELETE CASCADE,
sub varchar NOT NULL
);
Many of the questions will become clear once you understand how row level security is implemented: the conditions in the policies will automatically be added to the query, just as if you added another WHERE condition.
Use EXPLAIN to see the query plan, and you will see the policy's conditions in there.
So you can use any columns from the table on which the policy is defined.
Essentially, you can use anything in a policy definition that you could use in a WHERE condition: function calls, subqueries and so on.
You can also qualify the column name with the table name if that is required for disambiguation. That is exactly what is missing in the policy from your example: the unqualified name is interpreted as owners.name, so the test always succeeds. To fix the policy, use locations.name instead of name.
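With that fix (and simplifying away the nested SELECT EXISTS), the policy from the question would look like this:

CREATE POLICY location_delete ON eventzimmer.locations FOR DELETE TO organizer USING (
    EXISTS (SELECT 1
            FROM protected.locations_owners AS owners
            WHERE owners.name = locations.name
              AND owners.sub = current_setting('request.jwt.claim.sub', true))
);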
There is no magic user_name variable, and I don't know where you get that from. There is, however, the current_user function, which is always available and can of course also be used in a policy definition.
WITH CHECK is a condition that the new row added by INSERT or UPDATE must fulfill. Since DELETE doesn't add any data, WITH CHECK doesn't apply to it.

Way to migrate a create table with sequence from postgres to DB2

I need to migrate a DDL from Postgres to DB2, and I need it to work the same as it does in Postgres. There is a table that generates values from a sequence, but the values can also be given explicitly.
Postgres
create sequence hist_id_seq;
create table benchmarksql.history (
hist_id integer not null default nextval('hist_id_seq') primary key,
h_c_id integer,
h_c_d_id integer,
h_c_w_id integer,
h_d_id integer,
h_w_id integer,
h_date timestamp,
h_amount decimal(6,2),
h_data varchar(24)
);
(Note the sequence call in the hist_id column's default, which defines the value of the primary key.)
The business logic sometimes inserts into the table explicitly providing an ID, and in other cases it leaves the database to choose the number.
If I change this in DB2 to GENERATED ALWAYS, it will throw errors because some values are provided explicitly. On the other hand, if I create the table with GENERATED BY DEFAULT, DB2 will throw an error (SQL0803N) when an explicitly provided value collides, because the "internal sequence" does not take into account the already inserted values, and it does not retry with a next value.
And I do not want to restart the sequence each time a provided ID is inserted.
This is the problem in BenchmarkSQL when trying to port it to DB2: https://sourceforge.net/projects/benchmarksql/ (File sqlTableCreates)
How can I implement the same database logic in DB2 as it does in Postgres (and apparently in Oracle)?
You're operating under a misconception: that sources external to the db get to dictate its internal keys. Ideally/conceptually, autogenerated ids never need to be seen outside of the db, as there should be unique natural keys for export or reporting. Still, there are times when applications need to manage some ids, often when setting up related entities (e.g., JPA seems to want to work this way).
However, if you add an id value that you generated from a different source, the db won't be able to manage it. How could it, efficiently? Attempting to do so would have to do one of the following:
- Be unsafe in the face of multiple clients (attempt to add duplicate keys)
- Serialize access to the table (for a potentially slow query, too)
(This usually shows up when people attempt something like SELECT MAX(id) + 1, which would require locking the entire table for thread safety, likely blocking even statements that don't touch that column. Trying to find the first unused id, in order to fill gaps, gets even more complicated and problematic.)
Neither is ideal, so it's best not to have the problem in the first place. This is usually done by having id columns be autogenerated, but (as pointed out earlier) there are situations where we may need to know what the id will be before we insert the row into the table. Fortunately, there's a standard SQL object for this: SEQUENCE. This provides a db-managed, thread-safe, fast way to get ids. It appears that in PostgreSQL you can use sequences in the DEFAULT clause for a column, but DB2 doesn't allow it. If you don't want to specify an id every time (it should be autogenerated some of the time), you'll need another way; this is the perfect time to use a BEFORE INSERT trigger:
CREATE TRIGGER Add_Generated_Id NO CASCADE BEFORE INSERT ON benchmarksql.history
REFERENCING NEW AS Incoming_Entity
FOR EACH ROW
WHEN (Incoming_Entity.hist_id IS NULL)
SET Incoming_Entity.hist_id = NEXT VALUE FOR hist_id_seq
(something like this - not tested. You didn't specify where in the project this would belong)
So, if you then add a row with something like:
INSERT INTO benchmarksql.history (hist_id, h_data) VALUES(null, 'a')
or
INSERT INTO benchmarksql.history (h_data) VALUES('a')
an id will be generated and attached automatically. Note that ALL ids added to the table must come from the given sequence (as @mustaccio pointed out, this appears to be true even in PostgreSQL), or any UNIQUE CONSTRAINT on the column will start throwing duplicate-key errors. So any time your application needs an id before inserting a row in the table, you'll need some form of
SELECT NEXT VALUE FOR hist_id_seq
FROM sysibm.sysdummy1
... and that's it, pretty much. This is completely thread and concurrency safe, will not maintain/require long-term locks, nor require serialized access to the table.