Is there a way to give write permissions only in a transaction in Postgres? - postgresql

I work with a software that is used by a lot of different clients in several countries, with different needs, rules and constraints on their data.
When I make a change to the database's structure, I have a tool to test it on every client's database, obviously with read-only rights. This means that the best way to test a query like UPDATE table SET x = y WHERE condition
is to call the "read-only part" SELECT x FROM table WHERE condition.
It works but it's not ideal, as sometimes it is writing data that causes problems (mostly deadlocks or timeouts), meaning I can't see the problem until a client suffers from it.
I'm wondering if there is a way to grant write permissions in Postgres, but only when inside a transaction, and force a rollback on every transaction. This way, changes could be tested accurately on real data and still prevent any dev from editing it.
Any ideas?
Edit: the volumes are too large to consider cloning data for every dev who needs to run a query

This sounds similar to creating an audit table to record information about transactions. I would consider using a trigger to write a copy of the data to a "rollback" table/row and then copy the "rollback" table/row back on completion of the update.

Related

Is there a way to show everything that was changed in a PostgreSQL database during a transaction?

I often have to execute complex sql scripts in a single transaction on a large PostgreSQL database and I would like to verify everything that was changed during the transaction.
Verifying each single entry on each table "by hand" would take ages.
Dumping the database before and after the script to plain sql and using diff on the dumps isn't really an option since each dump would be about 50G of data.
Is there a way to show all the data that was added, deleted or modified during a single transaction?
Dude, What are you looking for is the most searchable thing on the internet when it comes to capturing Database changes. It is a kind of version control we can say.
But as long as I know, sadly there are no in-built approaches are available in PostgreSQL or MySql. But you can overcome it by setting/adding some triggers for your most usable operations.
You can create some backup schemas, and tables to capture your changes that are changed(updated), created, or deleted.
In this way you can achieve what you want. I know this process is fully manual, But really effective.
If you need to analyze the script's behaviour only sporadically, then the easiest approach would be to change server configuration parameter log_min_duration_statement to 0 and then back to any value it had before the analysis. Then all of the script activity will be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate this amount of data, or for systems in which you don't want sensitive client data to be written to a plain-text log file.

Optimize the trigger to add audit log

I have a local database which is the production database, on which all operations are being done real time. I am storing the log on each action in an audit log table in another database via trigger. It basically checks if any change is made in any of the row's column it will remove that row and add it AGAIN (which is not a good way I think as it should simply update it but due to some reasons I need to delete and add it again).
There are some tables on which operations are being done rapidly like 100s of rows are being added in database. This is slowing the process of saving the data into audit log table. Now if trigger has to like delete 100 rows and add 100 again it will affect the performance obviously and if number of rows increases it will reduce the performance more.
What should be the best practice to tackle this, I have been looking into Read Replica and Foreign Data Wrapper but as for Read Replica it's only Readable and not writable for PostgreSQL and I don't really get to know how Foreign Data Wrapper gonna help me as this was suggested by one of my colleague.
Hope someone can guide me in right direction.
A log is append-only by definition. Loggers should never be modifying or removing existing entries.
Audit logs are no different. Audit triggers should INSERT an entry for each change (however you want to define "change"). They should never UPDATE or DELETE anything*.
The change and the corresponding log entry should be written to the same database within the same transaction, to ensure atomicity/consistency; logging directly to a remote database will always leave you with a window where the log is committed but the change is not (or vice versa).
If you need to aggregate these log entries and push them to a different database, you should do it from an external process, not within the trigger itself. If you need this to happen in real time, you can inform the process of new changes via a notification channel.
* In fact, you should revoke UPDATE/DELETE privileges on the audit table from the user inserting the logs. Furthermore, the trigger should ideally be a SECURITY DEFINER function owned by a privileged user with INSERT rights on the log table. The user connecting to the database should not be given permission to write to the log table directly.
This ensures that if your client application is compromised (whether due to a malfunction, or a malicious user e.g. exploiting an SQL injection vulnerability), then your audit log retains a complete and accurate record of everything it changed.

What's the difference between issuing a query with or without a "begin" and "commit" command in PostgreSQL?

As title say, it is possible to issue a query on psql with a "begin", query, and "commit".
What I want to know is what happens if I don't use a "begin" command?
Some database engine will allow you to execute modifications (INSERT, UPDATE, DELETE) without an open transaction. It's basically assumed that you have an instant BEGIN / COMMIT around each of your instructions, which is a bad practice in case something goes wrong in a batch of many instructions.
You can still make a SELECT, but no INSERT, UPDATE, DELETE without a BEGIN to enforces the good practice. That way, if something goes wrong, a ROLLBACK is instantly executed, canceling all your modifications as if they never existed.
Using a transaction around a batch of various SELECT will guarantee that the data you get for each SELECT matches the same version of the database at the instant you open the transaction depending on your ISOLATION level.
Please read this for more information :
http://www.postgresql.org/docs/9.5/static/sql-start-transaction.html
and
http://www.postgresql.org/docs/9.5/static/tutorial-transactions.html
If you don't use BEGIN/COMMIT, it's the same as wrapping each individual query in a BEGIN/COMMIT block. You can use BEGIN/COMMIT to group multiple queries into a single transaction. A few reasons you might want to do so include
Updating multiple tables at the same time. For instance, usually when you delete a record you also want to delete other rows that reference it. If you do this in the same transaction, nothing will ever be able to reference a row that's already been deleted.
You want to be able to revert some changes if something goes wrong later. Suppose you're writing some user inputted data to multiple tables. At some point you realize that some of it isn't formatted properly. You probably wouldn't want to insert any of it, so you should wrap the entire operation in a transaction.
If you want to ensure the data you're updating hasn't been updated while you're writing to it. Suppose I'm adding $10 to a bank account from two separate connections. I want to add $20 in total - I don't want one of the UPDATEs to clobber the other.
Postgres gives you the first two of these by default. The last one would require a higher transaction isolation level, and makes your query run the risk of raising a serialization error. Transaction isolation levels are a fairly complicated topic, so if you want more info on them the best place to go is the documentation.

Database Content Versioning

I am interested in keeping a running history of every change which has happened on some tables in my database, thus being able to reconstruct historical states of the database for analysis purposes.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it? Is there a better way?
Any input is appreciated!
UPD
I have marked Denis' response as the answer, because he did in fact answer whether MVCC is what I want which was the question. However, the strategy I have settled on is detailed below in case anyone finds it useful:
The Postgres feature that does what I want: online backup/point in time recovery.
http://www.postgresql.org/docs/8.1/static/backup-online.html explains how to use this feature but essentially you can set this "write ahead log" to archive mode, take a snapshot of the database (say, before it goes live), then continually archive the WAL. You can then use log replay to recall the state of the database at any time, with the side benefit of having a warm standby if you choose (by continually replaying the new WALs on your standby server).
Perhaps this method is not as elegant as other ways of keeping a history, since you need to actually build the database for every point in time you wish to query, however it looks extremely easy to set up and loses zero information. That means when I have the time to improve my handling of historical data, I'll have everything and will therefore be able to transform my clunky system to a more elegant system.
One key fact that makes this so perfect is that my "valid time" is the same as my "transaction time" for the specific application- if this were not the case I would only be capturing "transaction time".
Before I found out about the WAL, I was considering just taking daily snapshots or something but the large size requirement and data loss involved did not sit well with me.
For a quick way to get up and running without compromising my data retention from the outset, this seems like the perfect solution.
Time Travel
PostgreSQL used to have just this feature, and called it "Time Travel". See the old documentation.
There's somewhat similar functionality in the spi contrib module that you might want to check out.
Composite type audit trigger
What I usually do instead is to use triggers to log changes along with timestamps to archival tables, and query against those. If the table structure isn't going to change you can use something like:
CREATE TABLE sometable_history(
command_tag text not null check (command_tag IN ('INSERT','DELETE','UPDATE','TRUNCATE')),
new_content sometable,
change_time timestamp with time zone
);
and your versioning trigger can just insert into sometable_history(TG_OP,NEW,current_timestamp) (with a different CASE for DELETE, where NEW is not defined).
hstore audit trigger
That gets painful if the schema changes to add new NOT NULL columns though. If you expect to do anything like that consider using a hstore to archive the columns, instead of a composite type. I've already added an implementation of that on the PostgreSQL wiki already.
PITR
If you want to avoid impact on your master database (growing tables, etc), you can alternately use continuous archiving and point-in-time recovery to log WAL files that can, using a recovery.conf, be replayed to any moment in time. Note that WAL files are big and they include not only the tuples you changed, but VACUUM activity and other details. You'll want to run them through clearxlogtail since they can have garbage data on the end if they're partial segments from an archive timeout, then you'll want to compress them heavily for long term storage.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it?
Not really. There are tools to see dead rows, because auto-vacuuming is so that will eventually be reclaimed.
Is there a better way?
If I get your question right, you're looking into logging slowly changing dimensions.
You might find this recent related thread interesting:
Temporal database design, with a twist (live vs draft rows)
I'm not aware of any tools/products that are built for that purpose.
While this may not be exactly what you're asking for, you can configure Postgresql to log ddl changes. Setting the log_line_prefix parameter (try including %d, %m, and %u) and setting the log_statement parameter to ddl should give you a reasonable history of who made what ddl changes and when.
Having said that, I don't believe logging ddl to be foolproof. For example, consider a situation where:
Multiple schemas have a table with the same name,
one of the tables is altered, and
the ddl doesn't fully qualify the table name (relying on the search path to get it right),
then it may not be possible to know from the log which table was actually altered.
Another option might be to log ddl as above but then have a watcher program perform a pg_dump of the database schema whenever a ddl entry get's logged. You could even compare the new dump with the previous dump and extract just the objects that were changed.

Database last updated?

I'm working with SQL 2000 and I need to determine which of these databases are actually being used.
Is there a SQL script I can used to tell me the last time a database was updated? Read? Etc?
I Googled it, but came up empty.
Edit: the following targets issue of finding, post-facto, the last access date. With regards to figuring out who is using which databases, this can definitively monitored with the right filters in the SQL profiler. Beware however that profiler traces can get quite big (and hence slow/hard to analyze) when the filters are not adequate.
Changes to the database schema, i.e. addition of table, columns, triggers and other such objects typically leaves "dated" tracks in the system tables/views (can provide more detail about that if need be).
However, and unless the data itself includes timestamps of sorts, there are typically very few sure-fire ways of knowing when data was changed, unless the recovery model involves keeping all such changes to the Log. In that case you need some tools to "decompile" the log data...
With regards to detecting "read" activity... A tough one. There may be some computer-forensic like tricks, but again, no easy solution I'm afraid (beyond the ability to see in server activity the very last query for all still active connections; obviously a very transient thing ;-) )
I typically run the profiler if I suspect the database is actually used. If there is no activity, then simply set it to read-only or offline.
You can use a transaction log reader to check when data in a database was last modified.
With SQL 2000, I do not know of a way to know when the data was read.
What you can do is to put a trigger on the login to the database and track when the login is successful and track associated variables to find out who / what application is using the DB.
If your database is fully logged, create a new transaction log backup, and check it's size. The log backup will have a fixed small lengh, when there were no changes made to the database since the previous transaction log backup has been made, and it will be larger in case there were changes.
This is not a very exact method, but it can be easily checked, and might work for you.