Postgres: is it possible to lock some rows against changes? - postgresql

I have pretty old tables which hold records of clients' payments and commissions for several years. In the regular business cycle it's sometimes necessary to recalculate commissions and update the table, but usually the recalculation period is 1 or 2 months back, not more.
Recently, as a result of a bug in a PHP script, our developer recalculated commissions since the very beginning 0_0. The recalculation process is really complicated, so it can't be undone just by grabbing yesterday's backup - the data changes in numerous databases, so restoring it is a really complicated and awfully expensive procedure. Plus complaints from clients and changes in accounting... you know. Horror.
We can't split the tables by periods. (Well, we can, but it would take a year to rewrite all the data selects.)
What I'm thinking about is setting up an update trigger that would check the date of the record being changed against an allowed date, which should be less than the update date. So in case of a mistake or bug, anyone trying to update such a 'restricted' row would get an exception and the data would stay unchanged.
Is that a good approach? And how can it be done - I mean, with a trigger?

For postgres you can use a check constraint to ensure the allowed_date is always less than the update_date:
ALTER TABLE mytable ADD CONSTRAINT datecheck CHECK (allowed_date < update_date);
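If you want the trigger approach you asked about, a minimal sketch could look like this (the table name payments, the column payment_date, and the two-month cutoff are assumptions - adjust them to your schema):

-- Reject UPDATEs that touch rows older than the allowed period (assumed: 2 months).
CREATE OR REPLACE FUNCTION prevent_old_row_update() RETURNS trigger AS $$
BEGIN
    IF OLD.payment_date < now() - interval '2 months' THEN
        RAISE EXCEPTION 'row dated % is locked against changes', OLD.payment_date;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Hypothetical table name; attach the check to every row update.
CREATE TRIGGER payments_protect_old_rows
    BEFORE UPDATE ON payments
    FOR EACH ROW
    EXECUTE PROCEDURE prevent_old_row_update();

A buggy recalculation that reaches back too far would then abort with an exception instead of silently rewriting history.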

Related

Postgres include REINDEX in UPDATE statement

I have a database with a table that is incrementally patched and has many indexes. But sometimes the patching does not happen, and the next patch becomes very large, which in practice makes it smarter to drop the indexes, patch the table, and recreate the indexes. This seems horrible, though, and with users using the table it isn't really an option. So I thought there might be a possibility to rebuild the indexes during the UPDATE statement, or even better have Postgres itself check whether that is optimal. (I'm using Postgres 10; this might be a problem that is solved by upgrading.)
I hope you can help me.
No, there is no good solution, nor any on the horizon for future versions.
Either you keep the indexes and must maintain them during the "patch"; or you drop them in the same transaction as the "patch" and rebuild them later in which case the table is locked against all other uses; or you drop them in a separate transaction and rebuild them later in which case other sessions can see the table in an unindexed state.
There are in principle ways this could be improved (for example, ending the "patch" the same way CREATE INDEX CONCURRENTLY ends, with a merge join between the index and the table; but since CIC must run in its own transaction, it is not clear how these could be shoehorned together), but I am not aware of any promising work going on currently.
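As a rough sketch of the second option, with hypothetical table and index names (note that DROP INDEX takes an ACCESS EXCLUSIVE lock, so other sessions are blocked from using the table until COMMIT):

BEGIN;
-- Hypothetical index on the patched table.
DROP INDEX idx_big_table_col;
-- ... run the bulk patch statements against big_table here ...
CREATE INDEX idx_big_table_col ON big_table (col);
COMMIT;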

Table partitioning in PostgreSQL 11 with automatic partition creation?

I need to maintain an audit table, and since the number of changes is going to be huge, I need an efficient way of dealing with the problem. The solution I have thought of is to record only the changed column in the audit table and partition it on the createdon column quarterly or half-yearly.
I wanted to know if there is anything like 'interval partition' of oracle? If not then how can I achieve it?
I want a new partition to be created automatically every 6 months as rows are inserted.
I am using postgres 11 as my db.
I do not think there is any magic configuration that makes your life easier on this point:
https://www.postgresql.org/docs/11/ddl-partitioning.html
If you want the partitions auto-created, I think you have two major possibilities:
Check each row on insert into the 'mother' table to see if it fits into an already existing partition (via a trigger; with a huge amount of inserts this could be a problem).
Check once in a while that you already have the partitions that are going to be needed in the future. For this one pg_partman is going to be your best ally.
As an example, a few years ago I built a partitioning mechanism when there was only the declarative approach and no possibility of adding pg_partman. With the trigger mechanism it still works like a charm for 15 million rows per month.
If you do not want to harm your performance EVER (and especially if you do not know how large your system is going to grow), I recommend the same answer as a_horse_with_no_name's comment: use pg_partman.
If you cannot use it, as was the case for me, adopt one of the two approaches above (a trigger, or creating partitions in advance from a cron task, for example).
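As a sketch of the declarative variant with partitions created in advance (all names except createdon are assumptions; in PostgreSQL 11 a row inserted into the parent is routed to the matching partition, but that partition must already exist):

-- Hypothetical audit table, range-partitioned on createdon.
CREATE TABLE audit_log (
    table_name  text        NOT NULL,
    changed_col text        NOT NULL,
    old_value   text,
    new_value   text,
    createdon   timestamptz NOT NULL
) PARTITION BY RANGE (createdon);

-- A cron task (or pg_partman) creates the next half-year ahead of time.
CREATE TABLE audit_log_2019_h1 PARTITION OF audit_log
    FOR VALUES FROM ('2019-01-01') TO ('2019-07-01');
CREATE TABLE audit_log_2019_h2 PARTITION OF audit_log
    FOR VALUES FROM ('2019-07-01') TO ('2020-01-01');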

What's the equivalent of INSERT ... ON CONFLICT that first tries update?

I've been using INSERT ... ON CONFLICT DO UPDATE to insert/update data. The thing is, I know that most of the time I will want to do an update: for every day, update a counter. If there's no data for that date, then create the row. That creation will only happen once (obviously), but the update could happen millions of times per day. Is using INSERT ... ON CONFLICT DO UPDATE still the right approach? Is there an equivalent of "try to update first, if that fails then insert the row" (like an actual "UPSERT")?
There is no variant of UPDATE that has the same behavior, for the simple reason that it would do exactly the same thing as INSERT ... ON CONFLICT. Don't worry about the name.
If you have millions of updates for each row per day, you should worry much more about VACUUM.
If you can, do not index the attributes that will be updated frequently, and create the table with a fillfactor lower than 100. Then you can get the much more efficient "HOT updates", which will significantly reduce the amount of disk writes and VACUUM required.
Make sure to tune autovacuum to be more aggressive by lowering autovacuum_vacuum_cost_delay.
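A minimal sketch of that pattern, with assumed table and column names: the counter column is not indexed, and the lowered fillfactor leaves free space on each page for HOT updates.

-- Hypothetical daily counter; only "day" is indexed (as the primary key).
CREATE TABLE daily_counter (
    day  date   PRIMARY KEY,
    hits bigint NOT NULL DEFAULT 0
) WITH (fillfactor = 70);

-- The same statement creates the row once and increments it millions of times after that.
INSERT INTO daily_counter (day, hits)
VALUES (current_date, 1)
ON CONFLICT (day) DO UPDATE
SET hits = daily_counter.hits + EXCLUDED.hits;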

Will Postgres automatically cache "active" records from a large table?

I have a table which contains a large number of records but only a portion of them are "active" at any one time and my web app just needs to reference those. I have an updating process that runs each night which adds new (active) records and may re-activate older records.
Will postgres be able to figure out that it should cache the active records? If not, should I move the active records to a new table to help it -- or is there some other way of giving it a "hint"?
UPDATE - The active records are indicated by having a NULL value in a datetime field called end_date
Thanks
The operating system will cache active and recently used disk blocks. PostgreSQL relies mainly on the operating system disk cache. However, its own shared_buffers will cache hot pages too.
It sounds like this is a very roundabout way of saying "I have these queries that I think are slower than they should be". Consider reading http://wiki.postgresql.org/wiki/Slow_Query_Questions and https://stackoverflow.com/tags/postgresql-performance/info
I am not sure what you mean by "active record"; it might be Postgres terminology I'm just missing. If you simply mean a row flagged as "active", put an index on it. That way Postgres can access those rows directly without a row-wise search, so in a way it's like a cache (rebuilt every time you alter a row).
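Since the update says active rows are the ones with a NULL end_date, a partial index is one concrete way to give Postgres that hint; the table name and the extra column here are assumptions:

-- Hypothetical partial index: it only contains the active rows,
-- so it stays small and its pages are the ones that end up cached.
CREATE INDEX records_active_idx ON records (customer_id) WHERE end_date IS NULL;

-- Queries must repeat the predicate for the planner to use the partial index.
SELECT * FROM records WHERE end_date IS NULL AND customer_id = 42;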

Moving the data from transaction table to history table to increase insert performance, postgres

I have 3 database tables, each containing 6 million rows, with 3 million rows added every year.
Following is the table information:
Table 1: 20 fields with 50 characters on average in each field. Has 2 indexes, both on timestamp fields.
Table 2: 5 fields, 2 byte array field and 1 xml field
Table 3: 4 fields, 1 byte array field
Following is the usage:
Insert 15 to 20 records per second in each table.
A view is created by joining first 2 tables and the select is mostly based on the date field in the first table.
Right now, inserting one record into each of the three tables together takes about 100 milliseconds.
I'm planning to migrate from Postgres 8.4 to 9.2, and I would like to do some optimization for insert performance as well. I'm also planning to create history tables and keep the old records in those tables. I have the following questions in this regard:
Will creating history tables and moving older data to those tables help increase insert performance?
If it helps, how often do I need to move old records into the history tables: daily, weekly, monthly, or yearly?
If I keep only one month of data (220,000 rows) instead of one year (3 million rows), will that help improve insert performance?
Thanks in advance,
Sudheer
I'm sure someone better informed than I will show up and provide a better answer, but my impression is that:
Insert performance is mostly a function of your indexing strategy and your hardware
Performance, in general, is better under 9.0+ than 8.4, and this may rub off on insert performance, but I'm not certain of that.
None of your ideas are going to directly affect insert performance
Now, that said, the cost of maintaining a small index is lower than a large one, so it may be that creating history tables and moving old data there will improve performance simply by reducing index pressure. But I would expect dropping one of your indexes to have a direct and greater effect. Perhaps you could have a history table with both indexes and just maintain one of them on the "today" table?
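If you do try the history-table idea, a sketch of the archival step could look like this (table and column names are assumptions; the writable CTE needs 9.1 or later, so it works on the 9.2 target):

-- Move rows older than one month from table1 into table1_history in one statement.
WITH moved AS (
    DELETE FROM table1
    WHERE created_at < now() - interval '1 month'
    RETURNING *
)
INSERT INTO table1_history
SELECT * FROM moved;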
If I were in your shoes, I'd get a copy of production going on my machine running 8.4 with a similar configuration. Then upgrade to 9.2 and see if the insert performance changes. Then try out these ideas and benchmark them, see which ones improve the situation. It's absolutely essential that things be kept as similar to production as possible for this to yield useful information, but it will certainly be better information than any hypothetical answer you might get.
Now, 100ms seems pretty slow for inserting one row IMO. Better hardware would certainly improve this situation. The usual suggestion would be a big striped RAID array with a battery-backed cache. PostgreSQL 9.0 High Performance has more information on all of this.