Check redo / committed data size in postgres? - postgresql

I have following Quires:
How Do I check redo / un-committed data size in PostgreSQL ?
Looks like if I do multiple update in sequence, it slows down.
Like Update 1, update 2, .... update n; ...seem update n is slower than update 1. Does uncommitted data volume affects it ? How redo management works in PostgreSQL ?
How do I monitor current running SQL in stored function? pg_stat_activity just shows function call; at session level. How do I get current SQL under that function which is running now ?
~ Santosh

You're clearly coming from an Oracle background.
PostgreSQL does not have undo and redo logs, as such.
Uncommitted (in-progress or rolled back), live committed data and comimtted-then-deleted data are mixed together in the heap, i.e. the main table contents. The fraction used by rolled back transactions, old versions of updated rows and deleted rows is referred to as table bloat. See the wiki.
The closest thing to do the redo log is the write-ahead logs in pg_xlog. There's no SQL-level interface to getting the current xlog size.
The documentation discusses this in some more detail, but it's an area of PostgreSQL management that could really use more attention from interested contributors. Both better built-in monitoring tools and better documentation would be good. Patches are welcome.
As for your second question... you don't. There isn't currently a way to get a function call stack. One is being discussed, but hasn't been implemented as of 9.5.

Related

postgres 9.6 new progress reporting facility

Reading release notes of recent Postgres 9.6, I found this interesting new feature
Add a generic command progress reporting facility (Vinayak Pokale,
Rahila Syed, Amit Langote, Robert Haas)
Further reading gave me no information on this, but some play around article at depesz.
Of course the first what I thought - is there any history for what has been processed (and list of object to be processed - I dreamed) somewhere as well? Or this pg_stat_get_progress_info just shows current and have no idea of VACUUM plans and past?..
And another Question - Is there interface to consume that facility for own process (reports, data load and etc).
The view is called pg_stat_progress_vacuum; depesz must have used an older version of the patch for his article.
Currently, progress reporting is only available for VACUUM (and autovacuum) operations.
This feature offers no historical data, but there are other ways to get those:
If you set log_autovacuum_min_duration to 0, all autovacuum operations will be reported in the server log (normally, you don't have to run VACUUM manually).
The pg_stat_all_tables system view contains columns last_vacuum and last_autovacuum that indicate when the respective operation last ran on the table.

Use WAL files for PostgreSQL record version control?

I want to be able to track changes to records in a PostgreSQL database. I've considered using a version field and on-update rules or triggers such that previous versions of records are kept in the table (or in a separate table). This would have the advantage of making it possible to view the version history of a record with a simple select statement. However, this functionality is something I think likely to be seldom used.
How could I satisfy the requirement of being able to construct a "version history" for a record using the WAL files? Reading the WAL and Point-in-Time recovery documentation at PostgreSQL.org has helped me understand how the state of the entire database can be rolled back to an arbitrary point in time, but not how to deal with update mistakes in particular records.
No, you cannot do this at this time. There is a large effort underway on the postgresql-hackers mailing list (the dev list) to rework WAL and build an interface to allow for logical replication in (possibly) PostgreSQL 9.3.
This is basically what you appear to be trying to do and, based on the discussions on that list, it is definitely not a trivial task.

How to rollback an update in PostgreSQL

While editing some records in my PostgreSQL database using sql in the terminal (in ubuntu lucid), I made a wrong update.
Instead of -
update mytable set start_time='13:06:00' where id=123;
I typed -
update mytable set start_time='13:06:00';
So, all records are now having the same start_time value.
Is there a way to undo this change? There are some 500+ records in the table, and I do not know what the start_time value for each record was
Is it lost forever?
I'm assuming it was a transaction that's already committed? If so, that's what "commit" means, you can't go back.
Some data may be recoverable if you're lucky. Stop the database NOW.
Here's an answer I wrote on the same topic earlier. I hope it's helpful.
This might be too: Recoved deleted rows in postgresql .
Unless the data is absolutely critical, just restore from backups, it'll be lots easier and less painful. If you didn't have backups, consider yourself soundly thwacked.
If you catch the mistake and immediately bring down any applications using the database and take it offline, you can potentially use Point-in-Time Recovery (PITR) to replay your Write Ahead Log (WAL) files up to, but not including, the moment when the errant transaction was made. This would return the database to the state it was in prior, thus effectively 'undoing' that transaction.
As an approach for a production application database it has a number of obvious limitations, but there are circumstances in which PITR may be the best option available, especially when critical data loss has occurred. However, it is of no value if archiving was not already configured before the corruption event.
https://www.postgresql.org/docs/current/static/continuous-archiving.html
Similar capabilities exist with other relational database engines.

Database Content Versioning

I am interested in keeping a running history of every change which has happened on some tables in my database, thus being able to reconstruct historical states of the database for analysis purposes.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it? Is there a better way?
Any input is appreciated!
UPD
I have marked Denis' response as the answer, because he did in fact answer whether MVCC is what I want which was the question. However, the strategy I have settled on is detailed below in case anyone finds it useful:
The Postgres feature that does what I want: online backup/point in time recovery.
http://www.postgresql.org/docs/8.1/static/backup-online.html explains how to use this feature but essentially you can set this "write ahead log" to archive mode, take a snapshot of the database (say, before it goes live), then continually archive the WAL. You can then use log replay to recall the state of the database at any time, with the side benefit of having a warm standby if you choose (by continually replaying the new WALs on your standby server).
Perhaps this method is not as elegant as other ways of keeping a history, since you need to actually build the database for every point in time you wish to query, however it looks extremely easy to set up and loses zero information. That means when I have the time to improve my handling of historical data, I'll have everything and will therefore be able to transform my clunky system to a more elegant system.
One key fact that makes this so perfect is that my "valid time" is the same as my "transaction time" for the specific application- if this were not the case I would only be capturing "transaction time".
Before I found out about the WAL, I was considering just taking daily snapshots or something but the large size requirement and data loss involved did not sit well with me.
For a quick way to get up and running without compromising my data retention from the outset, this seems like the perfect solution.
Time Travel
PostgreSQL used to have just this feature, and called it "Time Travel". See the old documentation.
There's somewhat similar functionality in the spi contrib module that you might want to check out.
Composite type audit trigger
What I usually do instead is to use triggers to log changes along with timestamps to archival tables, and query against those. If the table structure isn't going to change you can use something like:
CREATE TABLE sometable_history(
command_tag text not null check (command_tag IN ('INSERT','DELETE','UPDATE','TRUNCATE')),
new_content sometable,
change_time timestamp with time zone
);
and your versioning trigger can just insert into sometable_history(TG_OP,NEW,current_timestamp) (with a different CASE for DELETE, where NEW is not defined).
hstore audit trigger
That gets painful if the schema changes to add new NOT NULL columns though. If you expect to do anything like that consider using a hstore to archive the columns, instead of a composite type. I've already added an implementation of that on the PostgreSQL wiki already.
PITR
If you want to avoid impact on your master database (growing tables, etc), you can alternately use continuous archiving and point-in-time recovery to log WAL files that can, using a recovery.conf, be replayed to any moment in time. Note that WAL files are big and they include not only the tuples you changed, but VACUUM activity and other details. You'll want to run them through clearxlogtail since they can have garbage data on the end if they're partial segments from an archive timeout, then you'll want to compress them heavily for long term storage.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it?
Not really. There are tools to see dead rows, because auto-vacuuming is so that will eventually be reclaimed.
Is there a better way?
If I get your question right, you're looking into logging slowly changing dimensions.
You might find this recent related thread interesting:
Temporal database design, with a twist (live vs draft rows)
I'm not aware of any tools/products that are built for that purpose.
While this may not be exactly what you're asking for, you can configure Postgresql to log ddl changes. Setting the log_line_prefix parameter (try including %d, %m, and %u) and setting the log_statement parameter to ddl should give you a reasonable history of who made what ddl changes and when.
Having said that, I don't believe logging ddl to be foolproof. For example, consider a situation where:
Multiple schemas have a table with the same name,
one of the tables is altered, and
the ddl doesn't fully qualify the table name (relying on the search path to get it right),
then it may not be possible to know from the log which table was actually altered.
Another option might be to log ddl as above but then have a watcher program perform a pg_dump of the database schema whenever a ddl entry get's logged. You could even compare the new dump with the previous dump and extract just the objects that were changed.

How to prevent Write Ahead Logging on just one table in PostgreSQL?

I am considering log-shipping of Write Ahead Logs (WAL) in PostgreSQL to create a warm-standby database. However I have one table in the database that receives a huge amount of INSERT/DELETEs each day, but which I don't care about protecting the data in it. To reduce the amount of WALs produced I was wondering, is there a way to prevent any activity on one table from being recorded in the WALs?
Ran across this old question, which now has a better answer. Postgres 9.1 introduced "Unlogged Tables", which are tables that don't log their DML changes to WAL. See the docs for more info, but at least now there is a solution for this problem.
See Waiting for 9.1 - UNLOGGED tables by depesz, and the 9.1 docs.
Unfortunately, I don't believe there is. The WAL logging operates on the page level, which is much lower than the table level and doesn't even know which page holds data from which table. In fact, the WAL files don't even know which pages belong to which database.
You might consider moving your high activity table to a completely different instance of PostgreSQL. This seems drastic, but I can't think of another way off the top of my head to avoid having that activity show up in your WAL files.
To offer one option to my own question. There are temp tables - "temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)" - which I think don't generate WALs. Even so, this might not be ideal as the table creation & design will be have to be in the code.
I'd consider memcached for use-cases like this. You can even spread the load over a bunch of cheap machines too.