What is the vcslog table in JediVCS used for? - version-control

Just wondering what the vcslog table is used for in JediVCS.
I received some consistancy errors on this table (during my backup procedure these errors were flagged by the database backend) and there is a chance that after repair some data went missing.
If the table is just acting as a log then this should be ok.
Some other info:
Was going to ask this quesion on the JediVCS news group but it appears to
be down
I do have recent backups that I could restore but would rather not as it
means finding and re-committing any intervening work.
I diffed all other tables and their data between the pre-fix and post-fix
versions of the VCS and they all match.
I tried to diff the vcslog table but the tool I have crashed as the table
has millions of records. (I think the tool ran out of memory doing the
diff)
Any info appreciated.
Peter Mayes

I ended up mailing the active JediVCS admin staff directly.
Thanks go to them for their prompt and helpul advice.
Basically the vcslog table holds information relating to actions taken by a user. As such its data is entirely optional, but recommended.
The least useful records are marked as type=g in the database. (This logs a 'get' operation by a user).
After deleting these records directly in the database the vcslog table reduced in size by 97%. (I suspect the large number of 'get' logs is due to our nighly autobuilds).
The database has been stable since the clear. (Some six weeks ago)
Here are some other topics in the JediVCS help manual I was pointed at.
Just in case they prove helpful to someone. These detail the logging behaviour inside the JediVCS client:
"Project history"
"Application server options"
"Write VCS log"

Related

How can I discover which user and when an index was created?

I have a postgres table with duplicated indexes (called someName and someName1) applied to the same columns. I would like to know which user executed the ddl that created these indexes, and when it happened. Is this possible on postgres?
If you hadn't already set up some kind of auditing or aggressive logging before this happened, then your options are pretty limited.
If you retain WAL files, you could go exploring through those (with pg_waldump and other tools, or by doing PITR) to pinpoint the time. This will probably not be a quick and painless exercise. By looking at surrounding changes, or at log files from the same time, you might be able to figure out who was logged on at the time and also had permissions to create the index.

How to Recover PostgreSQL 8.0 Database

On my PostgreSQL 8.0 database, I started receiving a "ERROR: could not open relation 1663/17269/16691: No such file or directory" message, and now my data is inaccessible.
Any ideas on how to recover at least some of the data? Professional support is an option.
Regards.
RP
If you want your data back in a hurry and it's worth something to you, then the professional support option should be simple enough.
Some things to check, now that you've got a full backup of all your database (that's base, pg_clog, pg_xlog and all the other folders at that level).
Does that file actually exist? It might be a permissions problem rather than the file actualy going missing.
Check your anti-virus/security packages - have they mistakenly quarantined the file? If you can exclude PostgreSQL's database directories from scans/active scans that's worthwhile too.
Make a note of everything you can remember about when this happened and what happened just before. This will help with troubleshooting for you or a consultant.
Check the logs likewise - this error will be logged, find the first occurrence and see if there's anything odd before.
Double-check you really do have all your existing files backed up, and restart PostgreSQL.
Try connecting as user postgres to database postgres or database template1. If that works then the file is one of your database files rather than the global list of users or some such.
Try creating an empty file with the right name (and permissions - check the other files). If you are really lucky it's just an index. Otherwise it could be a data table you can live without. Then you can dump other tables individually.
OK - if you're here then you can connect to your DB. Those numbers in the file-path are PostgreSQL's OIDs identifying system objects. You can try a couple of useful queries here. These two queries should give you the IDs of the databases and then the object with the missing file. This is useful information for your professional too.
SELECT oid, datname, dattablespace FROM pg_database;
SELECT * FROM pg_class WHERE relfilenode = 16691;
Remember make sure you have the filesystem backup before tinkering.

Database Content Versioning

I am interested in keeping a running history of every change which has happened on some tables in my database, thus being able to reconstruct historical states of the database for analysis purposes.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it? Is there a better way?
Any input is appreciated!
UPD
I have marked Denis' response as the answer, because he did in fact answer whether MVCC is what I want which was the question. However, the strategy I have settled on is detailed below in case anyone finds it useful:
The Postgres feature that does what I want: online backup/point in time recovery.
http://www.postgresql.org/docs/8.1/static/backup-online.html explains how to use this feature but essentially you can set this "write ahead log" to archive mode, take a snapshot of the database (say, before it goes live), then continually archive the WAL. You can then use log replay to recall the state of the database at any time, with the side benefit of having a warm standby if you choose (by continually replaying the new WALs on your standby server).
Perhaps this method is not as elegant as other ways of keeping a history, since you need to actually build the database for every point in time you wish to query, however it looks extremely easy to set up and loses zero information. That means when I have the time to improve my handling of historical data, I'll have everything and will therefore be able to transform my clunky system to a more elegant system.
One key fact that makes this so perfect is that my "valid time" is the same as my "transaction time" for the specific application- if this were not the case I would only be capturing "transaction time".
Before I found out about the WAL, I was considering just taking daily snapshots or something but the large size requirement and data loss involved did not sit well with me.
For a quick way to get up and running without compromising my data retention from the outset, this seems like the perfect solution.
Time Travel
PostgreSQL used to have just this feature, and called it "Time Travel". See the old documentation.
There's somewhat similar functionality in the spi contrib module that you might want to check out.
Composite type audit trigger
What I usually do instead is to use triggers to log changes along with timestamps to archival tables, and query against those. If the table structure isn't going to change you can use something like:
CREATE TABLE sometable_history(
command_tag text not null check (command_tag IN ('INSERT','DELETE','UPDATE','TRUNCATE')),
new_content sometable,
change_time timestamp with time zone
);
and your versioning trigger can just insert into sometable_history(TG_OP,NEW,current_timestamp) (with a different CASE for DELETE, where NEW is not defined).
hstore audit trigger
That gets painful if the schema changes to add new NOT NULL columns though. If you expect to do anything like that consider using a hstore to archive the columns, instead of a composite type. I've already added an implementation of that on the PostgreSQL wiki already.
PITR
If you want to avoid impact on your master database (growing tables, etc), you can alternately use continuous archiving and point-in-time recovery to log WAL files that can, using a recovery.conf, be replayed to any moment in time. Note that WAL files are big and they include not only the tuples you changed, but VACUUM activity and other details. You'll want to run them through clearxlogtail since they can have garbage data on the end if they're partial segments from an archive timeout, then you'll want to compress them heavily for long term storage.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it?
Not really. There are tools to see dead rows, because auto-vacuuming is so that will eventually be reclaimed.
Is there a better way?
If I get your question right, you're looking into logging slowly changing dimensions.
You might find this recent related thread interesting:
Temporal database design, with a twist (live vs draft rows)
I'm not aware of any tools/products that are built for that purpose.
While this may not be exactly what you're asking for, you can configure Postgresql to log ddl changes. Setting the log_line_prefix parameter (try including %d, %m, and %u) and setting the log_statement parameter to ddl should give you a reasonable history of who made what ddl changes and when.
Having said that, I don't believe logging ddl to be foolproof. For example, consider a situation where:
Multiple schemas have a table with the same name,
one of the tables is altered, and
the ddl doesn't fully qualify the table name (relying on the search path to get it right),
then it may not be possible to know from the log which table was actually altered.
Another option might be to log ddl as above but then have a watcher program perform a pg_dump of the database schema whenever a ddl entry get's logged. You could even compare the new dump with the previous dump and extract just the objects that were changed.

Database last updated?

I'm working with SQL 2000 and I need to determine which of these databases are actually being used.
Is there a SQL script I can used to tell me the last time a database was updated? Read? Etc?
I Googled it, but came up empty.
Edit: the following targets issue of finding, post-facto, the last access date. With regards to figuring out who is using which databases, this can definitively monitored with the right filters in the SQL profiler. Beware however that profiler traces can get quite big (and hence slow/hard to analyze) when the filters are not adequate.
Changes to the database schema, i.e. addition of table, columns, triggers and other such objects typically leaves "dated" tracks in the system tables/views (can provide more detail about that if need be).
However, and unless the data itself includes timestamps of sorts, there are typically very few sure-fire ways of knowing when data was changed, unless the recovery model involves keeping all such changes to the Log. In that case you need some tools to "decompile" the log data...
With regards to detecting "read" activity... A tough one. There may be some computer-forensic like tricks, but again, no easy solution I'm afraid (beyond the ability to see in server activity the very last query for all still active connections; obviously a very transient thing ;-) )
I typically run the profiler if I suspect the database is actually used. If there is no activity, then simply set it to read-only or offline.
You can use a transaction log reader to check when data in a database was last modified.
With SQL 2000, I do not know of a way to know when the data was read.
What you can do is to put a trigger on the login to the database and track when the login is successful and track associated variables to find out who / what application is using the DB.
If your database is fully logged, create a new transaction log backup, and check it's size. The log backup will have a fixed small lengh, when there were no changes made to the database since the previous transaction log backup has been made, and it will be larger in case there were changes.
This is not a very exact method, but it can be easily checked, and might work for you.

How to prevent Write Ahead Logging on just one table in PostgreSQL?

I am considering log-shipping of Write Ahead Logs (WAL) in PostgreSQL to create a warm-standby database. However I have one table in the database that receives a huge amount of INSERT/DELETEs each day, but which I don't care about protecting the data in it. To reduce the amount of WALs produced I was wondering, is there a way to prevent any activity on one table from being recorded in the WALs?
Ran across this old question, which now has a better answer. Postgres 9.1 introduced "Unlogged Tables", which are tables that don't log their DML changes to WAL. See the docs for more info, but at least now there is a solution for this problem.
See Waiting for 9.1 - UNLOGGED tables by depesz, and the 9.1 docs.
Unfortunately, I don't believe there is. The WAL logging operates on the page level, which is much lower than the table level and doesn't even know which page holds data from which table. In fact, the WAL files don't even know which pages belong to which database.
You might consider moving your high activity table to a completely different instance of PostgreSQL. This seems drastic, but I can't think of another way off the top of my head to avoid having that activity show up in your WAL files.
To offer one option to my own question. There are temp tables - "temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)" - which I think don't generate WALs. Even so, this might not be ideal as the table creation & design will be have to be in the code.
I'd consider memcached for use-cases like this. You can even spread the load over a bunch of cheap machines too.