How can I audit with the Microsoft SQL Server LDF file? - sql-server-2008-r2

We need an audit log in the product that we are creating. We use SQL Server 2008 R2. I learned that the LDF file keeps a complete log of all transactions that were made*.
I've found ApexSQL Log; this tool analyses the LDF file and provides a GUI. It's a great demonstration of what's possible, but it's expensive. More info: http://www.apexsql.com/sql_tools_log.aspx
Do you know of other programs that can analyse LDF files? Or perhaps other methods to provide audit-trail functionality? I know that it's possible to create triggers, but if it isn't necessary to add things to my database schema then I would rather not do it.
*Only if you select the full recovery model.

How about the Change Data Capture (CDC) functionality in SQL Server 2008 R2? Doesn't that serve your purpose?
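If CDC fits, enabling it is just two system procedures. A minimal sketch, assuming an edition that supports CDC (Enterprise/Developer in 2008 R2); the database and table names are placeholders:
USE MyDatabase;
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;   -- no gating role here; restrict access in production
-- changes are then collected in cdc.dbo_Orders_CT and can be read with
-- cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all')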

When it comes to the information stored in an LDF file, make sure to form a full log chain. A log chain is a continuous sequence of transaction log backups: it starts with a full database backup, followed by all subsequent log backups up through the auditing point. If the chain is broken, only the transactions in the logs up to the last backup before the missing one can be shown with full information (e.g. a schema and object name, or a row history).
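In practice, keeping the chain intact just means one full backup followed by uninterrupted log backups (and never switching to SIMPLE recovery in between). A rough sketch, with placeholder database and file names:
BACKUP DATABASE MyDatabase TO DISK = 'D:\Backups\MyDatabase_full.bak';
-- then, on a schedule, without any gaps:
BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log_001.trn';
BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log_002.trn';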
Unlike INSERT and DELETE operations, which are fully logged in the LDF file, UPDATE operations are logged minimally: only the changes that were made are logged, not the old and new values. When logging an UPDATE, SQL Server doesn't record complete before and after row states, only the incremental change that occurred to the row. For example, if the word "log" was updated to "blog", SQL Server will, in the general case, only log the addition of the letter "b" at index 0. This is enough for its purpose of ensuring ACID, but not enough to easily show the before and after states of the row. So, in order to understand what change really occurred, you have to reconstruct the context in which the change happened from the rest of the transaction log and/or backup and online database data.
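If you want to peek at the raw log yourself, SQL Server exposes the undocumented (and unsupported) fn_dblog() function; it shows exactly this minimal logging but, as described above, not ready-made before/after row images. A sketch only, since column and operation names can vary between versions:
SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName
FROM fn_dblog(NULL, NULL)   -- NULL, NULL = the entire active log
WHERE Operation IN ('LOP_INSERT_ROWS', 'LOP_DELETE_ROWS', 'LOP_MODIFY_ROW');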

Related

Is there a way to show everything that was changed in a PostgreSQL database during a transaction?

I often have to execute complex sql scripts in a single transaction on a large PostgreSQL database and I would like to verify everything that was changed during the transaction.
Verifying each single entry on each table "by hand" would take ages.
Dumping the database before and after the script to plain sql and using diff on the dumps isn't really an option since each dump would be about 50G of data.
Is there a way to show all the data that was added, deleted or modified during a single transaction?
Dude, what you are looking for is one of the most searched-for things on the internet when it comes to capturing database changes. You could call it a kind of version control.
But as far as I know, there are sadly no built-in approaches available in PostgreSQL or MySQL. You can work around this by adding triggers for the operations you use most.
You can create backup schemas and tables to capture the rows that are updated, created, or deleted.
In this way you can achieve what you want. I know this process is fully manual, but it is really effective.
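As a rough sketch of that trigger idea in PostgreSQL (the table, function, and trigger names here are made up, and to_jsonb requires 9.5 or later):
CREATE TABLE audit_log (
    changed_at  timestamptz NOT NULL DEFAULT now(),
    table_name  text        NOT NULL,
    operation   text        NOT NULL,
    old_row     jsonb,   -- NULL for INSERTs
    new_row     jsonb    -- NULL for DELETEs
);
CREATE OR REPLACE FUNCTION audit_changes() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (table_name, operation, old_row, new_row)
    VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    RETURN NULL;   -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;
-- attach it to each table you want to audit:
CREATE TRIGGER my_table_audit
AFTER INSERT OR UPDATE OR DELETE ON my_table
FOR EACH ROW EXECUTE PROCEDURE audit_changes();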
If you need to analyze the script's behaviour only sporadically, then the easiest approach would be to change server configuration parameter log_min_duration_statement to 0 and then back to any value it had before the analysis. Then all of the script activity will be written to the instance log.
This approach is not suitable if your storage is not prepared to accommodate this amount of data, or for systems in which you don't want sensitive client data to be written to a plain-text log file.
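For example, assuming PostgreSQL 9.4 or later (older versions need an edit of postgresql.conf plus a reload instead of ALTER SYSTEM):
ALTER SYSTEM SET log_min_duration_statement = 0;   -- log every statement with its duration
SELECT pg_reload_conf();
-- run the script, inspect the server log, then put the setting back:
ALTER SYSTEM RESET log_min_duration_statement;
SELECT pg_reload_conf();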

Database Content Versioning

I am interested in keeping a running history of every change which has happened on some tables in my database, thus being able to reconstruct historical states of the database for analysis purposes.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it? Is there a better way?
Any input is appreciated!
UPD
I have marked Denis' response as the answer, because he did in fact answer whether MVCC is what I want, which was the question. However, the strategy I have settled on is detailed below in case anyone finds it useful:
The Postgres feature that does what I want: online backup/point in time recovery.
http://www.postgresql.org/docs/8.1/static/backup-online.html explains how to use this feature, but essentially you can set the "write-ahead log" to archive mode, take a snapshot of the database (say, before it goes live), and then continually archive the WAL. You can then use log replay to recall the state of the database at any time, with the side benefit of having a warm standby if you choose (by continually replaying the new WALs on your standby server).
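On a recent PostgreSQL release the setup looks roughly like this (the 8.1 documentation linked above predates some of these parameter names, and the archive path is only an example):
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET archive_mode = 'on';
ALTER SYSTEM SET archive_command = 'cp %p /mnt/wal_archive/%f';
-- restart the server, take a base backup (e.g. with the pg_basebackup tool),
-- and keep archiving WAL; recovery can then replay to any chosen point in time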
Perhaps this method is not as elegant as other ways of keeping a history, since you need to actually build the database for every point in time you wish to query; however, it looks extremely easy to set up and loses zero information. That means that when I have the time to improve my handling of historical data, I'll have everything and will therefore be able to transform my clunky system into a more elegant one.
One key fact that makes this so perfect is that my "valid time" is the same as my "transaction time" for this specific application; if this were not the case I would only be capturing "transaction time".
Before I found out about the WAL, I was considering just taking daily snapshots or something but the large size requirement and data loss involved did not sit well with me.
For a quick way to get up and running without compromising my data retention from the outset, this seems like the perfect solution.
Time Travel
PostgreSQL used to have just this feature, and called it "Time Travel". See the old documentation.
There's somewhat similar functionality in the spi contrib module that you might want to check out.
Composite type audit trigger
What I usually do instead is to use triggers to log changes along with timestamps to archival tables, and query against those. If the table structure isn't going to change you can use something like:
CREATE TABLE sometable_history(
    command_tag text not null check (command_tag IN ('INSERT','DELETE','UPDATE','TRUNCATE')),
    new_content sometable,
    change_time timestamp with time zone
);
and your versioning trigger can just insert into sometable_history(TG_OP,NEW,current_timestamp) (with a different CASE for DELETE, where NEW is not defined).
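A minimal sketch of such a trigger function (the function and trigger names are made up; storing OLD in new_content for DELETEs is one possible choice):
CREATE OR REPLACE FUNCTION sometable_versioning() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO sometable_history(command_tag, new_content, change_time)
        VALUES (TG_OP, OLD, current_timestamp);
    ELSE
        INSERT INTO sometable_history(command_tag, new_content, change_time)
        VALUES (TG_OP, NEW, current_timestamp);
    END IF;
    RETURN NULL;   -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER sometable_versioning_trg
AFTER INSERT OR UPDATE OR DELETE ON sometable
FOR EACH ROW EXECUTE PROCEDURE sometable_versioning();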
hstore audit trigger
That gets painful if the schema changes to add new NOT NULL columns, though. If you expect to do anything like that, consider using hstore to archive the columns instead of a composite type. I've added an implementation of that on the PostgreSQL wiki.
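A rough sketch of the hstore variant (assuming the hstore extension is installed; names are again made up):
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE sometable_history_hs(
    command_tag text not null,
    row_data    hstore,   -- column name => value pairs
    change_time timestamp with time zone
);
-- in the trigger body, use hstore(NEW) or hstore(OLD) instead of the composite value,
-- so later ALTER TABLE ... ADD COLUMN changes just show up as extra keys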
PITR
If you want to avoid impact on your master database (growing tables, etc.), you can alternatively use continuous archiving and point-in-time recovery to archive WAL files that can, using a recovery.conf, be replayed to any moment in time. Note that WAL files are big and include not only the tuples you changed, but also VACUUM activity and other details. You'll want to run them through clearxlogtail, since they can have garbage data on the end if they're partial segments from an archive timeout, and then you'll want to compress them heavily for long-term storage.
I am using Postgres, and this MVCC thing just seems like I should be able to exploit it for this purpose but I cannot find any documentation to support this. Can I do it?
Not really. There are tools to see the dead rows, but because of auto-vacuuming they will eventually be reclaimed.
Is there a better way?
If I get your question right, you're looking into logging slowly changing dimensions.
You might find this recent related thread interesting:
Temporal database design, with a twist (live vs draft rows)
I'm not aware of any tools/products that are built for that purpose.
While this may not be exactly what you're asking for, you can configure PostgreSQL to log DDL changes. Setting the log_line_prefix parameter (try including %d, %m, and %u) and setting the log_statement parameter to ddl should give you a reasonable history of who made what DDL changes and when.
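A sketch of those settings, assuming PostgreSQL 9.4+ for ALTER SYSTEM (otherwise edit postgresql.conf and reload):
ALTER SYSTEM SET log_statement = 'ddl';
ALTER SYSTEM SET log_line_prefix = '%m %u %d ';   -- timestamp, user, database
SELECT pg_reload_conf();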
Having said that, I don't believe logging DDL to be foolproof. For example, consider a situation where:
Multiple schemas have a table with the same name,
one of the tables is altered, and
the DDL doesn't fully qualify the table name (relying on the search path to get it right),
then it may not be possible to know from the log which table was actually altered.
Another option might be to log DDL as above but then have a watcher program perform a pg_dump of the database schema whenever a DDL entry gets logged. You could even compare the new dump with the previous dump and extract just the objects that were changed.

what happens to my dataset in case of unexpected failure

I know this has been asked here, but my question is slightly different. The DataSet was designed with the disconnected principle in mind, so what feature was provided to handle unexpected termination of the application, say a power failure, a Windows hang, or a system exception leading to a restart? Say the user has entered some 100 rows and the changes exist only in the DataSet. Usually the DataSet is written back to the database at application close or at a timed interval.
In the old days, when programming with VB 6.0, all interaction took place directly with the database, so each successful transaction committed itself automatically. How can that be done using DataSets?
DataSets are never meant for direct access to the database; they are a disconnected model only. There is no intent that they be able to recover from machine failures.
If you want to work live against the database you need to use DataReaders and issue DbCommands against the database directly for changes. This will of course increase the load on your database server.
You have to balance the two for most applications. If you know a user just entered vital data as a new row, execute an insert command to the database, and put a copy in your local cached DataSet. Then your local queries can run against the disconnected data, and inserts are stored immediately.
A DataSet can be serialized very easily, so you could implement your own regular backup to disk by using serialization of the DataSet to the filesystem. This will give you some protection, but you will have to write your own code to check for any data that your application may have saved to disk previously and so on...
You could also ignore DataSets and use SqlDataReaders and SqlCommands for the same sort of 'direct access to the database' you are describing.

Database last updated?

I'm working with SQL 2000 and I need to determine which of these databases are actually being used.
Is there a SQL script I can use to tell me the last time a database was updated? Read? Etc.?
I Googled it, but came up empty.
Edit: the following targets the issue of finding, after the fact, the last access date. As for figuring out who is using which databases, this can be definitively monitored with the right filters in SQL Profiler. Beware, however, that Profiler traces can get quite big (and hence slow/hard to analyze) when the filters are not adequate.
Changes to the database schema, i.e. the addition of tables, columns, triggers and other such objects, typically leave "dated" tracks in the system tables/views (I can provide more detail about that if need be).
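For instance, on SQL 2000 the sysobjects system table records at least object creation dates (there is no reliable modification date there), so a quick look can be had with something like:
SELECT name, type, crdate
FROM sysobjects
WHERE type IN ('U', 'P', 'TR', 'V')   -- tables, procs, triggers, views
ORDER BY crdate DESC;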
However, and unless the data itself includes timestamps of sorts, there are typically very few sure-fire ways of knowing when data was changed, unless the recovery model involves keeping all such changes to the Log. In that case you need some tools to "decompile" the log data...
With regards to detecting "read" activity... A tough one. There may be some computer-forensic like tricks, but again, no easy solution I'm afraid (beyond the ability to see in server activity the very last query for all still active connections; obviously a very transient thing ;-) )
I typically run the profiler if I suspect the database is actually used. If there is no activity, then simply set it to read-only or offline.
You can use a transaction log reader to check when data in a database was last modified.
With SQL 2000, I do not know of a way to know when the data was read.
What you can do is put a trigger on logins to the database and track when a login is successful, along with associated variables, to find out who / what application is using the DB.
If your database is fully logged, create a new transaction log backup and check its size. The log backup will have a fixed, small length when no changes have been made to the database since the previous transaction log backup, and it will be larger when there were changes.
This is not a very exact method, but it is easy to check and might work for you.
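A rough sketch of that check (the database name and path are placeholders); the sizes of recent log backups can be read back from msdb:
BACKUP LOG MyDatabase TO DISK = 'D:\Backups\MyDatabase_log.trn';
SELECT TOP 5 backup_finish_date, backup_size
FROM msdb.dbo.backupset
WHERE database_name = 'MyDatabase' AND type = 'L'   -- 'L' = log backup
ORDER BY backup_finish_date DESC;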

Sybase SQL Anywhere: check if synchronization is needed?

I have a Sybase SQL Anywhere 11.0.1 database that I am using to sync with an Oracle Consolidated Database.
I know that the SQL Anywhere database keeps track of all of the changes that are made to it so that it knows what to synchronize with the consolidated database. My question is whether or not there is a SQL command that will tell you if the database has changes to sync.
I have a mobile application and I want to show a little flag to the user anytime they have made changes to the handheld that need to be synced. I could just create another table to track that stuff myself but I would much rather just ping the database and ask it if it has changes that need to be synced.
There's nothing automatic to tell you that there is data to synchronize. In addition to Ben's suggestion, another idea would be to query the SYS.SYSSYNC table at the remote database to get an idea of whether there might be changes. The following statement returns a result set that shows a simple status of your last synchronization:
select ss.site_name, sp.publication_name, ss.log_sent, ss.progress
from sys.syssync ss, sys.syspublication sp
where ss.publication_id = sp.publication_id
  and ss.publication_id is not null
  and ss.site_name is not null
If progress < log_sent, then the status of the last synchronization is unknown. The last upload may or may not have been applied at the consolidated, because the upload was sent, but no response was received from the MobiLink server. In this case, suggesting a synch isn't a bad idea.
If progress = log_sent, then the last synch was successful. Knowing this, you could check the value of db_property('CurrentRedoPos'), which will return the current log offset of the remote database. If this value is significantly higher than the progress value, there have been many operations applied to the database since the last synchronization, so there's a good chance that there is data to synchronize (a minimal version of this comparison is sketched after the caveats below). There are lots of reasons why even a large difference between progress and db_property('CurrentRedoPos') could result in no actual data needing synchronization:
The download from the ML Server is applied by dbmlsync after the progress value at the remote is updated by dbmlsync when the upload is confirmed by the ML Server. Operations applied in the download by dbmlsync are not synchronized back to the ML Server, so the entire offset range could just be the last download that was applied. This could be worked around by tracking the current log offset in the sp_hook_dbmlsync_end hook when the 'exit code' value in the #hook_dict table is zero. This would tell you the log offset of the database after the download was applied, and you could then compare the saved value with the current log offset.
All the operations in the transaction log could be operations on tables that are not synchronized.
All the operations in the transaction log could have been rolled back.
My solution is not ideal. Tracking the changes to synchronized tables yourself is the best solution, but I thought I could offer an alternative that might be OK for your needs, with the advantage that you are not triggering an extra action on every operation performed on a synchronized table.
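A minimal sketch of the progress vs. CurrentRedoPos comparison described above, to be read with the caveats just listed in mind:
SELECT ss.site_name,
       ss.progress,
       db_property('CurrentRedoPos') AS current_redo_pos
FROM sys.syssync ss
WHERE ss.publication_id IS NOT NULL
  AND ss.site_name IS NOT NULL;
-- if current_redo_pos is far ahead of progress, there has been activity since the last sync,
-- but (per the caveats above) not necessarily data that needs to be uploaded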
The mobile database doesn't keep track of when the last sync was, the MobiLink server keeps all of that information in the MobiLink tables of the consolidated database.
Since synchronization only transfers necessary information, you could simply initiate a sync. If there's nothing to sync, then very little data will be used by your application.
As a side note, SQL Anywhere has its own SO clone which is monitored by Sybase engineers. If anyone knows for sure, it'll be them.
As of SQL Anywhere 17, SAP PM maps to a local Sybase database that contains a TTRANSACTION_UPLOAD table, so to determine whether a synchronization is necessary we simply query this table to see if it has any records that need to be synced to the HANA consolidated database.
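Assuming the local SAP PM schema really does expose that table as described, the check can be as simple as:
SELECT COUNT(*) AS pending_uploads
FROM TTRANSACTION_UPLOAD;
-- show the "needs sync" flag in the app whenever pending_uploads > 0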