Does Firebird have "Group Commit"?

As I understand it, this is where a background thread is responsible for writing transactions to disk in "careful write" order so that the user does not have to wait for the actual writing to disk to occur.
I have seen references to this (e.g. here) from a long time ago relating to InterBase, but I could not see it mentioned in relation to Firebird anywhere.

Using the gfix utility you can set the FORCED WRITES flag on or off for a database file. When turned on, the server waits until the actual disk write occurs. When turned off, the server continues execution and leaves it to the OS to decide when to write data to disk. Performance gains are up to 3x, but there is a possibility that some data will be written in the wrong order if a power failure occurs.
We strongly advise our customers to use a RAID controller with an independent power source for its cache memory, together with FORCED WRITES = ON.
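For reference, toggling this with the gfix command line looks something like the following (the database path and credentials here are just placeholders):

# forced writes on: the server waits for the physical write
gfix -write sync /data/mydb.fdb -user SYSDBA -password masterkey
# forced writes off: the OS decides when to flush
gfix -write async /data/mydb.fdb -user SYSDBA -password masterkey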

Based on the comments on this thread and searching online, it seems that Firebird does not have group commit.

Related

Sitecore 8.1 update 2 MongoDB backup

I am using a replica set (2 MongoDB nodes, 1 arbiter) for my Sitecore CD servers.
Assuming all MongoDB data gets flushed to the reporting SQL DB, do we need to take a backup of the MongoDB database on the production CD environment?
If yes, what is the best approach and frequency for doing it, considering my application makes moderate use of the analytics features (personalization, campaigns, etc.)?
Unfortunately, your assumption is bad - MongoDB is the definitive source of analytics data, not the reporting DB. The reporting DB contains only the aggregate info needed for generating the reports (mostly). In fact, if (when) something goes wrong with the SQL DB, the idea is that it is rebuilt from the source MongoDB. Remember: you can't un-add two numbers after you've added them!
Backup vs Replication
A backup is a point-in-time view of the database, whereas replication keeps multiple active copies of the current database. I would advocate replication over backup for this type of data. Why? Glad you asked!
Currency - under what circumstance would you want to restore a 50GB MongoDB? What if it was a week old? What if it was a month? Really the only useful data is current data, and websites are volatile places - log data backups are out of date within an hour. If you personalise on stale data is that providing a good user experience?
Cost - backing up large datasets is costly in terms of time, storage capacity and compute requirements; backups are also a pain to restore, and the bigger they are, the more likely there's a corruption somewhere.
Run of business
In a production MongoDB environment you really should have 2-3 replicas. That's going to save your arse if one of the boxes dies, which they sometimes do - MongoDB works the disks very hard.
These replicas are self-healing and always current (pretty much), so they are much better than taking backups. The chance that you lose all your replicas at once is really low, except for one particular edge case... upgrades. So a backup is really only protection against hardware failure or data corruption, which, in a multi-instance replica set, is already very effectively handled. Unless you're paranoid, you're never going to use that backup, and it'll cost you plenty to have it.
Sitecore Upgrades
This is the killer edge-case - always make backups (see Back Up and Restore with MongoDB Tools) before running an upgrade because you can corrupt all of your replicas in one motion and you'll want to be able to roll back.
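A pre-upgrade dump and restore with the standard tools might look something like this (the replica set, database name and output path are placeholders):

# dump the analytics database before the upgrade
mongodump --host rs0/mongo1:27017,mongo2:27017 --db analytics --out /backups/pre-upgrade
# roll back if the upgrade corrupts the data
mongorestore --host rs0/mongo1:27017,mongo2:27017 --db analytics /backups/pre-upgrade/analytics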
Data Trimming (side-note)
You didn't ask this, but at some point you'll be thinking "how the heck can I back up this 170GB monster db every day? this is ridiculous" - and you'll be right.
There are various schools of thought around how long this data should be persisted for - that's a question only you or your client can answer. I suggest keeping it until there's too much, then make a decision on how much you have to get rid of. Keep as much as you can tolerate.

Safe way to backup PostgreSQL when using Persistent Disk

I’m trying to set up daily backups (using Persistent Disk snapshots) for a PostgreSQL instance I’m running on Google Compute Engine and whose data directory lives on a Persistent Disk.
Now, according to the Persistent Disk Backups blog post, I should:
stop my application (PostgreSQL)
fsfreeze my file system to prevent further modifications and flush pending blocks to disk
take a Persistent Disk snapshot
unfreeze my filesystem
start my application (PostgreSQL)
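Concretely, that sequence translates to something like this (the mount point, disk name and zone here are placeholders, not from the blog post):

sudo systemctl stop postgresql
sudo fsfreeze --freeze /mnt/pgdata
gcloud compute disks snapshot pg-data --zone us-central1-a --snapshot-names pg-backup-$(date +%Y%m%d)
sudo fsfreeze --unfreeze /mnt/pgdata
sudo systemctl start postgresql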
This obviously brings with it some downtime (each of the steps took from seconds to minutes in my tests) that I’d like to avoid or at least minimize.
The steps in the blog post are labeled as necessary to ensure the snapshot is consistent (I’m assuming at the filesystem level), but I’m not interested in a clean filesystem; I’m interested in being able to restore all the data that’s in my PostgreSQL instance from such a snapshot.
PostgreSQL uses fsync when committing, so all data which PostgreSQL acknowledges as committed has made its way to the disk already (fsync goes to the disk).
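(As a side note, the relevant crash-safety settings can be checked directly on the instance; connection details here are placeholders:)

psql -U postgres -c "SHOW fsync"
psql -U postgres -c "SHOW full_page_writes"
psql -U postgres -c "SHOW synchronous_commit"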
For the purpose of this discussion, I think it makes sense to compare a Persistent Disk snapshot without stopping PostgreSQL and without using fsfreeze with a filesystem on a disk that has just experienced an unexpected power outage.
After reading https://wiki.postgresql.org/wiki/Corruption and http://www.postgresql.org/docs/current/static/wal-reliability.html, my understanding is that all committed data should survive an unexpected power outage.
My questions are:
Is my comparison with an unexpected power outage accurate or am I missing anything?
Can I take snapshots without stopping PostgreSQL and without using fsfreeze or am I missing some side-effect?
If the answer to the above is that I shouldn’t just take a snapshot, would it be idiomatic to create another Persistent Disk, periodically use pg_dumpall(1) to dump the entire database and then snapshot that other Persistent Disk?
1) Yes, though taking a snapshot should be even safer than an actual power outage. The fsfreeze stuff is really there to be 100% safe (anecdotally: I never use fsfreeze on my PDs and have not run into issues)
2) Yes, but there is no 100% guarantee that it will always work (paranoid solution: take a snapshot, spin up a temp VM with that snapshot, check the disk is OK, and delete the VM; this can be automated - see the sketch further down)
3) No, I would not recommend this over snapshots. It will take a lot more time, might degrade your DB performance, and what happens if something fails in the middle of a dump? Also, PDs are very expensive for incremental backups. Snapshots are diffed, so you don't have to pay for the whole disk with every copy (just the first one), only for the changes.
Possible recommendation:
Do #3, but then create a snapshot of the new PD and then delete the PD.
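To automate the verification mentioned in 2), something along these lines could work (all disk, snapshot, VM and zone names are illustrative):

# build a disk from the snapshot and attach it to a throwaway VM
gcloud compute disks create verify-disk --source-snapshot pg-backup-20150101 --zone us-central1-a
gcloud compute instances create verify-vm --zone us-central1-a --disk name=verify-disk,device-name=verify-disk
# mount it read-only and make sure the data is there (adjust the device path if the disk is partitioned)
gcloud compute ssh verify-vm --zone us-central1-a --command "sudo mount -o ro /dev/disk/by-id/google-verify-disk /mnt && sudo ls /mnt"
# clean up
gcloud compute instances delete verify-vm --zone us-central1-a --quiet
gcloud compute disks delete verify-disk --zone us-central1-a --quiet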
https://cloud.google.com/compute/docs/disks/persistent-disks#creating_snapshots has recently been updated and now includes this new paragraph:
If you skip this step, only data which was successfully flushed to disk by the application will be included in the snapshot. The application experiences this scenario as if it was a sudden power outage.
So the answers to my original questions are:
Yes
Yes
N/A, since the answer to ② is Yes.

Write-ahead-logging (WAL) used in Postgres

I want to do some work with write-ahead logging (WAL) in Postgres. Could anyone point me to the WAL implementation in the Postgres codebase? I just want to understand the current implementation and start modifying it. Any version of Postgres is fine as long as it has WAL.
Thanks in advance.
The main part of the code is here:
src/backend/access/transam/xlog.c
And:
src/backend/access/transam/README
But of course the need to do WAL permeates the entire code base.
You have picked perhaps the most difficult possible starting point to get your feet wet. (I should know--that is also how I did it).
WAL is write-ahead logging. Basically, before the database actually performs an operation, it writes in a log what it's about to do. Then it goes and does it. This ensures data consistency. Let's say that the computer was powered off suddenly. There are several points at which that could happen:
1) Before a write - in this case the database would be fine with or without write-ahead logging.
2) During a write - without write-ahead logging, if the machine is powered off during a write, the database has no way of knowing what remained to be written, or what was being written. With Postgres, this is further broken down into two possibilities:
The power-off occurred while it was writing to the log - in this case, the log is rolled back. The database is unaffected because the data was never written to the database proper.
The power-off occurred after writing to the log, while writing to disk - in this case, Postgres can simply read from the log what was supposed to be written, and complete the write.
3) After a write - again, this does not affect Postgres either with or without WAL.
In addition, WAL increases PostgreSQL's efficiency, because it can delay random-access writes to disk and just do sequential writes to the log for a long time. This reduces the amount of head-seeking the disks are doing. If you store your WAL files on a different disk, you get even more speed advantages.
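As a practical note on that last point, moving the WAL directory onto another disk is usually done with a symlink; a rough sketch only (the paths assume a Debian-style install, and the directory is pg_wal on PostgreSQL 10+, pg_xlog on older versions):

sudo systemctl stop postgresql
# relocate the WAL directory and leave a symlink in its place
sudo mv /var/lib/postgresql/14/main/pg_wal /mnt/waldisk/pg_wal
sudo ln -s /mnt/waldisk/pg_wal /var/lib/postgresql/14/main/pg_wal
sudo chown -h postgres:postgres /var/lib/postgresql/14/main/pg_wal
sudo systemctl start postgresql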

SQL Server 2008 R2 log files filled up the drive

So some of us devs are starting to take over the management of some of our SQL Server boxes as we upgrade to SQL Server 2008 R2. In the past, we've manually reduced the log file sizes by using
USE [databaseName]
GO
DBCC SHRINKFILE('databaseName_log', 1)
BACKUP LOG databaseName WITH TRUNCATE_ONLY
DBCC SHRINKFILE('databaseName_log', 1)
and I'm sure you all know that TRUNCATE_ONLY has been deprecated.
So the solution I've found so far is setting the recovery model to SIMPLE, then shrinking, then setting it back... however, this one got away from us before we could get there.
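For reference, that recovery-model approach scripted from the command line would look roughly like this (server name and authentication are placeholders; note that switching to SIMPLE breaks the log backup chain and is not allowed while a database is being mirrored):

sqlcmd -S . -E -Q "ALTER DATABASE [databaseName] SET RECOVERY SIMPLE"
sqlcmd -S . -E -d databaseName -Q "DBCC SHRINKFILE('databaseName_log', 1)"
sqlcmd -S . -E -Q "ALTER DATABASE [databaseName] SET RECOVERY FULL"
# take a full backup afterwards to restart the log backup chain (path is a placeholder)
sqlcmd -S . -E -Q "BACKUP DATABASE [databaseName] TO DISK = 'D:\backups\databaseName.bak'"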
Now we've got a full disk, and the mirroring that is going on is stuck in a half-completed, constantly erroring state where we can't alter any databases. We can't even open half of them in object explorer.
So from reading about it, the way to prevent this happening in the future is to have a maintenance plan set up. (whoops. :/ ) But while we can create one, we can't run it with no disk space and SQL Server stuck in its erroring state (Event Viewer shows it recording about 5 errors per second... this has been going on since last night.)
Anyone have any experience with this?
So you've kind of got a perfect storm of bad circumstances here in that you've already reached the point where SQL Server is unable to start. Normally at this point it's necessary to detach a database and move it to free space, but if you're unable to do that you're going to have to start breaking things and rebuilding.
If you have a mirror and a backup that is up to date, you're going to need to blast one unlucky database on the disk to get the instance back online. Once you have enough space, then take emergency measures to break any mirrors necessary to get the log files back to a manageable size and get them shrunk.
The above is very much emergency recovery and you've got to triple check that you have backups, transaction log backups, and logs anywhere you can so you don't lose any data.
Long term, to manage the mirror you need to make sure that your mirrors are remaining synchronized, that full and transaction log backups are being taken, and potentially reconfigure each database on the instance with a maximum file size so that the sum of all log files does not exceed the available volume space.
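Capping a log file's maximum size, for example, is a one-liner per database (the logical file name, size and connection details are placeholders):

sqlcmd -S . -E -Q "ALTER DATABASE [databaseName] MODIFY FILE (NAME = N'databaseName_log', MAXSIZE = 10GB)"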
Also, I would double check that your system databases are not on the same volume as your database data and log files. That should help with being able to start the instance when you have a full volume somewhere.
Bear in mind, if you are having to shrink your log files on a regular basis then there's already a problem that needs to be addressed.
Update: If everything is on the C: drive then consider reducing the size of the page file to get enough space to bring the instance online. Not sure what your setup is here.

Database for long running transactions with huge updates

I am building a tool for data extraction and transformation. The typical use case is transactionally processing lots of data.
The numbers: about 10 sec to 5 min duration, 200-10,000 rows updated (the long duration is caused not by the database itself but by outside services used during the transaction).
There are two types of agents that access the database - multiple read agents, and only one write agent (so there are never multiple concurrent writes).
During the transaction:
Read agents should be able to read the database and see it in its current state.
The write agent should be able to read the database (it does both read and write during the transaction) and see it in the new (not yet committed) state.
Is PostgreSQL a good choice for that type of load? I know it uses MVCC - so it should be ok in general, but is it ok to use long and big transactions extensively?
What other open-source transactional databases may be a good choice (I am not limited to SQL)?
P.S.
I do not know whether sharding may affect performance. The database will be sharded. For every shard there will be multiple readers and only one writer, but multiple different shards can be written to at the same time.
I know that it's better not to use outside services during a transaction, but in this case it's the goal. The database is used as a reliable and consistent index for a heavy, huge, slow and eventually-consistent data processing tool.
Huge disclaimer: as always, only a real-life test can tell you the truth.
But I think PostgreSQL will not let you down, if you use a recent version (at least 9.1, better 9.2) and tune it properly.
I have a somewhat similar load on my server, but with a slightly worse R/W ratio: about 10:1. Transactions range from a few milliseconds up to 1 hour (and sometimes even more), and one transaction can insert or update up to 100k rows. The total number of concurrent writers with long transactions can reach 10 or more.
So far so good - I don't really have any serious issues, performance is great (certainly not worse than I expected).
What really helps is that my hot working data set almost fits into available memory.
So, give it a try, it should work great for your load.
Have a look at this link: Maximum transaction size in PostgreSQL
Basically there can be some technical limits on the software side to how large your transaction can be.