I am trying to understand the implementation of WAL in Postgres 9.3.5. In the xlog.c file there is a parameter, XLOG_SWITCH, which I don't understand. I googled it but didn't find any useful information. Could anyone explain its purpose?
Changes to the database are recorded in xlog files, which are 16MB each by default, principally for crash recovery or to feed a hot standby server. Normally the server has to fill a file with records from commands such as CREATE TABLE, INSERT INTO, etc. before it moves on to the next one. There are reasons to switch to a new log file before the current 16MB is full: for example, you don't want to wait before shipping the current xlog file to a standby server. There are also reasons to increase that size: for example, you consider 16MB too small because the transactions in your database generate a large number of xlog files. How often you switch xlog files depends on the amount of data you are willing to lose.
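To make the effect of an early switch concrete, here is a minimal sketch, assuming the default 16MB segment size and treating a log position as a plain byte offset. The function names are made up for illustration and are not PostgreSQL APIs; the sketch only shows that after a switch, writing resumes at the next segment boundary, so the rest of the current file is left unused.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default xlog segment size, assumed here


def segment_number(position, segment_size=SEGMENT_SIZE):
    """Return the index of the segment file that contains this byte position."""
    return position // segment_size


def position_after_switch(position, segment_size=SEGMENT_SIZE):
    """Return where writing resumes if a switch is forced at `position`.

    Simplification: a position that already sits exactly on a segment
    boundary is left unchanged, since there is nothing to skip.
    """
    remainder = position % segment_size
    if remainder == 0:
        return position
    return position + (segment_size - remainder)


def unused_bytes(position, segment_size=SEGMENT_SIZE):
    """Bytes of the current segment left unwritten by a switch at `position`."""
    return position_after_switch(position, segment_size) - position
```

For example, forcing a switch 1MB into a segment leaves the remaining 15MB of that file unused, which is the price paid for handing the file over to the standby sooner.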
I'm new to InterSystems Caché. We have an old system that uses a Caché database. We want to first extract and transform all of its data into a different database such as PostgreSQL, and then monitor all modifications to the original Caché data so that we can apply them (as inserts or updates) to the transformed data in PostgreSQL in a timely manner.
Is there any way we can monitor all data modifications in Caché?
Does Caché have a modification/replication log like MongoDB's oplog?
Any idea would be appreciated, thanks!
In short, yes, there is a way to monitor data modifications. InterSystems uses journaling for several reasons, mostly related to keeping data consistent: journals are used to roll back transactions, to restore data after an unexpected shutdown, for backups, for mirroring, and so on. Depending on the situation, a journal may contain more or fewer records.
I think it may help in your situation. Most older Caché applications do not use objects or SQL tables and work directly with globals. Journals contain only globals, so if your Caché application already uses objects and tables, you need to know where and how it stores their data in globals. With the Journal API you will be able to read every change to the data. All changes are there, and if the application uses transactions you will see flags for those as well. Each record changes only one value, and is either a set or a kill.
Also be aware that Caché purges old journal files according to its settings, so you will have to increase the number of days after which they are purged.
In my PostgreSQL database I unfortunately truncated the table mail_group, and its data is gone from the database. How do I get this table back?
Kindly help me; I'm waiting for a reply.
Thanks
Anyone else in the same situation: immediately stop your database with pg_ctl stop -m immediate (the "immediate" is important: you need to simulate a crash and prevent a checkpoint), then do not restart it. If you had concurrent transactions still in progress, you might be really lucky and PostgreSQL might not have unlinked the backing files for the table yet, so it could maybe be recoverable.
You very likely can't get the data back, you deleted it. Restore from a backup.
A normal DELETE in PostgreSQL marks the rows as deleted but does not actually erase the data immediately, so it can often be recovered if you promptly stop the database and you don't write anything else to the table.
This is not the case for TRUNCATE. TRUNCATE deletes the underlying files that represent the database table from the file system.
Recovering the data, if possible at all, would require forensic analysis of your hard drive. If the data is truly important, then power the computer off now and take a disk image of the hard drive. Expect recovery work to cost multiple thousand dollars, if it is possible at all, since you will need someone who knows both (a) file system internals and (b) PostgreSQL internals. The only person I can think of who I know has the skills to possibly be able to do this would probably cost about €5000 to €10000 for the time required for this sort of work. (It isn't me).
If you didn't have backups you have just learned a very expensive lesson.
If someone else is reading this and DELETEd rows, please immediately follow the instructions in corruption since the first recovery steps are the same. This will not help if you ran TRUNCATE.
I want to do some work with write-ahead logging (WAL) in Postgres. Could anyone point me to the WAL implementation in the Postgres codebase? I just want to understand the current implementation and start modifying it. Any version of Postgres is fine as long as it has WAL.
Thanks in advance.
The main part of the code is here:
src/backend/access/transam/xlog.c
And:
src/backend/access/transam/README
But of course the need to do WAL permeates the entire code base.
You have picked perhaps the most difficult possible starting point to get your feet wet. (I should know--that is also how I did it).
WAL is write-ahead logging. Basically, before the database actually
performs an operation, it writes in a log what it's about to do. Then, it
goes and does it. This ensures data consistency. Let's say that the
computer was powered off suddenly. There are several points at which that
could happen:
1) before a write - in this case the database would be fine with or
without write-ahead logging.
2) during a write - without write-ahead logging, if the machine is powered
off during a write, the database has no way of knowing what remained to be
written, or what was being written. With Postgres, this is further
broken down into two possibilities:
The power-off occurred while it was writing to the log - in this
case, the log is rolled back. The database is unaffected because the data
was never written to the database proper.
The power-off occurred after writing to the log, while writing to
disk - in this case, Postgres can simply read from the log what was
supposed to be written, and complete the write.
3) after a write - again, this does not affect Postgres either with or
without WAL.
In addition, WAL increases PostgreSQL's efficiency, because it can delay
random-access writes to disk, and just do sequential writes to the log for
a long time. This reduces the amount of head-seeking the disks are doing.
If you store your WAL files on a different disk, you get even more speed
advantages.
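The "log first, then write" ordering described above can be sketched with a toy key-value store. This is only an illustration of the idea, not how PostgreSQL implements WAL: the class and method names are invented, and two in-memory structures stand in for the log file and the data files on disk.

```python
class ToyStore:
    """Toy key-value store with a write-ahead log (not PostgreSQL's design)."""

    def __init__(self):
        self.log = []    # stands in for the sequential, durable log file
        self.data = {}   # stands in for the data files on disk

    def log_write(self, key, value):
        """Step 1: record what is about to be done."""
        self.log.append((key, value))

    def apply(self, key, value):
        """Step 2: perform the change on the data files."""
        self.data[key] = value

    def write(self, key, value):
        """A normal write: always log first, then apply."""
        self.log_write(key, value)
        self.apply(key, value)

    def recover(self):
        """After a crash, replay the log so every logged write is completed."""
        for key, value in self.log:
            self.data[key] = value


store = ToyStore()
store.write("a", 1)
# Simulate a power-off after the log write but before the data write.
store.log_write("b", 2)
store.recover()
print(store.data)  # {'a': 1, 'b': 2}
```

Replaying the log is harmless for writes that had already reached the data files, because setting a key to the same value twice leaves the same result.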
I need to track any changes to data in a PostgreSQL database. Is there any option in the database, or any script, to view the changed data and the DML as well?
Sorry - I have no clue. But I do have some different suggestions:
Log /all/ queries and grep for those involving update, delete, insert, alter table etc. Caveats: may cause performance problems if there are lots of queries and the log is on the same RAID as data and/or WAL. Not sure if it's easy to make some regexp that is 100% certain to catch all modifying statements. May be difficult to catch rollbacks etc. To log everything, add this to the configuration file: log_min_duration_statement = 0. Have a look that the other log_* configuration variables are sane as well.
The rules/trigger approach (as hinted by other user) - I believe it involves writing up rules for each and every table - but it's of course doable (and should be possible to create the rules through some external script if you have a lot of tables). You may also look a bit into how slony works - slony is a trigger-based replication system, should be possible to use it to catch all the changes in the DB.
All changes to the database end up in the WAL files. Maybe it's theoretically possible to extract something from the WAL, but I suspect that's not practical unless you're already a skilled postgres hacker ... and if you're a skilled postgres hacker, you probably wouldn't ask this question in the first place ;-) (Alternatively, the WALs may be used to see the rate of changes in the data and spot times of the day when there are more updates than otherwise etc. They may also be used for replication and roll-forward from a binary backup.)
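The first suggestion above (log everything and grep for modifying statements) could look roughly like the following sketch. It assumes each logged query appears on a line containing "statement: " followed by the SQL text; the keyword list and the sample lines are made up for illustration. As the caveat above says, a regexp like this is not guaranteed to catch every modifying statement (for example, one that begins with WITH would be missed).

```python
import re

# Keywords that start a data- or schema-modifying statement. This list is an
# assumption and is not exhaustive.
MODIFYING = re.compile(
    r"\bstatement:\s*(insert|update|delete|truncate|alter|create|drop)\b",
    re.IGNORECASE,
)


def modifying_lines(lines):
    """Return only the log lines whose statement starts with a modifying keyword."""
    return [line for line in lines if MODIFYING.search(line)]


# Hypothetical log lines, for illustration only.
sample = [
    "LOG:  duration: 0.210 ms  statement: SELECT * FROM t",
    "LOG:  duration: 1.532 ms  statement: UPDATE t SET name = 'x'",
    "LOG:  duration: 0.871 ms  statement: insert into t values (1)",
]

for line in modifying_lines(sample):
    print(line)
```

On a real server you would pass the lines of the log file instead of the sample list, and adjust the pattern to whatever log_line_prefix you have configured.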
Besides setting log_statement='all' in postgresql.conf, you can also use tablelog to capture old data.
How can I compact a Firebird 2.1 database, as we do in MS Access (discarding erased data, rebuilding indexes, etc.)?
Is there a way to do it?
Thanks!
Usually there is no need to compact a Firebird database: see the Firebird release notes about garbage collection and the automatic (per-database configurable) operation named "sweep".
In short, Firebird reuses space in pages when records are deleted or old record versions are freed, and asks for new chunks of disk space only when free space becomes too small (i.e. below a defined percentage).
By default, sweep is performed after a predefined number of committed transactions, but it is an expensive task.
Backup and restore should be treated as a last resort for optimizing and shrinking the database. It also rebuilds and optimizes indexes, but this is usually not needed because there are commands and tools to rebuild indexes.
The only way to do it is to make a backup and a restore.
From the official faq
Many users wonder why they don't get their disk space back when they
delete a lot of records from database.
The reason is that it is an expensive operation, it would require a
lot of disk writes and memory - just like doing refragmentation of
hard disk partition. The parts of database (pages) that were used by
such data are marked as empty and Firebird will reuse them next time
it needs to write new data.
If disk space is critical for you, you can get the space back by
doing backup and then restore. Since you're doing the backup to
restore right away, it's wise to use the "inhibit garbage collection"
or "don't use garbage collection" switch (-G in gbak), which will make
backup go A LOT FASTER. Garbage collection is used to clean up your
database, and as it is a maintenance task, it's often done together
with backup (as backup has to go through entire database anyway).
However, you're soon going to ditch that database file, and there's no
need to clean it up.
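As a rough sketch of the backup-then-restore cycle the FAQ describes, the two gbak invocations could be assembled like this. The helper names, file names and credentials are placeholders, not part of Firebird; the switches used are -b (back up), -g (inhibit garbage collection, the switch the FAQ writes as -G) and -c (create a new database from a backup). The functions only build the argument lists and do not run anything.

```python
def gbak_backup_cmd(database, backup_file, user, password):
    """Build the gbak command that backs up `database` without garbage collection."""
    return [
        "gbak", "-b", "-g",          # -b: back up, -g: inhibit garbage collection
        "-user", user, "-password", password,
        database, backup_file,
    ]


def gbak_restore_cmd(backup_file, new_database, user, password):
    """Build the gbak command that restores `backup_file` into a new database file."""
    return [
        "gbak", "-c",                # -c: create a new database from the backup
        "-user", user, "-password", password,
        backup_file, new_database,
    ]


# Placeholder names and credentials, for illustration only.
backup = gbak_backup_cmd("old.fdb", "old.fbk", "SYSDBA", "secret")
restore = gbak_restore_cmd("old.fbk", "compacted.fdb", "SYSDBA", "secret")
print(" ".join(backup))
print(" ".join(restore))
```

Each list can be passed to subprocess.run(cmd, check=True). Restoring into a new file name, and replacing the original only after the restore has succeeded, avoids losing the database if the restore fails.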