Using HADR Standby as replication source - db2

Trying to figure out if there's way to replicate a subset of tables (columns) from HADR Standby, with ROS enabled. Latency of O(10) sec can be tolerated. We are using luw V10.5FP8 right now and will upgrade to V11 at some point.
I understand the read-only limitation on HADR Standby, and that eliminates some options, eg. QRepl / infosphere CDC, which write monitoring/metadata back to source.
Furthermore, assuming we still limit ourselves to db2 heterogeneous env and the repl user have read access to all the tables/columns, is there a replication tool that doesn't depends on constant source db connection? Meaning such tool only scans transaction log files, and writes to its own external metadata/file, without bothering with source db at all? It would be even better if it can capture once, and replay in multiple targets.
Really appreciate your inputs.

Related

Can 2 Debezium Connectors read from same source at the same time?

As the title says, I have 2 seperate servers and I want both connectors to read from same source to write to their respective topic. A single connector works well. When I create another one in a different server they seem to be running but no data flow occurs for both.
My question is, is that possible to run 2 debezium connectors that read from same source? I couldn't find any information about this topic in documentation.
Edit: I've tested it with oracle database and never seen it's working well. Definitely wouldn't recommend using it especially in oracle.
So generally speaking, Debezium does not recommend that you use multiple connectors per database source and prefer that you adjust your connector configuration instead. We understand that isn't always the case when you have different business use cases at play.
That said, it's important that if you do deploy multiple connectors you properly configure each connector so that it doesn't share state such as the same database history topic, etc.
For certain database platforms, having multiple source connectors really doesn't apply any real burden to the database, such as MySQL. But other databases like Oracle, running multiple connectors can have a pretty substantial impact.
When an Oracle connector streams changes, it starts an Oracle LogMIner mining session. This session is responsible for loading, reading, parsing, and preparing the contents of the data read in a special in-memory table that the connector uses to generate change events. When you run multiple connectors, you will have concurrent Oracle LogMiner sessions happening and each session will be consuming its own share of PGA memory to support the steps taken by Oracle LogMiner. Depending on your database's volatility, this can be stressful on the database server since Oracle specifically assigns one LogMiner session to a CPU.
For an Oracle environment, I highly recommend you avoid using multiple connectors unless you are needing to stream changes from different PDBs within the same instance since there is really no technical reason why you should want to read, load, parse, and generate change data for the same redo entries multiple times, once per connector deployment.

I have a postgresql setup with 1 master and 2 slaves. Now I want to setup a replication for only one database to reporting server

My problem statements: I can't use bucardo (no approval for creating triggers on prod tables) pglogical only works on Master node.(the publisher). I want to replicate from slave (this is my end goal)
If you have v10 or better, you should definitely use logical replication. The burden on the primary server is minimal.
If for whatever incomprehensible reason you absolutely have to replicate from a standby, you'll have to use streaming replication. Then you have to replicate the whole cluster, but there is no better solution.
One way out could be to put walbouncer in between, which can filter out WAL records for all but one database, so that you ate effectively replicating just one database.
Disclaimer: walbouncer was developed by my company (it is open source though).

PostgreSQL streaming replication for high load

I am planning to migrate my production oracle cluster to postgresql cluster. Current systems support 2000TPS and in order to support that TPS, I would be very thankful if someone could clarify bellow.
1) What is the best replication strategy ( Streaming or DRBD based replication)
2) In streaming replication, can master process traffic without slave and when slave come up does it get what lost in down time ?
About TPS - it depends mainly on your HW and PostgreSQL configuration. I already wrote about it on Stackoverflow in this answer. You cannot expect magic on some notebook-like configuration. Here is my text "PostgreSQL and high data load application".
1) Streaming replication is the simplest and almost "painless" solution. So if you want to start quickly I highly recommend it.
2) Yes but you have to archive WAL logs. See below.
All this being said here are links I would recommend you to read:
how to set streaming replication
example of WAL log archiving script
But of course streaming replication has some caveats which you should know:
problem with increasing some parameters like max_connections
how to add new disk and new tablespace to master and replicas
There is no “best solution” in this case. Which solution you pick depends on your requirements.
Do you need a guarantee of no data lost?
How big a performance hit can you tolerate?
Do you need failover or just a backup?
Do you need PITR (Point In Time Recovery)?
By default, I think a failed slave will be ignored. Depending on your configuration, the slave might take a long time to recover after e.g. a boot.
I'd recommend you read https://www.postgresql.org/docs/10/static/different-replication-solutions.html

Postgresql DB backup Ideal practices

• What are ideal practices for taking PostgreSQL logical backup using pg_dump?
• Is it ideal to take backup from a standby/slave node? If replication lag is less than 200ms
• Is it ideal to take backup from standby/slave node, and is there any specific configuration we need to change?
• Which method is a good way for taking backups logical backup or physical backup? where DB is getting updated frequently. As a backup is taken for disaster recovery which method is the faster and better backup and disaster recovery(restore).
updated
Our current database size is 5GB and replication is on hot standby mode.
We are running the Backup script on slave node but it takes remote backup from the master node every 30 minutes.
The reason I created this question is to understand when the backup is running some COPY statements takes 6 mins to complete, even though it will not affect other transactions on DB, is there any other issues occurs if a statement is taking more time.
I thought about what you wrote and here are some ideas for you:
If you need backup which will really be consistent to some point in time then you must use pg_basebackup or pg_barman (internally uses pg_basebackup) - explanation is in 1. link below. Latest pg_basebackup 10 streams WAL logs so you backup also all changes done during backup. Of course this backup takes only the whole PG instance. On the other hand it does not lock any table. And if you do it from remote instance then it causes only small CPU load on PG instance and disk IO is not as big as some texts suggests. See links 4 about my experiences. Restoration is quite simple - see link 5.
If you use pg_dump you must understand that you have no guarantee that your backup is really consistent to the point in time - again see link 1. There is a possibility to use snapshot of the database (see links 2 and 3) but even with it you cannot count on 100% consistency. We used pg_dump only on our analytical database which loads new only 1x per day (yesterdays partitions from production database). You can speed it with parallel option (works only for directory backup format). But downside is much higher load on PG instance - higher CPU usage, much higher disk IO. Even if you run pg_dump remotely - in such case you save only disk IO for saving of backup files. Plus pg_dump needs to place read lock on tables so it can collied either with new inserts or with replication (when taken on replica). But when your database reaches hundreds of GBs then even parallel dump can takes hours and in that moment you would need to switch to pg_basebackup anyway.
pg_barman is "comfortable version" of pg_basebackup + it allows you to prevent data loss even when your PG instance crashes very badly. Setting it to work requires more changes but it is definitely worth it. You will have to set WAL log archiving (see link 6) and if you PG is <10 you will have to set "max_wal_senders" and "max_replication_slots" (which you need for replication anyway) - everything is in pg-barman manual although description is not exactly great. pg_barman will stream and store WAL records even between backups so this way you can be sure that data loss in case of very bad crash will be almost none. But making it work can take many hours because descriptions are not exactly good. pg-barman does both backup and restoration with its commands.
Your database is 5GB big so any backup method will be quick. But you have to decide if you need point in time recovery and almost zero data loss or not - so if you will invest time to setting pg-barman or not.
Links:
PostgreSQL, Backups and everything you need to know
Review for Paper: 14-Serializable Snapshot Isolation in PostgreSQL - about snapshots
Parallel dumping of databases - example how to use snapshot
pg_basebackup experiencies
pg_basebackup - restore tar backup
Archiving WAL logs using script

How safe is MongoDB's safe mode on inserts?

I am working on a project which has some important data in it. This means we cannot to lose any of it if the light or server goes down. We are using MongoDB for the database. I'd like to be sure that my data is in the database after the insert and rollback the whole batch if one element was not inserted. I know it is the philosophy behind Mongo that we do not need transactions but how can I make sure that my data is really safely stored after insert rather than sent to some "black hole".
Should I make a search?
Should I use some specific mongoDB commands?
Should I use sharding even if one server is enough for satisfying
the speed and by the way it doesn't guarantee anything if the light
goes down?
What is the best solution?
Your best bet is to use Write Concerns - these allow you to tell MongoDB how important a piece of data is. The quickest Write Concern is also the least safe - the data is not flushed to disk until the next scheduled flush. The safest will confirm that the data has been written to disk on a number of machines before returning.
The write concern you are looking for is FSYNC_SAFE (at least that is what it is called from the point of view of the Java driver) or REPLICAS_SAFE which confirms that your data has been replicated.
Bear in mind that MongoDB does not have transactions in the traditional sense - your rollback will have to be rolled by hand as you can't tell the Mongo database to do this for you.
The other thing you need to do is either use the relatively new --journal option (which uses a Write Ahead Log), or use replica sets to share your data across many machines in order to maximise data integrity in the event of a crash/power loss.
Sharding is not so much a protection against hardware failure as a method for sharing the load when dealing with particularly large datasets - sharding shouldn't be confused with replica sets which is a way of writing data to more than one disk on more than one machine.
Therefore, if your data is valuable enough, you should definitely be using replica sets, perhaps even siting slaves in other data centres/availability zones/racks/etc in order to provide the resilience you require.
There is/will be (can't remember offhand whether this has been implemented yet) a way to specify the priority of individual nodes in a replica set such that if the master goes down the new master that is elected is one in the same data centre if such a machine is available (ie to stop a slave on the other side of the country from becoming master unless it really is the only other option).
I received a really nice answer from a person called GVP on google groups. I will quote it(basically it adds up to Rich's answer):
I'd like to be sure that my data is in the database after the
insert and rollback the whole batch if one element was not inserted.
This is a complex topic and there are several trade-offs you have to
consider here.
Should I use sharding?
Sharding is for scaling writes. For data safety, you want to look a
replica sets.
Should I use some specific mongoDB commands?
First thing to consider is "safe" mode or "getLastError()" as
indicated by Andreas. If you issue a "safe" write, you know that the
database has received the insert and applied the write. However,
MongoDB only flushes to disk every 60 seconds, so the server can fail
without the data on disk.
Second thing to consider is "journaling"
(v1.8+). With journaling turned on, data is flushed to the journal
every 100ms. So you have a smaller window of time before failure. The
drivers have an "fsync" option (check that name) that goes one step
further than "safe", it waits for acknowledgement that the data has
be flushed to the disk (i.e. the journal file). However, this only
covers one server. What happens if the hard drive on the server just
dies? Well you need a second copy.
Third thing to consider is
replication. The drivers support a "W" parameter that says "replicate
this data to N nodes" before returning. If the write does not reach
"N" nodes before a certain timeout, then the write fails (exception
is thrown). However, you have to configure "W" correctly based on the
number of nodes in your replica set. Again, because a hard drive
could fail, even with journaling, you'll want to look at replication.
Then there's replication across data centers which is too long to get
into here. The last thing to consider is your requirement to "roll
back". From my understanding, MongoDB does not have this "roll back"
capacity. If you're doing a batch insert the best you'll get is an
indication of which elements failed.
Here's a link to the PHP driver on this one: http://it.php.net/manual/en/mongocollection.batchinsert.php You'll have to check the details on replication and the W parameter. I believe the same limitations apply here.