TimescaleDB multi-node cluster & replication factor - PostgreSQL

I have been working with the TimescaleDB multi-node concept. In my case I have one access node and four data nodes. When I created the distributed hypertable I set the replication factor to four.
Adding a data node:
SELECT add_data_node('dn1', host => '192.168.5.119', port => 5432, password => 'pass');
CREATE USER MAPPING FOR bennison SERVER dn1 OPTIONS (user 'bennison', password 'pass');
For example, the first data node is named dn1, the second dn2, and the third and fourth are dn3 and dn4.
SELECT create_distributed_hypertable('sensor_data', 'time', 'sensor_id', replication_factor => 4);
As I have set the replication factor to four, each chunk is replicated to all four data nodes. I have also run timescaledb-tune on every data node.
After all the configuration, I ran an insert query and it works fine.
INSERT INTO sensor_data VALUES ('2020-12-09',1,32.2,0.45);
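For context, the insert above implies a table definition roughly like the following. The original schema isn't shown, so the column names beyond time and sensor_id are guesses:

```sql
-- Assumed schema for sensor_data; "temperature" and "humidity" are
-- placeholder names matching the four values in the INSERT above.
CREATE TABLE sensor_data (
    time        TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION
);
```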
If I bring down data node two or three, I can still run read-only queries (SELECT) on the access node and get a response back.
select * from sensor_data;
The thing is, if I bring down data node one (dn1), I cannot run any query, not even a read-only one. The error looks like:
test=# select * from sensor_data ;
ERROR: [dn1]: FATAL: terminating connection due to administrator command
SSL connection has been closed unexpectedly
test=#
I want to know why this error happens when the first data node is down, but not when any other data node is down.
Is it possible to run queries successfully while one of the data nodes is down? If not, what is the solution? Is there another way to implement a cluster in TimescaleDB?

Related

Does PL/Proxy send query to replica if master is not available?

I have a sharded and replicated Postgres database. I use the CLUSTER + RUN ON way of running functions: I define the target (master/replica) using the CLUSTER param and the shard using RUN ON. How can I make PL/Proxy run, or not run, a function on the master if the original target was the replica but it failed?
In PL/Proxy, you define shards via a libpq connection string. Now if a shard is replicated, you can simply use a connection string like
host=slave.myorg.com,master.myorg.com port=5432,5432 dbname=...
This will try to connect to the first host and, if that fails, fall back to the second host. PostgreSQL v14 adds the connection string parameter target_session_attrs=prefer-standby to preferentially connect to the standby server, even if it is not the first in the list.
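Putting the two together, a v14-style connection string that prefers the standby but falls back to the primary could look like this (host names are placeholders from the example above):

```
host=slave.myorg.com,master.myorg.com port=5432,5432 dbname=mydb target_session_attrs=prefer-standby
```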

Postgres Logical Replication disaster recovery

We are looking to use Postgres Logical Replication to move changes from an upstream server ("source" server) to a downstream server ("sink" server).
We run into issues when we simulate a disaster recovery scenario. In order to simulate this, we delete the source database while the replication is still active. We then bring up a new source database and try to: a) move data from the sink into the source, and b) set up replication. At this stage we get one of two errors, depending on when we set up the replication (before or after moving the data).
The errors we get after testing the above are one of the below:
Replication slot already in use, difficulty in re-enabling slot without deletion
LOG: logical replication apply worker for subscription "test_sub" has started
ERROR: could not start WAL streaming: ERROR: replication slot "test_sub" does not exist
LOG: worker process: logical replication worker for subscription 16467 (PID 205) exited with exit code 1
Tried amending using:
ALTER SUBSCRIPTION "test_sub" disable;
ALTER SUBSCRIPTION "test_sub" SET (slot_name = NONE);
DROP SUBSCRIPTION "test_sub";
Cannot create subscription due to PK conflicts
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(701) already exists.
CONTEXT: COPY test, line 1
Some possible resolutions:
Set up the logical replication starting from a given WAL position (LSN); this might avoid the PK issues we are facing
Find a way to recreate the replication slot on the source database
Back up the Postgres server, including the replication slot, and re-import
Is this a use case that Postgres Logical Replication caters for well? This is a typical disaster recovery scenario, so we would like to know how best to implement it. Thanks!
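On the PK-conflict side, one approach worth sketching is to create the subscription without the initial table copy, since the sink already contains the data. The subscription, publication, and host names below are placeholders, not taken from the original setup:

```sql
-- The sink already holds the rows, so skip the initial COPY that
-- causes the duplicate-key errors. Names here are placeholders.
CREATE SUBSCRIPTION dr_sub
    CONNECTION 'host=new-source.example.com dbname=app'
    PUBLICATION dr_pub
    WITH (copy_data = false);
```

copy_data = false only skips the initial synchronization; changes made on the source after the subscription starts are still streamed, so it assumes the sink is already consistent with the source at that point.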

Choosing schema in the slave host replication set using Slony-I

I am using Slony-I to replicate tables from one server to another. I have two databases on the master that contain exactly the same tables, and I want to replicate them into a single database on the slave. I can create the same tables in different schemas on the slave database; however, I can't choose the schema in the replication set on the slave host.
I want to be able to choose the schema I am replicating to on the slave host.
How can I do this in Slony?
Thank you
Unfortunately you can't choose the schema on the slave host in Slony. The schema name and table name must be identical on both the master and the slave. A workaround is to create another schema on both the master and slave databases and use it in your Slony replication.

bdr_init_copy hangs indefinitely

Fairly new to PostgreSQL, but I have to get replication set up. I settled on BDR, and it works fine in the local demo, but on distributed machines it starts to get problematic, mostly because I have no real clue what the hell I am doing, and I cry myself to sleep pining for MySQL. I've gotten BDR working across multiple servers, almost. When I run:
SELECT bdr.bdr_node_join_wait_for_ready();
on the joining nodes, it hangs. This happens on both DB2 and DB3; DB1 returns a valid response. Researching this, I came across the bdr_init_copy command, which apparently does everything I have been doing by hand, and then some, so I tried that out. Now, when I run:
/usr/lib/postgresql/9.4/bin/bdr_init_copy -d "host=192.168.1.10 dbname=demo3" --local-dbname="host=192.168.1.23 dbname=demo3" -n db2 -D bdr_data
I get
bdr_init_copy: starting ...
Getting remote server identification ...
Detected 2 BDR database(s) on remote server
Updating BDR configuration on the remote node:
demo2: creating replication slot ...
demo2: creating node entry for local node ...
demo3: creating replication slot ...
demo3: creating node entry for local node ...
Creating base backup of the remote node...
63655/63655 kB (100%), 1/1 tablespace
Creating restore point on remote node ...
Bringing local node to the restore point ...
And it sits there. I assume both issues have the same cause. As far as I can tell, no log entries are created on the local node (db2), but the following is present on the remote (db1):
2016-10-12 22:38:43 UTC [20808-1] postgres#demo2 LOG: logical decoding found consistent point at 0/5001F00
2016-10-12 22:38:43 UTC [20808-2] postgres#demo2 DETAIL: There are no running transactions.
2016-10-12 22:38:43 UTC [20808-3] postgres#demo2 STATEMENT: SELECT pg_create_logical_replication_slot('bdr_17163_6340711416785871202_2_17163__', 'bdr');
2016-10-12 22:38:43 UTC [20811-1] postgres#demo3 LOG: logical decoding found consistent point at 0/5002090
2016-10-12 22:38:43 UTC [20811-2] postgres#demo3 DETAIL: There are no running transactions.
2016-10-12 22:38:43 UTC [20811-3] postgres#demo3 STATEMENT: SELECT pg_create_logical_replication_slot('bdr_17939_6340711416785871202_2_17939__', 'bdr');
2016-10-12 22:38:44 UTC [20812-1] postgres#demo3 LOG: restore point "bdr_6340711416785871202" created at 0/50022A8
2016-10-12 22:38:44 UTC [20812-2] postgres#demo3 STATEMENT: SELECT pg_create_restore_point('bdr_6340711416785871202')
Any help out there?
Right, I just had this issue and none of the other forums were any help. Some of them even say things like it is okay for the new node to report its status as "o" while the other nodes report the new server's status as "i", because "this is just a bug and it's fine". It's NOT OKAY: the new server could receive replication updates, but no primary updates were possible on the new server.

The key to solving this problem is to crank up the logging on the server you are joining to (not the new one). In the new server's logs you might see things like 08006: could not receive data from client: Connection reset by peer, which is not very helpful and will have you checking firewalls, etc. The real money shot comes from the source server's logs, when they say something like: no free replication state could be found for 11, increase max_replication_slots

What has probably happened is that you either have too many servers in your cluster for the default settings or, more likely, there is some junk left over from old hosts.
You need to clean things up ON EVERY SERVER IN THE EXISTING CLUSTER (NB!). Start by getting a list of nodes in the existing cluster:
select * from bdr.bdr_nodes order by node_sysid;
THEN, check the following:
select conn_sysid,conn_dboid from bdr.bdr_connections order by conn_sysid;
.. if you see old entries (that don't contain node_sysid from the first query) then delete
eg. delete from bdr.bdr_connections where conn_sysid='<from-first-query>';
select * from pg_replication_slots order by slot_name;
.. if you see old entries that don't contain an active sysid then delete
.. NB, use the function, DO NOT do a "delete from"
eg. select pg_drop_replication_slot('bdr_17213_6574566740899221664_1_17213__');
select * from pg_replication_identifier order by riname;
.. if you see old entries that don't contain an active sysid then delete
.. NB, use the function, DO NOT do a "delete from"
select pg_replication_identifier_drop('bdr_6443767151306784833_1_17210_17213_');
With any luck, after you've done this on every node, you will see your new server's BDR status go to 'r'. As you clean up each host, you should notice that the "08006: could not receive data from client: Connection reset by peer" log entries matching the conn_sysid of the server you've just cleaned up stop appearing. Good luck!

postgres slony-i master node tables can't be written after running for few days

I set up Slony to replicate 3 tables from one openSUSE PC (master node) to another openSUSE PC (slave node). It worked well at first, but after running for a few days it suddenly produced this error:
ERROR: Slony-I: Table euprofiles is replicated and cannot be modified on a subscriber node - role=0
euprofiles is one of the tables being replicated by slony.
I know that this message may occur if you try to write to a table on the slave node. But here I am writing to the master node only.
Does anyone see similar problem?
Never had this happen. Are you certain you're connecting to the database you think you're connecting to? Slony may be a bit difficult to set up and such, but it doesn't just randomly decide a master is now a slave.
If you psql into the two databases and do \d euprofiles on each, what do they say? The source table should have something like this at the end:
Triggers:
_slony_www_logtrigger_228 AFTER INSERT OR DELETE OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE _slony_www.logtrigger('_slony_www', '228', 'kvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv')
and the destination table should have something like this:
Triggers:
_slony_www_denyaccess_228 BEFORE INSERT OR DELETE OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE _slony_www.denyaccess('_slony_www')
If they both look like this last trigger, there's some problem. But I'm betting you're just connecting to the wrong server. Let's hope it's that simple.
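If you'd rather check from SQL than eyeball \d output, a catalog query along these lines lists the Slony triggers on each side (the _slony_www prefix is taken from the example output above; substitute your own cluster name):

```sql
-- Slony origin tables carry a logtrigger; subscriber tables carry
-- a denyaccess trigger. Backslashes escape the LIKE wildcards.
SELECT tgname, tgrelid::regclass AS table_name
FROM pg_trigger
WHERE tgname LIKE '\_slony\_www\_%';
```

Run it on both the master and the slave: if the supposed master shows a denyaccess trigger on euprofiles, you've found the misconfigured (or misconnected) side.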