All distributed queries fail using Citus task tracker executor - postgresql

I'm attempting to performance-test distributed joins on Citus 5.0. I have a master and two worker nodes, and a few hash distributed tables that behave as expected with the default config. I need to use the task tracker executor to test queries that require repartitioning.
However, after setting citus.task_executor_type to task-tracker, all queries involving distributed tables fail. For example:
postgres=# SET citus.task_executor_type TO "task-tracker";
SET
postgres=# SELECT 1 FROM distrib_mcuser_car LIMIT 1;
ERROR: failed to execute job 39
DETAIL: Too many task tracker failures
Setting citus.task_executor_type in postgresql.conf has the same effect.
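For reference, that is just a single line in the master's postgresql.conf:
citus.task_executor_type = 'task-tracker'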
Is there some other configuration change I'm missing that's necessary to switch the task executor?
EDIT, more info:
PostGIS is installed on all nodes
postgres_fdw is installed on the master
All other configuration is pristine
All of the tables so far were distributed like:
SELECT master_create_distributed_table('table_name', 'id', 'hash');
SELECT master_create_worker_shards('table_name', 8, 2);
The schema for distrib_mcuser_car is fairly large, so here's a simpler example:
postgres=# \d+ distrib_test_int
Table "public.distrib_test_int"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
num | integer | | plain | |
postgres=# select * from distrib_test_int;
ERROR: failed to execute job 76
DETAIL: Too many task tracker failures

The task-tracker executor assigns tasks (queries on shards) to a background worker running on the worker node, which connects to localhost to run the task. If your superuser requires a password when connecting to localhost, then the background worker will be unable to connect. This can be resolved by adding a .pgpass file on the worker nodes for connecting to localhost.
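For example, a minimal ~/.pgpass entry for the postgres OS user on each worker node might look like this (the password is a placeholder; the file must have 0600 permissions or libpq will ignore it):
localhost:5432:*:postgres:your_password_here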

You can also modify the authentication settings to let the nodes connect to each other without password checks by changing pg_hba.conf.
Add the following lines to the master's pg_hba.conf:
host all all [worker 1 ip]/32 trust
host all all [worker 2 ip]/32 trust
And the following lines to worker-1's pg_hba.conf:
host all all [master ip]/32 trust
host all all [worker 2 ip]/32 trust
And the following to worker-2's pg_hba.conf:
host all all [master ip]/32 trust
host all all [worker 1 ip]/32 trust
This is only intended for testing; DO NOT USE this on a production system without taking the necessary security precautions.
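Whichever node you change, reload the configuration afterwards so the new pg_hba.conf rules take effect, e.g.:
SELECT pg_reload_conf();
(or run pg_ctl reload as the postgres user).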

Related

PostgreSQL replication slot - how to see standby nodes

I am using PostgreSQL 11.2. I have replication slots set up. I am able to commit to a table and see it on the standby. I have a few more standbys. How can I see from the master what other standbys I have?
By selecting from pg_stat_replication. client_addr will be the IP address of a standby.
You can use the following query and check for client_addr, client_port, and client_hostname.
SELECT * FROM pg_stat_replication;
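For example, to list just the connection details of each standby:
SELECT client_addr, client_hostname, client_port, state, sync_state
FROM pg_stat_replication;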

AWS DMS Error - Unable to use plugins to establish logical replication on source PostgreSQL instance

I'm getting this error trying to replicate a Postgres database (not RDS) to another Postgres database (also not RDS). I get this connection error, but the endpoints (source and target) test successfully. Any ideas?
Error: Last Error Unable to use plugins to establish logical replication on source PostgreSQL instance. Follow all prerequisites for 'PostgreSQL as a source in DMS' from https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.PostgreSQL.html Task error notification received from subtask 0, thread 0 [reptask/replicationtask.c:2880] [1020490] Remaining 1 error messages are truncated. Check the task log to find details Stop Reason FATAL_ERROR Error Level FATAL
I used DMS to reduce over-provisioned RDS storage size.
Set the following values in the DB parameter group for both the source and destination endpoints and restart. This may also help for non-RDS endpoints if you add the same settings to the Postgres configuration.
logical_replication = 1
max_wal_senders = 10
max_replication_slots = 10
wal_sender_timeout = 0
max_worker_processes = 8
max_logical_replication_workers = 8
max_parallel_workers = 8
At a minimum, you need to set logical_replication = 1 in your source database configuration.
Then set max_replication_slots = N, where N is greater than or equal to the number of replication processes you plan to run.
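On a self-managed (non-RDS) source those settings go in postgresql.conf; a minimal sketch, assuming at most four replication processes (note that on plain PostgreSQL the switch is wal_level = logical rather than an rds.* parameter, and changing wal_level requires a restart):
wal_level = logical
max_replication_slots = 4
max_wal_senders = 4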
I had this problem when setting up ongoing replication for an AWS DMS migration task.
I changed the source and target endpoint parameter group settings as below:
session_replication_role = replica
rds.logical_replication = 1
wal_sender_timeout = 0
and kept the remaining settings at their defaults:
max_replication_slots = 20
max_worker_processes = GREATEST(${DBInstanceVCPU*2},8)
max_logical_replication_workers = null
autovacuum_max_workers = GREATEST({DBInstanceClassMemory/64371566592},3)
max_parallel_workers = GREATEST(${DBInstanceVCPU/2},8)
max_connections = LEAST({DBInstanceClassMemory/9531392},5000)
You need to follow all the steps in the guide linked in the error. You need to update pg_hba.conf to allow the DMS instance access, e.g. if the private IP of the DMS instance is in the 10.6.x.x range:
host your_db dms 10.6.0.0/16 md5
host replication all 10.6.0.0/16 md5
Then you'll need to create a dms user and role with superuser privileges.
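A minimal sketch of that role (the name dms and the password are placeholders):
CREATE USER dms WITH SUPERUSER PASSWORD 'change_me';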
Then follow the guide to update postgresql.conf with the wal settings, e.g. wal_level = logical and so on.

PostgreSQL PITR

I have a master/slave setup with pgpool and PostgreSQL 9.5. Both servers are running on CentOS 7.
I wanted to set up point-in-time recovery with base backups every Saturday, eliminating the old xlogs.
The server is archiving the xlogs successfully to an external filesystem.
But when I try to execute the pg_basebackup command, it gives the following error:
pg_basebackup: could not connect to server: FATAL: database "replication" does not exist.
You seem to be missing the explicit HBA record for replication, because specifying all does not cover replication connections:
host replication postgres 127.0.0.1/0 trust
The value replication specifies that the record matches if a replication connection is requested (note that replication connections do not specify any particular database). Otherwise, this is the name of a specific PostgreSQL database.
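With that record in place (and the configuration reloaded), the weekly base backup can then be taken along these lines (the target directory is a placeholder):
pg_basebackup -h 127.0.0.1 -U postgres -D /mnt/backups/base_$(date +%Y%m%d) -X stream -P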

Ambari server doesn't restart after removing node with cloudbreak

After adding a node to test scaling and then removing that node with Cloudbreak, the ambari-server service won't restart.
The error at launch is:
DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.
Looking at the logs doesn't say much more. I tried restarting Postgres; sometimes it works, maybe 1 time in 10 (HOW is that possible?).
I dug deeper rather than just restarting Postgres.
I opened the ambari database to look into it:
sudo su - postgres
psql ambari -U ambari -W -p 5432
(password is bigdata)
and when I looked in the tables topology_logical_request, topology_request and topology_hostgroup, I saw that the cluster never registered a remove request, only an add request:
ambari=> select * from topology_logical_request;
id | request_id | description
----+------------+-----------------------------------------------------------
1 | 1 | Logical Request: Provision Cluster 'sentelab-perf'
62 | 51 | Logical Request: Scale Cluster 'sentelab-perf' (+1 hosts)
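Before deleting anything, the other two topology tables can be inspected the same way to find the ids tied to the scale-up request (a sketch; column layouts vary between Ambari versions):
select * from topology_request;
select * from topology_hostgroup;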
Check the ids to delete (track all requests related to the add-node operation) and begin deleting them (order matters):
DELETE FROM topology_hostgroup WHERE id = 51;
DELETE FROM topology_logical_request WHERE id = 62;
DELETE FROM topology_request WHERE id = 51;
Close with \q, restart ambari-server, and it works!
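The restart itself is the usual service command, e.g.:
sudo ambari-server restart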

Issue with datanodes on postgres-XL cluster

Postgres-XL not working as expected.
I have configured a Postgres-XL cluster as below:
GTM running on node3
GTM_Proxy running on node2 and node1
Co-ordinators and datanodes running on node2 and node1.
When I try to do any operation while connected to the database directly, I get the error below, which is expected anyway.
postgres=# create table test(eno integer);
ERROR: cannot execute CREATE TABLE in a read-only transaction
But when I log in via the co-ordinator, it gives the error below:
postgres=# \l+
ERROR: Could not begin transaction on data node.
In the postgresql.log, I can see the errors below. Any idea what needs to be done?
2016-06-26 20:20:29.786 AEST,"postgres","postgres",3880,"192.168.87.130:45479",576fabb5.f28,1,"SET",2016-06-26 20:17:25 AEST,2/31,0,ERROR,22023,"node ""coord1_3878"" does not exist",,,,,,"SET global_session TO coord1_3878;SET parentPGXCPid TO 3878;",,,"pgxc"
2016-06-26 20:20:47.180 AEST,"postgres","postgres",3895,"192.168.87.131:45802",576fac7d.f37,1,"SELECT",2016-06-26 20:20:45 AEST,3/19,0,LOG,00000,"No nodes altered. Returning",,,,,,"SELECT pgxc_pool_reload();",,,"psql"
2016-06-26 20:21:12.147 AEST,"postgres","postgres",3897,"192.168.87.131:45807",576fac98.f39,1,"SET",2016-06-26 20:21:12 AEST,3/22,0,ERROR,22023,"node ""coord1_3741"" does not exist",,,,,,"SET global_session TO coord1_3741;SET parentPGXCPid TO 3741;",,,"pgxc"
Postgres-XL version - 9.5r1.1
psql (PGXL 9.5r1.1, based on PG 9.5.3 (Postgres-XL 9.5r1.1))
Any idea about this?
It seems like you haven't really configured pgxc_ctl well. Just type
prepare config minimal
at the pgxc_ctl command line, which will generate a general pgxc_ctl.conf file that you can modify accordingly.
You can then follow the official Postgres-XL documentation to add nodes from the pgxc_ctl command line, as John H suggested.
I have managed to fix my issue:
1) Used the source from the git repository, XL9_5_STABLE branch (https://git.postgresql.org/gitweb/?p=postgres-xl.git;a=summary). The source tarball they provide at http://www.postgres-xl.org/download/ did not work for me.
2) Used pgxc_ctl as mentioned above. I was getting "Could not obtain a transaction ID from GTM" because, when adding the GTM, I had used localhost instead of the IP:
add gtm master gtm localhost 20001 $dataDirRoot/gtm
instead of
add gtm master gtm 10.222.1.49 20001 $dataDirRoot/gtm