fence_scsi is failing on Linux guests on VMware - Red Hat

I have a cluster with two nodes. I have checked on both nodes that the disk supports persistent reservations:
[root@d-clus1 ~]# /usr/bin/sg_persist -n -i -r -d /dev/sdb
PR generation=0x1, Reservation follows:
Key=0xec290000
scope: LU_SCOPE, type: Write Exclusive, registrants only
[root@d-clus2 ~]# /usr/bin/sg_persist -n -i -r -d /dev/sdb
PR generation=0x1, Reservation follows:
Key=0xec290001
scope: LU_SCOPE, type: Write Exclusive, registrants only
I have configured the SCSI stonith device:
[root@d-clus1 ~]# pcs stonith create scsi fence_scsi pcmk_host_list="d-clus1 d-clus2" pcmk_reboot_action="off" pcmk_monitor_action="metadata" devices="/dev/sdb" meta provides="unfencing" --force
It starts successfully; however, each node sees only its own key:
[root@d-clus1 ~]# /usr/bin/sg_persist -n -i -k -d /dev/sdb
PR generation=0x1, 1 registered reservation key follows:
0xec290000
[root@d-clus2 ~]# /usr/bin/sg_persist -n -i -k -d /dev/sdb
PR generation=0x1, 1 registered reservation key follows:
0xec290001
This causes fencing to fail. Where is my mistake?
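A quick way to narrow this down (a diagnostic sketch, not a confirmed fix): on genuinely shared storage, both registrations appear in a single list on either node, so first check whether the two /dev/sdb devices are really the same LUN by comparing their SCSI WWIDs. If the IDs differ, each VM has its own independent backing disk, which would explain why each node only sees its own key; on VMware that usually points at the disk not being presented with physical SCSI bus sharing (for example, a physical-mode RDM), which SCSI-3 persistent reservations generally require.
# Run on both nodes; the WWIDs must match for /dev/sdb to be one shared LUN
# (scsi_id lives at /usr/lib/udev/scsi_id on RHEL 7+; older releases use /sbin/scsi_id)
[root@d-clus1 ~]# /usr/lib/udev/scsi_id --whitelisted --device=/dev/sdb
[root@d-clus2 ~]# /usr/lib/udev/scsi_id --whitelisted --device=/dev/sdb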

Related

Ansible hangs on setup or playbook but SSH works fine

I am having an odd issue with Ansible connecting to a host (any host), and I'm hoping someone can see something I'm not. I can ssh directly to the host w/o any issue. I can run -m ping w/o issue. But that's where success ends. If I run -m setup, it appears to connect and gather some info, but subsequent connections fail.
This is a server spun up on Proxmox (7.2.11). I've done this hundreds of times w/o issue, which is why I can't seem to identify what has changed. I typically spin up a container and set it up with an ssh key (requiring a passphrase) for root. If it's a VM, I simply copy the public key to the root user's authorized_keys. Then I run an Ansible playbook to add the user(s) and services along with locking down ssh. So my playbooks initially run using the root user. Ansible has always prompted for the passphrase and then gone along its merry way.
I'm using pipelining, but I've set it to false in testing.
Appreciate any insight you may have... Thank you
Here's the output of a simple gather-facts run. You can see that the first two SSH: EXEC calls return a result, but the third connection hangs.
➜ ansible git:(main) ✗ ansible all -vvv -i ./inventory.yml -m setup
ansible 2.10.8
config file = /home/johndoe/NAS1-Mounts/Code/ansible/ansible.cfg
configured module search path = ['/home/johndoe/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Using /home/johndoe/NAS1-Mounts/Code/ansible/ansible.cfg as config file
host_list declined parsing /home/johndoe/NAS1-Mounts/Code/ansible/inventory.yml as it did not pass its verify_file() method
script declined parsing /home/johndoe/NAS1-Mounts/Code/ansible/inventory.yml as it did not pass its verify_file() method
Parsed /home/johndoe/NAS1-Mounts/Code/ansible/inventory.yml inventory source with ini plugin
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
META: ran handlers
<target_server> Attempting python interpreter discovery
<10.2.0.27> ESTABLISH SSH CONNECTION FOR USER: root
<10.2.0.27> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/home/johndoe/.dotfiles/ansible/.ansible/cp/27e670244a 10.2.0.27 '/bin/sh -c '"'"'echo PLATFORM; uname; echo FOUND; command -v '"'"'"'"'"'"'"'"'/usr/bin/python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.9'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.8'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.6'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.5'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python2.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python2.6'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/libexec/platform-python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/bin/python3'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python'"'"'"'"'"'"'"'"'; echo ENDFOUND && sleep 0'"'"''
<10.2.0.27> (0, b'PLATFORM\nLinux\nFOUND\n/usr/bin/python3\nENDFOUND\n', b'')
<10.2.0.27> ESTABLISH SSH CONNECTION FOR USER: root
<10.2.0.27> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/home/johndoe/.dotfiles/ansible/.ansible/cp/27e670244a 10.2.0.27 '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<10.2.0.27> (0, b'{"platform_dist_result": [], "osrelease_content": "PRETTY_NAME=\\"Ubuntu 22.04.1 LTS\\"\\nNAME=\\"Ubuntu\\"\\nVERSION_ID=\\"22.04\\"\\nVERSION=\\"22.04.1 LTS (Jammy Jellyfish)\\"\\nVERSION_CODENAME=jammy\\nID=ubuntu\\nID_LIKE=debian\\nHOME_URL=\\"https://www.ubuntu.com/\\"\\nSUPPORT_URL=\\"https://help.ubuntu.com/\\"\\nBUG_REPORT_URL=\\"https://bugs.launchpad.net/ubuntu/\\"\\nPRIVACY_POLICY_URL=\\"https://www.ubuntu.com/legal/terms-and-policies/privacy-policy\\"\\nUBUNTU_CODENAME=jammy\\n"}\n', b'')
Using module file /usr/lib/python3/dist-packages/ansible/modules/setup.py
Pipelining is enabled.
<10.2.0.27> ESTABLISH SSH CONNECTION FOR USER: root
<10.2.0.27> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/home/johndoe/.dotfiles/ansible/.ansible/cp/27e670244a 10.2.0.27 '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
^C [ERROR]: User interrupted execution
➜ ansible git:(main) ✗
-m ping
➜ ansible git:(main) ✗ ansible all -vvv -i ./inventory.yml -m ping
ansible 2.10.8
config file = /home/johndoe/NAS1-Mounts/Code/ansible/ansible.cfg
configured module search path = ['/home/johndoe/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python3/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]
Using /home/johndoe/NAS1-Mounts/Code/ansible/ansible.cfg as config file
host_list declined parsing /home/johndoe/NAS1-Mounts/Code/ansible/inventory.yml as it did not pass its verify_file() method
script declined parsing /home/johndoe/NAS1-Mounts/Code/ansible/inventory.yml as it did not pass its verify_file() method
Parsed /home/johndoe/NAS1-Mounts/Code/ansible/inventory.yml inventory source with ini plugin
Skipping callback 'default', as we already have a stdout callback.
Skipping callback 'minimal', as we already have a stdout callback.
Skipping callback 'oneline', as we already have a stdout callback.
META: ran handlers
<target_server> Attempting python interpreter discovery
<10.2.0.27> ESTABLISH SSH CONNECTION FOR USER: root
<10.2.0.27> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/home/johndoe/.dotfiles/ansible/.ansible/cp/27e670244a 10.2.0.27 '/bin/sh -c '"'"'echo PLATFORM; uname; echo FOUND; command -v '"'"'"'"'"'"'"'"'/usr/bin/python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.9'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.8'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.6'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python3.5'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python2.7'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python2.6'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/libexec/platform-python'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'/usr/bin/python3'"'"'"'"'"'"'"'"'; command -v '"'"'"'"'"'"'"'"'python'"'"'"'"'"'"'"'"'; echo ENDFOUND && sleep 0'"'"''
<10.2.0.27> (0, b'PLATFORM\nLinux\nFOUND\n/usr/bin/python3\nENDFOUND\n', b'')
<10.2.0.27> ESTABLISH SSH CONNECTION FOR USER: root
<10.2.0.27> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/home/johndoe/.dotfiles/ansible/.ansible/cp/27e670244a 10.2.0.27 '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<10.2.0.27> (0, b'{"platform_dist_result": [], "osrelease_content": "PRETTY_NAME=\\"Ubuntu 22.04.1 LTS\\"\\nNAME=\\"Ubuntu\\"\\nVERSION_ID=\\"22.04\\"\\nVERSION=\\"22.04.1 LTS (Jammy Jellyfish)\\"\\nVERSION_CODENAME=jammy\\nID=ubuntu\\nID_LIKE=debian\\nHOME_URL=\\"https://www.ubuntu.com/\\"\\nSUPPORT_URL=\\"https://help.ubuntu.com/\\"\\nBUG_REPORT_URL=\\"https://bugs.launchpad.net/ubuntu/\\"\\nPRIVACY_POLICY_URL=\\"https://www.ubuntu.com/legal/terms-and-policies/privacy-policy\\"\\nUBUNTU_CODENAME=jammy\\n"}\n', b'')
Using module file /usr/lib/python3/dist-packages/ansible/modules/ping.py
Pipelining is enabled.
<10.2.0.27> ESTABLISH SSH CONNECTION FOR USER: root
<10.2.0.27> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/home/johndoe/.dotfiles/ansible/.ansible/cp/27e670244a 10.2.0.27 '/bin/sh -c '"'"'/usr/bin/python3 && sleep 0'"'"''
<10.2.0.27> (0, b'\n{"ping": "pong", "invocation": {"module_args": {"data": "pong"}}}\n', b'')
target_server | SUCCESS => {
"ansible_facts": {
"discovered_interpreter_python": "/usr/bin/python3"
},
"changed": false,
"invocation": {
"module_args": {
"data": "pong"
}
},
"ping": "pong"
}
META: ran handlers
META: ran handlers
ansible.cfg
➜ ansible git:(main) ✗ cat ansible.cfg
[default]
inventory = /home/johndoe/NAS1-Mounts/Code/ansible/inventory.yml
# Use the Beautiful Output callback plugin.
stdout_callback = beautiful_output
# Use specific ssh key and user
# ed25519 w/ passphrase
private_key = /home/johndoe/.ssh/johndoe_default
host_key_checking = False
# For updates/maintenance as sudo user
remote_user = johndoe
# Set remote host working directory
remote_tmp = ~/.ansible/tmp
# Misc
allow_world_readable_tmpfiles = True
display_skipped_hosts = False
# display_args_to_stdout = True
# stdout_callback = full_skip
transport = ssh
[ssh_connection]
pipelining = True
timeout = 30
[connection]
pipelining = True
my inventory.yml
➜ ansible git:(main) ✗ cat inventory.yml
# Vagrant Host
#default
[workstation]
[server]
target_server ansible_user=root ansible_host=10.2.0.27 install_docker=true
[pve_container]
my .ssh/config file
➜ ansible git:(main) ✗ cat ~/.ssh/config
# Defaults
Host *
# Default ed25519 Keypair for all connections - unless otherwise specified
IdentityFile ~/.ssh/johndoe_default
IdentitiesOnly yes
# Always use multiplex'd sessions - unless otherwise specified in host def below
Controlmaster auto
ControlPersist yes
Controlpath /tmp/ssh-%r@%h:%p
ControlPersist 10m
ssh directly to host
➜ ansible git:(main) ✗ ssh root@10.2.0.27
Welcome to Ubuntu 22.04.1 LTS (GNU/Linux 5.15.0-56-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
System information as of Wed Dec 14 05:06:20 PM UTC 2022
System load: 0.0 Processes: 117
Usage of /: 31.8% of 14.66GB Users logged in: 1
Memory usage: 11% IPv4 address for ens18: 10.2.0.27
Swap usage: 0%
50 updates can be applied immediately.
To see these additional updates run: apt list --upgradable
Last login: Wed Dec 14 17:05:53 2022 from 10.0.2.5
root@ubuntu-ansible-test:~#
My apologies for wasting anyone's time. My issue turned out to be an MTU issue with the tunnel to the remote site. Someone had set the WireGuard tunnel up with an MTU of 1500. A packet capture showed a bunch of TCP Out-of-Order, TCP Dup ACK, and TCP Retransmission segments. Setting it back to 1420 cured the issue.
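For anyone hitting the same thing, a minimal sketch of the fix (assuming the WireGuard interface is named wg0):
# Drop the tunnel MTU back to 1420, a common safe value for WireGuard over IPv4
sudo ip link set dev wg0 mtu 1420
# To make it persistent, set MTU = 1420 in the [Interface] section of /etc/wireguard/wg0.conf
# You can probe the usable path MTU with the don't-fragment flag
# (1392 data bytes + 28 ICMP/IP header bytes = 1420):
ping -M do -s 1392 10.2.0.27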
Best

Error while trying to copy a file from a container to my local Mac

I'm trying to copy a file from a container to my Mac. I also looked into this question, How to copy files from kubernetes Pods to local system, but it didn't help.
at first, I tried
kubectl cp -n company -c company-web-falcon company/company-web-falcon-bb86d79cf-6jcqq:/etc/identity/ca/security-ca.pem /etc/identity/ca/security-ca.pem
which resulted in this warning
tar: Removing leading `/' from member names
So I tried this
kubectl cp -n company -c company-web-falcon company/company-web-falcon-bb86d79cf-6jcqq:/etc/identity/ca/security-ca.pem /etc/identity/ca/security-ca.pem .
which resulted in this error
error: source and destination are required
How should I fix it?
Based on the comments, it seems WORKDIR is set either in the Dockerfile or in the deployment, in which case the copy will fail. To work around this, first move the file into the working directory:
kubectl exec company-web-falcon-bb86d79cf-6jcqq -c company-web-falcon -- bash -c "cp /etc/identity/ca/security-ca.pem ." && kubectl cp -n company -c company-web-falcon company/company-web-falcon-bb86d79cf-6jcqq:security-ca.pem /etc/identity/ca/security-ca.pem
You can copy the file with the kubectl cp command:
Copy /tmp/foo_dir local directory to /tmp/bar_dir in a remote pod in the default namespace
kubectl cp /tmp/foo_dir <some-pod>:/tmp/bar_dir
Copy /tmp/foo local file to /tmp/bar in a remote pod in a specific container
kubectl cp /tmp/foo <some-pod>:/tmp/bar -c <specific-container>
Copy /tmp/foo local file to /tmp/bar in a remote pod in namespace
kubectl cp /tmp/foo <some-namespace>/<some-pod>:/tmp/bar
Copy /tmp/foo from a remote pod to /tmp/bar locally
kubectl cp <some-namespace>/<some-pod>:/tmp/foo /tmp/bar
As for the warning that you receive, you can find more details in the GitHub issues.
There are multiple accepted explanations. The best one is the following:
Move the wanted file to the working directory in the pod (the directory that is automatically opened when you start bash in it):
user@podname:/usr/src# ls data.txt
data.txt
In this case it is the /usr/src folder.
Then, in the local bash terminal:
user@local:~$ kubectl cp podname:data.txt data.txt
user@local:~$ ls data.txt
data.txt
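If all you need is a single file and you want to sidestep tar's path handling entirely, streaming it with kubectl exec also works (a sketch using the pod, container, and path from the question):
kubectl exec -n company company-web-falcon-bb86d79cf-6jcqq -c company-web-falcon -- cat /etc/identity/ca/security-ca.pem > security-ca.pem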

How to retrieve the pod/container in which a given process runs

Using crictl and containerd, is there an easy way to find which pod/container a given process belongs to, using its PID on the host machine?
For example, how can I retrieve the name of the pod which runs the process below (1747):
root@k8s-worker-node:/# ps -ef | grep mysql
1000 1747 1723 0 08:58 ? 00:00:01 mysqld
Assuming that you're looking at the primary process in a pod, you could do something like this:
crictl ps -q | while read -r cid; do
  # Compare the container's primary PID exactly (grep -q would also match substrings)
  if [ "$(crictl inspect -o go-template --template '{{ .info.pid }}' "$cid")" = "$target_pid" ]; then
    echo "$cid"
  fi
done
This walks through all the crictl-managed containers and checks each container's primary PID against $target_pid (which you have set beforehand to the host PID you are interested in).
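For the mysqld example above, the same loop can print the pod name directly instead of the container ID (a small variation; the io.kubernetes.pod.name label used here is the standard CRI pod-name label, as in the script further below):
target_pid=1747
crictl ps -q | while read -r cid; do
  if [ "$(crictl inspect -o go-template --template '{{ .info.pid }}' "$cid")" = "$target_pid" ]; then
    crictl inspect -o go-template --template '{{ index .info.config.labels "io.kubernetes.pod.name" }}' "$cid"
  fi
done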
1. Using pid2pod
pid2pod is a dedicated tool: https://github.com/k8s-school/pid2pod
Example:
# Install
$ curl -Lo ./pid2pod https://github.com/k8s-school/pid2pod/releases/download/v0.0.1/pid2pod-linux-amd64
$ chmod +x ./pid2pod
$ mv ./pid2pod /some-dir-in-your-PATH/pid2pod
# Run
$ ./pid2pod 1525
NAMESPACE POD CONTAINER PRIMARY PID
kube-system calico-node-6kt29 calico-node 1284
2. Using sysdig OSS
Install sysdig and run:
sudo csysdig -pc
You'll get an htop-style view of processes with their container and pod context.
3. Custom script
Building on @larsks' answer, I propose a solution based on the command below, which provides the pod name for a given PID:
$ pid=1254
$ nsenter -t $pid -u hostname
coredns-558bd4d5db-rxz9r
This solution displays the namespace, pod, container, and the container's primary PID. You can copy-paste the script below into a file named get_pid.sh and then run, for example, ./get_pid.sh 2345.
#!/bin/bash
# Display pod information about a process, using its host PID as input
set -euo pipefail
usage() {
cat << EOD
Usage: `basename $0` PID
Available options:
-h this message
Display pod information about a process, using its host PID as input:
- display namespace, pod, container, and primary process pid for this container if the process is running in a pod
- else exit with code 1
EOD
}
if [ $# -ne 1 ] ; then
usage
exit 2
fi
pid=$1
is_running_in_pod=false
# With 'set -e' a failing command substitution aborts the script, so guard it with 'if !'
if ! pod=$(nsenter -t "$pid" -u hostname 2>&1)
then
    printf "%s %s:\n %s\n" "nsenter command failed for pid" "$pid" "$pod"
    exit 1
fi
cids=$(crictl ps -q)
for cid in $cids
do
current_pod=$(crictl inspect -o go-template --template '{{ index .info.config.labels "io.kubernetes.pod.name"}}' "$cid")
if [ "$pod" == "$current_pod" ]
then
tmpl='NS:{{ index .info.config.labels "io.kubernetes.pod.namespace"}} POD:{{ index .info.config.labels "io.kubernetes.pod.name"}} CONTAINER:{{ index .info.config.labels "io.kubernetes.container.name"}} PRIMARY PID:{{.info.pid}}'
crictl inspect --output go-template --template "$tmpl" "$cid"
is_running_in_pod=true
break
fi
done
if [ "$is_running_in_pod" = false ]
then
echo "Process $pid is not running in a pod."
exit 1
fi
WARNING: this solution does not work if two pods have the same name (even in different namespaces)
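Another cross-check worth knowing (a sketch only: the exact cgroup path layout depends on the container runtime and cgroup driver) is that the host's /proc/<pid>/cgroup usually embeds the container ID, which crictl can then resolve to a pod:
pid=1747
# With containerd and the systemd cgroup driver, the path typically ends in
# .../cri-containerd-<container-id>.scope
cat /proc/$pid/cgroup
# Hypothetical extraction pattern; adjust it to your node's cgroup layout
cid=$(grep -oE 'cri-containerd-[0-9a-f]+' /proc/$pid/cgroup | head -1 | sed 's/cri-containerd-//')
crictl inspect -o go-template --template '{{ index .info.config.labels "io.kubernetes.pod.name" }}' "$cid"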

What is the root password of telepresence in Kubernetes remote debugging?

I am using telepresence for remote debugging of a Kubernetes cluster, and I log in to the cluster using the command:
telepresence
but when I want to install some software in the telepresence pod:
sudo apt-get install wget
but I do not know the password of the telepresence pod, so what should I do to install software?
You can use this script to log in to the pod as root:
#!/usr/bin/env bash
set -xe
POD=$(kubectl describe pod "$1")
# Extract the node and the Docker container ID from the pod description
NODE=$(echo "$POD" | grep -m1 Node | awk -F'/' '{print $2}')
CONTAINER=$(echo "$POD" | grep -m1 'Container ID' | awk -F 'docker://' '{print $2}')
CONTAINER_SHELL=${2:-bash}
set +e
# SSH to the node and exec into the container as root (--user 0)
ssh -t "$NODE" sudo docker exec --user 0 -it "$CONTAINER" "$CONTAINER_SHELL"
if [ "$?" -gt 0 ]; then
set +x
echo 'SSH into pod failed. If you see an error message similar to "executable file not found in $PATH", please try:'
echo "$0 $1 sh"
fi
Log in like this:
./login-k8s-pod.sh flink-taskmanager-54d85f57c7-wd2nb
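On clusters where ephemeral containers are available, an alternative sketch that avoids SSHing to the node is kubectl debug (the --target container name here is an assumption; substitute your own):
# Attach a root busybox shell that shares the target container's process namespace
kubectl debug -it flink-taskmanager-54d85f57c7-wd2nb --image=busybox --target=taskmanager -- sh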

How do I get pcp to automatically attach nodes to postgres pgpool?

I'm using postgres 9.4.9, pgpool 3.5.4 on centos 6.8.
I'm having a major hard time getting pgpool to automatically detect when nodes are up (it often detects the first node but rarely detects the secondary), but if I use pcp_attach_node to tell it which nodes are up, then everything is hunky dory.
So I figured that until I could properly sort the issue out, I would write a little script to check the status of the nodes and attach them as appropriate, but I'm having trouble with the password prompt. According to the documentation, I should be able to issue commands like
pcp_attach_node 10 localhost 9898 pgpool mypass 1
but that just complains
pcp_attach_node: Warning: extra command-line argument "localhost" ignored
pcp_attach_node: Warning: extra command-line argument "9898" ignored
pcp_attach_node: Warning: extra command-line argument "pgpool" ignored
pcp_attach_node: Warning: extra command-line argument "mypass" ignored
pcp_attach_node: Warning: extra command-line argument "1" ignored
it'll only work when I use parameters like
pcp_attach_node -U pgpool -h localhost -p 9898 -n 1
and there's no parameter for the password; I have to enter it manually at the prompt.
Any suggestions for sorting this other than using Expect?
You have to create a PCPPASSFILE. Search the pgpool documentation for more info.
Example 1:
Create a PCPPASSFILE for the logged-in user (vi ~/.pcppass). The file content is 127.0.0.1:9897:user:pass (hostname:port:username:password). Set the file permissions to 0600 (chmod 0600 ~/.pcppass).
The command should then run without asking for a password:
pcp_attach_node -h 127.0.0.1 -U user -p 9897 -w -n 1
Example 2:
Create the PCPPASSFILE (vi /usr/local/etc/.pcppass) with the same content and 0600 permissions (chmod 0600 /usr/local/etc/.pcppass), and set the PCPPASSFILE variable (export PCPPASSFILE=/usr/local/etc/.pcppass).
The command should then run without asking for a password:
pcp_attach_node -h 127.0.0.1 -U user -p 9897 -w -n 1
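Concretely, the steps from Example 1 look like this as a shell session (a sketch; substitute your own pgpool user and password):
# Create the password file for the logged-in user and lock down its permissions
cat > ~/.pcppass <<'EOF'
127.0.0.1:9897:user:pass
EOF
chmod 0600 ~/.pcppass
# -w tells the pcp tools never to prompt and to rely on the password file
pcp_attach_node -h 127.0.0.1 -U user -p 9897 -w -n 1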
Script to auto-attach the node
You can schedule this script with, for example, crontab.
#!/bin/bash
#pgpool status
#0 - This state is only used during the initialization. PCP will never display it.
#1 - Node is up. No connections yet.
#2 - Node is up. Connections are pooled.
#3 - Node is down.
source $HOME/.bash_profile
export PCPPASSFILE=/appl/scripts/.pcppass
STATUS_0=$(/usr/local/bin/pcp_node_info -h 127.0.0.1 -U postgres -p 9897 -n 0 -w | cut -d " " -f 3)
echo $(date +%Y.%m.%d-%H:%M:%S.%3N)" [INFO] NODE 0 status "$STATUS_0;
if (( $STATUS_0 == 3 ))
then
echo $(date +%Y.%m.%d-%H:%M:%S.%3N)" [WARN] NODE 0 is down - attaching node"
TMP=$(/usr/local/bin/pcp_attach_node -h 127.0.0.1 -U postgres -p 9897 -n 0 -w -v)
echo $(date +%Y.%m.%d-%H:%M:%S.%3N)" [INFO] "$TMP
fi
STATUS_1=$(/usr/local/bin/pcp_node_info -h 127.0.0.1 -U postgres -p 9897 -n 1 -w | cut -d " " -f 3)
echo $(date +%Y.%m.%d-%H:%M:%S.%3N)" [INFO] NODE 1 status "$STATUS_1;
if (( $STATUS_1 == 3 ))
then
echo $(date +%Y.%m.%d-%H:%M:%S.%3N)" [WARN] NODE 1 is down - attaching node"
TMP=$(/usr/local/bin/pcp_attach_node -h 127.0.0.1 -U postgres -p 9897 -n 1 -w -v)
echo $(date +%Y.%m.%d-%H:%M:%S.%3N)" [INFO] "$TMP
fi
exit 0
Yes, you can trigger execution of this command using a customised failover_command (failover.sh in your /etc/pgpool).
Automated way to bring up your pgpool down node:
Copy this script into a file with execute permission, at your desired location, with postgres ownership, on all nodes.
Run the crontab -e command as the postgres user.
Finally, set that script to run every minute in crontab; to execute it every second you may create your own service instead. A sketch of the crontab entry follows the script below.
#!/bin/bash
# This script will up all pgpool down node
#************************
#******NODE STATUS*******
#************************
# 0 - This state is only used during the initialization.
# 1 - Node is up. No connection yet.
# 2 - Node is up and connection is pooled.
# 3 - Node is down
#************************
#******SCRIPT*******
#************************
server_node_list=(0 1 2)
for server_node in ${server_node_list[@]}
do
source $HOME/.bash_profile
export PCPPASSFILE=/var/lib/pgsql/.pcppass
node_status=$(pcp_node_info -p 9898 -h localhost -U pgpool -n $server_node -w | cut -d ' ' -f 3);
if [[ $node_status == 3 ]]
then
pcp_attach_node -n $server_node -U pgpool -p 9898 -w -v
fi
done