A couple of our nodes restarted unexpectedly, and since then the OSDs on those nodes will no longer authenticate with the MONs.
I have tested that the nodes still have access to all the MON nodes, using nc to check that the ports are open.
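The check was roughly the following (the MON host is a placeholder here; 3300 and 6789 are the default msgr v2/v1 MON ports):
nc -zv <mon-host> 3300
nc -zv <mon-host> 6789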
We cannot find anything in the MON logs about authentication errors.
At the moment 50% of the cluster is down because 2 of 4 nodes are offline.
Feb 06 21:04:07 ceph1 systemd[1]: Starting Ceph osd.7 for d5126e5a-882e-11ec-954e-90e2baec3d2c...
Feb 06 21:04:08 ceph1 podman[520029]: 2023-02-06 21:04:08.056452052 +0100 CET m=+0.123533698 container create 0b396efc0543af48d593d1e4c72ed74d>
Feb 06 21:04:08 ceph1 podman[520029]: 2023-02-06 21:04:08.334525479 +0100 CET m=+0.401607145 container init 0b396efc0543af48d593d1e4c72ed74d30>
Feb 06 21:04:08 ceph1 podman[520029]: 2023-02-06 21:04:08.346028585 +0100 CET m=+0.413110241 container start 0b396efc0543af48d593d1e4c72ed74d3>
Feb 06 21:04:08 ceph1 podman[520029]: 2023-02-06 21:04:08.346109677 +0100 CET m=+0.413191333 container attach 0b396efc0543af48d593d1e4c72ed74d>
Feb 06 21:04:08 ceph1 bash[520029]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-7
Feb 06 21:04:08 ceph1 bash[520029]: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-03539866-06e2-4>
Feb 06 21:04:08 ceph1 bash[520029]: Running command: /usr/bin/ln -snf /dev/ceph-03539866-06e2-4ba6-8809-6a491becb4fe/osd-block-1dd63d2a-9803-4>
Feb 06 21:04:08 ceph1 bash[520029]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-7/block
Feb 06 21:04:08 ceph1 bash[520029]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-0
Feb 06 21:04:08 ceph1 bash[520029]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-7
Feb 06 21:04:08 ceph1 bash[520029]: --> ceph-volume lvm activate successful for osd ID: 7
Feb 06 21:04:08 ceph1 podman[520029]: 2023-02-06 21:04:08.635416784 +0100 CET m=+0.702498460 container died 0b396efc0543af48d593d1e4c72ed74d30>
Feb 06 21:04:09 ceph1 podman[520029]: 2023-02-06 21:04:09.036165374 +0100 CET m=+1.103247040 container remove 0b396efc0543af48d593d1e4c72ed74d>
Feb 06 21:04:09 ceph1 podman[520260]: 2023-02-06 21:04:09.299438115 +0100 CET m=+0.070335845 container create d25c3024614dfb0a01c70bd56cf0758e>
Feb 06 21:04:09 ceph1 podman[520260]: 2023-02-06 21:04:09.384256486 +0100 CET m=+0.155154236 container init d25c3024614dfb0a01c70bd56cf0758ef1>
Feb 06 21:04:09 ceph1 podman[520260]: 2023-02-06 21:04:09.393054076 +0100 CET m=+0.163951816 container start d25c3024614dfb0a01c70bd56cf0758ef>
Feb 06 21:04:09 ceph1 bash[520260]: d25c3024614dfb0a01c70bd56cf0758ef16aa67f511ee4add8a85586c67beb0b
Feb 06 21:04:09 ceph1 systemd[1]: Started Ceph osd.7 for d5126e5a-882e-11ec-954e-90e2baec3d2c.
Feb 06 21:09:09 ceph1 conmon[520298]: debug 2023-02-06T20:09:09.394+0000 7f6c10705080 0 monclient(hunting): authenticate timed out after 300
Feb 06 21:14:09 ceph1 conmon[520298]: debug 2023-02-06T20:14:09.395+0000 7f6c10705080 0 monclient(hunting): authenticate timed out after 300
Feb 06 21:19:09 ceph1 conmon[520298]: debug 2023-02-06T20:19:09.397+0000 7f6c10705080 0 monclient(hunting): authenticate timed out after 300
Feb 06 21:24:09 ceph1 conmon[520298]: debug 2023-02-06T20:24:09.398+0000 7f6c10705080 0 monclient(hunting): authenticate timed out after 300
Feb 06 21:29:09 ceph1 conmon[520298]: debug 2023-02-06T20:29:09.399+0000 7f6c10705080 0 monclient(hunting): authenticate timed out after 300
We have restarted the OSD nodes, and this did not resolve the issue.
We have confirmed that the nodes have access to all MON servers.
I have looked in /var/run/ceph and the admin sockets are not there.
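For reference, the check was simply the following (the fsid subdirectory is where cephadm normally places the admin sockets on the host; the fsid is taken from the logs below):
ls /var/run/ceph/
ls /var/run/ceph/d5126e5a-882e-11ec-954e-90e2baec3d2c/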
Here is the output as it starts the OSD:
[2023-02-07 10:38:58,167][ceph_volume.main][INFO ] Running command: ceph-volume lvm list --format json
[2023-02-07 10:38:58,168][ceph_volume.process][INFO ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2023-02-07 10:38:58,213][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-03539866-06e2-4ba6-8809-6a491becb4fe/osd-block-1dd63d2a-9803-452c-a102-3b826e6ef448,ceph.block_uuid=VjbtJW-iiCA-PMvC-TCnV-9xgJ-a8UU-IDo0Pv,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=d5126e5a-882e-11ec-954e-90e2baec3d2c,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=1dd63d2a-9803-452c-a102-3b826e6ef448,ceph.osd_id=7,ceph.osdspec_affinity=all-available-devices,ceph.type=block,ceph.vdo=0";"/dev/ceph-03539866-06e2-4ba6-8809-6a491becb4fe/osd-block-1dd63d2a-9803-452c-a102-3b826e6ef448";"osd-block-1dd63d2a-9803-452c-a102-3b826e6ef448";"ceph-03539866-06e2-4ba6-8809-6a491becb4fe";"VjbtJW-iiCA-PMvC-TCnV-9xgJ-a8UU-IDo0Pv";"16000896466944
[2023-02-07 10:38:58,213][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-1ce58676-9409-4e19-ac66-f63b5025dfb0/osd-block-9949a437-7e8a-489b-ba10-ded82c775c43,ceph.block_uuid=KLNJDx-J1iC-V5GJ-0nw3-YuEA-Q41D-HNIXv8,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=d5126e5a-882e-11ec-954e-90e2baec3d2c,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=9949a437-7e8a-489b-ba10-ded82c775c43,ceph.osd_id=3,ceph.osdspec_affinity=all-available-devices,ceph.type=block,ceph.vdo=0";"/dev/ceph-1ce58676-9409-4e19-ac66-f63b5025dfb0/osd-block-9949a437-7e8a-489b-ba10-ded82c775c43";"osd-block-9949a437-7e8a-489b-ba10-ded82c775c43";"ceph-1ce58676-9409-4e19-ac66-f63b5025dfb0";"KLNJDx-J1iC-V5GJ-0nw3-YuEA-Q41D-HNIXv8";"16000896466944
[2023-02-07 10:38:58,213][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-7053d77a-5d1c-450b-a932-d1590411ea2b/osd-block-29ac0ada-d23c-45c1-ae5d-c8aba5a60195,ceph.block_uuid=NTTkze-YV08-lOir-SJ6W-39un-oUc7-ZvOBra,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=d5126e5a-882e-11ec-954e-90e2baec3d2c,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=29ac0ada-d23c-45c1-ae5d-c8aba5a60195,ceph.osd_id=14,ceph.osdspec_affinity=all-available-devices,ceph.type=block,ceph.vdo=0";"/dev/ceph-7053d77a-5d1c-450b-a932-d1590411ea2b/osd-block-29ac0ada-d23c-45c1-ae5d-c8aba5a60195";"osd-block-29ac0ada-d23c-45c1-ae5d-c8aba5a60195";"ceph-7053d77a-5d1c-450b-a932-d1590411ea2b";"NTTkze-YV08-lOir-SJ6W-39un-oUc7-ZvOBra";"16000896466944
[2023-02-07 10:38:58,213][ceph_volume.process][INFO ] stdout ceph.block_device=/dev/ceph-e0a1e940-dec3-4369-a533-1e88bea5fa5e/osd-block-2d002c14-7751-4037-a070-7538e1264d88,ceph.block_uuid=1Gts1p-KwPO-LnIb-XlP2-zCGQ-92fb-Kvv53H,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=d5126e5a-882e-11ec-954e-90e2baec3d2c,ceph.cluster_name=ceph,ceph.crush_device_class=None,ceph.encrypted=0,ceph.osd_fsid=2d002c14-7751-4037-a070-7538e1264d88,ceph.osd_id=11,ceph.osdspec_affinity=all-available-devices,ceph.type=block,ceph.vdo=0";"/dev/ceph-e0a1e940-dec3-4369-a533-1e88bea5fa5e/osd-block-2d002c14-7751-4037-a070-7538e1264d88";"osd-block-2d002c14-7751-4037-a070-7538e1264d88";"ceph-e0a1e940-dec3-4369-a533-1e88bea5fa5e";"1Gts1p-KwPO-LnIb-XlP2-zCGQ-92fb-Kvv53H";"16000896466944
[2023-02-07 10:38:58,214][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --separator=";" -S lv_uuid=VjbtJW-iiCA-PMvC-TCnV-9xgJ-a8UU-IDo0Pv -o pv_name,pv_tags,pv_uuid,vg_name,lv_uuid
[2023-02-07 10:38:58,269][ceph_volume.process][INFO ] stdout /dev/sdb";"";"a6T0sC-DeMp-by25-wUjP-wL3R-u6d1-nPXfji";"ceph-03539866-06e2-4ba6-8809-6a491becb4fe";"VjbtJW-iiCA-PMvC-TCnV-9xgJ-a8UU-IDo0Pv
[2023-02-07 10:38:58,269][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --separator=";" -S lv_uuid=KLNJDx-J1iC-V5GJ-0nw3-YuEA-Q41D-HNIXv8 -o pv_name,pv_tags,pv_uuid,vg_name,lv_uuid
[2023-02-07 10:38:58,333][ceph_volume.process][INFO ] stdout /dev/sda";"";"63b0j0-o1S7-FHqG-lwOk-0OYj-I9pH-g58TzB";"ceph-1ce58676-9409-4e19-ac66-f63b5025dfb0";"KLNJDx-J1iC-V5GJ-0nw3-YuEA-Q41D-HNIXv8
[2023-02-07 10:38:58,333][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --separator=";" -S lv_uuid=NTTkze-YV08-lOir-SJ6W-39un-oUc7-ZvOBra -o pv_name,pv_tags,pv_uuid,vg_name,lv_uuid
[2023-02-07 10:38:58,397][ceph_volume.process][INFO ] stdout /dev/sde";"";"qDEqYa-cgXd-Tc2h-64wQ-zT63-vIBZ-ZfGGO0";"ceph-7053d77a-5d1c-450b-a932-d1590411ea2b";"NTTkze-YV08-lOir-SJ6W-39un-oUc7-ZvOBra
[2023-02-07 10:38:58,398][ceph_volume.process][INFO ] Running command: /usr/sbin/pvs --noheadings --readonly --separator=";" -S lv_uuid=1Gts1p-KwPO-LnIb-XlP2-zCGQ-92fb-Kvv53H -o pv_name,pv_tags,pv_uuid,vg_name,lv_uuid
[2023-02-07 10:38:58,457][ceph_volume.process][INFO ] stdout /dev/sdd";"";"aqhedj-aUlM-0cl4-P98k-XZRL-1mPG-0OgKLV";"ceph-e0a1e940-dec3-4369-a533-1e88bea5fa5e";"1Gts1p-KwPO-LnIb-XlP2-zCGQ-92fb-Kvv53H
config dump
WHO MASK LEVEL OPTION VALUE RO
global advanced cluster_network 10.125.0.0/24 *
global basic container_image quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a77fa32d0b903061 *
mon advanced auth_allow_insecure_global_id_reclaim false
mon advanced public_network 10.123.0.0/24 *
mgr advanced mgr/cephadm/container_init True *
mgr advanced mgr/cephadm/migration_current 3 *
mgr advanced mgr/dashboard/ALERTMANAGER_API_HOST http://10.123.0.21:9093 *
mgr advanced mgr/dashboard/GRAFANA_API_SSL_VERIFY false *
mgr advanced mgr/dashboard/GRAFANA_API_URL https://10.123.0.21:3000 *
mgr advanced mgr/dashboard/PROMETHEUS_API_HOST http://10.123.0.21:9095 *
mgr advanced mgr/dashboard/ssl_server_port 8443 *
mgr advanced mgr/orchestrator/orchestrator cephadm
mgr advanced mgr/pg_autoscaler/autoscale_profile scale-up
mds advanced mds_max_caps_per_client 65536
mds.cephfs basic mds_join_fs cephfs
####
ceph status
cluster:
id: d5126e5a-882e-11ec-954e-90e2baec3d2c
health: HEALTH_WARN
8 failed cephadm daemon(s)
2 stray daemon(s) not managed by cephadm
nodown,noout flag(s) set
4 osds down
1 host (4 osds) down
Degraded data redundancy: 195662646/392133183 objects degraded (49.897%), 160 pgs degraded, 160 pgs undersized
6 pgs not deep-scrubbed in time
1 daemons have recently crashed
services:
mon: 3 daemons, quorum ceph5,ceph7,ceph6 (age 2d)
mgr: ceph2.tofizp(active, since 9M), standbys: ceph1.vnkagp
mds: 3/3 daemons up
osd: 19 osds: 15 up (since 11h), 19 in (since 11h); 151 remapped pgs
flags nodown,noout
data:
volumes: 1/1 healthy
pools: 6 pools, 257 pgs
objects: 102.97M objects, 67 TiB
usage: 69 TiB used, 107 TiB / 176 TiB avail
pgs: 195662646/392133183 objects degraded (49.897%)
2620377/392133183 objects misplaced (0.668%)
150 active+undersized+degraded+remapped+backfill_wait
97 active+clean
9 active+undersized+degraded
1 active+undersized+degraded+remapped+backfilling
io:
client: 170 B/s rd, 0 op/s rd, 0 op/s wr
recovery: 9.7 MiB/s, 9 objects/s
System: Ubuntu 18.04.5 LTS
Docker image: jupyter/datascience-notebook:latest
Docker-version:
Client: Docker Engine - Community
Version: 20.10.2
API version: 1.40
Go version: go1.13.15
Git commit: 2291f61
Built: Mon Dec 28 16:17:32 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 19.03.13
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 4484c46d9d
Built: Wed Sep 16 17:01:06 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.18.0
GitCommit: fec3683
Hi, I am practicing setting up a Jupyter environment on a remote Ubuntu server via docker-compose; below is my config:
Dockerfile-v1
FROM jupyter/datascience-notebook
ARG PYTORCH_VER
RUN ${PYTORCH_VER}
docker-compose-v1.yml
version: "3"
services:
  ychuang-pytorch:
    env_file: pytorch.env
    build:
      context: .
      dockerfile: Dockerfile-${TAG}
      args:
        PYTORCH_VER: ${PYTORCH_VERSION}
    restart: always
    command: jupyter notebook --NotebookApp.token=''
    volumes:
      - notebook:/home/deeprd2/ychuang-jupyter/notebook/
    ports:
      - "7000:8888"
    working_dir: /notebook
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
volumes:
  notebook:
pytorch.env
TAG=v1
PYTORCH_VERSION=pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
and the files structure with permission:
$ tree
├── docker-compose-v1.yml
├── Dockerfile-v1
├── notebook
│ └── test.ipynb
└── pytorch.env
$ ls -l
-rw-rw-r-- 1 deeprd2 deeprd2 984 Jun 22 15:36 docker-compose-v1.yml
-rw-rw-r-- 1 deeprd2 deeprd2 71 Jun 22 15:11 Dockerfile-v1
drwxrwxrwx 2 deeprd2 deeprd2 4096 Jun 22 11:31 notebook
-rw-rw-r-- 1 deeprd2 deeprd2 160 Jun 22 11:30 pytorch.env
After executing docker-compose -f docker-compose-v1.yml --env-file pytorch.env up, it created the environment; however, it failed when I tried to open a new notebook, with this error message:
ychuang-pytorch_1 | [I 07:58:22.535 NotebookApp] Creating new notebook in
ychuang-pytorch_1 | [W 07:58:22.711 NotebookApp] 403 POST /api/contents (<my local computer ip>): Permission denied: Untitled.ipynb
ychuang-pytorch_1 | [W 07:58:22.711 NotebookApp] Permission denied: Untitled.ipynb
I am wondering if this is a mounting issue. Any help is appreciated.
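In case it helps narrow this down, here is roughly how the mount can be inspected from inside the running container (the service name and mount path are taken from the compose file above; the commands are just a sketch):
docker-compose -f docker-compose-v1.yml --env-file pytorch.env exec ychuang-pytorch id
docker-compose -f docker-compose-v1.yml --env-file pytorch.env exec ychuang-pytorch ls -ld /home/deeprd2/ychuang-jupyter/notebook/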
PostgreSQL on my Raspbian always gets the wrong time!
But the nginx container does not have the same problem.
What's wrong with my Docker?
Nginx:
pi@raspberrypi:~$ docker run -it -e TZ=Asia/Shanghai nginx date
Mon Oct 25 14:12:45 CST 2021
Postgres:
pi@raspberrypi:~$ docker run -it postgres:alpine date
Tue Jun 30 15:19:12 UTC 2071
Postgres localtime:
pi@raspberrypi:~$ docker run -it -e TZ=Asia/Shanghai -v /etc/localtime:/etc/localtime:ro postgres:12 date
Thu 01 Jan 1970 08:00:00 AM CST
My docker info below:
pi@raspberrypi:~$ docker version
Client: Docker Engine - Community
Version: 20.10.9
API version: 1.41
Go version: go1.16.8
Git commit: c2ea9bc
Built: Mon Oct 4 16:06:55 2021
OS/Arch: linux/arm
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.9
API version: 1.41 (minimum version 1.12)
Go version: go1.16.8
Git commit: 79ea9d3
Built: Mon Oct 4 16:04:47 2021
OS/Arch: linux/arm
Experimental: false
containerd:
Version: 1.4.11
GitCommit: 5b46e404f6b9f661a205e28d59c982d3634148f8
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
It appears to be an issue with the libseccomp2 library on Raspberry Pi. I was experiencing the same issues, and eventually resolved it by following the steps in this thread
Add the following to /etc/apt/sources.list:
deb http://raspbian.raspberrypi.org/raspbian/ testing main
Run apt update
Run apt-get install libseccomp2/testing
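Roughly, as a single copy-paste block (assuming sudo is available on the Pi):
echo 'deb http://raspbian.raspberrypi.org/raspbian/ testing main' | sudo tee -a /etc/apt/sources.list
sudo apt update
sudo apt-get install libseccomp2/testing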
After running these updates the date/time should reflect that of your host. You may need to also mount /etc/localtime and /etc/timezone to get everything to match.
docker run -it -v /etc/timezone:/etc/timezone:ro -v /etc/localtime:/etc/localtime:ro --entrypoint /bin/sh postgres
docker-compose.yml
services:
  db:
    container_name: postgres
    image: postgres:latest
    restart: unless-stopped
    environment:
      TZ: America/Chicago
      PGTZ: America/Chicago
      POSTGRES_DB: test
      POSTGRES_USER: testing
      POSTGRES_PASSWORD: password
    volumes:
      - /etc/localtime:/etc/localtime:ro
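To verify the zone is picked up, a quick comparison against the host (db is the service name from the compose file above):
docker-compose exec db date
date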
I have pgpool hosted in a container, and below is the container config for the Kubernetes deployment.
Mount paths:
- name: cgroup
  mountPath: /sys/fs/cgroup:ro
- name: var-run
  mountPath: /run
And the volumes for those mount paths are defined as below:
- name: cgroup
  hostPath:
    path: /sys/fs/cgroup
    type: Directory
- name: var-run
  emptyDir:
    medium: Memory
Also, in the Kubernetes deployment I have passed:
securityContext:
  privileged: true
But when I exec into the pod to check the pgpool status, I get the issue below:
[root@app-pg-6448dfb58d-vzk67 /]# journalctl -xeu pgpool
-- Logs begin at Sat 2020-07-04 16:28:41 UTC, end at Sat 2020-07-04 16:29:13 UTC. --
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: Started Pgpool-II.
-- Subject: Unit pgpool.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit pgpool.service has finished starting up.
--
-- The start-up result is done.
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: [1-1] 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "statement_level_load_balance"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "statement_level_load_balance"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto_failback"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto_failback_interval"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enable_consensus_with_half_votes"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enable_shared_relcache"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "relcache_query_target"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: FATAL: could not open pid file as /var/run/pgpool-II-11/pgpool.pid. reason: No such file or directory
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: Unit pgpool.service entered failed state.
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service failed.
systemctl status pgpool output inside the pod container:
➜ app-app kubectl exec -it app-pg-6448dfb58d-vzk67 -- bash
[root@app-pg-6448dfb58d-vzk67 /]# systemctl status pgpool
● pgpool.service - Pgpool-II
Loaded: loaded (/usr/lib/systemd/system/pgpool.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Sat 2020-07-04 16:28:41 UTC; 1h 39min ago
Process: 34 ExecStart=/usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf $OPTS (code=exited, status=3)
Main PID: 34 (code=exited, status=3)
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "stat...lance"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto...lback"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "auto...erval"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enab...votes"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "enab...cache"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: INFO: unrecognized configuration parameter "relc...arget"
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 pgpool[34]: 2020-07-04 16:28:41: pid 34: FATAL: could not open pid file as /var/run/pgpoo...ectory
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service: main process exited, code=exited, status=3/NOTIMPLEMENTED
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: Unit pgpool.service entered failed state.
Jul 04 16:28:41 app-pg-6448dfb58d-vzk67 systemd[1]: pgpool.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
If required, this is the whole deployment sample:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-pg
  labels:
    helm.sh/chart: app-pgpool-1.0.0
    app.kubernetes.io/name: app-pgpool
    app.kubernetes.io/instance: app-service
    app.kubernetes.io/version: "1.0.3"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: app-pgpool
      app.kubernetes.io/instance: app-service
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-pgpool
        app.kubernetes.io/instance: app-service
    spec:
      volumes:
        - name: "pgpool-config"
          persistentVolumeClaim:
            claimName: "pgpool-pvc"
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
            type: Directory
        - name: var-run
          emptyDir:
            # Tmpfs needed for systemd.
            medium: Memory
      # volumes:
      #   - name: pgpool-config
      #     configMap:
      #       name: pgpool-config
      #   - name: pgpool-config
      #     azureFile:
      #       secretName: azure-fileshare-secret
      #       shareName: pgpool
      #       readOnly: false
      imagePullSecrets:
        - name: app-secret
      serviceAccountName: app-pg
      securityContext:
        {}
      containers:
        - name: app-pgpool
          securityContext:
            {}
          image: "appacr.azurecr.io/pgpool:1.0.3"
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
          stdin: true
          tty: true
          ports:
            - name: http
              containerPort: 9999
              protocol: TCP
          # livenessProbe:
          #   httpGet:
          #     path: /
          #     port: http
          # readinessProbe:
          #   httpGet:
          #     path: /
          #     port: http
          resources:
            {}
          volumeMounts:
            - name: "pgpool-config"
              mountPath: /etc/pgpool-II
            - name: cgroup
              mountPath: /sys/fs/cgroup:ro
            - name: var-run
              mountPath: /run
UPDATE:
Running this same setup with docker-compose works perfectly, no issues at all:
version: '2'
services:
  pgpool:
    container_name: pgpool
    image: appacr.azurecr.io/pgpool:1.0.3
    logging:
      options:
        max-size: 100m
    ports:
      - "9999:9999"
    networks:
      vpcbr:
        ipv4_address: 10.5.0.2
    restart: unless-stopped
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
      - $HOME/Documents/app/docker-compose/pgpool.conf:/etc/pgpool-II/pgpool.conf
      - $HOME/Documents/app/docker-compose/pool_passwd:/etc/pgpool-II/pool_passwd
    privileged: true
    stdin_open: true
    tty: true
I don't know what I am doing wrong; I am not able to start pgpool and cannot pinpoint the issue. What permission are we missing here? Is cgroups the culprit, or not?
Some direction would be appreciated.
While this might not be a direct answer to your question, I have seen some very cryptic errors when trying to run any PostgreSQL product from raw manifests. My recommendation would be to try leveraging the chart from Bitnami; they have put a lot of effort into ensuring that all of the security/permission culprits are taken care of properly.
https://github.com/bitnami/charts/tree/master/bitnami/postgresql-ha
Hopefully, this helps.
Also, if you do not want to use Helm to deploy, you can run the helm template command:
https://helm.sh/docs/helm/helm_template/
This will generate manifests out of the chart's template files based on the provided values.yaml.
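For example, a minimal sketch (the release name and output file are placeholders, and values.yaml is whatever overrides you need):
helm repo add bitnami https://charts.bitnami.com/bitnami
helm template my-release bitnami/postgresql-ha -f values.yaml > postgresql-ha.yaml
kubectl apply -f postgresql-ha.yaml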
I am running the Docker project on Docker Toolbox on Windows 7 SP1. The project does not give any error, but because Postgres is not working the whole project is not functional.
The Docker Compose file is:
version: '3'
services:
  postgres:
    image: 'postgres:latest'
    restart: always
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: "db"
      POSTGRES_PASSWORD: postgres_password
      POSTGRES_HOST_AUTH_METHOD: "trust"
      DATABASE_URL: postgresql://postgres:p3wrd@postgres:5432/postgres
    deploy:
      restart_policy:
        condition: on-failure
        window: 15m
  redis:
    image: 'redis:latest'
  nginx:
    restart: always
    build:
      dockerfile: Dockerfile.dev
      context: ./nginx
    ports:
      - '3050:80'
  api:
    depends_on:
      - "postgres"
    build:
      dockerfile: Dockerfile.dev
      context: ./server
    volumes:
      - ./server/copy:/usr/src/app/data
    environment:
      - REDIS_HOST=redis
      - REDIS_PORT=6379
      - PGUSER=postgres
      - PGHOST=postgres
      - PGDATABASE=postgres
      - PGPASSWORD=postgres_password
      - PGPORT=5432
  client:
    depends_on:
      - "postgres"
    build:
      dockerfile: Dockerfile.dev
      context: ./client
    volumes:
      - ./client/copy:/usr/src/app/data
      - /usr/src/app/node_modules
  worker:
    build:
      dockerfile: Dockerfile.dev
      context: ./worker
    volumes:
      - ./worker/copy:/usr/src/app/data
      - /usr/src/app/node_modules
    depends_on:
      - "postgres"
But when I run the project I get this:
redis_1 | 1:C 29 May 2020 05:07:37.909 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1 | 1:C 29 May 2020 05:07:37.910 # Redis version=6.0.3, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1 | 1:C 29 May 2020 05:07:37.911 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1 | 1:M 29 May 2020 05:07:37.922 * Running mode=standalone, port=6379.
redis_1 | 1:M 29 May 2020 05:07:37.928 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
redis_1 | 1:M 29 May 2020 05:07:37.929 # Server initialized
redis_1 | 1:M 29 May 2020 05:07:37.929 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1 | 1:M 29 May 2020 05:07:37.933 * Loading RDB produced by version 6.0.3
redis_1 | 1:M 29 May 2020 05:07:37.934 * RDB age 8 seconds
redis_1 | 1:M 29 May 2020 05:07:37.934 * RDB memory usage when created 0.81 Mb
redis_1 | 1:M 29 May 2020 05:07:37.934 * DB loaded from disk: 0.001 seconds
postgres_1 |
postgres_1 | PostgreSQL Database directory appears to contain a database; Skipping initialization
postgres_1 |
postgres_1 | 2020-05-29 05:07:38.927 UTC [1] LOG: starting PostgreSQL 12.3 (Debian 12.3-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
postgres_1 | 2020-05-29 05:07:38.928 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
postgres_1 | 2020-05-29 05:07:38.929 UTC [1] LOG: listening on IPv6 address "::", port 5432
postgres_1 | 2020-05-29 05:07:38.933 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
postgres_1 | 2020-05-29 05:07:38.993 UTC [24] LOG: database system was shut down at 2020-05-29 05:07:29 UTC
api_1 |
api_1 | > # dev /usr/src/app
api_1 | > nodemon
api_1 |
api_1 | [nodemon] 1.18.3
api_1 | [nodemon] to restart at any time, enter `rs`
api_1 | [nodemon] watching: *.*
With or without data volumes the same error occurs, and because of it the project is not running. Please help.
I would assume that once the API starts, it attempts to connect to Postgres. If so, this is a typical error that many Docker developers experience, where the Postgres DB is not yet ready to accept connections while the apps are already trying to connect to it.
You can solve the problem by trying either of the following approaches:
Make your API layer wait for a certain amount of time (enough for the Postgres DB to boot up), e.g. in pseudocode:
Thread.Sleep(60000); // pseudocode: roughly 60 seconds, which should be enough for the Postgres DB to start
Implement a retry mechanism that waits for, let's say, 10 seconds every time the connection fails to be established, as sketched below.
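A minimal sketch of such a wait loop, as a small shell wrapper run before the API's start command (this assumes the Postgres client tools, and therefore pg_isready, are installed in the api image; the script name is hypothetical and the environment variables are the ones already set in the compose file):
#!/bin/sh
# wait-for-postgres.sh (hypothetical helper): retry every 10 seconds until Postgres accepts connections
until pg_isready -h "$PGHOST" -p "$PGPORT" -U "$PGUSER"; do
  echo "Postgres is not ready yet - retrying in 10 seconds"
  sleep 10
done
# hand off to the real start command passed as arguments
exec "$@"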
If this doesn't work, I would recommend checking whether there is a Postgres DB installed outside of the container that owns the port you are trying to access.
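For example, to see which process owns port 5432 on the machine running Docker:
# On Windows:
netstat -ano | findstr :5432
# On Linux:
sudo ss -ltnp | grep 5432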
Along with Allan Chua's answer, please add a startup dependency on the Postgres service in your docker-compose file:
depends_on:
  - postgres
Add this to your api service.