File permission issue with mongo docker image - mongodb

I'm trying to instantiate a mongodb docker image using the following command:
docker run -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=password mongo
The command fails instantly with a Permission denied error:
2019-11-12T20:16:29.503+0000 I CONTROL [main] ERROR: Cannot write pid file to /tmp/docker-entrypoint-temp-mongod.pid: Permission denied
The weird thing is that the same command works fine on some other machines that have the same users, groups, etc. The only thing that differs is the Docker version.
I don't understand why the mongo instance does not run, as I do not specify any volumes or user on the command line.
Here is my docker info
Client:
Debug Mode: false
Server:
Containers: 29
Running: 1
Paused: 0
Stopped: 28
Images: 87
Server Version: 19.03.4
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: b34a5c8af56e510852c35414db4c1f4fa6172339
runc version: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.0-11-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 25.52GiB
Name: jenkins-vm
ID: YIGQ:YOVJ:2Y7F:LM77:VHK6:ICMY:QDGA:5EFD:ZYDD:EQM5:DR77:DANT
Docker Root Dir: /data/var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
And as suggested by @jan-garaj, here is the result of docker run -e MONGO_INITDB_ROOT_USERNAME=root -e MONGO_INITDB_ROOT_PASSWORD=password mongo id: uid=0(root) gid=0(root) groups=0(root)
What could be the reason for this failure?

You may have a problem with some security configuration. Check and compare the docker info outputs of the working and failing machines. You may have enabled user namespaces (userns-remap), a special seccomp profile, SELinux policies, an unusual storage driver, a full disk, ...
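For example, a rough sketch of what to compare on the working and failing hosts (the path is the Docker Root Dir from the output above):
docker info --format '{{.SecurityOptions}}'        # seccomp profile, apparmor, userns, selinux
grep -E 'userns-remap|seccomp|selinux' /etc/docker/daemon.json 2>/dev/null
df -h /data/var/lib/docker                         # check for a full disk under the Docker root dir
getenforce 2>/dev/null || echo "SELinux tooling not installed"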

It could be because of the SELinux policy. Edit the config file at /etc/selinux/config and set:
SELINUX=disabled
Then reboot your system and try to run the image again.
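Before disabling SELinux permanently, you can check its state and switch it to permissive for the current boot to confirm it really is the cause (a sketch; skip it if getenforce is not present on your machine):
getenforce            # prints Enforcing, Permissive, or Disabled
sudo setenforce 0     # permissive until the next reboot - retry the docker run now
# for a persistent change, set SELINUX=permissive (or disabled) in /etc/selinux/config and reboot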

Related

Docker container can't start up with a 166G (nfs/efs) volume

My docker container can't start up with a 166G (nfs/efs) volume. The EFS is mounted to the EC2 instance with the amazon-efs-utils tool and then passed to the container as a volume. Below is my docker-compose file:
services:
  app:
    image: xxxx
    restart: always
    environment:
      - ASPNETCORE_ENVIRONMENT=Production
    ports:
      - "8002:8002"
    volumes:
      - "./BinaryObjects/Static:/app/App_Data/BinaryObjects/Static"
      - "./BinaryObjects/Temp:/app/App_Data/BinaryObjects/Temp"
/BinaryObjects is the root that EFS is mounted to; the /Static folder holds the 166G of files and the /Temp folder about 10M. If I remove /Static from the volumes, the container starts up, so I don't think it is a permission issue. I did the same thing in our staging environment and it works well; the difference between staging and production is that the /Static folder in staging holds only 2G of files. I have also tried creating the NFS volumes first:
docker volume create \
--driver local \
--opt type=nfs \
--opt o=addr=xxxx.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
--opt device=:/Static \
efs-static
docker volume create \
--driver local \
--opt type=nfs \
--opt o=addr=xxxx.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
--opt device=:/Temp \
efs-temp
No luck - the container can't start up either and, again, works if I remove the efs-static volume. It just hangs with no error message; please refer to the screenshots below.
Does anyone know why? I want the container to start up with my 166G EFS volume. Thanks.
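(One way to narrow this down is to mount only the suspect volume into a throwaway container and check whether a bare directory listing also stalls; a sketch using the efs-static / efs-temp volumes created above.)
docker run --rm -v efs-static:/data alpine sh -c 'ls /data | head'
docker run --rm -v efs-temp:/data alpine sh -c 'ls /data | head'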
my docker info
Client:
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 31
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version:
init version:
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-1059-aws
Operating System: Ubuntu 18.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.18GiB
Name: ip-172-31-50-32
ID: XWJL:SUKJ:DVZL:SVII:JFHJ:W7ND:LZJ5:2LKU:ASBA:ILEB:HF54:KLCV
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Trying to connect to mongodb service through Consul Connect Sidecar Proxy

I have a Minikube set up and a mongo instance running in it. I use Consul + Consul Connect to mesh my services. However, I cannot connect to mongo from another service using the sidecar upstreams; some weird stuff is happening...
My mongo instance is installed using the Bitnami helm chart. I just set the service name and username, changed the storage class to match my needs, and put the Consul annotations for the service mesh in the pod annotations section:
image:
  registry: docker.io
  repository: bitnami/mongodb
  tag: 4.2.5-debian-10-r3
  pullPolicy: IfNotPresent
  debug: false
serviceAccount:
  create: true
  name: "svc-identity-data"
usePassword: true
mongodbRootPassword: rootpassword
mongodbUsername: identity
mongodbPassword: identity
mongodbDatabase: company
service:
  name: svc-identity-data
  annotations: {}
  type: ClusterIP
  port: 27017
useStatefulSet: true
replicaSet:
  enabled: false
  useHostnames: true
  name: rs0
  replicas:
    secondary: 1
    arbiter: 1
  pdb:
    enabled: true
    minAvailable:
      primary: 1
      secondary: 1
      arbiter: 1
annotations: {}
labels: {}
podAnnotations:
  "consul.hashicorp.com/connect-inject": "true"
  "consul.hashicorp.com/connect-service": "svc-identity-data"
  "consul.hashicorp.com/connect-service-protocol": "tcp"
persistence:
  enabled: true
  mountPath: /bitnami/mongodb
  subPath: ""
  storageClass: "standard"
  accessModes:
    - ReadWriteOnce
  size: 8Gi
  annotations: {}
configmap:
  storage:
    dbPath: /bitnami/mongodb/data/db
    journal:
      enabled: true
    directoryPerDB: false
  systemLog:
    destination: file
    quiet: false
    logAppend: true
    logRotate: reopen
    path: /opt/bitnami/mongodb/logs/mongodb.log
    verbosity: 0
  net:
    port: 27017
    unixDomainSocket:
      enabled: true
      pathPrefix: /opt/bitnami/mongodb/tmp
    ipv6: false
    bindIp: 0.0.0.0
  processManagement:
    fork: false
    pidFilePath: /opt/bitnami/mongodb/tmp/mongodb.pid
  setParameter:
    enableLocalhostAuthBypass: true
  security:
    authorization: enabled
Secondly, I started a stand-alone pod to use the mongo client, also meshed with Consul Connect using annotations:
apiVersion: v1
kind: Pod
metadata:
  name: mongo-client
  labels:
    name: mongo-client
  annotations:
    "consul.hashicorp.com/connect-inject": "true"
    "consul.hashicorp.com/connect-service-upstreams": "svc-identity-data:28017"
    "consul.hashicorp.com/connect-service-protocol": "tcp"
spec:
  containers:
    - name: mongo-client
      image: mongo:4.2.5
      imagePullPolicy: IfNotPresent
      resources:
        limits:
          memory: "128Mi"
          cpu: "500m"
      ports:
        - containerPort: 27017
I now have a mongodb service and a mongo-client pod with an upstream to the mongodb service bound on 127.0.0.1:28017.
When I try to connect to the mongodb service through that upstream, I get a behavior I don't understand.
> kubectl exec -it mongo-client mongo --host 127.0.0.1 --port 28017 -u root -p rootpassword
MongoDB shell version v4.2.5
connecting to: mongodb://127.0.0.1:28017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("8c46012d-8083-4029-8495-167bbe8bf063") }
MongoDB server version: 4.2.5
Server has startup warnings:
2020-04-22T12:20:14.777+0000 I STORAGE [initandlisten]
2020-04-22T12:20:14.777+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-04-22T12:20:14.777+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).
The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.
To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
>
bye
No problem here; everything works perfectly fine. But if I use mongo with a connection string instead of separate parameters, I get a connection refused:
> kubectl exec -it mongo-client mongo mongodb://root:rootpassword@127.0.0.1:28017/?authSource=admin
MongoDB shell version v4.2.5
connecting to: mongodb://127.0.0.1:28017/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb
2020-04-22T15:04:07.955+0000 I NETWORK [js] DBClientConnection failed to receive message from 127.0.0.1:28017 - HostUnreachable: Connection closed by peer
2020-04-22T15:04:07.968+0000 E QUERY [js] Error: network error while attempting to run command 'isMaster' on host '127.0.0.1:28017' :
connect@src/mongo/shell/mongo.js:341:17
@(connect):2:6
2020-04-22T15:04:07.973+0000 F - [main] exception: connect failed
2020-04-22T15:04:07.973+0000 E - [main] exiting with code 1
I don't understand at all what the difference is between using a connection string and separate parameters. If you have any clue or a solution, please let me know.
P.S.: I didn't set up any secure communication (TLS). I'm on minikube (because I'm a microservice architecture and Kubernetes n00b) and this is to experiment with service mesh (we need to live in the current era). A solution involving connecting to the service without using the sidecar is not the point; by the way, connecting directly to the service works perfectly using a connection string:
> kubectl exec -it mongo-client mongo mongodb://root:rootpassword@svc-identity-data:27017/?authSource=admin
MongoDB shell version v4.2.5
connecting to: mongodb://svc-identity-data:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("713febaf-2000-4ca6-8b1f-963c76986e72") }
MongoDB server version: 4.2.5
Server has startup warnings:
2020-04-22T12:20:14.777+0000 I STORAGE [initandlisten]
2020-04-22T12:20:14.777+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-04-22T12:20:14.777+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).
The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.
To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
>
bye
EDIT: Rebooting minikube makes everything work as intended. I will investigate further to understand why. Maybe someone else will hit the same issue.
EDIT 2: I discovered one thing: the connection error when connecting to mongo through the sidecar is random. When I rerun the command until it succeeds, here is what I get:
root@mongo-client:/# mongo mongodb://root:rootpassword@localhost:28017/?authSource=admin
MongoDB shell version v4.2.5
connecting to: mongodb://localhost:28017/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb
2020-04-24T12:51:15.641+0000 I NETWORK [js] DBClientConnection failed to receive message from localhost:28017 - HostUnreachable: Connection closed by peer
2020-04-24T12:51:15.702+0000 E QUERY [js] Error: network error while attempting to run command 'isMaster' on host 'localhost:28017' :
connect@src/mongo/shell/mongo.js:341:17
@(connect):2:6
2020-04-24T12:51:15.729+0000 F - [main] exception: connect failed
2020-04-24T12:51:15.729+0000 E - [main] exiting with code 1
root@mongo-client:/# mongo mongodb://root:rootpassword@localhost:28017/?authSource=admin
MongoDB shell version v4.2.5
connecting to: mongodb://localhost:28017/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("628bfcf9-6d44-4168-ab74-19a717d746f6") }
MongoDB server version: 4.2.5
Server has startup warnings:
2020-04-24T06:43:39.359+0000 I STORAGE [initandlisten]
2020-04-24T06:43:39.359+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-04-24T06:43:39.359+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).
The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.
To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
>
bye
And on the mongo side, the log:
2020-04-24T12:51:19.281+0000 I NETWORK [conn6647] end connection 127.0.0.1:54148 (6 connections now open)
2020-04-24T12:51:19.526+0000 I COMMAND [conn6646] command admin.$cmd appName: "MongoDB Shell" command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-256", payload: "xxx", $db: "admin" } numYields:0 reslen:196 locks:{} protocol:op_msg 231ms
2020-04-24T12:51:19.938+0000 I ACCESS [conn6646] Successfully authenticated as principal root on admin from client 127.0.0.1:54142
2020-04-24T12:51:20.024+0000 I NETWORK [listener] connection accepted from 127.0.0.1:54168 #6648 (7 connections now open)
2020-04-24T12:51:20.027+0000 I NETWORK [conn6648] received client metadata from 127.0.0.1:54168 conn6648: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.5" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 10 (buster)"", architecture: "x86_64", version: "Kernel 4.19.94" } }
2020-04-24T12:51:20.215+0000 I NETWORK [conn6648] end connection 127.0.0.1:54168 (6 connections now open)
2020-04-24T12:51:21.328+0000 I NETWORK [conn6646] end connection 127.0.0.1:54142 (5 connections now open)
I am more and more confused; I cannot explain that behavior.
I found out the solution, and it turns out to be the simplest issue possible: resources.
My minikube didn't have enough resources to run all pods smoothly, which introduced latency between the sidecar proxies even though Kubernetes raised no error about any outage.
I'm a Kubernetes learner, so I didn't think of it right away. Now that I know what happened, I can investigate in the right direction to understand to what extent latency can be an issue.
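In case it helps someone else hitting this, a rough sketch of how to check for resource pressure and give minikube more headroom (the CPU/memory values are just examples):
# check whether pods are starved (requires the metrics-server addon)
minikube addons enable metrics-server
kubectl top pods --all-namespaces
# recreate the cluster with more resources
minikube delete
minikube config set cpus 4
minikube config set memory 8192
minikube start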
The problem may be that the CN of the certificate doesn't match the value of the hostname in the MongoDB config file. This is about the MongoDB setup and the parameters you run it with.
The CN (common name) or SAN (subject alternative name) of the certificate has to match the value of --hostname that you supply when running mongo.
Your MongoDB URI is:
MONGODB_URI=mongodb://root:rootpassword@127.0.0.1:28017/?authSource=admin
so MongoDB is NOT on localhost. Also, the MongoDB server needs to allow ANY host to connect to the database; by default it will ONLY allow connections from the same runtime. You need the IP address of the service assigned to the pod with your database container - svc-identity-data has address 10.107.99.51.
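For reference, that service address and its backing pod can be looked up with kubectl (assuming the default namespace):
kubectl get svc svc-identity-data -o wide
kubectl get endpoints svc-identity-data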
Take a look: mongodb-ssl, mongodb-failed-to-connect

Postgres not starting on swarm server reboot

I'm trying to run an app using docker swarm. The app is designed to be completely local, running on a single computer using docker swarm.
If I SSH into the server and run a docker stack deploy, everything works, as seen here running docker service ls:
When this deployment works, the services generally go live in this order:
Registry (a private registry)
Main (an Nginx service) and Postgres
All other services in random order (all Node apps)
The problem I am having is on reboot. When I reboot the server, the services pretty consistently fail with this result:
I am getting some errors that could be helpful.
In Postgres: docker service logs APP_NAME_postgres -f:
In Docker logs: sudo journalctl -fu docker.service
Update: June 5th, 2019
Also, by request from a GitHub issue, the docker version output:
Client:
Version: 18.09.5
API version: 1.39
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:43:57 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.5
API version: 1.39 (minimum version 1.12)
Go version: go1.10.8
Git commit: e8ff056
Built: Thu Apr 11 04:10:53 2019
OS/Arch: linux/amd64
Experimental: false
And docker info output:
Containers: 28
Running: 9
Paused: 0
Stopped: 19
Images: 14
Server Version: 18.09.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: pbouae9n1qnezcq2y09m7yn43
Is Manager: true
ClusterID: nq9095ldyeq5ydbsqvwpgdw1z
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 1
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 192.168.0.47
Manager Addresses:
192.168.0.47:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-50-generic
Operating System: Ubuntu 18.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 3.68GiB
Name: oeemaster
ID: 76LH:BH65:CFLT:FJOZ:NCZT:VJBM:2T57:UMAL:3PVC:OOXO:EBSZ:OIVH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
And finally, my docker swarm stack/compose file:
secrets:
  jwt-secret:
    external: true
  pg-db:
    external: true
  pg-host:
    external: true
  pg-pass:
    external: true
  pg-user:
    external: true
  ssl_dhparam:
    external: true
services:
  accounts:
    depends_on:
      - postgres
      - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      JWT_SECRET_FILE: /run/secrets/jwt-secret
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-accounts:v0.8.0
    secrets:
      - source: jwt-secret
      - source: pg-db
      - source: pg-host
      - source: pg-pass
      - source: pg-user
  graphs:
    depends_on:
      - postgres
      - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-graphs:v0.8.0
    secrets:
      - source: pg-db
      - source: pg-host
      - source: pg-pass
      - source: pg-user
  health:
    depends_on:
      - postgres
      - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-health:v0.8.0
    secrets:
      - source: pg-db
      - source: pg-host
      - source: pg-pass
      - source: pg-user
  live-data:
    depends_on:
      - postgres
      - registry
    deploy:
      restart_policy:
        condition: on-failure
    image: 127.0.0.1:5000/local-oee-master-live-data:v0.8.0
    ports:
      - published: 32000
        target: 80
  main:
    depends_on:
      - accounts
      - graphs
      - health
      - live-data
      - point-logs
      - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      MAIN_CONFIG_FILE: nginx.local.conf
    image: 127.0.0.1:5000/local-oee-master-nginx:v0.8.0
    ports:
      - published: 80
        target: 80
      - published: 443
        target: 443
  modbus-logger:
    depends_on:
      - point-logs
      - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      CONTROLLER_ADDRESS: 192.168.2.100
      SERVER_ADDRESS: http://point-logs
    image: 127.0.0.1:5000/local-oee-master-modbus-logger:v0.8.0
  point-logs:
    depends_on:
      - postgres
      - registry
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ENV_TYPE: local
      PG_DB_FILE: /run/secrets/pg-db
      PG_HOST_FILE: /run/secrets/pg-host
      PG_PASS_FILE: /run/secrets/pg-pass
      PG_USER_FILE: /run/secrets/pg-user
    image: 127.0.0.1:5000/local-oee-master-point-logs:v0.8.0
    secrets:
      - source: pg-db
      - source: pg-host
      - source: pg-pass
      - source: pg-user
  postgres:
    depends_on:
      - registry
    deploy:
      restart_policy:
        condition: on-failure
        window: 120s
    environment:
      POSTGRES_PASSWORD: password
    image: 127.0.0.1:5000/local-oee-master-postgres:v0.8.0
    ports:
      - published: 5432
        target: 5432
    volumes:
      - /media/db_main/postgres_oee_master:/var/lib/postgresql/data:rw
  registry:
    deploy:
      restart_policy:
        condition: on-failure
    image: registry:2
    ports:
      - mode: host
        published: 5000
        target: 5000
    volumes:
      - /mnt/registry:/var/lib/registry:rw
version: '3.2'
Things I've tried
Action: Added restart_policy > window: 120s
Result: No Effect
Action: Postgres restart_policy > condition: none & crontab @reboot redeploy
Result: No Effect
Action: Set all containers stop_grace_period: 2m
Result: No Effect
Current Workaround
Currently, I have hacked together a solution that works, just so I can move on to the next thing. I wrote a shell script called recreate.sh that kills the failed first-boot version of the stack, waits for it to wind down, and then "manually" runs docker stack deploy again. I set the script to run at boot with crontab @reboot. This works for shutdowns and reboots, but I don't accept it as the proper answer, so I won't add it as one.
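For reference, a minimal sketch of what such a recreate.sh might look like (stack name and paths are placeholders, not the actual script):
#!/bin/bash
# recreate.sh - tear down the half-started stack, wait, then redeploy
docker stack rm APP_NAME
sleep 60                                    # give swarm time to remove services and networks
docker stack deploy -c /path/to/stack.yml APP_NAME
# crontab entry to run it at boot:
# @reboot /path/to/recreate.sh >> /var/log/recreate.log 2>&1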
It looks to me that you need to check who/what kills the postgres service. From the logs you posted, it seems that postgres receives a smart shutdown signal. Then postgres stops gracefully. Your stack file has the restart policy set to "on-failure", and since the postgres process stops gracefully (exit code 0), docker does not consider this a failure and, as instructed, does not restart it.
In conclusion, I'd recommend changing the restart policy from "on-failure" to "any".
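If you don't want to redeploy the whole stack just to test this, the equivalent change can be applied to the running service (a sketch; the service name is taken from the logs command in the question):
docker service update --restart-condition any APP_NAME_postgres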
Also, keep in mind that the "depends_on" settings you use are ignored in swarm mode, so your services/images need their own way of ensuring the proper startup order, or they must be able to work when dependent services are not up yet.
There is one more thing you could try: healthchecks. Perhaps your postgres base image has a healthcheck defined and it terminates the container by sending a kill signal to it. As written above, postgres then shuts down gracefully, there is no error exit code, and the restart policy does not trigger. Try disabling the healthcheck in the yaml, or check the Dockerfiles for a HEALTHCHECK directive and figure out why it triggers.
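A quick way to check whether the image actually ships a healthcheck (a sketch; the image name is the one from your stack file, and null output means no HEALTHCHECK is baked in):
docker image inspect --format '{{json .Config.Healthcheck}}' 127.0.0.1:5000/local-oee-master-postgres:v0.8.0
# to rule it out in a quick standalone test, run the image with the healthcheck disabled
docker run --rm -e POSTGRES_PASSWORD=password --no-healthcheck 127.0.0.1:5000/local-oee-master-postgres:v0.8.0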

Docker [for mac] file system became read-only which breaks almost all features of docker

My Docker ran into an error state where I cannot use it anymore.
Output of docker system info:
Containers: 14
Running: 2
Paused: 0
Stopped: 12
Images: 61
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: error
NodeID:
Error: open /var/lib/docker/swarm/worker/tasks.db: read-only file system
Is Manager: false
Node Address: 192.168.65.3
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.952GiB
Name: linuxkit-025000000001
ID: MCSC:SFXH:R3JC:NU4D:OJ5V:K4B5:LPMJ:2BFL:LHT3:LYCI:XKY2:DTE6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
This behaviour occurred after I built the following Dockerfile:
FROM perl:5.20
RUN apt-get update && apt-get install -y libsoap-lite-perl \
&& rm -rf /var/lib/apt/lists/*
RUN cpan SOAP::LITE
The error message when I try to build an image, run a container, or remove an image is always similar to this:
Error: open /var/lib/docker/swarm/worker/tasks.db: read-only file system
For example, if I try to execute this command:
docker container run -it perl:5.20 bash
I get this error:
docker: Error response from daemon: mkdir /var/lib/docker/overlay2/1b966e163e500a8c78a64e8d0f14984b091c1c5fe188a60b8bd030672d3138d9-init: read-only file system.
How can I reset my docker so these errors go away?
Go to the Docker for Mac icon in the menu bar (top right), click on it, and then click Restart.
After that, Docker works as expected.
This seems to be a temporary issue, since I cannot reproduce it after restarting Docker. My guess is that I had a network communication breakdown while Docker tried to download and install the packages in the Dockerfile.
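If the menu bar icon is unresponsive, the same restart can also be done from a terminal (a hedged alternative, macOS only):
osascript -e 'quit app "Docker"'
open -a Docker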

Can not connect to docker container mapped port

I use the mongo image in docker, but I cannot connect to port 20217.
docker#default:~$ docker ps
The port info shows: 0.0.0.0:20217->20217/tcp, 27017/tcp
but,
gilbertdeMacBook-Pro:~ gilbert$ lsof -i tcp:20217
there is no PID.
gilbertdeMacBook-Pro:~ gilbert$ docker info
Containers: 3
Images: 43
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /mnt/sda1/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 50
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-boot2docker
Operating System: Boot2Docker 1.9.1 (TCL 6.4.1); master : cef800b - Fri Nov 20 19:33:59 UTC 2015
CPUs: 1
Total Memory: 1.956 GiB
Name: default
ID: MRAZ:ZG5E:HDMY:EJNQ:HFL4:PW6Y:AXIS:6JFL:PFI5:GBAY:5SMF:NYQR
Debug mode (server): true
File Descriptors: 25
Goroutines: 44
System Time: 2016-01-27T14:53:52.005531869Z
EventsListeners: 0
Init SHA1:
Init Path: /usr/local/bin/docker
Docker Root Dir: /mnt/sda1/var/lib/docker
Username: gilbertgan
Registry: https://index.docker.io/v1/
Labels:
provider=virtualbox
I found this is because, on macOS, docker-machine runs in a VM, so we need to use the VM's IP when connecting to the container.
The IP can be shown with: docker-machine ls
Your docker container maps port 20217, which isn't the MongoDB default port; the correct port is 27017. And gilbert_gan is right as well: when running docker on docker-machine, the docker host is not localhost but rather the virtual machine under docker-machine's control.
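A quick sketch of both fixes (the machine name default comes from the docker info above, and this assumes the mongo shell is installed on the Mac):
# publish Mongo's real port instead of 20217
docker run -d -p 27017:27017 mongo
# connect to the docker-machine VM's address, not localhost
docker-machine ip default
mongo --host $(docker-machine ip default) --port 27017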