All Ceph daemon container images disappeared on a single node after reconfiguring the Docker log driver - ceph

I changed log_driver to "local" in Docker's daemon.json configuration file because heavy logging from the RADOS Gateway had filled the disk. My intention was to switch to journald to get log rotation. Unfortunately, after restarting the Docker daemon, many Ceph services disappeared, including their container images. That node has now put the cluster into HEALTH_ERR because it lost 1 mgr, 1 mon and 3 osd services at the same time.
I tried to run some ceph commands inside a cephadm shell (on another node), but they hang and nothing happens. What can I try to restore the node's services and the cluster's health?

Related

kubernetes: Is a Pod also like a PC?

I see that Kubernetes uses pods, and each pod can contain multiple containers.
For example, I create a pod with:
Container 1: Django server - running at port 8000
Container 2: Reactjs server - running at port 3000
Whereas I am coming from a Docker background, so in Docker we do:
docker run --name django -d -p 8000:8000 some-django
docker run --name reactjs -d -p 3000:3000 some-reactjs
So is a Pod also like a PC with some Ubuntu OS on it?
No, a Pod is not like a PC/VM with Ubuntu on it.
There is no intermediate layer between your host and the containers in a pod. The only thing happening here is that the containers in a pod share some resources/namespaces in the host's kernel, and there are mechanisms in your host kernel to "protect" the containers from seeing other containers. Pods are just a mechanism that makes it a little easier to deploy a couple of containers that share some resources (like the network namespace). Fundamentally they are just Linux processes running directly on the host.
(one nuanced technicality/caveat on the above statement: Docker and tools like it will sometimes run their own VM and may try to make that invisible to you. For example, Docker Desktop does this. Usually you can ignore this layer, but it is great to know it is there. The answer holds though: That one single VM will host all of your pods/containers and there is not one VM per pod.)
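To make the "shared namespaces" point concrete, here is a minimal Python sketch, assuming a Linux host where /proc/<pid>/ns/<kind> is a symlink identifying the namespace a process lives in. The function names and the proc_root parameter are illustrative choices, not part of any Kubernetes or Docker API.

```python
import os


def namespace_id(pid, ns="net", proc_root="/proc"):
    """Return the kernel's identifier for one namespace of a process.

    On Linux, /proc/<pid>/ns/<ns> is a symlink whose target looks like
    'net:[4026531992]'. Two processes in the same namespace have the
    same target.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "ns", ns))


def share_namespace(pid_a, pid_b, ns="net", proc_root="/proc"):
    """True if both processes live in the same namespace of kind `ns`."""
    return namespace_id(pid_a, ns, proc_root) == namespace_id(pid_b, ns, proc_root)
```

Given the host PIDs of the Django and Reactjs processes from one pod, one would expect share_namespace(pid_a, pid_b, "net") to be true, while "mnt" would normally differ because each container has its own filesystem view. Reading another process's namespace links usually requires root.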

Monitor daemon running but not in quorum

I'm currently testing OS and version upgrades for a Ceph cluster. Starting info:
The cluster is currently on CentOS 7 and Ceph Nautilus. I'm trying to change the OS to Ubuntu 20.04 and the Ceph version to Octopus. I started by upgrading mon1. Here are the steps I took, in order.
First I stopped the monitor service - systemctl stop ceph-mon@mon1
Then I removed the monitor from the cluster - ceph mon remove mon1
Then I installed Ubuntu 20.04 on mon1, updated the system and configured ufw.
Installed the Ceph Octopus packages.
Copied ceph.client.admin.keyring and ceph.conf to /etc/ceph/ on mon1.
Copied ceph.mon.keyring to a temporary folder on mon1 and changed its ownership to ceph:ceph.
Got the monmap with ceph mon getmap -o ${MONMAP}. The catch is that I did this after removing the monitor.
Created the /var/lib/ceph/mon/ceph-mon1 folder and changed its ownership to ceph:ceph.
Created the filesystem for the monitor - sudo -u ceph ceph-mon --mkfs -i mon1 --monmap /folder/monmap --keyring /folder/ceph.mon.keyring
After noticing that I had fetched the monmap after the monitor's removal, I added the monitor manually - ceph mon add mon1 <ip> --fsid <fsid>
After starting the monitor manually and checking the cluster state with ceph -s, I can see that mon1 is listed but is not in quorum. The monitor daemon runs fine on the mon1 node. In its logs I noticed that mon1 is stuck in the "probe" state, and the other monitors' logs contain output such as mon1 (rank 2) addr [v2:<ip>:3300/0,v1:<ip>:6789/0] is down (out of quorum). As I said, the monitor daemon is running on mon1 without any visible errors; it is just stuck in the probe state.
I wondered if this was caused by the OS and version change, so I first tried configuring the manager, mds and radosgw daemons by creating the respective folders in /var/lib/ceph/... and copying the keyrings. All these services work fine: I was able to reach my buckets and open the Octopus dashboard, and the metadata server is listed as active in ceph -s. So evidently my problem is only with the monitor configuration.
After some checking I found this in the Red Hat Ceph documentation:
If the Ceph Monitor is in the probing state longer than expected, it
cannot find the other Ceph Monitors. This problem can be caused by
networking issues, or the Ceph Monitor can have an outdated Ceph
Monitor map (monmap) and be trying to reach the other Ceph Monitors on
incorrect IP addresses. Alternatively, if the monmap is up-to-date,
Ceph Monitor’s clock might not be synchronized.
There is no network error on the monitor, I can reach all the other machines in the cluster. The clocks are synchronized. If this problem is caused by the monmap situation how can I fix this?
OK, so the result: going directly from CentOS 7 with Nautilus to Ubuntu 20.04 with Octopus does not work, though only for the monitor service; apparently the issue is hostname resolution differing between the operating systems. The rest of the services are fine. There is a longer route that works without issue and is the correct solution. First change the OS from CentOS 7 to Ubuntu 18.04, install the ceph-nautilus packages and add the machines to the cluster (no issues at all). Then update and upgrade the system and run do-release-upgrade. Works like a charm. I think this is what eblock mentioned.
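For the monmap hypothesis raised in the question (the map was fetched after ceph mon remove mon1, so it did not list mon1), a small Python sketch can show what "outdated monmap" means in practice. It assumes the two maps have been transcribed by hand into name-to-address dictionaries, for example from monmaptool --print output. The function name and the sample addresses are hypothetical.

```python
from typing import Dict, List


def stale_monmap_entries(local: Dict[str, str], cluster: Dict[str, str]) -> List[str]:
    """List the differences between a monitor's own monmap and the cluster's.

    Both arguments map monitor name -> address. Any difference means the
    local monitor may probe the wrong peers, or not be recognised by them.
    """
    problems = []
    for name, addr in sorted(cluster.items()):
        if name not in local:
            problems.append(f"{name}: missing from local monmap")
        elif local[name] != addr:
            problems.append(f"{name}: local has {local[name]}, cluster has {addr}")
    # Entries the local map still carries but the cluster has dropped.
    for name in sorted(set(local) - set(cluster)):
        problems.append(f"{name}: not in cluster monmap")
    return problems
```

An empty result would suggest the monmap is not the cause, which is consistent with the outcome above, where the problem turned out to lie elsewhere.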

Ceph configuration file and ceph-deploy

I set up a test cluster following the documentation.
I created the cluster with the command ceph-deploy new node1. After that, a Ceph configuration file appeared in the current directory, containing information about the monitor on the node with hostname node1. Then I added two OSDs to the cluster.
So now I have a cluster with 1 monitor and 2 OSDs. The ceph status command reports HEALTH_OK.
Following the same documentation, I moved on to the section "Expanding your cluster" and added two new monitors with the commands ceph-deploy mon add node2 and ceph-deploy mon add node3. Now I have a cluster with three monitors in the quorum and status HEALTH_OK, but there is one small discrepancy. The ceph.conf is still the same. It contains the old information about only one monitor. Why didn't the ceph-deploy mon add {node-name} command update the configuration file? And the main question is why ceph status displays correct information about the new cluster state with 3 monitors while ceph.conf doesn't contain this information. Where is the real configuration file, and why does ceph-deploy know about it but I don't?
And it works even after a reboot. All Ceph daemons start, read the incorrect ceph.conf (I checked this with strace) and, ignoring it, work fine with the new configuration.
And the last question. Why didn't the ceph-deploy osd activate {ceph-node}:/path/to/directory command update the configuration file either? After all, why do we need a ceph.conf file if we have such a smart ceph-deploy now?
You have multiple questions here.
1) ceph.conf doesn't need to be identical on all nodes. For example, OSD nodes only need the OSD configuration they care about, and MON nodes only need the configuration the MONs care about (unless you run everything on the same node, which is also not recommended). So maybe MON1's file lists MON1, MON2's lists MON2 and MON3's lists MON3.
2) When a MON is created and then added, the MON map is updated, so the MONs themselves already know which other MONs are required for quorum. A MON therefore doesn't rely on ceph.conf for quorum information, only for changes to runtime configuration.
3) ceph-deploy is just a Python script that prepares and runs the ceph commands for you. If you read into the details, ceph-deploy uses e.g. ceph-disk zap, prepare and activate.
Once your OSD has been prepared and activated, i.e. once it is formatted as a Ceph partition, udev knows where to mount it. Then the systemd ceph-osd.service starts ceph-osd at boot. That's why no OSD information is needed in ceph.conf at all.
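Point 2 can be illustrated with a short Python sketch. It assumes a ceph.conf like the one ceph-deploy new node1 might leave behind (the fsid and address below are placeholders) and a monmap represented as a plain list of monitor names. The helper names are made up for this example; the only Ceph fact relied on is that quorum is a strict majority of the monitors in the monmap.

```python
import configparser

# Placeholder ceph.conf contents; real files carry more settings.
CEPH_CONF = """
[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_initial_members = node1
mon_host = 192.0.2.11
"""


def initial_members(conf_text):
    """Monitors named in ceph.conf, which is only a bootstrap hint."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("global", "mon_initial_members")
    return [m.strip() for m in raw.split(",") if m.strip()]


def quorum_size(monmap_members):
    """Smallest strict majority of the monitors in the monmap."""
    return len(monmap_members) // 2 + 1


def has_quorum(monmap_members, reachable):
    """True if enough monmap members are reachable to form a quorum."""
    up = set(monmap_members) & set(reachable)
    return len(up) >= quorum_size(monmap_members)
```

With a monmap of ["node1", "node2", "node3"], quorum_size gives 2, so the cluster keeps quorum with any two monitors up even though initial_members(CEPH_CONF) still returns only ["node1"].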

Docker Compose - one specific container randomly doesn't start properly

I have a Docker environment with 5 containers that are composed via Docker Compose. Only on Mac machines, and only sometimes (it seems completely random), 1 of these 5 containers doesn't start.
The weird thing is that docker ps says the container is running and I can connect to it. Inside the container is a JBoss server, and ps shows a process running JBoss. But in fact JBoss is not up and running. There is no logging in the docker compose console and JBoss is not accessible.
There is also the problem that when this happens, the whole docker-compose process can no longer be cancelled properly. All containers shut down, or can be forced to shut down, except the JBoss container. Then the docker-machine hangs.
I didn't find any hint in the interwebs ... please help !
It seems that the process running inside the container is in a weird state.
Try killing it without providing a grace period, or removing the container.
stop : Stop a container by sending SIGTERM and then SIGKILL after a grace period
  --help=false            Print usage
  -t, --time=10           Seconds to wait for stop before killing it
kill : Kill a running container using SIGKILL or a specified signal
  --help=false            Print usage
  -s, --signal="KILL"     Signal to send to the container
rm : Remove one or more containers
  -f, --force=false       Force the removal of a running container (uses SIGKILL)
  --help=false            Print usage
  -l, --link=false        Remove the specified link
  -v, --volumes=false     Remove the volumes associated with the container
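The difference between stop and kill described in the help text can be sketched in Python against an ordinary local process. This is an illustration of the SIGTERM, grace period, SIGKILL sequence on a POSIX system, not Docker's actual implementation; stop_with_grace is a made-up name.

```python
import subprocess


def stop_with_grace(proc, grace_seconds=10.0):
    """Ask a process to exit, then force it, in the style of `docker stop`.

    Sends SIGTERM, waits up to `grace_seconds`, then sends SIGKILL.
    Returns the process's return code (on POSIX, the negative signal
    number if it died from a signal).
    """
    proc.terminate()  # SIGTERM: the process may handle or ignore it
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: cannot be caught or ignored
        return proc.wait()
```

A process that ignores SIGTERM holds up stop for the whole grace period, which is why going straight to kill or rm -f skips the wait.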
Moreover try checking the logs of the container:
docker logs --follow <container_name or container_id>
After updating to docker v1.10 the problem didn't occur anymore :)

Reliability of Docker containers

My question aims at verifying, and maybe correcting, my idea of the reliability of Docker containers. I read both the Docker documentation and several articles on VOLUME in the Dockerfile and -v as an argument when running a container as means to persist data outside a Docker container, be it in a data container or on the host system. As I would like to keep my setup simple, I would prefer not to copy/save/store data here and there but to keep it in the Docker container itself.
I explored the behaviour of Docker containers in several cases. I'd like to know if I missed a scenario where a container can be 100% lost unintentionally, i.e. NOT by doing $ docker rm -f mycontainer
docker commands to pause, stop and kill a container
-> restartable by $ docker restart mycontainer or $ docker run mycontainer
Host system reboot
-> docker container exits with 0 or 255
Host system unexpected power off
-> What happens?
Application exception
-> docker container exits with -1
Updating or restarting docker (as pointed out by Greg)
-> expected behavior: like on system reboot (?)
In all those cases, the Docker container still exists in the end. So is there any other scenario that can cause a Docker container to be lost, as with $ docker rm -f mycontainer?
The background is that I read a lot about mounted volumes and external data storage on the host system for Postgres, but I'd like to avoid storing data outside my containers on the host system if possible. On the other hand, I don't want to wake up and have all my data lost. (I do perform regular SQL dumps, but I don't want to do this every 5 minutes.) If a Docker container itself is not reliable for persistent data, I don't see why I should create a second container to hold the data for the first one, increasing the complexity of my system without gaining anything in terms of reliability.
Edit: There are two points in the Docker user guide on volumes that do not explicitly explain which behaviour to expect, which makes me question whether these concepts provide extra reliability:
Changes to a data volume will not be included when you update an
image
-> Does that mean that they get lost or that the content of the volume won't be changed?
Volumes persist until no containers use them
-> What's the definition of 'use'? As long as a container is not stopped, killed, removed? Does that mean that the volume Docker created on the host system will get removed? Or does volume only refer to a virtual bridge between a directory inside Docker and one on the host system?
If you store all your data in the container, what are you going to do when you need to update the image? Updates to images are normally done by changing the Dockerfile and rebuilding the image. If my data is kept separate to my container, I can start a new version of the image, mount the data with --volumes-from or -v and kill the old container. In your case, you have to keep the container running and try to patch in place with something like puppet.
Also, I'm not sure what you think you're saving. If you run the official postgres image, it will have declared volumes in the Dockerfile. Those volumes exist as normal directories on your host system whether you ran the container with -v or not. Even if your Dockerfile has no volumes, clearly the UFS is being stored on your host anyway.
In general, you should consider containers to be temporary and stateless. Whilst you don't have to do this, you will find most of the tooling and support services are designed around this idiom.
Regarding your scenarios, there are a few you're missing:
A bug could make it impossible to restart a stopped container
The updating issue mentioned above
Changing the storage driver. This will cause a great deal of problems, as you need to migrate your images.
Just for clarity on the commands, docker start will restart stopped or exited containers and docker unpause will unpause paused containers.
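As a small illustration of that last point, the mapping from container state to the command that brings it back could be written as the following Python sketch. It assumes the status strings reported by docker inspect --format '{{.State.Status}}' <name>. The function only builds the argument list and does not run anything, and resume_command is a hypothetical helper name.

```python
def resume_command(status, name):
    """Return the docker CLI arguments that bring a container back.

    `status` is expected to be a container status string such as
    'paused', 'exited', 'created' or 'running'.
    """
    if status == "paused":
        return ["docker", "unpause", name]
    if status in ("exited", "created"):
        return ["docker", "start", name]
    if status == "running":
        return []  # already running, nothing to do
    raise ValueError(f"no simple resume command for status {status!r}")
```

Note that docker run does not appear here: it creates a new container from an image rather than restarting an existing one.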