Do I need a configuration file on each Ceph node? - ceph

I am getting different answers from different sources. At first I thought that I could just have one configuration file (on the monitor), with sections for each node (including OSD nodes). But on the new OSD node, ceph osd create fails, saying there is no configuration file.
So, how does the configuration structure of ceph work? Further, is the FSID (a UUID) in each configuration file the same?

Yes, every machine that will run an OSD/MON/MDS should have a /etc/ceph/ceph.conf file.
When an OSD instance is created, it needs the configuration file to find and communicate with the monitors. The fsid should be the same everywhere: OSD/MDS/MON daemons compare the fsid when handling internal messages, and messages with a mismatching fsid are dropped.
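As a rough sketch (the fsid and addresses below are placeholders, not values from a real cluster), the same minimal /etc/ceph/ceph.conf would be copied to every node:
[global]
# must be identical on every node in the cluster
fsid = a7f64266-0894-4f1e-a635-d0aeaca0e993
mon initial members = mon1
mon host = 10.0.0.11
public network = 10.0.0.0/24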

Related

ProxySQL vs MaxScale on Kubernetes

I'm looking to set up a write proxy for our MariaDB database on Kubernetes. The problem we are currently having is that we only have one write master in our 3-master Galera cluster setup. So even though our pods replicate properly, if our first node goes down, our other two masters end up failing because they cannot be written to.
I saw that either ProxySQL or MaxScale could be used for write proxying, but I'm not sure if I'm reading their uses properly. Do I have the right idea in looking to deploy either of these two applications/services on Kubernetes to fix my problem? Would I be able to write to any of the masters in the cluster?
MaxScale will handle selecting which server to write to as long as you use the readwritesplit router and the galeramon monitor.
Here's an example configuration for MaxScale that does load balancing of reads but sends writes to one node:
[maxscale]
threads=auto

# The three Galera nodes
[node1]
type=server
address=node1-address
port=3306

[node2]
type=server
address=node2-address
port=3306

[node3]
type=server
address=node3-address
port=3306

# Monitors the Galera cluster and picks one node as the write master
[Galera-Cluster]
type=monitor
module=galeramon
servers=node1,node2,node3
user=my-user
password=my-password

# Routes reads across all nodes and sends writes to the current master
[RW-Split-Router]
type=service
router=readwritesplit
cluster=Galera-Cluster
user=my-user
password=my-password

# Clients connect to this port
[RW-Split-Listener]
type=listener
service=RW-Split-Router
protocol=mariadbclient
port=4006
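With this configuration, applications connect to MaxScale on the listener port instead of the Galera nodes directly; as a quick sanity check from the command line (the hostname is a placeholder):
mysql -h maxscale-host -P 4006 -u my-user -p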
Writes are only sent to one node at a time because writing to multiple Galera nodes won't improve write performance, and it results in conflicts when transactions are committed (which applications rarely handle).

How can I fix ceph commands hanging after a reboot?

I'm pretty new to Ceph, so I've included all the steps I used to set up my cluster, since I'm not sure what is or is not useful information for fixing my problem.
I have 4 CentOS 8 VMs in VirtualBox set up to teach myself how to bring up Ceph. One is a client and three are Ceph monitors. Each Ceph node has six 8 GB drives. Once I learned how the networking worked, it was pretty easy.
I set each VM to have a NAT (for downloading packages) and an internal network that I called "ceph-public". This network would be accessed by each VM on the 10.19.10.0/24 subnet. I then copied the ssh keys from each VM to every other VM.
I followed this documentation to install cephadm, bootstrap my first monitor, and added the other two nodes as hosts. Then I added all available devices as OSDs, created my pools, then created my images, then copied my /etc/ceph folder from the bootstrapped node to my client node. On the client, I ran rbd map mypool/myimage to mount the image as a block device, then used mkfs to create a filesystem on it, and I was able to write data and see the IO from the bootstrapped node. All was well.
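In rough outline (a sketch of the steps described above; IPs, sizes, and hostnames are illustrative), the setup looked like:
# on the first node: bootstrap the cluster and add the other hosts
cephadm bootstrap --mon-ip 10.19.10.11
ceph orch host add ceph2 10.19.10.12
ceph orch host add ceph3 10.19.10.13

# use every free disk as an OSD, then create a pool and an RBD image
ceph orch apply osd --all-available-devices
ceph osd pool create mypool
rbd pool init mypool
rbd create mypool/myimage --size 10G

# on the client (after copying /etc/ceph from the bootstrapped node)
rbd map mypool/myimage
mkfs.xfs /dev/rbd0
mount /dev/rbd0 /mnt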
Then, as a test, I shut down and restarted the bootstrapped node. When it came back up, I ran ceph status, but it just hung with no output. Every ceph and rbd command now hangs, and I have no idea how to recover or properly reset or fix my cluster.
Has anyone ever had the ceph command hang on their cluster, and what did you do to solve it?
Let me share a similar experience. Some time ago I also tried to run some tests on Ceph (Mimic, I think) and my VMs in VirtualBox behaved very strangely, nothing comparable with actual bare-metal servers, so please bear this in mind... the tests are not quite representative.
As regarding your problem, try to see the following:
have at least 3 monitors (an odd number, so they can reach quorum). It's possible the hang is caused by a failed monitor election.
make sure the networking part is OK (separate VLANs for the Ceph servers and the clients)
make sure DNS resolves correctly (you have added the server names to the hosts file)
...just my 2 cents...
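As a first step when every ceph command hangs, it can help to make the CLI fail fast and confirm that the monitor daemons actually came back after the reboot (a sketch assuming a cephadm-based install, as in the question):
ceph status --connect-timeout 15          # give up after 15 seconds instead of hanging forever
cephadm ls                                # on the rebooted node: list the daemons cephadm deployed there
systemctl --type=service | grep ceph      # confirm the mon/mgr/osd services are actually running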

Copying directories into minikube and persisting them

I am trying to copy some directories into the minikube VM to be used by some of the pods that are running. These include API credential files and template files used at run time by the application. I have found you can copy files into the /home/docker/ directory using scp; however, these files are not persisted across reboots of the VM. I have read that files/directories are persisted if stored in the /data/ directory on the VM (among others), but I get permission denied when trying to copy files to those directories.
Are there:
A: Any directories in minikube that will persist data that aren't protected in this way
B: Any other ways of doing the above without running into this issue (could well be going about this the wrong way)
To clarify, I have already been able to mount the files from /home/docker/ into the pods using volumes, so it's just the persisting data I'm unclear about.
Kubernetes has dedicated object types for these sorts of things. API credential files you might store in a Secret, and template files (if they aren't already built into your Docker image) could go into a ConfigMap. Both of them can either get translated to environment variables or mounted as artificial volumes in running containers.
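As a rough sketch (the names and image below are placeholders, not taken from the question), a template file could be published through a ConfigMap and mounted into the pod like this:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-templates            # hypothetical name
data:
  report.tmpl: |
    Hello {{ .Name }}
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: example/app:latest    # hypothetical image
    volumeMounts:
    - name: templates
      mountPath: /etc/app/templates
      readOnly: true
  volumes:
  - name: templates
    configMap:
      name: app-templates
The API credentials would follow the same pattern with kind: Secret, which can be mounted as a volume or exposed as environment variables in the same way.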
In my experience, trying to store data directly on a node isn't a good practice. It's common enough to have multiple nodes, to not directly have login access to those nodes, and for them to be created and destroyed outside of your direct control (imagine an autoscaler running on a cloud provider that creates a new node when all of the existing nodes are 90% scheduled). There's a good chance your data won't (or can't) be on the host where you expect it.
This does lead to a proliferation of Kubernetes objects and associated resources, and you might find a Helm chart to be a good resource to tie them together. You can check the chart into source control along with your application, and deploy the whole thing in one shot. While it has a couple of useful features beyond just packaging resources together (a deploy-time configuration system, a templating language for the Kubernetes YAML itself) you can ignore these if you don't need them and just write a bunch of YAML files and a small control file.
For minikube, anything kept in the $HOME/.minikube/files directory on the host is copied by minikube into the / directory of the VM when it starts.
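For example (a sketch; the file names are illustrative), a credentials file placed like this will show up at /data/credentials/api-key.json inside the VM on the next start:
mkdir -p ~/.minikube/files/data/credentials
cp api-key.json ~/.minikube/files/data/credentials/
minikube start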

How does Kubernetes handle file write locking across multiple pods when hostPath volumes are involved?

I have an app that logs to the file my_log/1.log, and I use Filebeat to collect the logs from that file.
Now I use k8s to deploy it onto some nodes, using hostPath volumes to mount the my_log directory onto the local filesystem at /home/my_log, and then I noticed a subtle situation:
what will happen if more than one pod is deployed on this machine, and they try to write the log at the same time?
I know that in the normal situation, when multiple processes try to write to a file at the same time, the system will lock the file so the processes write one by one. But I am not sure whether different k8s pods share the same lock space; if they don't, it will be a disaster.
I tried to test this, and it seems different pods do still share the file lock; the log file looks normal.
How does Kubernetes handle file write locking across multiple pods when hostPath volumes are involved?
It doesn't.
The operating system and the file system handle that.
As an example, take syslog. It handles this by opening a socket, setting the socket to server mode, opening a log file in write mode, being notified of messages, parsing each message, and finally writing it to the file.
Logs can also be buffered, and the writing process may be limited to a single thread, so you should not have many pods writing to one file; it can lead to issues like missing logs or truncated lines.
Your application should handle the file locking itself when it pushes logs, and if you want many pods writing logs, you should give each pod a separate log file.
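One hedged way to get that per-pod separation with a hostPath volume (the names below are illustrative, and subPathExpr needs a reasonably recent Kubernetes) is to key the mount on the pod name via the downward API:
apiVersion: v1
kind: Pod
metadata:
  name: logging-app
spec:
  containers:
  - name: app
    image: example/app:latest     # hypothetical image
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    volumeMounts:
    - name: logs
      mountPath: /my_log
      subPathExpr: $(POD_NAME)    # each pod writes into its own subdirectory
  volumes:
  - name: logs
    hostPath:
      path: /home/my_log
      type: DirectoryOrCreate
Filebeat could then be pointed at /home/my_log/*/1.log so each pod's log is collected separately.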

Is zookeeper reconfig expected to update the zoo.cfg.dynamic file?

I'm setting up a distributed cluster for ZooKeeper based on version 3.5.2. Specifically, I'm using the reconfig command to dynamically update the configuration whenever there is a rebalance in the cluster (e.g. one of the nodes goes down).
The observation I have is that the zoo.cfg.dynamic file is not getting updated even when the reconfig (add/remove) command executes correctly. Is this the expected behavior? Basically I'm looking for guidance on whether we should manage the zoo.cfg.dynamic file through a separate script (updating it in lock-step with the reconfig command) or whether we can rely on the reconfig command to do this for us. My preference/expectation is the latter.
Following is the sample command:
reconfig -remove 6 -add server.5=125.23.63.23:1234:1235;1236
From the reconfig documentation:
Dynamic configuration parameters are stored in a separate file on the server (which we call the dynamic configuration file). This file is linked from the static config file using the new dynamicConfigFile keyword.
So I could practically start with any file name to host the ensemble list, as long as the 'dynamicConfigFile' keyword points to that file.
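For instance, a sketch of such a starting layout (file paths are illustrative) could be:
# zoo.cfg (static part)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
dynamicConfigFile=/etc/zookeeper/zoo.cfg.dynamic
# note: on 3.5.3 and later, reconfigEnabled=true is also required

# /etc/zookeeper/zoo.cfg.dynamic (the initial ensemble list, one server.N line per member)
server.1=125.23.63.23:2780:2783:participant;2791
server.2=125.23.63.24:2781:2784:participant;2792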
Now, when the reconfig command is run, a new dynamic-config file (e.g. zoo.cfg.dynamic.00000112) is generated that contains the transformed list of servers, in a form like the following (as an example):
server.1=125.23.63.23:2780:2783:participant;2791
server.2=125.23.63.24:2781:2784:participant;2792
server.3=125.23.63.25:2782:2785:participant;2793
The zoo.cfg file is then automatically updated to point the 'dynamicConfigFile' keyword at the new config file (zoo.cfg.dynamic.00000112). The previous dynamic-config file remains on disk (in the config directory) but is no longer referenced by the main config.
So overall, there is no file that needs to be updated in lock-step with the reconfig command; reconfig takes care of it all. The only remaining housekeeping is to periodically purge the old dynamic-config files.
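A hedged sketch of such a purge (paths are illustrative; it keeps the file that zoo.cfg currently points at and removes older generations after a week):
# find the dynamic file currently referenced by zoo.cfg, then delete stale generations
CURRENT=$(awk -F= '/^dynamicConfigFile/ {print $2}' /etc/zookeeper/zoo.cfg)
find /etc/zookeeper -name 'zoo.cfg.dynamic.*' ! -path "$CURRENT" -mtime +7 -delete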