unable to get monitor info from DNS SRV with service name: ceph-mon - ceph

I cannot run ceph -s.
When I run ceph -s, this error is displayed:
root@ceph-mon-1:~# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon no monitors specified to connect to.
7ff69982e700 -1 failed for service _ceph-mon-1._tcp
7ff69982e700 -1 monclient: get_monmap_and_config cannot identify monitors to contact
[errno 2] error connecting to the cluster
my ceph.conf:
[global]
fsid = c9932f0b-b0cb-423c-a331-7f9ef8a5f4a7
public network = 192.168.222.0/24
cluster network = 192.168.43.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
osd pool default size = 3
osd pool default min size = 2
log file = /tmp/$cluster-$name.log
mon dns srv name = ceph-mon-1
[mon]
mon host = ceph-mon-1
mon initial members = ceph-mon-1
mon data = /mon-data/mon/$cluster-$id
my /etc/hosts:
192.168.43.5 ceph-mon-1
But it didn't work. What should I do?

Follow the Ceph documentation step by step, using the guide that matches your installed version.
First, check whether a monitor process is running:
$ ps -ef | grep ceph
You should see something like this:
ceph 564589 1 0 Feb19 ? 00:26:47 /usr/bin/ceph-mon -f --cluster ceph --id ceph-mon-1 --setuser ceph --setgroup ceph
If you see output similar to the above, the monitor binary is running. Next, check whether you have actually created the cluster with ceph-mon-1 as the initial monitor and then initialised it:
$ ceph-deploy new ceph-mon-1
$ ceph-deploy mon create-initial
$ ceph-deploy admin ceph-mon-1
Example (/etc/ceph/ceph.conf):
[global]
fsid = 04fa0f1d-1889-4474-aeb8-d3237ea2cdd1
mon_initial_members = ceph-mon-1
mon_host = 10.10.10.1
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
$ ls -lha
-rw-r--r-- 1 root root 159K Feb 15 13:41 ceph-deploy-ceph.log
-rw-r--r-- 1 root root 113 Feb 19 15:26 ceph.bootstrap-mds.keyring
-rw-r--r-- 1 root root 113 Feb 19 15:26 ceph.bootstrap-mgr.keyring
-rw-r--r-- 1 root root 113 Feb 19 15:26 ceph.bootstrap-osd.keyring
-rw-r--r-- 1 root root 113 Feb 19 15:26 ceph.bootstrap-rgw.keyring
-rw-r--r-- 1 root root 151 Feb 19 15:26 ceph.client.admin.keyring
-rw-rw-r-- 1 root root 218 Feb 19 15:26 ceph.conf
-rw-r--r-- 1 root root 73 Feb 19 15:26 ceph.mon.keyring
After initialising, you will see the generated keyrings in your current working directory. Copy them over to the /etc/ceph/ folder:
$ sudo cp -vf ceph.* /etc/ceph/
Then run:
$ ceph -s
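One difference worth checking between the two files: the question's ceph.conf sets mon host under [mon], while the example above sets mon_host under [global]. If the client finds no monitor address in the sections it reads, that may be why it falls back to the DNS SRV lookup shown in the error. The sketch below is a hypothetical helper (global_mon_host is not a Ceph tool) that prints whatever mon_host value sits under [global] in a given file, treating "mon host" and "mon_host" as the same key.

```shell
# Hypothetical helper, not part of Ceph: print the mon_host value from the
# [global] section of an INI-style file, or nothing if it is not set there.
global_mon_host() {
  awk -F'=' '
    # A section header switches the "inside [global]" flag on or off.
    /^[ \t]*\[/ { in_global = ($0 ~ /^[ \t]*\[global\]/); next }
    in_global {
      key = $1
      gsub(/^[ \t]+|[ \t]+$/, "", key)    # trim the key
      gsub(/[ _]+/, "_", key)             # "mon host" -> "mon_host"
      if (key == "mon_host") {
        val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim the value
        print val
        exit
      }
    }
  ' "$1"
}
```

Running global_mon_host /etc/ceph/ceph.conf and getting no output would suggest the monitor address is missing from [global].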

Related

MongoDB: rsync whole db folder, except one huge collection that doesn't change

Here is the content of my MongoDB /data/db folder. One of the collections is 723GB; the others are just a few KB.
-rw------- 1 lxd docker 723G Dec 5 10:15 collection-0-1080408413244540209.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-0-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 collection-2-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 collection-4-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 collection-7-1080408413244540209.wt
-rw------- 1 lxd docker 0 Dec 5 10:14 .dbshell
drwx------ 2 lxd docker 90 Dec 5 10:15 diagnostic.data
-rw------- 1 lxd docker 8.1G Dec 5 10:15 index-1-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-1-3112968025499504303.wt
-rw------- 1 lxd docker 36K Dec 5 10:15 index-3-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-5-1080408413244540209.wt
-rw------- 1 lxd docker 4.0K Dec 5 10:14 index-6-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:14 index-8-1080408413244540209.wt
-rw------- 1 lxd docker 20K Dec 5 10:15 index-9-1080408413244540209.wt
drwx------ 2 lxd docker 110 Dec 5 10:15 journal
-rw------- 1 lxd docker 36K Dec 5 10:15 _mdb_catalog.wt
-rw------- 1 lxd docker 0 Dec 5 10:15 mongod.lock
-rw------- 1 lxd docker 36K Dec 5 10:15 sizeStorer.wt
-rw------- 1 lxd docker 114 Dec 5 10:14 storage.bson
-rw------- 1 lxd docker 50 Dec 5 10:14 WiredTiger
-rw------- 1 lxd docker 4.0K Dec 5 10:15 WiredTigerHS.wt
-rw------- 1 lxd docker 21 Dec 5 10:14 WiredTiger.lock
-rw------- 1 lxd docker 1.5K Dec 5 10:15 WiredTiger.turtle
-rw------- 1 lxd docker 84K Dec 5 10:15 WiredTiger.wt
I simply back up the whole Docker volume to another server via rsync.
I never change the content of the 723GB collection, but for some reason MongoDB updates the file approximately once a week.
Because of that, rsync also updates the file remotely. And because I'm using snapshots, every week a new snapshot adds another 723GB to the storage, which is unacceptable and causes me problems.
To resolve this, I simply added the 723GB collection to the rsync exclude list and no longer upload it. Is that fine? Will I still be able to use my backup to restore the server after a year if I never update the collection-0-1080408413244540209.wt file again?
By default, rsync only copies new or changed files from source to destination, so you don't need to add the file to the exclude list if you copy the mounted snapshot files. On the other hand, WiredTiger is somewhat sensitive: it generates checksums in the data root folder based on checkpoints from all collections, so if the file differs there is a good chance the mongod process will not be able to start from the restored snapshot. I would therefore suggest not excluding the file completely, but leaving it to rsync to check each time whether the file is the same or has changed and needs to be copied again.
P.S.
Note that the scenario you describe was valid with the previous, now deprecated MongoDB storage engine MMAPv1, where it was even possible to copy a collection inside a running instance on the fly. Unfortunately, WiredTiger does not allow this, but it offers other advantages.
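For context on why a weekly touch of the file triggers a new copy: unless told otherwise, rsync decides whether a file needs transferring by comparing size and modification time, not content. A minimal sketch of that comparison, assuming GNU stat (the -c option) and a made-up helper name quick_check:

```shell
# Hypothetical helper mimicking rsync's default "quick check":
# a file counts as changed when its size or modification time differs.
# Assumes GNU coreutils stat (-c); BSD/macOS stat uses different options.
quick_check() {
  src=$1
  dst=$2
  if [ ! -e "$dst" ]; then
    echo "missing"
    return
  fi
  # %s = size in bytes, %Y = modification time (seconds since the epoch)
  if [ "$(stat -c '%s %Y' "$src")" = "$(stat -c '%s %Y' "$dst")" ]; then
    echo "same"
  else
    echo "changed"
  fi
}
```

If anything touches the source file, its modification time moves and rsync treats it as changed, which would match the weekly behaviour described in the question.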

pod logs not stored in EKS nodes

I am trying to configure Fluentd to send logs to Elasticsearch. After configuring it, I could not see any pod logs in Elasticsearch.
While debugging, I found that there are no logs on the node under /var/log/pods:
cd /var/log/pods
ls -la
drwxr-xr-x. 34 root root 8192 Dec 9 12:26 .
drwxr-xr-x. 14 root root 4096 Dec 9 02:21 ..
drwxr-xr-x. 3 root root 21 Dec 7 03:14 pod1
drwxr-xr-x. 6 root root 119 Dec 7 11:17 pod2
cd pod1/containerName
ls -la
total 0
drwxr-xr-x. 2 root root 6 Dec 7 03:14 .
drwxr-xr-x. 3 root root 21 Dec 7 03:14 ..
But I can see the logs when executing kubectl logs pod1.
According to the documentation, the logs should be in this path. Do you have any idea why no logs are stored on the node?
I found out what was happening. The problem was related to the log driver, which was configured to send the logs to journald:
docker inspect -f '{{.HostConfig.LogConfig.Type}}' ID
journald
I changed it to json-file, and now logs are written to /var/log/pods.
The Docker documentation describes the different logging configuration options.
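On hosts where the Docker daemon is configured through /etc/docker/daemon.json, the default driver is set with the "log-driver" key. Below is a small sketch for checking it without jq. The helper name configured_log_driver is made up for this example, and it falls back to json-file (Docker's built-in default) when the key is absent. The file location is an assumption: some setups pass the driver as a daemon command-line flag instead.

```shell
# Hypothetical helper: print the "log-driver" value from a Docker
# daemon.json-style file, or "json-file" if the key or file is missing.
configured_log_driver() {
  driver=""
  if [ -r "$1" ]; then
    driver=$(sed -n 's/.*"log-driver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1)
  fi
  echo "${driver:-json-file}"
}
```

For example, configured_log_driver /etc/docker/daemon.json. Note that a changed default only applies to containers created afterwards; existing containers keep the driver they were created with.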

`mkdir` returns successfully but doesn't work in kubernetes (minikube) shared volume

I am trying to set up a shared volume in a minikube Kubernetes cluster to allow multiple pods to communicate with each other. What is configured is:
A PVC using the nfs-server-provisioner dynamic provisioner
Multiple Pods (some are jobs) that mount the PVC
The goal is to have an init container in each pod that creates a directory on startup using the Pod's name as the directory name, and have a job scan that directory and do some stuff.
I have this configured, and no errors are thrown, but the directory isn't created.
When trying to do this manually I see some strange behavior; mkdir returns a non-error code but doesn't do anything:
< ssh into pod >
user@802542b3ccb195b001258094dc543606-1601299620-zcszs:~$ ls -al /output/
total 8
drwxrwxrwx 2 user users 4096 Sep 28 13:28 .
drwxr-xr-x 1 root root 4096 Sep 28 13:27 ..
user@802542b3ccb195b001258094dc543606-1601299620-zcszs:~$ mkdir /output/test
user@802542b3ccb195b001258094dc543606-1601299620-zcszs:~$ echo $#
0
user@802542b3ccb195b001258094dc543606-1601299620-zcszs:~$ ls -al /output/
total 8
drwxrwxrwx 2 user users 4096 Sep 28 13:28 .
drwxr-xr-x 1 root root 4096 Sep 28 13:27 ..
user@802542b3ccb195b001258094dc543606-1601299620-zcszs:~$
I am able to touch files:
user@802542b3ccb195b001258094dc543606-1601299740-bw2hj:~$ ls -al /output/
total 8
drwxrwxrwx 2 user users 4096 Sep 28 13:29 .
drwxr-xr-x 1 root root 4096 Sep 28 13:29 ..
user@802542b3ccb195b001258094dc543606-1601299740-bw2hj:~$ touch /output/test
user@802542b3ccb195b001258094dc543606-1601299740-bw2hj:~$ ls -al /output/
total 8
drwxrwxrwx 2 user users 4096 Sep 28 13:29 .
drwxr-xr-x 1 root root 4096 Sep 28 13:29 ..
-rw-r--r-- 1 user users 0 Sep 28 13:29 test
user@802542b3ccb195b001258094dc543606-1601299740-bw2hj:~$
Here is the nfs mount:
Filesystem Size Used Avail Use% Mounted on
10.110.46.205:/export/pvc-2e433dc6-018d-11eb-be1a-0242766f1f7c 252G 134G 107G 56% /output
The same behavior is observed when using regular volumes. I am using the Docker driver but also observed this with the virtualbox driver. Is this a minikube issue? I would expect mkdir to error out if it can't complete.
minikube version: v1.13.1
commit: 1fd1f67f338cbab4b3e5a6e4c71c551f522ca138-dirty
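Until the cause is found, the init container could at least fail loudly instead of trusting mkdir's exit status. A minimal sketch, assuming the pod name is available in a variable such as POD_NAME (for example injected through the Downward API). Both the variable name and the ensure_dir helper are hypothetical:

```shell
# Hypothetical init-container helper: create a directory, then confirm it
# is really there, returning non-zero if it is not.
ensure_dir() {
  mkdir -p "$1" || return 1
  if [ ! -d "$1" ]; then
    echo "mkdir reported success but $1 does not exist" >&2
    return 1
  fi
}
```

In the init container this could be called as ensure_dir "/output/${POD_NAME}" || exit 1, so that the pod fails to start rather than continuing without its directory.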

Monitor kafka with Prometheus and Grafana

I have followed the steps below to monitor Kafka with Prometheus and Grafana, but the JMX exporter port does not get opened.
wget http://ftp.heanet.ie/mirrors/www.apache.org/dist/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz
tar -xzf kafka_*.tgz
cd kafka_*
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.6/jmx_prometheus_javaagent-0.6.jar
wget https://raw.githubusercontent.com/prometheus/jmx_exporter/master/example_configs/kafka-0-8-2.yml
./bin/zookeeper-server-start.sh config/zookeeper.properties &
KAFKA_OPTS="$KAFKA_OPTS -javaagent:$PWD/jmx_prometheus_javaagent-0.6.jar=7071:$PWD/kafka-0-8-2.yml"
./bin/kafka-server-start.sh config/server.properties &
Then I checked with curl http://localhost:7071/metrics in the terminal.
It reports curl: (7) Failed connect to localhost:7071; Connection refused
I have currently opened all ports on the server to my network.
When I check with netstat -tupln | grep LISTEN,
port 7071 is not listed in the output.
Below are the contents of the Kafka directory:
drwxr-xr-x. 3 root root 4096 Aug 23 12:22 bin
drwxr-xr-x. 2 root root 4096 Oct 15 2016 config
-rw-r--r--. 1 root root 20356 Aug 21 10:50 hs_err_pid1496.log
-rw-r--r--. 1 root root 19432 Aug 21 10:55 hs_err_pid2447.log
-rw-r--r--. 1 root root 1225418 Feb 5 2016 jmx_prometheus_javaagent-0.6.jar
-rw-r--r--. 1 root root 2824 Aug 21 10:48 kafka-0-8-2.yml
drwxr-xr-x. 2 root root 4096 Aug 21 10:48 libs
-rw-r--r--. 1 root root 28824 Oct 5 2016 LICENSE
drwxr-xr-x. 2 root root 4096 Oct 11 15:05 logs
-rw-------. 1 root root 8453 Aug 23 12:08 nohup.out
-rw-r--r--. 1 root root 336 Oct 5 2016 NOTICE
drwxr-xr-x. 2 root root 46 Oct 15 2016 site-docs
Kafka is running on port 2181 and ZooKeeper is also running.
If you do not mind opening up the jmx port, you can also do it like this:
export JMX_PORT=9999
export KAFKA_JMX_OPTS='-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.rmi.port=9999'
./bin/kafka-server-start.sh config/server.properties &
java -jar jmx_prometheus_httpserver-0.10-jar-with-dependencies.jar 9300 kafka-0-8-2.yaml &
You build the jar-with-dependencies from source with mvn package.
I had the same problem when setting the KAFKA_OPTS environment variable in bash. It gets worse if you add the environment variable to your ~/.profile file. The problem with this approach is that KAFKA_OPTS is used by both kafka-server-start.sh and zookeeper-server-start.sh, so when you start ZooKeeper, port 7071 is taken by ZooKeeper for exporting metrics. Then, when you start Kafka, you get an error saying that port 7071 is already in use.
I solved the problem by setting the environment variable in the systemd service file. I described it in my article last week:
[Unit]
...
[Service]
...
Restart=no
Environment=KAFKA_OPTS=-javaagent:/home/morteza/myworks/jmx_prometheus_javaagent-0.9.jar=7071:/home/morteza/myworks/kafka-2_0_0.yml
[Install]
...
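The scoping issue this answer describes can be seen without Kafka at all. In the sketch below, sh -c stands in for the start scripts and the -javaagent value is a placeholder. A variable set as a prefix to one command reaches only that command; a plain assignment on its own line (no export) reaches no child; an exported variable reaches every child started afterwards.

```shell
# Stand-in for a start script: report the KAFKA_OPTS value the child sees.
print_opts='printf "%s\n" "${KAFKA_OPTS:-<unset>}"'
agent_opts="-javaagent:agent.jar=7071:config.yml"   # placeholder value

unset KAFKA_OPTS

# 1. Prefix assignment: visible to this one child process only.
kafka_view=$(KAFKA_OPTS="$agent_opts" sh -c "$print_opts")
zookeeper_view=$(sh -c "$print_opts")

# 2. Plain assignment without export: children do not inherit it.
KAFKA_OPTS="$agent_opts"
unexported_view=$(sh -c "$print_opts")

# 3. Exported: every child started afterwards inherits it.
export KAFKA_OPTS
exported_view=$(sh -c "$print_opts")

echo "prefix:     kafka=$kafka_view zookeeper=$zookeeper_view"
echo "unexported: $unexported_view"
echo "exported:   $exported_view"
```

The prefix form attaches the agent to a single process, which is the same effect the Environment= line has for the Kafka unit. The unexported case may also be relevant to the question, where KAFKA_OPTS is assigned on its own line before kafka-server-start.sh is run.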

Unable to start MongoDB 3.0.2 service on CentOS 7

We are setting up a MongoDB server for a production environment on an Amazon EC2 instance, but are not able to start the service. I followed this documentation for the setup. Here are the steps I took to set up the server:
Added the following to /etc/yum.repos.d/mongodb-org-3.0.repo:
[mongodb-org-3.0]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.0/x86_64/
gpgcheck=0
enabled=1
Then installed MongoDB 3.0.2 using sudo yum install -y mongodb-org-3.0.2
Created directories for the three partitions (data, journal and log):
sudo mkdir /mongo
sudo mkdir /mongo/data
sudo mkdir /mongo/log
sudo mkdir /mongo/journal
Created file systems on the three separate partitions:
sudo mkfs.ext4 /dev/xvdb
sudo mkfs.ext4 /dev/xvdc
sudo mkfs.ext4 /dev/xvdd
Created entries in fstab so the partitions are mounted on reboot:
echo '/dev/xvdb /mongo/data ext4 defaults,auto,noatime,noexec 0 0
/dev/xvdc /mongo/journal ext4 defaults,auto,noatime,noexec 0 0
/dev/xvdd /mongo/log ext4 defaults,auto,noatime,noexec 0 0' | sudo tee -a /etc/fstab
And mounted the partitions:
sudo mount /mongo/data
sudo mount /mongo/journal
sudo mount /mongo/log
Set the ownership and created a link:
sudo chown mongod:mongod /mongo/data /mongo/journal /mongo/log
sudo ln -s /mongo/journal /mongo/data/journal
Configured ulimit & read ahead settings as given in the documentation link above. Verified permissions and partitions:
[deployer@prod-mongo ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 8.0G 1.3G 6.8G 16% /
devtmpfs 3.6G 0 3.6G 0% /dev
tmpfs 3.5G 0 3.5G 0% /dev/shm
tmpfs 3.5G 57M 3.4G 2% /run
tmpfs 3.5G 0 3.5G 0% /sys/fs/cgroup
/dev/xvdc 7.8G 36M 7.3G 1% /mongo/journal
/dev/xvdb 150G 51M 149G 1% /mongo/data
/dev/xvdd 3.9G 16M 3.6G 1% /mongo/log
Permissions:
[deployer@prod-mongo ~]$ ll /
total 32
lrwxrwxrwx. 1 root root 7 Sep 29 2014 bin -> usr/bin
dr-xr-xr-x. 4 root root 4096 Sep 29 2014 boot
drwxr-xr-x. 17 root root 2860 May 11 12:11 dev
lrwxrwxrwx. 1 root root 7 Sep 29 2014 lib -> usr/lib
lrwxrwxrwx. 1 root root 9 Sep 29 2014 lib64 -> usr/lib64
drwxr-xr-x. 2 root root 6 Jun 10 2014 mnt
drwxr-xr-x. 5 mongod mongod 41 May 11 05:06 mongo
drwxr-xr-x. 21 root root 660 May 11 12:47 run
lrwxrwxrwx. 1 root root 8 Sep 29 2014 sbin -> usr/sbin
Inside /mongo
[deployer@prod-mongo ~]$ ll /mongo/
total 12
drwxr-xr-x. 3 mongod mongod 4096 May 11 07:33 data
drwxr-xr-x. 3 mongod mongod 4096 May 11 07:31 journal
drwxr-xr-x. 3 mongod mongod 4096 May 11 08:58 log
After changing the configuration inside /etc/mongodb.conf:
logpath=/mongo/log/mongod.log
dbpath=/mongo/data
When I run sudo service mongod start, I get this error:
Starting mongod (via systemctl): Job for mongod.service failed. See 'systemctl status mongod.service' and 'journalctl -xn' for details.
[FAILED]
Further logging:
[deployer@prod-mongo ~]$ sudo systemctl status mongod.service
mongod.service - SYSV: Mongo is a scalable, document-oriented database.
Loaded: loaded (/etc/rc.d/init.d/mongod)
Active: failed (Result: exit-code) since Tue 2015-05-12 04:42:10 UTC; 42s ago
Process: 22881 ExecStart=/etc/rc.d/init.d/mongod start (code=exited, status=1/FAILURE)
May 11 04:42:10 ip-xx-xx-xx-xx.local runuser[22887]: pam_unix(runuser:session): session opened for user mongod by (uid=0)
May 11 04:42:10 ip-xx-xx-xx-xx.localdomain runuser[22887]: pam_unix(runuser:session): session closed for user mongod
May 11 04:42:10 ip-xx-xx-xx-xx.local mongod[22881]: Starting mongod: [FAILED]
May 11 04:42:10 ip-xx-xx-xx-xx.local systemd[1]: mongod.service: control process exited, code=exited status=1
May 11 04:42:10 ip-xx-xx-xx-xx.local systemd[1]: Failed to start SYSV: Mongo is a scalable, document-oriented database..
May 11 04:42:10 ip-xx-xx-xx-xx.local systemd[1]: Unit mongod.service entered failed state.
I've followed various articles, blog posts and StackExchange answers but haven't found a solution. Am I missing something?
Update: If I run mongod directly from a normal user's shell, like this: sudo mongod --logpath ~/mongod.log --dbpath ~/mongodata, then it starts properly.
We also tried changing the path of the PID file to another directory, but that didn't help either.
I'm guessing you're running a flavour of Linux that uses SELinux (RHEL or CentOS 7, perhaps?)
If so, the issue is that you don't have a permissive policy on your /mongo/ directory that permits access to daemons (like the mongod service.)
From Wikipedia:
SELinux can potentially control which activities a system allows each
user, process and daemon, with very precise specifications. However,
it is mostly used to confine daemons[citation needed] like database
engines or web servers that have more clearly defined data access and
activity rights. This limits potential harm from a confined daemon
that becomes compromised. Ordinary user-processes often run in the
unconfined domain, not restricted by SELinux but still restricted by
the classic Linux access rights
To check whether this is the issue, try this at the shell:
sudo setenforce 0
This switches SELinux to permissive mode until the next reboot, so policy violations are logged rather than blocked, which should allow the service to run.
For a more permanent solution, see https://wiki.centos.org/HowTos/SELinux
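Before and after running setenforce, it can help to confirm which mode SELinux is actually in. A minimal sketch using getenforce, wrapped so that it also runs on hosts without the SELinux tools (the selinux_state name is made up here):

```shell
# Print the current SELinux mode (Enforcing, Permissive or Disabled),
# or "unavailable" when the getenforce tool is not installed.
selinux_state() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "unavailable"
  fi
}

selinux_state
```

If the service starts while the mode is Permissive but fails under Enforcing, SELinux is the likely cause, and the denials should show up in the audit log (typically /var/log/audit/audit.log).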
I ran into this problem and found a solution that worked for me.
In short, MongoDB 3.2 uses the user 'mongod' while older versions use 'mongodb'. Some of the files and directories were still owned by 'mongodb' (the older user). Once I chown'd them to the 'mongod' user, I was able to use systemctl to control the mongod process.
More specifically, it was the "/var/log/mongodb/*" files that had the wrong user ownership.
root@<HOST>:# ls -alh /var/log/mongodb
total 664K
drwxr-xr-x 2 mongod mongod 4.0K Oct 27 12:08 .
drwxr-xr-x. 22 root root 4.0K Oct 27 11:51 ..
-rw-r--r-- 1 mongodb mongodb 3.8K Oct 27 11:48 mongod.log
-rw-r--r-- 1 mongodb mongodb 19K Apr 14 2016 mongod.log.2016-04-14T18-29-34
-rw-r--r-- 1 mongodb mongodb 2.8K Apr 14 2016 mongod.log.2016-04-14T18-30-13
-rw-r--r-- 1 mongodb mongodb 12K Apr 14 2016 mongod.log.2016-04-14T22-27-27
-rw-r--r-- 1 mongodb mongodb 11K Apr 14 2016 mongod.log.2016-04-14T22-29-12
-rw-r--r-- 1 mongodb mongodb 5.6K Apr 18 2016 mongod.log-20160418.gz
-rw-r--r-- 1 mongodb mongodb 0 Apr 18 2016 mongod.log.2016-09-09T17-33-48
-rw-r--r-- 1 mongodb mongodb 3.6K Sep 9 11:34 mongod.log.2016-09-09T17-34-52
-rw-r--r-- 1 mongodb mongodb 23K Sep 9 11:49 mongod.log.2016-09-09T17-49-49
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 11:55 mongod.log.2016-09-09T17-55-15
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:02 mongod.log.2016-09-09T18-02-26
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:13 mongod.log.2016-09-09T18-13-17
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:25 mongod.log.2016-09-09T18-25-01
-rw-r--r-- 1 mongodb mongodb 5.2K Sep 9 12:47 mongod.log.2016-09-09T18-47-54
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:52 mongod.log.2016-09-09T18-52-16
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 12:54 mongod.log.2016-09-09T18-54-49
-rw-r--r-- 1 mongodb mongodb 5.0K Sep 9 13:01 mongod.log.2016-09-09T19-01-22
-rw-r--r-- 1 mongodb mongodb 3.0K Sep 9 13:03 mongod.log.2016-09-09T19-03-21
-rw-r--r-- 1 mongodb mongodb 215K Sep 9 14:25 mongod.log.2016-09-09T20-25-59
-rw-r--r-- 1 mongodb mongodb 281K Sep 10 03:42 mongod.log-20160910
-rw-r--r-- 1 mongodb mongodb 0 Sep 10 03:42 mongod.log.2016-10-27T17-42-42
-rw-r----- 1 mongod mongod 0 Sep 29 22:03 mongod.log.rpmnew
Notice the owner of the directory is 'mongod' (the new user) while the log files are all owned by 'mongodb' (the old user).
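A quick way to list every file with the wrong owner, rather than eyeballing ls output, is find's -user test. A sketch with a hypothetical helper name; the directory and the expected user are passed in as arguments:

```shell
# Hypothetical helper: list everything under a directory that is NOT owned
# by the given user (a user name, or a numeric user ID).
not_owned_by() {
  find "$1" ! -user "$2" -print
}
```

For the case above this would be not_owned_by /var/log/mongodb mongod. If it prints anything, sudo chown -R mongod:mongod /var/log/mongodb brings the ownership in line.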
In case anyone encounters the same issue with MongoDB startup, here is the thread of comments: https://jira.mongodb.org/browse/SERVER-18439. The fix is scheduled for 3.1.