I installed Kubernetes on my Ubuntu machine. For some debugging purposes I need to look at the kubelet log file (if there is any such file).
I have looked in /var/logs but I couldn't find a such file. Where could that be?
If you run kubelet using systemd, then you could use the following method to see kubelet's logs:
# journalctl -u kubelet
If you are trying to go directly to the file you can find the kubelet logs in /var/log/syslog directory. This is for ubuntu 16.04 and above.
It depends how it was installed. I installed Kubernetes on some Ubuntu machines following the Docker-MultiNode instructions.
With this install, I find the logs using the logs command like this.
Find your container ID.
$ docker ps | egrep kubelet
Use that container ID to view the logs
$ docker logs `<container-id>`
Finally I could find it in /var/log/upstart directory. Kubernetes in my machine is started using upstart. That's why those log files are in upstart directory
I installed Kubernetes by kind (Kubernetes in docker).
find docker container of kind to enter
$ docker container ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
62588e4d284b kindest/node:v1.17.0 "/usr/local/bin/entr…" 2 weeks ago Up 2 weeks 127.0.0.1:32769->6443/tcp kind2-control-plane
$ docker container exec -it kind2-control-plane bash
root#kind2-control-plane:/#
Inside container kind2-control-plane, you could find logfiles in two place:
/var/log/containers/
/var/log/pods/
And then,you will find they are the same, you can see the example below:
root#kind2-control-plane:/# cat /var/log/containers/redis-master-7db7f6579f-scw95_default_master-f6374281c2c6afcfcd0ee1214d9bd51c1684c0b6c0ba1056295246ecd055563c.log | tail -n 5
2020-04-08T12:09:29.824252114Z stdout F
2020-04-08T12:09:29.824372278Z stdout F [1] 08 Apr 12:09:29.822 # Server started, Redis version 2.8.19
2020-04-08T12:09:29.824440661Z stdout F [1] 08 Apr 12:09:29.823 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
2020-04-08T12:09:29.824459317Z stdout F [1] 08 Apr 12:09:29.823 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2020-04-08T12:09:29.82446451Z stdout F [1] 08 Apr 12:09:29.824 * The server is now ready to accept connections on port 6379
root#kind2-control-plane:/# cat /var/log/pods/default_redis-master-7db7f6579f-scw95_094824e1-25aa-4e1e-ab23-d4bae861988a/master/0.log | tail -n 5
2020-04-08T12:09:29.824252114Z stdout F
2020-04-08T12:09:29.824372278Z stdout F [1] 08 Apr 12:09:29.822 # Server started, Redis version 2.8.19
2020-04-08T12:09:29.824440661Z stdout F [1] 08 Apr 12:09:29.823 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
2020-04-08T12:09:29.824459317Z stdout F [1] 08 Apr 12:09:29.823 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2020-04-08T12:09:29.82446451Z stdout F [1] 08 Apr 12:09:29.824 * The server is now ready to accept connections on port 6379
root#kind2-control-plane:/# ls -l /var/log/containers/ | grep redis
lrwxrwxrwx 1 root root 101 Apr 8 12:09 redis-master-7db7f6579f-scw95_default_master-f6374281c2c6afcfcd0ee1214d9bd51c1684c0b6c0ba1056295246ecd055563c.log -> /var/log/pods/default_redis-master-7db7f6579f-scw95_094824e1-25aa-4e1e-ab23-d4bae861988a/master/0.log
If you want to know more in detail about the directories, you can see 2019-2-merge-request in Github.
Related
There is a container in my Kubernetes cluster which I want to debug.
But there is nonetstat, no ip and no apk.
Is there a way to upgrade this image, so that the common tools are installed?
In this case it is the nginx container image in a K8s 1.23 cluster.
Alpine is a stripped-down version of the image to reduce the footprint. So the absence of those tools is expected. Although since Kubernetes 1.23, you can use the kubectl debug command to attach a debug pod to the subject pod.
Syntax:
kubectl debug -it <POD_TO_DEBUG> --image=ubuntu --target=<CONTAINER_TO_DEBUG> --share-processes
Example:
In the below example, the ubuntu container is attached to the Nginx-alpine pod, requiring debugging. Also, note that the ps -eaf output shows nginx process running and the cat /etc/os-release shows ubuntu running. The indicating process is shared/visible between the two containers.
ps#kube-master:~$ kubectl debug -it nginx --image=ubuntu --target=nginx --share-processes
Targeting container "nginx". If you don't see processes from this container, the container runtime doesn't support this feature.
Defaulting debug container name to debugger-2pgtt.
If you don't see a command prompt, try pressing enter.
root#nginx:/# ps -eaf
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 19:50 ? 00:00:00 nginx: master process nginx -g daemon off;
101 33 1 0 19:50 ? 00:00:00 nginx: worker process
101 34 1 0 19:50 ? 00:00:00 nginx: worker process
101 35 1 0 19:50 ? 00:00:00 nginx: worker process
101 36 1 0 19:50 ? 00:00:00 nginx: worker process
root 248 0 1 20:00 pts/0 00:00:00 bash
root 258 248 0 20:00 pts/0 00:00:00 ps -eaf
root#nginx:/#
Debugging as ubuntu as seen here, this arm us with all sort of tools:
root#nginx:/# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
root#nginx:/#
In case ephemeral containers need to be enabled in your cluster, then you can enable it via feature gates as described here.
The whole point of using containers is to optimize the resource utilization in your cluster. The images used should only include the packages that are needed to run your app.
The unwanted packages should be removed from your images (especially in prod) to reduce the compute utilization and to reduce the attack vector.
This appears to be a stripped down image that has only the libraries needed to run that application.
In order to debug, you will have to create a new container in the same pid and network namespace as the container you are trying to debug
Build container first
Dockerfile
FROM alpine
RUN apk update && apk add strace
CMD ["strace", "-p", "1"]
Build
$ docker build -t strace .
Run
docker run -t --pid=container:<targetContainer> \
--net=container:targetContainer \
--cap-add sys_admin \
--cap-add sys_ptrace \
strace
strace: Process 1 attached
futex(0xd72e90, FUTEX_WAIT, 0, NULL
https://rothgar.medium.com/how-to-debug-a-running-docker-container-from-a-separate-container-983f11740dc6
I have a service that, in some contexts, sends requests to itself.
I can reach the service from outside the cluster, but the self-requests fail (time-out).
Environment:
minikube v0.34.1
Linux version 4.15.0 (jenkins#jenkins) (gcc version 7.3.0 (Buildroot 2018.05)) #1 SMP Fri Feb 15 19:27:06 UTC 2019
I've been using https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/#a-pod-cannot-reach-itself-via-service-ip as a troubleshooting guide, but I'm down the step that says "seek help".
Troubleshooting results:
journalctl -u kubelet | grep -i hairpin
Feb 26 19:57:10 minikube kubelet[3066]: W0226 19:57:10.124151 3066 docker_service.go:540] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
Feb 26 19:57:10 minikube kubelet[3066]: I0226 19:57:10.124295 3066 docker_service.go:236] Hairpin mode set to "hairpin-veth"
The troubleshooting guide indicates that "hairpin-veth" is OK.
for intf in /sys/devices/virtual/net/docker0/brif/veth*; do cat $intf/hairpin_mode; done
0
...
0
Note that the guide used /sys/devices/virtual/net/cbr0/brif/*, but in this version of minikube, the path is /sys/devices/virtual/net/docker0/brif/veth*. I'd like to understand why the paths are different, but it appears that hairpin_mode is not enabled.
The next step in the guide is: Seek help if none of above works out.
Am I correct in believing that I need to enable hairpin_mode?
If so, how do I do so?
It seems like known issue, more information here:
As workaround you can try:
minikube ssh -- sudo ip link set docker0 promisc on
Please share with the reulsts.
While trying to run a puppet update form a node:
sudo /opt/puppetlabs/bin/puppet agent -t
I get an error:
Error: Could not retrieve catalog; skipping run
Error: Could not send report: Connection refused - connect(2) for "puppet" port 8140`
Elsewhere indicates this is likely a problem with the puppetserver service, and suggests to reboot the server. Restarting didn't help, and when I try to restart the service I get failure:
~$ sudo service puppetserver restart
Job for puppetserver.service failed because the control process exited with error code. See "systemctl status puppetserver.service" and "journalctl -xe" for details.
I've looked at these logs, and as a puppet/linux noob, I'm not sure what to do next.
systemctl status puppetserver.service
● puppetserver.service - puppetserver Service
Loaded: loaded (/lib/systemd/system/puppetserver.service; enabled; vendor preset: enabled)
Active: activating (start-post) since Fri 2016-09-02 15:54:26 PDT; 2s ago
Process: 22301 ExecStartPre=/usr/bin/install --directory --owner=puppet --group=puppet --mode=775 /var/run/puppetlabs/puppetserver (code=exited
Main PID: 22306 (java); : 22307 (bash)
Tasks: 17
Memory: 335.7M
CPU: 5.535s
CGroup: /system.slice/puppetserver.service
├─22306 /usr/bin/java -Xms6g -Xmx6g -XX:MaxPermSize=256m -XX:OnOutOfMemoryError=kill -9 %p -Djava.security.egd=/dev/urandom -cp /opt/p
└─control
├─22307 /bin/bash /opt/puppetlabs/server/apps/puppetserver/ezbake-functions.sh wait_for_app
└─22331 sleep 1
Sep 02 15:54:26 puppet systemd[1]: Starting puppetserver Service...
Sep 02 15:54:26 puppet java[22306]: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
puppet version 4.6.1
The puppet master communicates with the other node using port number 8140.
I don't think a restart will help, since this looks like a connection issue between the server and the node.
please try the following -
first make sure that the puppet master is actually listening on port 8140. run the following command on the puppetmaster -
netstat -ntlp | grep 8140
this command should return something like this -
tcp 0 0 0.0.0.0:8140 0.0.0.0:* LISTEN 1783/puppetmaster
If you don't get the same output, your puppetmaster is not listening, and therefore can not compile catalogs for the node.
Try checking the puppet master log at /var/log/puppetmaster.log
check that the node can communicate with the puppetmaster on the relevant port. you can check this quickly with the telnet command. run this on your node -
telnet < puppetmaster ip address \ dns name> 8140
you should get something like -
Connected to <puppet-master-IP/DNS-name>
Escape character is '^]'.
if you don't get this output, this means that something is blocking you from accessing the puppetmaster. try opening the port in your firewall to access the puppetmaster.
if you're still stuck try using the --debug flag for verbose output and edit your question.
Could be 2 things: (1) in puppet.conf you have configured more memory than you have on your machine. Or (2) You installed both apt-get install puppetserver and apt-get install puppet.
If you get failed to start puppet.service: unit not found. error on slave machine while connecting to puppet.
Close the putty and then again open and connect it.The issue wont come while starting putty on slave.
The error occurs because there is not enough RAM and to fix the error, open the Puppet server configuration file:
sudo nano /etc/sysconfig/puppetserver
And reduce the amount of allocated RAM for the Puppet server (for example, I specified 512m instead of 2g):
JAVA_ARGS="-Xms512m -Xmx512m"
Now let’s start the Puppet server:
sudo systemctl start puppetserver
I installed haproxy from aur in Arch Linux and modified the config file a bit:
global
maxconn 20000
log 127.0.0.1 local0
user haproxy
stats socket /run/haproxy/haproxy.sock mode 660 level admin
stats timeout 30s
chroot /usr/share/haproxy
pidfile /run/haproxy.pid
daemon
defaults
mode http
stats enable
stats uri /stats
stats realm Haproxy\ Statistics
frontend www-http
bind 127.0.0.1:80
default_backend www-backend
backend www-backend
mode http
balance roundrobin
timeout connect 5s
timeout server 30s
timeout queue 30s
server app1 127.0.0.1:5001 check
server app2 127.0.0.1:5002 check
I have made sure that the directory /run/haproxy exists and has permissions for the user haproxy to write to it:
ツ ls -al /run/haproxy
total 0
drwxr-xr-x 2 haproxy root 40 May 13 21:37 .
drwxr-xr-x 27 root root 720 May 13 22:00 ..
When I launch haproxy using systemctl start haproxy.service, it loads fine. I can even go to the /stats page and view stats, however, socat reports the following error:
ツ sudo socat unix-connect:/run/haproxy/haproxy.sock stdio
2016/05/13 22:04:11 socat[24202] E connect(5, AF=1 "/run/haproxy/haproxy.sock", 27): No such file or directory
I am at wits end and not able to understand what is happening. This is what I get from journalctl -xe:
May 13 21:56:31 rohanarch.local systemd[1]: Starting HAProxy Load Balancer...
May 13 21:56:31 rohanarch.local systemd[1]: Started HAProxy Load Balancer.
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: haproxy-systemd-wrapper: executing /usr/bin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: [WARNING] 133/215631 (20456) : config : missing timeouts for frontend 'www-http'.
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: | While not properly invalid, you will certainly encounter various problems
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: | with such a configuration. To fix this, please ensure that all following
May 13 21:56:31 rohanarch.local haproxy-systemd-wrapper[20454]: | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Basically, no errors/warnings or not even so much as an indication about the stats socket. Others who have faced a problem with the stats socket fail to get haproxy started. In my case, it starts up fine, but the socket just isn't creating.
You need to manually create the directory yourself. Please ensure
/run/haproxy exists. If it doesn't, then first create it with:
sudo mkdir /run/haproxy
This should resolve your issue.
try to make selinux permissive with the command belowe and restart HAproxy service.
selinux command
I am new to Zookeeper and it has being a real issue to install it and run. I am not sure what is wrong in here but I will explain what I've being doing to make it more clear:
1.- I've followed the installation guide provided by Apache. This means download the Zookeeper distribution (stable release) extracted the file and moved into the home directory.
2.- As I am using Ubuntu 12.04 I've modified the .bashrc file including this:
export ZOOKEEPER_INSTALL=/home/myusername/zookeeper-3.4.5
export PATH=$PATH:$ZOOKEEPER_INSTALL/bin
3.- Create a config file on conf/zoo.cfg
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
and also tried with:
dataDir=/var/log/zookeeper
and
dataDir=/var/bin/zookeeper
4.- When running the start command
zkServer.sh start or `bin/zkServer.sh start` nothing happens and always returns this
JMX enabled by default
Using config: /home/sasuke/zookeeper-3.4.5/bin/../conf/zoo.cfg
mkdir: cannot create directory `/var/zookeeper': Permission denied
Starting zookeeper ... /home/sasuke/zookeeper-3.4.5/bin/zkServer.sh: line 113: /var/zookeeper/zookeeper_server.pid: No such file or directory
FAILED TO WRITE PID
I have Java installed and inside the zookeper directory there is a zookeeper.jar file that I think it's not running.
Checking here on stackoverflow there was a guy that said he could run zookeeper after typing
ssh localhost
But when I try to do it I get this error
ssh: connect to host localhost port 22: Connection refused
Please help. I've being here trying to solve it for too long.
Getting started guide of zookeeper:
http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html
Previous case solved with the shh localhost
Zookeeper: FAILED TO WRITE PID
UPDATE:
The permissions for log are:
drwxr-xr-x 19 root root 4096 Oct 10 07:52 log
and for zookeeper:
drwxr-xr-x 2 zookeeper zookeeper 4096 Mar 23 2012 zookeeper
Should I change any of these?
I have had the same problem. In my case was useful to start Zookeeper and directly specify a configuration file:
/bin/zkServer.sh start conf/zoo.conf
It seems you do not have the required permissions. The /var/log owner is is going to be root. Zookeeper stores the process id and snapshot of data in that directory. The process id of the spawned zookeeper server is stored in a file -zookeeper_server.pid (as of 3.3.6)
If you have root previleges, you could start zookeeper with sudo (root) previleges, it should work but definitely not recommended. Make sure you start zookeeper with the same(or higher) permissions as the owner of the directory.
Create a new directory in your home folder like /home/username/zookeeper-data.
Let dataDir point to that directory and it should work.
The default zookeeper installation (tar extract) comes with the conf file named conf/zoo_sample.cfg while the same extract's bin/zkServer.sh expects the conf file to be called zoo.cfg thereby resulting in a "No such file or dir" and the "failed to write pid" error. So before running zkServer.sh to start or stop zookeeper instance, either:
rename the zoo_sample.cfg in the conf dir to zoo.cfg, or
give the name (and path) to the conf file (as suggested by Ilya Lapitan), or, of course
edit zkServer.sh ;-)
When you create the Directory for dataDir make sure to use the -p option. This will allow subsequent directories to be created as required by the application placing files.
mkdir -p /var/log/zookeeperData
Then set:
dataDir=/var/log/zookeeperData
Seems there's all kinds of reasons this can happen. So many helpful answers here!
For me, I had improper line endings in my zoo.cfg file, and possibly invisible characters, so zookeeper was trying to create directories like /var/zookeeper? and /var/zookeeper\r. Reworking my zoo.cfg a bit fixed it for me, along with deleting zoo_sample.conf.
This happens to me due to low disk space. cause zookeeper cant create pid file inside zookeeper data folder.
I have faced the same issue while starting the zookeeper with this command:
hadoop#ubuntu:~/hadoop/zookeeper/zookeeper-3.4.8$ bin/zkServer.sh
start
ERROR [main] client.ConnectionManager$HConnectionImplementation:
The node /hbase is not in ZooKeeper.
It should have been written by the master. Check the value configured in zookeeper.znode.parent. There could be a mismatch with the one configured in the master.
But running the script as su rectified the issue:
hadoop#ubuntu:~/hadoop/zookeeper/zookeeper-3.4.8$ sudo bin/zkServer.sh
start
ZooKeeper JMX enabled by default Using config:
/home/hadoop/hadoop/zookeeper/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Go to /usr/local/etc/
You will find zookeeper directory
delete the directory
and restart the server - zkServer start
Change the path give dataDir=/tmp/zookeeper. If it works then its clearly access issues
But its generally not advisable to use tmp directory.
This seems to be an ownership issue; running the following solved this for me.
$ sudo chown -R $USER /var/lib/zookeeper
N.B.
I've outlined my steps below which show the error I was getting (the same as the error in this SO question) and the attempt at trying the solution proposed by a user above, which advised to provide zoo.cfg as an argument.
13:01:29 ✔ ~ :: $ZK/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/../conf/zoo.cfg
Starting zookeeper ... /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/zkServer.sh: line 149: /var/lib/zookeeper/zookeeper_server.pid: Permission denied
FAILED TO WRITE PID
13:01:32 ✘ ~ :: $ZK/bin/zkServer.sh start $ZK/conf/zoo.cfg
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/conf/zoo.cfg
Starting zookeeper ... /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/zkServer.sh: line 149: /var/lib/zookeeper/zookeeper_server.pid: Permission denied
FAILED TO WRITE PID
13:04:45 ✔ /var/lib :: ls -la
total 0
drwxr-xr-x 4 root wheel 128 Apr 19 18:55 .
drwxr-xr-x 27 root wheel 864 Apr 19 18:55 ..
drwxr--r-- 3 root wheel 96 Mar 24 15:07 zookeeper
13:04:48 ✔ /var/lib :: echo $USER
tallamjr
13:06:03 ✔ /var/lib :: sudo chown -R $USER zookeeper
Password:
13:06:44 ✔ /var/lib :: ls -la
total 0
drwxr-xr-x 4 root wheel 128 Apr 19 18:55 .
drwxr-xr-x 27 root wheel 864 Apr 19 18:55 ..
drwxr--r-- 3 tallamjr wheel 96 Mar 24 15:07 zookeeper
13:06:48 ✔ ~ :: $ZK/bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/Cellar/zookeeper/3.4.14/libexec/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
REF:
- https://askubuntu.com/questions/6723/change-folder-permissions-and-ownership
For me this solution worked:
I granted the read, write and execute permissions for everyone using the command $sudo chmod 777 foldername for the directory zookeeper by going inside the directory /var (/var/zookeeper).
After executing this command try running the zookeeper. It ran in my case
try to use sudo -E bin/zkServer.sh start