I am trying to find performance bottlenecks by using the perf tool on a kubernetes pod. I have already set the following on the instance hosting the pod:
"kernel.kptr_restrict" = "0"
"kernel.perf_event_paranoid" = "0"
However, I have two problems.
When I collect samples with perf record -a -F 99 -g -p <PID> --call-graph dwarf and feed them into speedscope or, similarly, into a flamegraph, I still see question marks (???), and for the process whose CPU usage breakdown I want (a C++ program), those ??? frames sit at the top of the stack with the system calls below them. The main process is the one surrounded by ???.
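(For reference, the post-processing step looks roughly like this; file names are arbitrary, and the stackcollapse/flamegraph scripts are the ones from https://github.com/brendangregg/FlameGraph:)

    # speedscope can import raw 'perf script' output directly
    perf script > profile.linux-perf.txt
    # or build an SVG flamegraph with the FlameGraph scripts
    perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg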
I tried running perf top and it says
Failed to mmap with 1 (Operation not permitted)
My questions are:
For running perf top, what permissions do I need to change on the pod's host instance?
Which other settings do I need to change at the instance level so that ??? no longer shows up in perf's output? I would like to see the process's function call stack, not just the system calls. See the following stack:
The host OS is Ubuntu.
Zooming in on the first system call, you would see this, but it only accounts for a fraction of the CPU time spent, and only for the system calls.
UPDATE/ANSWER:
I was able to run perf top by setting
"kernel.perf_event_paranoid" = "-1". However, as seen in the image below, the process I'm trying to profile (name blacked out) shows only addresses rather than function names. I tried running them through addr2line, but it says addr2line: 'a.out': No such file.
How can I get the addresses to resolve to function names on the pod? Is it even possible?
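(Side note on that error: addr2line resolves against a binary called a.out unless told otherwise, so it has to be pointed at the actual executable, which also needs to contain symbols. A rough sketch, with a made-up address:)

    # -e: binary to resolve against, -f: print function names, -C: demangle C++
    addr2line -e /proc/<PID>/exe -f -C 0x00000000004a1b2c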
I was also able to fix the memory-to-function mapping in perf top. The problem was that I was running perf from a different container than the one the process was running in (same pod, different container). There may be a way to feed perf the extra information, but simply moving perf into the container running the process fixed it.
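In other words, something along these lines (pod, container, and package names are placeholders, and the perf package name depends on the image's distro):

    # exec into the container that actually runs the process
    kubectl exec -it <pod-name> -c <app-container> -- bash
    # install perf inside that container; the package name varies by distro
    # (linux-tools-$(uname -r) on Ubuntu, linux-perf on Debian)
    apt-get update && apt-get install -y linux-tools-$(uname -r)
    # symbols now resolve because perf sees the same filesystem as the process
    perf top -p <PID>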
I would like to write a script that checks that Docker has access to at least X amount of memory on Windows. I need this to work with Docker running on Hyper-V.
Get-VM under Hyper-V reports the memory assigned to DockerDesktopVM as 0, I assume because it uses dynamic memory allocation. But I know Docker does have a maximum set for the memory available, i.e. the same memory limit discussed in questions like "Docker won't start on Windows - Not enough memory to start Docker".
Is there some way to get the memory limit assigned to the Docker container from within PowerShell or the command line?
Of course as soon as I asked this I found it.
(Get-VMMemory DockerDesktopVM).Startup
Our setup:
We are using kubernetes in GCP.
We have pods that write logs to a shared volume, with a sidecar container that sucks up our logs for our logging system.
We cannot just use stdout instead for this process.
Some of these pods are long lived and are filling up disk space because of no log rotation.
Question:
What is the easiest way to prevent the disk space from filling up here (without scheduling pod restarts)?
I have been attempting to install logrotate using RUN apt-get install -y logrotate in our Dockerfile and placing a logrotate config file in /etc/logrotate.d/dynamicproxy, but it doesn't seem to run; /var/lib/logrotate/status never gets generated.
I feel like I am barking up the wrong tree or missing something integral to getting this working. Any help would be appreciated.
We ended up writing our own daemonset to collect the logs properly at the node level instead of the container level. We then stopped writing to shared volumes from the containers and logged to stdout only.
We used fluentd to ship the logs around.
https://github.com/splunk/splunk-connect-for-kubernetes/tree/master/helm-chart/splunk-kubernetes-logging
In general, the best practice is to write logs to stdout and configure a log collection tool such as the ELK stack.
However, if you want to run logrotate as a separate process in your container, you may use Supervisor, which serves as a very simple init system and lets you run as many parallel processes in the container as you want.
A simple example of using Supervisor to rotate Nginx logs can be found here: https://github.com/misho-kr/docker-appliances/tree/master/nginx-nodejs
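Whichever way you run it, the key point is that logrotate is not a daemon: something has to invoke it periodically. A rough sketch of what that invocation could look like (the interval is illustrative):

    # run this loop under Supervisor, or install cron and let the
    # stock /etc/cron.daily/logrotate job do it instead
    while true; do
        logrotate /etc/logrotate.conf
        sleep 900   # every 15 minutes; adjust to taste
    done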
If you write to the filesystem, the application creating the logs should be responsible for rotation. If you are running a Java application with logback or log4j, it is a simple configuration change. For other languages/frameworks it is usually similar.
If that is not an option, you could use a specialized tool to handle the rotation and pipe the output to it. One example would be http://cr.yp.to/daemontools/multilog.html
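For example, with multilog (size, count, and directory are illustrative):

    # keep up to 10 rotated files of ~1 MB each, timestamped, in /var/log/myapp
    ./myapp 2>&1 | multilog t s1048576 n10 /var/log/myapp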
As a method of last resort, you could investigate logging to a named pipe (FIFO) instead of a regular file and have some other process handle reading and writing out the data, including the rotation.
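A minimal sketch of the FIFO variant (names are illustrative; multilog is reused here as the reader, but any log-rotating reader would do):

    mkfifo /var/log/app.pipe
    # a separate process drains the pipe and handles rotation
    multilog s1048576 n10 /var/log/myapp < /var/log/app.pipe &
    # the application writes to the pipe as if it were a regular file
    ./myapp > /var/log/app.pipe 2>&1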
I created a project on Google Cloud a long time ago and I am currently having some problems with it. The only result I seem to be receiving is Internal Server Error.
I tried connecting to the compute instance through SSH, but it does not help much because:
as far as I remember, I used to be able to see all the code on the compute instance. It's no longer there; the home folder only has some hidden files, and I am not sure where to look for the actual project files.
the only error I managed to get from a log file was: Error syncing pod 9c8e56bc-4298-11e6-ab50, skipping: failed to "StartContainer" for "postgres" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=postgres pod=postgres_default(9c8e56bc-4298-11e6-ab50); this makes me think there are some issues with Postgres, which has a persistent disk of its own, but there seems to be no easy way to find out how much of that disk is used.
even though I am admin on that project and should receive detailed emails (with stack traces) every time there is an error, I am not receiving anything at all.
This behaviour started today, all of a sudden, and I haven't touched the project in almost 2 years, so I am completely lost.
Thanks.
How can I check the remaining size of a persistent disk on Google Cloud?
For this part, I finally found a way to do it today. I'll describe it here with screenshots so that it is easy for anyone to follow.
First, go to the Disks page in the Google Cloud Console: https://console.cloud.google.com/compute/disks
Identify the persistent disk you are interested in; in my case it was called pg-data-disk. Click on the VM instance it is attached to, shown in the "In use by" column (see the image below):
This will open an SSH connection to the VM instance your persistent disk is attached to. In the SSH window, run the following command: sudo lsblk. The result should look like the image below:
You will thus discover the disk ID (in my case this was sdb), so you can now run sudo df -h /dev/<YOUR DISK ID>. This command will give you the exact disk usage, as shown below:
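In text form (sdb is simply what the disk was called on my instance):

    sudo lsblk
    # then, using the device you identified:
    sudo df -h /dev/sdb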
As for the other part of the question, I was actually using Docker containers orchestrated by Kubernetes, and I had totally forgotten about it.
Will upgrade my RAM and get back to work.
Thank you all.
I understand the purpose of chef-client --daemonize, because it's a service that Chef Server can connect to and interact with.
But chef-solo is a command that simply brings the current system in line with its specifications and then is done.
So what is the point of chef-solo --daemonize, and what specifically does it do? For example, does it autodetect when the system falls out of line with spec? Does it do so via polling or tapping into filesystem events? How does it behave if you update the cookbooks and node files it depends on when it's already running?
You might also ask why chef-solo supports the --splay and --interval arguments.
Don't forget that chef-server is not the only source of data.
Configuration values can rely on a bunch of other sources (APIs, OHAI, DNS...).
The most classic one is OHAI: think of a cookbook that configures memcached. You would probably want to reserve X amount of RAM for the operating system and give the rest to memcached.
Available RAM can be changed when running inside a VM, even without rebooting it.
That might be a good reason to run chef-solo as a daemon with frequent chef-runs, like you're used to when using chef-client with a chef-server.
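For illustration, a daemonized run could look like this (paths and timings are arbitrary):

    # re-converge every 10 minutes, with up to 60 seconds of random splay
    chef-solo -c /etc/chef/solo.rb -j /etc/chef/node.json \
              --daemonize --interval 600 --splay 60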
As for your other questions:
Q: Does it autodetect when the system falls out of line with spec?
Does it do so via polling or tapping into filesystem events?
A: Chef doesn't respond to changes. Instead, it runs frequently and makes sure the current state is in sync with the desired state - which can be based on chef-server inventory, API calls, OHAI attributes, etc. The desired state is constructed from scratch every time you run Chef, at the compile stage when all the resources are generated. Read about it here
Q: How does it behave if you update the cookbooks and node files it depends on when it's already running?
A: Usually when running chef-solo, one uses the --json flag to specify a JSON file with node attributes and a run-list. When running chef-solo in --daemonize mode, the node attributes are read only for the first run; for subsequent runs, it is as if you were running it without the --json flag. I couldn't figure out a way to make it behave as if --json were being applied all over again; however, you can use the --override-runlist option to at least make the run-list stick.
Note that the attributes you're specifying in your JSON won't make it past the first run. This is possibly a bug.
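(As a sketch, with an example recipe name:)

    # the run-list can still be forced explicitly, even though the JSON attributes won't re-apply
    chef-solo -c /etc/chef/solo.rb -o 'recipe[memcached]'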
I'm currently attempting to develop a sandbox using Docker. Docker spawns processes through a running daemon, and I am having a great deal of trouble getting the limits set forth in the limits.conf file to apply to the daemon. Specifically, I am running a fork bomb such that the daemon is the process that spawns all the new processes. The nproc limitation I placed on the user making this call doesn't seem to get applied, and for the life of me I cannot figure out how to make it work. I'm quite positive it will be as simple as adding the correct file to /etc/pam.d/, but I'm not certain.
The PAM limits only apply to processes playing nice with PAM. By default, when you start a shell in a container, it won't have anything to do with PAM, and setting limits through PAM just won't work.
Here are some other ways to make it happen!
Instead of starting your process immediately, you can start a tiny wrapper script, which will do the appropriate ulimit calls before executing your process.
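A minimal sketch of such a wrapper (the limit values are arbitrary examples):

    #!/bin/sh
    # set per-process limits, then hand off to the real command
    ulimit -u 100     # max user processes (this is what stops a fork bomb)
    ulimit -n 1024    # max open files
    exec "$@"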
If you want an interactive shell, you can run login -f <username> (e.g. login -f root); that will use the normal login process to auto-log you on the machine (and that should go through the normal PAM mechanisms).
If you want all containers to be subject to those limits, you can set the limits on your system, then restart Docker with those lower limits; containers are created by Docker, and by default, they will inherit those limits as well.
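Roughly speaking (the numbers are illustrative, and how you start the daemon depends on your init system; shown here as a plain dockerd invocation):

    # start the daemon from a shell that already has the lower limits;
    # containers inherit them by default
    ulimit -u 512
    ulimit -n 4096
    dockerd &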