Kubernetes pod high cache memory usage

I have a Java process running on Kubernetes.
I set Xms and Xmx for the process:
java -Xms512M -Xmx1G -XX:SurvivorRatio=8 -XX:NewRatio=6 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -jar automation.jar
I expected the pod to consume 1.5 or 2 GB of memory, but it consumes much more, nearly 3.5 GB, which is too much.
If I run the process on a virtual machine, it consumes much less memory.
When I check the memory stats for the pod, I see that it uses a lot of cache memory.
An RSS of nearly 1.5 GB is OK, because Xmx is 1 GB. But why is the cache nearly 3 GB?
Is there any way to tune or control this usage?
/app $ cat /sys/fs/cgroup/memory/memory.stat
cache 2881228800
rss 1069154304
rss_huge 446693376
mapped_file 1060864
swap 831488
pgpgin 1821674
pgpgout 966068
pgfault 467261
pgmajfault 47
inactive_anon 532504576
active_anon 536588288
inactive_file 426450944
active_file 2454777856
unevictable 0
hierarchical_memory_limit 16657932288
hierarchical_memsw_limit 9223372036854771712
total_cache 2881228800
total_rss 1069154304
total_rss_huge 446693376
total_mapped_file 1060864
total_swap 831488
total_pgpgin 1821674
total_pgpgout 966068
total_pgfault 467261
total_pgmajfault 47
total_inactive_anon 532504576
total_active_anon 536588288
total_inactive_file 426450944
total_active_file 2454777856
total_unevictable 0

A Java process may consume much more physical memory than specified in -Xmx - I explained it in this answer.
However, in your case, it's not even the memory of the Java process, but rather the OS-level page cache. Typically you don't need to care about the page cache, since it is shared, reclaimable memory: when an application wants to allocate more memory but there are not enough free pages immediately available, the OS will likely free part of the page cache automatically. In this sense, the page cache should not be counted as "used" memory. It's more like spare memory that the OS puts to good use while the application does not need it.
The page cache often grows when an application does a lot of file I/O, and this is fine.
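To see how the numbers from the question split between process memory and page cache, here is a minimal Python sketch. It assumes the cgroup v1 memory.stat format shown in the question; the function names parse_memory_stat and summarize are made up for this illustration, and the sample contains only the four counters it needs.

```python
def parse_memory_stat(text):
    """Parse cgroup v1 memory.stat text into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def summarize(stats):
    """Report process memory (rss) and page cache separately, in MiB."""
    mib = 1024 * 1024
    return {
        "rss_mib": stats["rss"] // mib,
        "cache_mib": stats["cache"] // mib,
        # File-backed pages on the LRU lists; the kernel can reclaim these
        # under memory pressure (dirty pages only after writeback).
        "file_lru_mib": (stats["inactive_file"] + stats["active_file"]) // mib,
    }


# Counters copied from the memory.stat output in the question.
SAMPLE = """cache 2881228800
rss 1069154304
inactive_file 426450944
active_file 2454777856
"""

stats = parse_memory_stat(SAMPLE)
summary = summarize(stats)
print(summary)
```

With the values from the question, inactive_file plus active_file adds up to exactly the cache counter, so all of the cache here is file-backed page cache rather than memory held by the JVM.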
Async-profiler may help to find the exact source of growth:
run it with -e filemap:mm_filemap_add_to_page_cache
I demonstrated this approach in my presentation.

Related

Kubernetes physical memory requests and limits and linux virtual memory

In Kubernetes, is it possible to enforce virtual memory (physical page swapping to disk) on a pod/container with memory requests and limits set?
For instance, as per the Kubernetes documentation, “if you set a memory limit of 4GiB for a container, the kubelet (and container runtime) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.”
Hence, is it possible to configure the pod (and hence the Linux kernel) to enforce virtual memory (that is, paging and memory swapping) at the specified physical memory limit of the pod (4GiB) instead of raising an OOM error? Am I missing something?
Reading the kernel documentation on this leads me to believe this is not possible. And I don't think this is a desirable behavior. Let's just think about the following scenario: You have a machine with 64GB of physical memory with 10GB of those used. Then you start a process with a "physical" memory limit of 500MB. If this memory limit is reached the kernel would start swapping and the process would stall even though there is enough memory available to service the memory requests of the process.
The memory limit you specify on the container is actually not a physical memory limit, but a virtual memory limit with overcommit allowed. This means your process can allocate as much memory as it wants (until you reach the overcommit limit), but it gets killed as soon as it tries to use too much memory.

Jenkins and PostgreSQL is consuming a lot of memory

We have a data warehouse server running on Debian Linux. We are using PostgreSQL, Jenkins and Python.
For the past few days, Jenkins and Postgres have been consuming a lot of memory. I tried everything I could find on Google, but the issue is still there.
Can anyone give me a lead on how to reduce this memory consumption? It would be very helpful.
Below is the output from free -m:
              total        used        free      shared  buff/cache   available
Mem:          63805        9152         429       16780       54223       37166
Swap:             0           0           0
below is the postgresql.conf file
Below is the System configurations,
Results from htop
Please don't post text as images. It is hard to read and process.
I don't see your problem.
Your machine has 64 GB RAM, 16 GB are used for PostgreSQL shared memory like you configured, 9 GB are private memory used by processes, and 37 GB are free (the available entry).
Linux uses available memory for the file system cache, which boosts PostgreSQL performance. The low value for free just means that the cache is in use.
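The same reading of the free -m output can be sketched in a few lines of Python. This is only an illustration: parse_free is a hypothetical helper, and it assumes the column layout of the procps free shown in the question.

```python
def parse_free(text):
    """Parse the 'Mem:' row of `free -m` output into a dict (values in MiB)."""
    lines = text.strip().splitlines()
    headers = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(v) for v in line.split()[1:]]
            return dict(zip(headers, values))
    raise ValueError("no Mem: row found")


# Output copied from the question.
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:          63805        9152         429       16780       54223       37166
Swap:             0           0           0
"""

mem = parse_free(FREE_OUTPUT)
pct_available = 100 * mem["available"] / mem["total"]
print(f"available: {mem['available']} MiB ({pct_available:.0f}% of total)")
print(f"free: {mem['free']} MiB, buff/cache: {mem['buff/cache']} MiB")
```

The figure to watch is available, not free: here well over half of the RAM can still be handed to applications even though free is only 429 MiB.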
For Jenkins, run it with these JAVA Options
JAVA_OPTS=-Xms200m -Xmx300m -XX:PermSize=68m -XX:MaxPermSize=100m
For postgres, start it with option
-c shared_buffers=256MB
These values are the ones I use on a small homelab with 8 GB of memory; you might want to increase them to match your hardware.

CPU usage of Jboss JVM goes upto 99% and stays there

I am load testing my application using JMeter, and I have a situation where the CPU usage of the application's JVM goes to 99% and stays there. The application still works: I am able to log in and do some activity. But it is understandably slower.
Details of environment:
Server: AMD Opteron, 2.20 GHz, 8 cores, 64-bit, 24 GB RAM. Windows Server 2008 R2 Standard
Application server: jboss-4.0.4.GA
JAVA: jdk1.6.0_25, Java HotSpot(TM) 64-Bit Server VM
JVM settings:
-Xms1G -Xmx10G -XX:MaxNewSize=3G -XX:MaxPermSize=12G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseCompressedOops -Dsun.rmi.dgc.client.gcInterval=1800000 -Dsun.rmi.dgc.server.gcInterval=1800000
Database: MySql 5.6 (in a different machine)
Jmeter: 2.13
My scenario is that I have 20 users log into my application and perform normal activity that should not generate a huge load. A few minutes into the test, the CPU usage of the JBoss JVM goes up and never comes back down. It stays like that until the JVM is killed.
To help you better understand, here are a few screenshots.
I found a few posts about CPU at 100%, but none of them matched my situation, and I could not find a solution.
Any suggestion on what to do would be great.
Regards,
Sreekanth.
To understand the root cause of the high CPU utilization, we need to check the CPU data and thread dumps at the same time.
Capture 5-6 thread dumps while the issue is occurring. At the same time, capture CPU consumption on a thread-by-thread basis.
Generally, the root cause of a CPU issue is a problem with threads, such as BLOCKED threads, long-running threads, deadlocks or long-running loops. These can be identified by going through the thread stacks.
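Once several dumps are captured, a small script can tally thread states and the top frame of RUNNABLE threads across them. The Python sketch below is only an illustration: it assumes the HotSpot dump layout with a "java.lang.Thread.State:" line followed by "at ..." frames, and the sample dump, thread names and class names are invented for the example.

```python
import re
from collections import Counter

# Matches the state line HotSpot prints for each Java thread.
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+(\w+)")
# Captures the first stack frame of each RUNNABLE thread.
FRAME_RE = re.compile(r"java\.lang\.Thread\.State:\s+RUNNABLE[^\n]*\n\s+at\s+(\S+)")


def count_states(dump_text):
    """Count threads per state (RUNNABLE, BLOCKED, ...) in one dump."""
    return Counter(STATE_RE.findall(dump_text))


def hot_frames(dumps):
    """Count top frames of RUNNABLE threads across several dumps."""
    counts = Counter()
    for dump in dumps:
        counts.update(FRAME_RE.findall(dump))
    return counts


# Hypothetical, shortened dump used only to exercise the functions.
SAMPLE_DUMP = '''
"http-8080-1" daemon prio=6 tid=0x1 nid=0x2 runnable [0x3]
   java.lang.Thread.State: RUNNABLE
    at com.example.Worker.spin(Worker.java:42)

"http-8080-2" daemon prio=6 tid=0x4 nid=0x5 waiting for monitor entry [0x6]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Worker.locked(Worker.java:10)

"http-8080-3" daemon prio=6 tid=0x7 nid=0x8 runnable [0x9]
   java.lang.Thread.State: RUNNABLE
    at com.example.Worker.spin(Worker.java:42)
'''

states = count_states(SAMPLE_DUMP)
frames = hot_frames([SAMPLE_DUMP, SAMPLE_DUMP])
print(states)
print(frames.most_common(3))
```

A frame that stays at the top of RUNNABLE threads in every dump is a good first suspect for a busy loop; it should still be cross-checked against the per-thread CPU data.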

Varnish restarting suddenly

Does varnish keep a crash / restart log?
I am currently monitoring a Varnish server, and it seems to restart every week or so, when CPU usage reaches about 100% (load gets a bit high, about 6~7 on a 2-core machine) and IO wait takes an average of 45% of CPU time.
Am I missing any configuration or predefined behavior? Does it mean that I have a bottleneck in my hardware causing varnish failures?
Thanks!
When the child dies you should see a message in syslog. It will say something like Child exited.... Varnish is good about keeping track of the child, so when it does crash it will be immediately restarted and it should log it.
Load of 6-7 seems high. If you are using file backed storage I suggest switching to malloc. If you need more cache space, get a box with more memory. Use the nuking behavior as your guide (varnishstat -1 | grep nuke). If the value there reported by varnish is 0 your cache size is sufficient.

How to grab a full memory dump of a large memory usage

I am hosting IIS-based web service applications on a Windows 2008 64-bit system running on a quad-core machine with 8 GB of RAM. I ran into a couple of instances where W3WP was using 7.6 GB of memory. Nothing else on the system was responding, including RDP. Right-clicking the process in Task Manager and creating a dump froze the system and all its threads for a long time (close to 30 minutes). When the freeze occurred during off hours, we let the dump run for a while (close to 1 hour), but it still didn't complete. In the interest of getting the system back up, we had to kill IIS.
I tried other tools like procexp, DebugDiag etc. to create a full memory dump, and all had the same result.
So, what tool does the community use to grab dump files quickly, or without freezing all the threads? I realize the latter might be a rhetorical question. But what are the options for generating such a large dump file without locking up the system for a long time?
IMO you shouldn't have to wait until the process memory grows to 8 GB. I am sure with something like 3 - 4 GB you should be able to detect the memory leak.
Procdump has an option based on memory threshold
-m Memory commit threshold in MB at which to create a dump of the process.
I would use this option to dump the memory of the process.
An SSD would also help by making the dump write faster.
WPA a.k.a. xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx) is a powerful tool for diagnosing applications. You will get the call stack of the culprit allocation. You don't have to collect a dump, and it is non-invasive and does not add much load on production systems.
Complete step-by-step information is available here: http://msdn.microsoft.com/en-us/library/ff190906(v=VS.85).aspx