Sensu check-memory not working as per documentation

From the documentation at https://github.com/sensu-plugins/sensu-plugins-memory-checks:
/opt/sensu/embedded/bin/check-memory.rb -w 2500 -c 3000 (values in megabytes)
My config.json has
"command": "check-memory.sh -w 50000000 -c 100000000"
top reports
KiB Mem: 1014632 total, 905872 used, 108760 free, 42176 buffers
uchiwa reports
Mem Critical free system memory 475Mb
Questions
I just cannot get check-memory to go green. The system is currently at its normal (benchmark) memory usage, so I need to set -w just above that. I tried -w 500 and it did not work, which is why I kept adding zeros, but that did not help either.
uchiwa reports free memory; shouldn't it report used memory, since -w is set against the maximum used memory? Confusing.

The check reports how much free memory you have, and the warning and critical thresholds are the minimum amount of available memory you should have left. It warns because you don't have 50000000 MB of free memory.
You could have answered this yourself by running the script directly in a shell and testing it there; there is no need to involve Sensu at all. Or, as I find necessary quite often, read the actual source code of the plugin.
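A minimal way to verify this, assuming the plugin path from the documentation above and that the thresholds are the minimum megabytes of free memory (exact semantics may vary by plugin version), is to run the check by hand and look at its output and exit code:
/opt/sensu/embedded/bin/check-memory.rb -w 400 -c 200   # warn below 400 MB free, critical below 200 MB free (assumed semantics)
echo $?                                                 # 0 = OK, 1 = WARNING, 2 = CRITICAL, the exit-code convention Sensu checks use
On the box above (about 475 MB reported free), thresholds in that range should come back OK, while the original -w 50000000 will always warn under the interpretation above.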

Related

kubernetes pod high cache memory usage

I have a Java process running on k8s.
I set Xms and Xmx for the process.
java -Xms512M -Xmx1G -XX:SurvivorRatio=8 -XX:NewRatio=6 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -jar automation.jar
My expectation is that the pod should consume 1.5 or 2 GB of memory, but it consumes much more, nearly 3.5 GB. That is too much.
If I run my process on a virtual machine, it consumes much less memory.
When I check the memory stats for the pod, I realise that the pod allocates too much cache memory.
RSS of nearly 1.5 GB is OK, because Xmx is 1 GB. But why is the cache nearly 3 GB?
Is there any way to tune or control this usage?
/app $ cat /sys/fs/cgroup/memory/memory.stat
cache 2881228800
rss 1069154304
rss_huge 446693376
mapped_file 1060864
swap 831488
pgpgin 1821674
pgpgout 966068
pgfault 467261
pgmajfault 47
inactive_anon 532504576
active_anon 536588288
inactive_file 426450944
active_file 2454777856
unevictable 0
hierarchical_memory_limit 16657932288
hierarchical_memsw_limit 9223372036854771712
total_cache 2881228800
total_rss 1069154304
total_rss_huge 446693376
total_mapped_file 1060864
total_swap 831488
total_pgpgin 1821674
total_pgpgout 966068
total_pgfault 467261
total_pgmajfault 47
total_inactive_anon 532504576
total_active_anon 536588288
total_inactive_file 426450944
total_active_file 2454777856
total_unevictable 0
A Java process may consume much more physical memory than specified in -Xmx - I explained it in this answer.
However, in your case, it's not even the memory of the Java process, but rather the OS-level page cache. Typically you don't need to care about the page cache, since it is shared, reclaimable memory: when an application wants to allocate more memory but there are not enough immediately available free pages, the OS will likely free part of the page cache automatically. In this sense, the page cache should not be counted as "used" memory; it's more like spare memory the OS puts to good use while the application does not need it.
The page cache often grows when an application does a lot of file I/O, and this is fine.
Async-profiler may help to find the exact source of growth:
run it with -e filemap:mm_filemap_add_to_page_cache
I demonstrated this approach in my presentation.
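A possible invocation, as a sketch only (the profiler.sh path, duration, and output file name are illustrative; the event is the one named above):
./profiler.sh -e filemap:mm_filemap_add_to_page_cache -d 60 -f pagecache.html <jvm-pid>   # trace page-cache insertions for 60 s and write a flame graph
The resulting flame graph shows which code paths are adding pages to the page cache, which usually points straight at the file I/O responsible for the growth.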

What is the best way to monitor Heroku Postgres memory and cpu

We're on Heroku and trying to understand if it's time to upgrade our Postgres database or not. I have two questions:
Are there any tools you know of that track Heroku Postgres logs to record their memory and CPU usage stats over time?
Are those (Memory and CPU usage) even the best metrics to look at to determine if we should upgrade to a larger instance or not?
The most useful tool I've found for monitoring Heroku Postgres instances is the logs associated with the database's dyno, which you can monitor using heroku logs -t -d heroku-postgres. This spits out some useful stats every 5 minutes, so if your logs fill up quickly, this might not output anything right away; use -t to wait for the next log line.
Output will look something like this:
2022-06-27T16:34:49.000000+00:00 app[heroku-postgres]: source=HEROKU_POSTGRESQL_SILVER addon=postgresql-fluffy-55941 sample#current_transaction=81770844 sample#db_size=44008084127bytes sample#tables=1988 sample#active-connections=27 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99818 sample#table-cache-hit-rate=0.9647 sample#load-avg-1m=0.03 sample#load-avg-5m=0.205 sample#load-avg-15m=0.21 sample#read-iops=14.328 sample#write-iops=15.336 sample#tmp-disk-used=543633408 sample#tmp-disk-available=72435159040 sample#memory-total=16085852kB sample#memory-free=236104kB sample#memory-cached=15075900kB sample#memory-postgres=223120kB sample#wal-percentage-used=0.0692420374380985
The main stats I pay attention to are table-cache-hit-rate, which is a good proxy for how much of your active dataset fits in memory, and load-avg-1m, which tells you how much load per CPU the server is experiencing.
You can read more about all these metrics here.
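If you want to watch just those two numbers over time, here is a rough sketch (the grep patterns simply assume the sample# format shown in the example line above):
heroku logs -t -d heroku-postgres \
  | grep --line-buffered 'source=HEROKU_POSTGRESQL' \
  | grep -oE 'sample#(table-cache-hit-rate|load-avg-1m)=[0-9.]+'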

Jenkins and PostgreSQL is consuming a lot of memory

We have a data warehouse server running on Debian Linux; we are using PostgreSQL, Jenkins, and Python.
For the past few days a lot of memory has been consumed by Jenkins and Postgres. I have tried everything I could find on Google, but the issue is still there.
Can anyone give me a lead on how to reduce this memory consumption? It would be very helpful.
below is the output from free -m
              total        used        free      shared  buff/cache   available
Mem:          63805        9152         429       16780       54223       37166
Swap:             0           0           0
below is the postgresql.conf file
Below is the System configurations,
Results from htop
Please don't post text as images. It is hard to read and process.
I don't see your problem.
Your machine has 64 GB RAM, 16 GB are used for PostgreSQL shared memory like you configured, 9 GB are private memory used by processes, and 37 GB are free (the available entry).
Linux uses available memory for the file system cache, which boosts PostgreSQL performance. The low value for free just means that the cache is in use.
For Jenkins, run it with these Java options:
JAVA_OPTS=-Xms200m -Xmx300m -XX:PermSize=68m -XX:MaxPermSize=100m
For Postgres, start it with the option:
-c shared_buffers=256MB
These values are the ones I use on a small homelab with 8 GB of memory; you might want to increase them to match your hardware.
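A minimal sketch of where these settings usually live, assuming a Debian-style install (the file paths are assumptions; adjust for your packaging):
# Jenkins JVM options, e.g. in /etc/default/jenkins or wherever JAVA_OPTS is read:
JAVA_OPTS="-Xms200m -Xmx300m -XX:PermSize=68m -XX:MaxPermSize=100m"
# PostgreSQL shared memory, either on the command line (postgres -c shared_buffers=256MB)
# or persistently in postgresql.conf:
shared_buffers = 256MB
Note that the PermSize flags only apply to Java 7 and earlier; Java 8 and later ignore them with a warning.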

HowTo zdb -e poolname to recover data from a single ZFS device

I have the following situation:
1 x 10 TB drive, full of data, on ZFS
I wanted to add a 100 GB NVMe partition as a cache
Instead of using zpool add poolname cache nvmepartition, I wrote zpool add poolname nvmepartition (see the comparison below)
I did not see my mistake and exported this pool.
The NVMe drive is no longer available, and the system has no information about this pool in the ZFS cache (due to the export).
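For reference, the difference between the command that was intended and the one that was actually run (device names kept as placeholders from the post):
zpool add poolname cache nvmepartition   # intended: adds the partition as an L2ARC cache device
zpool add poolname nvmepartition         # actually run: adds the partition as a new top-level data vdev striped with the 10 TB drive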
current status:
zpool import shows the pool, but I cannot import it using any method I found on the internet.
zdb -e poolname shows me what I already know: the pool, its name, that it (sadly) has two children, one of which is no longer available, and that the system has no information about the missing child (so all the tricks I found on the internet about linking in a ghost device etc. won't work either).
As far as I know, the only way is to use zdb to generate all the files through the "journal" and pipe/save them to another path.
But how? I have found no documentation on that anywhere.
Note: the 10 TB drive was 90% full when I added the NVMe partition as a sibling. Since ZFS is not a true RAID 0, the two siblings are very unequal in size, and I did not write much data after my mistake happened, I am quite sure that most of my data is still there.

How to grab a full memory dump of a large memory usage

I am hosting IIS-based web service applications on a Windows 2008 64-bit system running on a quad-core, 8 GB machine. I have run into a couple of instances where W3WP was using 7.6 GB of memory and nothing else on the system was responding, including RDP. Right-clicking the process in Task Manager and creating a dump froze the system and all its threads for a long time (close to 30 minutes). When the freeze occurred during off hours, we let the dump run for a while (close to 1 hour), but it still did not complete. In the interest of getting the system back up, we had to kill IIS.
I tried other tools like procexp, DebugDiag, etc. to create a full memory dump, and they all had the same result.
So, what tool does the community use to grab dump files quickly? Or without freezing all the threads? I realize the latter might be a rhetorical question. But what are the options for generating such a large dump file without locking up the system for a long time?
IMO you shouldn't have to wait until the process memory grows to 8 GB. I am sure that at something like 3-4 GB you should already be able to detect the memory leak.
Procdump has an option based on memory threshold
-m Memory commit threshold in MB at which to create a dump of the process.
I would use this option to dump the memory of the process.
An SSD would also help write the dump faster.
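A sketch of such a ProcDump invocation (process name, threshold, and output path are illustrative):
procdump -ma -m 4096 w3wp.exe C:\dumps\w3wp_highmem.dmp   # write a full dump once committed memory exceeds 4096 MB
The -ma switch requests a full memory dump; without it ProcDump writes a minidump that will not contain the heap you need for leak analysis.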
WPA, a.k.a. xperf (http://msdn.microsoft.com/en-us/performance/cc825801.aspx), is a powerful tool for diagnosing applications. You will get the call stack of the culprit allocation. You don't have to collect a dump, it is non-invasive, and it does not put much load on production systems.
Complete step-by-step information is available here: http://msdn.microsoft.com/en-us/library/ff190906(v=VS.85).aspx