Kubernetes + GKE / status is now: NodeHasDiskPressure

I've read a bit through https://kubernetes.io/docs/admin/out-of-resource/ without coming away with a clear understanding; I'm trying to gather more information here about what actually happens.
We run 2 n1-standard-2 instances with a 300 GB disk attached.
More specifically, the condition points to a "nodefs.inodesFree" problem, and this one is quite unclear. It seems to happen during builds (while an image is being created); should we understand that the build takes up too much space on disk? What would be the most likely cause?
It feels like this is not tied to the CPU/memory requests/limits that can be specified on a node, but since we have "overcommitted" those limits, could that have any impact on this issue?
Thanks for sharing your experience on this one.

Could you run df -i on the affected node please?
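For context, nodefs.inodesFree means the node's filesystem is running out of inodes rather than bytes, which is exactly what df -i reports. A rough way to check on a GKE node (node name and zone are placeholders):

    # Find the node reporting disk pressure
    kubectl get nodes
    kubectl describe node <node-name> | grep -i pressure

    # SSH into it and check inode usage (instance name/zone are hypothetical)
    gcloud compute ssh gke-mycluster-default-pool-12345678-abcd --zone europe-west1-b
    df -i    # IUse% near 100% on the root filesystem explains nodefs.inodesFree pressure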

Is there a way to automatically detect the minimum required BR2_TARGET_ROOTFS_EXT2_SIZE in Buildroot?

I'm making a "big" non-embedded image intended for simulation instead of real devices, and I keep hitting the error:
*** Maybe you need to increase the filesystem size (BR2_TARGET_ROOTFS_EXT2_SIZE)
and then I have to do a du on output/target to find out how big I have to make BR2_TARGET_ROOTFS_EXT2_SIZE.
Is there a way to automate this, or a decent workaround?
Some workarounds I'm considering:
put the big stuff under 9p: https://superuser.com/questions/628169/how-to-share-a-directory-with-the-host-without-networking-in-qemu
use CPIO and -initrd
http://lists.busybox.net/pipermail/buildroot/2018-March/215622.html
http://lists.busybox.net/pipermail/buildroot/2018-March/215636.html says that:
No, because it is not reliable, see commit:
c6bca8cef fs/ext2: Remove support for auto-calculation of rootfs size
In the end, it does not make sense to do auto-calculation, because on an
embedded device, you have to know the layout and size of your storage.
So, you know what size you want your ext filesystem to be.
So it is fundamentally not possible / not worth it for Buildroot to do it reliably.
https://github.com/buildroot/buildroot/commit/c6bca8cef0310bc649240b451989457ce94a8358
I then searched a bit further and came across https://unix.stackexchange.com/questions/353156/how-to-calculate-the-correct-size-of-a-loopback-device-filesystem-image-for-debo which suggests that resize2fs -M plus sparse files might be a possibility.
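A minimal sketch of that approach (image name and the 10G upper bound are arbitrary; the -d option needs a reasonably recent e2fsprogs):

    # Create a generously sized sparse image and populate it from Buildroot's target dir
    truncate -s 10G rootfs.ext2
    mkfs.ext2 -F -d output/target rootfs.ext2   # -F: operate on a regular file, -d: copy the directory in

    # resize2fs wants a clean filesystem; -M then shrinks it to its minimum size
    e2fsck -fy rootfs.ext2
    resize2fs -M rootfs.ext2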
libguestfs can also minimize image sizes automatically, as demonstrated at https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, and it exposes a vfs-minimum-size function: http://libguestfs.org/guestfish.1.html#vfs-minimum-size

Spark stuck at removing broadcast variable (probably)

Spark 2.0.0-preview
We've got an app that uses a fairly big broadcast variable. We run this on a big EC2 instance, so deployment is in client mode. The broadcast variable is a massive Map[String, Array[String]].
At the end of saveAsTextFile, the output in the folder seems to be complete and correct (apart from .crc files still being there) BUT the spark-submit process is stuck on, seemingly, removing the broadcast variable. The stuck logs look like this: http://pastebin.com/wpTqvArY
My last run lasted for 12 hours after doing saveAsTextFile - just sitting there. I did a jstack on the driver process; most threads are parked: http://pastebin.com/E29JKVT7
Full story:
We used this code with Spark 1.5.0 and it worked, but then the data changed and something stopped fitting into Kryo's serialisation buffer. Increasing the buffer didn't help, so I had to disable the KryoSerialiser. Tested it again - it hung. Switched to 2.0.0-preview - seems like the same issue.
I'm not quite sure what's even going on, given that there's almost no CPU activity and no output in the logs, yet the output is not finalised the way it used to be.
Would appreciate any help, thanks.
I had a very similar issue.
I was updating from Spark 1.6.1 to 2.0.1 and my steps were hanging after completion.
In the end, I managed to solve it by adding a sparkContext.stop() at the end of the task.
Not sure why this is needed, but it solved my issue.
Hope this helps.
ps: this post reminds me of this https://xkcd.com/979/
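For illustration, a rough sketch of where that sparkContext.stop() call would go (the object name, broadcast contents and output path are all made up):

    import org.apache.spark.{SparkConf, SparkContext}

    object SaveJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("save-job"))
        try {
          // Stand-in for the large Map[String, Array[String]] broadcast
          val lookup = sc.broadcast(Map("key" -> Array("value")))
          sc.parallelize(Seq("key", "other"))
            .map(k => lookup.value.getOrElse(k, Array.empty[String]).mkString(","))
            .saveAsTextFile("/tmp/output")
        } finally {
          sc.stop()  // without this, the driver can sit idle after the output is written
        }
      }
    }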

How to handle MEMCACHED_SERVER_MARKED_DEAD?

I have a cluster of 10 memcached servers, using consistent hashing. When the key passed to memcached_get() hashes to the unavailable server, I just get a MEMCACHED_SERVER_MARKED_DEAD response (return value).
I would expect the key to be redistributed to the next available server in this case, so that I would get NOTFOUND from the next memcached_get() call. However, I'm still getting MEMCACHED_SERVER_MARKED_DEAD, so I'm unable to set a new value.
I discovered I can call memcached_behavior_set(..., MEMCACHED_BEHAVIOR_DISTRIBUTION). This causes the hash to be redistributed and it then works as I expect. However, I'm not sure it is a good approach. Is it?
Generally you want to enable MEMCACHED_BEHAVIOR_DISTRIBUTION from the start if you are dealing with a pool of multiple memcached servers. So yes, that solution will work.
If you are having further problems, take a look at MEMCACHED_BEHAVIOR_REMOVE_FAILED_SERVERS, which will automatically purge failed servers from the pool after a given number of failures.
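A minimal sketch of setting those behaviors with libmemcached (server addresses and the failure limit are placeholders):

    #include <libmemcached/memcached.h>

    int main(void)
    {
        memcached_st *memc = memcached_create(NULL);

        /* Hypothetical addresses; the real pool has 10 servers. */
        memcached_server_add(memc, "10.0.0.1", 11211);
        memcached_server_add(memc, "10.0.0.2", 11211);

        /* Consistent (ketama) distribution so keys remap when a server dies. */
        memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_DISTRIBUTION,
                               MEMCACHED_DISTRIBUTION_CONSISTENT);

        /* Mark a server dead after 5 consecutive failures and drop it from the pool. */
        memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_SERVER_FAILURE_LIMIT, 5);
        memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_REMOVE_FAILED_SERVERS, 1);

        /* ... memcached_get() / memcached_set() as before ... */

        memcached_free(memc);
        return 0;
    }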
I found the answer myself.
https://bugs.launchpad.net/libmemcached/+bug/777672
Applying the patch solved all my problems. Note: I'm surprised it has been broken since 0.39 and nobody has cared.

What can I do to find out what's causing my program to consume lots of memory over time?

I have an application using POE which has about 10 sessions doing various tasks. Over time, the app starts consuming more and more RAM and this usage doesn't go down even though the app is idle 80% of the time. My only solution at present is to restart the process often.
I'm not allowed to post my code here, so I realize it is difficult to get help, but maybe someone can tell me what I can do to find this out myself?
Don't expect the process size to decrease. Memory isn't released back to the OS until the process terminates.
That said, might you have reference loops in data structures somewhere? AFAIK, the perl garbage collector can't sort out reference loops.
Are you using any XS modules anywhere? There could be leaks hidden inside those.
A guess: your program executes a loop for as long as it is running. In this loop you may be allocating memory for a buffer (or more) each time some condition occurs; since that scope is never exited, the memory remains and is never cleaned up. I suggest you check for something like this. If that is the case, place the allocating code in a sub that you call from the loop; the allocation will go out of scope, and be cleaned up, on return to the loop.
Looks like Test::Valgrind is a tool for searching for memory leaks. I've never used it myself though (but I used plain valgrind with C source).
One technique is to periodically dump the contents of $POE::Kernel::poe_kernel to a time- or sequence-named file. $poe_kernel is the root of a tree spanning all known sessions and the contents of their heaps. The snapshots should monotonically grow if the leaked memory is referenced. You'll be able to find out what's leaking by diff'ing an early snapshot with a later one.
You can export POE_ASSERT_DATA=1 to enable POE's internal data consistency checks. I don't expect it to surface problems, but if it does I'd be very happy to receive a bug report.
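A rough sketch of that periodic snapshot, using Data::Dumper and time-named files (the 10-minute interval and file naming are arbitrary):

    use strict;
    use warnings;
    use POE;
    use Data::Dumper;

    # Hypothetical monitoring session: dump the kernel's internal tree every 10 minutes.
    POE::Session->create(
        inline_states => {
            _start   => sub { $_[KERNEL]->delay( snapshot => 600 ) },
            snapshot => sub {
                my $file = sprintf 'poe-kernel-%d.dump', time;
                open my $fh, '>', $file or die "open $file: $!";
                print {$fh} Dumper($POE::Kernel::poe_kernel);
                close $fh;
                $_[KERNEL]->delay( snapshot => 600 );    # reschedule
            },
        },
    );

    # Diff an early dump against a later one to see which structures keep growing.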
Perl cannot resolve reference cycles on its own. Either you have zombie processes (which you can detect via ps axl) or you have a memory leak (reference rings/cycles).
There are plenty of tools for detecting memory leaks:
strace, mtrace, Devel::LeakTrace::Fast, Devel::Cycle

ZFS vdev naming?

I have no idea what the rationale is behind the naming of the vdevs (virtual devices) used when creating ZFS pools in Solaris. Suppose I have a disk c4d0: what is meant by c4d0p0 and c4d0s0? And how would I know what to use with ZFS commands? I am terribly confused, since I keep getting "invalid vdev specified". Any pointers?
c4d0s0 = controller 4, disk 0, slice 0
If you want the full disk to be used by ZFS you would want to use the main disk name, c4d0 in your case.
There is a very good article: "How Solaris disk device names work".
This might help.
The ZFS Best Practices Guide recommends using the whole disk for production setups - so do what X-Istence said and use c4d0 without the slice number. With ZFS you can throw away everything you know about partitioning - partitions are so 1990s!
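For example, a whole-disk pool can be created roughly like this (the pool name and the second disk are made up):

    # Single-disk pool on the whole device; ZFS labels the disk itself
    zpool create tank c4d0

    # Or a mirrored pool across two whole disks
    zpool create tank mirror c4d0 c5d0

    # Check the layout
    zpool status tank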