ZFS vdev naming? - Solaris

I have no idea what the rationale is behind the naming of vdevs (virtual devices) used when creating ZFS pools in Solaris. Suppose I have a disk c4d0: what is meant by c4d0p0 and c4d0s0? And how would I know which to use with ZFS commands? I am terribly confused, since I keep getting "invalid vdev specified". Any pointers?

c4d0s0 = controller 4, disk 0, slice 0. On x86 Solaris, the pN names (c4d0p0) refer to fdisk partitions, with p0 meaning the whole physical disk, while the sN names refer to Solaris slices inside the Solaris fdisk partition.

If you want the full disk to be used by ZFS, use the bare disk name, c4d0 in your case.
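For example (a minimal sketch; the pool name "tank" is illustrative):
zpool create tank c4d0
Given a whole disk, ZFS writes an EFI label and manages the disk itself.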

There is a very good article, "How Solaris disk device names work", which might help.

The ZFS Best Practices Guide recommends using whole disks for production setups, so do what X-Istence said and use c4d0 without the slice number. With ZFS you can throw away everything you know about partitioning; slices are so 1990s!

Is there a way to automatically detect the minimum required BR2_TARGET_ROOTFS_EXT2_SIZE in Buildroot?

I'm making a "big" non-embedded image intended for simulation instead of real devices, and I keep hitting the error:
*** Maybe you need to increase the filesystem size (BR2_TARGET_ROOTFS_EXT2_SIZE)
and then I have to do a du on output/target to find out how big I have to make BR2_TARGET_ROOTFS_EXT2_SIZE.
Is there a way to automate this, or a decent workaround?
Some workarounds I'm considering:
put the big stuff under 9p: https://superuser.com/questions/628169/how-to-share-a-directory-with-the-host-without-networking-in-qemu
use CPIO and -initrd
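For the CPIO route, the idea would be roughly this (a sketch, assuming an x86_64 target with BR2_TARGET_ROOTFS_CPIO=y; the whole root filesystem is loaded into RAM, so no ext2 size needs to be picked):
qemu-system-x86_64 -kernel output/images/bzImage -initrd output/images/rootfs.cpio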
http://lists.busybox.net/pipermail/buildroot/2018-March/215622.html
http://lists.busybox.net/pipermail/buildroot/2018-March/215636.html says that:
No, because it is not reliable, see commit:
c6bca8cef fs/ext2: Remove support for auto-calculation of rootfs size
In the end, it does not make sense to do auto-calculation, because on an
embedded device, you have to know the layout and size of your storage.
So, you know what size you want your ext filesystem to be.
So it is fundamentally not possible / worthwhile for Buildroot to do it reliably.
https://github.com/buildroot/buildroot/commit/c6bca8cef0310bc649240b451989457ce94a8358
I then searched a bit further and came across https://unix.stackexchange.com/questions/353156/how-to-calculate-the-correct-size-of-a-loopback-device-filesystem-image-for-debo, which suggests that resize2fs -M plus sparse files might be a possibility.
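For reference, the post-build shrink would look roughly like this (a sketch, assuming BR2_TARGET_ROOTFS_EXT2_SIZE was set generously large and the image is at its default location):
e2fsck -f output/images/rootfs.ext2
resize2fs -M output/images/rootfs.ext2
resize2fs wants a freshly checked filesystem, hence the e2fsck pass; -M shrinks the filesystem to its minimum size.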
libguestfs can also minimize image sizes automatically, as demonstrated at https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, and it exposes a vfs-minimum-size command: http://libguestfs.org/guestfish.1.html#vfs-minimum-size
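Querying it from guestfish would look something like this (a sketch; /dev/sda assumes a bare filesystem image with no partition table):
guestfish --ro -a output/images/rootfs.ext2 run : vfs-minimum-size /dev/sda
which prints the minimum size in bytes.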

kubernetes+GKE / status is now: NodeHasDiskPressure

I've read a bit through the out-of-resource handling documentation (https://kubernetes.io/docs/admin/out-of-resource/) without ending up with a clear understanding; I'm trying to gather more info here about what actually happens.
We run 2 n1-standard-2 instances with a 300 GB disk attached.
More specifically, a "nodefs.inodesFree" condition is reported, and this one is quite unclear. It seems to happen during builds (while an image is being created); should we understand that this takes up too much space on disk? What would be the most obvious cause?
It feels like it is not tied to the CPU/memory requests/limits that can be specified on a node, but since we've "overcommitted" the limits, could that have any impact on this issue?
Thanks for sharing your experience on this one
Could you run df -i on the affected node, please?
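If inodes are indeed exhausted (IUse% at or near 100%), a quick way to see which top-level directory holds the most files is something like this (a sketch; it requires shell access to the node):
sudo find / -xdev | cut -d/ -f2 | sort | uniq -c | sort -rn | head
On GKE nodes, Docker image layers under /var/lib/docker are a common source of inode exhaustion during image builds.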

Looking for the best equivalents of prefetch instructions for ia32, ia64, amd64, and powerpc

I'm looking at some slightly confused code that attempts a platform abstraction of prefetch instructions using various compiler builtins. It appears to be based initially on PowerPC semantics, with read and write prefetch variations using dcbt and dcbtst respectively (both of these passing TH=0 in the new optional stream form of the opcode).
On ia64 platforms we've got for read:
__lfetch(__lfhint_nt1, pTouch)
whereas for write:
__lfetch_excl(__lfhint_nt1, pTouch)
This (read vs. write prefetching) appears to match the powerpc semantics fairly well (with the exception that ia64 allows for a temporal hint).
Somewhat curiously, the ia32/amd64 code in question is using
prefetchnta
not
prefetcht1
as it would if that code were consistent with the ia64 implementations (#ifdef variations of that in our code for our (still live) hpipf port and our now-dead Windows and Linux ia64 ports).
Since we are building with the Intel compiler, I should be able to make many of our ia32/amd64 platforms consistent by switching to the xmmintrin.h builtins:
_mm_prefetch( (char *)pTouch, _MM_HINT_NTA )
_mm_prefetch( (char *)pTouch, _MM_HINT_T1 )
... provided I can figure out what temporal hint should be used.
Questions:
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Some systems support the prefetchw instruction for writes.
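As an aside, GCC and Clang expose a portable builtin that encodes both the read/write intent and the temporal hint, and the compiler lowers it to prefetcht0/t1/t2/nta (or prefetchw where available). A minimal sketch in C; the prefetch distance of 64 elements is arbitrary:
#include <stddef.h>
/* __builtin_prefetch(addr, rw, locality): rw is 0 for read, 1 for
   write; locality runs from 0 (non-temporal, NTA) to 3 (keep in all
   cache levels, T0). */
static inline void prefetch_read(const void *p) { __builtin_prefetch(p, 0, 2); }
static inline void prefetch_write(void *p) { __builtin_prefetch(p, 1, 2); }
int sum(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + 64 < n)
            prefetch_read(&a[i + 64]); /* fetch well ahead of use */
        s += a[i];
    }
    return s;
}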
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
If the line is used exclusively by the calling thread, it shouldn't matter how you bring the line in; both reads and writes would be able to use it. The benefit of prefetchw mentioned above is that it will bring the line and give you ownership of it, which may take a while if the line was also used by another core. The hint level, on the other hand, is orthogonal to the MESI states and only affects how long the prefetched line survives. This matters if you prefetch long ahead of the actual access and don't want the prefetch to get lost in that duration, or alternatively, if you prefetch right before the access and don't want the prefetches to thrash your cache too much.
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Just speculating: perhaps the larger caches and aggressive memory bandwidth are more vulnerable to bad prefetching, and you'd want to reduce the impact through the non-temporal hint. Consider that if your prefetcher is suddenly set loose to fetch anything it can, you'd end up swamped in junk prefetches that would throw away lots of useful cache lines. The NTA hint makes them overrun each other, leaving the rest undamaged.
Of course, this may also just be a bug; I can't tell for sure, only whoever developed the compiler could, but it might make sense for the reason above.
The best resource I could find on x86 prefetching hint types was the good ol' article What Every Programmer Should Know About Memory.
For the most part, on x86 there aren't different instructions for read and write prefetches. The exceptions seem to be the non-temporal (aligned) variants, where a write can bypass the cache but, as far as I can tell, a read will always get cached.
It's going to be hard to backtrack through why the earlier code owners used one hint and not the other on a certain architecture. They could have been making assumptions about how much cache is available on processors in that family, typical working set sizes for binaries there, long-term control flow patterns, and so on, and there's no telling how much any of those assumptions were backed up with good reasoning or data. From the limited background here, I think you'd be justified in taking the approach that makes the most sense for the platform you're developing on now, regardless of what was done on other platforms. This is especially true when you consider articles like this one, which is not the only context in which I've heard that it's really, really hard to get any performance gain at all from software prefetches.
Are there any more details known up front, like typical cache miss ratios when using this code, or how much prefetches are expected to help?

What can I do to find out what's causing my program to consume lots of memory over time?

I have an application using POE which has about 10 sessions doing various tasks. Over time, the app starts consuming more and more RAM and this usage doesn't go down even though the app is idle 80% of the time. My only solution at present is to restart the process often.
I'm not allowed to post my code here, so I realize it is difficult to get help, but maybe someone can tell me what I can do to find out myself?
Don't expect the process size to decrease. Memory isn't released back to the OS until the process terminates.
That said, might you have reference loops in data structures somewhere? AFAIK, the Perl garbage collector can't sort out reference loops.
Are you using any XS modules anywhere? There could be leaks hidden inside those.
A guess: your program executes a loop for as long as it is running, and in this loop you may be allocating memory for a buffer (or more) each time some condition occurs. Since the scope is never exited, the memory remains and will never be cleaned up. I suggest you check for something like this. If that is the case, place the allocating code in a sub that you call from the loop; the memory will go out of scope, and get cleaned up, on return to the loop.
Looks like Test::Valgrind is a tool for searching for memory leaks. I've never used it myself though (but I used plain valgrind with C source).
One technique is to periodically dump the contents of $POE::Kernel::poe_kernel to a time- or sequence-named file. $poe_kernel is the root of a tree spanning all known sessions and the contents of their heaps. The snapshots should monotonically grow if the leaked memory is referenced. You'll be able to find out what's leaking by diff'ing an early snapshot with a later one.
You can export POE_ASSERT_DATA=1 to enable POE's internal data consistency checks. I don't expect it to surface problems, but if it does I'd be very happy to receive a bug report.
Perl cannot resolve reference rings. Either you have zombies (which you can detect via ps axl) or you have a memory leak (reference rings/cycles).
There are a ton of programs to detect memory leaks.
strace, mtrace, Devel::LeakTrace::Fast, Devel::Cycle
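For example, Devel::LeakTrace::Fast can be loaded from the command line without modifying the program (a sketch; your_script.pl stands in for your program):
perl -MDevel::LeakTrace::Fast your_script.pl
When the program exits, it reports leaked SVs along with the file and line where they were allocated.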

JBoss5.X out of memory error

JBoss crashed with an out of memory error; how do I prevent this? I modified the values in run.bat, but the result is the same:
"- Xms1024 Xmx1024 PermGen512"
You might have a resource leak, in which case anything but finding and removing the leak will only delay the error, not prevent it. jhat and -XX:+HeapDumpOnOutOfMemoryError will let you inspect the objects in your heap at the time of the OOM, which is a decent start to figuring out whether you have a leak and where it is.
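Roughly, that workflow would be (a sketch; the dump file name includes the actual process id): add -XX:+HeapDumpOnOutOfMemoryError to the JVM options in run.bat, then after a crash run
jhat java_pid<pid>.hprof
and browse the heap at http://localhost:7000 (jhat's default port).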
As for run.bat, the options you list may not be working the way you intend. I would be sure to specify the "m"egabyte (kilobyte? gigabyte? megabytes seemed most likely here) suffix explicitly, and to set the max size before the initial size. So: -Xmx1024m -Xms1024m -XX:MaxPermSize=512m.
512 megabytes, by the way, is a big size for a permanent generation; maybe you meant KB? You can either use jstat or add -XX:+PrintGCDetails to your run.bat to see how much permanent generation space is actually being used.
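For example (a sketch; <pid> is the JBoss process id, sampling every 1000 ms):
jstat -gcutil <pid> 1000
On pre-Java-8 JVMs, the P column shows how full the permanent generation is, as a percentage.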
Your problem might be related to the problem explained here: JVM: Solving OutOfMemoryError with less Memory
In JBoss version 5.0.0.GA, while running the application in JBoss I faced the out of memory error because of large data processing in the application.
To resolve it, you can either optimize the code so that less data sits in heap memory during processing, or you can increase the heap memory of JBoss:
JAVA_OPTS="-Xmx4096m -Xms4096m -XX:MaxNewSize=896m -XX:NewSize=896m"
You can change the memory values as per your requirement.
If the out of memory error comes with a permgen space issue, you can restart the server to resolve it, and you can prevent it by changing the memory value of the variable mentioned below:
-XX:MaxPermSize=256m
Thanks,
Ankit Adlakha
This might be related to https://issues.jboss.org/browse/JBAS-7553. Apparently, when running as a service, JBoss might ignore -Xms.