The scheduler I have been working on for my OS class has been getting a "*** stack smashing detected ***" error on the VM I'm using (I'm using Vagrant with virtualbox). This error occurs roughly 50% of the time I run the program.
When switching to the VM cluster provided by our professor (connected using SSH on the aforementioned VM), the error never showed up.
My first instinct was that my local VM didn't have enough memory allocated to it and that the code I was running was somehow going outside the memory the VM could access. (The test involved performing 128 matrix multiplications of varying sizes, each in its own thread.)
Can anyone confirm whether this is a feasible explanation? My fear is that the error is just being ignored on the other VM (I use the same makefile for both, which compiles with the flags -g and -lm).
Thanks!
The "stack smashing detected" error is raised when your program overwrites the "canary" value that sits just above the area where its local variables are located. It's usually caused by writing more elements to a local array than were allocated for it. A bug-free program should never do this on any machine, no matter how much or how little memory is available. So your program is buggy and needs to be fixed.
In particular, this error is not caused by simply running out of stack space.
Most likely the other VM has its compiler configured to disable this check by default. You may be able to re-enable it with -fstack-protector. But either way, you should investigate and fix this bug on whichever machine lets you reproduce it.
I am trying to find performance bottlenecks by using the perf tool on a kubernetes pod. I have already set the following on the instance hosting the pod:
"kernel.kptr_restrict" = "0"
"kernel.perf_event_paranoid" = "0"
However, I have two problems.
When I collect samples with perf record -a -F 99 -g -p <PID> --call-graph dwarf and feed them to speedscope or a flamegraph, I still see question marks (???). For the process whose CPU usage I want to break down (it is C++ based), the ??? frame sits at the top of the stack and the system calls fall below it. The main process is the one showing ???.
I tried running perf top and it says
Failed to mmap with 1 (Operation not permitted)
My questions are:
For collecting perf top, what permissions do I need to change on the host instance of the pod?
Which other settings do I need to change at the instance level so that ??? no longer shows up in perf's output? I would like to see the function call stack of the process, not just the system calls. See the following stack:
The host OS is Ubuntu.
Zooming in on the first system call, you would see this, but this only gives me a fraction of the CPU time spent and only the system calls.
UPDATE/ANSWER:
I was able to run perf top by setting
"kernel.perf_event_paranoid" = "-1". However, as seen in the image below, the process I'm trying to profile (I've blacked out its name) is not showing function names, just addresses. I tried running them through addr2line, but it says addr2line: 'a.out': No such file.
How can I get the addresses to resolve to function names on the pod? Is it even possible?
I was also able to fix the address-to-function mapping in perf top. The problem was that I was running perf from a different container than the one where the process was running (same pod, different container). There may be a way to supply the extra information, but simply running perf in the container that runs the process fixed it.
Consider the following fork bomb in Python (source):
import os
while 1:
    os.fork()
I'm too afraid to test it out myself, but I'm somewhat skeptical that if I just took this program and ran it my computer would just freeze up and die. Assuming this is true, my question is -- what mechanisms or policies is my operating system using to fight it off?
My question can be viewed as sort of an "application" problem to what one might learn in an OS class.
As expected, when I tried it out on my machine, the computer froze and I had to hard reboot. So definitely don't do this on a regular basis.
The last error that I was able to capture from the program was:
BlockingIOError: [Errno 11] Resource temporarily unavailable
File "fork_bomb.py", line 3, in <module>
os.fork()
So at some point, the OS could no longer satisfy the os.fork() calls and returned an error. The only other useful message I can see in /var/log/syslog is
cgroup: fork rejected by pids controller in /user.slice/user-1000.slice/session-2.scope
Cgroups restrict the resources available to the processes in a given group. Presumably the Python processes were in a cgroup that had reached its PID/task limit, so limiting the number of tasks with cgroups is one way the OS deals with fork bombs. Of course, the infinite loop of fork calls still incurred the overhead of requesting resources from the OS even when the forks were failing, hence the system freeze.
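To see which task limit applies to a process, the pids controller can be inspected from Python. This is a minimal sketch, not a definitive tool: the function names are made up for illustration, it assumes cgroups are mounted at /sys/fs/cgroup, and it only reads the process's own cgroup, so a tighter limit set on a parent cgroup would not show up here.

```python
import os

def parse_pids_max(text):
    """Return the task limit as an int, or None if the file says 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)

def cgroup_pids_files(cgroup_text, root="/sys/fs/cgroup"):
    """Turn the contents of /proc/<pid>/cgroup into candidate pids.max paths."""
    candidates = []
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-id:controllers:path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _, controllers, path = parts
        rel = path.lstrip("/")
        if controllers == "":
            # cgroup v2 (unified hierarchy): empty controller list.
            candidates.append(os.path.join(root, rel, "pids.max"))
        elif "pids" in controllers.split(","):
            # cgroup v1: the pids controller has its own hierarchy.
            candidates.append(os.path.join(root, "pids", rel, "pids.max"))
    return candidates

def current_pids_limit():
    """Best-effort lookup: (path, limit) for this process, or None if unreadable."""
    try:
        with open("/proc/self/cgroup") as f:
            cgroup_text = f.read()
    except OSError:
        return None
    for path in cgroup_pids_files(cgroup_text):
        try:
            with open(path) as f:
                return path, parse_pids_max(f.read())
        except (OSError, ValueError):
            continue
    return None

if __name__ == "__main__":
    print(current_pids_limit())
```

Running it only reads files under /proc and /sys, so unlike the fork bomb itself it is safe to try.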
In principle, another way the OS can limit fork bombs is through memory limits. Ignoring copy-on-write, if every forked process needs extra memory, the Linux OOM (out-of-memory) killer will eventually be invoked. This kernel mechanism is triggered when memory is tight, and its job is to kill the processes it judges most likely to free enough memory to keep the system running. Memory limits can be set per cgroup; separately, /proc/sys/vm/min_free_kbytes sets the minimum amount of memory the kernel tries to keep free.
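The values the OOM killer works with can be read from procfs. The sketch below is only illustrative: the helper names are hypothetical, and it assumes a Linux system that exposes /proc/sys/vm/min_free_kbytes and the per-process oom_score and oom_score_adj files. Anything that cannot be read is reported as None.

```python
def parse_int(text):
    """Parse the single integer a procfs file holds; None if it is not a number."""
    try:
        return int(text.strip())
    except ValueError:
        return None

def read_int(path):
    """Read an integer from a one-line procfs file, or None if it is unavailable."""
    try:
        with open(path) as f:
            return parse_int(f.read())
    except OSError:
        return None

def oom_snapshot(pid="self"):
    """Collect the memory reserve and the OOM scores for one process."""
    return {
        # Amount of memory (in kB) the kernel tries to keep free.
        "min_free_kbytes": read_int("/proc/sys/vm/min_free_kbytes"),
        # Higher score means the process is a more likely OOM-killer target.
        "oom_score": read_int(f"/proc/{pid}/oom_score"),
        # User-adjustable bias applied to the score.
        "oom_score_adj": read_int(f"/proc/{pid}/oom_score_adj"),
    }

if __name__ == "__main__":
    print(oom_snapshot())
```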
From my understanding, the process_monitor stores crashbin information locally. If this is running on a virtual machine and a test case causes the process and target machine to become unresponsive, vmcontrol would then revert to an earlier snapshot. How is the crashbin information displayed to the web interface, or accessed at this point if it was lost on the revert to an earlier snapshot?
After walking through most of the code in the Sulley environment, I found that the restart_target() method in the sessions.py module first calls for a restart of the virtual machine if vmcontrol is available, and then tries to restart the process via the procmon if it's available. By switching the order of these, I can avoid losing the crashbin log information unless the entire target machine becomes unresponsive.
I'm trying to build a project on a rather underpowered system (intel compute stick with 1GB of RAM). Some of the compilation steps run out of memory. I've configured icecc so that it can send some jobs to a more powerful machine, but it seems that icecc will always do at least one job on the local machine.
I've tried setting ICECC_MAX_JOBS="0" in /etc/icecc/icecc.conf (and restarting iceccd), but the comments in this file say:
# Note: a value of "0" is actually interpreted as "1", however it
# also sets ICECC_ALLOW_REMOTE="no".
I also tried disabling the icecc daemon on the compute stick by running /etc/init.d/icecc stop. However, it seems that icecc is still putting one job on the local machine (perhaps if the daemon is off it's putting all jobs on the local machine?).
The project is makefile based, and it appears that I'm stuck on a bottleneck step where calling make with -j > 1 still issues only one job, and this compilation is exhausting the system memory.
The only workaround I can think of is to compile on a different system and then ship the binaries back over, but I expect to enter a tweak/build/evaluate cycle on this platform, so I'd like to be able to work from the compute stick directly.
Both systems are running Ubuntu 14.04, if that helps.
I believe this is not supported: if there are network issues, icecc falls back to compiling on the host machine itself. The best solution would be to compile on the remote machine and copy back the resulting binary.
Have you tried setting ICECC_TEST_REMOTEBUILD in the client's terminal (where you run make)?
export ICECC_TEST_REMOTEBUILD=1
In my tests this always forces all sources to be compiled remotely.
Just remember that linking is always done on the local machine.
My Meteor app is crashing with the following error:
Unexpected mongo exit code null. Restarting.
=> Exited from signal: SIGKILL
/home/ron/.meteor/packages/meteor-tool/.1.1.3.4sddkj++os.linux.x86_64+web.browser+web.cordova/mt-os.linux.x86_64/dev_bundle/lib/node_modules/fibers/future.js:245
throw(ex);
^
Error: Unable to allocate ArrayBuffer.
This is followed by a call-stack trace.
What is causing this?
Thanks!
This error is probably caused by your operating environment. If it's not able to allocate an ArrayBuffer, you may not have enough RAM, or some other service may be preventing Meteor from allocating memory.
This error may occur on the smallest DigitalOcean droplet if that's what you're using.
It's generally recommended that you have 1 GB of free RAM for Meteor to work properly in development mode.
One thing you could use is a swap file to supplement your RAM.
Virtual memory can stand in for real RAM, although it is not as fast. On Linux this OS feature is provided by a swap partition; on Windows, by a paging file. On Linux you can also get it with swapspace (or by creating a traditional swap partition):
sudo apt-get install swapspace
Whichever option you choose will create swap for you and help your Meteor app start up.
Just be aware that swap is slower than real RAM, but it will definitely work.
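To check how much RAM is available and whether swap is active, /proc/meminfo can be parsed. This is a rough sketch that assumes a Linux host; the function names are invented for illustration, and the field names used (MemAvailable, SwapTotal, SwapFree) are the ones recent kernels report.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info

def memory_summary(info):
    """Convert the fields relevant to the advice above from kB to MB."""
    return {
        "ram_available_mb": info.get("MemAvailable", 0) // 1024,
        "swap_total_mb": info.get("SwapTotal", 0) // 1024,
        "swap_free_mb": info.get("SwapFree", 0) // 1024,
    }

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(memory_summary(parse_meminfo(f.read())))
```

If swap_total_mb is 0, no swap is configured, and available RAM well under 1 GB would fit the out-of-memory explanation above.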