How to solve the Pintos project 2 user program intr_context error? - operating-system

I am working on the version of Pintos revised and distributed by KAIST in Korea (the Pintos KAIST project). The same error occurs in every test of the project 2 user program part, so I am asking this question. The backtrace shows that the assertion in thread_yield() fails because intr_context() returns true there, but I don't know the cause of this error or how to fix it. Can you tell me?
Kernel panic in run: PANIC at ../../threads/thread.c:338 in thread_yield(): assertion `!intr_context ()' failed.
Call stack: 0x800421874b 0x80042072c0 0x800420a92f 0x8004214d12 0x8004209704 0x8004209b22 0x800420762b
Translation of call stack:
0x000000800421874b: debug_panic (lib/kernel/debug.c:32)
0x00000080042072c0: thread_yield (threads/thread.c:340)
0x000000800420a92f: sema_up (threads/synch.c:124)
0x0000008004214d12: interrupt_handler (devices/disk.c:526)
0x0000008004209704: intr_handler (threads/interrupt.c:353)
0x0000008004209b22: intr_entry (threads/intr-stubs.o:?)
0x000000800420762b: kernel_thread (threads/thread.c:456)
thread_yield()
/* Yields the CPU. The current thread is not put to sleep and
   may be scheduled again immediately at the scheduler's whim. */
void
thread_yield (void) {
    struct thread *curr = thread_current ();
    enum intr_level old_level;
    ASSERT (!intr_context ());
    old_level = intr_disable ();
    if (curr != idle_thread)
        // list_push_back (&ready_list, &curr->elem);
        list_insert_ordered (&ready_list, &curr->elem, thread_compare_priority, 0);
    do_schedule (THREAD_READY);
    intr_set_level (old_level);
}

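I think your code yields the thread inside the sema_up() function.
The backtrace shows sema_up() being called from the disk interrupt handler, and thread_yield() must not be called in interrupt context.
You presumably have an if condition on priority in sema_up() that decides whether to call thread_yield().
Add !intr_context() to that if condition.
That means thread_yield() is not called while an external (not internal) interrupt is being handled.
Then your kernel should no longer panic.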
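As a rough sketch of that check (not the actual Pintos source): the stubs below stand in for intr_context(), thread_yield() and intr_yield_on_return() so the example compiles on its own, and preempt_if_needed() is a hypothetical name for the priority test you would place at the end of sema_up(). It assumes you compare the priority of the thread you just unblocked with the priority of the running thread. In interrupt context it asks for a yield when the handler returns instead of yielding directly; that part is an optional extra beyond the answer above, and simply skipping the yield also avoids the panic.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for Pintos kernel state, so the sketch compiles on its own. */
static bool in_external_interrupt = false; /* what intr_context() reports */
static bool yield_on_return = false;       /* set by intr_yield_on_return() */
static int yield_count = 0;                /* number of direct yields */

static bool intr_context(void) { return in_external_interrupt; }
static void intr_yield_on_return(void) { yield_on_return = true; }

static void thread_yield(void) {
    assert(!intr_context()); /* mirrors the ASSERT that panicked */
    yield_count++;
}

/* Hypothetical helper: call at the end of sema_up() after unblocking a waiter. */
static void preempt_if_needed(int unblocked_priority, int current_priority) {
    if (unblocked_priority <= current_priority)
        return;                 /* no preemption needed */
    if (intr_context())
        intr_yield_on_return(); /* defer the yield until the handler returns */
    else
        thread_yield();         /* safe: not inside an interrupt handler */
}
```

In your own sema_up() the same idea collapses to one condition: only call thread_yield() when the priority test passes and !intr_context() holds.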

Related

Manipulate StackTrace in dart

Summary:
I have a stacktrace in dart, where I want to remove frame #0, and then have the whole stacktrace adjust (frame #1 is now frame #0, frame #2 is now frame #1).
Details:
I have written my own assert-style function in Dart that, if the condition fails, grabs the current stack trace and sends it to Crashlytics.
static void asrt(bool condition)
{
  if (condition)
    return;
  StackTrace stacktrace = StackTrace.current;
  Crashlytics.instance.recordError("assert triggered", stacktrace);
}
The problem is that Crashlytics identifies each error by its first stack frame. So all of my errors are identified as the same error, because I grab the stack trace in the same method every time. I understand I could pass the stack trace in from the caller, but I would prefer to manipulate the stack trace so that the caller does less work.
Is this doable in dart?
I ended up getting a reliable solution.
I used the stack_trace library and its provided classes Trace and Frame.
In the following example, trace.frames returns an immutable list, so I build a new list with copies of the frames I want to keep. I am OK with the extra work, as it only runs on a crash anyway.
StackTrace stacktrace = StackTrace.current;
Trace trace = Trace.from(stacktrace);
List<Frame> frames = trace.frames;
List<Frame> newFrames = <Frame>[];
// start at index 1 to omit frame #0
for (int i = 1; i < frames.length; i++) {
  Frame f = frames[i];
  newFrames.add(Frame(f.uri, f.line, f.column, f.member));
}
Trace newTrace = Trace(newFrames);
I can't recommend it, but you could use StackTrace.toString(), do text manipulation on it, and then use StackTrace.fromString to get a StackTrace object back.
But really I suggest that you file an issue against firebase_crashlytics and request some mechanism to allow considering more than the first frame.

Causes for EXC_BREAKPOINT crash

I have this stack trace in Crashlytics:
The code that is running is below:
@objc private func playerTimerTick() {
    mediaDurationInSeconds = Int32(mediaDuration)
    mediaCurrentPositionInSeconds = Int32(currentTimeInSeconds)
    if elapsedTimeNeedStoreStartPosition {
        elapsedTimeNeedStoreStartPosition = false
        elapsedTimeStartPosition = mediaCurrentPositionInSeconds
    }
}
Line 1092 is mediaDurationInSeconds = Int32(mediaDuration).
The mediaDuration variable is a Double and receives a duration in seconds from an AVURLAsset.
This method (playerTimerTick) is run by a Timer.scheduledTimer every second. I have debugged this code and this function several times and never seen a crash. But in the release version it occurs for multiple users without any explanation.
Has anyone ever had anything like this or do you have any idea what might be causing this crash?

std::lock_guard (mutex) produces deadlock

First: Thanks for reading this question and tryin' to help me out. I'm new to the whole threading topic and I'm facing a serious mutex deadlock bug right now.
Short introduction:
I wrote a game engine a few months ago, which works perfectly and is being used in games already. This engine is based on SDL2. I wanted to improve my code by making it thread safe, which would be very useful to increase performance or to play around with some other theoretical concepts.
The problem:
The game uses internal game stages to display different states of a game, like displaying the menu, or displaying other parts of the game. When entering the "Asteroid Game" stage I receive an exception, which is thrown by the std::lock_guard constructor call.
The problem in detail:
When entering the "Asteroid Game" stage, a modelGetDirection() function is called to receive the direction vector of a model. This function uses a lock_guard to make it thread safe. When debugging this code section, this is where the exception is thrown: the program enters the lock_guard constructor and throws an exception. The odd thing is that this function has NEVER been called before. This is the first time the function is called, and every test run crashes right here!
this is where the debugger would stop in threadx:
inline int _Mtx_lockX(_Mtx_t _Mtx)
    { // throw exception on failure
    return (_Check_C_return(_Mtx_lock(_Mtx)));
    }
And here are the actual code snippets which I think are important:
mutex struct:
struct LEMutexModel
{
    // of course there are more mutexes inside here
    mutex modelGetDirection;
};
engine class:
typedef class LEMoon
{
    private:
        LEMutexModel mtxModel;
        // other mutexes, attributes, methods and so on
    public:
        glm::vec2 modelGetDirection(uint32_t, uint32_t);
        // other methods
} *LEMoonInstance;
modelGetDirection() (engine)function definition:
glm::vec2 LEMoon::modelGetDirection(uint32_t id, uint32_t idDirection)
{
    lock_guard<mutex> lockA(this->mtxModel.modelGetDirection);
    glm::vec2 direction = {0.0f, 0.0f};
    LEModel * pElem = this->modelGet(id);
    if(pElem == nullptr)
        {pElem = this->modelGetFromBuffer(id);}
    if(pElem != nullptr)
        {direction = pElem->pModel->mdlGetDirection(idDirection);}
    else
    {
        #ifdef LE_DEBUG
        char * pErrorString = new char[256 + 1];
        sprintf(pErrorString, "LEMoon::modelGetDirection(%u)\n\n", id);
        this->printErrorDialog(LE_MDL_NOEXIST, pErrorString);
        delete [] pErrorString;
        #endif
    }
    return direction;
}
this is the game function that uses the modelGetDirection method! This function would control a space ship:
void Game::level1ControlShip(void * pointer, bool controlAble)
{
    Parameter param = (Parameter) pointer;
    static glm::vec2 currentSpeedLeft = {0.0f, 0.0f};
    glm::vec2 speedLeft = param->engine->modelGetDirection(MODEL_VERA, LEFT);
    static const double INCREASE_SPEED_LEFT = (1.0f / VERA_INCREASE_LEFT) * speedLeft.x * (-1.0f);
    // ... more code, I think that's not important
}
So as mentioned before: when entering the level1ControlShip() function, the program enters the modelGetDirection() function. On entering modelGetDirection(), an exception is thrown when trying to call:
lock_guard<mutex> lockA(this->mtxModel.modelGetDirection);
And as mentioned, this is the first call of this function in the whole application run!
So why is that? I appreciate any help here! The whole engine (not the game) is an open source project and can be found on GitHub in case I forgot some important code snippets (sorry, in that case!):
GitHub: Lynar Moon Engine
Thanks for your help!
Greetings,
Patrick

Difference between sleep(1) and while(sleep(1))

I found the following piece of code while looking for SIGCHLD example code. In the code below, 50 children are created and the parent process waits in the SIGCHLD handler until all 50 children are destroyed.
I get the expected result if I use while(sleep(1)) at the end of main. However, if I replace it with sleep(1), the parent gets destroyed before all child processes terminate.
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
int l=0;
/* SIGCHLD handler. */
static void sigchld_hdl (int sig)
{
    /* Wait for all dead processes.
     * We use a non-blocking call to be sure this signal handler will not
     * block if a child was cleaned up in another part of the program. */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        printf(" %d",l++);
    }
    printf("\nExiting from child :: %d\n",l);
}
int main (int argc, char *argv[])
{
    struct sigaction act;
    int i;
    memset (&act, 0, sizeof(act));
    act.sa_handler = sigchld_hdl;
    if (sigaction(SIGCHLD, &act, 0)) {
        perror ("sigaction");
        return 1;
    }
    /* Make some children. */
    for (i = 0; i < 50; i++) {
        switch (fork()) {
            case -1:
                perror ("fork");
                return 1;
            case 0:
                return 0;
        }
    }
    /* Wait until we get a sleep() call that is not interrupted by a signal. */
    while (sleep(1)) {
    }
    // sleep(1);
    printf("\nterminating\n");
    return 0;
}
I have the following piece of code while looking for sigchild code. In
the code below 50 children are created and the parent process waits in
sigchild handler until all 50 children are destroyed.
No, it does not. waitpid with WNOHANG returns immediately without reaping anything if no child has exited yet. And there is no guarantee that all the children have exited (or will exit) while the handler is executing.
Even with a plain sleep(1) there is no guarantee that any child will manage to exit, but in practice most of them will.
Sleeping is a fundamentally wrong approach here. Since you know how many children you created, you should wait for all of them to finish and that's it. For instance, you can decrement a counter of existing children each time you reap one and wait for it to reach 0.
Depending on what the real program looks like, you may not want the handler in the first place: just have the waitpid loop at the end, but without WNOHANG.
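A minimal sketch of that handler-free approach, assuming the children have nothing to do and exit immediately as in the question: spawn_and_reap() is a hypothetical helper name, and the counter called live plays the role of the "existing children" counter mentioned above. The parent blocks in waitpid() without WNOHANG until that counter reaches 0.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks up to n children that exit immediately, then blocks until every
   child that was actually created has been reaped. Returns the number reaped. */
static int spawn_and_reap(int n) {
    assert(n >= 0);
    int live = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;          /* fork failed: stop creating, still reap the rest */
        if (pid == 0)
            _exit(0);       /* child: nothing to do in this sketch */
        live++;
    }
    int reaped = 0;
    while (live > 0) {
        pid_t pid = waitpid(-1, NULL, 0); /* blocking: no WNOHANG, no handler */
        if (pid > 0) {
            live--;
            reaped++;
        } else if (errno != EINTR) {
            break;          /* e.g. ECHILD: nothing left to wait for */
        }
    }
    return reaped;
}
```

Calling spawn_and_reap(50) from main() would replace both the SIGCHLD handler and the sleep loop in the question's program.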
I also have to comment about this:
/* Wait for all dead processes.
* We use a non-blocking call to be sure this signal handler will not
* block if a child was cleaned up in another part of the program. */
You can't mix a signal handler and waiting on your own. You risk snatching the process from the other code waiting for it, what happens then?
It's a design error. fork/exit behaviour has to either be unified OR decentralized.
From the manual page
Return Value
Zero if the requested time has elapsed, or the number of seconds
left to sleep, if the call was interrupted by a signal handler.
So I guess that without the while loop, the single sleep(1) is interrupted by the SIGCHLD handler and returns early, hence the process ends quickly.
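To illustrate that return value, here is a small sketch with a hypothetical helper, sleep_fully(), that keeps sleeping for whatever time sleep() reports as left over. Note the difference from while(sleep(1)) in the question, which starts a fresh one-second sleep after every interruption rather than resuming the remainder. This only demonstrates the semantics of sleep(); it is not a way to wait for children.

```c
#include <unistd.h>

/* Sleeps for the full duration, resuming with the remaining seconds each time
   a signal handler interrupts sleep(). Returns how many times that happened. */
static unsigned int sleep_fully(unsigned int seconds) {
    unsigned int interruptions = 0;
    while (seconds > 0) {
        seconds = sleep(seconds); /* 0 when done, else seconds left */
        if (seconds > 0)
            interruptions++;
    }
    return interruptions;
}
```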

Best way to detect an application crash and restart it?

What's the best way to detect an application crash in XP (produces the same pair of 'error' windows each time - each with same window title) and then restart it?
I'm especially interested to hear of solutions that use minimal system resources as the system in question is quite old.
I had thought of using a scripting language like AutoIt (http://www.autoitscript.com/autoit3/), and perhaps triggering a 'detector' script every few minutes?
Would this be better done in Python, Perl, PowerShell or something else entirely?
Any ideas, tips, or thoughts much appreciated.
EDIT: It doesn't actually crash (i.e. exit/terminate - thanks @tialaramex). It displays a dialog waiting for user input, followed by another dialog waiting for further user input, then it actually exits. It's these dialogs that I'd like to detect and deal with.
The best way is to use a named mutex.
1. Start your application.
2. Create a new named mutex and take ownership of it.
3. Start a new process (a process, not a thread) or a new application, whichever you prefer.
4. From that process or application, try to acquire the mutex. The process will block.
5. When the application finishes, release the mutex (signal it).
6. The "control" process will only acquire the mutex if the application either finishes or crashes.
7. Test the resulting state after acquiring the mutex. If the application crashed, it will be WAIT_ABANDONED.
Explanation: When a thread finishes without releasing the mutex, any other process waiting for it can acquire it, but it will get WAIT_ABANDONED as the return value, meaning the mutex was abandoned and therefore the state of the section it was protecting may be unsafe.
This way your second app won't consume any CPU cycles, as it will just keep waiting for the mutex (and that is handled entirely by the operating system).
How about creating a wrapper application that launches the faulty app as a child and waits for it? If the exit code of the child indicates an error, then restart it, else exit.
I think the main problem is that Dr. Watson displays a dialog
and keeps your process alive.
You can write your own debugger using the Windows API and
run the crashing application from there.
This will prevent other debuggers from catching the crash of
your application and you could also catch the Exception event.
Since I have not found any sample code, I have written this
Python quick-and-dirty sample. I am not sure how robust it is
especially the declaration of DEBUG_EVENT could be improved.
from ctypes import windll, c_int, Structure, pointer
import subprocess
WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001
event_names = {
    3: 'CREATE_PROCESS_DEBUG_EVENT',
    2: 'CREATE_THREAD_DEBUG_EVENT',
    1: 'EXCEPTION_DEBUG_EVENT',
    5: 'EXIT_PROCESS_DEBUG_EVENT',
    4: 'EXIT_THREAD_DEBUG_EVENT',
    6: 'LOAD_DLL_DEBUG_EVENT',
    8: 'OUTPUT_DEBUG_STRING_EVENT',
    9: 'RIP_EVENT',
    7: 'UNLOAD_DLL_DEBUG_EVENT',
}
class DEBUG_EVENT(Structure):
    # The event-specific payload 'u' is left opaque; its size is approximate.
    _fields_ = [
        ('dwDebugEventCode', c_int),
        ('dwProcessId', c_int),
        ('dwThreadId', c_int),
        ('u', c_int*20)]
def run_with_debugger(args):
    # creationflags=1 is DEBUG_PROCESS: this process becomes the child's debugger
    proc = subprocess.Popen(args, creationflags=1)
    event = DEBUG_EVENT()
    while True:
        # Wait up to 10 ms for the next debug event, log it, and let the child continue
        if WaitForDebugEvent(pointer(event), 10):
            print(event_names.get(event.dwDebugEventCode,
                                  'Unknown Event %s' % event.dwDebugEventCode))
            ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)
        retcode = proc.poll()
        if retcode is not None:
            return retcode
run_with_debugger(['python', 'crash.py'])
I realize that you're dealing with Windows XP, but for people in a similar situation under Vista, there are new crash recovery APIs available. Here's a good introduction to what they can do.
Here is a slightly improved version.
In my test, the previous code ran in an infinite loop when the faulty exe generated an "access violation".
I'm not totally satisfied with my solution because I have no clear criterion for deciding which exceptions should be continued and which should not (ExceptionFlags is of no help).
But it works on the example I run.
Hope it helps,
Vivian De Smedt
from ctypes import windll, c_uint, c_void_p, Structure, Union, pointer
import subprocess
WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001
event_names = {
    1: 'EXCEPTION_DEBUG_EVENT',
    2: 'CREATE_THREAD_DEBUG_EVENT',
    3: 'CREATE_PROCESS_DEBUG_EVENT',
    4: 'EXIT_THREAD_DEBUG_EVENT',
    5: 'EXIT_PROCESS_DEBUG_EVENT',
    6: 'LOAD_DLL_DEBUG_EVENT',
    7: 'UNLOAD_DLL_DEBUG_EVENT',
    8: 'OUTPUT_DEBUG_STRING_EVENT',
    9: 'RIP_EVENT',
}
EXCEPTION_MAXIMUM_PARAMETERS = 15
EXCEPTION_DATATYPE_MISALIGNMENT = 0x80000002
EXCEPTION_BREAKPOINT = 0x80000003
EXCEPTION_ACCESS_VIOLATION = 0xC0000005
EXCEPTION_ILLEGAL_INSTRUCTION = 0xC000001D
EXCEPTION_ARRAY_BOUNDS_EXCEEDED = 0xC000008C
EXCEPTION_INT_DIVIDE_BY_ZERO = 0xC0000094
EXCEPTION_INT_OVERFLOW = 0xC0000095
EXCEPTION_STACK_OVERFLOW = 0xC00000FD
# Mirrors the Win32 EXCEPTION_RECORD structure.
class EXCEPTION_RECORD(Structure):
    _fields_ = [
        ("ExceptionCode", c_uint),
        ("ExceptionFlags", c_uint),
        ("ExceptionRecord", c_void_p),
        ("ExceptionAddress", c_void_p),
        ("NumberParameters", c_uint),
        ("ExceptionInformation", c_void_p * EXCEPTION_MAXIMUM_PARAMETERS),
    ]
class EXCEPTION_DEBUG_INFO(Structure):
    _fields_ = [
        ('ExceptionRecord', EXCEPTION_RECORD),
        ('dwFirstChance', c_uint),
    ]
# Only the exception member of the union is declared here.
class DEBUG_EVENT_INFO(Union):
    _fields_ = [
        ("Exception", EXCEPTION_DEBUG_INFO),
    ]
class DEBUG_EVENT(Structure):
    _fields_ = [
        ('dwDebugEventCode', c_uint),
        ('dwProcessId', c_uint),
        ('dwThreadId', c_uint),
        ('u', DEBUG_EVENT_INFO)
    ]
def run_with_debugger(args):
    # creationflags=1 is DEBUG_PROCESS: this process becomes the child's debugger
    proc = subprocess.Popen(args, creationflags=1)
    event = DEBUG_EVENT()
    num_exception = 0
    while True:
        if WaitForDebugEvent(pointer(event), 10):
            print(event_names.get(event.dwDebugEventCode, 'Unknown Event %s' % event.dwDebugEventCode))
            if event.dwDebugEventCode == 1:
                num_exception += 1
                exception_code = event.u.Exception.ExceptionRecord.ExceptionCode
                if exception_code == EXCEPTION_BREAKPOINT:
                    # Breakpoint exception: report it and let the child continue
                    print("Breakpoint exception:", hex(exception_code))
                else:
                    if exception_code == EXCEPTION_ACCESS_VIOLATION:
                        print("EXCEPTION_ACCESS_VIOLATION")
                    elif exception_code == EXCEPTION_INT_DIVIDE_BY_ZERO:
                        print("EXCEPTION_INT_DIVIDE_BY_ZERO")
                    elif exception_code == EXCEPTION_STACK_OVERFLOW:
                        print("EXCEPTION_STACK_OVERFLOW")
                    else:
                        print("Other exception:", hex(exception_code))
                    # Any other exception is treated as a crash: stop debugging
                    break
            ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)
        retcode = proc.poll()
        if retcode is not None:
            return retcode
run_with_debugger(['crash.exe'])