Best way to detect an application crash and restart it?

Best way to detect an application crash and restart it? - windows-xp

What's the best way to detect an application crash in XP (produces the same pair of 'error' windows each time - each with same window title) and then restart it?
I'm especially interested to hear of solutions that use minimal system resources as the system in question is quite old.
I had thought of using a scripting language like AutoIt (http://www.autoitscript.com/autoit3/), and perhaps triggering a 'detector' script every few minutes?
Would this be better done in Python, Perl, PowerShell or something else entirely?
Any ideas, tips, or thoughts much appreciated.
EDIT: It doesn't actually crash (i.e. exit/terminate - thanks #tialaramex). It displays a dialog waiting for user input, followed by another dialog waiting for further user input, then it actually exits. It's these dialogs that I'd like to detect and deal with.

Best way is to use a named mutex.
Start your application.
Create a new named mutex and take ownership over it
Start a new process (process not thread) or a new application, what you preffer.
From that process / application try to aquire the mutex. The process will block
When application finish release the mutex (signal it)
The "control" process will only aquire the mutex if either the application finishes or the application crashes.
Test the resulting state after aquiring the mutex. If the application had crashed it will be WAIT_ABANDONED
Explanation: When a thread finishes without releasing the mutex any other process waiting for it can aquire it but it will obtain a WAIT_ABANDONED as return value, meaning the mutex is abandoned and therfore the state of the section it was protected can be unsafe.
This way your second app won't consume any CPU cycles as it will keep waiting for the mutex (and that's enterely handled by the operating system)

How about creating a wrapper application that launches the faulty app as a child and waits for it? If the exit code of the child indicates an error, then restart it, else exit.

I think the main problem is that Dr. Watson displays a dialog
and keeps your process alive.
You can write your own debugger using the Windows API and
run the crashing application from there.
This will prevent other debuggers from catching the crash of
your application and you could also catch the Exception event.
Since I have not found any sample code, I have written this
Python quick-and-dirty sample. I am not sure how robust it is
especially the declaration of DEBUG_EVENT could be improved.
from ctypes import windll, c_int, Structure
import subprocess
WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent
DBG_CONTINUE = 0x00010002L
DBG_EXCEPTION_NOT_HANDLED = 0x80010001L
event_names = {
3: 'CREATE_PROCESS_DEBUG_EVENT',
2: 'CREATE_THREAD_DEBUG_EVENT',
1: 'EXCEPTION_DEBUG_EVENT',
5: 'EXIT_PROCESS_DEBUG_EVENT',
4: 'EXIT_THREAD_DEBUG_EVENT',
6: 'LOAD_DLL_DEBUG_EVENT',
8: 'OUTPUT_DEBUG_STRING_EVENT',
9: 'RIP_EVENT',
7: 'UNLOAD_DLL_DEBUG_EVENT',
}
class DEBUG_EVENT(Structure):
_fields_ = [
('dwDebugEventCode', c_int),
('dwProcessId', c_int),
('dwThreadId', c_int),
('u', c_int*20)]
def run_with_debugger(args):
proc = subprocess.Popen(args, creationflags=1)
event = DEBUG_EVENT()
while True:
if WaitForDebugEvent(pointer(event), 10):
print event_names.get(event.dwDebugEventCode,
'Unknown Event %s' % event.dwDebugEventCode)
ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)
retcode = proc.poll()
if retcode is not None:
return retcode
run_with_debugger(['python', 'crash.py'])

I realize that you're dealing with Windows XP, but for people in a similar situation under Vista, there are new crash recovery APIs available. Here's a good introduction to what they can do.

Here is a slightly improved version.
In my test the previous code run in an infinite loop when the faulty exe generated an "access violation".
I'm not totally satisfied by my solution because I have no clear criteria to know which exception should be continued and which one couldn't be (The ExceptionFlags is of no help).
But it works on the example I run.
Hope it helps,
Vivian De Smedt
from ctypes import windll, c_uint, c_void_p, Structure, Union, pointer
import subprocess
WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent
DBG_CONTINUE = 0x00010002L
DBG_EXCEPTION_NOT_HANDLED = 0x80010001L
event_names = {
1: 'EXCEPTION_DEBUG_EVENT',
2: 'CREATE_THREAD_DEBUG_EVENT',
3: 'CREATE_PROCESS_DEBUG_EVENT',
4: 'EXIT_THREAD_DEBUG_EVENT',
5: 'EXIT_PROCESS_DEBUG_EVENT',
6: 'LOAD_DLL_DEBUG_EVENT',
7: 'UNLOAD_DLL_DEBUG_EVENT',
8: 'OUTPUT_DEBUG_STRING_EVENT',
9: 'RIP_EVENT',
}
EXCEPTION_MAXIMUM_PARAMETERS = 15
EXCEPTION_DATATYPE_MISALIGNMENT = 0x80000002
EXCEPTION_ACCESS_VIOLATION = 0xC0000005
EXCEPTION_ILLEGAL_INSTRUCTION = 0xC000001D
EXCEPTION_ARRAY_BOUNDS_EXCEEDED = 0xC000008C
EXCEPTION_INT_DIVIDE_BY_ZERO = 0xC0000094
EXCEPTION_INT_OVERFLOW = 0xC0000095
EXCEPTION_STACK_OVERFLOW = 0xC00000FD
class EXCEPTION_DEBUG_INFO(Structure):
_fields_ = [
("ExceptionCode", c_uint),
("ExceptionFlags", c_uint),
("ExceptionRecord", c_void_p),
("ExceptionAddress", c_void_p),
("NumberParameters", c_uint),
("ExceptionInformation", c_void_p * EXCEPTION_MAXIMUM_PARAMETERS),
]
class EXCEPTION_DEBUG_INFO(Structure):
_fields_ = [
('ExceptionRecord', EXCEPTION_DEBUG_INFO),
('dwFirstChance', c_uint),
]
class DEBUG_EVENT_INFO(Union):
_fields_ = [
("Exception", EXCEPTION_DEBUG_INFO),
]
class DEBUG_EVENT(Structure):
_fields_ = [
('dwDebugEventCode', c_uint),
('dwProcessId', c_uint),
('dwThreadId', c_uint),
('u', DEBUG_EVENT_INFO)
]
def run_with_debugger(args):
proc = subprocess.Popen(args, creationflags=1)
event = DEBUG_EVENT()
num_exception = 0
while True:
if WaitForDebugEvent(pointer(event), 10):
print event_names.get(event.dwDebugEventCode, 'Unknown Event %s' % event.dwDebugEventCode)
if event.dwDebugEventCode == 1:
num_exception += 1
exception_code = event.u.Exception.ExceptionRecord.ExceptionCode
if exception_code == 0x80000003L:
print "Unknow exception:", hex(exception_code)
else:
if exception_code == EXCEPTION_ACCESS_VIOLATION:
print "EXCEPTION_ACCESS_VIOLATION"
elif exception_code == EXCEPTION_INT_DIVIDE_BY_ZERO:
print "EXCEPTION_INT_DIVIDE_BY_ZERO"
elif exception_code == EXCEPTION_STACK_OVERFLOW:
print "EXCEPTION_STACK_OVERFLOW"
else:
print "Other exception:", hex(exception_code)
break
ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)
retcode = proc.poll()
if retcode is not None:
return retcode
run_with_debugger(['crash.exe'])

Related

Obtaining register address in STM32

I have a simple code for preserving and later using register address:
PWMChannel::PWMChannel(TIM_HandleTypeDef *timer, int channel)
{
switch(channel)
{
case 1: ccr = &(timer->Instance->CCR1); break;
case 2: ccr = &(timer->Instance->CCR2); break;
case 3: ccr = &(timer->Instance->CCR3); break;
case 4: ccr = &(timer->Instance->CCR4); break;
case 5: ccr = &(timer->Instance->CCR5); break;
case 6: ccr = &(timer->Instance->CCR6); break;
}
}
Where ccr is a private class member: uint32_t *ccr = nullptr;
It is used to change duty cycle like this: *ccr = duty;
The code above worked just fine some time ago when I was using System Workbench. Recently I switched to CubeIDE. The only issue with porting project to new toolchain was with this member definition - it now required "volatile", so I changed it to: volatile uint32_t *ccr = nullptr;
However the code stopped working. The debugging shows that with channel parameter = 4 the ccr value becomes 0x40. Now, 0x40 is an offset of CCR4 within TIM_TypeDef structure (referenced by Instance), not an actual address of CCR4. If this is how it supposed to be then why it worked before and how do I change the code to make it work again?

OK, I think I figured out what happened.
The object in question is statically initialized. Somehow this initialization happened after MX_TIM1_Init() call (in main.c) when I was building under System Workbench and before that call when I was building under CubeIDE. The "Instance" field is not yet initialized by the time constructor called, so it got wrong register address.
This is not the first time order of initialization bites me... :(
I already have most of the global objects either dynamically allocated or hidden behind functions in singleton pattern. Did not want to do it for this tiny helper object.
If somebody thinks this question is useless, tell me and I'll delete it.

Trying to catch MsgBox text and press button in xlwings

So I have some code which uses xlwings for writing data in Excel file, xlsm.
after i've done writing, I press a certain button to calculate.
sometimes, an error/message pops in the Excel, which is fine, but i want to catch this message to python, and write it later to a log/print it.
also, i need to interact with this message, in this case to press "Ok" in the message box
Attached image of the message box

So guys, I've been able to solve this with an external python library.
here is the code:
from pywinauto import application as autoWin
app = autoWin.Application()
con = app.connect(title = 'Configuration Error')
msgText = con.Dialog.Static2.texts()[0]
con.Dialog.Button.click()
con.Dialog.Button.click()
print(msgText)
basically, what it does, is connecting to the app, and searching for the title.
in this case "Configuration Error"
it needs to perform double click in order to press "Ok" to close the message.
Secondly, it gets the text from the message, and can forward it wherever i want.
important part to remember though, because this should be an automated task, it should run concurrently, which means Threading.
so, a simple Thread class below:
class ButtonClicker(Thread):
def __init__(self):
Thread.__init__(self)
self._stop_event = Event()
def stop(self):
self._stop_event.set()
def stopped(self):
return self._stop_event.is_set()
def run(self) -> None:
while True:
time.sleep(3)
try:
app = autoWin.Application()
con = app.connect(title='Configuration Error')
msg_data = con.Dialog.Static2.texts()[0]
while True:
con.Dialog.Button.click()
# con.Dialog.Button.click()
# print(msg_data)
return msg_data
except Exception as e:
print('Excel didnt stuck')
break
and of course to actually use it:
event_handle = ButtonClicker()
event_handle.start()
some manipulation is needed in order to work in different codes/scenarios, but at least I hope i will help others in the future, because this seems to be very common question.

#Danny's solution, i.e. pywinauto and Thread, works perfectly in my local machine, but it seems can't catch the message box when Excel is running in server mode, e.g. in my case, the automation is triggered in local and started by a system service installed in the server.
pywinauto.findwindows.ElementNotFoundError:
{'title': '<my-wanted-title>', 'backend': 'win32', 'visible_only': False}
It is finally solved with another python third-party library pywin32, so providing a backup solution here.
'''
Keep finding message box with specified title and clicking button to close it,
until stopped by the main thread.
'''
import time
from threading import Thread, Event
import win32gui
import win32con
class ButtonClicker(Thread):
def __init__(self, title:str, interval:int):
Thread.__init__(self)
self._title = title
self._interval = interval
self._stop_event = Event()
def stop(self):
'''Stop thread.'''
self._stop_event.set()
#property
def stopped(self):
return self._stop_event.is_set()
def run(self):
while not self.stopped:
try:
time.sleep(self._interval)
self._close_msgbox()
except Exception as e:
print(e, flush=True)
def _close_msgbox(self):
# find the top window by title
hwnd = win32gui.FindWindow(None, self._title)
if not hwnd: return
# find child button
h_btn = win32gui.FindWindowEx(hwnd, None,'Button', None)
if not h_btn: return
# show text
text = win32gui.GetWindowText(h_btn)
print(text)
# click button
win32gui.PostMessage(h_btn, win32con.WM_LBUTTONDOWN, None, None)
time.sleep(0.2)
win32gui.PostMessage(h_btn, win32con.WM_LBUTTONUP, None, None)
time.sleep(0.2)
if __name__=='__main__':
t = ButtonClicker('Configuration Error', 3)
t.start()
time.sleep(10)
t.stop()

Call script through subproce.Popen without blocking explanation needed

Basically this code reads from a pipe and constantly prints output without blocking ... Here is the whole code:
1)First script:
if __name__ == '__main__':
for i in range(5):
print str(i)+'\n',
sys.stdout.flush()
time.sleep(1)
2) Second script:
def log_worker(stdout):
while True:
output = non_block_read(stdout).strip()
if output:
print output
def non_block_read(output):
''' even in a thread, a normal read with block until the buffer is full '''
fd = output.fileno()
fl = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)
try:
return output.read()
except:
return ''
if __name__ == '__main__':
mysql_process = subprocess.Popen(['python','-u', 'flush.py'], stdin=sys.stdin,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
thread = Thread(target=log_worker, args=[mysql_process.stdout])
thread.daemon = True
thread.start()
mysql_process.wait()
thread.join(timeout=1)
I would like to know why it works that way :
1) If i take Thread absolutely away , and just call log_worker in the main at also prints everything one by one , but the problem is it hangs after completion without finishing. And i read somewhere that here thread is exactly used for it to finish or more correctly thread dies when it finishes printing something , So why does it work that way ? What thread exactly does here and how ?
2) If i keep the thread but remove mysql_process.wait() and thread.join it prints nothing .... Why ? I read that Popen.wait is meant for its child process to terminate. Set and return returncode attribute. What is the child process here and why/how it is child O_O ?
3) If i only remove thread.join(timeout=1) then it finishes but with error Exception in thread Thread-1 (most likely raised during interpreter shutdown):. Why ? What role .join plays here.
4) I read the documentation of functions used in non_block_read function , but still am confused. Okay it is obvious they take file descriptor and set it to non blocking . The thing i am confused is , on what can i use all those functions , i mean i understand that on files , but how come they use it on stdout O_O ? It's not a file , it's a stream ~~ ?
All this i do to execute a script with subprocess.Popen in tornado script , and constantly send output to client/myself without blocking, so if anyone can kind of help me do that i would really appreciate it , because i can't imagine how to kind of get this output from thread in a way that i can constanly insert it in self.send in tornadio2 ...
def on_message(self, message):
# list = subprocess.Popen([r"ls", "-l"], stdout=subprocess.PIPE)
# list_stdout = list.communicate()[0]
for i in range(1,10):
time.sleep(1)
self.send(i)

With the dev version (3.2), that's realy easy
from tornado.web import RequestHandler,asynchronous
from tornado.process import Subprocess
class home(RequestHandler):
#asynchrounous
def get(self):
seld.sp=Subprocess('date && sleep 5 && date',shell=True,stdout=Subprocess.STREAM)
self.sp.set_exit_callback(self.next)
def next(self,s):
self.sp.stdout.read_until_close(self.finish)
application = ...

simple parallel processing in perl

I have a few blocks of code, inside a function of some object, that can run in parallel and speed things up for me.
I tried using subs::parallel in the following way (all of this is in a body of a function):
my $is_a_done = parallelize {
# block a, do some work
return 1;
};
my $is_b_done = parallelize {
# block b, do some work
return 1;
};
my $is_c_done = parallelize {
# block c depends on a so let's wait (block)
if ($is_a_done) {
# do some work
};
return 1;
};
my $is_d_done = parallelize {
# block d, do some work
return 1;
};
if ($is_a_done && $is_b_done && $is_c_done && $is_d_done) {
# just wait for all to finish before the function returns
}
First, notice I use if to wait for threads to block and wait for previous thread to finish when it's needed (a better idea? the if is quite ugly...).
Second, I get an error:
Thread already joined at /usr/local/share/perl/5.10.1/subs/parallel.pm line 259.
Perl exited with active threads:
1 running and unjoined
-1 finished and unjoined
3 running and detached

I haven't seen subs::parallel before, but given that it's doing all of the thread handling for you, and it seems to be doing it wrong, based on the error message, I think it's a bit suspect.
Normally I wouldn't just suggest throwing it out like that, but what you're doing really isn't any harder with the plain threads interface, so why not give that a shot, and simplify the problem a bit? At the same time, I'll give you an answer to the other part of your question.
use threads;
my #jobs;
push #jobs, threads->create(sub {
# do some work
});
push #jobs, threads->create(sub {
# do some other work
});
# Repeat as necessary :)
$_->join for #jobs; # Wait for everything to finish.
You need something a little bit more intricate if you're using the return values from those subs (simply switching to a hash would help a good deal) but in the code sample you provided, you're ignoring them, which makes things easy.

Critical section problem

proces P0: proces P1:
while (true) while (true)
{ {
flag[0] = true; flag[1] = true;
while (flag[1]) while (flag[0])
{ {
flag[0] = false; flag[1] = false;
flag[0] = true; flag[1] = true;
} }
crit0(); crit1();
flag[0] = false; flag[1] = false;
rem0(); rem1();
} }
Could someone give me a scenario with context switches to prove if the above stated code meets the requirements of progress and bounded waiting.
And can anyone give me some tips about how to detect if a code meets the requirements of progress or bounded waiting(and maybe including starvation,deadlock and after-you after you)

The two processes are happening at the same time.
The trick here is that since there is nothing truly synchronizing the two programs, something could happen between lines. On the same note, it's possible things happen at the same time.
To see how this can be an issue, think about this situation...
What would happen if the first flag[0] = true and the first flag[1] = true happened on P0/P1 at exactly the same time?
Both process 1 and process 2 would be stuck in a while loop. How would they exit the while loop? One process would have to check while(flag[other]) at exactly the same moment the other process set their flag[me] to true. This is a very narrow time span. It's the equivalent of rolling dice over and over and not continuing until you hit a certain number.
This is why we need something of a higher level to handle the synchronization for us - real locks and the like.
edit: Oh, one other thing. You may want to check to see if the read/write operations are thread safe. What happens if the system tries to write to the bit the same time it tries to read it?
edit2: FYI - http://msdn.microsoft.com/en-us/library/aa645755(v=VS.71).aspx

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Best way to detect an application crash and restart it? - windows-xp

How about creating a wrapper application that launches the faulty app as a child and waits for it? If the exit code of the child indicates an error, then restart it, else exit.

I realize that you're dealing with Windows XP, but for people in a similar situation under Vista, there are new crash recovery APIs available. Here's a good introduction to what they can do.

Related

Obtaining register address in STM32

Trying to catch MsgBox text and press button in xlwings

Call script through subproce.Popen without blocking explanation needed

simple parallel processing in perl

Critical section problem

Categories

Resources