Windows Services troubleshooting: recovery settings

Right now I have a service application on Windows Server 2003 that inputs data from devices into a database.
Sometimes the service fails due to a data error or something else (database connection problem, internet connection down, etc.) and I have to restart it. My current workaround is a simple batch script using the NET START/STOP commands, scheduled to run every hour.
I then took a look at the Recovery tab in the service properties, where there is an option to restart the service. What I want to know is how to test it: how does Windows know the service has failed? And, most importantly, how can I tell that the service was successfully restarted when a failure occurs (based on the recovery settings)?
PS: I don't have access to the code.
Thanks

The Service Control Manager's auto-restart kicks in when a service crashes from an unhandled exception. (Some part of your code throws an exception, nothing catches it, and it bubbles all the way up and out of the main function.)
If you have control over the code, it might be better to put try/catch blocks around the areas that tend to cause problems and handle errors more gracefully. You could also put a try/catch around the main entry point of the application, to catch and try to handle any otherwise-unhandled exceptions from the code.
If you can't control the code, you can test the automatic service recovery by forcing one of these errors to occur. If your service crashes in the event of a connection problem, you can force this by unplugging the network cable on the computer.
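If you would rather script those recovery settings than click through the Recovery tab (useful when you can't touch the code), the built-in sc.exe can set and query them. A minimal sketch; MyService is a placeholder for your actual service name:

```
:: Restart the service 60 seconds after each of its first three failures,
:: and reset the failure counter after one day (86400 seconds).
:: Note: sc.exe requires the space after each "=".
sc failure MyService reset= 86400 actions= restart/60000/restart/60000/restart/60000

:: Query the recovery configuration to verify it took effect.
sc qfailure MyService
```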

The easiest way to test the recovery options is to kill your service's process from the task manager. Windows will detect it and run the First Failure recovery option. Subsequent kills will test the Second Failure and Subsequent Failure options. The Event Log will note the exit and the actions taken.
Depending on your environment and your service, this may or may not be a viable option for you, since you are actually killing the service.
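A sketch of that test from the command line, again with MyService as a placeholder name: look up the service's PID, kill the process, then watch what the Service Control Manager does.

```
:: Show extended status for the service, including the PID of its process.
sc queryex MyService

:: Force-kill that process (substitute the PID printed above).
:: This simulates a crash, so the "First failure" action should fire.
taskkill /f /pid 1234

:: A moment later, check whether the service was brought back up.
sc query MyService
```

On the logging side, the entry to look for is event ID 7031 from the Service Control Manager in the System event log; it records both the unexpected termination and the corrective action taken.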

You can restore it back to an earlier point in time. Restoring Surface doesn't change your personal files, but it might remove recently installed apps and drivers.
1. Swipe in from the right edge of the screen, and then tap Search.
(If you're using a mouse, point to the upper-right corner of the screen, move the mouse pointer down, then click Search.)
2. Enter Control Panel in the search box, and tap or click Control Panel.
3. Enter Recovery in the Control Panel search box, and then tap or click Recovery.
4. Tap or click Open System Restore, and then follow the instructions.

Related

Design and Error handling in windows service

I have to design a Windows service and I have some questions:
Proper error handling: if there is an error, what happens to the service? Does it stay up and keep logging? Is the error recorded in the Event Viewer? Does it crash?
What happens over a long run? How do you know for sure that everything is running as required and that the service is not stuck?
How to handle high memory consumption, out-of-memory conditions, or other errors that were never written to the log?
How to handle users: what happens if the log is created under user A and the service is switched to user B? Is the log rewritten, or does it continue from the same point?
How to handle startup: does the service come up automatically?
Thank you.
For error handling, the best I can recommend is taking advantage of try/catch blocks. This way you ensure that you handle the cases where something unexpected happens, and you can either try to correct it or bring the service down cleanly. Keep in mind that exceptions are not propagated outside the scope of a thread, so you need to handle them in each thread.
To be able to tell whether the service is doing fine, you can periodically log what the service is doing to the Event Log. If you do proper try/catch for each thread, it should go smoothly. In C# you can use log4net and its EventLogAppender to log crucial/error info in the Event Log.
If your service causes high memory usage for no apparent reason, it is likely a memory leak. Microsoft has a free tool called CLR Profiler that allows you to profile your service and see what exactly is causing the leak.
Unless you are dealing with user-protected files (in which case you need to consult the Log On tab of your service to give it the appropriate credentials), your service shouldn't depend on any logged-in user. Services run independently of the users on the computer.
A service can be set to start automatically, to start only on-demand, or to simply be disabled completely.
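Those start modes map directly onto sc.exe as well; a small sketch, with MyService again standing in for a real service name:

```
:: Start automatically at boot:
sc config MyService start= auto

:: Start only on demand (manually, or when another service/app requests it):
sc config MyService start= demand

:: Disable the service entirely:
sc config MyService start= disabled
```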

Can I open and run from multiple command line prompts in the same directory?

I want to open two command line prompts (I am using CMDer) from the same directory and run different commands at the same time.
Would those two commands interrupt each other?
One is for compiling a web application I am building (takes like 7 minutes to compile), and the other is to see the history of the commands I ran (this one should be done quickly).
Thank you!
Assuming that CMDer does nothing other than issue the same commands to the operating system as a standard cmd.exe console would, the answer is a clear "Yes, they do interfere, but it depends" :D
Breakdown:
The first part "opening multiple consoles" is certainly possible. You can open up N console windows and in each of them switch to the same directory without any problems (Except maybe RAM restrictions).
The second part, "run commands which do or do not interfere", is the tricky part. If your idea is that a console window presents you with something like an isolated environment, where you can do things as you like and, when you close the window, everything is back to normal as if you had never touched anything (think of a virtual machine snapshot which is lost/reverted when closing the VM), then the answer is: this is not the case. There will be observable cross-console effects.
Think about deleting a file in one console window and then opening this file in a second console window: it would not be very intuitive if the file had not vanished in the second console window as well.
However, sometimes there are delays until changes to the file system are visible to another console window. It could be that you delete the file in one console, run dir in the other console on the directory where the file was sitting, and still see that file in the listing. But if you try to access it, the operating system will certainly quit with an error message of the kind "File not found".
Generally you should consider a console window to be a "View" on your system. If you do something in one window, the effect will be present in the other, because you changed the underlying system which exists only once (the system is the "Model" - as in "Model-View-Controller Design Pattern" you may have heard of).
An exception to this might be changes to the environment variables. These are copied from the current state when a console window is started. And if you change the value of such a variable, the other console windows will stay unaffected.
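A quick way to observe this, assuming nothing beyond two open cmd windows (DEMO is just an arbitrary variable name for the demonstration):

```
:: Console A: set a variable and read it back.
set DEMO=hello
echo %DEMO%

:: Console B (opened earlier): DEMO is undefined there, so
:: "echo %DEMO%" prints the literal text %DEMO%.

:: setx writes to the persistent (registry-backed) environment instead;
:: only consoles opened AFTER running it will see the value.
setx DEMO hello
```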
So, in your scenario, if you let a build/compile operation run, and during this process some files on your file system are created, read (locked), altered, or deleted, then there would be a possible conflict if the other console window tries to access the same files. It would be a so-called "race condition", that is, a non-deterministic situation as to which state of a file the second console window will see (or both windows, if the second one also changes files which the first one wants to work with).
If there is no interference on a file level (reading the same files is allowed, writing to the same file is not), then there should be no problem of letting both tasks run at the same time.
However, on a very detailed view, both processes do interfere in that they compete for the same finite CPU and RAM resources of your system. This should not pose any problems with today's PC computing power, considering features like multiple separate cores, 16 GB of RAM, terabytes of hard drive storage or fast SSDs, and so on.
The exception would be a very demanding, highly parallelizable, high-priority task which eats up, say, 98% of the CPU time. Then there might be a considerable slowdown impact on other processes.
Normally, the operating system's scheduler does a good job on giving each user-process enough CPU time to finish as quickly as possible, while still presenting a responsive mouse cursor, playing some music in the background, allowing a Chrome running with more than 2 tabs ;) and uploading the newest telemetry data to some servers on the internet, all at the same time.
There are techniques which make it possible for a file to be available as certain snapshots at a given timestamp. The key word would be "Shadow Copy" under Windows. Without going into details, this technique allows, for example, defragmenting a file while it is being edited in some application, or a backup copying a (large) file while a delete operation runs on the same file at the same time. The operating system takes the access time into account when a process requests access to a file. So the OS could let the backup finish first before it schedules the delete operation to run, since the delete was started after the backup (in this example), or it could do even more sophisticated things to present a synchronized file system state, even if it is actually changing at the moment.
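If you are curious whether such snapshots exist on your own machine, Windows ships a small query tool for this; a one-line sketch:

```
:: Must be run from an elevated (administrator) prompt; lists the shadow
:: copies currently maintained by the Volume Shadow Copy Service.
vssadmin list shadows
```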

Using socket interface keeps sending overflow warnings

I'm establishing a watch via the socket interface, and then subscribing to changes.
For each incoming PDU, if the map has a "warning" key, I output the warning to the console/user, as the docs suggest.
However, when an overflow happens, it looks like I don't get the "warning" key just once, but instead every incoming PDU has the same warning ("recrawl happened 1 time") over and over (AFAICT?), so I end up spamming the console with the same error message.
For me, it'd be preferable if Watchman only sent the "warning" key once per overflow event. Otherwise I'm looking at having to cache the "warnings already shown to the user" to avoid spamming the console.
Also, in terms of overflow behavior in general, the warning says:
To resolve, please review the information on
https://facebook.github.io/watchman/docs/troubleshooting.html#recrawl
To clear this warning, run:
`watchman watch-del ... ; watchman watch-project ...`
But I'd prefer to have a way to reset the warning without having to cancel and resubscribe my subscription. E.g. right now I have to Ctrl-C kill my program, run the watchman watch-del command, then restart my program.
I could automate that internally, e.g. have my program detect the "overflow happened" warning message, kill its subscription, issue a watch-del, and then re-issue the watch.
But even if I could reset the warning via the socket interface, or do this internal watch-del, I'm wondering why the warning needs to be reset at all. E.g. in theory, if watchman has already done a recrawl, and I told the user it happened (by logging it to the console), shouldn't things be fine now? Why is the watch-del + re-watch required in the first place?
E.g. as long as overflows are not happening constantly, watchman doing the recrawl (so it gets back in sync with the file system) + issuing one warning PDU should mean everything is back to normal, and my user program could, ideally, stay dumb/simple and just keep getting the post-overflow/post-recrawl PDUs on its same/existing subscription.
Sorry that this isn't as clear as it could be.
First: you don't strictly have to take action here, as the watchman service already recovered from the overflow.
It's just advising you that you may have a local configuration problem; if you are on Linux, you might consider increasing the various inotify sysctl parameters. If you are on the Mac, there is very little you can do about this. The warning is sticky so that it is forced in front of the user. We later had users request a way to suppress it, so the suggestion for deleting and restarting the watch was added.
The result was a pretty noisy warning that caused more confusion.
In watchman 4.7 we added a configuration option to turn off this warning: https://facebook.github.io/watchman/docs/config.html#suppress_recrawl_warnings
The intent here is to hide the warning from users that don't know how (or don't have permission) to remediate the system configuration. It works well in conjunction with the (undocumented) perf sampling configuration options that can record and report the volume of recrawls to a site-specific reporting system.
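For reference, that option lives in the .watchmanconfig file at the root of the watched project (per the linked docs); a minimal example:

```
{
  "suppress_recrawl_warnings": true
}
```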

How to manage startup problems at the GUI level?

I'm working on an application that loads a few remote JSONs at startup. The application has been programmed to do certain tests on the incoming data to prevent invalid states and to detect possible human errors. However, I am not sure how we should treat such a situation at the GUI level. Our initial idea was to display an "Oops, there was an unexpected server error. We are working to solve this issue. Please try again later." popup and quit the application when the user hits an "Ok" or "Exit" button.
Apple apparently discourages exiting the application from within your code: https://developer.apple.com/library/ios/#qa/qa2008/qa1561.html
What good alternatives are there to handle this situation?
Update: I updated the error message above, since it was misleading.
I encountered a similar issue. My app was useless unless it could establish a connection to a server.
There are two ways around this:
1. Placeholder text: this can hold the position until you can get your JSON arrays, or at least provide a backdrop for popping an alert.
2. Load a view with all interaction disabled, with a small message box saying "connecting...".
Basically, I have taken the first responding storyboard frame and disabled everything that the user could touch, allowing only static interaction like pressing a button to get to the About screen.
Don't beat yourself up too much about it though. If you don't have any connectivity at all then none of the user's apps are going to be functioning properly. I think in this state, from a GUI perspective it is mostly about damage control and protecting the user experience.
It's tough to be graceful at startup. I suggest presenting UI modally while your app gets ready to run. I asked and answered this SO question which shows a clean way to do the UI, including nice transition effects.
As for exiting: your app should never self-terminate (copyright Arnold Schwarzenegger, circa 2003). The correct app behavior when it can't get something done that has to be done is to alert the user modally and give the option to retry. If the user doesn't want to retry, there's a hardware home button on the phone.

How can I prevent Windows from catching my Perl exceptions?

I have this Perl software that is supposed to run 24/7. It keeps open a connection to an IMAP server, checks for new mail and then classifies new messages.
Now I have a user who hibernates his XP laptop every once in a while. When this happens, the connection to the server fails and an exception is triggered. The calling code usually catches that exception and tries to reconnect. But in this case, it seems that Windows (or Perl?) is catching the exception and delivering it to the user via a message box.
Anyone know how I can prevent that kind of wtf? Could my code catch a "system-is-about-to-hibernate" signal?
To clear up some points you already raised:
I have no problem with users hibernating their machines. I just need to find a way to deal with that.
The Perl module in question does throw an exception. It does something like `die 'foo bar'`. Although the application is completely browser-based and doesn't use anything like Wx or Tk, the user gets a message box titled "poll_timer". The content of that message box is exactly the contents of `$@` ('foo bar' in this example).
The application is compiled into an executable using perlapp. The documentation doesn't mention anything about exception handling, though.
I think that you're dealing with an OS-level exception, not something thrown from Perl. The relevant Perl module is making a call to something in a DLL (I presume), and the exception is getting thrown. Your best bet would be to boil this down to a simple, replicable test case that triggers the exception (you might have to do a lot of hibernating and waking the machines involved for this process). Then, send this information to the module developer and ask them if they can come up with a means of catching this exception in a way that is more useful for you.
If the module developer can't or won't help, then you'll probably wind up needing to use the Perl debugger to debug into the module's code and see exactly what is going on, and see if there is a way you can change the module yourself to catch and deal with the exception.
It's difficult to offer intelligent suggestions without seeing the relevant bits of code. If you're getting a dialog box with an exception message, the program is most likely using either the Tk or wxPerl GUI library, which may complicate things a bit. With that said, my guess would be that it would be pretty easy to modify the exception handling in the program by wrapping the failure point in an eval block and testing `$@` after the call. If `$@` contains an error message indicating connection failure, then re-establish the connection and go on your way.
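A minimal sketch of that pattern; poll_mailbox and reconnect are hypothetical stand-ins for whatever the program actually calls:

```
# Wrap the failure-prone call in eval so that die() lands in $@
# instead of escaping to whatever top-level handler pops the message box.
eval {
    poll_mailbox($imap);   # hypothetical: the call that dies on disconnect
};
if ($@) {
    warn "poll failed: $@";   # log it instead of surfacing a dialog
    $imap = reconnect();      # hypothetical: re-establish the IMAP session
}
```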
Your user is not the exception but rather the rule. My laptop is hibernated between work and home. At work, it is on one DHCP network; at home, it is on another altogether. Most programs continue to work despite a confusing multiplicity of IP addresses (VMware, VPN, plain old connection via NAT router). Those that don't (AT&T Net Client, for the VPN, unused in the office but necessary at home or on the road) recognize the disconnect at hibernate time (AT&T Net Client holds up the Standby/Hibernate process until it has disconnected), and I re-establish the connection if appropriate when the machine wakes up. At airports, I use the local WiFi (more DHCP) but turn off the wireless altogether (one physical switch) before boarding the plane.
So, you need to find out how to learn that the machine is going into StandBy or Hibernation mode for your software to be usable. What I don't have, I'm sorry to say, is a recipe for what you need to do.
Some work with Google suggests that ACPI (Advanced Configuration and Power Interface) is part of the solution (Microsoft). APM (Advanced Power Management) may also be relevant. In Win32 terms, applications with a window message loop receive a WM_POWERBROADCAST message (with the PBT_APMSUSPEND event) shortly before the machine suspends or hibernates; that is the usual hook for tearing down connections cleanly.
I've found a hack to avoid modal system dialog boxes for hard errors (e.g. "encountered an exception and needs to close"). I don't know if the same trick will work for the kind of error you're describing, but you could give it a try.
See: Avoiding the “encountered a problem and needs to close” dialog on Windows
In short, set the
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Windows\ErrorMode
registry key to the value "2".
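If you'd rather script that change than edit the registry by hand, the stock reg.exe can set it; same key and value as above:

```
:: Needs administrative rights; sets the machine-wide error mode.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Windows" /v ErrorMode /t REG_DWORD /d 2 /f
```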