SSH: Torch-Lua Script Unexpectedly Stops - command-line

I'm trying to run a Torch-Lua script with the luajit interpreter over SSH on a remote (Ubuntu 14.04) machine. It runs for only two iterations and displays all the expected output, but just as the third iteration is about to complete, it seems to stop by itself for no apparent reason and I am returned to the remote machine's shell prompt.
It doesn't display any standard OS messages, such as the 'luajit' process being killed or terminated by a signal. I used 'top' to check whether it is still running in the background, but it is not. The remote machine is not shutting down, nor am I losing the connection, because I stay connected to the remote machine over SSH the whole time. And the script itself shouldn't have any issues, as the exact same script runs to completion on my local machine. I should also mention that I have sudoer permissions on the remote machine.
I am posting this because I have tried the same thing on two different, independent remote machines and it behaves the same way. Can someone please help by suggesting what might cause this "mysterious" behaviour of the script/machine and possible solutions I could try?
Thanks in advance.
EDIT:
The following is the output which I receive on the terminal every time I run the same script:
==> the main loop
==> online epoch # 1 [batchSize = 128]
[==================== 15/15 ==================>] Tot: 46s400ms | Step: 3s314ms
Train accuracy: 4.90 % time: 50.33 s
==> testing
Test accuracy: 1.50 % time: 3.05 s
==> online epoch # 2 [batchSize = 128]
[==================== 15/15 ==================>] Tot: 49s439ms | Step: 3s531ms
Train accuracy: 5.05 % time: 50.44 s
==> testing
Test accuracy: 1.50 % time: 2.92 s
==> online epoch # 3 [batchSize = 128]
[==================== 15/15 ==================>] Tot: 50s620ms | Step: 3s615ms
Train accuracy: 5.00 % time: 51.38 s
user-name@my-remote-machine:~/path/to/script$
(As you can see from the output, the script is essentially a training-testing procedure for a conv-net.)

After some thinking over and debugging, I found the issue with my script and resolved it.
Neither SSH nor the system's configuration was terminating the script's execution; the problem was a small one in my script. The remote machine I was connecting to was not accessible as a standard desktop (by which I mean it didn't have any desktop environment such as GNOME), so I couldn't 'ssh -X' into it. All interaction with the machine had to be done from the command line.
My script had a "live plot" feature that took the training/testing logs the script wrote after each epoch and displayed a training/testing accuracy-versus-epoch plot (using 'gnuplot'). In my original script (which ran on my CPU-only, desktop-environment-enabled machine) this feature was enabled, and since I initially used the same script unchanged on the remote machine, it stayed enabled there too, which caused this strange problem. After I disabled it, the epochs ran and the training-testing procedure worked correctly, as I expected. In my script it was just a flag that I had to set to true/false to enable/disable the "live plot" feature (similar to the way it is done in this tutorial).
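For illustration only, here is a minimal sketch of the kind of guard involved; the flag name (opt.plot) and the logger variables are assumptions modelled on the Torch tutorials, not necessarily the exact names used in my script:

-- only attempt live plotting when explicitly enabled (needs a display / X session)
if opt.plot then
   trainLogger:style{['% mean class accuracy (train set)'] = '-'}
   testLogger:style{['% mean class accuracy (test set)'] = '-'}
   trainLogger:plot()  -- invokes gnuplot; without a display this can abort the run
   testLogger:plot()
end

Over a plain SSH session with no X forwarding and no desktop environment, simply leaving such a flag set to false avoids the gnuplot call entirely.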


Skip host while nmap is running

Is there a way to skip a host while it is being scanned? I am providing a list of hosts to nmap, and while it is scanning that list I would like to skip one host, because the scripts keep running on that host and delay my scan. Please suggest.
Thanks
There is no way to stop scanning a host at runtime. However, you can impose time limits on how long Nmap spends on a particular host. The --host-timeout option will cause Nmap to drop all results and stop scanning a target when the timeout expires. Unfortunately, this means all that work is lost. But there is a better way, if NSE scripts are slowing you down.
Nmap 7.30 added the --script-timeout option, which puts a time limit on each NSE script that runs against a target. Any script that exceeds the time limit will be terminated and will produce no output, but any other scripts will be allowed to run. No port scan, OS detection, or traceroute data will be lost.
Your last option, if NSE is taking too long, is to find out which script is causing the problem. Most NSE scripts are designed to run quickly; even most of the brute-force password guessing scripts enforce a 10-minute time limit. But sometimes there are bugs, and other times you may select a script with an intentionally long run time. In debug mode (-d, or press d during runtime), printing a status line (by pressing any key during execution) will show a list of running scripts when there are 5 or fewer running. At debug level 2 (-dd, or press d twice), a full stack trace of each running script thread is produced, which can help Nmap developers debug delays. If you suspect a misbehaving script, you can file a bug report on GitHub or send it to dev@nmap.org.
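For example, the two timeouts can be combined on an ordinary scan along these lines (the target is just a placeholder; adjust the limits to taste):

nmap -sV --script default --script-timeout 2m --host-timeout 30m scanme.nmap.org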
Nmap has a host timeout option which will give up on any host that takes longer than the provided value. So, the option below would give up on any host that takes longer than 10 minutes. You can read more about the various timing-related options here.
nmap --host-timeout 10m

Microsoft Deployment Toolkit setting SystemAutoLogon registry key when deploying upgraded OS

I'm trying to deploy images via MDT that have been upgraded via the MDT "Standard Client Upgrade" task sequence. My images started as Win10 v1607 images, were upgraded to v1703, and then captured.
When I go to deploy the captured images, I get a popup on first login that c:\LTIBootstrap.vbs can't be found. Digging in, I discovered that after the OS is installed and the PC restarts, the MDT task sequence continues running as the SYSTEM account. This is bizarre, as it typically runs as the built-in Administrator account.
For some reason, even though the unattend.xml file contains the usual AutoAdminLogon entries, a registry key at
HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon\SystemAutoLogon
is being created and set to 1 during the deployment. (I discovered this by comparing the registries at the end of deployment.) This key is not present in the captured image. This key does not get created if I deploy an image that is manually updated to v1703 (via Windows Update instead of MDT).
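For anyone comparing registries the same way, the value can be checked quickly from a command prompt on the deployed machine (purely illustrative):

reg query "HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon" /v SystemAutoLogon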
Any ideas on why the unattend.xml could be ignored or what would cause SystemAutoLogon to get created and set?
I figured out what was going on.
The MDT upgrade task sequence invokes the upgrade with the command-line /postoobe option pointing to setupcomplete.cmd. This causes the file to be copied to c:\windows\setup\scripts\setupcomplete.cmd. When the Windows install is complete, if a file is present at that location, it is run under the SYSTEM account.
The problem is that this file remains even after the upgrade task sequence is totally complete. So if you then capture the image and deploy it to a real machine, it will see setupcomplete.cmd and run it after the deploy, instead of using the usual default Administrator account.
I imagine the presence of this file at c:\windows... is what causes the registry changes mentioned above. setupcomplete.cmd is only built to bootstrap an upgrade back into the MDT task sequence, and needs to be removed from c:\windows... when the task sequence is done running.
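One simple option (an untested sketch; the path assumes the default location mentioned above) is a Run Command Line step at the very end of the task sequence that removes the file:

del /q /f %WINDIR%\Setup\Scripts\SetupComplete.cmd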
Knowing that the post-upgrade portion of the upgrade task sequence runs as SYSTEM instead of Administrator, via a very different mechanism than a standard deployment, is important, as there are limits to what you can then do. By default the sequence lets you install applications, but they need to be apps that are okay with being installed by SYSTEM.
For now I've updated my local SetupComplete.cmd in my scripts directory to delete itself when it is done, by changing the last for loop to this (there was also a typo in the for loop before that prevented the exit echo):
for %%d in (c d e f g h i j k l m n o p q r s t u v w x y z) do if exist %%d:\Windows\Setup\Scripts\setupcomplete.cmd (
    del /q /f %%d:\Windows\Setup\Scripts\setupcomplete.cmd
    echo %DATE%-%TIME% Exiting SetupComplete.cmd >> %WINDIR%\Temp\setupcomplete.log)
After thinking about this more and hitting issues from running as the SYSTEM account, I started experimenting with avoiding the SYSTEM account altogether. (One big problem is that if you want to shut down at the end of the task sequence right after a reboot occurs, SYSTEM starts running too fast and the call to shutdown in MDT fails.)
The idea is to instead use SetupComplete.cmd running as SYSTEM to simply bootstrap back into running the task sequence as the default Administrator.
There are a few wrinkles to implementing this. Namely, the synchronous commands that run from unattend.xml during a normal install do not run, so things like enabling the Administrator account, disabling UAC for the Administrator, disabling the user account page, and disabling async RunOnce all have to be invoked manually. Beyond that, it is just a matter of setting the right registry entries, via calls to PopulateAutoAdminLogon and SetStartMDT in a step in the task sequence after the OS upgrade is complete, and then performing a restart. This seems to work pretty well. The ideal way to do this would be to have the same script that calls PopulateAutoAdminLogon/SetStartMDT also parse unattend.xml and run those commands.
For some reason shell hiding does not work even though everything is set up for it. My best guess is that the task sequence runner is doing this because IsOSUpgrade is set, but I am not sure.
With this approach, SetupComplete.cmd is just responsible for a single bootstrap back into the task sequence, and the task sequence can delete it at the same time that it calls a script to do PopulateAutoAdminLogon/SetStartMDT.
There is enough work needed to fully polish this approach that I'll just work around the one autologon issue for now, but it really does feel like a better way for MDT to handle upgrades. Hopefully they'll flesh it out in the future.

Automating Matlab without requiring the user to be logged in

Is it possible to set up Matlab to run a specific script in the background when the user is NOT logged in? The script works fine on its own on a Windows Server 2008 machine with Matlab R2014a. It doesn't need a GUI to complete, but I'm guessing that Matlab requires user-specific environments to be set. Is there a place where this can be set ahead of time, maybe?
I have tried "Task Scheduler" and it works just fine, but you have to set the setting to run only when that particular user is logged in or else nothing happens. The problem, of course, is the user session would require continuous monitoring in order to remain logged in (power outage, updates, etc.).
Has anyone dealt with this in the past? We've considered compiling it, but apparently there are certain functions and objects that the script uses (I didn't write it) that don't carry over during compilation.
Any thoughts or suggestions are welcome!
I've done some work for a client where we have an instance of MATLAB running continuously on a server, doing some stuff. The server occasionally fails (power outages, IT dept screw-ups etc), and it needs to be brought back up automatically.
Note that MATLAB does need to be run as a user for licensing reasons, so our MATLAB instance always runs under a designated account, with a license dedicated to running that instance continuously.
We have a Windows batch file to start up a suitable MATLAB instance, that contains a command similar to the following:
CALL matlab.exe -nosplash -nodesktop -sd "myStartupFolder" -r "myMATLABCommand"
We then have a scheduled task set up so that 5 minutes after that account logs in, the batch file runs, and we have Windows set up so that when Windows starts, that account is automatically logged in (I'm no Windows admin, but I think we had to do some weird stuff in order to enable that, such as adding the account to some special domain group, or giving the account special privileges - you may need to research that a little more).
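As a rough sketch of that kind of task (the task name, batch file path, and account are placeholders, and the exact schtasks syntax and password handling may differ in your environment), something like this creates an on-logon task with a 5-minute delay:

schtasks /create /tn "RestartMATLAB" /tr "C:\scripts\start_matlab.bat" /sc onlogon /delay 0005:00 /ru MYDOMAIN\matlabsvc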
Anyway, that solved the issue for us. If the server goes down and then recovers (perhaps IT bring it back up), the account is automatically logged in, the batch file runs, and the MATLAB instance is brought back up. If we need (rarely) to log in directly under that account without the task running, we have a 5 minute window to stop the scheduled task from running, which is no problem.
Hope that helps!
Unfortunately, and as far as I know, Matlab can only be started without a GUI on Linux (maybe on Mac OS X too?).
~$ cat /tmp/stackoverflow.m
s='stackoverflow';
length(s)
~$ ./R2013a/bin/matlab -nodisplay -nojvm -nodesktop -nosplash -r "run /tmp/stackoverflow.m, exit"
< M A T L A B (R) >
Copyright 1984-2013 The MathWorks, Inc.
R2013a (8.1.0.604) 64-bit (glnxa64)
February 15, 2013
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
ans =
13
~$
However, Matlab itself cannot be used via a shebang (#!) line in a Bash script, so it's always a workaround.
A better solution might be to run your Matlab instance continuously and write a daemon/script which runs your .m script on a schedule, for example.
A much better way is to use the Matlab Coder toolbox (if you have it) and compile a stand-alone binary from your .m file. This binary should be easily runnable with the task scheduler on Windows.

matlab parallel computing toolbox--failed to start matlabpool using 'local' profile

My desktop runs Ubuntu 12.04 LTS and the Matlab version is R2013a. I'm doing local parallel computing (using multiple cores of my desktop).
Before using the following command to start matlabpool, I had already validated the local configuration of the Parallel Computing Toolbox. To verify this point, I've attached figure 1.
[figure 1]
matlabpool local 4
But it takes forever to start the matlabpool. After running for 10 minutes, the command line still shows:
Starting matlabpool using the 'local' profile ...
So I use Ctrl+C to stop it. It always gives me:
Operation terminated by user during parallel.internal.pool.InteractiveClient>iGetSingleConnection (line 737)
Based on the above information, it seems that it gets stuck at iGetSingleConnection.
Thanks,
I don't know about Ubuntu, but on Windows, when you install a new version of MATLAB, the old firewall rules do not apply to the new executables. So you need to open the firewall to allow access for the smpd.exe, mpiexec.exe and MATLAB.exe processes. For example, on Windows I get a firewall prompt for one of these processes:
Then I need to go into Windows firewall settings and make the rules. Here is how to create an inbound program rule in Windows 7/8. Maybe there is something similar in Ubuntu.
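On Windows, the equivalent rules can also be created from an elevated command prompt; a sketch for illustration (the MATLAB install path is an assumption, adjust it to your release and architecture):

netsh advfirewall firewall add rule name="MATLAB smpd" dir=in action=allow program="C:\Program Files\MATLAB\R2013a\bin\win64\smpd.exe" enable=yes
netsh advfirewall firewall add rule name="MATLAB mpiexec" dir=in action=allow program="C:\Program Files\MATLAB\R2013a\bin\win64\mpiexec.exe" enable=yes
netsh advfirewall firewall add rule name="MATLAB" dir=in action=allow program="C:\Program Files\MATLAB\R2013a\bin\win64\MATLAB.exe" enable=yes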

Attempt to access remote folder mounted with CIFS hangs when disconnected

This question is an extension of that question.
Yet again: I'm working under CentOS 6.0 and I have a remote win7 folder, mounted with:
mount -t cifs //PC128/mnt /media/net -o "username=WORKGROUP\user,password=pwd,rw,noexec,soft,uid=user,gid=user"
When the remote folder is not available (e.g. the network cable is pulled out), an attempt to access it locks up the application I'm working on. At first I found that QDir::exists() blocked for 20-90 seconds (I still can't figure out why the duration varies); later I found that any call to the stat() function leads to the application locking up.
I followed the advice provided in the topic above: I moved the QDir::exists() call (and later the call to stat()) to another thread, but this didn't solve the problem. The application still hangs when the connection is suddenly lost. A Qt stack trace shows that the lock is somewhere in the kernel:
0 __kernel_vsyscall
1 __xstat64@GLIBC_2.1 /lib/libc.so.6
2 QFSFileEnginePrivate::doStat stat.h
I also tried checking whether the remote share is still mounted before accessing the folder itself, but it didn't help. Approaches such as:
mount | grep /media/net
show that the shared folder is still mounted even if there is no active network connection.
Checking folder status differences such as:
stat -fc%t:%T /media/net/ != stat -fc%t:%T /media/net/..
also hangs for ~20 seconds.
So I have several questions:
Is there any way to change the CIFS timeouts? I tried to find out, but it seems there are no appropriate parameters and no CIFS config file.
How can I check whether the remote folder is still mounted without getting locked up?
How can I check whether a folder exists, also without getting locked up?
Your problem: "An unreachable network filesystem" is a very well known example which trigger linux hung task which isn't the same of zombies process at all(killing the parent PID won't do anything)
An hung task, is task which triggered a system call that cause problem in the kernel, so that the system call never return.
The major particularity is that the task is declared in the "D" state by the scheduler which mean the program is in an uninterruptible state. This mean that you can do nothing to stop you program: You can trigger all signal to the task, it would not respond. Launching hundreds of SIGTERM/SIGKILL does nothing!
This the case whith my old kernel: when my nfs server crash, I need to reboot the client to kill the tasks using the filesystem. I compiled it a long time ago (I have still the build tree on my hdd) and during the configuration I saw this in lib/Kconfig.debug:
config DETECT_HUNG_TASK
        bool "Detect Hung Tasks"
        depends on DEBUG_KERNEL
        default LOCKUP_DETECTOR
        help
          Say Y here to enable the kernel to detect "hung tasks",
          which are bugs that cause the task to be stuck in
          uninterruptible "D" state indefinitely.
          When a hung task is detected, the kernel will print the
          current stack trace (which you should report), but the
          task will stay in uninterruptible state. If lockdep is
          enabled then all held locks will also be reported. This
          feature has negligible overhead.
It only offered to detect such tasks (or to panic on detection): I haven't checked whether recent kernels can actually resolve the situation (it seems to be the case judging from your question), but at the time I didn't think it was worth enabling.
There is a second point: normally the detection occurs after 120 seconds, but I also saw a Kconfig option for that:
config DEFAULT_HUNG_TASK_TIMEOUT
        int "Default timeout for hung task detection (in seconds)"
        depends on DETECT_HUNG_TASK
        default 120
        help
          This option controls the default timeout (in seconds) used
          to determine when a task has become non-responsive and should
          be considered hung.
          It can be adjusted at runtime via the kernel.hung_task_timeout_secs
          sysctl or by writing a value to
          /proc/sys/kernel/hung_task_timeout_secs.
          A timeout of 0 disables the check. The default is two minutes.
          Keeping the default should be fine in most cases.
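As the help text says, the timeout can also be inspected and adjusted at runtime; for example (the value 30 is only illustrative):

# show the current hung-task timeout
sysctl kernel.hung_task_timeout_secs
# lower it to 30 seconds (0 would disable the check entirely)
sysctl -w kernel.hung_task_timeout_secs=30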
This also happens with kernel threads. For example: attach a loop device to a file on a FUSE filesystem, then crash the userspace program controlling the FUSE filesystem!
You should get a kernel thread, whose name is of the form loopX (X normally corresponds to your loop device number), hanging!
weblinks:
https://unix.stackexchange.com/questions/5642/what-if-kill-9-does-not-work (look at the answer written by ultrasawblade)
http://www.linuxquestions.org/questions/linux-general-1/kill-a-hung-task-when-kill-9-doesn't-help-697305/
http://forums-web2.gentoo.org/viewtopic-t-811557-start-0.html
http://comments.gmane.org/gmane.linux.kernel/1189978
http://comments.gmane.org/gmane.linux.kernel.cifs/7674 (This is a case similar to yours)
As for your three questions: you have the answer. This is probably due to what is a well-known bug in the Linux VFS kernel layer! (There are no CIFS timeouts.)
After much trial & error I found a solution that persists.
# vim /etc/fstab
//192.168.1.122/myshare /mnt/share cifs username=user,password=password,_netdev 0 0
The _netdev option is important since we are mounting a network device. Clients may hang during the boot process if the system encounters any difficulties with the network.
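Once the entry is in place, it can be tested without a reboot (assuming the mount point already exists and the share is reachable):

mount -a
mount | grep /mnt/share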
https://www.redhat.com/sysadmin/samba-windows-linux