IPython MPI with a Machinefile - ipython

I want to use IPython's MPI abilities with distributed computing. Namely I would like MPI to be run with a machine file of sorts so I can add multiple machines.
EDIT:
I forgot to include my configuration.
Configuration
~/.ipython/profile_default/ipcluster_config.py
# The command line arguments to pass to mpiexec.
c.MPILauncher.mpi_args = ["-machinefile ~/.ipython/profile_default/machinefile"]
# The mpiexec command to use in starting the process.
c.MPILauncher.mpi_cmd = ['mpiexec']
Bash Execution
$ dacluster start -n20
2015-06-10 16:16:46.661 [IPClusterStart] Starting ipcluster with [daemon=False]
2015-06-10 16:16:46.661 [IPClusterStart] Creating pid file: /home/aidan/.ipython/profile_default/pid/ipcluster.pid
2015-06-10 16:16:46.662 [IPClusterStart] Starting Controller with MPI
2015-06-10 16:16:46.700 [IPClusterStart] ERROR | IPython cluster: stopping
2015-06-10 16:16:47.667 [IPClusterStart] Starting 20 Engines with MPIEngineSetLauncher
2015-06-10 16:16:49.701 [IPClusterStart] Removing pid file: /home/aidan/.ipython/profile_default/pid/ipcluster.pid
Machinefile
~/.ipython/profile_default/machinefile
localhost slots=8
aidan-slave slots=16
I might mention that it works when I run
mpiexec -machinefile machinefile mpi_hello
The output of that run includes hostnames, so I am sure it is actually distributing. I can also see the processes in top.
Thank you,

I guess I asked too soon. The problem was in the line below:
c.MPILauncher.mpi_args = ["-machinefile ~/.ipython/profile_default/machinefile"]
The string should have been split on the space into separate list items, with an absolute path for the machinefile:
c.MPILauncher.mpi_args = ["-machinefile", "/home/aidan/.ipython/profile_default/machinefile"]
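The single-string form most likely fails because each list element reaches mpiexec as one argument, with no shell in between to split on spaces or expand ~. A minimal sketch of that splitting and expansion using only the standard library (build_mpi_args is a hypothetical helper name, not part of IPython):

```python
import os
import shlex


def build_mpi_args(arg_string):
    """Split a shell-style argument string and expand a leading ~ in each token."""
    # shlex.split separates on whitespace the way a shell would;
    # os.path.expanduser turns "~/..." into an absolute path.
    return [os.path.expanduser(token) for token in shlex.split(arg_string)]


args = build_mpi_args("-machinefile ~/.ipython/profile_default/machinefile")
print(args)  # two items: the flag, then the expanded machinefile path
```

The resulting list could then be assigned to c.MPILauncher.mpi_args in ipcluster_config.py instead of hard-coding the home directory.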
I hope this helps someone. Note that this only fixes the error shown in the Bash output. With the corrected arguments, MPI does connect to the remote server (namely aidan-slave): if I start dacluster, I see a bunch of python sessions start in top, which is symptomatic of an IPython session running remotely.
Unfortunately, the DistArray examples, or at least pi_montecarlo, hang indefinitely. I traced the hang to line 736 of context.py in the globalapi module of distarray:
def _execute(self, lines, targets):
    return self.view.execute(lines, targets=targets, block=True)
I think this is a symptom of a broken or bad MPI connection, because the line seems to want to execute a command on all the slave processes. I don't know how to fix it.
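One way to check whether the engines respond at all, without hanging the calling session, is to submit the code non-blocking and wait with a timeout. A rough sketch, assuming view is an IPython parallel DirectView like the self.view used above; execute_with_timeout is a hypothetical helper, not part of distarray:

```python
def execute_with_timeout(view, code, targets, timeout=30):
    """Run code on the given engines, giving up after `timeout` seconds.

    Returns the AsyncResult if every target finished in time, otherwise None.
    """
    # block=False returns immediately with an AsyncResult instead of waiting.
    ar = view.execute(code, targets=targets, block=False)
    ar.wait(timeout)      # wait at most `timeout` seconds
    if not ar.ready():    # at least one engine has not replied yet
        return None
    ar.get()              # re-raises any exception raised on the engines
    return ar
```

A None result for a trivial statement such as "pass" would suggest that the problem lies in the engine connections rather than in DistArray itself.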

Related

Eclipse / PyDev Stop button not working with OSError: [WinError 6] The handle is invalid error

I have the following Eclipse version on Windows 10:
Version: 2020-09 (4.17.0)
Build id: 20200910-1200
I am using PyDev along with it.
In my code I am using selenium to make a number of url calls (web scraping). When it happens that a particular url is not present or at least not present in the way most of the urls I am reading are, I get the following error:
Traceback (most recent call last):
File "C:\Users\foobar\eclipse-workspace\WeatherUndergroundUnderground\historical\BWI_Fetch.py", line 44, in <module>
main(city, month_date, start_year, end_year)
File "C:\Users\foobar\eclipse-workspace\WeatherUndergroundUnderground\historical\BWI_Fetch.py", line 22, in main
driver.get(city_url);
File "C:\Users\foobar\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "C:\Users\foobar\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\foobar\AppData\Local\Programs\Python\Python38\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror
Exception ignored in: <function Popen.__del__ at 0x0000019267429F70>
Traceback (most recent call last):
File "C:\Users\foobar\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 945, in __del__
self._internal_poll(_deadstate=_maxsize)
File "C:\Users\foobar\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1344, in _internal_poll
if _WaitForSingleObject(self._handle, 0) == _WAIT_OBJECT_0:
OSError: [WinError 6] The handle is invalid
When I get this particular error, Eclipse is still running and pushing the red stop button does not end the program. I can usually use the red stop button for just about any other Python program I have written, but this code/error seems to hang things. How can I end the process from within the Eclipse application?
The error in the stack trace is not really related to PyDev, so it can only really be fixed in Selenium/Python (the error says that __del__ is trying to access a process which is already dead).
As for why PyDev wasn't able to kill it: I think you probably have some process which spawned a subprocess that is no longer reachable because its parent process died, so it's not possible to build a process tree from the initial process launched in PyDev and kill that subprocess through it.
The actual code which does this in PyDev is: https://github.com/fabioz/winp/blob/master/native/winp.cpp#L208
I think it should be possible to use the Windows API to create a JobObject, call AssignProcessToJobObject, and on kill also terminate the JobObject so that all associated processes are killed. That would set things up so that this doesn't happen, but it isn't currently done.
As a note, I usually have an alias for taskkill /im python.exe /f (which kills all the python.exe processes running on the machine), and that's what I use in such cases.
Note, though, that if you spawned some other process in that process tree (say, chrome.exe), that process must also be killed for the initial shell that launched python to be really disposed of.
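A narrower alternative to killing every python.exe is to ask taskkill to end one process together with its children. A minimal sketch, assuming you know the PID of the launched process; both function names are hypothetical, and only the taskkill flags /PID, /T and /F come from Windows itself:

```python
import subprocess
import sys


def taskkill_tree_command(pid):
    """Build a taskkill command that force-kills `pid` and its child processes."""
    # /T ends the process and any children it started; /F forces termination.
    return ["taskkill", "/PID", str(pid), "/T", "/F"]


def kill_process_tree(pid):
    """Run the command on Windows; do nothing elsewhere.

    Returns taskkill's exit code, or None when not running on Windows.
    """
    if sys.platform != "win32":
        return None
    return subprocess.run(taskkill_tree_command(pid)).returncode
```

This only reaches children that are still attached to the tree, so it does not help with the orphaned-subprocess case described above.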
This error message...
Exception ignored in: <function Popen.__del__ at 0x0000019267429F70>
...implies that the builtins module was destroyed before __del__ ran during garbage collection.
Hence PyDev is no longer able to communicate with the relevant Python modules. As a result, the Stop button doesn't function and the following error is raised:
OSError: [WinError 6] The handle is invalid

raspistill returning file not found

I am trying to use the 64-bit version of Raspbian (which can be found here: https://www.raspberrypi.org/forums/viewtopic.php?f=117&t=275370).
I downloaded it, installed everything, ran my updates and then switched on the camera. But when I try to run raspistill, the Pi just gives back
bash: /opt/vc/bin/raspistill: No such file or directory
When I do a ls, I can see the directory fine:
pi@raspberrypi:/opt/vc/bin $ ls
containers_check_frame_int containers_test dtoverlay-pre raspiyuv
containers_datagram_receiver containers_test_bits dtparam tvservice
containers_datagram_sender containers_test_uri edidparser vcdbg
containers_dump_pktfile containers_uri_pipe mmal_vc_diag vcgencmd
containers_rtp_decoder dtmerge raspistill vchiq_test
containers_stream_client dtoverlay raspivid vcmailbox
containers_stream_server dtoverlay-post raspividyuv vcsmem
and when I look at the permissions, there are read/execute permissions for everyone:
-rwxr-xr-x 1 root root 142397 Nov 1 16:25 raspistill
I'm at a bit of a loss here. The file is right there, so why is it not being found when I try to call it from the command line?
Unfortunately, it looks like the MMAL userland still (at the time of writing) has some unresolved issues with 64-bit Raspberry Pi OS, so it is disabled.
However, one can use Docker or cherry-build 32-bit packages as workarounds.
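On Linux, a "No such file or directory" error for a file that clearly exists can also occur when the loader the binary needs is missing, for example when a 32-bit executable is run on a system without 32-bit libraries. Whether that is what happens here is an assumption; the answer above does not confirm it. A small sketch to check which kind of binary a file is by reading the start of its ELF header (the function names are hypothetical):

```python
ELF_MAGIC = b"\x7fELF"


def elf_class_from_header(header):
    """Return '32-bit' or '64-bit' for an ELF header, or None if it is not ELF."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    return {1: "32-bit", 2: "64-bit"}.get(header[4])


def elf_class(path):
    """Read the first bytes of `path` and classify the file."""
    with open(path, "rb") as f:
        return elf_class_from_header(f.read(5))


# Example call on the Pi: elf_class("/opt/vc/bin/raspistill")
```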

How to debug program compilation when Perl module fails

I'm trying to compile Slic3r 1.2.9 (Git 65a23b) on Raspbian, and running sudo perl Build.PL --verbose fails while building the Perl module Time-HiRes-1.9754:
...
--> Working on Time::HiRes
Fetching http://www.cpan.org/authors/id/J/JH/JHI/Time-HiRes-1.9754.tar.gz ... OK
Configuring Time-HiRes-1.9754 ... FAIL
! Timed out (> 60s). Use --verbose to retry.
! Configure failed for Time-HiRes-1.9754. See /root/.cpanm/work/1520227993.988/build.log for details.
The log file shows a little more information, but I've never worked with Perl and I don't know where to start debugging:
$ tail /root/.cpanm/work/1520234788.2186/build.log
Looking for clock_getres()... found.
Looking for clock_nanosleep()... found.
Looking for clock()... found.
Looking for working futimens()... found.
Looking for working utimensat()... found.
You seem to have subsecond timestamp setting.
Looking for stat() subsecond timestamps...
Trying struct stat st_atimespec.tv_nsec...-> FAIL Timed out (> 60s). Use --verbose to retry.
-> N/A
-> FAIL Configure failed for Time-HiRes-1.9754. See /root/.cpanm/work/1520234788.2186/build.log for details.
I've posted an issue with Slic3r on GitHub, but I haven't had any suggestions yet - presumably it's not actually a problem with Slic3r itself.
What should I do next to work out what's going wrong?

Cannot run magic functions in ipython terminal

I am using Enthought's Canopy environment on a 64-bit Linux OS. Everything works fine in the IPython console attached to the editor. But when I run ipython in the terminal and try to use magic functions, I get the following error:
---------------------------------------------------------------------------
error Traceback (most recent call last)
<ipython-input-3-29a4050aa687> in <module>()
----> 1 get_ipython().show_usage()
/home/shahensha/Development/Canopy/appdata/canopy-1.0.3.1262.rh5-x86_64/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in show_usage(self)
2931 def show_usage(self):
2932 """Show a usage message"""
-> 2933 page.page(IPython.core.usage.interactive_usage)
2934
2935 def extract_input_lines(self, range_str, raw=False):
/home/shahensha/Development/Canopy/appdata/canopy-1.0.3.1262.rh5-x86_64/lib/python2.7/site-packages/IPython/core/page.pyc in page(strng, start, screen_lines, pager_cmd)
188 if screen_lines <= 0:
189 try:
--> 190 screen_lines += _detect_screen_size(screen_lines_def)
191 except (TypeError, UnsupportedOperation):
192 print(str_toprint, file=io.stdout)
/home/shahensha/Development/Canopy/appdata/canopy-1.0.3.1262.rh5-x86_64/lib/python2.7/site-packages/IPython/core/page.pyc in _detect_screen_size(screen_lines_def)
112 # Proceed with curses initialization
113 try:
--> 114 scr = curses.initscr()
115 except AttributeError:
116 # Curses on Solaris may not be complete, so we can't use it there
/home/shahensha/Development/Canopy/appdata/canopy-1.0.3.1262.rh5-x86_64/lib/python2.7/curses/__init__.pyc in initscr()
31 # instead of calling exit() in error cases.
32 setupterm(term=_os.environ.get("TERM", "unknown"),
---> 33 fd=_sys.__stdout__.fileno())
34 stdscr = _curses.initscr()
35 for key, value in _curses.__dict__.items():
error: setupterm: could not find terminfo database
So I installed a bare-bones IPython shell, separate from the one provided by Canopy, and tried the same magic functions there; they work fine.
Have I done something wrong with the installation? Please help.
Thanks a lot
shahensha
This is not a solution, just an observation. My desktop runs Mac OS X, and I connect to a CentOS machine to run Enthought Canopy (both 64-bit). I get the same error message as the OP if I ssh from iTerm2, but not if I use the Terminal app.
I am not sure what the underlying reason is, but maybe someone can verify whether something similar happens on Linux. Interestingly, I can use either iTerm2 or Terminal with the local Canopy without any issues.
Update:
I just noticed that the TERM environment variable in iTerm2 was set to "xterm" while the Terminal app was showing "xterm-256color". Issuing the command export TERM="xterm-256color" before running the Canopy ipython in the terminal solves the issue for me in iTerm2.
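To see which TERM values a given interpreter can handle, the curses check can be run in a fresh process for each value. A minimal sketch, written for Python 3; terminfo_works is a hypothetical helper, and the python argument is assumed to be the path of the interpreter under test (for example Canopy's python):

```python
import os
import subprocess
import sys


def terminfo_works(term, python=sys.executable):
    """Return True if curses.setupterm() succeeds in `python` with TERM=term."""
    # A fresh process is used for every check, so each TERM value is
    # tested independently of any earlier curses initialisation.
    env = dict(os.environ, TERM=term)
    result = subprocess.run(
        [python, "-c", "import curses; curses.setupterm()"],
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


for term in ("xterm", "xterm-256color"):
    print(term, terminfo_works(term))
```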
Problem reproduction:
$ python -c 'import curses; curses.setupterm()'
Traceback (most recent call last):
File "<string>", line 1, in <module>
_curses.error: setupterm: could not find terminfo database
This irc log gave me the idea that this error was to do with libncursesw.
My Canopy version is 1.0.3.1262.rh5-x86_64. I have installed it to ~/src/canopy.
In ~/src/canopy/appdata/canopy-1.0.3.1262.rh5-x86_64/lib we can see that my canopy install has libncursesw.so.5.7.
My machine (Debian Wheezy 64bit) has libncursesw.so.5.9 (in /lib/x86_64-linux-gnu/libncursesw.so.5.9). I made canopy use this. You can toggle the problem on / off by using LD_PRELOAD and pointing at the .so file.
Solution
Replace libncurses.so.5.7 with libncurses.so.5.9:
CANOPYDIR=$HOME/src/canopy
CANOPYLIBS=$CANOPYDIR/appdata/canopy-1.0.3.1262.rh5-x86_64/lib/
SYSTEMLIBS=/lib/x86_64-linux-gnu
cp $SYSTEMLIBS/libncurses.so.5.9 $CANOPYLIBS
ln -sf $CANOPYLIBS/libncurses.so.5.9 $CANOPYLIBS/libncurses.so.5
It appears that Canopy User Python is not your default. See this article:
https://support.enthought.com/entries/23646538-Make-Canopy-s-Python-be-your-default-Python-i-e-on-the-PATH-
Update: Not true here -- instead, see batu's workaround answer.

Calling sem_open on Solaris as ordinary user

This call fails on Solaris with EACCES when run as an ordinary user:
sem_open(fileName.c_str(), O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO, 1);
When the process is started as root, it runs fine. Is this expected behavior?
Environment:
$ uname -a
SunOS solaris 5.11 11.0 i86pc i386 i86pc
$ g++ --version
g++ (GCC) 4.5.2
At the command line try:
prctl $$
This lists the system-enforced resource limits for your process. Note that there are
process.max-sem-ops
process.max-sem-nsems
project.max-sem-ids
Each of these limits has a numeric value. If you do not see them, or the limits have already been reached, you have to set them for your account's project: use projadd to create the project, or projmod to raise the limits if the project already exists.
If you cannot do this (no root access), consult your sysadmin; s/he probably has some reason for not allowing semaphore access.
Note carefully:
Semaphores are kernel persistent. If you ran your code a bunch of times, the semaphores you created are likely still out there.
To see existing semaphores, try ipcs -as
To remove lingering semaphores that your code should have removed, use ipcrm