ipython notebook and leaking file descriptors - sockets

I'm having problems with leaking file descriptors in code I have running in an ipython notebook. I'm downloading lots of files with urllib2 and saving them locally. Apparently, urllib2 has a history of leaking file descriptors, which I suspect is causing the problem. In the end, I get an IOError: Too many open files.
As a workaround, I periodically close a bunch of sockets using os.close. Unfortunately, the ipython notebook has lots of sockets open which I don't want to close.
Is there a way that I can identify which file descriptors/sockets/etc. belong to ipython?

This isn't really an answer, but a couple of workarounds in case others find themselves here with leaking file descriptor problems.
The first workaround, which is probably better, is to use subprocess.call() to download the files I want with wget. It's been about 4 times as fast as the method below.
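Roughly, a minimal sketch of that first workaround (download_with_wget is just a name I've used here; it assumes wget is available on the PATH):
import subprocess

def download_with_wget(url, local_name):
    # Run wget in a separate process so any sockets it opens live and die
    # with that process instead of leaking into the notebook kernel.
    # -q keeps wget quiet, -O writes the download to the given filename.
    return_code = subprocess.call(["wget", "-q", "-O", local_name, url])
    if return_code != 0:
        raise RuntimeError("wget failed with exit code %d" % return_code)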
The second workaround is to use a couple of handy functions I found on SO (which I can't find at the moment - if you find it, edit this or let me know and I'll link):
import resource
import fcntl
import os

def get_open_fds():
    fds = []
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    for fd in range(0, soft):
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        except IOError:
            continue
        fds.append(fd)
    return fds

def get_file_names_from_file_number(fds):
    names = []
    for fd in fds:
        names.append(os.readlink('/proc/self/fd/%d' % fd))
    return names
With these, I store the active file descriptors and corresponding names before I start downloading files. I then periodically test the number of open file descriptors, and if it's getting dangerously large, use os.close() on all the ones that aren't in the original list (I check the names too - descriptors themselves get recycled).
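A rough sketch of that bookkeeping, built on the two functions above (the names original_fds, original_names, close_leaked_fds and the threshold of 500 are mine, not from the original snippet):
# Record what was open before the downloads start
original_fds = get_open_fds()
original_names = get_file_names_from_file_number(original_fds)

def close_leaked_fds(threshold=500):
    # Only act when the number of open descriptors is getting dangerously large
    current_fds = get_open_fds()
    if len(current_fds) < threshold:
        return
    for fd in current_fds:
        try:
            name = os.readlink('/proc/self/fd/%d' % fd)
        except OSError:
            continue
        # Close anything that wasn't open under the same name originally
        # (descriptor numbers get recycled, hence the extra name check)
        if fd not in original_fds or name not in original_names:
            os.close(fd)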
It's ugly, and occasionally ipython notebook complains with things like "can't save history" (presumably I've clobbered something it was using), but it is working pretty well otherwise.

Do I need to manually close an HDF5 file?

Do I understand correctly that HDF5 files should be manually closed like this:
import h5py
file = h5py.File('test.h5', 'r')
...
file.close()
From the documentation: "HDF5 files work generally like standard Python file objects. They support standard modes like r/w/a, and should be closed when they are no longer in use.".
But I wonder: will garbage collection invoke file.close() when the script terminates or when file is overwritten?
This was answered in the comments a long time ago by @kcw78, but I thought I might as well write it up as a quick answer for anyone else reaching this.
As @kcw78 says, you should explicitly close files when you are done with them by calling file.close(). From previous experience, I can tell you that h5py files are usually closed properly anyway when the script terminates, but occasionally the files would end up corrupted (although I'm not sure whether that ever happens when opening in 'r' mode only). Better not to leave it to chance!
As @kcw78 also suggests, using a context manager is a good way to go if you want to be safe. In either case, you need to be careful to actually extract the data you want before letting the file close.
e.g.
import h5py

with h5py.File('test.h5', 'w') as f:
    f['data'] = [1, 2, 3]

# Letting the file close and reopening in read-only mode for example purposes
with h5py.File('test.h5', 'r') as f:
    dataset = f.get('data')  # get the h5py.Dataset
    data = dataset[:]        # copy the array into memory
    print(dataset.shape, data.shape)  # appear to behave the same
    print(dataset[0], data[0])        # appear to behave the same

print(data[0], data.shape)        # works the same as above
print(dataset[0], dataset.shape)  # raises ValueError: Not a dataset
dataset[0] raises an error here because dataset was an instance of h5py.Dataset associated with f, so it was closed at the same time f was, whereas data is just a numpy array containing only the data part of the dataset (i.e. no additional attributes).

Opening .py files with micropython on TI Nspire

I uploaded Fabian Vogt's micropython port to my TI Nspire CX CAS, together with a couple of *.py.tns files to try. I can't find a way to load/launch those files.
As micropython does not include the os module, I can't use os.chdir to change the current directory and load the *.py files from the Python shell. From the Python shell I tried open("documents/mydirectory/myfile") with different extensions, .py or .py.tns, without success.
I don't think the Nspire has anything like a terminal command line either.
Thanks for your help,
There are 2 ways that you could do this, one easy way and one tedious way.
1. Map .py to micropython in your ndless.cfg
(ndless.cfg should be at /documents/ndless/ndless.cfg)
Like so:
ext.xxx=program-name
ext.xxx=program-name
ext.txt=nTxt
ext.py=micropython
ext.xxx=program-name
ext.xxx=program-name
You can edit this file either by copying it back and forth from your computer using TiLP or the official software, or you can edit it on-calc using nTxt. (This requires a bit of fiddling: make a copy of ndless.cfg named ndless.txt so that the existing .txt mapping can open the copy in nTxt, then copy your edits back.)
Ndless should come with a standard ndless.cfg containing basic bindings for nTxt and a few popular emulators. If you don't have one, get the standard one here. It will scan all directories (at least /documents/*, AFAIK) for programs. I've found that removing lines related to programs not on your Nspire will decrease load time.
2. Proper way to run a file in Python
To run a file in Python, you should do something like this:
with open("/documents/helloworld.py.tns", "r") as file:
    exec(file.read())
This will properly close the file after executing, which I've noticed is quite important on the Nspire, as leaving files open has given me trouble before. Of course, if you'd like, you can do exec(open("...","r").read()) and then handle closing the file yourself, but be warned: bad things can happen if you forget.
Also, you must remember to add the leading / and the .tns extension, or else strange things will happen, especially with writing to files.
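If you end up doing this a lot, a small helper along these lines keeps the close explicit (run_file is just a name I've made up, not something the port provides):
def run_file(path):
    # path must start with "/" and end with ".tns",
    # e.g. "/documents/helloworld.py.tns"
    f = open(path, "r")
    try:
        source = f.read()
    finally:
        f.close()  # close before executing; leaving files open causes trouble on the Nspire
    exec(source)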
That's about it! Feel free to ask more questions if needed, I'll be watching the ti-nspire tag.
(Just realized this question is quite old, but I guess it still might be helpful for others who end up on empty questions months later while trying to figure something out :P)

Emacs: trying to write something after saving provokes message "file changed on disk. Really edit the buffer?"

Emacs 24 on Ubuntu 14.
I have a file opened only in Emacs, and it gives me this message constantly, after each save. That is annoying.
This is strange, because earlier everything worked fine, and I can hardly guess what I could have broken in the meantime. I'm a total newbie in Ubuntu, using it according to instructions found on the internet.
Now I'm using Emacs 23 and everything is fine. I guess I need automatic synchronization of the open buffer with the saved file right after saving. Anyway, how can I fix it?
It sounds like some other program on your computer is reading the file when it changes, and possibly even introducing changes (perhaps just to the modification time, rather than to the contents). It's hard to say off-hand just what that would be.
As a workaround, try M-x global-auto-revert-mode. It will only auto-revert if you have no local modifications since the last save. This is generally a nice mode to turn on if you use multiple editors, and I keep it enabled all the time.
Other ideas:
Check if any other process currently has the file open using fuser /path/to/filename.txt (note: it only shows open file descriptors, not processes that hold the file content in memory and write it later)
Do you use any non-standard filesystem? (check with df -h /path/to/filename.txt and mount)
Is your system time stable? (Manually check date, scan the output of dmesg for obvious errors concerning timekeeping, and look for errors related to NTP in the logfiles in /var/log/.)

use matlab to open a file with an outside program and execute 'save as'

Alright, here's what I'm dealing with (you can skip to TLDR if all you need to see is what I want to run):
I'm having an issue with file formatting for a nasty conglomeration of several ancient programs I've strung together. I have some data in .CSV format, and I need to put it into .SPC format. I've tried a set of proprietary MATLAB programs called 'GS tools' for fast and easy conversion, but fast and easy doesn't look like it's gonna happen here, since there are discrepancies between how .spc files are organized now and how they were organized back when my ancient programs were written.
If I could find the source code for the old programs I could probably alter the GS tools code to write my .spc files appropriately, but all I can find are broken links circa 2002 and earlier. Seeing as I don't know what my programs are looking for, I have no choice but to try resaving my data with other programs until one of them produces something workable.
I found my Cinderella program: if I open the data I have in a program called Spekwin and save the file with a .spc extension... voilà! Everything else runs on those files. The problem is that I have hundreds of these files and I'd like to automate the conversion process.
I either need to extract the writing rubric Spekwin uses for .spc files (I believe that info is stored in a dll file within the program, but I'm not sure if that actually makes sense) and use it as a rule to write a file from my input data, or I need a piece of code that will open a file with Spekwin, tell Spekwin to save that file under the .spc extension, and terminate Spekwin.
TLDR: Need a command that tells the computer to open a file with a certain program, save that file under a different extension through that program (essentially open*.csv>save as>*.spc), then terminate the program.
OR--I need a way to tell MATLAB to write a file according to rules specified by a .dll, but I'm not sure I fully understand what that entails.
Of course I'm open to suggestions on other ways to handle this.

Uncompress OpenOffice files for better storage in version control

I've heard discussion about how OpenOffice (ODF) files are compressed zip archives of XML and other data, so making a tiny change to the file can potentially change the stored bytes completely, which means delta compression doesn't work well in version control systems.
I've done basic testing on an OpenOffice file, unzipping it and then rezipping it with zero compression. I used the Linux zip utility for my testing. OpenOffice will still happily open it.
So I'm wondering if it's worth developing a small utility to run on ODF files each time just before I commit to version control. Any thoughts on this idea? Possible better alternatives?
Secondly, what would be a good and robust way to implement this little utility? Bash shell that calls zip (probably Linux only)? Python? Any gotchas you can think of? Obviously I don't want to accidentally mangle a file, and there are several ways that could happen.
Possible gotchas I can think of:
Insufficient disk space
Some other permissions issue that prevents writing the file or temporary files
ODF document is encrypted (probably should just leave these alone; the encryption probably also causes large file changes and thus prevents efficient delta compression)
First, the version control system you want to use should support hooks that are invoked to transform a file from the version in the repository to the one in the working area, like, for example, the clean / smudge filters in Git configured via gitattributes.
Second, instead of writing such a filter yourself, you can find one ready-made, for example rezip from the "Management of opendocument (openoffice.org) files in git" thread on the git mailing list (but see the warning in "Followup: management of OO files - warning about "rezip" approach").
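For illustration, a clean filter here is just a program that reads the file content from stdin and writes the transformed content to stdout; you point Git at it with a line like *.odt filter=opendocument in .gitattributes plus git config filter.opendocument.clean (the filter name opendocument is arbitrary). A minimal Python sketch of the uncompressing direction, just the idea rather than the actual rezip script from the thread, could look like:
#!/usr/bin/python
# Minimal sketch of a Git "clean" filter (Python 2.6, like the scripts below):
# Git pipes the staged file to stdin and expects the transformed bytes on stdout.
import sys
import io
import zipfile

inputZipFile = zipfile.ZipFile(io.BytesIO(sys.stdin.read()))

outputBuffer = io.BytesIO()
outputZipFile = zipfile.ZipFile(outputBuffer, "w", zipfile.ZIP_STORED)

for fileObject in inputZipFile.infolist():
    fileData = inputZipFile.read(fileObject)       # read (decompress) first...
    fileObject.compress_type = zipfile.ZIP_STORED  # ...then mark the entry as stored
    outputZipFile.writestr(fileObject, fileData)

outputZipFile.close()
sys.stdout.write(outputBuffer.getvalue())
The smudge direction can simply be cat if you don't mind checked-out files staying uncompressed.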
You can also browse the answers in the "Tracking OpenOffice files/other compressed files with Git" thread, or try to find the answer inside the "[PATCH 2/2] Add keyword unexpansion support to convert.c" thread.
Hope That Helps
You may consider storing documents in FODT format, a flat XML format. This is a relatively new alternative solution: the document is simply stored unzipped. More info is available at https://wiki.documentfoundation.org/Libreoffice_and_subversion.
I've modified the python program in Craig McQueen's answer just a bit. Changes include:
Actually checking the return value of testzip() (according to the docs, it appears that the original program would happily proceed past the check even with a corrupt zip file).
Rewriting the for-loop that checks for already-uncompressed files as a single if-statement.
Here is the new program:
#!/usr/bin/python
# Note, written for Python 2.6

import sys
import shutil
import zipfile

# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]

backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName

# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)
if checkZipFile.testzip() is not None:
    raise Exception("Zip file is corrupted")

# Second, check that it's not already uncompressed
if all(f.compress_type == zipfile.ZIP_STORED for f in checkZipFile.infolist()):
    raise Exception("File is already uncompressed")

checkZipFile.close()

# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)

inputZipFile = zipfile.ZipFile(inFileName)
outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
    fileData = inputZipFile.read(fileObject)
    outFileObject = fileObject
    outFileObject.compress_type = zipfile.ZIP_STORED
    outputZipFile.writestr(outFileObject, fileData)

outputZipFile.close()
Here's another program I stumbled across: store_zippies_uncompressed by Mirko Friedenhagen.
The wiki also shows how to integrate it with Mercurial.
Here is a Python script that I've put together. It's had only minimal testing so far, under Python 2.6. But I prefer the idea of Python in general because it should abort with an exception if any error occurs, whereas a bash script may not.
This first checks that the input file is valid and not already uncompressed. Then it copies the input file to a "backup" file with ".bak" extension. Then it uncompresses the original file, overwriting it.
I'm sure there are things I've overlooked. Please feel free to give feedback.
#!/usr/bin/python
# Note, written for Python 2.6

import sys
import shutil
import zipfile

# Get a single command-line argument containing filename
commandlineFileName = sys.argv[1]

backupFileName = commandlineFileName + ".bak"
inFileName = backupFileName
outFileName = commandlineFileName
checkFilename = commandlineFileName

# Check input file
# First, check it is valid (not corrupted)
checkZipFile = zipfile.ZipFile(checkFilename)
checkZipFile.testzip()

# Second, check that it's not already uncompressed
isCompressed = False
for fileObject in checkZipFile.infolist():
    if fileObject.compress_type != zipfile.ZIP_STORED:
        isCompressed = True
if isCompressed == False:
    raise Exception("File is already uncompressed")

checkZipFile.close()

# Copy to "backup" file and use that as the input
shutil.copy(commandlineFileName, backupFileName)

inputZipFile = zipfile.ZipFile(inFileName)
outputZipFile = zipfile.ZipFile(outFileName, "w", zipfile.ZIP_STORED)

# Copy each input file's data to output, making sure it's uncompressed
for fileObject in inputZipFile.infolist():
    fileData = inputZipFile.read(fileObject)
    outFileObject = fileObject
    outFileObject.compress_type = zipfile.ZIP_STORED
    outputZipFile.writestr(outFileObject, fileData)

outputZipFile.close()
This is in a Mercurial repository in BitBucket.
If you don't need the storage savings, but just want to be able to diff OpenOffice.org files stored in your version control system, you can use the instructions on the oodiff page, which tells how to make oodiff the default diff for OpenDocument formats under git and mercurial. (It also mentions SVN, but it's been so long since I used SVN regularly I'm not sure if those are instructions or limitations.)
(I found this using Mirko Friedenhagen's page (cited by Craig McQueen above))