MemoryError sending data to ipyparallel engines - ipython

We love IPython.parallel (now ipyparallel).
There is something that bugs us, though. When sending a ~1.5 GB pandas DataFrame to a bunch of workers, we get a MemoryError if the cluster has many nodes. It looks like there are as many copies of the dataframe as there are engines (or some proportional number). Is there a way to avoid these copies?
Example:
In[]: direct_view.push({'xy':xy}, block=True)
# or direct_view['xy'] = xy
For a small cluster (e.g. 30 nodes), the memory grows and grows, but eventually the data goes through and all is fine. For a larger cluster, e.g. 80 nodes (all r3.4xlarge, each running just 1 engine, not n_core engines), htop reports memory growing up to the max (123 GB) and we get:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-120-f6a9a69761db> in <module>()
----> 1 get_ipython().run_cell_magic(u'time', u'', u"ipc.direct_view.push({'xy':xy}, block=True)")
/opt/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
2291 magic_arg_s = self.var_expand(line, stack_depth)
2292 with self.builtin_trap:
-> 2293 result = fn(magic_arg_s, cell)
2294 return result
2295
(...)
Note: after looking at https://ipyparallel.readthedocs.org/en/latest/details.html, we tried sending just the underlying NumPy array (xy.values) in an attempt to get a "non-copying send", but we still get a MemoryError.
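For reference, the non-copying attempt looked roughly like the sketch below; the extra index/columns push and the engine-side reconstruction via execute() are assumptions about how one would rebuild the DataFrame, not something taken from the docs.
# Push only the raw NumPy buffer plus the metadata needed to rebuild the frame.
direct_view.push({'xy_values': xy.values,
                  'xy_index': xy.index,
                  'xy_columns': xy.columns}, block=True)
# Rebuild the DataFrame on each engine from the pushed pieces.
direct_view.execute(
    "import pandas as pd; xy = pd.DataFrame(xy_values, index=xy_index, columns=xy_columns)",
    block=True)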
Versions:
Jupyter notebook v.4.0.4
Python 2.7.10
ipyparallel.__version__: 4.0.2

Related

Bitbanging errors on Adafruit Feather M0 with CircuitPython

My understanding is that CircuitPython supports bitbanging by default, but I might be mistaken...
For this example I am using a DHT11 temperature and humidity sensor (I have an Adafruit STEMMA/Qwiic one on the way, but shipping in Australia is awfully slow).
import board
import adafruit_dht
# Initialize the DHT11 sensor (board.D5 is a placeholder: use whichever data pin the sensor is wired to)
dht = adafruit_dht.DHT11(board.D5)
# Read the temperature and humidity
temperature = dht.temperature
humidity = dht.humidity
# Print the values
print("Temperature: {:.1f} C".format(temperature))
print("Humidity: {:.1f} %".format(humidity))
and the error I get back is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "adafruit_dht.py", line 295, in __init__
File "adafruit_dht.py", line 82, in __init__
Exception: Bitbanging is not supported when using CircuitPython.
I have tried using the adafruit_bitbangio library, but I haven't made much headway with it; the docs I find either point me to that library or say it should just work without it.
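For context, the driver appears to choose between pulse capture (pulseio) and bitbanging when it is constructed, and the exception above suggests it fell back to the bitbang path, which CircuitPython builds of the library reject. A minimal sketch, assuming the current Adafruit_CircuitPython_DHT API (the use_pulseio keyword and the pin name are assumptions to check against the installed version):
import board
import adafruit_dht

# Keep the driver on the pulseio path; board.D5 is a placeholder for the actual data pin.
dht = adafruit_dht.DHT11(board.D5, use_pulseio=True)
print("Temperature:", dht.temperature)
print("Humidity:", dht.humidity)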
Any help on this topic would be hugely appreciated!
Thanks.

GridFS put command hangs from pymongo when used within a transaction

I am using GridFS to store some video files in my database. I have updated to MongoDB 4.0 and am trying to use the multi-collection transaction model. The problem I am facing is that the GridFS put() command hangs. I am using it as follows:
client = pymongo.MongoClient(mongo_url)
db = client[db_name]
fs = gridfs.GridFS(db)
Now I try to use the transaction model as follows:
with db.client.start_session() as session:
    try:
        file_path = "video.mp4"
        session.start_transaction()
        with open(file_path, 'rb') as f:
            fid = self.fs.put(f, metadata={'sequence_id': '0001'})
        session.commit_transaction()
    except Exception as e:
        raise
    finally:
        session.end_session()
The issue is that the put command hangs for about a minute. It then returns, but the commit fails. I have a feeling this is because the session object is not passed to the put command, but I do not see any parameter in the help that takes a session as an input. After the hang, the test fails with the following stack:
Traceback (most recent call last):
session.commit_transaction()
File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/client_session.py", line 393, in commit_transaction
self._finish_transaction_with_retry("commitTransaction")
File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/client_session.py", line 457, in _finish_transaction_with_retry
return self._finish_transaction(command_name)
File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/client_session.py", line 452, in _finish_transaction
parse_write_concern_error=True)
File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/database.py", line 514, in _command
client=self.__client)
File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/pool.py", line 579, in command
unacknowledged=unacknowledged)
File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/network.py", line 150, in command
parse_write_concern_error=parse_write_concern_error)
File "/Users/xargon/anaconda/envs/deep/lib/python3.6/site-packages/pymongo/helpers.py", line 155, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Transaction 1 has been aborted.
EDIT
I tried replacing the put block as:
try:
    gf = self.fs.new_file(metadata={'sequence_id': '0000'})
    gf.write(f)
finally:
    gf.close()
However, the hang happens again at gf.close()
I also tried instantiating the GridIn directly so that I could provide the session object:
gin = gridfs.GridIn(root_collection=self.db["fs.files"], session=session)
gin.write(f)
gin.close()
This fails with the error message:
It is illegal to provide a txnNumber for command createIndexes
The issue is that the put command hangs for about a minute
The first attempt with self.fs.put() does not actually use the transaction; it just took a while to upload the file.
Then, while attempting to commit the (empty) transaction after the upload completed, the transaction had unfortunately already reached its maximum lifetime because of the time taken by the upload. See transactionLifetimeLimitSeconds; the default limit of 60 seconds sets the maximum transaction runtime.
If you are considering raising this limit, keep in mind that as write volume enters MongoDB after a transaction's snapshot has been created, WiredTiger cache pressure builds up. This cache pressure can only be released once the transaction commits, which is the reason behind the 60-second default limit.
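If you do decide to raise the limit anyway (for testing, say), one way is a setParameter admin command from PyMongo; this is just a sketch, and the 120-second value is an arbitrary illustration:
# Raise the transaction lifetime limit to 120 seconds (run against the admin database; requires suitable privileges).
client.admin.command('setParameter', 1, transactionLifetimeLimitSeconds=120)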
It is illegal to provide a txnNumber for command createIndexes
First, operations that affect the database catalog, such as creating or dropping a collection or an index, are not allowed in multi-document transactions.
The PyMongo GridFS code attempts to create indexes on the GridFS collections, which the server prohibits inside a transactional session (you may use a session, but not a transaction).
I have updated to MongoDB 4.0 and trying to use the multi collection transaction model
I'd recommend using normal database operations with GridFS. MongoDB multi-document transactions are designed for multi-document atomicity, which I don't think is necessary in the file-uploading case.
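Concretely, that just means dropping the transaction around the upload; a minimal sketch based on the snippet from the question:
# Plain GridFS upload, no session or transaction wrapper.
with open("video.mp4", 'rb') as f:
    fid = fs.put(f, metadata={'sequence_id': '0001'})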

OSError: exception: access violation reading 0xFFFFFFFE1CD34660 (or generic address) when multi-threading an FMU in Python

I have a question regarding the parameter_variation.py example script provided on GitHub.
I'm using FMPy functions here (https://github.com/CATIA-Systems/FMPy) and have a specific error that occurs only when I run a certain FMU, which is only slightly different from the other FMUs I've been using with a modified version of the provided parameter_variation.py example script.
Errors:
...
File "c:\parameter_variation.py", line 136, in simulate_fmu
fmu.terminate()
File "C:\AppData\Local\Continuum\anaconda3\lib\site-packages\fmpy\fmi2.py", line 231, in terminate
return self.fmi2Terminate(self.component)
File "C:\AppData\Local\Continuum\anaconda3\lib\site-packages\fmpy\fmi2.py", line 169, in w res = f(*args, **kwargs)
OSError: exception: access violation reading 0xFFFFFFFE1CD34660
End
I'm running 100 simulations of this FMU in 20 chunks, although the same FMU in the parameter_variation.py script appears to produce results if I run fewer than ~30 simulations in ~6 chunks.
Do you have any guesses why the access violation error may be occurring and how a solution can be forged? Let me know if this is enough information.
Thanks in advance.
In the title you mention multi-threading (multiple instances of the same FMU in the same process), which is not supported by many FMUs and can lead to unexpected side effects (e.g. through access to shared resources). If this is the case here, you should be able to run your variation with a synchronized scheduler by setting the variable sync = True in parameter_variation.py (line 27).
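That change would look something like the following sketch; everything else in the FMPy example script is assumed to stay as-is.
# in parameter_variation.py (FMPy example script)
sync = True  # use the synchronized scheduler rather than simulating FMU instances in parallel threads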

Graphite showing rolling gap in data

I recently upgraded one of our Graphite instances from 0.9.2 to 1.1.1, and have since run into an issue where, for lack of a better word, there is a rolling gap in the data.
It shows the last few minutes correctly (I'm guessing whatever is in the carbon cache), and anything older than about 10-15 minutes shows correctly as well.
However, inside that 10-15 minute window it's completely blank. I can see the gap both in Graphite and in Grafana. It disappears after restarting carbon-cache, and then comes back about a day later.
Example screenshot:
This happens for most graphs/dashboards I have.
I've spent a lot of effort optimizing disk I/O, so I doubt that is the cause: CloudWatch shows 100% burst credit for the disk. It's an m3.xlarge instance with 4 cores and 16 GB RAM. The swap file is on ephemeral storage and looks barely utilized.
Using 1 Carbon Cache instance with Whisper backend.
storage_schemas.conf:
[carbon]
pattern = ^carbon\.
retentions = 60:90d
[dumbo]
pattern = ^collectd\.dumbo # load test containers, we don't care about their data
retentions = 300:1
[collectd]
pattern = ^collectd
retentions = 10s:8h,30s:1d,1m:3d,5m:30d,15m:90d
[statsite]
pattern = ^statsite
retentions = 10s:8h,30s:1d,1m:3d,5m:30d,15m:90d
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
Non-default (or potentially relevant) carbon.conf settings:
[cache]
MAX_CACHE_SIZE = inf
MAX_UPDATES_PER_SECOND = 100 # was slagging disk write IO until I dropped it down from 500
MAX_CREATES_PER_MINUTE = 50
CACHE_WRITE_STRATEGY = sorted
RELAY_METHOD = rules
DESTINATIONS = 127.0.0.1:2004
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_QUEUE_SIZE = 10000
Graphite local_settings.py
CARBONLINK_TIMEOUT = 10.0
CARBONLINK_QUERY_BULK = True
USE_WORKER_POOL = False
We've seen this with some workloads on 1.1.1. Can you try updating carbon to current master? If not, 1.1.2 will be released shortly, which should solve the problem.

Matlab TCP-IP Server Lossless Transfer

I have tried and failed to implement a TCP listen server in Matlab that is "lossless". By lossless, I mean using the Linux socat utility to send a file:
socat -u file.bin TCP4:127.0.0.1:50000
And receive a byte-exact match to that file within Matlab:
function t = test
fid = fopen('x','W');
t = tcpip('0.0.0.0',50000,'NetworkRole','server','InputBufferSize',50*1024^2);
t.BytesAvailableFcnMode = 'byte';
t.BytesAvailableFcnCount = 1024^2;   % fire the callback for every ~1 MiB received
t.BytesAvailableFcn = {@FCN, fid};
fopen(t);

function FCN(obj, event, fid)
% bypass the toolbox fread() wrapper and read from the underlying Java object
x = fread(igetfield(obj, 'jobject'), obj.BytesAvailable, 0, 0);
fwrite(fid, x(1), 'int8');
I've tested this a good bit and had decent success in terms of transfer rate (without the fwrite, and using /dev/zero as the source file, it saturates a gigabit link) and low CPU load. The trick is bypassing Matlab's default tcpip() wrapper* and accessing a lower-level method via:
igetfield(obj,'jobject')
For the 2139160576-byte file I test with, it usually receives ~2139153336 bytes. I've tried various other implementations that collect the fread() output into structs, cells, and concatenated arrays; they are also missing a few KiB. I've tried a repeating pattern of 512 random bytes; one test had a byte mismatch at the beginning.
Socat->Socat transfer works (obviously).
Socat->my-matlab-code holds at fopen() until socat connects. No data is transferred until fread() is called. I've tried throttling the transfer with the Linux "pv" utility:
pv -L 10m file.bin | socat -u - TCP4:127.0.0.1:50000
to no effect.
My question is: where are the bytes going, or what should I test next?
(Edited to include further test results):
Outputting to a file (i.e. the fwrite() call) is unnecessary. It is easier and faster to execute t = test, wait for the transfer to complete (i.e. for the socat client to return), and then query the total bytes transferred from within Matlab:
t.BytesAvailable + t.ValuesReceived
On my Windows machine, this value is always less than the file size of 2139160576 bytes. On my Ubuntu machine, the values occasionally match. Furthermore, when they do not match, the "netstat -s" counters for segments retransmitted and packets discarded do not change. Wireshark monitoring of the loopback interface shows a final Matlab/server ACK sequence number of 2139160578, presumably 2 more than the file size because both the server and client increment by one.
*As an aside, Matlab's Instrument Control Toolbox implementation of fread is a terrible wrapper around lower-level code I can't see: matlab\toolbox\shared\instrument\@icinterface\fread.m, function localFormatData, line 296. All the data types are explicitly cast to double with a numeric type conversion. This results in massive CPU load, not to mention lossy conversion between data types.