I am seeing odd behavior on an IPython cluster. The calculations finish, but many results never reach the client, and the engines sit idle after finishing their first calculation.
I suspect something is wrong with ZMQ, because from time to time I see the following error:
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/asyncresult.py", line 118, in get
if not self.ready():
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/asyncresult.py", line 132, in ready
self.wait(0)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/asyncresult.py", line 142, in wait
self._ready = self._client.wait(self.msg_ids, timeout)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/client.py", line 1058, in wait
self.spin()
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/client.py", line 1015, in spin
self._flush_results(self._task_socket)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/parallel/client/client.py", line 814, in _flush_results
idents,msg = self.session.recv(sock, mode=zmq.NOBLOCK)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/zmq/session.py", line 642, in recv
idents, msg_list = self.feed_identities(msg_list, copy)
File "/data/misc/nano/python/env_stable/lib/python2.7/site-packages/IPython/zmq/session.py", line 673, in feed_identities
idx = msg_list.index(DELIM)
ValueError: '<IDS|MSG>' is not in list
Additionally, the IPython.zmq test suite reports two errors:
======================================================================
ERROR: test_send (IPython.zmq.tests.test_session.TestSession)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/IPython/zmq/tests/test_session.py", line 76, in test_send
socket = MockSocket(zmq.Context.instance(),zmq.PAIR)
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/IPython/zmq/tests/test_session.py", line 34, in __init__
self.data = []
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/zmq/sugar/attrsettr.py", line 38, in __setattr__
self.__class__.__name__, upper_key)
AttributeError: MockSocket has no such option: DATA
======================================================================
ERROR: test_send (IPython.zmq.tests.test_session.TestSession)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/clusterdata/python/env_stable/lib/python2.7/site-packages/zmq/tests/__init__.py", line 108, in tearDown
raise RuntimeError("context could not terminate, open sockets likely remain in test")
RuntimeError: context could not terminate, open sockets likely remain in test
----------------------------------------------------------------------
I use pyzmq 13.0.0 (installed by pip) with ZeroMQ 3.2.2, which was compiled by pyzmq's setup. I use IPython 0.13.1 and Python 2.7.3.
Any suggestions as to what this could be or, failing that, how I could find out more about why these errors occur?
Update: It turns out the slowdown was due to a long task queue of ipcontroller, which was then taking 100% CPU and lagging horribly. That is a separate issue, but I would still appreciate feedback on the above.
Answered by @minrk in the comments: the ZMQ errors were unimportant. The performance problem was due to scheduling and was solved by setting TaskScheduler.hwm=0.
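For anyone wondering where that setting goes, here is a minimal sketch. It assumes the controller is configured through an ipcontroller_config.py file in the IPython profile directory; the file location is an assumption, since the comments only name the option itself.

```python
# ipcontroller_config.py, in the IPython profile directory (assumed location)
c = get_config()  # provided by IPython when it loads this config file

# As far as I understand the option, 0 removes the limit on outstanding
# tasks per engine, so the scheduler hands tasks to engines as they arrive
# instead of holding a long queue itself.
c.TaskScheduler.hwm = 0
```

The same value can presumably also be passed on the command line as ipcontroller --TaskScheduler.hwm=0.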
Related
I'm running snakemake on fairly large workflows. Somewhat randomly I get errors like
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/prog/Python/3.7.9-foss-2018a/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1069, in _wait_for_jobs
status = job_status(active_job)
File "<home>/.pip/CentOS/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 1051, in job_status
os.remove(active_job.jobscript)
FileNotFoundError: [Errno 2] No such file or directory: '<current working dir>/.snakemake/tmp.0w7jh5bc/snakejob.<name of rule>.6868.sh'
It happens somewhat randomly, and restarting the workflow usually resolves the problem. I think this might be due to filesystem latency; however, the --latency-wait flag seems to apply only to output files. Is there a way to make snakemake wait for jobscripts as well?
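To make the request concrete, the behavior being asked for could look roughly like the sketch below. It is not snakemake code or a snakemake option: remove_when_present and its latency_wait and poll parameters are hypothetical names used only for illustration.

```python
import os
import time


def remove_when_present(path, latency_wait=5.0, poll=0.1):
    """Try to delete `path`, retrying for up to `latency_wait` seconds.

    Returns True once the file was removed, False if it never showed up
    within the waiting period (hypothetical helper, not snakemake API).
    """
    deadline = time.monotonic() + latency_wait
    while True:
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            # The file may simply not be visible yet on a slow filesystem.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
```

A cleanup step written this way tolerates a jobscript that appears late or was already removed, instead of raising from the status-polling thread.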
With my project settled, I have exported it to GeoPDF and would also like to create an HTML version.
I added qgis2web in the Plugins Manager and restarted QGIS.
When I choose Web/qgis2web/create a web map, QGIS shows a spinning cursor for 2-3 minutes, then displays some Python errors that are beyond my pay grade.
QGIS 3.15
GDAL 3
Python 3.7
OSX High Sierra (10.13.6)
Mac mini mid 2011
2.3 GHz i5
16 GB RAM
The errors I receive look like this:
An alternative, ballpark-only transform was used when transforming coordinates between EPSG:26919 - NAD83 / UTM zone 19N and EPSG:3857 - WGS 84 / Pseudo-Mercator. The results may not match those obtained by using the preferred operation:
Possibly an incorrect choice of operation was made for transformations between these reference systems. Check the Project Properties and ensure that the selected transform operations are applicable over the whole extent of the current project.
2020-10-25T00:43:03 WARNING Python error : An error has occurred while executing Python code: See message log (Python Error) for more details.
2020-10-25T00:43:03 WARNING Used a ballpark transform from EPSG:26919 to EPSG:3857
Python Error
2020-10-25T00:43:03 WARNING Traceback (most recent call last):
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/utils.py", line 442, in exportRaster
"OUTPUT": out_raster})
File "/Applications/QGIS.app/Contents/MacOS/../Resources/python/plugins/processing/tools/general.py", line 108, in run
return Processing.runAlgorithm(algOrName, parameters, onFinish, feedback, context)
File "/Applications/QGIS.app/Contents/MacOS/../Resources/python/plugins/processing/core/Processing.py", line 153, in runAlgorithm
raise QgsProcessingException(msg)
_core.QgsProcessingException: Unable to execute algorithm
Could not load source layer for INPUT: /var/folders/kn/4rm9tz_s6y39mr0rvfjjq3k80000gp/T/small_point_nw121603600920_piped_3857.tif not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/qgis2web.py", line 59, in run
self.dlg = MainDialog(self.iface)
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/maindialog.py", line 159, in __init__
self.autoUpdatePreview()
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/maindialog.py", line 334, in autoUpdatePreview
self.previewMap()
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/maindialog.py", line 337, in previewMap
preview_file = self.createPreview()
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/maindialog.py", line 300, in createPreview
dest_folder=utils.tempFolder()).index_file
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/olwriter.py", line 91, in write
folder=dest_folder)
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/olwriter.py", line 131, in writeOL
popup, json, restrictToExtent, extent, feedback, matchCRS)
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/utils.py", line 237, in exportLayers
exportRaster(layer, count, layersFolder, feedback, iface, matchCRS)
File "/Users/house/Library/Application Support/QGIS/QGIS3/profiles/default/python/plugins/qgis2web/utils.py", line 444, in exportRaster
shutil.copyfile(piped_3857, out_raster)
File "/Applications/QGIS.app/Contents/MacOS/../Resources/python/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/kn/4rm9tz_s6y39mr0rvfjjq3k80000gp/T/small_point_nw121603600920_piped_3857.tif'
I had a similar error on QGIS 3.20.0 "Odense" (I posted the whole error message at the end of this blog post). My solution was to run the qgis2web export from QGIS 3.16 "Hannover" (LTS) instead; there the algorithm ran without issues.
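A note on reading the traceback in the question: the line "During handling of the above exception, another exception occurred" means the second error (FileNotFoundError) was raised while the code was already reacting to the first one (QgsProcessingException). Both point at the same missing temporary .tif. The sketch below reproduces that pattern with stand-in names; export_raster, convert and root_cause are hypothetical and are not qgis2web's actual code.

```python
import shutil


def export_raster(src, dst, convert):
    """Hypothetical stand-in for the export step named in the traceback."""
    try:
        convert(src, dst)           # first error: the source raster is missing
    except Exception:
        shutil.copyfile(src, dst)   # the fallback reads the same missing file


def root_cause(exc):
    """Follow the 'During handling of ...' chain back to the first error."""
    while exc.__context__ is not None:
        exc = exc.__context__
    return exc
```

In a chained traceback like this, the first exception is usually the one worth investigating; here that is the processing step that could not load its INPUT raster.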
I followed the steps to configure the Titan server:
bin/titan.sh
Forking Cassandra...
Running `nodetool statusthrift`... OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300)... OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
The server started without problems, but when I connect with Python and run my script, I get the error below:
Traceback (most recent call last):
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/GremlinConnector.py", line 28, in <module>
data = (execute_query("""g.V()"""))
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/GremlinConnector.py", line 22, in execute_query
results = future_results.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/resultset.py", line 81, in cb
f.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/connection.py", line 77, in _receive
self._protocol.data_received(data, self._results)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/protocol.py", line 71, in data_received
result_set = results_dict[request_id]
KeyError: None
Versions I am using:
titan - 1.0.0
gremlin-python - 3.3.2
apache-tinkerpop-gremlin-server-3.3.1
Titan supports an extremely old version of TinkerPop, and I'm sure you'll find some incompatibility there if you try to use gremlin-python 3.3.2. As Titan is no longer supported, I suggest you upgrade to JanusGraph, a more current and maintained fork of Titan.
Periodically, all my Celery workers get stuck on something. I cannot figure out what is causing this, because inspect doesn't work while all the workers are busy.
celery inspect active
Error: No nodes replied within time constraint
Is it possible to get Celery status, like active tasks, even if nodes are doing something (that seems to be causing problems)? Can I somehow spin up a temporary worker just to get inspect output?
What other strategies are there to diagnose this issue?
Celery 4.x. Redis backend.
This turned out to be a deadlock issue with Celery + gevent (evil monkey patch) + Sentry's Raven logger.
https://github.com/getsentry/raven-python/issues/305
To diagnose issues:
You can start Celery workers with different queue and node name parameters (-Q, -n) and see which workers hang. Even if some worker groups are hung, the others may still respond to inspect queries.
Celery file logs may reveal the error:
2017-02-27 08:36:34,371 CRITI [celery.worker][DummyThread-6] Unrecoverable error: AttributeError("'NoneType' object has no attribute 'readline'",)
Traceback (most recent call last):
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 594, in start
c.loop(*c.loop_args())
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/loops.py", line 118, in synloop
connection.drain_events(timeout=2.0)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/connection.py", line 301, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/virtual/base.py", line 961, in drain_events
get(self._deliver, timeout=timeout)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 359, in get
ret = self.handle_event(fileno, event)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 341, in handle_event
return self.on_readable(fileno), self
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 337, in on_readable
chan.handlers[type]()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 714, in _brpop_read
**options)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/client.py", line 585, in parse_response
response = connection.read_response()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 577, in read_response
response = self._parser.read_response()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 238, in read_response
response = self._buffer.readline()
AttributeError: 'NoneType' object has no attribute 'readline'
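One more generic strategy for a worker that no longer answers inspect, not mentioned above and offered only as a sketch: Python's standard faulthandler module can dump the stack of every thread when the process receives a signal, which works even when the interpreter is stuck. The helper names and the choice of SIGUSR1 are assumptions. Registering replaces any handler already installed for that signal unless chain=True is passed, so pick a signal the worker does not already use.

```python
import faulthandler
import os
import signal
import sys

# Assumed choice of signal; SIGUSR1 does not exist on Windows.
DUMP_SIGNAL = getattr(signal, "SIGUSR1", None)


def enable_stack_dumps(file=sys.stderr, signum=DUMP_SIGNAL, chain=False):
    """On `signum`, write the Python stack of every thread to `file`."""
    if signum is None:
        raise RuntimeError("no suitable signal available on this platform")
    faulthandler.register(signum, file=file, all_threads=True, chain=chain)


def request_stack_dump(pid, signum=DUMP_SIGNAL):
    """Ask process `pid` for a dump (same effect as `kill -USR1 <pid>`)."""
    os.kill(pid, signum)
```

The idea is to call enable_stack_dumps() once when the worker process starts, and to send the signal to the hung worker's PID later. The dump shows where each thread is blocked. With gevent it shows OS threads only, not individual greenlets.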
In what circumstances would redis-py raise the following AttributeError exception?
Isn't redis-py built by design to raise only redis.exceptions.RedisError based exceptions?
What would be a reasonable handling logic?
Traceback (most recent call last):
File "c:\Python27\Lib\threading.py", line 551, in __bootstrap_inner
self.run()
File "c:\Python27\Lib\threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Users\Administrator\Documents\my_proj\my_module.py", line 33, in inner
ret = protected_func(*args, **kwargs)
File "C:\Users\Administrator\Documents\my_proj\my_module.py", line 104, in _listen
for message in _pubsub.listen():
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\client.py", line 1555, in listen
r = self.parse_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\client.py", line 1499, in parse_response
response = self.connection.read_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 306, in read_response
response = self._parser.read_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 104, in read_response
response = self.read()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 89, in read
return self._fp.readline()[:-2]
AttributeError: 'NoneType' object has no attribute 'readline'
This seems like an old question, but I faced the same problem recently.
My setup used Celery with Redis as the broker. A ThreadPoolExecutor used the shared celery object to batch tasks to workers. The batcher function waited for the submitted tasks to finish using celery.result.ResultSet.
After a quick investigation, I found that Celery uses a pub/sub mechanism somewhere to wait for tasks to finish. That is the catch: according to the official README, pub/sub does not play well with threads: https://github.com/andymccurdy/redis-py#thread-safety
Honestly, I didn't try to prove my theory; I fixed my problem by switching to a ProcessPoolExecutor instead.
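When the batching code relies only on the shared Executor interface, that switch can be a one-line change. A minimal sketch: run_batch is a hypothetical stand-in for the real batcher (which would submit Celery tasks and wait on a ResultSet), and run_all is an illustrative helper; neither comes from the original setup.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_batch(task_ids):
    """Stand-in for the real batcher; here it just doubles each id."""
    return [tid * 2 for tid in task_ids]


def run_all(executor_cls, batches, max_workers=2):
    """Run every batch on the given executor class and collect the results."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(run_batch, batches))
```

Calling run_all(ProcessPoolExecutor, batches) instead of run_all(ThreadPoolExecutor, batches) runs each batch in a separate process. The function handed to a process pool must be defined at module level, and the call should sit under an if __name__ == "__main__": guard. Whether each process then also gets its own Redis connection depends on where the Celery app is created, which this sketch does not cover.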