We recently updated MongoDB from 2.6 to 3.0. Since then, we are having trouble using PyMongo in combination with Multiprocessing.
The issue is that sometimes an operation (e.g. find) within a process hangs for ~30 seconds and then throws an exception "ServerSelectionTimeoutError: No servers found yet".
The behavior seems independent from the input, as our script usually runs just fine for a few times and then hangs randomly.
The log files do not show any entries related to timeouts, nor did I find any useful information on the Internet about this issue.
The script is running in our test environment, meaning there are no replica sets involved and the Mongo instance in bound to localhost.
Here is the stack trace for completeness:
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "somescript.py", line 109, in run
self.find_incoming_cc()
File "somescript.py", line 370, in find_incoming_cc
{'_id': 1, 'cc': 1}
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 983, in next
if len(self.__data) or self._refresh():
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 908, in _refresh
self.__read_preference))
File "/usr/local/lib/python2.7/dist-packages/pymongo/cursor.py", line 813, in __send_message
**kwargs)
File "/usr/local/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 728, in _send_message_with_response
server = topology.select_server(selector)
File "/usr/local/lib/python2.7/dist-packages/pymongo/topology.py", line 121, in select_server
address))
File "/usr/local/lib/python2.7/dist-packages/pymongo/topology.py", line 97, in select_servers
self._error_message(selector))
ServerSelectionTimeoutError: No servers found yet
Now for the question: Is there any known issue/bug when using PyMongo with Multiprocessing? Is there a way to debug the exception?
Thanks for any help!
It is bug in pymongo version 3.0.x. Bug report url https://jira.mongodb.org/browse/PYTHON-961
Workaround for this issue. (Tested in pymongo 3.0.3)
Pass “connect=False” in MongoClient object initialisation
MongoClient(uri, connect=False)
Or simply wait for few seconds before creating instance of MongoClient in the child process (like time.sleep(2)).
def start(uri):
time.sleep(2)
mclient = MongoClient(uri)
mclient.db.collection.find_one()
if __name__ == '__main__':
p = multiprocessing.Process(target=start, args=('mongodb://localhost:27017/',))
p.start()
Related
I was orchestrating two dataflow job with cloud composer and it was working fine for month. Suddenly the two jobs stopped working with the following error message:
in download_blob File
"/usr/local/lib/python3.6/site-packages/google/cloud/storage/client.py",
line 399, in get_bucket retry=retry, File
"/usr/local/lib/python3.6/site-packages/google/cloud/storage/bucket.py",
line 1002, in reload retry=retry, File
"/usr/local/lib/python3.6/site-packages/google/cloud/storage/_helpers.py",
line 225, in reload retry=retry, File
"/usr/local/lib/python3.6/site-packages/google/cloud/storage/_http.py",
line 63, in api_request return call() File
"/usr/local/lib/python3.6/site-packages/google/api_core/retry.py",
line 286, in retry_wrapped_func on_error=on_error, File
"/usr/local/lib/python3.6/site-packages/google/api_core/retry.py",
line 184, in retry_target return target() File
"/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line
479, in api_request timeout=timeout, File
"/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line
337, in _make_request method, url, headers, data, target_object,
timeout=timeout File
"/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line
374, in _do_request return self.http.request( File
"/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line
157, in http return self._client._http File
"/usr/local/lib/python3.6/site-packages/google/cloud/client.py", line
187, in _http
self._http_internal.configure_mtls_channel(self._client_cert_source)
AttributeError: 'AuthorizedSession' object has no attribute
'configure_mtls_channel'
In the jobs I download a file from google cloud storage with the storage client. I assumed it was because of some dependencies issues. In the composer environment I installed google-cloud-storage without specifying a version. I tried specifying different versions of the package but nothing seems to work.
Thanks!
This seems to be related to this issue.
Try pinning google-cloud-core to 1.5.0, then I highly recommend for you to Drain your jobs once you get them back to work (assuming they have streaming jobs).
Following the steps to configure the titan srever
bin/titan.sh
Forking Cassandra...
Running `nodetool statusthrift`... OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300)... OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
The server started perfectly but when i am connecting with python and then run the script the error which i got mentioned below
Traceback (most recent call last):
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/GremlinConnector.py", line 28, in <module>
data = (execute_query("""g.V()"""))
File "/home/admin-12/Documents/bitbucket/ecodrone/ecodrone/GremlinConnector.py", line 22, in execute_query
results = future_results.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/resultset.py", line 81, in cb
f.result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/connection.py", line 77, in _receive
self._protocol.data_received(data, self._results)
File "/home/admin-12/.local/lib/python3.6/site-packages/gremlin_python/driver/protocol.py", line 71, in data_received
result_set = results_dict[request_id]
KeyError: None
versioning i am using
titan - 1.0.0
gremlin-python - 3.3.2
apache-tinkerpop-gremlin-server-3.3.1
Titan supports some an extremely old version of TinkerPop and I'm sure you'll find some incompatibility there if you try to use gremlin-python 3.3.2. As Titan is no longer supported, I suggest you upgrade to JanusGraph, a more current and maintained version of Titan.
Periodically all my Celery workers get stuck on something. I cannot figure out what is causing this, as inspect doesn't work as all the workers are busy.
celery inspect active
Error: No nodes replied within time constraint
Is it possible to get Celery status, like active tasks, even if nodes are doing something (that seems to be causing problems)? Can I somehow spin up a temporary worker just to get inspect output?
What kind of other strategies there would be to diagnose this issue?
Celery 4.x. Redis backend.
This turned out to be a deadlock issue with Celery + gevent (evil monkey patch) + Sentry's Raven logger.
https://github.com/getsentry/raven-python/issues/305
To diagnose issues
You can start Celery workers with different queues (-q, -n) parameters and see when workers hang. Even if some worker groups are hung the others still may respond to inspect queries.
Celery file logs may reveal the error
2017-02-27 08:36:34,371 CRITI [celery.worker][DummyThread-6] Unrecoverable error: AttributeError("'NoneType' object has no attribute 'readline'",)
Traceback (most recent call last):
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 594, in start
c.loop(*c.loop_args())
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/celery/worker/loops.py", line 118, in synloop
connection.drain_events(timeout=2.0)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/connection.py", line 301, in drain_events
return self.transport.drain_events(self.connection, **kwargs)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/virtual/base.py", line 961, in drain_events
get(self._deliver, timeout=timeout)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 359, in get
ret = self.handle_event(fileno, event)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 341, in handle_event
return self.on_readable(fileno), self
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 337, in on_readable
chan.handlers[type]()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/kombu/transport/redis.py", line 714, in _brpop_read
**options)
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/client.py", line 585, in parse_response
response = connection.read_response()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 577, in read_response
response = self._parser.read_response()
File "/srv/pyramid/xxx/venv/lib/python3.5/site-packages/redis/connection.py", line 238, in read_response
response = self._buffer.readline()
AttributeError: 'NoneType' object has no attribute 'readline'
I have an python application which creates number of threads for a job. each thread connects to mongodb and retrieve data. Number of allowed connection to mongodb is 200 which I'm taking care using semaphore. And once mongo querying job is done each thread closes mongodb connection. But while executing this application I'm getting same error for all threads. Error is:
Traceback (most recent call last):
File "C:\Python34\lib\threading.py", line 921, in _bootstrap_inner
self.run()
File "C:\Python34\lib\threading.py", line 869, in run
self._target(*self._args, **self._kwargs)
File "C:/path/pytest/under_construction/testAlgo.py", line 95, in sample_thread
status=monObj.process_status(list_value1,list_value2,5,120,120)
File "C:\path\pytest\under_construction\mongo_lib.py", line 153, in process_status
result=self.mongo_result('Submission','find',q={})
File "C:\path\pytest\under_construction\mongo_lib.py", line 53, in mongo_result
result=list(_query[query_type.lower()](query_string[keys]))
File "C:\Python34\lib\site-packages\pymongo\cursor.py", line 1076, in __next__
if len(self.__data) or self._refresh():
File "C:\Python34\lib\site-packages\pymongo\cursor.py", line 1037, in _refresh
limit, self.__id))
File "C:\Python34\lib\site-packages\pymongo\cursor.py", line 933, in __send_message
res = client._send_message_with_response(message, **kwargs)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1205, in _send_message_with_response
response = self.__send_and_receive(message, sock_info)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1182, in __send_and_receive
return self.__receive_message_on_socket(1, request_id, sock_info)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1174, in __receive_message_on_socket
return self.__receive_data_on_socket(length - 16, sock_info)
File "C:\Python34\lib\site-packages\pymongo\mongo_client.py", line 1153, in __receive_data_on_socket
chunk = sock_info.sock.recv(length)
MemoryError
Code for creating mongo connection
client=MongoClient(mc_name,port)
I was thinking, is this error due to results of all threads accumulating at one port of machine running my application?
MongoClient is a thread-safe connection pool, so you should be creating a single instance that's shared by all the worker threads rather than having each thread create its own.
The connection pool size defaults to 100, but if you want to make it even larger you can use the maxPoolSize parameter to do that (e.g. maxPoolSize=200).
In what circumstances would redis-py raise the following AttributeError exception?
Isn't redis-py built by design to raise only redis.exceptions.RedisError based exceptions?
What would be a reasonable handling logic?
Traceback (most recent call last):
File "c:\Python27\Lib\threading.py", line 551, in __bootstrap_inner
self.run()
File "c:\Python27\Lib\threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File C:\Users\Administrator\Documents\my_proj\my_module.py", line 33, in inner
ret = protected_func(*args, **kwargs)
File C:\Users\Administrator\Documents\my_proj\my_module.py", line 104, in _listen
for message in _pubsub.listen():
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\client.py", line 1555, in listen
r = self.parse_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\client.py", line 1499, in parse_response
response = self.connection.read_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 306, in read_response
response = self._parser.read_response()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 104, in read_response
response = self.read()
File "C:\Users\Administrator\virtual_environments\my_env\lib\site-packages\redis\connection.py", line 89, in read
return self._fp.readline()[:-2]
AttributeError: 'NoneType' object has no attribute 'readline'
seems like an old question, but I faced the same problem recently.
My setup was using celery with redis as a broker. A ThreadPoolExecutor uses the shared celery object to batch tasks to workers. The batcher function waits for the submitted tasks to finish using celery.result.ResultSet.
After quick investigations, I found that celery somewhere uses a pub/sub mechanism to wait for the tasks to finish. And that is it, pub/sub don't play well with thread-safety per the official readme https://github.com/andymccurdy/redis-py#thread-safety
Honestly, I didn't try to prove my theory and fixed my problem by switching to a ProcessPoolExecutor instead.