google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded - publish-subscribe

google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded
using python 3.7 ,google-cloud-pubsub ==1.1.0 publishing data on topic. In my local machine it's working perfectly fine and able to publish data on that topic and also able to pull data from that topic through subscriber.
but don't understand it's not working when i deploy the code on server and it's failing with INLINE ERROR however when i explicitly call the publisher method on server it's publishing fine over server box also.code which is failing at below line while publishing:
future = publisher.publish(topic_path, data=data)
**ERROR:2020-02-20 14:24:42,714 ERROR Failed to publish 1 messages.**
Trackback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Deadline Exceeded"
debug_error_string = "{"created":"#1582208682.711481693","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":14}"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 219, in _commit
response = self._client.api.publish(self._topic, self._messages)
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/gapic/publisher_client.py", line 498, in publish
request, retry=retry, timeout=timeout, metadata=metadata
File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in call
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 206, in retry_target
last_exc,
File "", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 60.0s exceeded while calling functools.partial(.error_remapped_callable at 0x7f67d064e950>

You should try to chunk your data in reasonable sized chunks (max_messages) and don't forget to add a done callback.
# Loop over json containing records/rows
for idx, row in enumerate(rows_json):
publish_json(row, idx, rowmax=len(rows_json), topic_name)
# Publish messages asynchronous
def publish_json(msg, rowcount, rowmax, topic_project_id, topic_name):
batch_settings = pubsub_v1.types.BatchSettings(max_messages=100)
publisher = pubsub_v1.PublisherClient(batch_settings)
topic_path = publisher.topic_path(topic_project_id, topic_name)
future = publisher.publish(
topic_path, bytes(json.dumps(msg).encode('utf-8')))
future.add_done_callback(
lambda x: logging.info(
'Published msg with ID {} ({}/{} rows).'.format(
future.result(), rowcount, rowmax))
)

Related

PyTorch Net.train() fails to establish connection - what for?

I am trying to train a neural network in PyTorch following a git tutorial. When it comes to train the network I get a row of exceptions linked to a failed connection attempt (probably with some google server):
Traceback (most recent call last):
File "/home/***/.local/lib/python3.6/site-packages/urllib3/connection.py", line 157, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/home/***/.local/lib/python3.6/site-packages/urllib3/util/connection.py", line 61, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/home/***/.conda/envs/cv-nd/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/***/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/home/***/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/***/.conda/envs/cv-nd/lib/python3.6/http/client.py", line 1262, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/***/.conda/envs/cv-nd/lib/python3.6/http/client.py", line 1308, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/***/.conda/envs/cv-nd/lib/python3.6/http/client.py", line 1257, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/***/.conda/envs/cv-nd/lib/python3.6/http/client.py", line 1036, in _send_output
self.send(msg)
File "/home/*.*/.conda/envs/cv-nd/lib/python3.6/http/client.py", line 974, in send
self.connect()
File "/home/*.*/.local/lib/python3.6/site-packages/urllib3/connection.py", line 184, in connect
conn = self._new_conn()
File "/home/*.*/.local/lib/python3.6/site-packages/urllib3/connection.py", line 169, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ff4d4cc87f0>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/*.*/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/*.*/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/*.*/.local/lib/python3.6/site-packages/urllib3/util/retry.py", line 436, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/attributes/keep_alive_token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff4d4cc87f0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "NB2.py", line 229, in <module>
with active_session():
File "/home/*.*/.conda/envs/cv-nd/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/*.*/Code/faceNNtrain/Facial_Keypoint_Detection/workspace_utils.py", line 31, in active_session
token = requests.request("GET", TOKEN_URL, headers=TOKEN_HEADERS).text
File "/home/*.*/.local/lib/python3.6/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/home/*.*/.local/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/home/*.*/.local/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/home/*.*/.local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/attributes/keep_alive_token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff4d4cc87f0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I am confused since there is no external call required, and it is nowhere defined in my code (including model definition) - it probably sits somewhere hidden in the libraries, but still, it fails, messes everything up and idk how to fix it. I checked my local firewall and all, but the only host from the errors I could check (metadata.google.internal) is also not reachable from other clients. Any ideas?

PermissionDenied: 403 error when trying to run async Google Cloud Speech async transcribe

I'm getting the following error when trying to run an async transcription request on a .flac file hosted on google cloud.
$ python3 transcribe_async.py gs://[file].flac
Traceback (most recent call last):
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 54, in error_remapped_callable
return callable_(*args, **kwargs)
File "[]/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 514, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "[]/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 448, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "The caller does not have permission"
debug_error_string = "{"created":"#1533912393.258761000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"The caller does not have permission","grpc_status":7}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "transcribe_async.py", line 105, in <module>
transcribe_gcs(args.path)
File "transcribe_async.py", line 83, in transcribe_gcs
operation = client.long_running_recognize(config, audio)
File "[]/anaconda3/lib/python3.6/site-packages/google/cloud/speech_v1/gapic/speech_client.py", line 284, in long_running_recognize
request, retry=retry, timeout=timeout, metadata=metadata)
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
return wrapped_func(*args, **kwargs)
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
on_error=on_error,
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
return target()
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/timeout.py", line 206, in func_with_timeout
return func(*args, **kwargs)
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.PermissionDenied: 403 The caller does not have permission
I've added an export statement to my .zshrc file that points to the service account json, I've added myself, the service account email and the project owner and editor as owners of the cloud bucket through the browser, and I ran gcloud auth activate-service-account --key-file="[].json", but nothing helps. What have I forgotten? Any help much appreciated.
You need to make your file publicly readable. Once you set the permissions to allUsers, you will be able to use your file in your request.

Connecting to softlayer object storage using apache libcloud

Problem:
Trying to connect to Softlayer Swift Object Storage using apache libcloud and I can't get it to work. I have tried passing in different options to the provider but no matter what I pass in I get the same error. I would appreciate any and all pointers to help me resolve this issue.
Code:
from pprint import pprint
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
_swift = get_driver(Provider.OPENSTACK_SWIFT)
_authurl = "https://lon02.objectstorage.softlayer.net/auth/v1.0"
_user = "IBMOS<redacted>"
_key = "redacted"
driver = _swift(_user, _key,
region='lon02',
ex_force_auth_url=_authurl,
ex_force_auth_version='1.0',
ex_force_service_type='object-store',
ex_force_service_name='cloudFiles')
#pprint(driver.__dict__)
pprint(driver.list_containers())
Stack Trace:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 441, in wrap_socket
cnx.do_handshake()
File "/usr/local/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1806, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "/usr/local/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1539, in _raise_ssl_error
raise SysCallError(-1, "Unexpected EOF")
OpenSSL.SSL.SysCallError: (-1, 'Unexpected EOF')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 326, in connect
ssl_context=context)
File "/usr/local/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/local/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 448, in wrap_socket
raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='lon02.objectstorage.softlayer.net', port=443): Max retries exceeded with url: /auth/v1.0 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "swift.py", line 19, in <module>
pprint(driver.list_containers())
File "/usr/local/lib/python3.6/site-packages/libcloud/storage/base.py", line 209, in list_containers
return list(self.iterate_containers())
File "/usr/local/lib/python3.6/site-packages/libcloud/storage/drivers/cloudfiles.py", line 275, in iterate_containers
response = self.connection.request('')
File "/usr/local/lib/python3.6/site-packages/libcloud/storage/drivers/cloudfiles.py", line 165, in request
raw=raw)
File "/usr/local/lib/python3.6/site-packages/libcloud/common/openstack.py", line 223, in request
raw=raw)
File "/usr/local/lib/python3.6/site-packages/libcloud/common/base.py", line 536, in request
action = self.morph_action_hook(action)
File "/usr/local/lib/python3.6/site-packages/libcloud/common/openstack.py", line 290, in morph_action_hook
self._populate_hosts_and_request_paths()
File "/usr/local/lib/python3.6/site-packages/libcloud/common/openstack.py", line 324, in _populate_hosts_and_request_paths
osa = osa.authenticate(**kwargs) # may throw InvalidCreds
File "/usr/local/lib/python3.6/site-packages/libcloud/common/openstack_identity.py", line 761, in authenticate
resp = self.request('/v1.0', headers=headers, method='GET')
File "/usr/local/lib/python3.6/site-packages/libcloud/common/base.py", line 603, in request
headers=headers, stream=stream)
File "/usr/local/lib/python3.6/site-packages/libcloud/http.py", line 213, in request
verify=self.verification
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 506, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='lon02.objectstorage.softlayer.net', port=443): Max retries exceeded with url: /auth/v1.0 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
To answer my own question, the specific issue seen above was due to the system I was using. Once I switched to another system it proceeded even further however there is a bug I identified and filed in apache libcloud with a documented workaround inside the jira ticket here https://issues.apache.org/jira/browse/LIBCLOUD-972.

Celery: linked task throws connection error

I tried to run a very simple task with a linked task mentioned in the tutorial
add.apply_async((2, 2), link=add.s(16))
and got an exception in the worker process:
[2014-09-21 19:56:38,531: WARNING/Worker-1] C:\Python33\lib\site-packages\celery-3.1.15-
py3.3.egg\celery\app\trace.py:364: RuntimeWarning: Exception raised outside body: OSError(ConnectionRefusedError(10061, 'No connection could be made because the target machine actively refused it', None, 10061),):
Traceback (most recent call last):
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\utils\__init__.py", line 420, in __call__
return self.__value__
AttributeError: 'ChannelPromise' object has no attribute '__value__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 436, in _ensured
return fun(*args, **kwargs)
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\messaging.py", line 173, in _publish
channel = self.channel
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\messaging.py", line 190, in _get_channel
channel = self._channel = channel()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\utils\__init__.py", line 422, in __call__
value = self.__value__ = self.__contract__()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\messaging.py", line 205, in <lambda>
channel = ChannelPromise(lambda: connection.default_channel)
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 756, in default_channel
self.connection
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 741, in connection
self._connection = self._establish_connection()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 696, in _establish_connection
conn = self.transport.establish_connection()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\transport\pyamqp.py", line 112, in establish_connection
conn = self.Connection(**opts)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\connection.py", line 165, in __init__
self.transport = self.Transport(host, connect_timeout, ssl)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\connection.py", line 186, in Transport
return create_transport(host, connect_timeout, ssl)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\transport.py", line 299, in create_transport
return TCPTransport(host, connect_timeout)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\transport.py", line 95, in __init__
raise socket.error(last_err)
OSError: [WinError 10061] No connection could be made because the target machine actively refused it
I did a brief debugging in transport.py and found the worker was trying to connect to port 5672 on localhost. It seems that the worker thinks the linked task needs to be executed via local RabbitMQ instance. This is weird because I specified a remote RabbitMQ broker in the configuration setting. Also the setting works if I simply run the async call without a linked task:
add.apply_async((2, 2))
Here is my setup:
Use RabbitMQ as broker and Redis as results back end on a remote Windows Server
Run my test client on another windows 7 machine
Can anyone shed some light? Thanks.

Celery backend cleanup failing with SQLAlchemy & MySQL

I'm facing following exception when celery is trying to cleanup back-end.
Most probably, this is happening due to MySQL disconnect issue and can be solved by using pool_recycle parameter and retrying the task later. But this is out of my hand - I guess celery needs to provide support for this?
Now my question is, what is backend cleanup task and how such a failed task may affect our system?
Log:
[2014-04-08 04:00:00,017: INFO/Beat] Scheduler: Sending due task celery.backend_cleanup (celery.backend_cleanup)
[2014-04-08 04:00:00,020: INFO/MainProcess] Received task: celery.backend_cleanup[b70acd50-e72d-43b1-a702-0bfa8e7e83a6] expires:[2014-04-08 16:00:00.018317+01:00]
[2014-04-08 04:00:00,036: ERROR/MainProcess] Task celery.backend_cleanup[b70acd50-e72d-43b1-a702-0bfa8e7e83a6] raised unexpected: OperationalError('(OperationalError) MySQL Connection not available.',)
Traceback (most recent call last):
File "/webapps/phoenix/lib/python3.3/site-packages/celery/app/trace.py", line 238, in trace_task
R = retval = fun(*args, **kwargs)
File "/webapps/phoenix/lib/python3.3/site-packages/celery/app/trace.py", line 416, in __protected_call__
return self.run(*args, **kwargs)
File "/webapps/phoenix/lib/python3.3/site-packages/celery/app/builtins.py", line 56, in backend_cleanup
app.backend.cleanup()
File "/webapps/phoenix/lib/python3.3/site-packages/celery/backends/database/__init__.py", line 180, in cleanup
Task.date_done < (now - expires)).delete()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/query.py", line 2626, in delete
delete_op.exec_()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/persistence.py", line 866, in exec_
self._do_exec()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/persistence.py", line 991, in _do_exec
params=self.query._params)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/session.py", line 978, in execute
clause, params or {})
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 664, in execute
return meth(self, multiparams, params)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/sql/elements.py", line 282, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
compiled_sql, distilled_params
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 828, in _execute_context
None, None)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 1023, in _handle_dbapi_exception
exc_info
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/util/compat.py", line 174, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/util/compat.py", line 167, in reraise
raise value.with_traceback(tb)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 824, in _execute_context
context = constructor(dialect, self, conn, *args)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 507, in _init_compiled
self.cursor = self.create_cursor()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 671, in create_cursor
return self._dbapi_connection.cursor()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/pool.py", line 548, in cursor
return self.connection.cursor(*args, **kwargs)
File "/webapps/phoenix/lib/python3.3/site-packages/mysql/connector/connection.py", line 1231, in cursor
raise errors.OperationalError("MySQL Connection not available.")
sqlalchemy.exc.OperationalError: (OperationalError) MySQL Connection not available. 'DELETE FROM celery_taskmeta WHERE celery_taskmeta.date_done < %(date_done_1)s' [{}]
PS I've checked this SO question but seems like it's due to a different exception: Celery log shows cleanup failed
The closest Celery has to retrying is short lived sessions.
The task is cleaning out un-read task results. If it's failing, you may see those results start to build up, but should be OK otherwise.
You're right that there's very little documentation about it!