Celery backend cleanup failing with SQLAlchemy & MySQL - celery

I'm facing following exception when celery is trying to cleanup back-end.
Most probably, this is happening due to MySQL disconnect issue and can be solved by using pool_recycle parameter and retrying the task later. But this is out of my hand - I guess celery needs to provide support for this?
Now my question is, what is backend cleanup task and how such a failed task may affect our system?
Log:
[2014-04-08 04:00:00,017: INFO/Beat] Scheduler: Sending due task celery.backend_cleanup (celery.backend_cleanup)
[2014-04-08 04:00:00,020: INFO/MainProcess] Received task: celery.backend_cleanup[b70acd50-e72d-43b1-a702-0bfa8e7e83a6] expires:[2014-04-08 16:00:00.018317+01:00]
[2014-04-08 04:00:00,036: ERROR/MainProcess] Task celery.backend_cleanup[b70acd50-e72d-43b1-a702-0bfa8e7e83a6] raised unexpected: OperationalError('(OperationalError) MySQL Connection not available.',)
Traceback (most recent call last):
File "/webapps/phoenix/lib/python3.3/site-packages/celery/app/trace.py", line 238, in trace_task
R = retval = fun(*args, **kwargs)
File "/webapps/phoenix/lib/python3.3/site-packages/celery/app/trace.py", line 416, in __protected_call__
return self.run(*args, **kwargs)
File "/webapps/phoenix/lib/python3.3/site-packages/celery/app/builtins.py", line 56, in backend_cleanup
app.backend.cleanup()
File "/webapps/phoenix/lib/python3.3/site-packages/celery/backends/database/__init__.py", line 180, in cleanup
Task.date_done < (now - expires)).delete()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/query.py", line 2626, in delete
delete_op.exec_()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/persistence.py", line 866, in exec_
self._do_exec()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/persistence.py", line 991, in _do_exec
params=self.query._params)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/orm/session.py", line 978, in execute
clause, params or {})
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 664, in execute
return meth(self, multiparams, params)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/sql/elements.py", line 282, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
compiled_sql, distilled_params
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 828, in _execute_context
None, None)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 1023, in _handle_dbapi_exception
exc_info
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/util/compat.py", line 174, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=exc_value)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/util/compat.py", line 167, in reraise
raise value.with_traceback(tb)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 824, in _execute_context
context = constructor(dialect, self, conn, *args)
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 507, in _init_compiled
self.cursor = self.create_cursor()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 671, in create_cursor
return self._dbapi_connection.cursor()
File "/webapps/phoenix/lib/python3.3/site-packages/sqlalchemy/pool.py", line 548, in cursor
return self.connection.cursor(*args, **kwargs)
File "/webapps/phoenix/lib/python3.3/site-packages/mysql/connector/connection.py", line 1231, in cursor
raise errors.OperationalError("MySQL Connection not available.")
sqlalchemy.exc.OperationalError: (OperationalError) MySQL Connection not available. 'DELETE FROM celery_taskmeta WHERE celery_taskmeta.date_done < %(date_done_1)s' [{}]
PS I've checked this SO question but seems like it's due to a different exception: Celery log shows cleanup failed

The closest Celery has to retrying is short lived sessions.
The task is cleaning out un-read task results. If it's failing, you may see those results start to build up, but should be OK otherwise.
You're right that there's very little documentation about it!

Related

Why Pgadmin gives me 500 error? _pickle.PicklingError: Can't pickle <class 'wtforms.form.Meta'>: attribute lookup Meta on wtforms.form failed

Occasionally, Pgadmin gives me the 500 error in a browser. After reloading the page, the issue disappears for a while and then comes back again. Here's the log I see while getting the error:
[2022-01-21 14:35:21 +0000] [93] [ERROR] Error handling request /authenticate/login
Traceback (most recent call last):
File "/venv/lib/python3.8/site-packages/gunicorn/workers/gthread.py", line 271, in handle
keepalive = self.handle_request(req, conn)
File "/venv/lib/python3.8/site-packages/gunicorn/workers/gthread.py", line 323, in handle_request
respiter = self.wsgi(environ, resp.start_response)
File "/venv/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__
return self.wsgi_app(environ, start_response)
File "/pgadmin4/pgAdmin4.py", line 77, in __call__
return self.app(environ, start_response)
File "/venv/lib/python3.8/site-packages/werkzeug/middleware/proxy_fix.py", line 169, in __call__
return self.app(environ, start_response)
File "/venv/lib/python3.8/site-packages/flask_socketio/__init__.py", line 43, in __call__
return super(_SocketIOMiddleware, self).__call__(environ,
File "/venv/lib/python3.8/site-packages/engineio/middleware.py", line 74, in __call__
return self.wsgi_app(environ, start_response)
File "/venv/lib/python3.8/site-packages/flask/app.py", line 2450, in wsgi_app
response = self.handle_exception(e)
File "/venv/lib/python3.8/site-packages/flask/app.py", line 1867, in handle_exception
reraise(exc_type, exc_value, tb)
File "/venv/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/venv/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/venv/lib/python3.8/site-packages/flask/app.py", line 1953, in full_dispatch_request
return self.finalize_request(rv)
File "/venv/lib/python3.8/site-packages/flask/app.py", line 1970, in finalize_request
response = self.process_response(response)
File "/venv/lib/python3.8/site-packages/flask/app.py", line 2269, in process_response
self.session_interface.save_session(self, ctx.session, response)
File "/pgadmin4/pgadmin/utils/session.py", line 307, in save_session
self.manager.put(session)
File "/pgadmin4/pgadmin/utils/session.py", line 166, in put
self.parent.put(session)
File "/pgadmin4/pgadmin/utils/session.py", line 270, in put
dump(
_pickle.PicklingError: Can't pickle <class 'wtforms.form.Meta'>: attribute lookup Meta on wtforms.form failed
The issue appeared after enabling Oauth2 authentication. I've tried using different version but no luck.
Pgadmin is running in Kubernetes.
Please try setting
PGADMIN_CONFIG_ENHANCED_COOKIE_PROTECTION = False in configuration file according to your operating system as mentioned here.

google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded

google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded
using python 3.7 ,google-cloud-pubsub ==1.1.0 publishing data on topic. In my local machine it's working perfectly fine and able to publish data on that topic and also able to pull data from that topic through subscriber.
but don't understand it's not working when i deploy the code on server and it's failing with INLINE ERROR however when i explicitly call the publisher method on server it's publishing fine over server box also.code which is failing at below line while publishing:
future = publisher.publish(topic_path, data=data)
**ERROR:2020-02-20 14:24:42,714 ERROR Failed to publish 1 messages.**
Trackback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Deadline Exceeded"
debug_error_string = "{"created":"#1582208682.711481693","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":14}"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/publisher/_batch/thread.py", line 219, in _commit
response = self._client.api.publish(self._topic, self._messages)
File "/usr/local/lib/python3.7/site-packages/google/cloud/pubsub_v1/gapic/publisher_client.py", line 498, in publish
request, retry=retry, timeout=timeout, metadata=metadata
File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in call
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 206, in retry_target
last_exc,
File "", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 60.0s exceeded while calling functools.partial(.error_remapped_callable at 0x7f67d064e950>
You should try to chunk your data in reasonable sized chunks (max_messages) and don't forget to add a done callback.
# Loop over json containing records/rows
for idx, row in enumerate(rows_json):
publish_json(row, idx, rowmax=len(rows_json), topic_name)
# Publish messages asynchronous
def publish_json(msg, rowcount, rowmax, topic_project_id, topic_name):
batch_settings = pubsub_v1.types.BatchSettings(max_messages=100)
publisher = pubsub_v1.PublisherClient(batch_settings)
topic_path = publisher.topic_path(topic_project_id, topic_name)
future = publisher.publish(
topic_path, bytes(json.dumps(msg).encode('utf-8')))
future.add_done_callback(
lambda x: logging.info(
'Published msg with ID {} ({}/{} rows).'.format(
future.result(), rowcount, rowmax))
)

PermissionDenied: 403 error when trying to run async Google Cloud Speech async transcribe

I'm getting the following error when trying to run an async transcription request on a .flac file hosted on google cloud.
$ python3 transcribe_async.py gs://[file].flac
Traceback (most recent call last):
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 54, in error_remapped_callable
return callable_(*args, **kwargs)
File "[]/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 514, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "[]/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 448, in _end_unary_response_blocking
raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
status = StatusCode.PERMISSION_DENIED
details = "The caller does not have permission"
debug_error_string = "{"created":"#1533912393.258761000","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"The caller does not have permission","grpc_status":7}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "transcribe_async.py", line 105, in <module>
transcribe_gcs(args.path)
File "transcribe_async.py", line 83, in transcribe_gcs
operation = client.long_running_recognize(config, audio)
File "[]/anaconda3/lib/python3.6/site-packages/google/cloud/speech_v1/gapic/speech_client.py", line 284, in long_running_recognize
request, retry=retry, timeout=timeout, metadata=metadata)
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
return wrapped_func(*args, **kwargs)
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
on_error=on_error,
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
return target()
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/timeout.py", line 206, in func_with_timeout
return func(*args, **kwargs)
File "[]/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.PermissionDenied: 403 The caller does not have permission
I've added an export statement to my .zshrc file that points to the service account json, I've added myself, the service account email and the project owner and editor as owners of the cloud bucket through the browser, and I ran gcloud auth activate-service-account --key-file="[].json", but nothing helps. What have I forgotten? Any help much appreciated.
You need to make your file publicly readable. Once you set the permissions to allUsers, you will be able to use your file in your request.

Celery: linked task throws connection error

I tried to run a very simple task with a linked task mentioned in the tutorial
add.apply_async((2, 2), link=add.s(16))
and got an exception in the worker process:
[2014-09-21 19:56:38,531: WARNING/Worker-1] C:\Python33\lib\site-packages\celery-3.1.15-
py3.3.egg\celery\app\trace.py:364: RuntimeWarning: Exception raised outside body: OSError(ConnectionRefusedError(10061, 'No connection could be made because the target machine actively refused it', None, 10061),):
Traceback (most recent call last):
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\utils\__init__.py", line 420, in __call__
return self.__value__
AttributeError: 'ChannelPromise' object has no attribute '__value__'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 436, in _ensured
return fun(*args, **kwargs)
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\messaging.py", line 173, in _publish
channel = self.channel
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\messaging.py", line 190, in _get_channel
channel = self._channel = channel()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\utils\__init__.py", line 422, in __call__
value = self.__value__ = self.__contract__()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\messaging.py", line 205, in <lambda>
channel = ChannelPromise(lambda: connection.default_channel)
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 756, in default_channel
self.connection
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 741, in connection
self._connection = self._establish_connection()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\connection.py", line 696, in _establish_connection
conn = self.transport.establish_connection()
File "C:\Python33\lib\site-packages\kombu-3.0.23-py3.3.egg\kombu\transport\pyamqp.py", line 112, in establish_connection
conn = self.Connection(**opts)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\connection.py", line 165, in __init__
self.transport = self.Transport(host, connect_timeout, ssl)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\connection.py", line 186, in Transport
return create_transport(host, connect_timeout, ssl)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\transport.py", line 299, in create_transport
return TCPTransport(host, connect_timeout)
File "C:\Python33\lib\site-packages\amqp-1.4.6-py3.3.egg\amqp\transport.py", line 95, in __init__
raise socket.error(last_err)
OSError: [WinError 10061] No connection could be made because the target machine actively refused it
I did a brief debugging in transport.py and found the worker was trying to connect to port 5672 on localhost. It seems that the worker thinks the linked task needs to be executed via local RabbitMQ instance. This is weird because I specified a remote RabbitMQ broker in the configuration setting. Also the setting works if I simply run the async call without a linked task:
add.apply_async((2, 2))
Here is my setup:
Use RabbitMQ as broker and Redis as results back end on a remote Windows Server
Run my test client on another windows 7 machine
Can anyone shed some light? Thanks.

... twisted/internet/epollreactor.py", line 238, in _add

Ran into trouble, after the server program runs for some time, there will be a large number of errors (error), the the server can only accept client connections but does not accept the data, and finally suspends animation. The following error:
//report error:
//report error:
Unhandled Error
Traceback (most recent call last):
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/python/log.py", line 69, in callWithContext
return context.call({ILogContext: newCtx}, func, *args, **kw)
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 118, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/python/context.py", line 81, in callWithContext
return func(*args,**kw)
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
why = selectable.doRead()
// --- <exception caught here> ---
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 1016, in doRead
transport = self.transport(skt, protocol, addr, self, s, self.reactor)
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/tcp.py", line 773, in __init__
self.startReading()
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/abstract.py", line 416, in startReading
self.reactor.addReader(self)
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/epollreactor.py", line 254, in addReader
_epoll.EPOLLIN, _epoll.EPOLLOUT)
File "/usr/lib64/python2.7/lib/python2.7/site-packages/Twisted-12.1.0-py2.7-linux-x86_64.egg/twisted/internet/epollreactor.py", line 238, in _add
self._poller.modify(fd, flags)
exceptions.IOError: [Errno 2] No such file or directory
Looks like a Twisted bug to me: we just experienced this after upgrading a rock stable server that has been running on 10.0.0 for over a year to 12.3.0: it happened after only two days after the upgrade.