kubernetes.client.exceptions.ApiException: (0) Reason: Handshake status 500 Internal Server Error - kubernetes

I am getting the above exception while deploying DAGs in the pipeline.
The log is as follows:
************************************************************************************************************
* *
* *
* Deploying 'dags'... *
* *
* *
************************************************************************************************************
[2022-12-16 14:09:48,076] {io} INFO - Current directory: /artifacts/dags
[2022-12-16 14:09:48,076] {copy_deploy_tool} INFO - Deploy 'dags' by copying files...
[2022-12-16 14:09:48,083] {deploy_tool} INFO - saving values.yaml...
[2022-12-16 14:09:48,162] {copy_deploy_tool} INFO - Removing files from 'development:airflow-5db795dd7c-d586h:/root/airflow/dags'
[2022-12-16 14:09:48,264] {deploy} ERROR - Execution failed for project: dags
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/ws_client.py", line 296, in websocket_call
client = WSClient(configuration, get_websocket_url(url), headers, capture_all)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/ws_client.py", line 94, in __init__
self.sock.connect(url, header=header)
File "/usr/local/lib/python3.6/dist-packages/websocket/_core.py", line 253, in connect
self.handshake_response = handshake(self.sock, *addrs, **options)
File "/usr/local/lib/python3.6/dist-packages/websocket/_handshake.py", line 57, in handshake
status, resp = _get_resp_headers(sock)
File "/usr/local/lib/python3.6/dist-packages/websocket/_handshake.py", line 143, in _get_resp_headers
raise WebSocketBadStatusException("Handshake status %d %s", status, status_message, resp_headers)
websocket._exceptions.WebSocketBadStatusException: Handshake status 500 Internal Server Error
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/commands/deploy/deploy.py", line 65, in __try_execute
deployment=project[DEPLOYMENT],
File "/usr/local/lib/python3.6/dist-packages/tools/deploy/copy_deploy_tool.py", line 50, in run
namespace, container, pod_name, command, api_client=api_client
File "/usr/local/lib/python3.6/dist-packages/helpers/kubernetes.py", line 133, in run_pod_command
stdout=True,
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/stream.py", line 35, in stream
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api/core_v1_api.py", line 841, in connect_get_namespaced_pod_exec
(data) = self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api/core_v1_api.py", line 941, in connect_get_namespaced_pod_exec_with_http_info
collection_formats=collection_formats)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 345, in call_api
_preload_content, _request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/client/api_client.py", line 176, in __call_api
_request_timeout=_request_timeout)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/stream.py", line 30, in _intercept_request_call
return ws_client.websocket_call(config, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/kubernetes/stream/ws_client.py", line 302, in websocket_call
raise ApiException(status=0, reason=str(e))
kubernetes.client.rest.ApiException: (0)
Reason: Handshake status 500 Internal Server Error
[2022-12-16 14:09:49,631] {shell} INFO - doi: Deployment record uploaded successfully
[2022-12-16 14:09:49,631] {shell} INFO - OK
[2022-12-16 14:09:49,635] {io} INFO - Current directory: /artifacts
[2022-12-16 14:09:49,635] {pretty_info} INFO -

Usually this happens when the pod is in the Running state but has no running containers on it (0/1). If you exec into that pod and container, you get a 500 Internal Server Error instead of an error describing the real issue (the container is not running).
Check that all containers are actually running, for example:
if all(p.status.phase == "Running" for p in my_pods) \
        and all(c.state.running for p in my_pods for c in p.status.container_statuses):
    # only then is it safe to exec into the containers
    ...
Also refer to this Stack Overflow post and GitHub issue.
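Below is a more complete sketch of that check with the kubernetes Python client, run before the exec call; the namespace and pod name are taken from the deploy log above, and the rest is an assumption about how your deploy tool looks up the pod:

from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

# namespace and pod name come from the log above; adjust for your environment
pod = v1.read_namespaced_pod(name="airflow-5db795dd7c-d586h", namespace="development")

containers_running = (
    pod.status.phase == "Running"
    and pod.status.container_statuses is not None
    and all(c.state.running for c in pod.status.container_statuses)
)
if not containers_running:
    raise RuntimeError("Pod is not ready for exec; describe the pod and check its events to see why the container is not running.")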

Related

airflow worker not reaching sqs

I am running the Airflow worker service. The service is not able to connect to SQS, although the scheduler is able to reach and write to the queue.
Environment:
Amazon Linux
Python 3.5.4
Airflow 1.10.1
Celery 4.1.1
Proxies are fine; I have implemented this in both Python 2.7 and 3.5 with the same issue
I have set the celery transport options for the region
Airflow is configured with the CeleryExecutor, an SQS broker, and a Postgres database for the backend
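For reference, the broker settings I mean look roughly like this (the values are placeholders; credentials come from the instance role or environment):

broker_url = "sqs://"  # no explicit keys here; boto picks up credentials from the environment / IAM role
broker_transport_options = {
    "region": "eu-west-1",      # must match the region the queue lives in
    "visibility_timeout": 3600,
    "polling_interval": 1,
}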
partial log
[2018-08-30 15:43:58,779: CRITICAL/MainProcess] Unrecoverable error: Exception('Request Empty body HTTP 599 Failed to connect to eu-west-1.queue.amazonaws.com port 443: Connection timed out (None)',)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/celery/worker/worker.py", line 207, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 370, in start
return self.obj.start()
File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 316, in start
blueprint.start(self)
File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 592, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line 91, in asynloop
next(loop)
File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/hub.py", line 354, in create_loop
cb(*cbargs)
File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 114, in on_writable
return self._on_event(fd, _pycurl.CSELECT_OUT)
File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 124, in _on_event
self._process_pending_requests()
File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 132, in _process_pending_requests
self._process(curl, errno, reason)
File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/http/curl.py", line 178, in _process
buffer=buffer, effective_url=effective_url, error=error,
File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 150, in __call__
svpending(*ca, **ck)
File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in __call__
return self.throw()
File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in __call__
retval = fun(*final_args, **final_kwargs)
File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 100, in _transback
return callback(ret)
File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 143, in __call__
return self.throw()
File "/usr/local/lib/python3.5/site-packages/vine/promises.py", line 140, in __call__
retval = fun(*final_args, **final_kwargs)
File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 98, in _transback
callback.throw()
File "/usr/local/lib/python3.5/site-packages/vine/funtools.py", line 96, in _transback
ret = filter_(*args + (ret,), **kwargs)
File "/usr/local/lib/python3.5/site-packages/kombu/asynchronous/aws/connection.py", line 233, in _on_list_ready
raise self._for_status(response, response.read())
Exception: Request Empty body HTTP 599 Failed to connect to eu-west-1.queue.amazonaws.com port 443: Connection timed out (None)

Google Cloud Storage batch move file failure. "Connection reset by peer"

I suspect the code below ran out of connection capacity or similar. Is there an interface I can use to send batch requests, or should I sleep a few milliseconds between calls?
import typing

from google.cloud.storage import Blob, Bucket


def archive_pending_blobs(bucket: Bucket, blobs: typing.List[Blob], pending_prefix: str,
                          loaded_prefix: str) -> None:
    """Archive pending blobs to the loaded prefix."""
    try:
        for b in blobs:
            bucket.copy_blob(b, bucket, b.name.replace(pending_prefix, loaded_prefix))
        bucket.delete_blobs(blobs)
    except Exception as e:
        print('gcs archiving error for path: {} err: {}'.format(pending_prefix, e))
        raise e
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "", line 2, in raise_from
File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/opt/python3.7/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/opt/python3.7/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/opt/python3.7/lib/python3.7/http/client.py", line 257, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/python3.7/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/opt/python3.7/lib/python3.7/ssl.py", line 1049, in recv_into
return self.read(nbytes, buffer)
File "/opt/python3.7/lib/python3.7/ssl.py", line 908, in read
return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/env/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/env/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 384, in _make_request
six.raise_from(e, None)
File "", line 2, in raise_from
File "/env/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse()
File "/opt/python3.7/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/opt/python3.7/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/opt/python3.7/lib/python3.7/http/client.py", line 257, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/opt/python3.7/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/opt/python3.7/lib/python3.7/ssl.py", line 1049, in recv_into
return self.read(nbytes, buffer)
File "/opt/python3.7/lib/python3.7/ssl.py", line 908, in read
return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/user_code/main.py", line 230, in bq_merge
archive_pending_blobs(bucket, blobs[min_idx:max_idx], pending_prefix, loaded_prefix)
File "/user_code/main.py", line 44, in archive_pending_blobs
raise e
File "/user_code/main.py", line 40, in archive_pending_blobs
bucket.copy_blob(b, bucket, b.name.replace(pending_prefix, loaded_prefix))
File "/env/local/lib/python3.7/site-packages/google/cloud/storage/bucket.py", line 711, in copy_blob
_target_object=new_blob,
File "/env/local/lib/python3.7/site-packages/google/cloud/_http.py", line 290, in api_request
headers=headers, target_object=_target_object)
File "/env/local/lib/python3.7/site-packages/google/cloud/_http.py", line 183, in _make_request
return self._do_request(method, url, headers, data, target_object)
File "/env/local/lib/python3.7/site-packages/google/cloud/_http.py", line 212, in _do_request
url=url, method=method, headers=headers, data=data)
File "/env/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 201, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/env/local/lib/python3.7/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/env/local/lib/python3.7/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/env/local/lib/python3.7/site-packages/requests/adapters.py", line 495, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions_v1beta2/worker.py", li
Per this StackOverflow answer about Connection reset by peer, this is a fatal error where the remote server sends a RST packet to immediately drop the connection.
This other SO answer tackles how to solve it. The solution given there is time.sleep but, as we discussed in the comments, that didn't work in your case. That's why I'm suggesting a different approach using truncated exponential backoff:
Truncated exponential backoff is a standard error handling strategy for network applications in which a client periodically retries a failed request with increasing delays between requests.
[...]
Accessing Cloud Storage through a client library. Note that some client libraries, such as the Cloud Storage Client Library for Node.js, have built-in exponential backoff.
There's no built-in exponential backoff for Python, but there's an example of how to handle retries in Python with this method; a sketch follows.
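A minimal sketch of that approach applied to your function (the retry limits, the jitter, and the exception caught are assumptions; tune them to what you actually observe):

import random
import time

import requests


def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=32.0):
    """Run call(), retrying connection errors with truncated exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except requests.exceptions.ConnectionError:
            if attempt == max_retries - 1:
                raise
            # the delay doubles on each attempt, is capped at max_delay, and gets random jitter
            delay = min(base_delay * (2 ** attempt), max_delay) + random.uniform(0, 1)
            time.sleep(delay)


def archive_pending_blobs(bucket, blobs, pending_prefix, loaded_prefix):
    """Archive pending blobs to the loaded prefix, retrying transient connection resets."""
    for b in blobs:
        with_backoff(lambda b=b: bucket.copy_blob(
            b, bucket, b.name.replace(pending_prefix, loaded_prefix)))
    with_backoff(lambda: bucket.delete_blobs(blobs))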

KeyError: 'found' on Elastic2DocManager when syncing data from MongoDB

From MongoDB to Elasticsearch (5.6.5), I sync the database with Mongo-Connector using Elastic2DocManager:
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager
After seeing some updates to docs.deleted of mongodb_meta on Elasticsearch:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open mongodb_meta 3wd6OjTT6tD3f6ZGezZw 5 1 1337173 8372 192.9mb 192.9mb
mongo-connector stops working with the error below:
2018-07-11 07:16:41,977 [WARNING] elasticsearch:97 - POST http://localhost:9200/_bulk [status:N/A request:10.003s]
Traceback (most recent call last):
File "c:\programdata\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "c:\programdata\anaconda3\lib\site-packages\urllib3\connectionpool.py", line 383, in _make_request
httplib_response = conn.getresponse()
File "c:\programdata\anaconda3\lib\http\client.py", line 1331, in getresponse
response.begin()
File "c:\programdata\anaconda3\lib\http\client.py", line 297, in begin
version, status, reason = self._read_status()
File "c:\programdata\anaconda3\lib\http\client.py", line 258, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "c:\programdata\anaconda3\lib\socket.py", line 586, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
...
Exception in thread Thread-1:
Traceback (most recent call last):
File "c:\programdata\anaconda3\lib\threading.py", line 916, in _bootstrap_inner
self.run()
File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\doc_managers\elastic2_doc_manager.py", line 150, in run
self._docman.send_buffered_operations()
File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\doc_managers\elastic2_doc_manager.py", line 482, in send_buffered_operations
action_buffer = self.BulkBuffer.get_buffer()
File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\doc_managers\elastic2_doc_manager.py", line 696, in get_buffer
self.update_sources()
File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\util.py", line 35, in wrapped
return f(*args, **kwargs)
File "c:\programdata\anaconda3\lib\site-packages\mongo_connector\doc_managers\elastic2_doc_manager.py", line 628, in update_sources
if ES_doc['found']:
KeyError: 'found'
What is the reason for this error?
Make sure the versions are compatible and that the required Python packages are installed; this is what my requirements.txt looks like (a quick client/server version check follows the list):
astroid==1.6.5
autopep8==1.3.5
certifi==2018.8.24
elasticsearch==6.3.1
elasticsearch-dsl==6.2.1
isort==4.3.4
lazy-object-proxy==1.3.1
mccabe==0.6.1
pycodestyle==2.4.0
pylint==1.9.2
pymongo==3.7.1
rope==0.11.0
six==1.11.0
urllib3==1.23
wrapt==1.10.11
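As a quick sanity check on compatibility, something along these lines compares the installed elasticsearch client with the server it is talking to (treat it as a sketch; the URL is the local one from the question):

import elasticsearch

es = elasticsearch.Elasticsearch(["http://localhost:9200"])
print("client:", elasticsearch.__version__)        # e.g. (6, 3, 1)
print("server:", es.info()["version"]["number"])   # e.g. "5.6.5"
# a 6.x client (or a doc manager written for a different major version) against a
# 5.6 server can fail in surprising ways, such as the missing 'found' key above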

Celery log shows cleanup failed

I am using Celery with Django. I see an error when I look up the Celery log for the automatically scheduled cleanup. I am not sure what this means, or what the implications of not doing the cleanup are. Any help is appreciated.
[2013-09-28 23:00:00,204: ERROR/MainProcess] Task celery.backend_cleanup[65af1634-374a-4068-b1a5-749b70f7c78d] raised exception: NotImplementedError('No updates',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery-3.0.15-py2.7.egg/celery/task/trace.py", line 228, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery-3.0.15-py2.7.egg/celery/task/trace.py", line 415, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery-3.0.15-py2.7.egg/celery/app/builtins.py", line 58, in backend_cleanup
app.backend.cleanup()
File "/usr/local/lib/python2.7/dist-packages/djcelery/backends/database.py", line 58, in cleanup
model._default_manager.delete_expired(expires)
File "/usr/local/lib/python2.7/dist-packages/djcelery/managers.py", line 110, in delete_expired
self.get_all_expired(expires).update(hidden=True)
File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 469, in update
rows = query.get_compiler(self.db).execute_sql(None)
File "/usr/local/lib/python2.7/dist-packages/djangotoolbox/db/basecompiler.py", line 376, in execute_sql
raise NotImplementedError('No updates')

celery failing on dotcloud deployment with IO Error

Celery is failing on one of my dotcloud deployments, and I'm not sure how to fix it. The deployment is almost identical to an existing dotcloud deployment (verified by doing a file diff) which seems to be working OK.
The error I get in the djcelery log:
dotcloud@hack-default-www-0:/var/log/supervisor$ more djcelery_error.log
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
"use STATIC_URL instead.", DeprecationWarning)
/home/dotcloud/env/lib/python2.6/site-packages/djcelery/loaders.py:108: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2012-06-04 03:27:32,139: WARNING/MainProcess] -------------- celery@hack-default-www-0 v2.5.3
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: amqp://root@hack-OQVADQ2K.dotcloud.com:29210//
- ** ---------- . loader: djcelery.loaders.DjangoLoader
- ** ---------- . logfile: [stderr]@INFO
- ** ---------- . concurrency: 2
- ** ---------- . events: ON
- *** --- * --- . beat: OFF
-- ******* ----
--- ***** ----- [Queues]
-------------- . celery: exchange:celery (direct) binding:celery
[Tasks]
. experiments.tasks.pushMessageToIphone
. experiments.tasks.sendTestMessage
[2012-06-04 03:27:32,172: INFO/PoolWorker-1] child process calling self.run()
[2012-06-04 03:27:32,185: INFO/PoolWorker-2] child process calling self.run()
[2012-06-04 03:27:32,188: WARNING/MainProcess] celery@hack-default-www-0 has started.
[2012-06-04 03:27:35,315: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 2 seconds...
[2012-06-04 03:27:40,374: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 4 seconds...
[2012-06-04 03:27:47,479: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 6 seconds...
[2012-06-04 03:27:56,509: ERROR/MainProcess] Consumer: Connection Error: Socket
Interestingly, the error log of celerycam shows something a bit different. I'm not sure if this is a red herring.
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
"use STATIC_URL instead.", DeprecationWarning)
[2012-06-04 03:27:31,373: INFO/MainProcess] -> evcam: Taking snapshots with djcelery.snapshot.Camera (every 1.0 secs.)
Traceback (most recent call last):
File "hack/manage.py", line 14, in
execute_manager(settings)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__init__.py", line 459, in execute_manager
utility.execute()
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__init__.py", line 382, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.py", line 74, in run_from_argv
return super(CeleryCommand, self).run_from_argv(argv)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/base.py", line 196, in run_from_argv
self.execute(*args, **options.__dict__)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.py", line 67, in execute
super(CeleryCommand, self).execute(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/base.py", line 232, in execute
output = self.handle(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/commands/celerycam.py", line 26, in handle
ev.run(*args, **options)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py", line 38, in run
detach=detach)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py", line 70, in run_evcam
return cam()
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/snapshot.py", line 116, in evcam
recv.capture(limit=None)
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py", line 204, in capture
list(self.itercapture(limit=limit, timeout=timeout, wakeup=wakeup))
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py", line 193, in itercapture
with self.consumer(wakeup=wakeup) as consumer:
File "/usr/lib/python2.6/contextlib.py", line 16, in __enter__
return self.gen.next()
File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py", line 185, in consumer
queues=[self.queue], no_ack=True)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line 279, in __init__
self.revive(self.channel)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line 286, in revive
channel = channel.default_channel
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", line 581, in default_channel
self.connection
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", line 574, in connection
self._connection = self._establish_connection()
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", line 533, in _establish_connection
conn = self.transport.establish_connection()
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.py", line 279, in establish_connection
connect_timeout=conninfo.connect_timeout)
File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.py", line 89, in __init__
super(Connection, self).__init__(*args, **kwargs)
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 144, in __init__
(10, 30), # tune
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 95, in wait
self.channel_id, allowed_methods)
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 202, in _wait_method
self.method_reader.read_method()
File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
raise m
IOError: Socket closed
My supervisord file:
[program:djcelery]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
As mentioned, I have nearly identical code deployed under a different dotcloud account that is working fine.
Status of the rabbitmq broker:
$ ./dotcloud info hack.broker
aliases:
- hackxxxx.dotcloud.com
config:
password: xxxx
rabbitmq_management: true
user: root
created_at: 1338702527.075196
datacenter: Amazon-us-east-1c
image_version: 924a079b622a (latest)
memory: 49M/512M (9%)
ports:
- name: ssh
url: ssh://dotcloud@hackxxx.dotcloud.com:29209
- name: amqp
url: amqp://root:xxxx@hackxxxx.dotcloud.com:29210
- name: http
url: http://root:xxx@hack1-xxxx.dotcloud.com/
state: running
type: rabbitmq
It looks like it is having an issue connecting to your broker. Have you confirmed that you can connect to your broker, and that it is up and running?
What are you using for a broker? A quick way to test the AMQP connection from the worker's environment is sketched below.
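A minimal connectivity check with kombu (the broker URL is the placeholder one from your `dotcloud info` output; substitute the real credentials):

from kombu import Connection

broker_url = "amqp://root:xxxx@hackxxxx.dotcloud.com:29210//"

try:
    # connect_timeout keeps the check from hanging if the port is unreachable
    with Connection(broker_url, connect_timeout=5) as conn:
        conn.connect()
        print("broker reachable")
except Exception as exc:
    print("broker unreachable:", exc)

If this fails from inside the www service but succeeds from elsewhere, the problem is the network path or the credentials rather than Celery itself.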