celery 4.3, SoftTimeLimitExceeded in group of tasks - celery

How should I handle SoftTimeLimitExceeded with a group of Celery tasks?
I have this task:
@shared_task(bind=True, priority=2, autoretry_for=(EasySNMPError,),
             soft_time_limit=10, retry_kwargs={'max_retries': 3, 'countdown': 2},
             acks_late=True)
def discover_interface(self, interface_id: int) -> dict:
    logger.info(f'start disco interface {interface_id}')
    p = probes.DiscoverSNMP()
    try:
        with utils.Timer() as t1:
            p.discover_interface(interface_id=interface_id)
            logger.info(f'stop disco interface {interface_id}')
    except SoftTimeLimitExceeded:
        p.stats['message'] = 'soft time limit exceeded'
    return p.stats
When I run it, I get the soft time limit exception.
Log from Celery:
[2021-02-24 16:30:20,298: WARNING/MainProcess] Soft time limit (10s) exceeded for pollers.tasks.discover_interface[6b166746-95c4-458c-bb63-318bdb588d00]
[2021-02-24 16:30:20,374: ERROR/MainProcess] Process 'ForkPoolWorker-2' pid:19585 exited with 'signal 11 (SIGSEGV)'
[2021-02-24 16:30:20,393: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 11 (SIGSEGV).',)
Traceback (most recent call last):
File "/home/kolekcjoner/miniconda3/envs/kolekcjoner/lib/python3.6/site-packages/billiard/pool.py", line 1267, in mark_as_worker_lost
human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV).
Execution in a Python console:
job = group([tasks.discover_interface.s(1870361)]).apply_async()
In[45]: job.join_native()
Traceback (most recent call last):
File "/home/kolekcjoner/miniconda3/envs/kolekcjoner/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-45-9330abace970>", line 1, in <module>
job.join_native()
File "/home/kolekcjoner/miniconda3/envs/kolekcjoner/lib/python3.6/site-packages/celery/result.py", line 818, in join_native
raise value
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 11 (SIGSEGV).
How should I change it so that job.join_native() does NOT raise an exception, but instead returns the message 'soft time limit exceeded' from the task?
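For reference, join_native() (like get()) accepts propagate=False, which hands back task exceptions as values instead of re-raising them; a minimal sketch, assuming the same group as above. Note that because the worker here dies with SIGSEGV before the except block can return p.stats, the stored result would still be the WorkerLostError rather than the stats dict:

# Sketch: collect group results without re-raising task exceptions.
# Failed tasks come back as exception instances (here the WorkerLostError).
job = group([tasks.discover_interface.s(1870361)]).apply_async()
results = job.join_native(propagate=False)
for r in results:
    if isinstance(r, Exception):
        print('task failed:', r)
    else:
        print('task stats:', r)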

Related

Openstack Magnum kube_master: Went to status ERROR due to "Message: Exceeded maximum number of retries

I tried to deploy a Kubernetes cluster with one master node and one worker node in OpenStack Magnum, but it returned an error:
ResourceInError:
resources.kube_masters.resources[0].resources.kube-master: Went to
status ERROR due to "Message: Exceeded maximum number of retries.
Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500"
Additional info:
Image used: Fedora-CoreOS 32
Flavor: 4VCPUs, 4GB RAM, 25GB disk
docker storage driver: devicemapper
docker volume size: 10GB
Nova logs:
2021-03-05 02:58:29.180 22 ERROR nova.scheduler.utils
[req-aa6a7c8b-3d9a-4c8d-9d56-dfcb104ae828
53dd56f7fefc4bd38e2e4b1e8dde2b51 a67073edfee14079a6dda119969895c9 -
default default] [instance: 8fd65139-c43e-40bd-836e-0ec57ea78960]
Error from last host: kolla-ceph-compute17 (node
kolla-ceph-compute17): ['Traceback (most recent call last):\n', '
File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2437, in _build_and_run_instance\n
block_device_info=block_device_info)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 3550, in spawn\n mdevs=mdevs)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 6158, in _get_guest_xml\n xml = conf.to_xml()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 79, in to_xml\n root = self.format_dom()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2726, in format_dom\n self._format_devices(root)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2680, in _format_devices\n
devices.append(dev.format_dom())\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 1037, in format_dom\n auth.set("username",
self.auth_username)\n', ' File "src/lxml/etree.pyx", line 815, in
lxml.etree._Element.set\n', ' File "src/lxml/apihelpers.pxi", line
593, in lxml.etree._setAttributeValue\n', ' File
"src/lxml/apihelpers.pxi", line 1525, in lxml.etree._utf8\n',
"TypeError: Argument must be bytes or unicode, got 'NoneType'\n",
'\nDuring handling of the above exception, another exception
occurred:\n\n', 'Traceback (most recent call last):\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2161, in _do_build_and_run_instance\n filter_properties,
request_spec)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2537, in _build_and_run_instance\n
instance_uuid=instance.uuid, reason=six.text_type(e))\n',
"nova.exception.RescheduledException: Build of instance
8fd65139-c43e-40bd-836e-0ec57ea78960 was re-scheduled: Argument must
be bytes or unicode, got 'NoneType'\n"]
2021-03-05 02:59:08.970 20 ERROR nova.scheduler.utils
[req-aa6a7c8b-3d9a-4c8d-9d56-dfcb104ae828
53dd56f7fefc4bd38e2e4b1e8dde2b51 a67073edfee14079a6dda119969895c9 -
default default] [instance: 8fd65139-c43e-40bd-836e-0ec57ea78960]
Error from last host: kolla-ceph-compute18 (node
kolla-ceph-compute18): ['Traceback (most recent call last):\n', '
File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2437, in _build_and_run_instance\n
block_device_info=block_device_info)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 3550, in spawn\n mdevs=mdevs)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/driver.py",
line 6158, in _get_guest_xml\n xml = conf.to_xml()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 79, in to_xml\n root = self.format_dom()\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2726, in format_dom\n self._format_devices(root)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 2680, in _format_devices\n
devices.append(dev.format_dom())\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/libvirt/config.py",
line 1037, in format_dom\n auth.set("username",
self.auth_username)\n', ' File "src/lxml/etree.pyx", line 815, in
lxml.etree._Element.set\n', ' File "src/lxml/apihelpers.pxi", line
593, in lxml.etree._setAttributeValue\n', ' File
"src/lxml/apihelpers.pxi", line 1525, in lxml.etree._utf8\n',
"TypeError: Argument must be bytes or unicode, got 'NoneType'\n",
'\nDuring handling of the above exception, another exception
occurred:\n\n', 'Traceback (most recent call last):\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2161, in _do_build_and_run_instance\n filter_properties,
request_spec)\n', ' File
"/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py",
line 2537, in _build_and_run_instance\n
instance_uuid=instance.uuid, reason=six.text_type(e))\n',
"nova.exception.RescheduledException: Build of instance
8fd65139-c43e-40bd-836e-0ec57ea78960 was re-scheduled: Argument must
be bytes or unicode, got 'NoneType'\n"]
root@kolla-infra2:~# cat /var/log/kolla/magnum/*log | egrep -i "(2021-03-05).*error"
Magnum logs:
2021-03-05 02:18:26.681 19 ERROR
magnum.drivers.heat.k8s_fedora_template_def
[req-bc1f4cad-1636-42b4-ab80-f7e7c3459c5d - - - - -] Failed to load
default keystone auth policy: FileNotFoundError: [Errno 2] No such
file or directory: '/etc/magnum/keystone_auth_default_policy.json'
2021-03-05 02:20:25.721 32 ERROR
magnum.drivers.heat.k8s_fedora_template_def
[req-4e8ac06c-debd-4980-9cbf-7a48684e75e1 - - - - -] Failed to load
default keystone auth policy: FileNotFoundError: [Errno 2] No such
file or directory: '/etc/magnum/keystone_auth_default_policy.json'
2021-03-05 02:45:17.069 27 ERROR
magnum.drivers.heat.k8s_fedora_template_def
[req-205dd35a-5546-4919-875a-cfd15feeeadf - - - - -] Failed to load
default keystone auth policy: FileNotFoundError: [Errno 2] No such
file or directory: '/etc/magnum/keystone_auth_default_policy.json'
2021-03-05 03:00:00.418 6 ERROR magnum.drivers.heat.driver
[req-17a7777b-0e70-4b8f-9199-026cd3f3d2ae - - - - -] Nodegroup error,
stack status: CREATE_FAILED, stack_id:
1671e517-8227-47c3-9c7a-4642ececde33, reason: Resource CREATE failed:
ResourceInError:
resources.kube_masters.resources[0].resources.kube-master: Went to
status ERROR due to "Message: Exceeded maximum number of retries.
Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500" 2021-03-05 03:00:00.621 6
ERROR magnum.drivers.heat.driver
[req-17a7777b-0e70-4b8f-9199-026cd3f3d2ae - - - - -] Nodegroup error,
stack status: CREATE_FAILED, stack_id:
1671e517-8227-47c3-9c7a-4642ececde33, reason: Resource CREATE failed:
ResourceInError:
resources.kube_masters.resources[0].resources.kube-master: Went to
status ERROR due to "Message: Exceeded maximum number of retries.
Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500"
Heat logs:
2021-03-05 02:59:49.270 21 ERROR heat.engine.resource Traceback (most
recent call last): 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resource.py",
line 920, in _action_recorder 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource yield 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resource.py",
line 1033, in _do_action 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource yield self.action_handler_task(action,
args=handler_args) 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/scheduler.py",
line 346, in wrapper 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource step = next(subtask) 2021-03-05 02:59:49.270
21 ERROR heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resource.py",
line 982, in action_handler_task 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource done = check(handler_data) 2021-03-05
02:59:49.270 21 ERROR heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resources/stack_resource.py",
line 409, in check_create_complete 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource return
self._check_status_complete(self.CREATE) 2021-03-05 02:59:49.270 21
ERROR heat.engine.resource File
"/var/lib/kolla/venv/lib/python3.6/site-packages/heat/engine/resources/stack_resource.py",
line 463, in _check_status_complete 2021-03-05 02:59:49.270 21 ERROR
heat.engine.resource action=action) 2021-03-05 02:59:49.270 21
ERROR heat.engine.resource heat.common.exception.ResourceFailure:
ResourceInError: resources[0].resources.kube-master: Went to status
ERROR due to "Message: Exceeded maximum number of retries. Exceeded
max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500" 2021-03-05 02:59:49.270
21 ERROR heat.engine.resource 2021-03-05 02:59:49.291 21 INFO
heat.engine.stack [req-9b34fa4a-da80-45aa-9596-6880e4a5d1ce - admin -
default default] Stack CREATE FAILED
(k8s-test-cluster-nyrxxu2pqvnj-kube_masters-5w5xewm2u4c3): Resource
CREATE failed: ResourceInError: resources[0].resources.kube-master:
Went to status ERROR due to "Message: Exceeded maximum number of
retries. Exceeded max scheduling attempts 3 for instance
c477d0d3-6176-4cec-b5aa-3a2fb7c77435. Last exception: Argument must be
bytes or unicode, got 'NoneType', Code: 500"
Thank you.

Error while executing basic code in Locust

from locust import Locust, TaskSet

def login(l):
    print("I am logged In")

def logout(m):
    print("I am logged Out")

class UserBehaviour(TaskSet):
    task = [login, logout]

class User(Locust):
    task_set = UserBehaviour
Error message:
(venv) C:\pythnprojects\LearnLocustProject\venv\locust_test>locust -f firstlocust.py
[2020-03-11 00:38:57,259] DELLXPS/INFO/locust.main: Starting web monitor at *:8089
[2020-03-11 00:38:57,259] DELLXPS/INFO/locust.main: Starting Locust 0.11.0
[2020-03-11 00:39:05,581] DELLXPS/INFO/locust.runners: Hatching and swarming 1 clients at the rate 1 clients/s...
[2020-03-11 00:39:05,585] DELLXPS/ERROR/stderr: Traceback (most recent call last):
File "c:\pythnprojects\learnlocustproject\venv\lib\site-packages\locust\core.py", line 358, in run
self.schedule_task(self.get_next_task())
File "c:\pythnprojects\learnlocustproject\venv\lib\site-packages\locust\core.py", line 419, in get_next_task
return random.choice(self.tasks)
File "C:\DOWNLOADS\lib\random.py", line 290, in choice
raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
[2020-03-11 00:39:06,582] DELLXPS/INFO/locust.runners: All locusts hatched: User: 1
[2020-03-11 00:39:06,591] DELLXPS/ERROR/stderr: Traceback (most recent call last):
File "c:\pythnprojects\learnlocustproject\venv\lib\site-packages\locust\core.py", line 358, in run
self.schedule_task(self.get_next_task())
File "c:\pythnprojects\learnlocustproject\venv\lib\site-packages\locust\core.py", line 419, in get_next_task
return random.choice(self.tasks)
File "C:\DOWNLOADS\lib\random.py", line 290, in choice
raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
[2020-03-11 00:39:07,597] DELLXPS/ERROR/stderr: Traceback (most recent call last):
File "c:\pythnprojects\learnlocustproject\venv\lib\site-packages\locust\core.py", line 358, in run
self.schedule_task(self.get_next_task())
File "c:\pythnprojects\learnlocustproject\venv\lib\site-packages\locust\core.py", line 419, in get_next_task
return random.choice(self.tasks)
File "C:\DOWNLOADS\lib\random.py", line 290, in choice
raise IndexError('Cannot choose from an empty sequence') from None
IndexError: Cannot choose from an empty sequence
It looks like you've misspelled tasks (it currently seems to say task).
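With that rename, random.choice(self.tasks) sees a non-empty list. A corrected sketch of the script from the question (same Locust 0.11 API):

from locust import Locust, TaskSet

def login(l):
    print("I am logged In")

def logout(m):
    print("I am logged Out")

class UserBehaviour(TaskSet):
    tasks = [login, logout]   # 'tasks' (plural), not 'task'

class User(Locust):
    task_set = UserBehaviour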

Pyspark 'tzinfo' error when using the Cassandra connector

I'm reading from Cassandra using
a = sc.cassandraTable("my_keyspace", "my_table").select("timestamp", "value")
and then want to convert it to a dataframe:
a.toDF()
and the schema is correctly inferred:
DataFrame[timestamp: timestamp, value: double]
but then when materializing the dataframe I get the following error:
Py4JJavaError: An error occurred while calling o89372.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 285.0 failed 4 times, most recent failure: Lost task 0.3 in stage 285.0 (TID 5243, kepler8.cern.ch): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in toInternal
return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/types.py", line 541, in <genexpr>
return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 435, in toInternal
return self.dataType.toInternal(obj)
File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/sql/types.py", line 190, in toInternal
seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
AttributeError: 'str' object has no attribute 'tzinfo'
which sounds like a string has been given to pyspark.sql.types.TimestampType.
How could I debug this further?
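One way to narrow it down, as a sketch: inspect the raw Python types the connector returns before calling toDF(). The attribute access on the row and the datetime format string below are assumptions, not something from the original post:

# Check what type the connector actually returns for 'timestamp'
row = a.first()
print(row)
print(type(row.timestamp))   # assumes the connector's Row exposes columns by name

# If it turns out to be a str, parse it explicitly before building the DataFrame
# (the format string is a placeholder; adjust it to the real data):
from datetime import datetime
parsed = a.map(lambda r: (datetime.strptime(r.timestamp, '%Y-%m-%d %H:%M:%S'), r.value))
df = parsed.toDF(['timestamp', 'value'])
df.show()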

celery failing on dotcloud deployment with IO Error

Celery is failing on one of my dotcloud deployments, and I'm not sure how to fix it. The deployment is almost identical to an existing dotcloud deployment (verified by doing a file diff) which seems to be working OK.
The error I get in the djcelery log:
dotcloud@hack-default-www-0:/var/log/supervisor$ more djcelery_error.log
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
  "use STATIC_URL instead.", DeprecationWarning)
/home/dotcloud/env/lib/python2.6/site-packages/djcelery/loaders.py:108: UserWarning: Using settings.DEBUG leads to a memory leak, never use this setting in production environments!
  warnings.warn("Using settings.DEBUG leads to a memory leak, never "
[2012-06-04 03:27:32,139: WARNING/MainProcess] -------------- celery@hack-default-www-0 v2.5.3
---- **** -----
--- * *** * -- [Configuration]
-- * - **** --- . broker: amqp://root@hack-OQVADQ2K.dotcloud.com:29210//
- ** ---------- . loader: djcelery.loaders.DjangoLoader
- ** ---------- . logfile: [stderr]@INFO
- ** ---------- . concurrency: 2
- ** ---------- . events: ON
- *** --- * --- . beat: OFF
-- ******* ----
--- ***** ----- [Queues]
-------------- . celery: exchange:celery (direct) binding:celery
[Tasks]
. experiments.tasks.pushMessageToIphone
. experiments.tasks.sendTestMessage
[2012-06-04 03:27:32,172: INFO/PoolWorker-1] child process calling self.run()
[2012-06-04 03:27:32,185: INFO/PoolWorker-2] child process calling self.run()
[2012-06-04 03:27:32,188: WARNING/MainProcess] celery@hack-default-www-0 has started.
[2012-06-04 03:27:35,315: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 2 seconds...
[2012-06-04 03:27:40,374: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 4 seconds...
[2012-06-04 03:27:47,479: ERROR/MainProcess] Consumer: Connection Error: Socket closed. Trying again in 6 seconds...
[2012-06-04 03:27:56,509: ERROR/MainProcess] Consumer: Connection Error: Socket
Interestingly, the error log of celerycam shows something a bit different. I'm not sure if this is a red herring.
/home/dotcloud/env/lib/python2.6/site-packages/django/conf/__init__.py:75: DeprecationWarning: The ADMIN_MEDIA_PREFIX setting has been removed; use STATIC_URL instead.
  "use STATIC_URL instead.", DeprecationWarning)
[2012-06-04 03:27:31,373: INFO/MainProcess] -> evcam: Taking snapshots with djcelery.snapshot.Camera (every 1.0 secs.)
Traceback (most recent call last):
  File "hack/manage.py", line 14, in <module>
    execute_manager(settings)
  File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__init__.py", line 459, in execute_manager
    utility.execute()
  File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.py", line 74, in run_from_argv
    return super(CeleryCommand, self).run_from_argv(argv)
  File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/base.py", line 67, in execute
    super(CeleryCommand, self).execute(*args, **options)
  File "/home/dotcloud/env/lib/python2.6/site-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/home/dotcloud/env/lib/python2.6/site-packages/djcelery/management/commands/celerycam.py", line 26, in handle
    ev.run(*args, **options)
  File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py", line 38, in run
    detach=detach)
  File "/home/dotcloud/env/lib/python2.6/site-packages/celery/bin/celeryev.py", line 70, in run_evcam
    return cam()
  File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/snapshot.py", line 116, in evcam
    recv.capture(limit=None)
  File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py", line 204, in capture
    list(self.itercapture(limit=limit, timeout=timeout, wakeup=wakeup))
  File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py", line 193, in itercapture
    with self.consumer(wakeup=wakeup) as consumer:
  File "/usr/lib/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/home/dotcloud/env/lib/python2.6/site-packages/celery/events/__init__.py", line 185, in consumer
    queues=[self.queue], no_ack=True)
  File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line 279, in __init__
    self.revive(self.channel)
  File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/messaging.py", line 286, in revive
    channel = channel.default_channel
  File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", line 581, in default_channel
    self.connection
  File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", line 574, in connection
    self._connection = self._establish_connection()
  File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/connection.py", line 533, in _establish_connection
    conn = self.transport.establish_connection()
  File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.py", line 279, in establish_connection
    connect_timeout=conninfo.connect_timeout)
  File "/home/dotcloud/env/lib/python2.6/site-packages/kombu/transport/amqplib.py", line 89, in __init__
    super(Connection, self).__init__(*args, **kwargs)
  File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 144, in __init__
    (10, 30), # tune
  File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/abstract_channel.py", line 95, in wait
    self.channel_id, allowed_methods)
  File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/connection.py", line 202, in _wait_method
    self.method_reader.read_method()
  File "/home/dotcloud/env/lib/python2.6/site-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
    raise m
IOError: Socket closed
My supervisord file:
[program:djcelery]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
[program:celerycam]
directory = /home/dotcloud/current/
command = /home/dotcloud/env/bin/python hack/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
As mentioned, I have nearly identical code deployed under a different dotcloud account that is working fine.
Status of the rabbitmq broker:
$ ./dotcloud info hack.broker
aliases:
- hackxxxx.dotcloud.com
config:
password: xxxx
rabbitmq_management: true
user: root
created_at: 1338702527.075196
datacenter: Amazon-us-east-1c
image_version: 924a079b622a (latest)
memory: 49M/512M (9%)
ports:
- name: ssh
  url: ssh://dotcloud@hackxxx.dotcloud.com:29209
- name: amqp
  url: amqp://root:xxxx@hackxxxx.dotcloud.com:29210
- name: http
  url: http://root:xxx@hack1-xxxx.dotcloud.com/
state: running
type: rabbitmq
It looks like it is having an issue connecting to your broker. Have you confirmed that you can connect to your broker, and that it is up and running?
What are you using for a broker?
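One quick check, as a sketch: open a connection to the broker directly with kombu, using the amqp URL from the dotcloud info output above (the credentials and host below are the placeholders from that output):

# Hypothetical connectivity check with kombu against the dotcloud RabbitMQ broker.
from kombu import BrokerConnection

conn = BrokerConnection('amqp://root:xxxx@hackxxxx.dotcloud.com:29210//')
try:
    conn.connect()                      # raises (e.g. IOError: Socket closed) if unreachable
    print('broker reachable:', conn.connected)
finally:
    conn.release()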

parallel-python error: RuntimeError("Socket connection is broken")

I am using a simple program to send a function:
import pp

nodes = ('mosura02', 'mosura03', 'mosura04', 'mosura05', 'mosura06',
         'mosura09', 'mosura10', 'mosura11', 'mosura12')
nodes = ('miner:60001',)

def pptester():
    js = pp.Server(ppservers=nodes)
    js.set_ncpus(0)
    tmp = []
    for i in range(200):
        tmp.append(js.submit(ppworktest, (), (), ('os',)))
    return tmp

def ppworktest():
    import os
    return os.system("uname -a")
the result is:
wkerzend@mosura:/home/wkerzend/tmp/ppython_test>ssh miner "source ~/coala_python_setup.sh;ppserver.py -d -p 60001"
2010-04-12 00:50:48,162 - pp - INFO - Creating server instance (pp-1.6.0)
2010-04-12 00:50:52,732 - pp - INFO - pp local server started with 32 workers
2010-04-12 00:50:52,732 - pp - DEBUG - Strarting network server interface=0.0.0.0 port=60001
Exception in thread client_socket:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 525, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.6/threading.py", line 477, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/wkerzend/python_coala/bin/ppserver.py", line 161, in crun
ctype = mysocket.receive()
File "/home/wkerzend/python_coala/lib/python2.6/site-packages/pptransport.py", line 178, in receive
raise RuntimeError("Socket connection is broken")
RuntimeError: Socket connection is broken
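The server-side traceback only says the client socket dropped; one basic sanity check, as a sketch (the host and port come from the nodes tuple above), is to verify that the ppserver port is reachable from the client machine at all:

# Minimal TCP reachability check for the ppserver (does not validate the pp protocol).
import socket

s = socket.create_connection(('miner', 60001), timeout=5)
print('connected to ppserver port')
s.close()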