How to start a celery worker that only pushes tasks to the broker but does not consume them?

I have the main producer of tasks in a webserver. I do not want the webserver to consume any tasks; it should only send tasks to the broker, to be consumed by other nodes.
Right now I route tasks by starting each node with the -Q option and specifying the particular queues it should consume. Is there a way to specify 0 queues for a worker?
Any help appreciated, thanks!

You do not need a worker to push tasks to the broker - you can do that from a regular Python process.
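A minimal sketch of what that looks like (assuming a Redis broker on localhost, plus a hypothetical tasks.add task and "math" queue defined on the worker nodes): the webserver only needs a Celery app configured with the broker URL, and app.send_task() publishes a message by name without importing or running any worker code.

# producer.py - runs inside the webserver process; no worker is started here
from celery import Celery

# Hypothetical broker URL - point this at the broker your workers consume from.
app = Celery("tasks", broker="redis://localhost:6379/0")

# send_task() publishes by name, so the task implementation only has to
# exist on the consuming nodes, not in the webserver's codebase.
result = app.send_task("tasks.add", args=(2, 3), queue="math")
print(result.id)  # a worker listening on the "math" queue will execute it

If the task functions are importable from the webserver's code, calling add.delay(2, 3) does the same thing; either way, no worker process runs locally.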

Related

Pausing Kafka consumer partitions through a farm

We are pointing our Kafka consumer to a farm of partitions for load balancing.
When we use a REST controller endpoint to pause the Kafka consumer, the service only pauses a few partitions, not all of them. We want all the partitions paused, but cannot get them all even with repeated calls. How would you suggest we accomplish this? Hazelcast?
Thanks
consumer.pause() would only pause the instance on which it is called, not the entire consumer group.
A load balancer can't target all of the REST endpoints that wrap your consumers, since each request is routed to just one instance at random, so yes, you'd need some sort of external shared state. Zookeeper would be a better option unless you already have a Hazelcast system; Apache Curator is a high-level Zookeeper client that can be used for this. For example, a shared counter could be set to 0 for the paused state and non-zero for unpaused.
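A minimal sketch of that shared-state idea (my assumptions: kazoo as the Python Zookeeper client in place of Curator, kafka-python for the consumer, and hypothetical hosts, topic, and znode path): every instance checks the shared flag in its poll loop and pauses or resumes all of its own assigned partitions, so flipping a single znode pauses the whole group.

# pause_flag.py - run by every consumer instance; the REST endpoint only
# writes "1" or "0" to the znode, and each instance reacts on its next loop.
from kafka import KafkaConsumer
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")  # hypothetical ZooKeeper ensemble
zk.start()
zk.ensure_path("/consumers/paused")

consumer = KafkaConsumer("events", bootstrap_servers="kafka1:9092",
                         group_id="farm")  # hypothetical topic and group

paused = False
while True:
    flag, _ = zk.get("/consumers/paused")
    if flag == b"1" and not paused:
        # Pause every partition assigned to THIS instance; since all
        # instances run the same loop, the whole group converges.
        consumer.pause(*consumer.assignment())
        paused = True
    elif flag != b"1" and paused:
        consumer.resume(*consumer.assignment())
        paused = False
    records = consumer.poll(timeout_ms=1000)  # returns nothing while paused
    # ... process records ...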

CeleryExecutor: Does the Airflow metric "executor.queued_tasks" report the number of tasks in the Celery broker?

Using its StatsD plugin, Airflow can report the metric executor.queued_tasks, among others.
I am using CeleryExecutor and need to know how many tasks are waiting in the Celery broker, so I know when new workers should be spawned. (I configure my workers with low concurrency, so they cannot take many tasks at once.) Is this metric what I need?
Nope. If you want to know how many TIs are waiting in the broker, you'll have to connect to it.
Task instances that are waiting in the Celery broker to get picked up are queued according to the Airflow DB, but running according to the CeleryExecutor. This is because the CeleryExecutor considers any task instance that was successfully sent to the broker to be running (unlike the DB, which waits for a worker to pick it up before marking it as running).
Metric executor.queued_tasks reports the number of tasks queued according to the executor, not the DB.
The number of queued task instances according to the DB is not exactly what you need either, because it reports the number of task instances that are waiting in the broker plus the number of task instances queued to the executor. But when would TIs be stuck in the executor's queue, you ask? When the parallelism setting of Airflow prevents the executor from sending them to the broker.
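For example, assuming a Redis broker and the default queue name (both assumptions - adjust to your setup), the number of task messages waiting in the broker is just the length of the queue's Redis list:

# broker_depth.py - count Celery task messages waiting in a Redis broker
import redis

# Hypothetical broker location; match your broker_url setting.
r = redis.Redis(host="localhost", port=6379, db=0)

# The Redis transport stores each queue as a list keyed by the queue name
# ("celery" is the default queue).
waiting = r.llen("celery")
print(f"{waiting} task(s) waiting in the broker")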

Celery monitoring with SQS broker

We are using Airflow (1.10.3) with the Celery executor (Celery 4.1.1 (latentcall)) and SQS as the broker. While debugging an issue we tried the Celery CLI and found that the SQS broker is not supported by any of the inspect commands or monitoring tools, e.g. Flower.
Is there any way we can monitor the tasks or events on Celery workers?
We tried the Celery events monitor as follows:
celery events -b sqs://
But it shows no worker discovered and no tasks selected.
The celery inspect command help page shows:
Availability: RabbitMQ (AMQP) and Redis transports.
Please let me know if I am missing something, or whether it is even possible to monitor Celery workers with SQS.
The SQS transport does not provide support for monitoring/inspection (this is the main reason I do not use it). According to the latest documentation, Redis and RabbitMQ are the only broker types that support monitoring/inspection and remote control.
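As a partial workaround (my suggestion, not something the transport provides), you can at least watch queue depth through the SQS API itself; a sketch with boto3 and a hypothetical queue URL:

# sqs_depth.py - approximate backlog of a Celery queue on an SQS broker
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")  # hypothetical region
resp = sqs.get_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/celery",  # hypothetical
    AttributeNames=["ApproximateNumberOfMessages",
                    "ApproximateNumberOfMessagesNotVisible"],
)
# Visible messages are waiting for a worker; not-visible ones are
# currently reserved by a worker that is processing them.
print(resp["Attributes"])

This tells you how much work is backed up, but not which worker is doing what - for worker-level events you would still need a Redis or RabbitMQ broker.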

Multiple Kafka Connect connectors for different topics are going to the same node

I have created two connectors in Kafka Connect that use the same Connector class but listen to different topics.
When I launch the process on my node, both connectors end up creating tasks in this process. However, I would like one node to handle only one connector/topic. How can I limit a topic/connector to a single node? I don't see any setting in connect-distributed.properties that would let a process specify which connectors it runs.
Thanks
Kafka Connect in distributed mode can run as a cluster of one or more workers. Each worker can run multiple tasks. Depending on how many connectors and workers you are running, you will have tasks running on the same worker. This is deliberate - the idea is that Kafka Connect will manage your tasks and workload for you, across the available workers.
If you want to isolate your processing you can run Kafka Connect as separate Connect clusters, either on the same machine (make sure to use different REST ports), or separate machines.
For more info, see architecture and config for steps to configure separate clusters. Note that a cluster can actually be a single worker, but then you don't have any redundancy in the event of failure.
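For illustration, a second cluster on the same machine needs its own group.id, REST port, and internal topics (all names below are hypothetical); everything else can mirror your existing connect-distributed.properties:

# connect-distributed-2.properties - a second, isolated Connect cluster
bootstrap.servers=kafka1:9092
# must differ from the first cluster's group.id
group.id=connect-cluster-2
# the first cluster keeps the default REST port 8083
rest.port=8084
# internal topics must not be shared between clusters
offset.storage.topic=connect-offsets-2
config.storage.topic=connect-configs-2
status.storage.topic=connect-status-2

Start a worker with this file and submit one connector to each cluster's REST endpoint; each connector's tasks then stay on their own cluster.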

Submitting celery jobs to an SGE queue

I'm working on a cluster that uses SGE to manage jobs across the worker nodes. Is there a way to use the SGE queue as the Celery broker that will cooperate with other people submitting jobs through non-Celery means? I currently use python-gridmap to submit Python jobs to the SGE queue, but I'd like to use Celery's feature set.
Would I need to write a new Broker, a new Consumer, or both?