Disable InboundChannelAdapter / Poller with autoStartup false - spring-batch

In Spring Integration, I want to disable a poller by setting the autoStartup=false on the InboundChannelAdapter. But with the following setup, none of my pollers are firing on either my Tomcat instance 1 nor Tomcat instance 2. I have two Tomcat instances with the same code deployed. I want the pollers to be disabled on one of the instances since I do not want the same job polling on the two Tomcat instances concurrently.
Here is the InboundChannelAdapter:
#Bean
#InboundChannelAdapter(value = "irsDataPrepJobInputWeekdayChannel", poller = #Poller(cron="${batch.job.schedule.cron.weekdays.irsDataPrepJobRunner}", maxMessagesPerPoll="1" ), autoStartup = "${batch.job.schedule.cron.weekdays.irsDataPrepJobRunner.autoStartup}")
public MessageSource<JobLaunchRequest> pollIrsDataPrepWeekdayJob() {
return () -> new GenericMessage<>(requestIrsDataPrepWeekdayJob());
}
The property files are as follows. Property file for Tomcat instance 1:
# I wish for this job to run on Tomcat instance 1
batch.job.schedule.cron.riStateAgencyTransmissionJobRunner=0 50 14 * * *
# since autoStartup defaults to true, I do not provide:
#batch.job.schedule.cron.riStateAgencyTransmissionJobRunner.autoStartup=true
# I do NOT wish for this job to run on Tomcat instance 1
batch.job.schedule.cron.weekdays.irsDataPrepJobRunner.autoStartup=false
# need to supply as poller has a cron placeholder
batch.job.schedule.cron.weekdays.irsDataPrepJobRunner=0 0/7 * * * 1-5
Property file for Tomcat instance 2:
# I wish for this job to run on Tomcat instance 2
batch.job.schedule.cron.weekdays.irsDataPrepJobRunner=0 0/7 * * * 1-5
# since autoStartup defaults to true, I do not provide:
#batch.job.schedule.cron.weekdays.irsDataPrepJobRunner.autoStartup=true
# I do NOT wish for this job to run on Tomcat instance 2
batch.job.schedule.cron.riStateAgencyTransmissionJobRunner.autoStartup=false
# need to supply as poller has a cron placeholder
batch.job.schedule.cron.riStateAgencyTransmissionJobRunner=0 50 14 * * *
The properties files are passed as a VM option, e.g. "-Druntime.scheduler=dev1". I cannot disable the poller on one of the JVMs using "-" as the cron expression -- something similar to the ask here: Poller annotation with cron expression should support a special disable character
My goal of being able to call the job manually from either Tomcat instance 1 or Tomcat instance 2 is working. My problem with the setup mentioned above, is that none of the pollers are firing as per their cron expression.

Consider to investigate a leader election pattern: https://docs.spring.io/spring-integration/docs/current/reference/html/messaging-endpoints.html#leadership-event-handling.
This way you have those endpoints in non-started state by default and in the same role. The election is going to chose a leader and start only this one.
Of course there has to be some shared external service to control leadership.

Related

Is there a function in celery for finding waiting messages in a queue? [duplicate]

How can I retrieve a list of tasks in a queue that are yet to be processed?
EDIT: See other answers for getting a list of tasks in the queue.
You should look here:
Celery Guide - Inspecting Workers
Basically this:
my_app = Celery(...)
# Inspect all nodes.
i = my_app.control.inspect()
# Show the items that have an ETA or are scheduled for later processing
i.scheduled()
# Show tasks that are currently active.
i.active()
# Show tasks that have been claimed by workers
i.reserved()
Depending on what you want
If you are using Celery+Django simplest way to inspect tasks using commands directly from your terminal in your virtual environment or using a full path to celery:
Doc: http://docs.celeryproject.org/en/latest/userguide/workers.html?highlight=revoke#inspecting-workers
$ celery inspect reserved
$ celery inspect active
$ celery inspect registered
$ celery inspect scheduled
Also if you are using Celery+RabbitMQ you can inspect the list of queues using the following command:
More info: https://linux.die.net/man/1/rabbitmqctl
$ sudo rabbitmqctl list_queues
if you are using rabbitMQ, use this in terminal:
sudo rabbitmqctl list_queues
it will print list of queues with number of pending tasks. for example:
Listing queues ...
0b27d8c59fba4974893ec22d478a7093 0
0e0a2da9828a48bc86fe993b210d984f 0
10#torob2.celery.pidbox 0
11926b79e30a4f0a9d95df61b6f402f7 0
15c036ad25884b82839495fb29bd6395 1
celerey_mail_worker#torob2.celery.pidbox 0
celery 166
celeryev.795ec5bb-a919-46a8-80c6-5d91d2fcf2aa 0
celeryev.faa4da32-a225-4f6c-be3b-d8814856d1b6 0
the number in right column is number of tasks in the queue. in above, celery queue has 166 pending task.
If you don't use prioritized tasks, this is actually pretty simple if you're using Redis. To get the task counts:
redis-cli -h HOST -p PORT -n DATABASE_NUMBER llen QUEUE_NAME
But, prioritized tasks use a different key in redis, so the full picture is slightly more complicated. The full picture is that you need to query redis for every priority of task. In python (and from the Flower project), this looks like:
PRIORITY_SEP = '\x06\x16'
DEFAULT_PRIORITY_STEPS = [0, 3, 6, 9]
def make_queue_name_for_pri(queue, pri):
"""Make a queue name for redis
Celery uses PRIORITY_SEP to separate different priorities of tasks into
different queues in Redis. Each queue-priority combination becomes a key in
redis with names like:
- batch1\x06\x163 <-- P3 queue named batch1
There's more information about this in Github, but it doesn't look like it
will change any time soon:
- https://github.com/celery/kombu/issues/422
In that ticket the code below, from the Flower project, is referenced:
- https://github.com/mher/flower/blob/master/flower/utils/broker.py#L135
:param queue: The name of the queue to make a name for.
:param pri: The priority to make a name with.
:return: A name for the queue-priority pair.
"""
if pri not in DEFAULT_PRIORITY_STEPS:
raise ValueError('Priority not in priority steps')
return '{0}{1}{2}'.format(*((queue, PRIORITY_SEP, pri) if pri else
(queue, '', '')))
def get_queue_length(queue_name='celery'):
"""Get the number of tasks in a celery queue.
:param queue_name: The name of the queue you want to inspect.
:return: the number of items in the queue.
"""
priority_names = [make_queue_name_for_pri(queue_name, pri) for pri in
DEFAULT_PRIORITY_STEPS]
r = redis.StrictRedis(
host=settings.REDIS_HOST,
port=settings.REDIS_PORT,
db=settings.REDIS_DATABASES['CELERY'],
)
return sum([r.llen(x) for x in priority_names])
If you want to get an actual task, you can use something like:
redis-cli -h HOST -p PORT -n DATABASE_NUMBER lrange QUEUE_NAME 0 -1
From there you'll have to deserialize the returned list. In my case I was able to accomplish this with something like:
r = redis.StrictRedis(
host=settings.REDIS_HOST,
port=settings.REDIS_PORT,
db=settings.REDIS_DATABASES['CELERY'],
)
l = r.lrange('celery', 0, -1)
pickle.loads(base64.decodestring(json.loads(l[0])['body']))
Just be warned that deserialization can take a moment, and you'll need to adjust the commands above to work with various priorities.
To retrieve tasks from backend, use this
from amqplib import client_0_8 as amqp
conn = amqp.Connection(host="localhost:5672 ", userid="guest",
password="guest", virtual_host="/", insist=False)
chan = conn.channel()
name, jobs, consumers = chan.queue_declare(queue="queue_name", passive=True)
A copy-paste solution for Redis with json serialization:
def get_celery_queue_items(queue_name):
import base64
import json
# Get a configured instance of a celery app:
from yourproject.celery import app as celery_app
with celery_app.pool.acquire(block=True) as conn:
tasks = conn.default_channel.client.lrange(queue_name, 0, -1)
decoded_tasks = []
for task in tasks:
j = json.loads(task)
body = json.loads(base64.b64decode(j['body']))
decoded_tasks.append(body)
return decoded_tasks
It works with Django. Just don't forget to change yourproject.celery.
This worked for me in my application:
def get_celery_queue_active_jobs(queue_name):
connection = <CELERY_APP_INSTANCE>.connection()
try:
channel = connection.channel()
name, jobs, consumers = channel.queue_declare(queue=queue_name, passive=True)
active_jobs = []
def dump_message(message):
active_jobs.append(message.properties['application_headers']['task'])
channel.basic_consume(queue=queue_name, callback=dump_message)
for job in range(jobs):
connection.drain_events()
return active_jobs
finally:
connection.close()
active_jobs will be a list of strings that correspond to tasks in the queue.
Don't forget to swap out CELERY_APP_INSTANCE with your own.
Thanks to #ashish for pointing me in the right direction with his answer here: https://stackoverflow.com/a/19465670/9843399
The celery inspect module appears to only be aware of the tasks from the workers perspective. If you want to view the messages that are in the queue (yet to be pulled by the workers) I suggest to use pyrabbit, which can interface with the rabbitmq http api to retrieve all kinds of information from the queue.
An example can be found here:
Retrieve queue length with Celery (RabbitMQ, Django)
I think the only way to get the tasks that are waiting is to keep a list of tasks you started and let the task remove itself from the list when it's started.
With rabbitmqctl and list_queues you can get an overview of how many tasks are waiting, but not the tasks itself: http://www.rabbitmq.com/man/rabbitmqctl.1.man.html
If what you want includes the task being processed, but are not finished yet, you can keep a list of you tasks and check their states:
from tasks import add
result = add.delay(4, 4)
result.ready() # True if finished
Or you let Celery store the results with CELERY_RESULT_BACKEND and check which of your tasks are not in there.
As far as I know Celery does not give API for examining tasks that are waiting in the queue. This is broker-specific. If you use Redis as a broker for an example, then examining tasks that are waiting in the celery (default) queue is as simple as:
connect to the broker
list items in the celery list (LRANGE command for an example)
Keep in mind that these are tasks WAITING to be picked by available workers. Your cluster may have some tasks running - those will not be in this list as they have already been picked.
The process of retrieving tasks in particular queue is broker-specific.
I've come to the conclusion the best way to get the number of jobs on a queue is to use rabbitmqctl as has been suggested several times here. To allow any chosen user to run the command with sudo I followed the instructions here (I did skip editing the profile part as I don't mind typing in sudo before the command.)
I also grabbed jamesc's grep and cut snippet and wrapped it up in subprocess calls.
from subprocess import Popen, PIPE
p1 = Popen(["sudo", "rabbitmqctl", "list_queues", "-p", "[name of your virtula host"], stdout=PIPE)
p2 = Popen(["grep", "-e", "^celery\s"], stdin=p1.stdout, stdout=PIPE)
p3 = Popen(["cut", "-f2"], stdin=p2.stdout, stdout=PIPE)
p1.stdout.close()
p2.stdout.close()
print("number of jobs on queue: %i" % int(p3.communicate()[0]))
If you control the code of the tasks then you can work around the problem by letting a task trigger a trivial retry the first time it executes, then checking inspect().reserved(). The retry registers the task with the result backend, and celery can see that. The task must accept self or context as first parameter so we can access the retry count.
#task(bind=True)
def mytask(self):
if self.request.retries == 0:
raise self.retry(exc=MyTrivialError(), countdown=1)
...
This solution is broker agnostic, ie. you don't have to worry about whether you are using RabbitMQ or Redis to store the tasks.
EDIT: after testing I've found this to be only a partial solution. The size of reserved is limited to the prefetch setting for the worker.
from celery.task.control import inspect
def key_in_list(k, l):
return bool([True for i in l if k in i.values()])
def check_task(task_id):
task_value_dict = inspect().active().values()
for task_list in task_value_dict:
if self.key_in_list(task_id, task_list):
return True
return False
With subprocess.run:
import subprocess
import re
active_process_txt = subprocess.run(['celery', '-A', 'my_proj', 'inspect', 'active'],
stdout=subprocess.PIPE).stdout.decode('utf-8')
return len(re.findall(r'worker_pid', active_process_txt))
Be careful to change my_proj with your_proj
To get the number of tasks on a queue you can use the flower library, here is a simplified example:
from flower.utils.broker import Broker
from django.conf import settings
def get_queue_length(queue):
broker = Broker(settings.CELERY_BROKER_URL)
queues_result = broker.queues([queue])
return queues_result.result()[0]['messages']

Spring Cloud Data Flow - Partition Batch Job using Spring Cloud Kubernetes Deployer : Environment properties not getting passed to worker pods

I modified the dataflow sample app partitioned-batch-job to deploy it in a kubernetes cluster via the SCDF server that is running in the cluster. I used the dashboard UI to launch this app as a task. Modified the code for the partitionHandler() (DeployerPartitionHandler bean) as shown below. I am facing an issue with the worker pods being launched without the spring datasource properties from the master pod environment. I confirmed the master step environment has the right values for these properties and it is being set in the DeployerPartitionHandler bean as below:
partitionHandler
.setEnvironmentVariablesProvider(new SimpleEnvironmentVariablesProvider(this.environment));
I worked around it for now by passing these as command line arguments as indicated in the code below. Appreciate any inputs on why the environment properties are not available in the worker pods.
Also confirmed that the "spring cloud deployer local" flavor of this app runs on my local machine without this problem via "java -jar target/<app.jar>" .
#Bean
public PartitionHandler partitionHandler(TaskLauncher taskLauncher, JobExplorer jobExplorer,
TaskRepository taskRepository) throws Exception {
DockerResourceLoader dockerResourceLoader = new DockerResourceLoader();
Resource resource = dockerResourceLoader.getResource(config.getDockerResourceLocation());
logger.info("Docker Resource URI: " + config.getDockerResourceLocation());
DeployerPartitionHandler partitionHandler =
new DeployerPartitionHandler(taskLauncher, jobExplorer, resource, "workerStep", taskRepository);
List<String> commandLineArgs = new ArrayList<>(8);
commandLineArgs.add("--spring.profiles.active=worker");
commandLineArgs.add("--spring.cloud.task.initialize-enabled=false");
commandLineArgs.add("--spring.batch.initializer.enabled=false");
// Passing these properties in command line as worker tasks are not getting them from the environment
commandLineArgs.add("--spring.datasource.url=jdbc:mysql://10.141.22.143:3306/scdf?useSSL=false");
commandLineArgs.add("--spring.datasource.username=dbuser");
commandLineArgs.add("--spring.datasource.password=dbpassword");
commandLineArgs.add("--spring.datasource.driverClassName=org.mariadb.jdbc.Driver");
partitionHandler
.setCommandLineArgsProvider(new PassThroughCommandLineArgsProvider(commandLineArgs));
partitionHandler
.setEnvironmentVariablesProvider(new SimpleEnvironmentVariablesProvider(this.environment));
partitionHandler.setMaxWorkers(2);
partitionHandler.setApplicationName("PartitionedBatchJobTask");
return partitionHandler;
}
Included below is the worker pod configuration YAML that shows SCDF is already passing the datasource properties as command line, but it is still not being used (the datasource defaults to H2 embedded and not MYSQL as the URL property indicates). The properties are bunched up together (as part of the sun.java.command property) if that matters.
- --sun.boot.library.path=/opt/openjdk/lib/amd64
- --KUBERNETES_PORT_443_TCP=tcp://10.233.0.1:443
- --sun.java.command=io.spring.PartitionedBatchJobApplication --management.metrics.tags.service=task-application
--spring.datasource.username=dbuser --spring.datasource.url=jdbc:mysql://10.141.22.143:3306/scdf?useSSL=false
--spring.datasource.driverClassName=org.mariadb.jdbc.Driver --management.metrics.tags.application=partitioned-batch-job-89
--docker-resource-location=docker:vrajkuma/partitioned-batch-job:2.3.1-SNAPSHOT
--spring.cloud.task.name=partitioned-batch-job --spring.datasource.password=dbpassword
--spring.cloud.task.executionid=89
- --sun.cpu.endian=little
- --MY_SCDF_SPRING_CLOUD_DATAFLOW_SKIPPER_SERVICE_PORT_HTTP=80

How to connect searchkick (in a Rails app &/ Sidekiq job) to multiple elasticsearch clusters without stomping on global searckick config?

Upon startup my app sets my (?global?) searchkick client to point at my default elasticsearch cluster.
Searchkick.client = Elasticsearch::Client.new(
hosts: default_cluster, # this is the list of hosts in my default cluster
retry_on_failure: true,
)
However, I am upgrading my cluster (again), and while I'd like to be able to have my app read/search from that default cluster,
/search?q="some term"
# =>
Model.search("some term")
continue to work against the default_cluster
Where it starts to get a bit tricky is that:
I'd also like (via some specific ?sidekiq background jobs?) to fill an alternate (alt) cluster's index, something like:
Model.connect_to(alternate_cluster) {|client|
Searchkick.client = client
Model.reindex
}
Without causing all other background jobs to interact with the alternate cluster.
And, of course:
I'd like some way to verify that the alternate_cluster is working well (i.e. for search) before making it my default_cluster. And presumably via some admin route:
/admin/search?q="some search term"&cluster=alternate
# =>
Model.connect_to(alternate_cluster) {|client|
Searchkick.client = client
Model.search("some term")
}
And finally:
I'd like to avoid having to reconnect before every search/reindex action, i.e. I'd prefer not to have the overhead of changing (also because that probably implies that long-running tasks that continue to reconnect to searchkick will be swapping back and-forth from one cluster to the other):
Model.search("some term")
# =>
Model.connect_to(alternate_cluster) {|client|
Searchkick.client = client
Model.search("some term")
}
^ I don't want that
FWIW, the best I've been able to come-up with so far is something like:
def self.connect_to(current_cluster, &block)
previous_es_client = Searchkick.client
current_es_client = Elasticsearch::Client.new(
hosts: current_cluster,
retry_on_failure: true,
)
block.call(current_es_client)
rescue Exception => e
logger.warn(e)
ensure
Searchkick.client = previous_es_client
end
But, I suspect that will cause every other interaction within my system (via the same web-worker or other background jobs running in the same background-worker-instance) to (temporarily) point at the alternate cluster.
Thanks in advance for your assistance...

Buildbot slaves priority

Problem
I have set up a latent slave in buildbot to help avoid congestion.
I've set up my builds to run either in permanent slave or latent one. The idea is the latent slave is waken up only when needed but the result is that buildbot randomly selectes one slave or the other so sometimes I have to wait for the latent slave to wake even if the permanent one is idle.
Is there a way to prioritize buildbot slaves?
Attempted solutions
1. Custom nextSlave
Following #david-dean suggestion, I've created a nextSlave function as follows (updated to working version):
from twisted.python import log
import traceback
def slave_selector(builder, builders):
try:
host = None
support = None
for builder in builders:
if builder.slave.slavename == 'host-slave':
host = builder
elif builder.slave.slavename == 'support-slave':
support = builder
if host and support and len(support.slave.slave_status.runningBuilds) < len(host.slave.slave_status.runningBuilds):
log.msg('host-slave has many running builds, launching build in support-slave')
return support
if not support:
log.msg('no support slave found, launching build in host-slave')
elif not host:
log.msg('no host slave found, launching build in support-slave')
return support
else:
log.msg('launching build in host-slave')
return host
except Exception as e:
log.err(str(e))
log.err(traceback.format_exc())
log.msg('Selecting random slave')
return random.choice(buildslaves)
And then passed it to BuilderConfig.
The result is that I get this in twistd.log:
2014-04-28 11:01:45+0200 [-] added buildset 4329 to database
But the build never starts, in the web UI it always appear as Pending and none of the logs I've put appear in twistd.log
2. Trying to mimic default behavior
I've having a look to buildbot code, to see how it is done by default.
in file ./master/buildbot/process/buildrequestdistributor.py, class BasicBuildChooser you have:
self.nextSlave = self.bldr.config.nextSlave
if not self.nextSlave:
self.nextSlave = lambda _,slaves: random.choice(slaves) if slaves else None
So I've set exactly that lambda function in my BuilderConfig and I'm getting exactly the same build not starting result.
You can set up a nextSlave function to assign slaves to a builder in a custom manner see: http://docs.buildbot.net/current/manual/cfg-builders.html#builder-configuration

How do I programatically invoke a Felix/Karaf shell command?

I want to automatically invoke the Karaf "dev:watch" command if I detect that I'm running in a dev environment. I've considered adding dev:watch * directly to etc/shell.init.script but I don't want it to run unconditionally. So, I'm considering creating a simple service that checks a Java property (something simple like -Ddevelopment=true) and invokes org.apache.karaf.shell.dev.Watch itself. I think I can ask OSGi for a Function instance with (&(osgi.command.function=watch)(osgi.command.scope=dev)) but then I need to create a mock CommandSession just to invoke it. That just seems too complicated. Is there a better approach?
Since Apache Karaf 3.0.0 most commands are backed by OSGi services.
So for example the bundle:watch command is using the service
"org.apache.karaf.bundle.core.BundleWatcher".
So just bind this service and you can call the bundle:watch functionality very conveniently.
It has been a while since the question, but I will answer.
You need to use CommandSession class, it is not trivial. This blog post could guide you. It is related with Pax Exam, but could be applied in any situation. There are more alternatives like using a remote SSH client or even better the remote JXM management console (reference).
The init script can also be used to test for a condition and run a command if that condition is satisfied, so there is no need to create a command session yourself.
The Karaf source itself reveals one answer:
In the KarafTestSupport class which is used for integration testing Karaf itself
(see https://git-wip-us.apache.org/repos/asf?p=karaf.git;a=blob;f=itests/src/test/java/org/apache/karaf/itests/KarafTestSupport.java;h=ebdea09ae8c6d926c8e4ac1fae6672f2c00a53dc;hb=HEAD)
The relevant method starts:
/**
* Executes a shell command and returns output as a String.
* Commands have a default timeout of 10 seconds.
*
* #param command The command to execute.
* #param timeout The amount of time in millis to wait for the command to execute.
* #param silent Specifies if the command should be displayed in the screen.
* #param principals The principals (e.g. RolePrincipal objects) to run the command under
* #return
*/
protected String executeCommand(final String command, final Long timeout, final Boolean silent, final Principal ... principals) {
waitForCommandService(command);
String response;
final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
final PrintStream printStream = new PrintStream(byteArrayOutputStream);
final SessionFactory sessionFactory = getOsgiService(SessionFactory.class);
final Session session = sessionFactory.create(System.in, printStream, System.err);
//
//
//
// Snip