How to know that all Celery tasks are completed?

I am iterating through all the cities here and creating Celery tasks (city_explore) asynchronously. I need to update the status to "Explore Completed" in the DB after all tasks have completed. If this were a normal function it would be easy, but how do I do it when the tasks are running in Celery? How would I know when all tasks are completed for each city so I can update the status in the DB? Please help.
Here is the relevant code:
cities = ['a', 'b', 'c']
for city in cities:
    city_explore.delay(city, distance)

@app.task
def city_explore(name, dis):
    explorer(name, dis)

Yes, you could do it that way, but then you would have to code the polling of task statuses yourself... A more idiomatic approach is to use Celery Workflows; the chord primitive would probably be good enough for you.
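A chord runs a group of tasks in parallel and fires a callback task once all of them have finished (note that chords require a result backend to be configured). Here is a minimal sketch; mark_explore_completed and update_status are placeholder names for whatever DB update you already do:
from celery import chord

@app.task
def mark_explore_completed(results):
    # Called exactly once, after every city_explore task in the header
    # has finished; results holds their return values.
    update_status("Explore Completed")  # placeholder for your DB update

cities = ['a', 'b', 'c']
header = [city_explore.s(city, distance) for city in cities]
chord(header)(mark_explore_completed.s())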

Related

find batch-job by label client-go

How do I list batch Jobs by a label selector? I want to list Jobs with a certain label, such as type: upgrade.
I am looking for the label-selector fields to use while querying Jobs from client-go.
My mistake was trying to use the .Get() method to find a Job by labelSelector, which had me working hard in the wrong direction. To get Jobs by label selector, you have to use the .List() method. Here is how you can list all Jobs with a given label selector:
label := "type=upgrade,name=jiva-upgrade"
jobs, err := k.K8sCS.BatchV1().Jobs(namespace).List(context.TODO(), metav1.ListOptions{LabelSelector: label})
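For context, here is a self-contained sketch of the same call (the clientset and namespace parameters, the error handling, and the printed fields are illustrative assumptions, not part of the original snippet):
import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

func listUpgradeJobs(clientset *kubernetes.Clientset, namespace string) error {
    // List only the Jobs matching the label selector.
    label := "type=upgrade,name=jiva-upgrade"
    jobs, err := clientset.BatchV1().Jobs(namespace).List(context.TODO(),
        metav1.ListOptions{LabelSelector: label})
    if err != nil {
        return err
    }
    for _, job := range jobs.Items {
        // Each item is a batchv1.Job; print its name and completion count.
        fmt.Println(job.Name, job.Status.Succeeded)
    }
    return nil
}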

How to Bulk import with validations and create versions using paper_trail?

I have an array of hashes (around 20k) that I need to import, but with validations, and I have to create the versions as well using paper_trail.
arr = [{some_1_id: '1', some_2_id: '2', some_3_id: '3', amount: '123'}, {some_1_id: '1', some_2_id: '2', some_3_id: '3', amount: '123'}]
We can expect around 20k elements like this that need to be saved to the DB.
I have used activerecord-import, but it bypasses all of the callbacks, including the paper_trail callbacks.
I tried to run the callbacks like this:
books.each do |book|
  book.run_callbacks(:save) { false }
  book.run_callbacks(:create) { false }
end
Book.import(books)
But it's not saving the paper_trail versions correctly.
What I did instead was this:
valid_books = []
arr.each do |a|
  book_obj = Book.new(a)
  # validate some_1_id
  # validate some_2_id
  # validate some_3_id
  # some other validations
  valid_books << book_obj if book_obj.valid?
end
# run callbacks
Book.import valid_books
It works as expected, but the performance is not good at all: it takes more than 30 seconds for 10k records.
Thanks in advance.
it's taking more than 30 secs for 10k records.
30s is within the expected performance range for validating and creating 10k versioned records normally (i.e. new, valid?, save, with PT callbacks). In some applications, 30s would be considered fast, especially when validations and other callbacks perform queries of their own.
I recommend putting the import process in a background job and simply asking the user to wait.
If the import is already in a background job, or a background job is not possible, you could explore using activerecord-import twice: first to create the 30k Book records, and then to create the 30k PaperTrail::Version records. This is an advanced technique that uses private (unsupported) API of PT. To correctly instantiate the PaperTrail::Version objects, use PaperTrail::Events::Create, as PaperTrail::RecordTrail#build_version_on_create does. If you are successful with this technique, I'd gladly review a PR adding it to our README.
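For what it's worth, here is a rough sketch of that two-pass idea. It leans on PT's private internals (PaperTrail::Events::Create and its data method), which can change between gem versions, and it assumes an adapter where activerecord-import writes the generated primary keys back onto the models (e.g. PostgreSQL), so verify every call against the PT version you actually run:
# Pass 1: bulk-insert the records themselves (no callbacks, no versions).
Book.import(valid_books)

# Pass 2: build one version row per record via PT's private event API,
# then bulk-insert the versions. Private API: may break on upgrade.
versions = valid_books.map do |book|
  event = PaperTrail::Events::Create.new(book, false)
  PaperTrail::Version.new(event.data)
end
PaperTrail::Version.import(versions)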

How to constrain delayed_job processing based on kubernetes cluster

I am looking for a way to isolate which of my review environments process which jobs.
We are using delayed_job, and I am running some Kubernetes alias clusters based on a master cluster.
Is this at all possible? I found a simple way to prefix the worker's name, but I can't find a way to pass this on to the actual job.
Any help is appreciated.
The way I figured it should work is something like this. I'm not sure if this is the right way to go, though; perhaps the same thing could be achieved using the lifecycle events? That is, just add a column, then use the lifecycle events to set the data and query it?
Crossposted to collectiveidea/delayed_job/issues/1125
Eventually, I ended up with the following solution: add a varchar column named cluster to the delayed_jobs table, and BOOM, it works like a charm.
require 'delayed/backend/active_record'

module Delayed
  module Backend
    module ActiveRecord
      class Configuration
        attr_accessor :cluster
      end

      # A job object that is persisted to the database.
      # Contains the work object as a YAML field.
      class Job < ::ActiveRecord::Base
        READY_SQL = <<~SQL.squish.freeze
          ((cluster = ? AND run_at <= ? AND (locked_at IS NULL OR locked_at < ?)) OR locked_by = ?) AND failed_at IS NULL
        SQL

        before_save :set_cluster

        def self.ready_to_run(worker_name, max_run_time)
          where(READY_SQL, cluster, db_time_now, db_time_now - max_run_time, worker_name)
        end

        # When a worker is exiting, make sure we don't have any locked jobs.
        def self.clear_locks!(worker_name)
          where(cluster: cluster, locked_by: worker_name)
            .update_all(locked_by: nil, locked_at: nil) # rubocop:disable Rails/SkipsModelValidations
        end

        def self.cluster
          Delayed::Backend::ActiveRecord.configuration.cluster
        end

        def set_cluster
          self.cluster ||= self.class.cluster
        end
      end
    end
  end
end

Delayed::Backend::ActiveRecord.configuration.cluster = ENV['CLUSTER'] if ENV['CLUSTER']
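For completeness, the cluster column itself can be added with an ordinary migration; a sketch (the migration class name, Rails version, and the index are my assumptions):
class AddClusterToDelayedJobs < ActiveRecord::Migration[6.0]
  def change
    add_column :delayed_jobs, :cluster, :string  # the column the patched Job reads/writes
    add_index :delayed_jobs, :cluster            # assumption: speeds up the READY_SQL lookup
  end
end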

HTCondor job submission tags

I want to run different batches of jobs on our HTCondor pool, say 10 jobs of Type1, 20 jobs of Type2, and so on. Each of these job types should get new jobs when the current ones are finished.
With just one type, I simply query whether all jobs are finished or whether the time limit for the whole batch has passed. If one of these conditions is met, the next iteration of x jobs is submitted to the cluster.
This is done by a small function (written in Lua, which is not really important for the question):
function WaitForSims(CheckupDelay)
  while io.popen([[condor_q -format "%d\n" clusterid]]):read('*all'):len() ~= 0 do
    os.execute("echo Checkup timestamp: "..os.date("%x %X"))
    os.execute(string.format("timeout %d 1>nul", CheckupDelay))
  end
end
Is there a possibility to separate the jobs of Type1, Type2, and Type3 and check them independently? Currently the check covers all jobs running as my current user.
Adding a tag or something to the jobs would be ideal, as I could simply change the checkup call. I couldn't find anything easy to add in the documentation; I could remember the job IDs, but then I'd have to store those, adding more complexity.
Linked Answer
The solution can be found in the linked answer; I didn't find where it is described in the documentation, though.
In the job.sub file add:
+YourCustomVarName = 1
+YourCustomStringName = "String"
To check against it, use:
condor_q -constraint 'YourCustomVarName == 1' -f "%s" JobStatus
or
condor_q -constraint "YourCustomStringName == \"String\"" -f "%s" JobStatus
(quoting may vary by shell)
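Tying this back to the checkup function from the question, the constraint slots straight into the condor_q call. A sketch (MyJobType is a made-up attribute name; use whatever +Name you set in the submit file):
function WaitForSims(CheckupDelay, jobType)
  -- Count only the jobs carrying our custom tag; the %%d survives
  -- string.format as a literal %d for condor_q's own format string.
  local cmd = string.format(
    [[condor_q -constraint "MyJobType == %d" -format "%%d\n" clusterid]],
    jobType)
  while io.popen(cmd):read('*all'):len() ~= 0 do
    os.execute("echo Checkup timestamp: "..os.date("%x %X"))
    os.execute(string.format("timeout %d 1>nul", CheckupDelay))
  end
end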

Can a list of strings be supplied to Capistrano tasks?

I have a task whose command in 'run' is the same except for a single value, which would be one out of a list of potential values. What I would like to do is use this list of values to define the tasks, and then use each value in the command defined in 'run'. The point is that it would be great to define the tasks in such a way that I don't have to repeat nearly identical task definitions for each value.
For example: I want a task that will get the status of a single program from a list of programs that I have defined in an array. I would like to define the task something like this:
set programs = %w["postfix", "nginx", "pgpool"]
programs.each do |program|
  desc "#{program} status"
  task :#{program} do
    run "/etc/init.d/#{program} status"
  end
end
This obviously doesn't work, but hopefully it shows what I am attempting here.
Thoughts?
Well, I answered my own question... with a little trial and error. I also did the same thing with namespaces, so the control of services is nice and elegant. It works quite nicely!
set :programs, %w[postfix nginx pgpool]
set :init_commands, %w[status start stop]

# init.d service control
init_commands.each do |init_command|
  namespace :"#{init_command}" do
    programs.each do |program|
      desc "#{program} #{init_command}"
      task :"#{program}" do
        run "/etc/init.d/#{program} #{init_command}"
      end
    end
  end
end
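With this in place, Capistrano generates one task per command/program pair, so the services can be driven as, for example:
cap status:nginx   # runs /etc/init.d/nginx status
cap stop:pgpool    # runs /etc/init.d/pgpool stop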