How to constrain delayed_job processing based on kubernetes cluster

I am looking for a way to isolate which of my review environments processes which jobs.
We are using delayed_job, and I am running several Kubernetes alias clusters based on a master cluster.
Is this at all possible? I found a simple way to prefix the worker's name, but I can't find a way to pass that on to the actual job.
Any help is appreciated.
The way I figured it should work is something like this, though I'm not sure it's the right approach. Perhaps the same thing could be achieved using the lifecycle events? That is, I add a column, use the lifecycle events to populate it, and then query on it.
Crossposted to collectiveidea/delayed_job/issues/1125

Eventually, I ended up with the following solution. Add a varchar column named cluster to the delayed_jobs table and BOOM. Works like a charm.
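For reference, a minimal sketch of that migration (the class name and migration version are assumptions; adjust to your Rails version):

class AddClusterToDelayedJobs < ActiveRecord::Migration[5.2]
  def change
    # varchar column used to tag each job with the cluster that created it
    add_column :delayed_jobs, :cluster, :string
  end
end

The patch below then scopes job pick-up by that column.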
require 'delayed/backend/active_record'

module Delayed
  module Backend
    module ActiveRecord
      class Configuration
        attr_accessor :cluster
      end

      # A job object that is persisted to the database.
      # Contains the work object as a YAML field.
      class Job < ::ActiveRecord::Base
        READY_SQL = <<~SQL.squish.freeze
          ((cluster = ? AND run_at <= ? AND (locked_at IS NULL OR locked_at < ?)) OR locked_by = ?) AND failed_at IS NULL
        SQL

        before_save :set_cluster

        def self.ready_to_run(worker_name, max_run_time)
          where(READY_SQL, cluster, db_time_now, db_time_now - max_run_time, worker_name)
        end

        # When a worker is exiting, make sure we don't have any locked jobs.
        def self.clear_locks!(worker_name)
          where(cluster: cluster, locked_by: worker_name)
            .update_all(locked_by: nil, locked_at: nil) # rubocop:disable Rails/SkipsModelValidations
        end

        def self.cluster
          Delayed::Backend::ActiveRecord.configuration.cluster
        end

        def set_cluster
          self.cluster ||= self.class.cluster
        end
      end
    end
  end
end
Delayed::Backend::ActiveRecord.configuration.cluster = ENV['CLUSTER'] if ENV['CLUSTER']
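Each worker then only picks up jobs tagged with its own cluster. In Kubernetes the CLUSTER variable would typically come from the Deployment's env section; locally, a sketch (the cluster name here is made up) looks like:

CLUSTER=review-pr-123 bundle exec rake jobs:work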

Related

How to get the id of the run from within a component?

I'm doing some experimentation with Kubeflow Pipelines and I'm interested in retrieving the run id to save along with some metadata about the pipeline execution. Is there any way I can do so from a component like a ContainerOp?
You can use kfp.dsl.EXECUTION_ID_PLACEHOLDER and kfp.dsl.RUN_ID_PLACEHOLDER as arguments for your component. At runtime they will be replaced with the actual values.
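For example, a minimal sketch using the v1 kfp.dsl API (the pipeline name and image are placeholders):

from kfp import dsl

@dsl.pipeline(name='run-id-demo')
def run_id_demo():
    # Argo substitutes the placeholder with the actual run id at runtime.
    dsl.ContainerOp(
        name='print-run-id',
        image='alpine:3',
        command=['echo', dsl.RUN_ID_PLACEHOLDER],
    )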
I tried to do this using the Python DSL, but it seems that isn't possible right now.
The only option I found is the method used in this sample code: you declare a string containing {{workflow.uid}}, and it is replaced with the actual value at execution time.
You can do the same to get the pod name, which would be {{pod.name}}.
Since Kubeflow Pipelines relies on Argo, you can use Argo variables to get what you want.
For example,
from kfp import dsl
from kfp.components import func_to_container_op

@func_to_container_op
def dummy(run_id, run_name) -> str:
    return f'{run_id} {run_name}'

@dsl.pipeline(
    name='test_pipeline',
)
def test_pipeline():
    dummy('{{workflow.labels.pipeline/runid}}',
          '{{workflow.annotations.pipelines.kubeflow.org/run_name}}')
You will find that the placeholders will be replaced with the correct run_id and run_name.
For more Argo variables, see: https://github.com/argoproj/argo-workflows/blob/master/docs/variables.md
To see what is recorded in the labels and annotations of a Kubeflow Pipelines run, get the corresponding workflow from Kubernetes:
kubectl get workflow/XXX -oyaml
create_run_from_pipeline_func returns a RunPipelineResult, which has a run_id attribute:
client = kfp.Client(host)
result = client.create_run_from_pipeline_func(…)
result.run_id
Your component's container should have an environment variable called HOSTNAME that is set to its unique pod name, from which you can derive all the necessary metadata.
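For example, a quick sketch of reading it from inside the container:

import os

# Kubernetes sets HOSTNAME to the pod name inside each container.
pod_name = os.environ.get('HOSTNAME')
print(f'running in pod {pod_name}')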

How to reference a DAG's execution date inside of a `KubernetesPodOperator`?

I am writing an Airflow DAG to pull data from an API and store it in a database I own. Following best practices outlined in We're All Using Airflow Wrong, I'm writing the DAG as a sequence of KubernetesPodOperators that run pretty simple Python functions as the entry point to the Docker image.
The problem I'm trying to solve is that this DAG should only pull data for the execution_date.
If I were using a PythonOperator (doc), I could use the provide_context argument to make the execution date available to the function. But judging from the KubernetesPodOperator's documentation, the Kubernetes operator seems to have no argument that does what provide_context does.
My best guess is that you could use the arguments parameter to pass in the date, and since it's templated, you can reference it like this:
my_pod_operator = KubernetesPodOperator(
    # ... other args here
    arguments=['python', 'my_script.py', '{{ ds }}'],
    # ... arguments continue
)
And then you'd read the execution date like any other argument provided to a Python file run as a script, using sys.argv.
Is this the right way of doing it?
Thanks for the help.
Yes, that is the correct way of doing it.
Each Operator has template_fields. All the parameters listed in template_fields can render Jinja2 templates and Airflow macros.
For KubernetesPodOperator, if you check docs, you would find:
template_fields = ['cmds', 'arguments', 'env_vars', 'config_file']
which means you can pass '{{ ds }}' to any of the four parameters listed above.
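For completeness, a sketch of how the my_script.py from the question could read the rendered value (the date parsing is just an assumption about what the script does with it):

import sys
from datetime import date

# Airflow renders '{{ ds }}' before the pod is created, so the execution
# date arrives as a plain command-line argument such as '2021-06-01'.
execution_date = date.fromisoformat(sys.argv[1])
print(f'pulling data for {execution_date}')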

Get update db (in postgres) notifications in phoenix

I need to know when a row changes in my DB. I'm using Phoenix 1.2.4. I already have triggers set up in Postgres, but I actually don't know if I need them.
Do you know how could I solve my problem?
NOTE: The database isn't necessarily changed from the controllers; I have a cron job that updates some parts.
I saw this tutorial (Publish/subscribe with PostgreSQL and Phoenix Framework) a few days ago and it seems like it contains exactly what you want.
It sets up the notification from the DB and then broadcasts it. In your case, you just need the notification part and you should be all good.
I hope that helps :)
Postgrex.Notifications is the module that uses PostgreSQL's LISTEN/NOTIFY to deliver messages to an Elixir process.
A simple example:
defmodule MyListener do
  use GenServer

  def start_link(), do: GenServer.start_link(__MODULE__, [])

  def init(_arg) do
    {:ok, pid} = Postgrex.Notifications.start_link(MyRepo.config())
    Postgrex.Notifications.listen(pid, "my_table")
    {:ok, []}
  end

  def handle_info({:notification, _connection_pid, _ref, _channel, payload}, state) do
    # ... do something with payload ...
    {:noreply, state}
  end
end
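On the Postgres side, the triggers you already have are what emit these notifications; a minimal sketch (table, channel, and function names are assumptions) could look like:

-- Publish each changed row as JSON on the 'my_table' channel.
CREATE OR REPLACE FUNCTION notify_my_table_changed() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('my_table', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_changed
AFTER INSERT OR UPDATE ON my_table
FOR EACH ROW EXECUTE PROCEDURE notify_my_table_changed();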

Exception when exiting

I'm writing a Chef recipe, as shown below. I want the recipe to stop executing the resources that come after this one, but without raising an exception.
Do you have any ideas for this, other than calling exit(0)?
ruby_block "verify #{current_container_name}" do
block do
require "docker"
begin
container = Docker::Container.get(current_container_name)
rescue Docker::Error::NotFoundError => exception
container = nil
end
if container.nil?
exit(0)
end
end
end
You could use ignore_failure true in this ruby_block instead of handling the exception. That way it would still output the error messages, but it wouldn't treat them as a failure, so subsequent resources would continue to execute.
If you want to abort a Chef run under a special circumstance, like the current Docker container not being available, that is not possible. The solution is to rethink the problem: you want some code to run only when a special condition is met.
You do this by either leaving the recipe early (with a return true), wrapping your configuration steps in a conditional (like if my_container.nil? then ... end), or using node attributes to step through conditions.
Let's say your cookbook x relies on three recipes: 1, 2 and 3. If you'd like 2 and 3 to run only if 1 was successful, you can write the state of the first recipe into the node attributes (e.g. node.normal['recipe1'] = 'successful').
In the other recipes you'll then define an entry gate like:
return true if node['recipe1'] != 'successful'
But be aware: if you're using node attributes, you'll (mostly) need to set them from a ruby_block resource at the end of your first recipe, because bare Ruby code is evaluated and run during resource compilation, which takes place before the converge run.
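A rough sketch of that pattern (recipe and attribute names are assumptions), with the attribute written from a ruby_block so it is set at converge time:

# recipes/recipe1.rb -- last resource: record success at converge time
ruby_block 'mark recipe1 successful' do
  block do
    node.normal['recipe1'] = 'successful'
  end
end

# recipes/recipe2.rb -- entry gate, evaluated at compile time
return true if node['recipe1'] != 'successful'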

Can a list of strings be supplied to Capistrano tasks?

I have a task whose command in 'run' is the same except for a single value, which would come from a list of potential values. What I would like to do is use this list of values to define the tasks, and then use each value in the command defined in 'run'. The point is that it would be great to define the tasks in such a way that I don't have to repeat nearly identical task definitions for each value.
For example: I want a task that will get the status of a single program from a list of programs that I have defined in an array. I would like to define the task to be something like this:
set programs = %w["postfix", "nginx", "pgpool"]

programs.each do |program|
  desc "#{program} status"
  task :#{program} do
    run "/etc/init.d/#{program} status"
  end
end
This obviously doesn't work, but hopefully it shows what I am attempting here.
Thoughts?
Well, I answered my own question... with a little trial and error. I also did the same thing with namespace so the control of services is nice and elegant. It works quite nicely!
set :programs, %w[postfix nginx pgpool]
set :init_commands, %w[status start stop]

# init.d service control
init_commands.each do |init_command|
  namespace :"#{init_command}" do
    programs.each do |program|
      desc "#{program} #{init_command}"
      task :"#{program}" do
        run "/etc/init.d/#{program} #{init_command}"
      end
    end
  end
end
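With that in place, each command/program pair becomes its own namespaced task, so (under Capistrano 2 conventions) they can be invoked like:

cap status:nginx
cap stop:postfix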