rundeck: 1 job, multiple nodes involved? - scheduler

I have a question about Rundeck features. Is it possible to include conditions within job execution? As it is quite difficult to explain, I provide an example:
You have 2 redundant firewalls in your network. You implement a job 'job1' whose aim is to update your firewalls' configuration. Suppose the master is down; in that case you do not want to update the slave, because if you do, the slave will have to restart and there will not be any firewall running for a short time. So, what I want to do is to test, before running the update, that none of my firewalls are out of service. If the master is down, do not update the slave.
So, is it possible to involve multiple nodes within one job?
Thanks for helping!

Create a job that pings both firewalls; it succeeds only if both are up. Then create another job whose workflow runs this check job before the update step, and make the workflow proceed only if the check succeeds. A sketch of such a check script is below. That should solve your problem.
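A minimal sketch of such a pre-flight check in Python, suitable as the first workflow step; the firewall hostnames are hypothetical placeholders. It exits non-zero if either firewall is unreachable, which makes the check step, and therefore the workflow, fail before the update runs:

```python
#!/usr/bin/env python3
# Hypothetical pre-flight check: exit non-zero if any firewall is unreachable,
# so the workflow stops before the update step runs.
import subprocess
import sys

FIREWALLS = ["fw-master.example.com", "fw-slave.example.com"]  # placeholder hostnames

def is_reachable(host: str) -> bool:
    """Send a single ping and report whether the host answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def main() -> int:
    down = [host for host in FIREWALLS if not is_reachable(host)]
    if down:
        print(f"Aborting update, unreachable firewalls: {', '.join(down)}")
        return 1
    print("Both firewalls are up, safe to proceed with the update.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```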

Related

How to create a cron job in a Kubernetes deployed app without duplicates?

I am trying to find a solution to run a cron job in a Kubernetes-deployed app without unwanted duplicates. Let me describe my scenario, to give you a little bit of context.
I want to schedule jobs that execute once at a specified date. More precisely: creating such a job can happen anytime and its execution date will be known only at that time. The job that needs to be done is always the same, but it needs parametrization.
My application is running inside a Kubernetes cluster, and I cannot assume that there will always be only one instance of it running at any moment in time. Therefore, creating the said job would lead to multiple executions of it, because every one of my application instances would spawn it. However, I want to guarantee that a job runs exactly once in the whole cluster.
I tried to find solutions for this problem and came up with the following ideas.
Create a local file and check if it is already there when starting a new job. If it is there, cancel the job.
Not possible in my case, since the duplicate jobs might run on other machines!
Utilize the Kubernetes CronJob API.
I cannot use this feature because I have to create cron jobs dynamically from inside my application. I cannot change the cluster configuration from a pod running inside that cluster. Maybe there is a way, but it seems to me there has to be a better solution than giving the application access to the cluster it is running in.
Would you please be so kind as to give me some directions in which I might find a solution?
I am using a managed Kubernetes Cluster on Digital Ocean (Client Version: v1.22.4, Server Version: v1.21.5).
After thinking about a solution for a rather long time I found it.
The solution is to take the scheduling of the jobs to a central place. It is as easy as building a job web service that exposes endpoints to create jobs. An instance of a backend creating a job at this service will also provide a callback endpoint in the request which the job web service will call at the execution date and time.
The endpoint in my case links back to the calling backend server which carries the logic to be executed. It would be rather tedious to make the job service execute the logic directly since there are a lot of dependencies involved in the job. I keep a separate database in my job service just to store information about whom to call and how. Addressing the startup after crash problem becomes trivial since there is only one instance of the job web service and it can just re-create the jobs normally after retrieving them from the database in case the service crashed.
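A minimal sketch of such a job web service, assuming Flask and requests are available; jobs and callback URLs are kept in an in-memory dict here for brevity, whereas the setup described above stores them in a database so they survive a restart:

```python
# Minimal sketch of the central job web service (assumes Flask and requests are installed).
# Backends POST a job with an execution time and a callback URL; at that time the
# service calls the callback. A real deployment would persist jobs in a database
# instead of this in-memory dict, so they can be re-created after a crash.
import threading
from datetime import datetime, timezone

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = {}  # job_id -> {"callback_url": ..., "payload": ...}

def fire_callback(job_id: str) -> None:
    """Call the backend that registered the job; retry a few times so a briefly
    unreachable backend does not make the failure go unnoticed."""
    job = jobs.pop(job_id, None)
    if job is None:
        return
    for _attempt in range(3):
        try:
            requests.post(job["callback_url"], json=job["payload"], timeout=10)
            return
        except requests.RequestException:
            pass
    # Still failing: hand over to whatever reconciliation mechanism you use
    # (dead-letter queue, alerting, re-scheduling, ...).
    print(f"callback for job {job_id} failed after retries")

@app.route("/jobs", methods=["POST"])
def create_job():
    body = request.get_json()
    job_id = str(len(jobs) + 1)  # use a real ID generator in practice
    run_at = datetime.fromisoformat(body["run_at"]).astimezone(timezone.utc)
    delay = max(0.0, (run_at - datetime.now(timezone.utc)).total_seconds())
    jobs[job_id] = {"callback_url": body["callback_url"], "payload": body.get("payload", {})}
    threading.Timer(delay, fire_callback, args=[job_id]).start()
    return jsonify({"id": job_id, "runs_in_seconds": delay}), 201

if __name__ == "__main__":
    app.run(port=8080)
```

A backend instance would then register a job with something like POST /jobs, passing an ISO 8601 "run_at" timestamp, a "callback_url" pointing back at itself, and an optional "payload"; all of these names are placeholders for this sketch.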
Do not forget to take care of failing jobs. If your backends are unreachable when the callback fires, there must be some reconciliation mechanism in place that prevents this failure from going unnoticed.
A little note I want to add: in case you also want to scale the job service horizontally, you run into very similar problems again. However, if you think about what the actual work in that service is, you realize that it is very lightweight. I am not sure horizontal scaling will ever be a requirement, since the service only makes requests at specified times and does not execute heavy work.

Kubernetes CronJob - Do ConcurrencyPolicy and manual job execution/creation communicate with one another?

I have a Kube cronjob that has a concurrencyPolicy of Replace. As I'd have expected, the documentation suggests this means that if the previous job is still running when the next cycle in the schedule comes around, the previous job will be killed off / cancelled.
What I want to know is: if I manually kick off a job with kubectl create job --from, does the concurrencyPolicy still play a part? From the testing I've been doing it seems the answer is no (and then I end up with multiple concurrent jobs), but I would like to confirm.
If I'm correct and they don't work together, is there a way to get this functionality? Basically I want to be able to deploy a job and then test it without having to wait around for it to kick off, but I also don't want two jobs running at the same time.
Thanks!
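For context, kubectl create job --from=cronjob/NAME copies the CronJob's jobTemplate into a standalone Job; such a Job is not created by the CronJob controller, which would explain why concurrencyPolicy does not appear to apply to it in your testing. A rough Python equivalent using the official kubernetes client, assuming CronJobs are served from batch/v1 (Kubernetes 1.21+); all names are placeholders:

```python
# Rough Python equivalent of `kubectl create job --from=cronjob/<name>` using the
# official kubernetes client (assumes CronJob is available in batch/v1, i.e. 1.21+).
# The new Job is a plain copy of the CronJob's jobTemplate and is created directly,
# not by the CronJob controller.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
batch = client.BatchV1Api()

namespace = "default"                                               # placeholder
cronjob = batch.read_namespaced_cron_job("my-cronjob", namespace)   # placeholder name

manual_job = client.V1Job(
    metadata=client.V1ObjectMeta(name="my-cronjob-manual-test"),    # placeholder name
    spec=cronjob.spec.job_template.spec,
)
batch.create_namespaced_job(namespace, manual_job)
```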

Chronos + Mesosphere. How to execute tasks in parallel?

Good day everyone.
I have a single server for Chronos, Mesos and Zookeeper, and I want to use Chronos as something that will run my scripts daily. Some scripts today, some tomorrow, and so on.
The problem is that when I try to launch tasks one right after another, only the first one executes correctly; the other one is lost somewhere. If I launch the first, take a pause of 3-4 seconds, and then launch the other, they both are launched, but sequentially.
And I need to run them in parallel.
Can someone provide a hint on this? Maybe there are some settings that I must change?
You should set a time in UTC for both tasks to be launched, with a repeating period of 24 hours. In that case there is no reason why your tasks should not execute in parallel. Check the Chronos logs and the task logs in the Mesos sandbox for errors.
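Scheduled Chronos jobs are registered by POSTing JSON with an ISO 8601 repeating schedule. A rough Python sketch of registering two independent daily jobs, assuming the classic /scheduler/iso8601 endpoint (newer Chronos releases prefix the API with /v1) and a Chronos instance on localhost:4400; names, commands, times and resource values are placeholders:

```python
# Sketch: register two independent daily jobs with Chronos over its REST API.
# Both use an ISO 8601 repeating schedule (repeat forever, daily period), so they
# are scheduled independently and can run in parallel if resources allow.
import requests

CHRONOS = "http://localhost:4400"  # placeholder address

jobs = [
    {
        "name": "script-one",                      # placeholder
        "command": "/opt/scripts/one.sh",          # placeholder
        "schedule": "R/2024-01-01T03:00:00Z/P1D",  # repeat forever, daily, 03:00 UTC
        "owner": "ops@example.com",
        "cpus": 0.1,
        "mem": 128,
    },
    {
        "name": "script-two",
        "command": "/opt/scripts/two.sh",
        "schedule": "R/2024-01-01T03:00:00Z/P1D",  # same start time -> parallel launch
        "owner": "ops@example.com",
        "cpus": 0.1,
        "mem": 128,
    },
]

for job in jobs:
    resp = requests.post(f"{CHRONOS}/scheduler/iso8601", json=job, timeout=10)
    resp.raise_for_status()
```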
You can certainly run all of these components (Chronos, master, slave, and ZK) on the same machine, although ZK really becomes valuable once you have HA with multiple masters.
As user4103259 suggested, check the master and slave logs for that LOST/failed taskId to see what exactly happened to it. A task could go LOST/failed for numerous reasons, anywhere along the task launch/running/completing process.

How to force condor to submit job to all nodes in the cluster?

I have a condor cluster with multiple nodes active.
But when I submit a job, it only runs on a single node (i.e. the master node). I'm aware that Condor automatically distributes jobs based on available resources.
But what if I want to force condor to make use of all the nodes? Just for the sake of evaluating process time when running on multiple nodes vs single node?
I have tried adding requirements = Machine == "hostname1" && Machine == "hostname2" in the submit file, but it isn't working.
Depending on what you're trying to do, you might want to use the parallel universe as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html
With a parallel universe job you indicate the machine count via machine_count and only need to queue a single task.
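A sketch of such a parallel universe submission through the htcondor Python bindings, assuming a recent version of the bindings is installed (older versions use a transaction-based submit API); the executable and resource values are placeholders:

```python
# Sketch: submit a parallel-universe job via the htcondor Python bindings.
# machine_count requests four machines for a single queued task, as described above.
import htcondor

submit = htcondor.Submit({
    "universe": "parallel",
    "executable": "/bin/sleep",     # placeholder executable
    "arguments": "60",
    "machine_count": "4",           # run on four machines at once
    "output": "out.$(Node)",
    "error": "err.$(Node)",
    "log": "parallel.log",
    "request_cpus": "1",
    "request_memory": "128MB",
})

schedd = htcondor.Schedd()
result = schedd.submit(submit)      # queues a single task spanning four machines
print(f"submitted cluster {result.cluster()}")
```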
I am afraid that I do not fully understand what you are asking. Let's see if I can help somehow. I can see a few scenarios:
1. Condor is only scheduling your jobs to run on the master node, regardless of how many machines are available.
2. Condor is scheduling jobs on all available machines. However, what you are trying to do is get a particular job to make use of more than one machine.
In case 1. something fishy is going on with either your submit file or your pool setup. I will assume that condor_status returns more than one machine and that your pool setup is OK. The typical gotcha in this case is the following: if you do not specify a Requirements expression for your job, Condor will insert one for you. By default Condor will request that the job run on a machine with the same OS and architecture as the submit node. This one did bite me a few times with heterogeneous pools ;-)
In case 2. you will have to make sure that your executable can make use of multiple machines (e.g. by way of MPI) and you need to tell Condor about it. One way to do that is to use the Parallel universe. Another way is to use a classic master/worker architecture where the workers are persistent Condor jobs.
Condor is limited in that it can only execute (via system()) a command. If your program does not create many subtasks, you will not experience any speed improvement.
Please post a short snippet of your job description (file).

Jenkins: trigger a job by another which is running on an offline node

Is there any way to do the following:
I have 2 jobs. One job on an offline node has to trigger the second one. Are there any plugins in Jenkins that can do this? I know that TeamCity has a way of achieving this, but I think that Jenkins is more restrictive.
When you configure your node, you can set Availability to "Take this slave on-line when in demand and off-line when idle".
Set Usage as "Leave this machine for tied jobs only".
Finally, configure the job to be executed only on that node.
This way, when the job goes to queue and cannot execute (because the node is offline), Jenkins will try to bring this node online. After the job is finished, the node will go back to offline.
This of course relies on the fact that Jenkins is configured to be able to start this node.
One instance will always be turned on, on which the main job can run. I have also created a job which looks in the DB and, if there are no running instances recorded in the DB, prepares one node. And the third job cleans up my environment after the tests have run.