Can Rundeck Be Configured to Use Different Nodes If Someone Else is Already Running A Job On Current Node? - rundeck

I am trying to see if rundeck is capable of deciding if a node is currently busy runnning another job and it will switch to another node and ruin the job there instead.
For example, I am current running a job on NODE1, then another person logs into rundeck and decide to run their job on NODE1, but NODE1 is busy running my job so rundeck will automatically run their job on NODE2.
Thanks

This could be possible assuming the following design:
1) Create/use a resource model source plugin that sets attributes about the node's business. This can be a metric like load status or something else that you use to gauge rundeck utilization.
2) Write the job with a nodefilter to match on the attribute about utilization metric
3 ) Define the job to use the RandomSubset Orchestration strategy specifying to use 1 node.

Related

Rundeck ansible inventory: static instead of dynamic

Deployed Rundeck (rundeck/rundeck:4.2.0) importing and discovering my inventory using Ansible Resource Model Source. Having 300 nodes, out of which statistically ~150 are accessible/online, the rest is offline (IOT devices). All working fine.
My challenge is when creating jobs i can assign only those nodes which are online, while i wanted to assign ALL nodes (including those offline) and keep retrying the job for the failed ones only. Only this way i could track the completeness of my deployment. Ideally i would love rundeck to be intelligent enough to automatically deploy the job as soon as my node goes back online.
Any ideas/hints how to achieve that ?
Thanks,
The easiest way is to use the health checks feature (only available on PagerDuty Process Automation On-Prem, formerly "Rundeck Enterprise"), in that way you can use a node filter only for "healthy" (up) nodes.
Using this approach (e.g: configuring a command health check against all nodes) you can dispatch your jobs only for "up" nodes (from a global set of nodes), this is possible using the .* as node filter and !healthcheck:status: HEALTHY as exclude node filter. If any "offline" node "turns on", the filter/exclude filter should work automatically.
On Ansible/Rundeck integration it works using the following environment variable: ANSIBLE_HOST_KEY_CHECKING=False or using host_key_checking=false on the ansible.cfg file (at [defaults] section).
In that way, you can see all ansible hosts in your Rundeck nodes, and your commands/jobs should be dispatched only for online nodes, if any "offline" node changes their status, the filter should work.

Spring boot scheduler running cron job for each pod

Current Setup
We have kubernetes cluster setup with 3 kubernetes pods which run spring boot application. We run a job every 12 hrs using spring boot scheduler to get some data and cache it.(there is queue setup but I will not go on those details as my query is for the setup before we get to queue)
Problem
Because we have 3 pods and scheduler is at application level , we make 3 calls for data set and each pod gets the response and pod which processes at caches it first becomes the master and other 2 pods replicate the data from that instance.
I see this as a problem because we will increase number of jobs for get more datasets , so this will multiply the number of calls made.
I am not from Devops side and have limited azure knowledge hence I need some help from community
Need
What are the options available to improve this? I want to separate out Cron schedule to run only once and not for each pod
1 - Can I keep cronjob at cluster level , i have read about it here https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
Will this solve a problem?
2 - I googled and found other option is to run a Cronjob which will schedule a job to completion, will that help and not sure what it really means.
Thanks in Advance to taking out time to read it.
Based on my understanding of your problem, it looks like you have following two choices (at least) -
If you continue to have scheduling logic within your springboot main app, then you may want to explore something like shedlock that helps make sure your scheduled job through app code executes only once via an external lock provider like MySQL, Redis, etc. when the app code is running on multiple nodes (or kubernetes pods in your case).
If you can separate out the scheduler specific app code into its own executable process (i.e. that code can run in separate set of pods than your main application code pods), then you can levarage kubernetes cronjob to schedule kubernetes job that internally creates pods and runs your application logic. Benefit of this approach is that you can use native kubernetes cronjob parameters like concurrency and few others to ensure the job runs only once during scheduled time through single pod.
With approach (1), you get to couple your scheduler code with your main app and run them together in same pods.
With approach (2), you'd have to separate your code (that runs in scheduler) from overall application code, containerize it into its own image, and then configure kubernetes cronjob schedule with this new image referring official guide example and kubernetes cronjob best practices (authored by me but can find other examples).
Both approaches have their own merits and de-merits, so you can evaluate them to suit your needs best.

How to scheduling jobs created in different scheduler instance

I am using spring-boot-starter-quartz 2.2.1.RELEASE to schedule Quartz jobs.And I've deploy my code on two nodes.
And the quartz.properties is like this:
For node one:
org.quartz.scheduler.instanceName: machine1
org.quartz.scheduler.instanceId = AUTO
For node two:
org.quartz.scheduler.instanceName: machine2
org.quartz.scheduler.instanceId = AUTO
So in this situation, each node can run the same scanning job separately.
And now in my database in "qrtz_job_details", I can have two job records ,namely scanJobbyMachine1 and scanJobbyMachine2.
I also deployed an frontend UI on node1 that have RESTful API to schedule jobs. And I use nginx to randomly send my request to one of my nodes.
If I made a request to query all jobs, the request may be sent to node1 and only node1's job will be shown. But I want to show both node1 and node2's jobs.
If I made a request to update scanJobbyMachine1, and it may be sent to node2. And update can't be made, because node2 only have properties file whose instanceName is machine2.
Here is my plan:
Plan A:use cluster mode. But Quartz doesn't support "Allow Job Excution to be pin to a cluster node" yet. So in cluster mode, my job will only be excuted by one node. But I want both node to do the scanning jobs.
here is the issue link in github
Plan B:use Non cluster mode. Then I have to write duplicate APIs in Controller like this:
localhost:8090/machine0/updateJob
localhost:8090/machine1/updateJob
And use nginx to set when I request /machine0/updateJob, send it to 10.110.200.60(machine1's ip), when I request /machine1/updateJob, send it to 10.110.200.62(machine2's ip)
And for queryAllJobs I have to use my backend to send request to 10.110.200.60 and 10.110.200.62 first, and combine the response list in my backend, then show it in the frontend.
Plan C:write another backend with two properties files. just to schedule the jobs and don't excute these jobs (I don't know if this can work) and depoyed it on these two nodes.
I really don't want write and deploy another backend like Plan C or write duplicate APIs like Plan B.
Any good ideas?
your problem is a cluster problem :)
Perhaps you could dynamically configure your job registration: as node1 start, it will be alone on the cluster and run jobs only on this one. As node2 start, the two nodes can be called.

Complete parallel Kubernetes job when one worker pod succeeds

I have a simple containerised python script which I am trying to parallelise with Kubernetes. This script guesses hashes until it finds a hashed value below a certain threshold.
I am only interested in the first such value, so I wish to create a Kubernetes job that spawns n worker pods and completes as soon as one worker pod finds a suitable value.
By default, Kubernetes jobs wait until all worker pods complete before marking the job as complete. I have so far been unable to find a way around this (no mention of this job pattern in the documentation), and have been relying on checking the logs of bare pods via a bash script to determine whether one has completed.
Is there a native means to achieve this? And, if not, what would be the best approach?
Hi look this link https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs.
I've never tried it but it seems possible to launch several pods and configure the end of the job when x pods have finished. In your case x is 1.
We can define two specifications for parallel Jobs:
1. Parallel Jobs with a fixed completion count:
specify a non-zero positive value for .spec.completions.
the Job represents the overall task, and is complete when there is
one successful Pod for each value in the range 1 to
.spec.completions
not implemented yet: Each Pod is passed a different index in the
range 1 to .spec.completions.
2. Parallel Jobs with a work queue:
do not specify .spec.completions, default to .spec.parallelism
the Pods must coordinate amongst themselves or an external service to
determine what each should work on.
For example, a Pod might fetch a batch of up to N items from the work queue.
each Pod is independently capable of determining whether or not all its peers are done, and thus that the entire Job is done.
when any Pod from the Job terminates with success, no new Pods are
created
once at least one Pod has terminated with success and all Pods are
terminated, then the Job is completed with success
once any Pod has exited with success, no other Pod should still be
doing any work for this task or writing any output. They should all
be in the process of exiting
For a fixed completion count Job, you should set .spec.completions to the number of completions needed. You can set .spec.parallelism, or leave it unset and it will default to 1.
For a work queue Job, you must leave .spec.completions unset, and set .spec.parallelism to a non-negative integer.
For more information about how to make use of the different types of job, see the job patterns section.
You can also take a look on single job which starts controller pod:
This pattern is for a single Job to create a Pod which then creates other Pods, acting as a sort of custom controller for those Pods. This allows the most flexibility, but may be somewhat complicated to get started with and offers less integration with Kubernetes.
One example of this pattern would be a Job which starts a Pod which runs a script that in turn starts a Spark master controller (see spark example), runs a spark driver, and then cleans up.
An advantage of this approach is that the overall process gets the completion guarantee of a Job object, but complete control over what Pods are created and how work is assigned to them.
At the same time take under consideration that completition status of Job set by dafault - when specified number of successful completions is reached it ensure that all tasks are processed properly. Applying this status before all tasks are finished is not secure solution.
You should also know that finished Jobs are usually no longer needed in the system. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher level controller, such as CronJobs, the Jobs can be cleaned up by CronJobs based on the specified capacity-based cleanup policy.
Here is official documentations: jobs-parallel-processing , parallel-jobs.
Useful blog: article-parallel job.
EDIT:
Another option is that you can create special script which will continuously check values you look for. Using job then will not be necessary, you can simply use deployment.

Jenkins trigger job by another which are running on offline node

Is there any way to do the following:
I have 2 jobs. One job on offline node has to trigger the second one. Are there any plugins in Jenkins that can do this. I know that TeamCity has a way of achieving this, but I think that Jenkins is more constrictive
When you configure your node, you can set Availability to Take this slave on-line when in demand and off-line when idle.
Set Usage as Leave this machine for tied jobs only
Finally, configure the job to be executed only on that node.
This way, when the job goes to queue and cannot execute (because the node is offline), Jenkins will try to bring this node online. After the job is finished, the node will go back to offline.
This of course relies on the fact that Jenkins is configured to be able to start this node.
One instance will always be turn on, on which the main job can be run. And have created the job which will look in DB and if in the DB no running instances, it will prepare one node. And the third job after running tests will clean up my environment.