How to make sure a task is executed before starting different services - upstart

We have the following scenario in upstart:
We have some task called T, and some services, A and B with the following requirements:
T must run completely isolated from the services A and B
Both A and B can only run if task T has completed
A and B can be started independently
In simple words, T is a requirement for both A and B, but running T doesn't necessarily mean that either A or B should be started.
How can we enforce these requirements in upstart? Adding other "helper" jobs is fine, of course.
We tried the following, that doesn't work:
# T.conf
task
start on (starting A or starting B)
The problem is that if T is already running when starting B, e.g. because A is already about to start, then B will just start without waiting for T to finish. This violates the first two requirements above.
Another option is to explicitly start T from the pre-start sections of the services. However, that causes a service to fail to start, instead of waiting, if T is already being executed.

There is a workaround using this extra helper task (better suggestions are still welcome):
start on (starting A or starting B)
task
instance $JOB
script
until start T; do sleep 1; done
end script
This helper job is started just about when either A or B is about to start, blocking those services. There will be one instance of this task for each service. It will block until T is successfully completed.

Related

Datastage: How to keep continuous mode job running after a unexpected termination

I have a job that uses the Kafka Connector Stage in order to read a Kafka queue and then load into the database. That job runs in Continuous Mode, which it has no time to conclude, since it keeps monitoring the Kafka queue in real time.
For unexpected reasons (say, server issues, job issues etc) that job may terminate with failure. In general, that happens after 300 running hours of that job. So, in order to keep the job alive I have to manually look to the job status and then to do a Reset and Run, in order to keep the job running.
The problem is that between the job termination and my manual Reset and Run can pass several hours, which is critical. So I'm looking for a way to eliminate the manual interaction and to reduce that gap by automating the job invocation.
I tried to use Control-M to daily run the job, but with no success: The first day the Control-M called the job, it ran it fine. But in the next day, when the Control-M did an attempt to instantiate the job again it failed (since it was already running). Besides, the Datastage will never tell back Control-M that a job was successfully concluded, since the job's nature won't allow that.
Said that, I would like to hear ideas from you that can light me up.
The first thing that came in mind is to create a intermediate Sequence and then schedule it in Control-M. Then, this new Sequence would call the continuous job asynchronously by using command line stage.
For the case where just this one job terminates unexpectedly and you want it to be restarted as soon as possible, have you considered calling this job from a sequence? The sequence could be setup to loop running this job.
Thus sequence starts job and waits for it to finish. When job finishes, the sequence will then loop and start the job again. You could have added conditions on job exit (for example, if the job aborted, then based on that job end status, you could reset the job before re-running it.
This would not handle the condition where the DataStage engine itself was shut down (such as for maintenance or possibly an error) in which case all jobs end including your new sequence. The same also applies for a server reboot or other situations where someone may have inadvertently stopped your sequence. For those cases (such as DataStage engine stop) your team would need to have process in place for jobs/sequences that need to be started up following a DataStage or System outage.
For the outage scenario, you could create a monitor script (regardless of whether running the job solo or from sequence) that sleeps/loops on 5-10 minute intervals and then checks the status of your job using dsjob command, and if not running can start that job/sequence (also via dsjob command). You can decide whether that script startup would occur at DataSTage startup, machine startup, or run it from Control M or other scheduler.

How restart a JBPM workflow which is completed

We are using JBPM 6 and we have a requirement to restart the same process once it is completed. Is there any way that we can restart the same process id without creating new one
Restarting an already completed process instance makes not much sense as this would break with audit. At least I did not know a way to do this.
Consider to
a) start the Process again / create a new Process Instance with same input or
b) include a loop in your process that leads to the start of the Process model

configuring multiple versions of job in spring batch

SpringBatch seems to be lacking the metadata for the job definition in the database.
In order to create a job instance in the database, the only thing it considers is jobName and jobParamter, "JobInstance createJobInstance(String jobName, JobParameters jobParameters);"
But,the object model of Job is rich enough to consider steps and listeners. So, if i create a new version of the existing job, by adding few additional steps, spring batch does not distinguish it from the previous version. Hence, if i ran the previous version today and run the updated version, spring batch does not run the updated version, as it feels that previous run was successful. At present, it seems like, the version number of the job, should be part of the name. Is this correct understanding ?
You are correct that the framework identifies each job instance by a unique combination of job name and (identifying) job parameters.
In general, if a job fails, you should be able to re-run with the same parameters to restart the failed instance. However, you cannot restart a completed instance. From the documentation:
JobInstance can be restarted multiple times in case of execution failure and it's lifecycle ends with first successful execution. Trying to execute an existing JobIntance that has already completed successfully will result in error. Error will be raised also for an attempt to restart a failed JobInstance if the Job is not restartable.
So you're right that the same job name and identifying parameters cannot be run multiple times. The design framework prevents this, regardless of what the business steps job performs. Again, ignoring what your job actually does, here's how it would work:
1) jobName=myJob, parm1=foo , parm2=bar -> runs and fails (assume some exception)
2) jobName=myJob, parm1=foo , parm2=bar -> restarts failed instance and completes
3) jobName=myJob, parm1=foo , parm2=bar -> fails on startup (as expected)
4) jobName=myJob, parm1=foobar, parm2=bar -> new params, runs and completes
The "best practices" we use are the following:
Each job instance (usually defined by run-date or filename we are processing) must define a unique set of parameters (otherwise it will fail per the framework design)
Jobs that run multiple times a day but just scan a work table or something use an incrementer to pass a integer parameter, which we increase by 1 upon each successful completion
Any failed job instances must be either restarted or abandoned before pushing code changes that affect the the job will function

PowerShell session/environment isolation - Jobs sharing same context?

I'm testing a workflow runbook that utilizes Add-Type to add some custom C# code.
All of a sudden I started getting 'type already exists' errors on subsequent test jobs, as if a new PSSession is not being created.
In other words, it looks like new jobs are sharing the same execution context. I only get this locally if I try to run the same command twice per PS instance.
The type in question is a static class with some Extension methods. Since it also happens to be the first type declared in the source block, I don't doubt other non-static types would throw errors as well.
I've executed this handfuls of times already, so I fully expect that 'eventually' this will stop happening, but I can't seem to force it, and I have no idea what I could've done to trip it into this situation, either.
Seeing evidence of shared execution contexts across jobs like this - even (especially?) if only temporal - makes me wonder if some or all of the general execution inconsistencies we've seen in the past when making & deploying changes & performing subsequent tests soon-after are related to this.
I'm tempted to think that this is simply a part of the difference between a Test Job and a 'real' one, but that raises questions about the validity of the Test jobs themselves WRT mimicking Published Jobs.
Are all Azure Automation Jobs supposed to execute in Isolation? Can this be controlled/exploited by a developer?
Each automation account has its own isolated sandboxes where its jobs run. Those sandboxes are distributed among a number of worker machines. For test jobs, to try to improve job start time since [make code change, retest] over and over is very common, Automation reuses the same sandbox as used for previous test jobs of this runbook, if the sandbox has not been cleaned up yet, so that sandboxes do not have to be spun up for each unique test job (sandbox creation is one reason for a longer job start time than desired). Due to this behavior, if you execute test jobs of the same runbook within a short amount of time, you will get the behavior you're seeing above.
However, even for production jobs, jobs of the same automation account (across runbooks) can share the same sandboxes. We randomly distribute jobs across our worker machines, so its possible job A is queued for execution and is placed on worker W, then 5 minutes later, job B is queued for execution and is placed on worker W as well. If job A and job B are of the same automation account and have the same "demands" in terms of modules / module versions, they will be placed in the same sandbox, if job A's sandbox is still around. "Module / module version demands" does not mean the modules used by the runbook, but the modules / latest module versions that existed in the automation account at the time when the job was started / runbook was scheduled (for jobs started via schedule) / runbook was assigned to a webhook (for jobs started via webhook)
In terms of resolving your specific problem, you could surround Add-Type with a try, catch statement, or maybe use Add-Type -IgnoreWarnings

Execute external script on server with multiple play 2 framework instances

I have a linux server with three play 2 framework instances on it and I would like to execute regularly an external Scala script that has access to all application environment (models) and that is executed only once at a time.
I would like to call this script from crontab but I cannot find any documentation on how to do it. I know that we can schedule asynchronous tasks from Global object, but I want the script executed only once for the three play instances.
Actually I would like to do the same kind of things as Ruby on Rails rake tasks for those who knows them.
Create a regular action for this task which will be accessible via http, then you can use ie. curl in unix' crontab to call that action and it will hit first available instance.
Other possibility is... using Global object to schedule the task with Akka support. In this case to make sure that only one instance will schedule task, you need to determine somehow which one should it be. If you are starting all 3 instances with specified port (always the same per instance) you can read http.port to allow or skip the execution.
Finally you can use database to inform other instances, that task is just executed: all 3 instances tries to execute the Akka scheduler, but before execution the task they can check if this task has still TODO flag. If not, instance sets TODO flag to false and continues execution, otherwise just skips execution this time.
Also you can use filesystem for similar approach: at the beginning of the execution, create flag-file to inform other instances, that, this time they can skip the task.