Azure Batch: Does StartTask run when the node is already joined to a pool?

I have set up an Azure Batch account of the User Subscription type. The pool is already set up with 3 nodes, which are in the idle state. From my C# code I get the pool reference, set a StartTask, and call CommitAsync.
Does this cause the StartTask to run, or will the StartTask only be executed when a node tries to join the pool?
pool = batchClient.PoolOperations.GetPool(poolId);
pool.StartTask = new StartTask
{
    CommandLine = "cmd /c (robocopy %AZ_BATCH_TASK_WORKING_DIR% %AZ_BATCH_NODE_SHARED_DIR%) ^& IF %ERRORLEVEL% LEQ 1 exit 0",
    ResourceFiles = resourceFiles,
    WaitForSuccess = true
};
await pool.CommitAsync();
When I run this code, it does not seem to copy the required files to the node's shared directory.

As documented, a start task only runs when a node joins the pool, is rebooted, or is reimaged. Committing a new StartTask on an existing pool does not run it on nodes that are already in the pool; you need to reboot or reimage those nodes (or let new nodes join) for the updated start task to execute.
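A rough sketch of that reboot step, here in Python with the azure-batch package (the account name, key, URL and pool id are placeholders, and the C# SDK used in the question exposes an equivalent reboot operation):
# Minimal sketch, assuming the azure-batch package and shared-key credentials.
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials

credentials = SharedKeyCredentials('<account-name>', '<account-key>')
client = BatchServiceClient(credentials, 'https://<account>.<region>.batch.azure.com')

pool_id = '<pool-id>'
# Reboot each node so the updated start task runs when the node comes back up.
for node in client.compute_node.list(pool_id):
    client.compute_node.reboot(pool_id, node.id)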

How to get nodename of running celery worker?

I want to shut down specific celery workers. I was using app.control.broadcast('shutdown'); however, this shuts down all the workers, so I would like to pass the destination parameter.
When I run ps -ef | grep celery, I can see the --hostname on the process.
I know that the format is {CELERYD_NODES}{NODENAME_SEP}{hostname}, from the utility function nodename:
import socket

destination = ''.join(['celery',  # CELERYD_NODES defined in /etc/default/newfies-celeryd
                       '@',       # NODENAME_SEP, from celery.utils
                       socket.gethostname()])
Is there a helper function which returns the nodename? I don't want to create it myself since I don't want to hardcode the value.
I am not sure if that's what you're looking for, but with control.inspect you can get info about the workers, for example:
app = Celery('app_name', broker=...)
app.control.inspect().stats() # statistics per worker
app.control.inspect().registered() # registered tasks per each worker
app.control.inspect().active() # active workers/tasks
so basically you can get the list of workers from each one of them:
app.control.inspect().stats().keys()
app.control.inspect().registered().keys()
app.control.inspect().active().keys()
for example:
>>> app.control.inspect().registered().keys()
dict_keys(['worker1@my-host-name', 'worker2@my-host-name', ...])
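Given those node names, a minimal sketch of shutting down just one worker (the app name and broker URL are placeholders, and at least one worker is assumed to be online so that stats() is not None):
from celery import Celery

app = Celery('app_name', broker='amqp://localhost')

# The keys of the inspect() replies are the full node names,
# e.g. 'celery@my-host-name', so they can be used directly as destinations.
workers = list(app.control.inspect().stats().keys())
target = workers[0]

# Shut down only that worker.
app.control.broadcast('shutdown', destination=[target])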

Rundeck: make a job with multiple steps on different nodes

How can a job with multiple steps run some steps on Node 1 and others on Node 2?
For example:
On Node 1, I have to copy a file to a folder cp file.txt /var/www/htm/
On Node 2, I have to download this file wget https://www.mywebsite.com/file.txt
I have tried creating three jobs:
JOB 1: in the workflow I have an "Execute Command on remote" step with cp file.txt /var/www/htm/ and the NODES filter set to my NODE 1.
JOB 2: in the workflow I have an "Execute Command on remote" step with wget https://www.mywebsite.com/file.txt and the NODES filter set to NODE 2.
JOB 3: in the workflow, step 1 is a Job Reference with the UUID of JOB 1, step 2 is a Job Reference with the UUID of JOB 2, and in the node filter I wrote .* to get all nodes.
For now I have only tried running the command ls (in JOB 1 and JOB 2), but when I run JOB 3 each job's output appears 3 times, for example:
// Run Job 3
// Output from Job 1
test-folder
test.text
test-folder
test.text
test-folder
test.text
And the same happens for JOB 2.
How can I implement my job?
Using a job reference step is the right way to solve this, but instead of defining .* to get all nodes, use the node1 name in the first job reference step call and the node2 name in the second job reference call, in the "Override node filters?" section. Alternatively, you can define the node filter in each referenced job and just call them from JOB 3 using job reference steps.

Port 51347 seems to be used by another program

On running the sample code given in the dispy documentation
def compute(n):
    import time, socket
    time.sleep(n)
    host = socket.gethostname()
    return (host, n)

if __name__ == '__main__':
    import dispy, random
    cluster = dispy.JobCluster(compute)
    jobs = []
    for i in range(10):
        # schedule execution of 'compute' on a node (running 'dispynode')
        # with a parameter (random number in this case)
        job = cluster.submit(random.randint(5, 20))
        job.id = i  # optionally associate an ID to job (if needed later)
        jobs.append(job)
    # cluster.wait()  # wait for all scheduled jobs to finish
    for job in jobs:
        host, n = job()  # waits for job to finish and returns results
        print('%s executed job %s at %s with %s' % (host, job.id, job.start_time, n))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception, job.ip_addr, job.start_time, job.end_time)
    cluster.print_status()
I get the following output
2017-03-29 22:39:52 asyncoro - version 4.5.2 with epoll I/O notifier
2017-03-29 22:39:52 dispy - dispy client version: 4.7.3
2017-03-29 22:39:52 dispy - Port 51347 seems to be used by another program
And then nothing happens.
How do I free port 51347?
If you are on Linux, run sudo netstat -tuanp | grep 51347 and take note of the PID using that port.
Then run ps ax | grep <pid> to check which service or program is running with that PID.
Then run kill <pid> to terminate the process using that port.
Check which process is using the port before killing it, just in case it is something you should not kill.
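The same lookup can also be done from Python; a minimal sketch, assuming the third-party psutil package is installed and the script has permission to read other processes' connections:
import psutil

PORT = 51347

for conn in psutil.net_connections(kind='inet'):
    if conn.laddr and conn.laddr.port == PORT and conn.pid:
        proc = psutil.Process(conn.pid)
        print('Port', PORT, 'is held by PID', conn.pid, '-', proc.name())
        # proc.terminate()  # only after confirming it is safe to stop this process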

Unable to automate the migration process using Task Scheduler and SharePoint cmdlet “MigrateUserAccount”

I am unable to automate the migration process using Task Scheduler and the SharePoint cmdlet “MigrateUserAccount”; I get the error “You cannot call a method on a null-valued expression”.
$spFarm = [Microsoft.SharePoint.Administration.SPFarm]::Local
$spFarm.MigrateUserAccount("$from\$name", "$to\$name", $false)
When I run the PowerShell script using the “SharePoint 2010 Management Shell”, it runs and the output is successful, but when I configure the script in Task Scheduler, it runs but throws an error like “You cannot call a method on a null-valued expression”.
The task in Task Scheduler is configured to run with highest privileges.
This task was created using a service account that has administrative access to these servers and has also been added to db_owner in the SQL database.
Server Architecture
Web Front End 1
Web Front End 2
Application Server 1
Application Server 2
Database Cluster Node1
Database Cluster Node2
If this is all on one line...
$spFarm = [Microsoft.SharePoint.Administration.SPFarm]::Local $spFarm.MigrateUserAccount("$from\$name", "$to\$name", $false)
...then $spFarm will not have been defined when the MigrateUserAccount function is invoked.
You'll either need to put a semicolon between the two statements, or put them on separate lines like so:
$spFarm = [Microsoft.SharePoint.Administration.SPFarm]::Local
$spFarm.MigrateUserAccount("$from\$name", "$to\$name", $false)

Stopping Spring Batch jobs started from the command line

Spring Batch jobs can be started from the command line by telling the JVM to run CommandLineJobRunner. According to the JavaDoc, running the same command with the added -stop parameter will stop the job:
The arguments to this class can be provided on the command line
(separated by spaces), or through stdin (separated by new line). They
are as follows:
jobPath jobIdentifier (jobParameters)*
The command line options are as follows:
jobPath: the xml application context containing a Job
-restart: (optional) to restart the last failed execution
-stop: (optional) to stop a running execution
-abandon: (optional) to abandon a stopped execution
-next: (optional) to start the next in a sequence according to the JobParametersIncrementer in the Job
jobIdentifier: the name of the job or the id of a job execution (for -stop, -abandon or -restart).
jobParameters: 0 to many parameters that will be used to launch a job, specified in the form of key=value pairs.
However, in the JavaDoc for the main() method the -stop parameter is not specified, and looking through the code on docjar.com I can't see any use of the -stop parameter where I would expect it to be.
I suspect that it is possible to stop a batch job that has been started from the command line, but only if the job being run is backed by a non-transient jobRepository? If a batch job run on the command line only stores its data in HSQL (i.e. in memory), is there no way to stop it other than CTRL-C, etc.?
The stop command is implemented; see the source for CommandLineJobRunner, around line 300:
if (opts.contains("-stop")) {
List<JobExecution> jobExecutions = getRunningJobExecutions(jobIdentifier);
if (jobExecutions == null) {
throw new JobExecutionNotRunningException("No running execution found for job=" + jobIdentifier);
}
for (JobExecution jobExecution : jobExecutions) {
jobExecution.setStatus(BatchStatus.STOPPING);
jobRepository.update(jobExecution);
}
return exitCodeMapper.intValue(ExitStatus.COMPLETED.getExitCode());
}
The stop switch will work, but it will only stop the job after the currently executing step completes. It won't kill the job immediately.