Jenkins kills JBoss server when job finishes

I use Ant to start and shut down a JBoss 5 server through Jenkins. The Ant java task's spawn and fork attributes are set to "true", so the command is executed in the background.
Jenkins successfully starts up the server and waits two minutes (a "sleep" command in Jenkins), then after the sleep it shuts down the server for some strange reason. The sleep command is the last step in the build job. The shutdown log says:
2013-01-29 17:03:39,332 INFO [org.jboss.bootstrap.microcontainer.ServerImpl] Runtime shutdown hook called, forceHalt: true
I googled it and tried the suggested -Xrs JVM option, but it didn't help. What is happening here?

Jenkins has something called the process tree killer that will kill all processes created by the job (even those started with spawn and fork set to true).
There are some workarounds to this behavior.
Disable the process tree killer by passing this system property to the Jenkins JVM:
-Dhudson.util.ProcessTreeKiller.disable=true
or
Set the environment variable BUILD_ID=dontKillMe in the JBoss process:
export BUILD_ID=dontKillMe
You can browse the ProcessTreeKiller wiki article or the Jenkins JIRA to find various workarounds for this issue.
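For example, a minimal "Execute shell" build step along these lines keeps the spawned JBoss process out of the killer's reach; this is only a sketch, and the JBoss path, bind address and sleep time are illustrative:

# Jenkins "Execute shell" build step (paths are illustrative)
export BUILD_ID=dontKillMe                        # tell the process tree killer to leave this subtree alone
nohup /opt/jboss-5/bin/run.sh -b 0.0.0.0 > /tmp/jboss-console.log 2>&1 &
sleep 120                                         # give JBoss time to finish booting

# Alternatively, disable the killer globally when starting Jenkins itself:
# java -Dhudson.util.ProcessTreeKiller.disable=true -jar jenkins.war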

This source (see the comments) suggests other environment variables, apparently for older versions of Jenkins. For me it didn't work until I started using JENKINS(_SERVER)_COOKIE.

Related

Running a Node.js server in Azure Release Pipeline

I have a Node.js web server which, as part of a CD process, I want to deploy to a staging server using Azure Release Pipeline. The problem is that if I just run a PowerShell script:
# Run-Server.ps1
node my-server.js
the pipeline will hang, since the node process blocks the PowerShell session.
What I want is to be able to launch the service, and then in the next deployment just kill the node process and run it again with the new code.
So I figured I'll use Start-Process. If I run it locally:
> Start-Process node -ArgumentList ./server.js
I can now exit the PowerShell session and the server will continue running. So I thought I could implement it the same way in my Release Pipeline.
But it turns out that once the Release Pipeline ends running, the server is no longer available - the node process is gone.
Can you help me figure out why that is? Is there another way of achieving this? I suppose it's a pretty common use case, so there must be best practices out there for how this should be done.
Another way to achieve this is to use a full-blown web server to host and manage the node process. For example, on Windows you could use IIS with the iisnode module. This is more reliable and gives you a few other benefits:
process management (automatic start, restart on failure, etc.)
security - you can configure the user that the node process will run as
scalability on multi-core CPUs
Then the process of app deployment would be just copying files to the right directory - the web server should pick up the change automatically.
By default, a pipeline job cleans up all of the child processes it spins up when it exits. This is what is killing your node server.
Set the Process.Clean variable to false to override the default behavior.
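If you prefer to set it from a script step rather than in the release definition's variables list, a minimal sketch would look like this; it assumes the agent honors a job-scoped Process.Clean variable set via the logging command, and the step contents are illustrative:

# Bash (or PowerShell) step in the release, run before launching the server
echo "##vso[task.setvariable variable=Process.Clean]false"
# then start the server as before, e.g. Start-Process node -ArgumentList ./server.js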

Kafka Connect - how to get a failed task to restart with a new configuration

Whenever we restart a failed task, it will ALWAYS pick up the config it had at the time of the failure and run with that... and THEN it picks up the new config... and runs with that as well.
We have connect jobs that we pause, update config, and then resume. This works fine, unless the task has failed.
If we restart a failed task, even if the connector has an updated config, the task will launch with the old config.. run to completion/failure.. then a new task will be launched with the new config.
This can cause various data issues if you really don't want that old task to run with that config.
Any ideas how to restart a connector with a failed task, with a new config, and NOT have the old config get invoked?
(running Kafka v2.5, btw)
I don't know if it would make sense for the task to pick up the latest config.
For instance, let's assume that your connector fires up 10 distinct tasks and 1 of them fails. It wouldn't make sense to have the remaining 9 tasks of the connector running with the older config while the failed task runs the newest config once it is restarted.
I would say that in cases where you want to use a new/different configuration when a task fails, it might make more sense to restart the connector and not the individual task(s):
POST /connectors/connector-name/restart HTTP/1.1
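For example, the same call with curl (localhost:8083 is the Connect REST default and the connector name is a placeholder):

curl -X POST http://localhost:8083/connectors/connector-name/restart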
I was having this problem and managed to "fix" it by a bit of randomness.
I increased the number of tasks in the connector and then reduced it again, and it seemed to pick up the new configuration.
It was really random.
I do know that the restart did not work for me.
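A hedged sketch of that tasks.max bump via the Connect REST API, in case you want to try the same workaround; the connector name, host and values are illustrative:

# Read the current connector config
curl -s http://localhost:8083/connectors/connector-name/config > config.json
# Edit tasks.max upwards in config.json (e.g. 10 -> 12), then push it back
curl -X PUT -H "Content-Type: application/json" --data @config.json \
  http://localhost:8083/connectors/connector-name/config
# Afterwards, lower tasks.max to its original value the same way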

How do I upgrade concourse from 3.4.0 to 3.5.0 without causing jobs to abort with state error?

When I did the upgrade of Concourse from 3.4.0 to 3.5.0, all running jobs suddenly changed their state from running to errored. I can see the string 'no workers' appearing at the start of their log now. Starting the jobs again, either manually or triggered by the next change, didn't cause any problem.
The upgrade of concourse itself was successful.
I was watching what bosh did at the time, and I saw that this change of job states took place all at once while either the web or the db VM was upgraded (I don't know which one). I am pretty sure that the worker VMs had not yet been touched by bosh.
Is there a way to avoid this behavior?
We have one db, one web VM and six workers.
With only one web VM it's possible that it was out of service for long enough that all workers expired. Workers continuously heartbeat, and if they miss two heartbeats (which takes 1 minute by default) they'll stall. They should come back after the deploy is finished, but if scheduling happened before their heartbeats resumed, that would cause those errors.
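If you want to watch this happening during a deploy, the fly CLI can show worker state; the target and worker names below are illustrative:

fly -t my-target workers            # workers that missed heartbeats show up as "stalled"
# on fly versions that support it, a stalled worker can be cleaned up with:
# fly -t my-target prune-worker -w some-worker-name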

How to properly check if a slow starting java tomcat application is running so you can restart it?

I want to implement automatic service restarting for several Tomcat applications, applications that take a lot of time to start, even over 10 minutes.
Mainly the test would check if the application is responding on HTTP with a valid response.
Still, this is not the problem; the problem is how to prevent this uptime check from failing while the service is under maintenance, scheduled or not.
I don't want this service to be started if it was stopped manually with `service appname stop`.
I considered creating .maintenance files on stop or restart actions of the daemon and checking for them before triggering an automated restart.
So far the only problem that I wasn't able to properly solve was how to detect that the app has finished starting up and remove the .maintenance file, so that the automatic restart would work properly.
Note, an init.d script is not supposed to wait, so the daemon should start a background command that solves this problem.
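A minimal sketch of the watchdog described above, run from cron; the health URL, service name and flag file location are placeholders, not a definitive implementation:

#!/bin/sh
# Hypothetical uptime check; skips itself while a .maintenance flag exists.
APP=appname
FLAG=/var/run/$APP.maintenance
URL=http://localhost:8080/$APP/

[ -f "$FLAG" ] && exit 0                       # app is under (manual or scheduled) maintenance

if ! curl -sf -o /dev/null --max-time 10 "$URL"; then
    touch "$FLAG"                              # suppress further checks during the restart
    service "$APP" restart
    # Background waiter: clear the flag once the app answers again (poll up to ~15 minutes).
    ( i=0
      while [ $i -lt 90 ]; do
          curl -sf -o /dev/null --max-time 10 "$URL" && { rm -f "$FLAG"; exit 0; }
          sleep 10; i=$((i+1))
      done ) &
fi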

Jobs in a queue are dropped unexpectedly in Gearman

I'm dealing with a very strange problem now.
Since I started queueing over 1,000 jobs at once, Gearman hasn't been working properly...
The problem is that when I submit the jobs in background mode, I can see that the jobs are correctly queued on the monitoring page (gearman monitor),
but the queue is drained right after (within a few seconds) without the jobs being delivered to the worker.
In the end, the jobs are never executed by the worker; they just disappear from the queue (job server).
So I tried rebooting the server entirely and reinstalling Gearman as well as the PHP library. (I'm using one CentOS and one Ubuntu machine with the PHP gearman library; the versions are 0.34 and 1.0.2.)
But no luck yet... the job server just misbehaves as I explained above.
What should I do for now?
Can I check the workers' state, or watch and monitor the whole process from queueing the jobs to delivering them to the worker?
When I tried gearmand with an option like 'gearmand -vvvv', it never prints anything on the screen while I register a worker to the server and run a job with client code (PHP).
Any comment will be appreciated.
For your information, I'm not considering a persistent queue using MySQL or SQLite for now, because it sometimes causes performance issues with slow execution.
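As a hedged starting point for the monitoring question above, gearmand ships with a gearadmin client that can dump queue and worker state; the host and port shown are the defaults:

gearadmin --status  --host 127.0.0.1 --port 4730    # per function: queued jobs, running jobs, available workers
gearadmin --workers --host 127.0.0.1 --port 4730    # connected workers and the functions they registered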