Could not start backuppc service through puppet - service

I have BackupPC managed by Puppet, and I'm also using Foreman. Below is my init.pp file:
class backuppc::service {
  if $::operatingsystemcodename == 'squeeze' {
    service { 'backuppc': ensure => running, hasstatus => false, pattern => '/usr/share/backuppc/bin/BackupPC' }
  } else {
    service { 'backuppc': ensure => running, hasstatus => true }
  }
  service { 'apache2': ensure => running }
}
When I run Puppet on the node, it throws this report in Foreman:
change from stopped to running failed: Could not start Service[backuppc]: Execution of '/etc/init.d/backuppc start' returned 1: Starting backuppc...2016-05-31 17:13:25 Another BackupPC is running (pid 6731); quitting...
The node is running Debian Squeeze 6.0.10.
Any help on this?

This ...
change from stopped to running failed: Could not start Service[backuppc]: Execution of '/etc/init.d/backuppc start' returned 1: Starting backuppc...2016-05-31 17:13:25 Another BackupPC is running (pid 6731); quitting...
... means that puppet attempted to start BackupPC, with /etc/init.d/backuppc start, which found that the process was already running. This indicates that puppet is incorrectly determining the status of the BackupPC service.
I can't find a reference to a facter fact named operatingsystemcodename in the source. Does foreman provide this variable, or are you defining it elsewhere? Perhaps you meant lsbdistcodename instead?
If so, and $::operatingsystemcodename is undefined, your conditional will always fall through to the else branch, and the resource will be defined with hasstatus => true. Puppet will attempt to use /etc/init.d/backuppc status to check if the service is running. Therefore, if the init script's status action is broken in some way (by always returning a non-0 exit code, for example) puppet will attempt to start the service on every agent run.
So first things first, I'd verify that $::operatingsystemcodename returns 'squeeze' on the node in question.
If it does not, I'd check the exit code of /etc/init.d/backuppc status in its various states; it should return zero when started and non-zero when stopped.
If on the other hand $::operatingsystemcodename is undefined, or some unexpected value, then I'd choose another expression to use in the if statement. In this case, you'll also want to verify that the pattern attribute is correct by inspecting the process table while the BackupPC service is running.
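For example, on the node itself (facter and the init script are already referenced above; output will vary):

facter operatingsystemcodename       # prints nothing if the fact does not exist
facter lsbdistcodename               # likely the fact you actually want

/etc/init.d/backuppc status; echo $?    # should be 0 while BackupPC is running
# ...and non-zero after /etc/init.d/backuppc stop

ps -ef | grep '[B]ackupPC'           # what the pattern attribute has to match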
EDIT: Alternatively, you can provide a value for the status attribute, containing a custom command used by puppet to check the status of the BackupPC service. I would expect something like status => 'pgrep -f BackupPC' to work well enough. Although, puppet is already doing almost exactly this in ruby code, so I wouldn't expect it to solve your problem.
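For example, a sketch of that override (the resource otherwise as in the question; the exact pgrep pattern is an assumption):

service { 'backuppc':
  ensure => running,
  # Hypothetical custom status check, bypassing the init script's status action:
  status => 'pgrep -f BackupPC',
}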
While a bit dated, this blog post covers some general tips for troubleshooting puppet.

Related

How to connect searchkick (in a Rails app and/or Sidekiq job) to multiple elasticsearch clusters without stomping on the global searchkick config?

Upon startup my app sets my (global?) Searchkick client to point at my default Elasticsearch cluster.
Searchkick.client = Elasticsearch::Client.new(
  hosts: default_cluster, # this is the list of hosts in my default cluster
  retry_on_failure: true,
)
However, I am upgrading my cluster (again), and while I'd like my app's reads/searches from that default cluster,
/search?q="some term"
# =>
Model.search("some term")
to continue working against the default_cluster.
Where it starts to get a bit tricky is that:
I'd also like (via some specific Sidekiq background jobs) to fill an alternate (alt) cluster's index, something like:
Model.connect_to(alternate_cluster) {|client|
  Searchkick.client = client
  Model.reindex
}
Without causing all other background jobs to interact with the alternate cluster.
And, of course:
I'd like some way to verify that the alternate_cluster is working well (i.e. for search) before making it my default_cluster. And presumably via some admin route:
/admin/search?q="some search term"&cluster=alternate
# =>
Model.connect_to(alternate_cluster) {|client|
  Searchkick.client = client
  Model.search("some term")
}
And finally:
I'd like to avoid having to reconnect before every search/reindex action, i.e. I'd prefer not to have that reconnection overhead (also because it probably implies that long-running tasks that keep reconnecting to Searchkick will be swapping back and forth from one cluster to the other):
Model.search("some term")
# =>
Model.connect_to(alternate_cluster) {|client|
  Searchkick.client = client
  Model.search("some term")
}
^ I don't want that
FWIW, the best I've been able to come up with so far is something like:
def self.connect_to(current_cluster, &block)
  previous_es_client = Searchkick.client
  current_es_client = Elasticsearch::Client.new(
    hosts: current_cluster,
    retry_on_failure: true,
  )
  block.call(current_es_client)
rescue Exception => e
  logger.warn(e)
ensure
  Searchkick.client = previous_es_client
end
But, I suspect that will cause every other interaction within my system (via the same web-worker or other background jobs running in the same background-worker-instance) to (temporarily) point at the alternate cluster.
Thanks in advance for your assistance...

How to stop a systemd unit from reporting FAILED

I have a systemd service unit with runtime dependencies that get resolved during boot. It often reports a "FAILED" state during boot. The unit has "Restart=always", so it ultimately starts successfully after boot, but during boot it reports FAILED around 3-4 times, which I want to avoid.
Is there a way to ignore the "FAILED" state of service unit being reported?
(As I know it will succeed once the dependency is resolved or will keep retrying)
I found that a failing exit status from the service's command can be ignored by prepending a hyphen to the executable path in ExecStart.
From the manual:
https://www.freedesktop.org/software/systemd/man/systemd.service.html#BusName=
"-" If the executable path is prefixed with "-", an exit code of the command normally considered a failure (i.e. non-zero exit status or abnormal exit due to signal) is recorded, but has no further effect and is considered equivalent to success.
ExecStart=-/sbin/getty
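As a minimal, hypothetical unit sketch combining that prefix with Restart=always (the unit description, binary path, and RestartSec value are placeholders, not from the question):

[Unit]
Description=Example service whose dependencies are resolved during boot

[Service]
# The leading "-" records a non-zero exit status but treats it as success,
# so early exits during boot do not leave the unit in the "failed" state.
ExecStart=-/usr/local/bin/my-daemon
# Keep retrying until the runtime dependencies become available.
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target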

Service Fabric Replica Stuck

I am upgrading an application on Service Fabric and one of the replicas is showing the following warning:
Unhealthy event: SourceId='System.RAP', Property='IStatefulServiceReplica.ChangeRole(S)Duration', HealthState='Warning', ConsiderWarningAsError=false.
The api IStatefulServiceReplica.ChangeRole(S) on node _gtmsf1_0 is stuck. Start Time (UTC): 2018-03-21 15:49:54.326.
After some debugging, I suspect I'm not properly honoring a cancellation token. In the meantime, how do I safely force a restart of this stuck replica to get the service working again?
Partial results of Get-ServiceFabricDeployedReplica:
...
ReplicaRole : ActiveSecondary
ReplicaStatus : Ready
ServiceTypeName : MarketServiceType
...
ServicePackageActivationId :
CodePackageName : Code
...
HostProcessId : 6180
ReconfigurationInformation : {
PreviousConfigurationRole : Primary
ReconfigurationPhase : Phase0
ReconfigurationType : SwapPrimary
ReconfigurationStartTimeUtc : 3/21/2018 3:49:54 PM
}
You might be able to pipe that directly to Restart-ServiceFabricReplica. If that remains stuck, then you should be able to use Get-ServiceFabricDeployedCodePackage and Restart-ServiceFabricDeployedCodePackage to restart the surrounding process. Since Restart-ServiceFabricDeployedCodePackage has options for selecting random packages to simulate failure, just be sure to target the specific code package you're interested in restarting.
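A hedged PowerShell sketch of that sequence (the application name, service manifest name, and partition/replica IDs are placeholders for values elided in the question's output; confirm the exact parameter sets with Get-Help before running):

# Inspect the stuck replica on the node to get its PartitionId and ReplicaId.
Get-ServiceFabricDeployedReplica -NodeName "_gtmsf1_0" -ApplicationName "fabric:/MyApp"

# First attempt: restart just the stuck replica.
Restart-ServiceFabricReplica -NodeName "_gtmsf1_0" `
    -PartitionId "<partition-guid-from-output>" `
    -ReplicaOrInstanceId <replica-id-from-output>

# If it stays stuck, restart the hosting code package (the surrounding process),
# explicitly targeting the package from the output rather than a random one.
Restart-ServiceFabricDeployedCodePackage -NodeName "_gtmsf1_0" `
    -ApplicationName "fabric:/MyApp" `
    -ServiceManifestName "<service-manifest-name>" `
    -CodePackageName "Code"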

How does Spring Batch Admin stop a running job?

How does Spring Batch Admin stop a running job from the UI?
In the Spring Batch Admin online documentation, I read the following:
"A job that is executing can be stopped by the user (whether or not it
is launchable). The stop signal is sent via the database and once
detected by Spring Batch in whatever process is running the job, the
job is stopped (status moves from STOPPING to STOPPED) and no further
processing takes place."
Does that mean the Spring Batch Admin UI directly changes the status of the job in the Spring Batch tables?
UPDATE: I tried executing the query below against the running job:
update batch_job_execution set status="STOPPED" where job_instance_id=19;
The update succeeds in the DB, but Spring Batch is not able to stop the running job.
If anybody has tried this, please share the logic here.
You're confusing BatchStatus with ExitStatus.
What you are doing with that SQL is changing the STATUS to STOPPED.
When a job is running you can stop it from your code: in each step iteration, check the status, and if STOPPING is set, signal the step to stop.
Anyway, what you are doing is not elegant. The correct way is explained in Common Batch Patterns -> 11.2 Stopping a Job Manually for Business Reasons:
public class FooProcessor implements ItemProcessor<FooIn, FooOut> {
    public FooOut process(FooIn item) throws Exception {
        if (sendToStop(item)) {
            throw new MyStopException("I need to stop: " + item);
        }
        // do my stuff
        return new FooOut(item);
    }
}
Another simple way to stop a chunk step is to return null from the reader. This signals that there are no more elements for the reader to iterate:
public T read() throws Exception {
    T item = delegate.read();
    if (ifNeedStop(item)) {
        return null; // end the step here
    }
    return item;
}
I investigated the Spring Batch code.
It seems they update both the version and the status of BATCH_JOB_EXECUTION.
This works for me:
update batch_job_execution set status="STOPPED", version=version+1 where job_instance_id=19;
If you look into the Spring Batch Admin jars, you can see that AbstractStep.java (a core Spring Batch class) checks the status of the step and the job in the database.
Based on this status it validates the step before running it.
This works well for all cases except within a chunk, since the next step is only checked after a large amount of processing. If you want to handle that, you can implement your own listener to check the status (but it will increase DB hits); see the sketch below.
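A minimal, hypothetical sketch of such a listener (the class name, constructor wiring, and the JobExplorer-based lookup are my assumptions, not from the original answer): after every chunk it re-reads the job status from the metadata tables and flags the step for termination once the job has moved to STOPPING.

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.scope.context.ChunkContext;

public class StopCheckingChunkListener implements ChunkListener {

    private final JobExplorer jobExplorer; // reads the Spring Batch metadata tables

    public StopCheckingChunkListener(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    @Override
    public void afterChunk(ChunkContext context) {
        StepExecution stepExecution = context.getStepContext().getStepExecution();
        JobExecution latest = jobExplorer.getJobExecution(stepExecution.getJobExecutionId());
        if (latest != null && latest.getStatus() == BatchStatus.STOPPING) {
            // Causes the step to stop with a JobInterruptedException at the next opportunity.
            stepExecution.setTerminateOnly();
        }
    }

    @Override
    public void beforeChunk(ChunkContext context) { }

    @Override
    public void afterChunkError(ChunkContext context) { }
}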

How can I define the same service twice in Puppet?

In order to deploy Varnish with a Puppet class, I need to stop Varnish to move and deploy files, and then, at the end, ensure that Varnish is started.
My problem is simple: how can I define a service twice in a Puppet class, in order to stop and then start the service at different steps?
class varnish::install (
(...)
  service { "varnish":
    ensure  => "stopped",
    require => Package['varnish'],
    before  => Exec['mv-lib-varnish'],
  }
(...)
  service { "varnish":
    ensure  => "running",
    require => File["$varnishncsa_file"],
  }
}
I get a Duplicate definition: Service[varnish] (...) error, which is logical...
What's the best practice for managing services in a Puppet class? Divide it into multiple classes, or is there an option to "rename" a service so it can be declared several times?
Try the following to get rid of the duplicate error, but note that what you are trying to do is wrong.
Puppet brings the system to a certain consistent state, so "stop service X, do some work, start service X" is outside the scope of proper Puppet use; Puppet is more like "restart the service if some files on which the service depends were modified".
class varnish::install (
(...)
  service { "varnish-stop":
    name    => "varnish",
    ensure  => "stopped",
    require => Package['varnish'],
    before  => Exec['mv-lib-varnish'],
  }
(...)
  service { "varnish-start":
    name    => "varnish",
    ensure  => "running",
    require => File["$varnishncsa_file"],
  }
}
Use an exec with a service restart as a hook (notify) for the "deploy files" action (a package or another exec). Define the service itself only once, as running, because that is what you normally want to assure. Puppet is for describing the target state.
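A minimal sketch of that pattern (the config path and source URL are placeholders, and the file resource stands in for the question's "deploy files" step; an exec with notify would work the same way):

class varnish::install {

  package { 'varnish':
    ensure => installed,
  }

  # Placeholder for the "deploy files" step; when the deployed file changes,
  # the notify triggers a restart of the already-running service.
  file { '/etc/varnish/default.vcl':
    ensure  => file,
    source  => 'puppet:///modules/varnish/default.vcl',
    require => Package['varnish'],
    notify  => Service['varnish'],
  }

  # The service is declared exactly once, in its target state.
  service { 'varnish':
    ensure => running,
    enable => true,
  }
}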