SaltStack - Schedule to ensure service is running not working

I'm trying to set up a Saltstack schedule that will check to ensure that a service is running on the minion. However, it doesn't seem like service.running is working as a function on the schedule.
Here's my run.sls file:
test-service-sched:
  schedule.present:
    - name: test-service-sched
    - function: service.running
    - seconds: 60
    - job_kwargs:
        name: test-service
    - persist: True
    - enabled: True
    - run_on_start: True
And I execute the following: salt 'service*' state.apply run
This ends up with the following error on the minion:
2017-03-28 02:47:11,493 [salt.utils.schedule ][ERROR ][6172] Unhandled exception running service.running
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/salt/utils/schedule.py", line 826, in handle_func
    message=self.functions.missing_fun_string(func))
  File "/usr/lib/python2.6/site-packages/salt/utils/error.py", line 36, in raise_error
    raise ex(message)
Exception: 'service.running' is not available.
I haven't seen anything in the documentation that says I can't run service.running from a schedule. Is it a known limitation of Salt? Or am I just doing it wrong?
I can use cmd.run, but it ends up spamming the logs with errors if the service is already running.

So, I was pointed in the right direction on the Salt Google Group. There's a difference between execution modules and state modules: service.running is a state function, while the schedule system only runs execution module functions, so I had to reference it indirectly through state.apply. I used two files:
schedule.sls:
service_schedule:
  schedule.present:
    - function: state.apply
    - minutes: 1
    - job_args:
      - running
running.sls:
service_running:
  service.running:
    - name: test_service
Now, running salt 'service*' state.apply schedule did exactly what I wanted it to.
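As a quick sanity check, the schedule that ends up on the minion can be inspected with the schedule execution module (a sketch, using the same target pattern as above):
salt 'service*' schedule.list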

How do I modify existing jobs to switch owner?

I installed Rundeck v3.3.5 (on CentOS 7 via RPM) to replace an old Rundeck instance that was decommissioned. I did the export/import of projects (which worked brilliantly) while connected to the new server as the default admin user. The imported jobs run properly on the correct schedule. I subsequently configured the new server to use LDAP authentication and configured ACLs for users/roles. That also works properly.
However, I see an error like this in the service.log:
ERROR services.NotificationService - Error sending notification email to foo@bar.com for Execution 9358 Error executing tag <g:render>: could not initialize proxy [rundeck.Workflow#9468] - no Session
My thought is to switch job owners from admin to a user that exists in LDAP. I mean, I would like to switch job owners regardless, but I'm also hoping it addresses the error.
Is there a way in the web interface or using rd that I can bulk-modify jobs to switch the owner?
It turns out that the error in the log was caused by notification settings in an included job. I didn't realize that notifications were configured on the parameterized shared job definition, but they were; removing those notification settings stopped the error from being added to /var/log/rundeck/service.log.
To illustrate the problem, here are chunks of YAML I've edited to show just the important parts. Here's the common job:
- description: Do the actual work with arguments passed
  group: jobs/common
  id: a618ceb6-f966-49cf-96c5-03a0c2efb9d8
  name: do_the_work
  notification:
    onstart:
      email:
        attachType: file
        recipients: ops@company.com
        subject: Actual work being started
  notifyAvgDurationThreshold: null
  options:
  - enforced: true
    name: do_the_job
    required: true
    values:
    - yes
    - no
    valuesListDelimiter: ','
  - enforced: true
    name: fail_a_lot
    required: true
    values:
    - yes
    - no
    valuesListDelimiter: ','
  scheduleEnabled: false
  sequence:
    commands:
    - description: The actual work
      script: |-
        #!/bin/bash
        echo ${RD_OPTION_DO_THE_JOB} ${RD_OPTION_FAIL_A_LOT}
    keepgoing: false
    strategy: node-first
  timeout: '60'
  uuid: a618ceb6-f966-49cf-96c5-03a0c2efb9d8
And here's the job that calls it (the one that is scheduled and causes an error to show up in the log when it runs):
- description: Do the job
  group: jobs/individual
  name: do_the_job
  ...
  notification:
    onfailure:
      email:
        recipients: ops@company.com
        subject: '[Rundeck] Failure of ${job.name}'
  notifyAvgDurationThreshold: null
  ...
  sequence:
    commands:
    - description: Call the job that does the work
      jobref:
        args: -do_the_job yes -fail_a_lot no
        group: jobs/common
        name: do_the_work
If I remove the notification settings from the common job, the error in the log goes away. I'm not sure whether sending notifications from an included job is supported. It would be useful to me if it were, so I could keep notification settings in a single location. However, I can understand why it presents a problem for the scheduler/executor.
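For reference, this is roughly what the common job looks like once its notification block is removed (trimmed the same way as the snippets above); only the notification block is dropped, everything else is unchanged:
- description: Do the actual work with arguments passed
  group: jobs/common
  id: a618ceb6-f966-49cf-96c5-03a0c2efb9d8
  name: do_the_work
  notifyAvgDurationThreshold: null
  options:
  ...
  scheduleEnabled: false
  sequence:
  ...
  uuid: a618ceb6-f966-49cf-96c5-03a0c2efb9d8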

SAM deployment failed with error: Waiter StackCreateComplete failed: Waiter encountered a terminal failure state

When I try to deploy the package with SAM, the very first status that appears in the CloudFormation console is ROLLBACK_IN_PROGRESS, and after that it changes to ROLLBACK_COMPLETE.
I have tried deleting the stack and deploying again, but the same issue occurs every time.
The error in the terminal looks like this:
Sourcing local options from ./SAMToolkit.devenv
SAM_PARAM_PKG environment variable not set
SAMToolkit will operate in legacy mode.
Please set SAM_PARAM_PKG in your .devenv file to run modern packaging.
Run 'sam help package' for more information
Runtime: java
Attempting to assume role from AWS Identity Broker using account 634668058279
Assumed role from AWS Identity Broker successfully.
Deploying stack sam-dev* from template: /home/***/1.0/runtime/sam/template.yml
sam-additional-artifacts-url.txt was not found, which is fine if there is no additional artifacts uploaded
Replacing BATS::SAM placeholders in template...
Uploading template build/private/tmp/sam-toolkit.yml to s3://***/sam-toolkit.yml
make_bucket failed: s3://sam-dev* An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.
upload: build/private/tmp/sam-toolkit.yml to s3://sam-dev*/sam-toolkit.yml
An error occurred (ValidationError) when calling the DescribeStacks operation: Stack with id sam-dev* does not exist
sam-dev* will be created.
Creating ChangeSet ChangeSet-2020-01-20T12-25-56Z
Deploying stack sam-dev*. Follow in console: https://aws-identity-broker.amazon.com/federation/634668058279/CloudFormation
ChangeSet ChangeSet-2020-01-20T12-25-56Z in sam-dev* succeeded
"StackStatus": "REVIEW_IN_PROGRESS",
sam-dev* reached REVIEW_IN_PROGRESS
Deploying stack sam-dev*. Follow in console: https://console.aws.amazon.com/cloudformation/home?region=us-west-2
Waiting for stack-create-complete
Waiter StackCreateComplete failed: Waiter encountered a terminal failure state
Command failed.
Please see the logs above.
It turned out I had set SQS as the event source for the Lambda, but hadn't provided permissions like the following in the Lambda's policies:
- Effect: Allow
  Action:
    - sqs:ReceiveMessage
    - sqs:DeleteMessage
    - sqs:GetQueueAttributes
  Resource: "*"
I found this error in the "Events" tab of the CloudFormation console.
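For anyone hitting the same thing, here is a minimal sketch of where those permissions can live in the SAM template, assuming an AWS::Serverless::Function resource; the resource names, handler, and queue reference below are placeholders, not the real template:
MyFunction:                                        # hypothetical resource name
  Type: AWS::Serverless::Function
  Properties:
    Handler: com.example.Handler::handleRequest    # placeholder handler
    Runtime: java8
    Policies:
      - Statement:
          - Effect: Allow
            Action:
              - sqs:ReceiveMessage
              - sqs:DeleteMessage
              - sqs:GetQueueAttributes
            Resource: "*"
    Events:
      MyQueueEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt MyQueue.Arn               # hypothetical queue resource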

Codeship Pro on_fail across steps

Is the on_fail directive of a step run when a previous step has failed?
I'm using these steps:
- name: fail intentionally
  service: busybox
  command: false
- name: check if onfail is called
  service: busybox
  command: true
  on_fail:
    - command: echo reporting failure
Calling jet steps produces the following output:
(step: fail intentionally)
(image: busybox) (service: busybox) Image exists, using cached image
(step: fail intentionally) error ✗
(step: fail intentionally) container exited with a 1 code
My on_fail is not run.
Is that an issue with the jet utility or would things behave the same in Codeship ?
You have defined an on_fail contingency for the second step (a step that will not fail). If the on_fail had been set on the first step (which fails and stops the build), you would have seen the echoed statement.
This behavior would be consistent with a build running in CodeShip Pro.
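In other words, moving the on_fail to the step that actually fails should produce the echo; a sketch of the adjusted steps, using the same busybox service:
- name: fail intentionally
  service: busybox
  command: false
  on_fail:
    - command: echo reporting failure
- name: check if onfail is called
  service: busybox
  command: true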

How to collect more than 22 event ids with winlogbeat?

I've got a task to collect over 500 event IDs from a domain controller with Winlogbeat, but Windows limits a single query to 22 event IDs. I'm using version 6.1.2. I've tried with processors like this:
winlogbeat.event_logs:
  - name: Security
    processors:
      - drop_event.when.not.or:
          - equals.event_id: 4618
          ...
but with these settings the client doesn't work and there is nothing in the logs. If I run it from the exe file, it just starts and stops with no error.
If I try to do it the way it is written in the official manual:
winlogbeat.event_logs:
  - name: Security
    event_id: ...
    processors:
      - drop_event.when.not.or:
          - equals.event_id: 4618
          ...
the client just crashes with "invalid event log key processors found". I've also tried to create a new custom view and collect events from there, but apparently it has the same 22-event query limit.
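One thing that may be worth trying (an assumption on my part, not verified on 6.1.2): leave the event log entry bare and define the drop_event processor at the top level of the config, where the global libbeat processors live, so the filtering happens inside the Beat and is not subject to the 22-clause Windows query limit:
winlogbeat.event_logs:
  - name: Security
# global processors section (assumed placement for 6.x, not under event_logs)
processors:
  - drop_event.when.not.or:
      - equals.event_id: 4618
      # ... remaining event IDs, one equals condition per ID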

msdeploy - stop deploy in postsync if presync fails

I am using msdeploy -preSync to back up the current deployment of a website in IIS before the -postSync deploys it. However, I recently had a situation where the -preSync failed (it raised a warning due to a missing DLL) and the -postSync continued and overwrote the code.
Both the presync and postsync run batch files.
Obviously this is bad: the backup failed, so there is no backout route if the deployment has bugs or fails.
Is there any way to stop the postSync if the preSync raises warnings with msdeploy?
Perhaps the issue here is that the preSync failure was raised as a warning, not an error.
Supply the successReturnCodes parameter set to 0 (the success return code convention) to the preSync option, for example:
-preSync:runCommand="your script",successReturnCodes=0
More info at: http://technet.microsoft.com/en-us/library/ee619740(v=ws.10).aspx
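For context, a full invocation would look something like this (a sketch only; the source, destination, and script paths are placeholders for whatever the existing deployment uses):
msdeploy -verb:sync ^
  -source:contentPath="C:\build\MySite" ^
  -dest:auto,computerName="https://server:8172/msdeploy.axd" ^
  -preSync:runCommand="C:\scripts\backup.bat",successReturnCodes=0 ^
  -postSync:runCommand="C:\scripts\postdeploy.bat",successReturnCodes=0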