I have a CloudFormation template which creates a stack. The stack creates two instances, instance 1 and instance 2. During the creation of instance 1 a bash script is run. I need the stack creation to roll back when the exit code of this bash script is 1. Is there any way I can do this?
Put set -euo pipefail at the top of your script, right under the #!/bin/bash line. That will cause the script to exit with a non-zero status if any command fails, which should result in a rollback.
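A minimal sketch of what that looks like at the top of the script:

#!/bin/bash
# -e aborts the script on the first failing command, -u treats unset variables
# as errors, and -o pipefail makes a pipeline fail if any command in it fails.
set -euo pipefail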
You need to use cfn-signal to send a signal to CloudFormation from the instance's UserData upon failure.
Take a look at this example
https://cumulus-ds.readthedocs.io/en/latest/cf_examples.html
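As a rough, untested sketch: the UserData for instance 1 can run the script and forward its exit code with cfn-signal. The script path, stack name, region and the resource name Instance1 below are placeholders for whatever your template actually uses, and the instance needs a CreationPolicy with a ResourceSignal so CloudFormation waits for the signal before marking the resource complete:

#!/bin/bash
# Run the instance 1 setup script and capture its exit code.
/usr/local/bin/instance1-setup.sh
status=$?

# A non-zero exit code passed to -e marks the resource as failed,
# which makes CloudFormation roll the stack back.
/opt/aws/bin/cfn-signal -e "$status" \
    --stack my-stack \
    --resource Instance1 \
    --region us-east-1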
I would like to print more detailed logs from Terraform in an Azure pipeline, without exporting the logs to a file as Terraform's documentation describes.
You should be able to do this by setting the TF_LOG environment variable to one of the values below (most likely DEBUG in your case), in a separate task before you start running actual Terraform commands.
TRACE: the most elaborate verbosity, as it shows every step taken by Terraform and produces enormous outputs with internal logs.
DEBUG: describes what happens internally in a more concise way compared to TRACE.
ERROR: shows errors that prevent Terraform from continuing.
WARN: logs warnings, which may indicate misconfiguration or mistakes, but are not critical to execution.
INFO: shows general, high-level messages about the execution process.
Bash:
export TF_LOG=<log_level>
PowerShell:
$env:TF_LOG = '<log_level>'
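If the variable needs to be visible to later tasks in the pipeline (for example a separate task that runs terraform plan), one option, sketched below, is to set it with the task.setvariable logging command from an earlier bash task; DEBUG is just the example level here:

# In an earlier bash task: make TF_LOG available to all subsequent tasks.
echo "##vso[task.setvariable variable=TF_LOG]DEBUG"

# Or simply export it in the same task that runs Terraform:
export TF_LOG=DEBUG
terraform plan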
I have a 5 steps Argo-workflow:
step1: create a VM on the cloud
step2: do some work
step3: do some more work
step4: do some further work
step5: delete the VM
All of the above steps are time consuming, and for whatever reason a running workflow might be stopped or terminated by issuing the stop/terminate command.
What I want to do is, if the stop/terminate command is issued at any stage before step4 is started, I want to directly jump to step4, so that I can clean up the VM created at step1.
Is there any way to achieve this?
I was imagining it can happen this way:
Suppose I am at step2 when the stop/terminate signal is issued.
The pods running at step2 get a signal that the workflow is going to be stopped.
The pods stop doing their current work and output a special string telling the next steps to skip.
So step3 sees the outputs from step2, skips its work and passes it on to step4 and so on.
step5 runs irrespective of the input and deletes the VM.
Please let me know if something like this is achievable.
It sounds like step 5 needs to run regardless, which is what an exit handler is for. The exit handler is executed when you stop the workflow at any step, but it is skipped if you terminate the entire workflow.
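To illustrate that distinction with the CLI (the workflow name my-workflow is just a placeholder):

# 'argo stop' shuts down the currently running steps but still runs the
# workflow's onExit handler, so a cleanup step there can delete the VM.
argo stop my-workflow

# 'argo terminate' kills the workflow immediately and skips the exit handler,
# so the VM created at step1 would be left behind.
argo terminate my-workflow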
I've been experimenting and searching for a long time without finding an answer that works.
I have a Windows Container and I need to embed a startup script for each time a new container is created.
All the answers I found suggest one of the following:
Add the command to the Dockerfile - this is not good because it will only run when the image is built. I need it to run every single time a new container is created from the image.
Use docker exec after starting a container - this is also not what I need. These images are intended to be "shippable". I need the script to run without any special action apart from creating a new container.
Use ENTRYPOINT - I had two cases here: it either fails and immediately exits, or it succeeds but then the container stops. I need it to keep running.
Basically, the goal of this is to do some initial configuration on the container when it starts and keep it running.
The actions are around generating a GUID and registering the hostname. These have to be unique which is why I need to run them immediately when the container starts.
Looks like CMD in the Dockerfile is all I needed. I used:
CMD powershell -file
In the script I simply checked whether it was the first time it was running.
As far as I know, when most people want to know if a Kubernetes (or Spark even) Job is done, they initiate some sort of loop somewhere to periodically check if the Job is finished with the respective API.
Right now, I'm doing that with Kubernetes in the background using the & operator (bash inside Python below):
import subprocess

cmd = f'''
kubectl wait \\
    --for=condition=complete \\
    --timeout=-1s \\
    job/job_name \\
    > logs/kube_wait_log.txt \\
    &
'''

kube_listen = subprocess.run(
    cmd,
    shell=True,
    stdout=subprocess.PIPE
)
So... I actually have two (correlated) questions:
Is there a better way of doing this in the background with shell other than with the & operator?
The option that I think would be best is actually to use cURL from inside the Job to update my Local Server API that interacts with Kubernetes.
However, I don't know how I can perform a cURL from a Job. Is it possible?
I imagine you would have to expose ports somewhere but where? And is it really supported? Could you create a Kubernetes Service to manage the ports and connections?
If you don't want to block on a process running to completion, you can create a subprocess.Popen instance instead. Once you have this, you can poll() it to see if it's completed. (You should try really really really hard to avoid using shell=True if at all possible.) So one variation of this could look like (untested):
import subprocess
import time

with open('logs/kube_wait_log.txt', 'w') as f:
    with subprocess.Popen(['kubectl', 'wait',
                           '--for=condition=complete',
                           '--timeout=-1s',
                           'job/job_name'],
                          stdin=subprocess.DEVNULL,
                          stdout=f,
                          stderr=subprocess.STDOUT) as p:
        while True:
            # poll() returns None while kubectl is still running, and the
            # exit code (0 on success) once it has finished.
            if p.poll() is not None:
                job_is_complete()
                break
            time.sleep(1)
Better than shelling out to kubectl, though, is using the official Kubernetes Python client library. Rather than using this "wait" operation, you would watch the job object in question and see if its status changes to "completed". This could look roughly like (untested):
from kubernetes import client, config, watch

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

jobsv1 = client.BatchV1Api()
w = watch.Watch()
# Watch the single Job by filtering the list call on its name.
for event in w.stream(jobsv1.list_namespaced_job, namespace='default',
                      field_selector='metadata.name=job_name'):
    job = event['object']
    if job.status.completion_time is not None:
        job_is_complete()
        w.stop()
        break
The Job's Pod doesn't need to update its own status with the Kubernetes server. It just needs to exit with a successful status code (0) when it's done, and that will get reflected in the Job's status field.
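In other words, no extra plumbing (cURL calls, exposed ports, or a Service) is needed inside the Pod: once its main container exits 0, the Job controller records the completion, and anything watching the Job, whether the snippets above or a plain kubectl query like the sketch below, will see it. The name job_name is the same placeholder used above.

# Prints "1" once the Job has completed successfully.
kubectl get job job_name -o jsonpath='{.status.succeeded}'

# Or block until completion, as in the original snippet:
kubectl wait --for=condition=complete --timeout=-1s job/job_name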
When updating a CloudFormation EC2 Container Service (ECS) stack with a new container image, is there any way to control the timeout so that if the service does not stabilize it rolls back automatically?
The UpdatePolicy attribute which is part of the Auto Scaling Group does not help since instances are not being created.
I also tried a WaitCondition but have not been able to get that to work.
The stack essentially just stays in the UPDATE_IN_PROGRESS state until it hits the default timeout (~3 hours), or until you cancel the update.
Ideally we would be able to have the stack timeout after a short period of time.
This is what my Cloudformation template looks like:
https://s3.amazonaws.com/aws-rga-cw-public/ops/cfn/ecs-cluster-asg-elb-cfn.yaml
Thanks.
I've created a workaround for this problem until AWS creates an ECS UpdatePolicy and CreationPolicy that allows for resource signaling:
Use AWS::CloudFormation::WaitCondition with a Macro that will create new WaitCondition resources when the service is expected to update. Signal the wait condition with a non-essential container attached to the task.
Example: https://github.com/deuscapturus/cloudformation-macro-WaitConditionUpdate/blob/master/example-ecs-service.yaml
The Macro for the above example can be found here: https://github.com/deuscapturus/cloudformation-macro-WaitConditionUpdate
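As a hedged sketch of what the signalling container could run: a WaitConditionHandle is a presigned S3 URL that accepts a JSON status document, and the WAIT_CONDITION_HANDLE_URL environment variable below is an assumed name for however that URL is passed into the task:

# Signal success to the WaitCondition handle (a presigned S3 URL).
# The empty Content-Type header avoids a signature mismatch on the presigned URL.
curl -X PUT -H 'Content-Type:' \
    --data-binary '{"Status":"SUCCESS","Reason":"Container started","UniqueId":"deploy-1","Data":"ok"}' \
    "$WAIT_CONDITION_HANDLE_URL"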
My workaround for this problem is to run a script in the background before triggering the stack update:
./deployment-breaker.sh &
And for the script
#!/bin/bash
sleep 600
deploymentStatus=$(aws cloudformation describe-stacks --stack-name STACK_NAME | jq XXX)
if [[ $deploymentStatus == YOUR_TERMINATE_CONDITION ]]; then
    aws cloudformation cancel-update-stack --stack-name STACK_NAME
fi
If your WaitCondition is in the original create, you need to rename it (and the Handle). Once a WaitCondition has been signaled as complete, it will always be complete. If you rename it and do an update, the original WaitCondition and Handle will be dropped and the new ones created and signaled.
If you don't want to have to modify your template, you might be able to use Lambda and Custom Resources to create a unique WaitCondition via the AWS CLI for each update.
It's not possible at the moment with the provided CloudFormation types. I have the same problem and I might create a custom CloudFormation resource (using AWS Lambda) to replace my AWS::ECS::Service.
The other alternative is to use nested stacks to wrap the AWS::ECS::Service resources — it won't solve the problem, but it at least will isolate the individual service and the rest of the stack will be in a good state. My stacks have multiple services and this would help, but the custom resource is the best option so far (I know other people that did the same thing).