Is there a way to specify that a substack is not to be rolled back on failure when calling other CFTs from a CFT?
I.e., the master CFT is invoked (at invocation you can use --disable-rollback or pass the option to CFN) -> substack 1 is successfully created -> substack 2 fails.
Now substack 2 rolls back, I lose the record of what happened, and the master CFT just sits there, failed.
Is there a place to specify whether or not to allow rollback inside of a CFT, either in the invoking template (master) or the child (substack)?
Yes, you can disable Rollback on failure for CloudFormation stacks.
On the Options page while creating the stack, expand the Advanced section.
In the expanded Advanced section you will find the Rollback on failure option.
With that disabled, the CFT won't roll back on failures; even when a child stack fails, it won't initiate a rollback.
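If you create the stack from the CLI instead of the console, the same option is available as the --disable-rollback flag; the stack and template names below are just placeholders:
# Placeholder names; creates the parent stack with rollback on failure disabled.
aws cloudformation create-stack \
  --stack-name master-stack \
  --template-body file://master.yaml \
  --disable-rollback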
Currently we are using GitHub Actions for CI for our infrastructure.
The infrastructure is managed with Terraform, and a code change to a module triggers plan and deploy for that module only (hence only the related resources are updated, e.g. one pod container).
Since an auto-update can be triggered by a push to another GitHub repository, updates can arrive at roughly the same time, e.g. Pod A's image is updated and Pod B's image is updated.
Without any concurrency control in place, since Terraform holds a state lock, one of the actions will fail due to a lock timeout.
After adding concurrency it is fine for just two near-simultaneous pushes, as the second one can wait for the first to finish.
Yet if more pushes arrive, GitHub's concurrency only keeps the most recent push in the queue and cancels the ones already waiting (the in-progress run still continues). This is logical from a single-application perspective, but since our infra code relies on diff checks, cancelling a queued job actually skips a deployment!
Is there a mechanism to queue workflows (or maybe even set a queue wait timeout) on GitHub Actions?
Eventually we wrote our own script as a workflow step to wait for previous runs:
Get information on the current run
Collect previous runs that have not completed
Wait in a loop until they are completed
Once the waiting loop exits, continue with the workflow
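A rough sketch of that wait step using the gh CLI (not our exact script; it assumes the step runs after checkout with GH_TOKEN set, and treats lower run IDs as earlier runs):
#!/bin/bash
# Rough sketch only: wait until earlier, not-yet-completed runs of this workflow finish.
# Assumes the gh CLI is available, the repository is checked out, and GH_TOKEN is set.

current_run_id="$GITHUB_RUN_ID"      # provided by GitHub Actions

while true; do
  # Count runs of this workflow that started before the current one and are not completed yet.
  pending=$(gh run list \
    --workflow "$GITHUB_WORKFLOW" \
    --limit 100 \
    --json databaseId,status \
    --jq "[.[] | select(.databaseId < $current_run_id and .status != \"completed\")] | length")

  if [ "$pending" -eq 0 ]; then
    echo "No earlier runs still in progress; continuing."
    break
  fi

  echo "Waiting for $pending earlier run(s) to finish..."
  sleep 30
done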
Tutorial on checking status of workflow jobs
https://www.softwaretester.blog/detecting-github-workflow-job-run-status-changes
I'm creating a Deployment Group in CodeDeploy with a CloudFormation template.
The Deployment Group is successfully created and the application is deployed perfectly fine.
The CF resource that I defined (Type: AWS::CodeDeploy::DeploymentGroup) has the "Deployment" property set. The thing is that I would like to configure automatic rollbacks for this deployment, but as per CF documentation for "AutoRollbackConfiguration" property: "Information about the automatic rollback configuration that is associated with the deployment group. If you specify this property, don't specify the Deployment property."
So my understanding is that if I specify "Deployment", I cannot set "AutoRollbackConfiguration"... Then how are you supposed to configure any rollback for the deployment? I don't see any other resource property that relates to rollbacks.
Should I create a second DeploymentGroup resource and bind it to the same instances that the original Deployment Group has? I'm not sure this is possible or makes sense but I ran out of options.
Thanks,
Nicolas
First, I'd like to describe why you cannot specify both the deployment and the rollback configuration:
Whenever you specify a deployment directly on the group, you already state which revision you would like to deploy. This conflicts with CloudFormation's idea of managing resources without drift between the template and the actual configuration of those resources.
I would recommend the following:
Use CloudFormation to deploy the 'underlying' infrastructure (the deployment group, application, roles, instances, etc.)
Create a CodePipeline within this infrastructure template, which then includes a CodeDeploy deployment action (https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-CodeDeploy.html, https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-codepipeline-pipeline-stages-actions-actiontypeid.html)
The pipeline can be triggered whenever there is a new version in your revision location.
This approach clearly separates the underlying infrastructure, which does not change dynamically, from the actual application deployment, which is done through a proper pipeline.
Additionally, this way you can specify how you would like to deploy (blue/green, canary) and how/when rollbacks should be handled. The status of your deployment can also be seen inside CodePipeline.
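As an illustration (not part of the template itself), if a deployment is ever started outside the pipeline, the CLI can attach a rollback configuration to that single deployment; the application, group, bucket and key below are placeholders:
# Placeholder names; starts one deployment and enables automatic rollback for it.
aws deploy create-deployment \
  --application-name MyApplication \
  --deployment-group-name MyDeploymentGroup \
  --s3-location bucket=my-revision-bucket,key=myapp.zip,bundleType=zip \
  --auto-rollback-configuration '{"enabled": true, "events": ["DEPLOYMENT_FAILURE"]}'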
I didn't mention it but what you are suggesting about CodePipeline is exactly what I did.
In fact, I have one CloudFormation template that creates all the infrastructure and includes the DeploymentGroup. With this, the application is deployed for the first time to my EC2 instances.
Then I have another CF template for CI/CD purposes with a CodeDeploy stage/action that references the previous DeploymentGroup. Whenever I push some code to my repository, the Pipeline is triggered, the code is built, and the new version is successfully deployed to the instances.
However, I don't see how/where in any of the CF templates to handle/configure the rollback for the DeploymentGroup as you were saying. I think I get the idea of your explanation about the conflict CF might have in case of drift, but my impression is that in case of errors during CF stack creation, the CF rollback should just remove the DeploymentGroup being created. In other words, for me there's no CodeDeploy deployment rollback involved in that scenario, just removal of the resource (DeploymentGroup) CF was trying to create.
One thing that really surprises me is that you can enable/disable automatic rollbacks for the DeploymentGroup through the AWS Console. Just edit the DeploymentGroup, go to Advanced Configuration, and there is a checkbox. I tried it, triggered the Pipeline again, and it worked perfectly. I made a faulty change to make the deployment fail on purpose, and CodeDeploy automatically rolled back to the previous version of my application... completely expected behavior. It doesn't make much sense that this simple boolean/flag option is not available through CF.
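For reference, the same toggle appears to be reachable from the CLI as well, which is how I flip it outside the template for now (application and group names are placeholders):
# Placeholder names; enables automatic rollback on deployment failure for the group.
aws deploy update-deployment-group \
  --application-name MyApplication \
  --current-deployment-group-name MyDeploymentGroup \
  --auto-rollback-configuration '{"enabled": true, "events": ["DEPLOYMENT_FAILURE"]}'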
Hope this makes sense and helps clarify my current situation. Any extra help would be highly appreciated.
Thanks again
My CloudFormation template had a couple of AWS::SNS::Subscription resources. I removed those and deployed the template. One of the two AWS::SNS::Subscription resources failed to delete and ended up in DELETE_FAILED. I expected the AWS::CloudFormation::Stack to ROLLBACK on the failure to delete the AWS::SNS::Subscription, but to my surprise it ended up in the UPDATE_COMPLETE state.
Generally, if CloudFormation can't delete a resource as part of the cleanup step, it does not roll back but reports success.
No worries! AWS has now added the ability to retry stack operations from the point of failure.
This is great news! While using AWS CloudFormation, I faced the same problem: when any resource fails to launch for any reason, you have to wait for the rollback and then launch the stack again from scratch. Now we can retry stack operations from the point of failure instead.
Thanks to AWS for adding this new feature.
If you still have any questions about this, please let me know in the comments.
Get the full details here: https://aws.amazon.com/blogs/aws/new-for-aws-cloudformation-quickly-retry-stack-operations-from-the-point-of-failure/
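The blog post shows the console flow; from the CLI the same behaviour maps, as far as I can tell, to --disable-rollback on the stack operation plus the rollback-stack command (my-stack and template.yaml are placeholders):
# Keep successfully provisioned resources on failure so the operation can be retried later.
aws cloudformation update-stack \
  --stack-name my-stack \
  --template-body file://template.yaml \
  --disable-rollback

# If you decide not to retry, manually roll the stack back to the last known stable state.
aws cloudformation rollback-stack --stack-name my-stack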
What is the meaning of the "Mode" parameter of Set-AzureDeployment?
-Mode
Specifies the mode of upgrade. Supported values are: "Auto", "Manual", and "Simultaneous".
What does "Auto","Manual", and "Simultaneous" mean?
I am particularly interested in "Simultaneous". Does it mean my package will be deployed to multiple instances simultaneously?
Thanks
Mode specifies the type of update to initiate. Role instances are allocated to update domains when the service is deployed. Updates can be initiated manually in each update domain or initiated automatically in all update domains.
If not specified, the default value is Auto. If set to Manual, WalkUpgradeDomain must be called to apply the update. If set to Auto, the update is automatically applied to each update domain in sequence.
To perform an automatic update of a deployment, call Upgrade Deployment or Change Deployment Configuration with the Mode element set to automatic. The update proceeds from that point without a need for further input. You can call Get Operation Status to determine when the update is complete.
To perform a manual update, first call Upgrade Deployment with the Mode element set to manual. Next, call Walk Upgrade Domain to update each domain within the deployment. You should make sure that the operation is complete by calling Get Operation Status before updating the next domain. For more information, please refer to this link.
One of the new deployment options we now support is the ability to do a "Simultaneous Update" of a Cloud Service (we sometimes also refer to this as the "Blast Option"). When you use this option, we bypass the normal upgrade domain walk that is done by default with Cloud Services (where we upgrade parts of the Cloud Service sequentially to avoid ever bringing the entire service down) and instead upgrade all roles and instances simultaneously. With today's release, this simultaneous update logic now happens within Windows Azure (on the cloud side). This has the benefit of enabling the Cloud Service update to happen much faster. For more information, please refer to this link.
I am particularly interested in "Simultaneous". Does it mean my package will be deployed to multiple instances simultaneously?
The answer is yes.
When updating a Cloudformation EC2 Container Service (ECS) Stack with a new Container Image, is there any way to control the timeout so if the service does not stabilize it rolls back automatically?
The UpdatePolicy attribute which is part of the Auto Scaling Group does not help since instances are not being created.
I also tried a WaitCondition but have not been able to get that to work.
The stack essentially just stays in the UPDATE_IN_PROGRESS state until it hits the default timeout (~3 hours), or you cancel the update.
Ideally we would be able to have the stack timeout after a short period of time.
This is what my CloudFormation template looks like:
https://s3.amazonaws.com/aws-rga-cw-public/ops/cfn/ecs-cluster-asg-elb-cfn.yaml
Thanks.
I've created a workaround for this problem until AWS creates an ECS UpdatePolicy and CreationPolicy that allow for resource signaling:
Use AWS::CloudFormation::WaitCondition with a Macro that will create new WaitCondition resources when the service is expected to update. Signal the wait condition with a non-essential container attached to the task.
Example: https://github.com/deuscapturus/cloudformation-macro-WaitConditionUpdate/blob/master/example-ecs-service.yaml
The Macro for the above example can be found here: https://github.com/deuscapturus/cloudformation-macro-WaitConditionUpdate
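For illustration, the non-essential signalling container could run something along these lines; WAIT_CONDITION_HANDLE_URL is a placeholder name for the handle URL, which the linked example wires in through the template:
#!/bin/sh
# Hypothetical entrypoint for the non-essential "signal" container in the task definition.
# WAIT_CONDITION_HANDLE_URL is assumed to hold the pre-signed WaitConditionHandle URL.

cat > /tmp/signal.json <<EOF
{
  "Status": "SUCCESS",
  "Reason": "New task started successfully",
  "UniqueId": "$(date +%s)",
  "Data": "ECS service updated"
}
EOF

# PUT the signal document to the wait condition handle's pre-signed URL.
curl -T /tmp/signal.json "$WAIT_CONDITION_HANDLE_URL"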
My workaround for this problem is to run a script in the background before triggering the stack update:
./deployment-breaker.sh &
And for the script:
#!/bin/bash
# Wait 10 minutes, then cancel the update if the stack is still in an unwanted state.
sleep 600
deploymentStatus=$(aws cloudformation describe-stacks --stack-name STACK_NAME | jq XXX)
if [[ $deploymentStatus == YOUR_TERMINATE_CONDITION ]]; then
    aws cloudformation cancel-update-stack --stack-name STACK_NAME
fi
If your WaitCondition is in the original create, you need to rename it (and the Handle). Once a wait condition has been signaled as complete, it stays complete. If you rename both and do an update, the original WaitCondition and Handle will be dropped and the new ones created and signaled.
If you don't want to have to modify your template, you might be able to use Lambda and a custom resource to create a unique WaitCondition via the AWS CLI for each update.
It's not possible at the moment with the provided CloudFormation types. I have the same problem, and I might create a custom CloudFormation resource (using AWS Lambda) to replace my AWS::ECS::Service.
The other alternative is to use nested stacks to wrap the AWS::ECS::Service resources; it won't solve the problem, but it at least isolates the individual service, and the rest of the stack stays in a good state. My stacks have multiple services and this would help, but the custom resource is the best option so far (I know other people who have done the same thing).