CDK stack stuck in UPDATE_ROLLBACK_FAILED - How to continue? - aws-cloudformation

I have a CDK project. I tried to deploy a stack, but it failed with a permissions error, and now the stack is in the UPDATE_ROLLBACK_FAILED state. I fixed the error and would like to continue my deployment, but when I run cdk deploy it fails with:
is in UPDATE_ROLLBACK_FAILED state and can not be updated
I read here that you can issue a ContinueUpdateRollback command. Can I do this in CDK? What's the best practice for such a state? And what do I do if this happens in production? I don't want to delete the stack...

You cannot fix this from CDK. Although you can use the AWS CloudFormation CLI to run continue-update-rollback, this state usually shouldn't be resolved programmatically, because it requires a decision on your part. You have to go into the console > CloudFormation > your stack > Stack Actions > Continue update rollback to see what cannot be updated or rolled back.
The console will prompt you with the exact issue (usually a resource that cannot be updated, deleted, etc.). You can choose to skip updating that resource, and the rollback will continue and succeed.
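If you do want to handle it from the CLI instead, the equivalent is roughly the following (the stack name and logical resource ID are placeholders; --resources-to-skip is only needed when you decide to skip a stuck resource):

aws cloudformation continue-update-rollback --stack-name MyStack
aws cloudformation continue-update-rollback --stack-name MyStack --resources-to-skip MyStuckResource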
To see what went wrong with the deployment, or what tried to change that you weren't expecting, run
cdk deploy --no-execute --change-set-name debug-changeset
This command does not actually deploy anything. It just generates a change set (called debug-changeset) that you can view in the CloudFormation console for the stack in question. The change set shows what cdk deploy wanted to change and can help you debug why your update failed. (I debug it this way because the errors in the deployment events log are usually not detailed enough to pinpoint the exact problem.)
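You can also inspect the generated change set from the CLI instead of the console, e.g. (the stack name here is a placeholder):

aws cloudformation describe-change-set --stack-name MyStack --change-set-name debug-changeset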

Related

`gcloud run deploy` raises "Revision <revision_name> is not ready and cannot serve traffic."

Command
gcloud run deploy api --region=$REGION --image=$IMAGE
Logs
Deploying container to Cloud Run service [api] in project [[MASKED]] region [[MASKED]]
Deploying...
Creating Revision...........interrupted
Deployment failed
ERROR: (gcloud.run.deploy) Revision [[MASKED]] is not ready and cannot serve traffic.
I've searched the Google Cloud documentation, but it doesn't mention this problem.
How do I solve "Revision is not ready and cannot serve traffic"?
Try waiting a few minutes and then just re-run the deployment. The good old "let's retry without changing anything" worked for me! :)
EDIT: I talked with a Cloud Architect I work with, and he told me this is the actual solution: if you retry the deploy too quickly, GCP may still have pending operations from the previous one!
I faced the same error in Cloud Run after getting the container working correctly locally. In my case the revisions weren't showing as failing; they had a grey checkmark,
and when hovering over it I got the message:
The revision is healthy but not currently serving traffic.
I just needed to click Manage Traffic and set 100% of the traffic to a new revision
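The same thing can be done from the command line with something like the following (the service name and region are taken from the question; a sketch, not the only way):

gcloud run services update-traffic api --to-latest --region=$REGION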
I faced this problem as well. In my case I checked the "Cloud Run" section from the hamburger menu of the Google Cloud console. The "Logs" section there should give you a better idea of what went wrong. I was missing a Python library, and adding the correct dependency to my requirements.txt solved the issue for me. Somehow my local testing passed without hitting this. I hope this helps. :)
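If you prefer the terminal over the console, something like this should surface the same revision logs (the service name api is taken from the question; adjust the filter as needed):

gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="api"' --limit=50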
I faced this problem too. In my case the Docker image was missing a required dependency package at the build stage; my Dockerfile was missing some steps to copy the files needed to install that package.
If the Cloud Build logs don't make sense to you, I think you should do the following to find your problem (a sketch of the docker commands follows this list):
From the Google Cloud console, go to the "Container Registry" service > Images
Select your repository name
On the image version you want to check (maybe latest) > more actions > show pull command > copy that command, e.g. docker pull gcr.io/..
From the Google Cloud console header, select Activate Cloud Shell
In the Cloud Shell terminal, pull the Docker image of your latest build by running the pull command you copied before
Start a container from this image to see exactly what happens with your Cloud Run revision
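As a rough sketch of the pull-and-run part (the image path below is a placeholder; Cloud Run expects the container to listen on the port given in the PORT environment variable, 8080 by default):

docker pull gcr.io/my-project/api:latest
docker run --rm -p 8080:8080 -e PORT=8080 gcr.io/my-project/api:latest

If the container exits or logs an error here, it will do the same when Cloud Run tries to start the revision.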

How to test the validity of alertmanager.yaml

Is there a way to find out why my Alertmanager configuration is not being applied?
From the docs, the reason would be that the file is not valid:
Alertmanager can reload its configuration at runtime. If the new configuration is not well-formed, the changes will not be applied and an error is logged. A configuration reload is triggered by sending a SIGHUP to the process or sending an HTTP POST request to the /-/reload endpoint.
I am trying to find a way to test the validity of my alertmanager.yaml.
I came across the amtool git repo, which takes you through installing Alertmanager itself, with amtool included inside.
OK, I did that and got amtool.exe, which comes inside the Alertmanager package.
I added my file to the config folder that the tool is supposed to scan, but I got no answer from it; it just closes its console window without showing any log.
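For what it's worth, amtool has a dedicated subcommand for this, and it needs to be run from a terminal rather than by double-clicking the exe (which is why the console window closes immediately):

amtool check-config alertmanager.yml

It prints whether the file parses, summarises what it found (routes, receivers, templates), and exits with a non-zero code if the file is invalid.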
Second,
I have installed the Prometheus stack Helm chart on my k8s cluster from the Prometheus community git repo; how do I find amtool inside it?
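A sketch of one way to run it inside the cluster, assuming the chart put Alertmanager in the monitoring namespace and kept the default container name and config mount path (all of these are assumptions; adjust to your release):

kubectl -n monitoring get pods -l app.kubernetes.io/name=alertmanager
kubectl -n monitoring exec -it <alertmanager-pod> -c alertmanager -- amtool check-config /etc/alertmanager/config/alertmanager.yaml

amtool ships inside the official Alertmanager image, so nothing extra needs to be installed in the pod.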
No code sample; it is not a code issue.
Thanks everyone.

Has anyone tried the HLF 2.0 feature "External Builders and Launchers" and wants to get in touch?

I'm working my way through the HLF 2.0 docs and would love to discuss and try out the new features "External Builders and Launchers" and "Chaincode as an external service".
My goal is to run HLF 2.0 on a K8s cluster (OpenShift). Does anyone want to get in touch, or has anyone already figured their way through?
Cheers from Germany
I'm also trying to use the ExternalBuilder. I set up core.yaml and rebuilt the containers to use it. On "peer lifecycle chaincode install .tgz..." I get an error that the path to the scripts in core.yaml cannot be found.
I've added volume bind commands in peer-base.yaml and in docker-compose-cli.yaml, and I'm using the first-network setup. I dropped the part of byfn.sh that would connect to the cli container so that I do that part manually; the create, join, and update-anchors steps succeed, and then the install fails. On the install I'm failing on /bin/detect, because the peer can't find that file to fork/exec it. To get that far, the peer was able to read my external builder configuration and the core.yaml file. At the moment I'm trying "mode: dev" in core.yaml, which seems to indicate that the scripts and the chaincode will be run "locally", which I think means in the cli container. Otherwise, I've tried to walk the code to see how the docker containers are created dynamically, and from what image, but I haven't been able to nail that down yet.
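For comparison, the core.yaml fragment I would expect for an external builder looks roughly like this (the builder name and path are placeholders; the path must be visible inside the peer container, e.g. via a volume mount, and must contain executable bin/detect, bin/build and bin/release scripts):

chaincode:
  externalBuilders:
    - name: my-external-builder
      path: /opt/external-builder

If the peer logs that it cannot fork/exec /opt/external-builder/bin/detect, the first things to check are the bind mount on the peer container and the execute bit on the scripts.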

Error occurred while starting the build in Openshift 3

I have been trying to deploy a war file as an OpenShift project. The server used is jboss-webserver30-tomcat8. I have followed the steps below:
Put the ROOT.war file under the 'deployments' directory on the local system.
Upload the changes to GitHub.
Create a new Java project in OpenShift 3 and provide the GitHub repository details.
No automatic build or deployment starts. On manually clicking the Start Build button, the error below is displayed:
An error occurred while starting the build. Reason: Error resolving
ImageStreamTag jboss-webserver30-tomcat8-openshift:1.2 in namespace
openshift: unable to find latest tagged image
Please suggest how I can resolve this error.
This is an issue with how the jboss-webserver30-tomcat8-openshift image stream is defined in the cluster. We are working to correct this; it is not currently importing the correct set of tags, and as a result the 1.2 tag stopped being a valid tag when it should be one.
However, the short-term solution is to change your BuildConfig to reference one of the tags that has a valid image reference associated with it (e.g. 1.3) instead of the 1.2 tag it currently references. Your build should then be able to run.
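For example (a sketch; myapp stands in for your actual BuildConfig name):

oc get imagestreamtags -n openshift | grep jboss-webserver30-tomcat8-openshift
oc edit bc/myapp   # point spec.strategy.sourceStrategy.from.name at e.g. jboss-webserver30-tomcat8-openshift:1.3
oc start-build myapp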
A (temporarily) unavailable builder image may be related to this platform upgrade, which correlates with the time you posted your question.
Generally, the best place to check for any incident reports or scheduled maintenance is the Status Page (Starter | Pro clusters; it's linked in the web console too, in the upper right corner of the interface).
If this does not seem to be related (e.g. you're not on the starter-us-west-2 cluster where the platform upgrade is taking place), or the problem persists after the maintenance is over, I would encourage you to check the open issues and log a new bug report if it's not in the list.
Thank you.

Azure Run/Publish Fail Reason or Exception Dump

I'm wrestling with the Azure deployment process. I have an application (many applications, actually) that runs very well, thank you very much, on my local machine, but when I publish and run them they often sit at "Initializing..." and then "Stopping..." because they've hit some error.
My question is: How can I find out what the error was that stopped it from running?
I want to be able to capture or view errors that stopped the actual deployment.
Thanks in advance
This problem is normally caused by referenced assemblies: you will need to check that you have set the "Copy Local" property to true for any 3rd-party assemblies in your project.
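In Visual Studio that is Solution Explorer > References > select the assembly > Properties > Copy Local = True, which corresponds roughly to the following in the .csproj (the assembly name and hint path here are placeholders):

<Reference Include="ThirdParty.Library">
  <HintPath>..\libs\ThirdParty.Library.dll</HintPath>
  <Private>True</Private>
</Reference>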
See the following blog post for a more in-depth analysis of this issue.