How to stop a cloudcontrol deployment? - deployment

My cloudcontrol app is stuck in deployment.
Both rollback and deploy give this message:
"Deployment foobar/default is currently deploying. Please retry in a few seconds."

you get this message, when the cloudControl platform is currently deploying you application. In normal cases the takes at most some minutes, you can check the progress within the deploy log. https://www.cloudcontrol.com/dev-center/Platform%20Documentation#logging

Related

AWS ECS won't start tasks: http request timed out enforced after 4999ms

I have an ECS cluster (fargate), task, and service I have had setup in Terraform for at least a year. I haven't touched it for a long while. My normal deployment for updating the code is to push a new container to the registry and then stop all tasks on the cluster with a script. Today, my service did not run a new task in response to that task being stopped. It's desired count is fixed at so it should.
I have go in an tried to manually run this and I'm seeing this error.
Unable to run task
Http request timed out enforced after 4999ms
When I try to do this, a new stopped task is added to my stopped tasks lists. When I look into that task the stopped reason is "Deployment restart" and two of them are now showing "Task provisioning failed." which I think might be tasks the service tried to start. But these tasks do not show a started timestamp. The ones I start in the console have a started timestamp.
My site is now down and I can't get it back up. Does anyone know of a way to debug this? Is AWS ECS experiencing problems right now? I checked the health monitors and I see no issues.
This was an AWS outage affecting Fargate in us-east-1. It's fixed now.

Jobs remain executing after Rundeck crashes/restarts

Details:
App Version: Rundeck Community - v4.2.1 (though the issue below has
been around for some time)
Database: Postgresql 9.6
I'm wondering if anyone else has experienced this?
In the event of a rundeck crash or server crash jobs which were executing remain in a running state. Navigating to the execution shows the following message:
"Workflow State and Log Output is not available."
This causes issues since a running execution will block subsequent executions and are not alerted on etc.
Is there a config setting that can be used to force any jobs in a RUNNING state to fail in the event of a rundeck service crash/restart?
regards
Right now the way is to use Missed Job Fires feature with Job Resume (both features only available on PagerDuty Process Automation On Prem, formerly "Rundeck Enterprise") to resume missed jobs/steps after a Rundeck server failure.

I'm having issues with DevOps production deployment - Unable to edit or replace deployment

Up until yesterday morning I was able to deploy data factory v2 changes in my release pipeline. Then last night during deployment I received an error that the connection was forced closed. Now when I try to deploy to the production environment, I get this error: "Unable to edit or replace deployment 'ArmTemplate_18': previous deployment from '12/10/2019 10:19:27 PM' is still active (expiration time is '12/17/2019 10:19:23 PM')". Am I supposed to wait a week for this error to clear itself?
This message indicates that there’s another deployment going on, with the same name, in the same ARM Resource Group. In order to perform your new deployment, you’ll need to either:
Wait for the existing deployment to complete
Stop the in-progress / active deployment
You can stop an active deployment by using the Stop-AzureRmResourceGroupDeployment PowerShell command or the azure group deployment stop command in the xPlat CLI tool. Please refer to this case.
Or you can open target Resource Group on the azure portal, go to Deployment tab, find not completed deployments, cancel it, start new deploy. You can refer to this issue for details.
In addition, there is a recently event of availability degradation of Azure DevOps .This could also have an impact. Now the engineers have mitigated this event.

ECS + ALB - My applications only respond a few times

I've developed two spring boot applications for microservices and I've used ECS to deploy these applications into containers.
To do this, I followed the official pet clinic example (https://github.com/aws-samples/amazon-ecs-java-microservices/tree/master/3_ECS_Java_Spring_PetClinic_CICD).
All seems to works correctly, but when I make a request to the ALB very often I receive the 502 or 503 HTTP error and a few times I can see the correct response of the applications.
Can someone help me?
Thanks in advance.
You receive a 502 when you have no healthy task running and 503 when task is starting/restarting.
All of this mean that your task got stopped and then your cluster restart it, so you should find what make your task failed.
It can be something directly in your code that make it crash. or it can be the cluster healthcheck defined in your target group that failed.
Firstly you should look your task in the AWS ECS Console and see what error your task receive when it's stopped.
But as you are able to make request for some time and then it failed. I pretty sure your problem come from your healthcheck. So go in your target group used by the task (in AWS EC2 Console) and make sure the healthcheck path configured exist and returned a 200 status code.

Service Fabric "Waiting for upgrade..." using VSTS

I've configured upgrades on my VSTS release of a Service Fabric app containing 5 services to a single node test environment on Azure. Unfortunately when it gets to the release part it just hangs saying "Waiting for upgrade..." over and over again. I left it for 15 hours and it still says the same thing. The initial deployment went ahead without issue.
I've looked at various posts about turning off health check times, but this has not been successful. I've also tried setting the mode to UnmonitoredAuto, but no success.
I've RDPd onto the environment and checked the processor/memory usage in task manager, and everything is pretty much 0%, and very low memory usage.
Is there anything else I can do to stop the upgrade hanging?
OK, I've managed to fix this. This was happening because there is a PreUpgradeSafetyCheck that happens before rolling out an upgrade. This is not relevant for a single node cluster as downtime is inevitable for single node clusters.
The status of an upgrade can be found using: Get-ServiceFabricApplicationUpgrade. Which shows the status above.
To fix this there is a flag: UpgradeReplicaSetCheckTimeoutSec in the release task. Setting the value to 0 sorts things out.