aws codedeploy error The old deployment was still in status 'Active' after the deployment timeout. The configured timeout is 10 minutes - amazon-ecs

While running CodeDeploy I get the error: "The old deployment was still in status 'Active' after the deployment timeout. The configured timeout for this action is 10 minutes."
The deployment is still shown as running after the configured timeout.

You should check the task logs.
In the Tasks tab, click 'Stopped' and view the logs of the stopped task.
The most common cause is authorization: check that the ECS task execution IAM role has enough permissions to access the required resources.
(https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html)
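If the reason is not obvious from the console, the AWS CLI can surface the stopped reason directly. A minimal sketch, assuming placeholder cluster and task identifiers:

```shell
# List recently stopped tasks in the cluster (name is a placeholder)
aws ecs list-tasks --cluster my-cluster --desired-status STOPPED

# Inspect why a specific task stopped; stoppedReason often points at a
# permissions or image problem
aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
  --query 'tasks[0].{reason:stoppedReason,containers:containers[].reason}'
```

The `stoppedReason` field usually names the failing step, which is faster than clicking through each stopped task in the console.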

Related

Iguazio job is stuck on 'Pending' status

I have a job I am running in Iguazio. It starts and then the status is "Pending" and the icon is blue. It stays like this indefinitely and there is nothing in the logs that describes what is going on. How do I fix this?
A job stuck in this status is usually a Kubernetes issue. There are no logs in the Iguazio dashboard for the job because the pod never started, and the pod is where the logs come from. You can navigate to the web shell / Jupyter service in Iguazio and use kubectl commands to find out what is going on in Kubernetes. Usually I see this when there is an issue with the Docker image for the pod: either it can't be found or it has bugs.
In a terminal, run kubectl get pods and find your pod. It usually shows ImagePullBackOff, CrashLoopBackOff, or a similar error. Check the Docker image, which is usually the culprit. You can kill the pod in Kubernetes, which in turn will fail the job. You can also "abort" the job from the menu in the dashboard under that specific job.
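The steps above can be sketched with a few kubectl commands (the pod name is a placeholder):

```shell
# Find the job's pod and its status (ImagePullBackOff, CrashLoopBackOff, ...)
kubectl get pods

# The Events section at the bottom usually names the exact image error
kubectl describe pod <pod-name>

# Deleting the pod fails the job, which unsticks it in the dashboard
kubectl delete pod <pod-name>
```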

JBoss tracing JDBC on OpenShift

I have an OpenShift Deployment with 14 replicas in production environment.
I need to activate a trace on a single pod/replica, and I have found the following jboss-cli.sh commands to do it:
/subsystem=datasources/data-source=MySQLPool/:write-attribute(name=spy,value=true)
/subsystem=logging/logger=jboss.jdbc.spy/:add(level=TRACE)
/subsystem=jca/cached-connection-manager=cached-connection-manager/:write-attribute(name=error,value=true)
but when I enter those commands, a reload is required.
If I do a
:reload
the pod I am on restarts and the given configuration is lost.
Is there an alternative way to activate pool tracing?
Thanks a lot in advance!
I have found the problem: it was in the OpenShift health check configuration.
The liveness probe period was too short, so the pod was killed while the JBoss reload was still in progress.
Raising the value gives JBoss time to reload without the pod being killed.
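If you manage the probes with the oc client, the liveness probe period can be raised without editing the deployment YAML by hand. A sketch, assuming a hypothetical deployment name and example values:

```shell
# Lengthen the liveness probe interval and tolerate more failures,
# so JBoss can finish its :reload before the pod is killed
oc set probe deployment/my-app --liveness \
  --period-seconds=60 --failure-threshold=5
```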

Missing edit permissions on a kubernetes cluster on GCP

This is a Google Cloud specific problem.
I returned from vacation and noticed I can no longer manage workloads or cluster due to this error: "Missing edit permissions on account"
I am a sole person with access to this account (owner role) and yet I see this issue.
The troubleshooting guide suggests checking the system service account role; it looks like it's set up correctly (why would it not be, since I haven't edited it).
If it's not set up correctly, the guide suggests turning the Kubernetes API off and on again, but pressing "Disable" brings up a scary-looking prompt saying that your Kubernetes resources are going to be deleted, so obviously I can't do that.
Upon trying to connect to it I get
gcloud container clusters get-credentials cluster-1 --zone us-west1-b --project PROJECT_ID
Fetching cluster endpoint and auth data.
WARNING: cluster cluster-1 is not running. The kubernetes API may not be available.
In the logs I found a record (the last one) that is 4 days old:
"Readiness probe failed: Get http://10.20.0.5:44135/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Anyone here has any ideas?
Thanks in advance.
The issue is solved:
I had to upgrade node versions in the pool.
What a misleading error message.
Hopefully, this helps someone.
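For reference, the node pool upgrade can be done from the gcloud CLI. A sketch using the cluster details from the question (the pool name is an assumption):

```shell
# Compare node pool versions against the control plane
gcloud container node-pools list --cluster cluster-1 --zone us-west1-b

# Upgrade the nodes; without --cluster-version this brings the pool
# up to the control-plane version
gcloud container clusters upgrade cluster-1 --zone us-west1-b \
  --node-pool default-pool
```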

How to debug an ECS Fargate service that occasionally restarts task due to unhealthy elastic load balancer health checks

I'm hosting a shiny app on ECS Fargate. It works fairly well but then occasionally when using the app it crashes. I traced it to the following in the events tab:
service YYYY has started 1 tasks: task XXX
service YYYY has stopped 1 running tasks: task XXX
service YYYY deregistered 1 targets in target-group (Name of Elastic Load Balancer)
service YYYY (port 3838) is unhealthy in target-group (Name of Elastic Load Balancer) due to (reason Request timed out).
Does anyone know what might be causing this?
Or alternatively how can I investigate this further?
Could this be linked to spikes in CPU utilization within the application?
I've seen that at certain times the CPU utilization is spiked to 100%.
So if the user uses the application in a way that causes this high utilization, could this cause the container to be deemed unhealthy?
Also, auto-scaling is enabled for the application for when the CPU > 50% - however this is not being activated in the moments when the CPU utilization spikes to 100%. Any ideas?
You can get details about stopped tasks in the ECS console:
Cluster -> Tasks -> Stopped, then click into the specific task.
Additionally, in that tab you can get the logs of the container if you have configured the appropriate log driver in the task definition.
Does the application write any logs? Make sure those logs are getting sent to the container's console so they show up in CloudWatch logs for ECS.
Add the following to your Dockerfile to get logs to output to the console:
RUN ln -sf /proc/self/fd/1 /var/log/mylocation/mylogfile.log && \
    ln -sf /proc/self/fd/1 /var/log/mylocation/myerrorfile.log
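If the timeouts correlate with the CPU spikes, another option is to loosen the target group's health check so short spikes are tolerated. A sketch with the aws CLI, assuming a placeholder target group (the values are examples, not recommendations):

```shell
# Find the target group ARN behind the service's load balancer
aws elbv2 describe-target-groups --names my-target-group

# Give the app more time to answer, and require more consecutive
# failures before a target is marked unhealthy
aws elbv2 modify-target-group --target-group-arn <target-group-arn> \
  --health-check-timeout-seconds 10 --unhealthy-threshold-count 5
```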

Run a container on a pod failure Kubernetes

I have a cronjob that runs and does things regularly. I want to send a slack message with the technosophos/slack-notify container when that cronjob fails.
Is it possible to have a container run when a pod fails?
There is nothing built in for this that I am aware of. You could use a webhook to get notified when a pod changes and look for the failure state there, but you would have to build the plumbing yourself or look for an existing third-party tool.
Pods and Jobs are different things. If you want to wait for a job that has failed and send a notification after it has, you can do something like this in bash:
while true
do
  kubectl wait --for=condition=failed job/myjob
  kubectl run slack-notify --image=technosophos/slack-notify --env="EMAIL=failure@yourdomain.com" --restart=Never
done
To the question: is it possible to have a container run when a pod fails?
Yes. Although there is nothing out of the box right now, you can define a health check.
Then you can write a cron job, a Jenkins job, or a custom Kubernetes cluster service/controller that probes that health check regularly; if the health check fails, you can run a container in response.
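For the polling approach, the job's own failed condition can be read directly instead of probing a separate health check. A sketch, assuming the job is named myjob:

```shell
# Prints "True" once the job carries the Failed condition;
# a cron job can trigger the notifier container when it sees that
kubectl get job myjob \
  -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}'
```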