Container App fails with error "Error provisioning the container app. Container failed to start-up" - azure-container-apps

While creating an Azure Container App, I got the error "Error provisioning the container app. Container '' failed to start-up. Please check Log Analytics workspace for container logs." When I went to Log Analytics and opened Logs, I didn't see any tables. Why did the container fail?

Log Analytics can take a few minutes to ingest logs. In my case, after waiting three or so minutes and going back to the Logs screen, the tables were populated and I could see the console logs emitted from my container, which helped me understand why it failed to start up.
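Once the data lands, the container's console output typically shows up in the ContainerAppConsoleLogs_CL table of the linked workspace. A minimal sketch for pulling the recent entries with the Azure Monitor query SDK (the workspace ID and app name are placeholders; adjust the table and column names to whatever your workspace actually shows):

```python
# Sketch: query recent Container Apps console logs from Log Analytics.
# Assumes the ContainerAppConsoleLogs_CL table; the workspace ID and app
# name below are placeholders for your environment.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<your-log-analytics-workspace-id>"  # placeholder

client = LogsQueryClient(DefaultAzureCredential())

query = """
ContainerAppConsoleLogs_CL
| where ContainerAppName_s == "<your-container-app-name>"
| project TimeGenerated, Log_s
| order by TimeGenerated desc
| take 50
"""

response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(hours=1))
for table in response.tables:
    for row in table.rows:
        print(row)
```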

Related

AWS ECS won't start tasks: http request timed out enforced after 4999ms

I have an ECS cluster (Fargate), task, and service that I have had set up in Terraform for at least a year. I haven't touched it for a long while. My normal deployment for updating the code is to push a new container to the registry and then stop all tasks on the cluster with a script. Today, my service did not run a new task in response to that task being stopped. Its desired count is fixed, so it should have.
I have gone in and tried to manually run a task, and I'm seeing this error:
Unable to run task
Http request timed out enforced after 4999ms
When I try this, a new stopped task is added to my stopped tasks list. When I look into that task, the stopped reason is "Deployment restart", and two of them are now showing "Task provisioning failed", which I think might be tasks the service tried to start. But these tasks do not show a started timestamp; the ones I start in the console do.
My site is now down and I can't get it back up. Does anyone know of a way to debug this? Is AWS ECS experiencing problems right now? I checked the health monitors and I see no issues.
This was an AWS outage affecting Fargate in us-east-1. It's fixed now.
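For future reference, when a service keeps leaving stopped tasks behind, reading stoppedReason and stopCode off the most recent stopped tasks is a quick way to see why ECS gave up (stopped tasks only stay visible for a short while, roughly an hour). A minimal boto3 sketch, assuming your own cluster name:

```python
# Sketch: list recently stopped ECS tasks and print why they stopped.
# Assumes boto3 credentials are configured; the cluster name is a placeholder.
import boto3

CLUSTER = "my-cluster"  # placeholder

ecs = boto3.client("ecs")

task_arns = ecs.list_tasks(cluster=CLUSTER, desiredStatus="STOPPED")["taskArns"]
if task_arns:
    tasks = ecs.describe_tasks(cluster=CLUSTER, tasks=task_arns)["tasks"]
    for task in tasks:
        print(task["taskArn"])
        print("  stopCode:", task.get("stopCode"))
        print("  stoppedReason:", task.get("stoppedReason"))
        print("  startedAt:", task.get("startedAt"))
```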

Celery: Some tasks are not printing logs on ECS deployment

I have the following Celery setup in production:
Using RabbitMQ as the broker
Running multiple instances of the Celery worker on ECS Fargate
Logs are sent to CloudWatch using the default awslogs driver
Result backend: MongoDB
The issue I am facing is that a lot of my tasks are not showing logs in CloudWatch.
I just see this log:
Task task_name[3d56f396-4530-4471-b37c-9ad2985621dd] received
But I do not see the actual logs for the execution of that task, nor do I see the log for completion. For example, something like this is nowhere in the logs to be found:
Task task_name[3d56f396-4530-4471-b37c-9ad2985621dd] succeeded
This does not happen all the time; it happens intermittently but consistently. A lot of tasks do print their logs.
I can see that the result backend has the task results, so I know the tasks have executed, but the logs for those tasks are completely missing. It is not specific to a particular task_name.
On my local setup, I have not been able to isolate the issue.
I am not sure whether this is a Celery logging issue or an awslogs issue. What can I do to troubleshoot it?
** UPDATE **
Found the root cause: some code in the codebase was removing handlers from the root logger. Leaving this question up on Stack Overflow in case someone else faces this issue.
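To illustrate the failure mode: Celery's task loggers propagate up to the root logger, so if anything in the worker process strips the root logger's handlers, the task log records have nowhere to go even though the tasks keep running and storing results. A minimal sketch of the anti-pattern plus a throwaway diagnostic task (the app, broker URL, and task name are illustrative):

```python
import logging

from celery import Celery
from celery.utils.log import get_task_logger

app = Celery("worker", broker="amqp://localhost")  # illustrative broker URL
logger = get_task_logger(__name__)


@app.task
def debug_logging():
    # Quick diagnostic: show which handlers are still attached to the root
    # logger; printed via stdout so it reaches the awslogs/CloudWatch stream
    # even if logging itself is broken.
    print("root handlers:", logging.getLogger().handlers)
    logger.info("if this line shows up in CloudWatch, task logging is intact")


def broken_setup():
    # Anti-pattern (the root cause above): stripping the root logger's
    # handlers silently discards Celery task log output.
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
```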

How do I view the log of a previous failure for an OpenShift Job?

I have a Job that runs daily, with its container having a restartPolicy of OnFailure. It failed once, then ran again about 40 minutes later. We need to know the reason for the failure, but oc logs only brings up the log of the successful run.
Is there a way to bring up the log for the previous failed run?

Why would running a container on GCE get stuck on "Metadata request unsuccessful" Forbidden (403)?

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process, so I need a large machine, but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same Google Container Registry by the same computer using the same Google login. The older one works fine, but the newer one fails by getting stuck in an endless stream of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone explain why one of my images doesn't have this problem (well, it gives a few of these messages but gets past them) while the other does (thousands of these messages, running for over 24 hours before I killed it)?
If I SSH into a GCE instance, both versions of the container pull and run just fine. From the logs I suspect the INTEGRITY_RULE checking, but I know nothing about how that works.
MORE INFO: this is down to "restart policy: never". Even a simple centos:7 container that prints "hello world", deployed from the console, triggers this if the restart policy is Never. At least in the short term I can work around it in the entrypoint script, since the instance will be destroyed once the monitor realises that the process has finished.
I suggest you try creating a third container focused on the metadata service functionality to isolate the issue. It may be that there's a timing difference between the two containers that's not being overcome.
Make sure you can curl the metadata service from the VM and that the request to the metadata service is using the VM's service account.
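Along those lines, here is a minimal check you can run from inside the VM or the container to confirm the metadata server is reachable and to see which service account and scopes the instance is using (standard GCE metadata endpoints; python-requests is used purely for illustration):

```python
# Sketch: verify the GCE metadata server is reachable and inspect the
# instance's default service account. Run from inside the VM or container.
import requests

METADATA = "http://metadata.google.internal/computeMetadata/v1"
HEADERS = {"Metadata-Flavor": "Google"}  # required header, otherwise 403

for path in (
    "/instance/service-accounts/default/email",
    "/instance/service-accounts/default/scopes",
):
    resp = requests.get(METADATA + path, headers=HEADERS, timeout=5)
    print(path, "->", resp.status_code, resp.text.strip())
```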

Failed to pull image "gcr.io/blah/blah":

We started getting an error when trying to update the image tag of a deployment and its pod:
Failed to pull image "gcr.io/blah/blah": rpc error: code = Unknown desc = Error: Status 429 trying to pull repository gcr.io/blah/blah: "Quota Exceeded." Error syncing pod
It started randomly yesterday in Google Container Builder, twice (the same error anyway), and then stopped. Then it started again during our deployment to two different pods. Any ideas on how to debug? It's currently blocking all deployments.
Thanks
Mark
According to the error message, it seems like one of your quotas has been exceeded.
Select your project in the Google Cloud Platform console and, from the menu, go to
IAM & admin -> Quotas
In the Used column on the right, find the service whose quota has been exceeded,
then press EDIT QUOTAS at the top and request an increase.