We started getting an error when trying to update the image tag of a deployment and its pod:
Failed to pull image "gcr.io/blah/blah": rpc error: code = Unknown desc = Error: Status 429 trying to pull repository gcr.io/blah/blah: "Quota Exceeded." Error syncing pod
It started randomly yesterday in Google Container Builder, twice (the same error), and then stopped. Then it started again during our deployment, on two different pods. Any ideas on how to debug this? It's currently blocking all deployments.
Thanks
Mark
According to the error message, it seems that one of your quotas has been exceeded.
Select your project in Google Cloud Platform and, in the menu, go to
IAM & admin -> Quotas
On the right you will see the Used column; pick the service that has exceeded its quota,
then click EDIT QUOTAS at the top and request an increase.
Related
I am receiving an error message when trying to access details of my Azure Kubernetes Service (AKS) cluster from VS Code. This problem prevents me from attaching a debugger to the pod.
I receive the following error message:
Error loading document: Error: cannot open k8smsx://loadkubernetescore/pod-kube-system%20%20%20coredns-748cdb7bf4-q9f9x.yaml?ns%3Dall%26value%3Dpod%2Fkube-system%20%20%20coredns-748cdb7bf4-q9f9x%26_%3D1611398456559. Detail: Unable to read file 'k8smsx://loadkubernetescore/pod-kube-system coredns-748cdb7bf4-q9f9x.yaml?ns=all&value=pod/kube-system coredns-748cdb7bf4-q9f9x&_=1611398456559' (Get command failed: error: there is no need to specify a resource type as a separate argument when passing arguments in resource/name form (e.g. 'kubectl get resource/<resource_name>' instead of 'kubectl get resource resource/<resource_name>'
)
My Setup
I have VS Code installed, with the "Kubernetes", "Bridge to Kubernetes" and "Azure Kubernetes Service" extensions
I have connected to my cluster through az login and can already access other information (e.g. my nodes)
When trying to access the workloads / pods on my cluster, I receive the above error message, and in the Kubernetes view in VS Code I get an error for the pod's details.
[Screenshot: error in the Kubernetes view in VS Code]
What I tried
I tried reinstalling the AKS cluster and logging in to it again from scratch
I tried reinstalling all of the VS Code extensions mentioned above
Searching online, I could not find any comparable error message
The strange thing is that it worked two weeks ago, and I did not change or update anything (as far as I remember)
Any ideas / hints that I can try further?
Thank you
As @mdaniel wrote, the Node view is just for human consumption; the tree item you actually want to click on is under Namespaces / kube-system / coredns-748cdb7bf4-q9f9x. Give that a try, and consider reporting your bad experience on their issue tracker, since release 1.2.2 came out only 2 days ago and might not have been tested well.
The final solution was to attach the debugger another way: through Workloads / Deployments.
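The error text says kubectl refuses `kubectl get resource resource/<resource_name>`, and the URI in the message appears to pass the Node-view label "kube-system   coredns-748cdb7bf4-q9f9x" (namespace and pod name run together) as the pod name. As a hedged illustration of the form kubectl does accept, here is a small Python sketch that builds the command for a single pod. The function name `kubectl_get_pod_yaml` is hypothetical, and this is not how the VS Code extension builds its commands internally.

```python
import shlex
from typing import List


def kubectl_get_pod_yaml(namespace: str, pod: str) -> List[str]:
    """Build argv for `kubectl get` on a single pod, using resource/name form."""
    # A real pod name never contains whitespace; a label such as
    # "kube-system   coredns-..." (namespace and name run together) is rejected.
    if not pod or any(ch.isspace() for ch in pod):
        raise ValueError(f"not a valid pod name: {pod!r}")
    # Type and name travel together as "pod/<name>"; the type is NOT repeated
    # as a separate argument, which is the combination kubectl refuses.
    return ["kubectl", "get", f"pod/{pod}", "--namespace", namespace, "--output", "yaml"]


if __name__ == "__main__":
    cmd = kubectl_get_pod_yaml("kube-system", "coredns-748cdb7bf4-q9f9x")
    print(shlex.join(cmd))
```

Running this prints a command you can paste into a terminal to check that the pod itself is readable outside VS Code.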
When trying to sync the Jira user directories to other Atlassian products (Confluence and Bitbucket Server, running on AKS), a 403 error is returned.
While investigating this error, we tried the following steps:
https://confluence.atlassian.com/stashkb/unable-to-connect-to-jira-for-authentication-forbidden-403-323391874.html
The IP addresses have been added to Jira's whitelist. The next step suggested in solutions online is to restart the Jira service.
However, this causes problems: after running the stop-jira.sh / start-jira.sh scripts inside the pod, the service comes back with none of the previous settings, and all configuration, including backups, is gone. This takes us back to square one.
Current set-up
Cluster size: 3 x Standard D8 v3 (8 vCPUs, 32 GiB memory) on AKS
Images used (installed through the UI):
atlassian/jira-software
cptactionhank/docker-atlassian-jira
To reproduce: exec into the pod and go to /opt/atlassian/jira/bin
Run ./stop-jira.sh and then ./start-jira.sh
What then happens is that, when going back to the URL, the Jira instance has been reset and all of the service's configuration files in the pod are lost.
The pod logs show exit code 137 as a recurring error when restarting.
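Exit codes above 128 follow the shell convention "128 + signal number", so 137 is 128 + 9, i.e. the process was killed with SIGKILL (in Kubernetes this is commonly an OOM kill or a forced stop after the termination grace period). One plausible reading, not confirmed in the question: stopping Jira ends the container's main process, the kubelet restarts the container, and anything written to the container filesystem rather than a persistent volume is lost. The sketch below only decodes exit codes; the helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the shell convention 128 + signal number."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return f"exit code {code}: killed by unknown signal {code - 128}"
        return f"exit code {code}: killed by {sig.name}"
    # 0-128: the process exited by itself with this status.
    return f"exit code {code}: process exited with status {code}"


if __name__ == "__main__":
    for code in (0, 137, 143):
        print(describe_exit_code(code))
```

Signal numbers come from the host's `signal` module, so the names shown assume a Linux/POSIX system.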
Update: the following Helm chart has also been used, with the same result:
https://github.com/int128/devops-kompose/tree/master/atlassian-jira-software
The cluster had been running fine for 255 days. I brought the cluster down, and since then I have been unable to bring it back up. Bringing it up gives the following error:
Creating minions.
Attempt 1 to create kubernetes-minion-template
ERROR: (gcloud.compute.instance-templates.create) Could not fetch image resource:
- The resource 'projects/google-containers/global/images/container-vm-v20170627' was not found
Attempt 1 failed to create instance template kubernetes-minion-template. Retrying.
The retries go on, and every attempt fails. Am I missing something?
The Kubernetes version is v1.7.2.
It looks like the image you are trying to use to create the machines has been deprecated and/or is no longer available.
You should try specifying an alternative image to create these machines from Google's current public images.
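One way to find a current public image instead of a fixed, now-missing name like container-vm-v20170627 is to resolve an image family, which always points at its newest non-deprecated image. The Python sketch below builds and, if gcloud is installed, runs `gcloud compute images describe-from-family`. The family `cos-stable` in project `cos-cloud` (Container-Optimized OS) is only an example assumption; whether Kubernetes v1.7.2's cluster scripts work with that image, and how to point them at it, is not covered here.

```python
import shutil
import subprocess
from typing import List


def image_from_family_cmd(family: str, project: str) -> List[str]:
    """Build argv asking gcloud for the newest non-deprecated image in a family."""
    return [
        "gcloud", "compute", "images", "describe-from-family", family,
        f"--project={project}",
        "--format=value(name)",  # print only the image name
    ]


if __name__ == "__main__":
    # Example family/project (assumption): Container-Optimized OS stable channel.
    cmd = image_from_family_cmd("cos-stable", "cos-cloud")
    if shutil.which("gcloud"):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        print(result.stdout.strip())
    else:
        print("gcloud not found; would run:", " ".join(cmd))
```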
I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process, so I need a large machine, but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same Google Container Registry by the same computer using the same Google login. The older one works fine, but the newer one gets stuck in an endless stream of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone explain exactly what is going on here, and why one of my images doesn't have this problem (it logs a few of these messages but gets past them) while the other does (thousands of these messages, running for over 24 hours before I killed it)?
If I SSH into a GCE instance, both versions of the container pull and run just fine. From the logs I suspect the INTEGRITY_RULE checking, but I know nothing about how that works.
MORE INFO: this comes down to "restart policy: never". Even a simple centos:7 container that prints "hello world", deployed from the console, triggers this if the restart policy is Never. At least in the short term, I can fix this in the entrypoint script, as the instance will be destroyed when the monitor realises that the process has finished.
I suggest creating a third container focused on the metadata service functionality, to isolate the issue. There may be a timing difference between the two containers that is not being overcome.
Make sure you can curl the metadata service from the VM, and that the request to the metadata service uses the VM's service account.
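A request to the v1 metadata endpoint must carry the header `Metadata-Flavor: Google` (the curl equivalent is `curl -H "Metadata-Flavor: Google" <url>`); requests without it are rejected with 403. The Python sketch below asks the metadata server which service account email the VM's default account is, which is one way to check the second point above. The helper names are hypothetical, and `fetch_service_account` only works from inside a GCE VM or a container on it that can reach the metadata server.

```python
import urllib.request

METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/email"
)


def service_account_request() -> urllib.request.Request:
    """Build a GET request for the VM's default service account email."""
    # Without this header the metadata server refuses v1 requests.
    return urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})


def fetch_service_account(timeout: float = 2.0) -> str:
    """Return the default service account email (GCE only; raises OSError elsewhere)."""
    with urllib.request.urlopen(service_account_request(), timeout=timeout) as resp:
        return resp.read().decode()
```

On the VM, `print(fetch_service_account())` should print the service account email. An `OSError` (including `urllib.error.URLError`) means the metadata server was not reachable from where the code ran.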
I'm encountering the following error for my ingress controller.
Warning GCE googleapi: Error 403: Quota 'BACKEND_SERVICES' exceeded. Limit: 9.0, quotaExceeded
My limit is set to 9, and this has worked before, so I'm not sure why I'm hitting this error now.
I deleted the cluster and then created a new one. What do these backend services refer to, and how can I remove any old ones that were not deleted?
You could also request a small increase on the backend services quota page.
If the increase is small enough, it will be approved automatically.
I had to delete the previously created load balancers and the related backend services in the Google Cloud console.
The quota usage was updated shortly after that.
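To find leftovers from a deleted cluster from the command line instead of the console, you can list backend services with `gcloud compute backend-services list --format=json` and filter the output. The Python sketch below filters that JSON by name prefix. The `k8s-` prefix is an assumption about how the GKE ingress controller names its resources, and the sample data is made up. Review each name before deleting it with `gcloud compute backend-services delete NAME --global`; a backend service still referenced by a URL map (i.e. a load balancer) cannot be deleted until that load balancer is removed, which matches the order described above.

```python
import json
from typing import List


def k8s_backend_services(list_json: str, prefix: str = "k8s-") -> List[str]:
    """Return sorted backend-service names that start with `prefix`.

    `list_json` is expected to be the output of
    `gcloud compute backend-services list --format=json` (a JSON array of
    objects, each with a "name" field).
    """
    services = json.loads(list_json)
    return sorted(
        svc["name"] for svc in services if svc.get("name", "").startswith(prefix)
    )


if __name__ == "__main__":
    # Made-up sample output, for illustration only.
    sample = json.dumps([
        {"name": "k8s-be-30001--0123456789abcdef"},
        {"name": "my-manual-backend"},
    ])
    for name in k8s_backend_services(sample):
        print(name)
```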
Just a heads-up: I ran into this quickly while trialing a multi-region GCE Ingress deployed using kubemci. Since you are essentially duplicating your backend across many regions, the maximum number of regions you could use on a GCP trial account would be 5.
GCP will force you to upgrade to a full account (and enter billing details if you haven't yet). Not a big deal, but in my case I had to do this in order to test a service being served from more than 5 regions at once, and the error was not immediately evident in the logs.
When troubleshooting the rest of the multi-region Ingress process, this one was tricky to track down, so hopefully this saves a bit of time for someone trying to deploy many clusters on a trial account (like I was!).