ERROR: (gcloud.compute.instance-templates.create) Could not fetch image resource: - kubernetes

The cluster was running fine for 255 days. I brought the cluster down, and since then I have been unable to bring it back up. Bringing the cluster up produces the following error:
Creating minions.
Attempt 1 to create kubernetes-minion-template
ERROR: (gcloud.compute.instance-templates.create) Could not fetch image resource:
- The resource 'projects/google-containers/global/images/container-vm-v20170627' was not found
Attempt 1 failed to create instance template kubernetes-minion-template. Retrying.
The attempts keep repeating and always fail. Am I missing something?
The kubernetes version is v1.7.2.

It looks like the image you are trying to use to create the machines has been deprecated and/or is no longer available.
You should try specifying an alternative image to create these machines from Google's current public images.
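As a rough sketch of how to act on this, the snippet below pulls the missing image's project and name out of the error text and builds a gcloud command that lists the images currently published in that project. The helper names are hypothetical, and whether the google-containers project still publishes a suitable replacement is an assumption to verify; the command is only built here, not run.

```python
import re

# Matches the quoted resource path in the "was not found" line of the gcloud output.
_IMAGE_RE = re.compile(r"projects/(?P<project>[^/']+)/global/images/(?P<image>[^/']+)'")


def parse_missing_image(log_text):
    """Return (project, image) for the first missing image in log_text, or None."""
    match = _IMAGE_RE.search(log_text)
    if match is None:
        return None
    return match.group("project"), match.group("image")


def list_images_command(project):
    """Build (but do not run) a gcloud command listing the images in `project`."""
    return ["gcloud", "compute", "images", "list",
            "--project", project, "--no-standard-images"]


log = ("- The resource 'projects/google-containers/global/images/"
       "container-vm-v20170627' was not found")
project, image = parse_missing_image(log)
print(project, image)
print(" ".join(list_images_command(project)))
```

Running the printed command by hand shows which images, if any, are still available to use in place of the missing one.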


Connection from VS Code to Kubernetes failing

I am receiving an error message when trying to access details from VS Code of my Azure Kubernetes Cluster. This problem prevents me from attaching a debugger to the pod.
I receive the following error message:
Error loading document: Error: cannot open k8smsx://loadkubernetescore/pod-kube-system%20%20%20coredns-748cdb7bf4-q9f9x.yaml?ns%3Dall%26value%3Dpod%2Fkube-system%20%20%20coredns-748cdb7bf4-q9f9x%26_%3D1611398456559. Detail: Unable to read file 'k8smsx://loadkubernetescore/pod-kube-system coredns-748cdb7bf4-q9f9x.yaml?ns=all&value=pod/kube-system coredns-748cdb7bf4-q9f9x&_=1611398456559' (Get command failed: error: there is no need to specify a resource type as a separate argument when passing arguments in resource/name form (e.g. 'kubectl get resource/<resource_name>' instead of 'kubectl get resource resource/<resource_name>'
)
My Setup
I have VS Code installed, with the "Kubernetes", "Bridge to Kubernetes" and "Azure Kubernetes Service" extensions installed
I have connected my Cluster through az login and can already access different information (e.g. my nodes, etc.)
When trying to access the workloads / pods on my cluster, I receive the above error message - and in the Kubernetes View in VS Code I get an error for the details of the pod.
[Screenshot: error in the Kubernetes view in VS Code]
What I tried
I tried reinstalling the AKS cluster and logging in to it from scratch
I tried to reinstall all extensions mentioned above in VS Code
Browsing the internet, I do not find any comparable error message
The strange thing is that it used to work two weeks ago - and I did not change or update anything (as far as I remember)
Any ideas / hints that I can try further?
Thank you
As @mdaniel wrote, the Node view is just for human consumption; the tree item you actually want to click on is underneath Namespaces / kube-system / coredns-748cdb7bf4-q9f9x. Give that a try, and consider reporting your bad experience to their issue tracker, since it looks like release 1.2.2 came out just 2 days ago and might not have been tested well.
The final solution was to attach the debugger another way, through Workloads / Deployments.
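To see why kubectl complains about a separate resource argument, it can help to decode the URI from the error message. The sketch below is only an illustration (the function name is hypothetical, and it does not claim to reproduce what the extension does internally). It shows that the decoded `value` parameter contains the namespace and the pod name separated by spaces, which kubectl would plausibly read as `pod/kube-system` followed by a second argument.

```python
from urllib.parse import parse_qs, unquote


def decode_resource_value(uri):
    """Return the decoded `value` query parameter of a k8smsx:// URI."""
    _, _, query = uri.partition("?")
    # The query string is itself percent-encoded ('=' as %3D, '&' as %26),
    # so decode it once before splitting it into parameters.
    params = parse_qs(unquote(query))
    return params["value"][0]


uri = ("k8smsx://loadkubernetescore/pod-kube-system%20%20%20coredns-748cdb7bf4-q9f9x.yaml"
       "?ns%3Dall%26value%3Dpod%2Fkube-system%20%20%20coredns-748cdb7bf4-q9f9x%26_%3D1611398456559")
value = decode_resource_value(uri)
print(repr(value))
print(value.split())  # the pieces kubectl would see as separate arguments
```

Opening the pod under Namespaces instead presumably produces a clean `pod/<name>` value, which would be consistent with the workaround above.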

How to sync user directory on bitbucket server to jira with both running on aks?

When trying to sync the user directories of Jira to other Atlassian products (Confluence and Bitbucket Server running on AKS), a 403 error is returned.
Upon looking into this error the following steps have been attempted:
https://confluence.atlassian.com/stashkb/unable-to-connect-to-jira-for-authentication-forbidden-403-323391874.html
The IP addresses have been added to Jira's whitelist. The next step in the solutions found online is to restart the Jira service.
This causes problems, however: after running the stop-jira.sh / start-jira.sh scripts inside the pod, the service comes back with none of the previous settings, and all configuration, including backups, is gone, taking us back to square one.
cluster size:
current set-up
3 x Standard D8 v3 (8 vcpus, 32 GiB memory) cluster on aks
Used the following images installed through UI:
atlassian/jira-software
cptactionhank/docker-atlassian-jira
Exec into pod and go to /opt/atlassian/jira/bin
run ./(start/stop)-jira.sh
What happens is that, on going back to the URL, the Jira instance has been reset and all configuration files in the pod for the service are lost.
The pod logs commonly show error code 137 when restarting.
Update: the following Helm chart has also been used, with the same result.
https://github.com/int128/devops-kompose/tree/master/atlassian-jira-software
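As a note on the code 137 seen in the pod logs: exit codes above 128 conventionally mean "terminated by signal (code - 128)", so 137 corresponds to signal 9. The sketch below decodes that convention; the function name is hypothetical, and it assumes that the 137 in the logs is a process exit code. A result like this would be consistent with the container being killed and restarted, not cleanly stopped, though it does not prove that on its own.

```python
import signal


def describe_exit_code(code):
    """Describe a process exit code using the shell convention 128 + signal number."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            # The signal number is not defined on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited normally with status {code}"


print(describe_exit_code(137))
```

On Linux, signal 9 is SIGKILL.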

Why would running a container on GCE get stuck Metadata request unsuccessful forbidden (403)

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process, so I need a large machine, but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same Google Container Registry by the same computer using the same Google login. The older one works fine, but the newer one fails, getting stuck in an endless stream of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone please explain why one of my images doesn't have this problem (well it gives a few of these messages but gets past them) and the other does have this problem (thousands of this message and taking over 24 hours before I killed it).
If I ssh in to a GCE instance then both versions of the container pull and run just fine. I'm suspecting the INTEGRITY_RULE checking from the logs but I know nothing about how that works.
MORE INFO: this is down to "restart policy: never". Even a simple Centos:7 container that says "hello world" deployed from the console triggers this if the restart policy is never. At least in the short term I can fix this in the entrypoint script as the instance will be destroyed when the monitor realises that the process has finished
I suggest you try creating a 3rd container that's focused on the metadata service functionality to isolate the issue. It may be that there's a timing difference between the 2 containers that's not being overcome.
Make sure you can ‘curl’ the metadata service from the VM and that the request to the metadata service is using the VM's service account.
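The metadata check suggested above can also be scripted. This is a minimal sketch, assuming it runs on the GCE VM itself (or in a container that can reach the metadata server); the function names are hypothetical. It asks the metadata server for the email of the default service account, sending the `Metadata-Flavor: Google` header that the server requires.

```python
import urllib.request

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/email")


def build_metadata_request(url=METADATA_URL):
    """Build a metadata-server request; the Metadata-Flavor header is required."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def fetch_service_account_email(timeout=5.0):
    """Return the email of the VM's default service account (only works on GCE)."""
    with urllib.request.urlopen(build_metadata_request(), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

Calling `fetch_service_account_email()` from both the working and the failing container would show whether each can reach the metadata server at all, and which service account it sees.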

Azure Service Fabric Cluster Update

I have a cluster in Azure that failed to update automatically, so I'm trying a manual update. I tried via the portal and it failed, so I kicked off an update using PowerShell, which also failed. The update starts, then just sits at "UpdatingUserConfiguration", and after an hour or so fails with a timeout. I have removed all application types and checked my certs for "NETWORK SERVICE". The cluster has 5 VMs in a single node type, on Windows.
Error
Set-AzureRmServiceFabricUpgradeType : Code: ClusterUpgradeFailed,
Message: Cluster upgrade failed. Reason Code: 'UpgradeDomainTimeout',
Upgrade Progress:
'{"upgradeDescription":{"targetCodeVersion":"6.0.219.9494","
targetConfigVersion":"1","upgradePolicyDescription":{"upgradeMode":"UnmonitoredAuto","forceRestart":false,"u
pgradeReplicaSetCheckTimeout":"37201.09:59:01","kind":"Rolling"}},"targetCodeVersion":"6.0.219.9494","target
ConfigVersion":"1","upgradeState":"RollingBackCompleted","upgradeDomains":[{"name":"1","state":"Completed"},
{"name":"2","state":"Completed"},{"name":"3","state":"Completed"},{"name":"4","state":"Completed"}],"rolling
UpgradeMode":"UnmonitoredAuto","upgradeDuration":"02:02:07","currentUpgradeDomainDuration":"00:00:00","unhea
lthyEvaluations":[],"currentUpgradeDomainProgress":{"upgradeDomainName":"","nodeProgressList":[]},"startTime
stampUtc":"2018-05-17T03:13:16.4152077Z","failureTimestampUtc":"2018-05-17T05:13:23.574452Z","failureReason"
:"UpgradeDomainTimeout","upgradeDomainProgressAtFailure":{"upgradeDomainName":"1","nodeProgressList":[{"node
Name":"_mstarsf10_1","upgradePhase":"PreUpgradeSafetyCheck","pendingSafetyChecks":[{"kind":"EnsureSeedNodeQu
orum"}]}]}}'.
Any ideas on what I can do about an "EnsureSeedNodeQuorum" error?
The root cause was that the cluster had only 3 seed nodes, as a result of being built with a VM scale set that had "overprovision" set to true. Lesson learned: remember to set "overprovision" to false.
I ended up deleting the cluster and scale set and recreated using my stored ARM template.
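The blocking safety check is easy to miss in the wrapped error text above. As a sketch, the "Upgrade Progress" payload can be parsed as JSON to list which node was stuck on which check. The function name is hypothetical, and the sample is a trimmed copy of the payload from the error message, with the line wrapping removed.

```python
import json


def pending_safety_checks(upgrade_progress_json):
    """Return (node name, upgrade phase, check kind) tuples from the failure snapshot."""
    progress = json.loads(upgrade_progress_json)
    at_failure = progress.get("upgradeDomainProgressAtFailure", {})
    results = []
    for node in at_failure.get("nodeProgressList", []):
        for check in node.get("pendingSafetyChecks", []):
            results.append((node["nodeName"], node["upgradePhase"], check["kind"]))
    return results


# Trimmed copy of the "Upgrade Progress" payload from the error above.
sample = """{
  "upgradeState": "RollingBackCompleted",
  "failureReason": "UpgradeDomainTimeout",
  "upgradeDomainProgressAtFailure": {
    "upgradeDomainName": "1",
    "nodeProgressList": [
      {"nodeName": "_mstarsf10_1",
       "upgradePhase": "PreUpgradeSafetyCheck",
       "pendingSafetyChecks": [{"kind": "EnsureSeedNodeQuorum"}]}
    ]
  }
}"""
for row in pending_safety_checks(sample):
    print(row)
```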

Failed to pull image "gcr.io/blah/blah":

We started getting an error when trying to update the image tag of a deployment and its pod:
Failed to pull image "gcr.io/blah/blah": rpc error: code = Unknown desc = Error: Status 429 trying to pull repository gcr.io/blah/blah: "Quota Exceeded." Error syncing pod
It started randomly yesterday in Google Container Builder, occurring twice (the same error) and then stopping. Then it started again during our deployment to two different pods. Any ideas on how to debug this? It's currently blocking all deployments.
Thanks
Mark
According to the error message, it seems that one of your quotas has been exceeded.
Select your project in Google Cloud Platform and, in the menu, go to
IAM & admin -> Quotas
On the right you will see a Used column; pick the service whose quota has been exceeded,
then press EDIT QUOTAS at the top and request an increase.