Kubernetes cluster-autoscaler not working

I have deployed cluster-autoscaler on my AWS Kubernetes cluster, and it is now failing with the error below.
W0411 03:07:37.393124 1 clusterstate.go:514] Failed to get nodegroup for dev-k8s-node-asg-230-i-089e4d2f163533989: Wrong id: expected format aws:///<zone>/<name>, got
W0411 03:07:37.393145 1 clusterstate.go:514] Failed to get nodegroup for stg-k8s-w2-npe-master-3: Wrong id: expected format aws:///<zone>/<name>, got
W0411 03:07:37.393152 1 clusterstate.go:514] Failed to get nodegroup for dev-k8s-node-prm-1: Wrong id: expected format aws:///<zone>/<name>, got
W0411 03:07:37.393158 1 clusterstate.go:514] Failed to get nodegroup for dev-k8s-node-asg-230-i-0eb3341fce85be39c: Wrong id: expected format aws:///<zone>/<name>, got
W0411 03:07:37.393164 1 clusterstate.go:514] Failed to get nodegroup for dev-k8s-node-asg-230-i-091d1a037311d5daf: Wrong id: expected format aws:///<zone>/<name>, got
W0411 03:07:37.393169 1 clusterstate.go:514] Failed to get nodegroup for dev-k8s-node-asg-230-i-041dd54f2baaa4553: Wrong id: expected format aws:///<zone>/<name>, got
W0411 03:07:37.393188 1 clusterstate.go:560] Readiness for node group dev-k8s-node-asg-230 not found
W0411 03:07:37.393203 1 clusterstate.go:560] Readiness for node group stg-k8s-agent-w2-asg not found
autoscaler-configuration
Command:
./cluster-autoscaler \
--v=6 \
--stderrthreshold=info \
--cloud-provider=aws \
--skip-nodes-with-local-storage=false \
--expander=least-waste \
--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/dev
I have added these tags to my Auto Scaling group. Can someone help me understand this error?

Try creating a cloud-config.conf file and setting the KubernetesClusterTag, KubernetesClusterID and Zone properties in it. Also verify that
--cloud-provider=aws
is present in the kubelet command line on every node, and that
--cloud-config=
points to the correct path of cloud-config.conf.
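The flag check described in this answer could be sketched roughly as follows. This is only an illustration: the helper names has_aws_cloud_provider and cloud_config_path are hypothetical, and the sketch assumes the kubelet is started with flags written in the --flag=value form rather than through a config file only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given kubelet command line contains
# --cloud-provider=aws as a separate argument.
has_aws_cloud_provider() {
  cmdline="$1"
  case " $cmdline " in
    *" --cloud-provider=aws "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the value of --cloud-config= from a kubelet
# command line, or prints nothing if the flag is absent.
cloud_config_path() {
  for arg in $1; do
    case "$arg" in
      --cloud-config=*) printf '%s\n' "${arg#--cloud-config=}" ;;
    esac
  done
}
```

On a Linux node the command line could be supplied with something like has_aws_cloud_provider "$(ps -o args= -C kubelet)", and the path printed by cloud_config_path can then be checked with test -f.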

Related

How to install Eclipse-che in azure Kubernetes cluster

I'm trying to install Eclipse Che by following this blog post: https://che.eclipseprojects.io/2022/07/25/#karatkep-installing-eclipse-che-on-aks.html,
but even after following all the steps I'm not able to install it successfully.
1) After running this command:
kubectl logs -l app.kubernetes.io/component=che-operator -n eclipse-che -f
these are the errors I'm facing:
logs: Waited for 1.034843163s due to client-side throttling, not priority and fairness, request: GET:https://10.1.0.1:443/apis/discovery.k8s.io/v1?timeout=32s
time="2022-09-12T14:08:29Z" level=info msg="Successfully reconciled."
2) The che-gateway pod is failing:
che-gateway-7d54ccdd59-bblw6 3/4 CrashLoopBackOff 18 (2m51s ago) 70m
Description: the oauth-proxy container keeps crashing (CrashLoopBackOff).
Logs of the oauth-proxy container:
#invalid configuration:
missing setting: login-url
missing setting: redeem-url

How to fix Unsupported Config Type "" error in Hyperledger Fabric on Kubernetes?

I am trying to follow this tutorial on deploying Hyperledger Fabric on Kubernetes. But instead of IBM Cloud, I'm doing it with Google Cloud. I encountered this same issue (see my logs below) and tried:
changing docker image to docker:18.09-dind in docker.yaml.
setting FABRIC_CFG_PATH=$PWD/configFiles instead of FABRIC_CFG_PATH=$PWD in create_channel.yaml according to another StackOverflow answer.
However, these workarounds did not work for me and I still encounter the error.
How do I fix this to be able to successfully deploy the network?
> ./setup_blockchainNetwork.sh
peersDeployment.yaml file was configured to use Docker in a container.
Creating Docker deployment
persistentvolume/docker-pv created
persistentvolumeclaim/docker-pvc created
service/docker created
deployment.apps/docker-dind created
Creating volume
The Persistant Volume does not seem to exist or is not bound
Creating Persistant Volume
Running: kubectl create -f /home/me/blockchain-network-on-kubernetes/configFiles/createVolume.yaml
persistentvolume/shared-pv created
persistentvolumeclaim/shared-pvc created
Success creating Persistant Volume
Creating Copy artifacts job.
Running: kubectl create -f /home/me/blockchain-network-on-kubernetes/configFiles/copyArtifactsJob.yaml
job.batch/copyartifacts created
Wating for container of copy artifact pod to run. Current status of copyartifacts-dcg4m is Pending
copyartifacts-dcg4m is now Running
Starting to copy artifacts in persistent volume.
Waiting for 10 more seconds for copying artifacts to avoid any network delay
Waiting for copyartifacts job to complete
Copy artifacts job completed
Generating the required artifacts for Blockchain network
Running: kubectl create -f /home/me/blockchain-network-on-kubernetes/configFiles/generateArtifactsJob.yaml
job.batch/utils created
Waiting for generateArtifacts job to complete
Waiting for generateArtifacts job to complete
Creating Services for blockchain network
Running: kubectl create -f /home/me/blockchain-network-on-kubernetes/configFiles/blockchain-services.yaml
service/blockchain-ca created
service/blockchain-orderer created
service/blockchain-org1peer1 created
service/blockchain-org2peer1 created
service/blockchain-org3peer1 created
service/blockchain-org4peer1 created
Creating new Deployment to create four peers in network
Running: kubectl create -f /home/me/blockchain-network-on-kubernetes/configFiles/peersDeployment.yaml
deployment.apps/blockchain-orderer created
deployment.apps/blockchain-ca created
deployment.apps/blockchain-org1peer1 created
deployment.apps/blockchain-org2peer1 created
deployment.apps/blockchain-org3peer1 created
deployment.apps/blockchain-org4peer1 created
Checking if all deployments are ready
Waiting for 15 seconds for peers and orderer to settle
Creating channel transaction artifact and a channel
Running: kubectl create -f /home/me/blockchain-network-on-kubernetes/configFiles/create_channel.yaml
job.batch/createchannel created
Waiting for createchannel job to be completed
Waiting for createchannel job to be completed
Create Channel Failed
> kubectl get pods
NAME READY STATUS RESTARTS AGE
blockchain-ca-58b4bbbcc7-dqmnw 1/1 Running 0 30s
blockchain-orderer-ddc9466d-2sqt8 1/1 Running 0 30s
blockchain-org1peer1-ffbf698bb-fd6nf 1/1 Running 0 29s
blockchain-org2peer1-98f7fb5f9-mb5m7 1/1 Running 0 29s
blockchain-org3peer1-75d6b8bf5c-bxd24 1/1 Running 0 29s
blockchain-org4peer1-675669ffff-b4dxj 1/1 Running 0 29s
copyartifacts-dcg4m 0/1 Completed 0 60s
createchannel-9wt54 1/2 Error 0 12s
docker-dind-54767c54c5-crk7b 0/1 CrashLoopBackOff 3 73s
utils-wbpcz 0/2 Completed 0 37s
> kubectl logs createchannel-9wt54 -c createchanneltx
/shared
systemd-private-3cbb0a492497473087eda0bb66fbd738-systemd-networkd.service-QHqKfL
systemd-private-3cbb0a492497473087eda0bb66fbd738-systemd-resolved.service-NuNfWF
systemd-private-3cbb0a492497473087eda0bb66fbd738-systemd-timesyncd.service-SzE37R
2021-02-03 08:49:16.970 UTC [common.tools.configtxgen] main -> INFO 001 Loading configuration
2021-02-03 08:49:16.970 UTC [common.tools.configtxgen.localconfig] Load -> PANI 002 Error reading configuration: Unsupported Config Type ""
2021-02-03 08:49:16.970 UTC [common.tools.configtxgen] func1 -> PANI 003 Error reading configuration: Unsupported Config Type ""
panic: Error reading configuration: Unsupported Config Type "" [recovered]
panic: Error reading configuration: Unsupported Config Type ""
...

The FABRIC_CFG_PATH setting is wrong.
This error appears when configtx.yaml has a syntax problem, or when the file path is wrong and the file cannot be found.
configtxgen reads the configtx.yaml file located under FABRIC_CFG_PATH.
In the tutorial you linked, configtx.yaml is not in the configFiles directory; it is in the artifacts directory.
Here are two of the simplest fixes out of many.
Move artifacts/configtx.yaml to configFiles/configtx.yaml:
mv ./artifacts/configtx.yaml configFiles/configtx.yaml
Or set FABRIC_CFG_PATH to the artifacts directory:
export FABRIC_CFG_PATH=${PWD}/artifacts
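Before running configtxgen it may help to confirm that the directory you point FABRIC_CFG_PATH at really contains configtx.yaml. A minimal sketch, where the helper name has_configtx is hypothetical and not part of Fabric or the tutorial:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if configtx.yaml exists directly under
# the directory passed in (the directory FABRIC_CFG_PATH would point to).
has_configtx() {
  cfg_dir="$1"
  if [ -f "$cfg_dir/configtx.yaml" ]; then
    echo "found: $cfg_dir/configtx.yaml"
    return 0
  fi
  echo "missing: $cfg_dir/configtx.yaml" >&2
  return 1
}
```

For example, has_configtx "${PWD}/artifacts" && export FABRIC_CFG_PATH="${PWD}/artifacts" sets the variable only when the file is actually there.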

Mounting a Kubernetes Volume with Quarkus

I am trying to mount a volume to a Pod so that one deployment can write to it, and another deployment can read from it. I am using MiniKube with Docker on Ubuntu. I am running ./mvnw clean package -Dquarkus.kubernetes.deploy=true.
From the Quarkus documentation, it seems pretty straightforward, but I'm running into trouble.
When I add this line quarkus.kubernetes.mounts.my-volume.path=/volumePath to my application.properties, I get the following error:
[ERROR] Failed to execute goal io.quarkus:quarkus-maven-plugin:1.6.0.Final:build (default) on project getting-started: Failed to build quarkus application: io.quarkus.builder.BuildException: Build failure: Build failed due to errors
[ERROR] [error]: Build step io.quarkus.kubernetes.deployment.KubernetesDeployer#deploy threw an exception: io.dekorate.deps.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://IP:8443/apis/apps/v1/namespaces/default/deployments. Message: Deployment.apps "getting-started" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: "my-volume". Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].volumeMounts[0].name, message=Not found: "my-volume", reason=FieldValueNotFound, additionalProperties={})], group=apps, kind=Deployment, name=getting-started, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "getting-started" is invalid: spec.template.spec.containers[0].volumeMounts[0].name: Not found: "my-volume", metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
When I add quarkus.kubernetes.config-map-volumes.my-volume.config-map-name=my-volume (along with the previous statement), the error goes away, but the pod does not start. Running "kubectl describe pods" returns:
Normal Scheduled <unknown> default-scheduler Successfully assigned default/getting-started-859d89fc8-tbg6w to minikube
Warning FailedMount 14s (x6 over 30s) kubelet, minikube MountVolume.SetUp failed for volume "my-volume" : configmap "my-volume" not found
Does it look like the volume is not being set in the YAML file?
So my question is, how can I set the name of the volume in application.properties, so I can have a volume mounted in the Pod?

I recommend you look at the generated kubernetes.yml and kubernetes.json files under target/kubernetes.
For the first error, it looks like my-volume needs to exist in your cluster as a Persistent Volume.
For the second error, quarkus.kubernetes.config-map-volumes.my-volume.config-map-name=my-volume declares a volume backed by a ConfigMap, so a ConfigMap named my-volume must actually exist in your cluster.
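If you do want the ConfigMap route, the missing object can be created before deploying. A rough sketch, assuming a ConfigMap with a single placeholder entry is acceptable; the helper name configmap_manifest and the example.key entry are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a minimal ConfigMap manifest with the given name.
# The data key and value are placeholders, not taken from the question.
configmap_manifest() {
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $1
data:
  example.key: example-value
EOF
}
```

The output can be piped to the cluster with configmap_manifest my-volume | kubectl apply -f - and then checked with kubectl get configmap my-volume.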

gcloud container clusters upgrade --node-pool: it timed out; how do i use gcloud to tell when it's done?

I'm trying to upgrade my GKE cluster with this command:
gcloud container clusters upgrade CLUSTER_NAME --cluster-version=1.15.11-gke.3 \
--node-pool=default-pool --zone=ZONE
I get the following output:
Upgrading test-upgrade-172615287... Done with 0 out of 5 nodes (0.0%): 2 being processed...done.
Timed out waiting for operation <Operation
clusterConditions: []
detail: u'Done with 0 out of 5 nodes (0.0%): 2 being processed'
name: u'operation-NUM-TAG'
nodepoolConditions: []
operationType: OperationTypeValueValuesEnum(UPGRADE_NODES, 4)
progress: <OperationProgress
metrics: [<Metric
intValue: 5
name: u'NODES_TOTAL'>, <Metric
intValue: 0
name: u'NODES_FAILED'>, <Metric
intValue: 0
name: u'NODES_COMPLETE'>, <Metric
intValue: 0
name: u'NODES_DONE'>]
stages: []>
…
status: StatusValueValuesEnum(RUNNING, 2)
…>
ERROR: (gcloud.container.clusters.upgrade) Operation [DATA_SAME_AS_IN_TIMEOUT] is still running
I just discovered gcloud config set builds/timeout 3600 so I hope this doesn't happen again, like in my CI. But if it does, is there a gcloud command that lets me know that the upgrade is still in progress? These two didn't provide that:
gcloud container clusters describe CLUSTER_NAME --zone=ZONE
gcloud container node-pools describe default-pool --cluster=CLUSTER_NAME --zone=ZONE
Note: Doing this upgrade in the console took 2 hours so I'm not surprised the command-line attempt timed out. This is for a CI, so I'm fine looping and sleeping for 4 hours or so before giving up. But what's the command that will let me know when the cluster is being upgraded, and when it either finishes or fails? The UI is showing the cluster is still undergoing the upgrade, so I assume there is some command.
TIA as usual

Bumped into the same issue.
All gcloud commands, including gcloud container operations wait OPERATION_ID (https://cloud.google.com/sdk/gcloud/reference/container/operations/wait), have the same 1-hour timeout.
At this point, there is no other way to wait for the upgrade to complete than to query gcloud container operations list and check the STATUS in a loop.
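A polling loop along those lines might look like the sketch below. Note the assumptions: it uses gcloud container operations describe on a single operation ID instead of filtering the output of operations list, the function names are hypothetical, and the defaults of 240 attempts with a 60 second sleep (about 4 hours) are just one way to match the CI budget mentioned in the question.

```shell
#!/usr/bin/env bash
# Prints the status of a GKE operation (for example RUNNING or DONE).
# Kept as a separate function so it can be replaced or stubbed.
get_operation_status() {
  gcloud container operations describe "$1" --zone "$2" --format='value(status)'
}

# Polls until the operation reports DONE or max_attempts is reached.
# Usage: wait_for_operation OPERATION_ID ZONE [MAX_ATTEMPTS] [SLEEP_SECONDS]
wait_for_operation() {
  op_id="$1"
  zone="$2"
  max_attempts="${3:-240}"
  sleep_seconds="${4:-60}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$(get_operation_status "$op_id" "$zone")"
    echo "attempt $attempt: status=$status"
    if [ "$status" = "DONE" ]; then
      return 0
    fi
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  echo "gave up waiting for $op_id" >&2
  return 1
}
```

DONE only says the operation has finished. Whether the upgrade succeeded should be checked separately, for example by reading the full describe output for the operation.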

Cluster autoscaler v1.0.4 kubernetes error

I'm getting the error below:
W0316 22:04:26.025272 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025296 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025303 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025309 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025316 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025324 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025340 1 clusterstate.go:560] Readiness for node group *** not found
E0316 22:04:02.705833 1 static_autoscaler.go:257] Failed to scale up: failed to build node infos for node groups: Wrong id: expected format aws:///<zone>/<name>, got
I'm using cluster-autoscaler:
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

That happened because some of your nodes do not have a tag that identifies their node group.
As @Matthew L Daniel mentioned in his comment, the AWS instance needs that tag for the autoscaler to work properly.
Here is what the official documentation says about how node groups are identified and why:
It is assumed that the underlying cluster is run on top of some kind of node groups. Inside a node group, all machines have identical capacity and have the same set of assigned labels. Thus, increasing a size of a node group will create a new machine that will be similar to those already in the cluster - they will just not have any user-created pods running (but will have all pods run from the node manifest and daemon sets.)
As the installation documentation says:
To run a cluster-autoscaler which auto-discovers ASGs with nodes use the --node-group-auto-discovery flag and tag the ASGs with key k8s.io/cluster-autoscaler/enabled and key kubernetes.io/cluster/< YOUR CLUSTER NAME >.
So, just add those tags to your nodes.
Also, you can use as many AWS tags and Kubernetes labels on a node as you want; they will not affect the autoscaler.
UPD:
The reason the autoscaler was not working and crashed while getting the ProviderID was a missing --cloud-provider option value in the kubelet. Adding the aws value should fix this kind of issue.
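To spot affected nodes, it can help to compare each node's provider ID with the format named in the log message. The sketch below checks only the shape aws:///<zone>/<name> taken from that message; it is not the autoscaler's own validation, and the helper name is_aws_provider_id is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the string looks like the provider ID
# format named in the log message, aws:///<zone>/<name>.
is_aws_provider_id() {
  case "$1" in
    aws:///*/*) ;;
    *) return 1 ;;
  esac
  rest="${1#aws:///}"
  zone="${rest%%/*}"
  name="${rest#*/}"
  # Both parts must be non-empty and the name must not contain another slash.
  [ -n "$zone" ] && [ -n "$name" ] && [ "${name#*/}" = "$name" ]
}
```

The provider IDs themselves can be listed with kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID. A node whose value is empty would fail this check, which matches the empty value after "got" in the warnings.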