I have an AKS environment based on the AKS-Construction templates.
At some point fluxconfig-agent started reporting as unhealthy. I checked the logs, and it looks like there is a 401 when it tries to fetch config from https://eastus.dp.kubernetesconfiguration.azure.com:
{"Message":"2022/10/03 17:09:01 URL:\u003e https://eastus.dp.kubernetesconfiguration.azure.com/subscriptions/xxx/resourceGroups/my-aks/provider/Microsoft.ContainerService-managedclusters/clusters/my-aks/configurations/getPendingConfigs?api-version=2021-11-01","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"2022/10/03 17:09:01 GET configurations returned response code {401}","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"2022/10/03 17:09:01 Failed to GET configurations with ResponseCode : {401}","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"Error in the getting the Configurations: error {%!s(\u003cnil\u003e)}","LogType":"ConfigAgentTrace","LogLevel":"Error","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"2022/10/03 17:09:01 \"Errorcode: 401, Message Unauthorized client credentials., Target /subscriptions/xxx/resourceGroups/my-aks/provider/Microsoft.ContainerService-managedclusters/clusters/my-aks/configurations/getPendingConfigs\"","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
Is anyone here familiar with how fluxconfig-agent authenticates and what might cause a 401 here?
Seems to have gone away for now after upgrading my AKS cluster and nodes to the latest Kubernetes version.
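For reference, a rough sketch of the commands involved (hedged: the flux-system namespace, the fluxconfig-agent deployment name, and the flux extension name are what a default microsoft.flux install uses, and my-aks is the resource group / cluster name from the logs above; adjust to your environment):

kubectl -n flux-system logs deploy/fluxconfig-agent --tail=50   # view the agent logs shown above
az k8s-extension show --resource-group my-aks --cluster-name my-aks --cluster-type managedClusters --name flux   # check the extension status
az aks upgrade --resource-group my-aks --name my-aks --kubernetes-version <target-version>   # the upgrade that cleared it for me
kubectl -n flux-system rollout restart deployment fluxconfig-agent   # restart the agent after the upgrade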
I am new to AWS and kubectl, and I need to deploy an app to AWS. After deploying to the EKS cluster, I edited the ingress with kubectl, but unfortunately it returned 404 Not Found. (I am pretty sure the new service container works fine.)
After running kubectl describe ingress, here are some of the reported events:
Warning FailedBuildModel 40m ingress Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements
status code: 400, request id: xxxxxxxx-4a93-4e27-9d6b-xxxxxxxx
Warning FailedBuildModel 22m ingress Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements
status code: 400, request id: xxxxxxxx-5368-41e1-8a4d-xxxxxxxx
Warning FailedBuildModel 5m8s ingress Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements
status code: 400, request id: xxxxxxxx-20ea-4bd0-b1cb-xxxxxxxx
Does anyone have ideas about this issue?
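For reference, one thing worth checking for this error is whether the cluster's OIDC issuer is registered as an IAM identity provider for IRSA (hedged: this assumes the ingress controller authenticates via a service-account web identity token, and my-cluster below is a placeholder):

aws eks describe-cluster --name my-cluster --query "cluster.identity.oidc.issuer" --output text   # print the cluster's OIDC issuer URL
aws iam list-open-id-connect-providers   # the issuer host should appear in this list
eksctl utils associate-iam-oidc-provider --cluster my-cluster --approve   # if it is missing, associate it (assumes eksctl is available)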
I tried to update an existing Jenkins stack from https://github.com/widdix/aws-cf-templates and only modified the Jenkins RPM and the AWS AMI versions.
But on updating the stack I get this error message:
2019-12-17 19:14:16 UTC+0100 vpc-ci-jenkins UPDATE_ROLLBACK_IN_PROGRESS The following resource(s) failed to update: [MasterASG].
2019-12-17 19:14:15 UTC+0100 MasterASG UPDATE_FAILED Received 1 FAILURE signal(s) out of 1. Unable to satisfy 100% MinSuccessfulInstancesPercent requirement
Does anyone have an idea what might have gone wrong with the update?
There were no changes in the MasterASG section of the template.
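For reference, a hedged sketch of where to look for the underlying failure, since the FAILURE signal means cfn-signal on the replacement instance reported an error (assumptions: the instance is bootstrapped with cfn-init/cfn-signal on Amazon Linux, and the log paths below are the defaults; the stack name is taken from the events above):

aws cloudformation describe-stack-events --stack-name vpc-ci-jenkins   # full event history, including the rollback
# then, on the new Jenkins master instance:
sudo tail -n 100 /var/log/cfn-init.log            # cfn-init / user-data failures
sudo tail -n 100 /var/log/cloud-init-output.log   # package install errors, e.g. a bad Jenkins RPM URL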
I'm getting the error below:
W0316 22:04:26.025272 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025296 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025303 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025309 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025316 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025324 1 clusterstate.go:514] Failed to get nodegroup for <nodename>: Wrong id: expected format aws:///<zone>/<name>, got
W0316 22:04:26.025340 1 clusterstate.go:560] Readiness for node group *** not found
E0316 22:04:02.705833 1 static_autoscaler.go:257] Failed to scale up: failed to build node infos for node groups: Wrong id: expected format aws:///<zone>/<name>, got
I am using cluster-autoscaler:
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
That happens because some of your nodes do not have the tag that identifies their node group.
As Matthew L Daniel mentioned in his comment, the autoscaler needs a tag on the AWS instances to work properly.
Here is what the official documentation says about how identification works and why:
It is assumed that the underlying cluster is run on top of some kind of node groups. Inside a node group, all machines have identical capacity and have the same set of assigned labels. Thus, increasing a size of a node group will create a new machine that will be similar to those already in the cluster - they will just not have any user-created pods running (but will have all pods run from the node manifest and daemon sets.)
As you can find in the installation documentation:
To run a cluster-autoscaler which auto-discovers ASGs with nodes use the --node-group-auto-discovery flag and tag the ASGs with key k8s.io/cluster-autoscaler/enabled and key kubernetes.io/cluster/< YOUR CLUSTER NAME >.
So, just add those tags to your ASGs.
Also, you can use as many additional AWS tags and Kubernetes labels on a node as you want; they will not affect the autoscaler.
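For example, with the AWS CLI (hedged sketch: the ASG name my-asg and cluster name my-cluster are placeholders, and for auto-discovery only the tag keys matter, not the values):

aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-asg,ResourceType=auto-scaling-group,Key=kubernetes.io/cluster/my-cluster,Value=owned,PropagateAtLaunch=true"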
UPD:
The reason the autoscaler was not working and crashed when getting the ProviderID was a missing --cloud-provider option value in the kubelet. Adding the aws value should fix that kind of issue.
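A quick way to confirm this on an affected cluster: nodes whose kubelet ran without --cloud-provider=aws register with an empty providerID, which is exactly the empty value after "got" in the error above.

kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID   # PROVIDER_ID should look like aws:///<zone>/<name>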
On Kubernetes 1.7, I am trying to create an ExternalAdmissionHookConfiguration. I have tried to set failurePolicy: Fail; however, I get the following error:
The ExternalAdmissionHookConfiguration "policy-agent" is invalid: externalAdmissionHooks[0].failurePolicy: Unsupported value: "Fail": supported values: Ignore
The documentation suggests that Fail is a valid option.
https://kubernetes.io/docs/admin/extensible-admission-controllers/
It is valid as of 1.9.
I'd recommend building on the 1.9 admission webhooks. The pre-1.9 versions were discontinued at the alpha level and redone as validating and mutating webhook configurations in 1.9.
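For illustration, a hedged sketch of the 1.9-style equivalent with failurePolicy: Fail (admissionregistration.k8s.io/v1beta1 is the 1.9 API; the object name policy-agent comes from the error above, but the webhook name, service, path, and CA bundle are placeholders, not taken from the question):

kubectl apply -f - <<EOF
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: policy-agent
webhooks:
  - name: policy-agent.example.com
    failurePolicy: Fail            # if the webhook cannot be reached, the request is rejected instead of ignored
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
    clientConfig:
      service:
        namespace: default
        name: policy-agent
        path: /validate
      caBundle: <base64-encoded-CA-certificate>
EOF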