EC2 metadata upgrade from IMDSv1 to IMDSv2 causes 403 and 401 errors - kube2iam - kubernetes

I recently updated my EC2 instances to use IMDSv2 but had to roll back because of the following issue:
After the upgrade my init containers started failing, and I saw the following in the logs:
time="2022-01-11T14:25:01Z" level=info msg="PUT /latest/api/token (403) took 0.753220 ms" req.method=PUT req.path=/latest/api/token req.remote=XXXXX res.duration=0.75322 res.status=403 time="2022-01-11T14:25:37Z" level=error msg="Error getting instance id, got status: 401 Unauthorized"
We are using kube2iam for assigning IAM roles to pods. Any advice on what changes need to be made on the kube2iam side to support IMDSv2? Below is some info from my kube2iam DaemonSet:
EKS = 1.21
image = "jtblin/kube2iam:0.10.9"

Related

Azure AKS fluxconfig-agent 401 causing unhealthy

I have an AKS environment based on the AKS-Construction templates.
At some point fluxconfig-agent started reporting unhealthy. I checked the logs, and it looks like there is a 401 when it tries to fetch config from https://eastus.dp.kubernetesconfiguration.azure.com:
{"Message":"2022/10/03 17:09:01 URL:\u003e https://eastus.dp.kubernetesconfiguration.azure.com/subscriptions/xxx/resourceGroups/my-aks/provider/Microsoft.ContainerService-managedclusters/clusters/my-aks/configurations/getPendingConfigs?api-version=2021-11-01","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"2022/10/03 17:09:01 GET configurations returned response code {401}","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"2022/10/03 17:09:01 Failed to GET configurations with ResponseCode : {401}","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"Error in the getting the Configurations: error {%!s(\u003cnil\u003e)}","LogType":"ConfigAgentTrace","LogLevel":"Error","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
{"Message":"2022/10/03 17:09:01 \"Errorcode: 401, Message Unauthorized client credentials., Target /subscriptions/xxx/resourceGroups/my-aks/provider/Microsoft.ContainerService-managedclusters/clusters/my-aks/configurations/getPendingConfigs\"","LogType":"ConfigAgentTrace","LogLevel":"Information","Environment":"prod","Role":"ClusterConfigAgent","Location":"eastus","ArmId":"/subscriptions/xxx/resourceGroups/my-aks/providers/Microsoft.ContainerService/managedclusters/my-aks","CorrelationId":"","AgentName":"FluxConfigAgent","AgentVersion":"1.6.0","AgentTimestamp":"2022/10/03 17:09:01"}
Is anyone here familiar with how fluxconfig-agent authenticates and what might cause a 401 here?
It seems to have gone away for now after upgrading my AKS cluster and nodes to the latest Kubernetes version.
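For completeness, the upgrade itself was nothing special; a minimal sketch of the az CLI flow, with resource group, cluster name, and version as placeholders:
# List the Kubernetes versions the cluster can move to
az aks get-upgrades --resource-group my-aks-rg --name my-aks --output table
# Upgrade the control plane and node pools to the chosen version
az aks upgrade --resource-group my-aks-rg --name my-aks --kubernetes-version 1.24.9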

Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io"

I have an ingress controller up and running in the default namespace. My other namespaces have their own ingress YAML files. Whenever I try to deploy one of those, I get the following error:
Error from server (InternalError): error when creating "orchestration-ingress.yml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.default.svc:443/extensions/v1beta1/ingresses?timeout=30s: x509: certificate is valid for ingress-nginx-controller-admission, ingress-nginx-controller-admission.ingress-nginx.svc, not ingress-nginx-controller-admission.default.svc
This solved my error: I removed the previous version of the ingress controller and deployed this one.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.34.1/deploy/static/provider/cloud/deploy.yaml
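If the error persists after redeploying, the validating webhook configuration left over from the old install may still point at the wrong namespace. A hedged cleanup sketch; the webhook name below matches the upstream manifest, but confirm what actually exists in your cluster first:
# See which admission webhooks are registered
kubectl get validatingwebhookconfigurations
# Remove the stale ingress-nginx webhook so the freshly deployed one takes over
kubectl delete validatingwebhookconfiguration ingress-nginx-admission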

fluxcd not applying anything with err="running kubectl: error: unable to recognize \"STDIN\": ..."

I recently installed FluxCD 1.19.0 on an Azure AKS Kubernetes cluster using fluxctl install. We use a private Git server (self-hosted Bitbucket), which Flux is able to reach and check out.
Now Flux is not applying anything, failing with the error message:
ts=2020-06-10T09:07:42.7589883Z caller=loop.go:133 component=sync-loop event=refreshed url=ssh://git@bitbucket.some-private-server.com:7999/infra/k8s-gitops.git branch=master HEAD=7bb83d1753a814c510b1583da6867408a5f7e21b
ts=2020-06-10T09:09:00.631764Z caller=sync.go:73 component=daemon info="trying to sync git changes to the cluster" old=7bb83d1753a814c510b1583da6867408a5f7e21b new=7bb83d1753a814c510b1583da6867408a5f7e21b
ts=2020-06-10T09:09:01.6130559Z caller=sync.go:539 method=Sync cmd=apply args= count=3
ts=2020-06-10T09:09:20.2097034Z caller=sync.go:605 method=Sync cmd="kubectl apply -f -" took=18.5965923s err="running kubectl: error: unable to recognize \"STDIN\": an error on the server (\"\") has prevented the request from succeeding" output=
ts=2020-06-10T09:09:38.7432182Z caller=sync.go:605 method=Sync cmd="kubectl apply -f -" took=18.5334244s err="running kubectl: error: unable to recognize \"STDIN\": an error on the server (\"\") has prevented the request from succeeding" output=
ts=2020-06-10T09:09:57.277918Z caller=sync.go:605 method=Sync cmd="kubectl apply -f -" took=18.5346491s err="running kubectl: error: unable to recognize \"STDIN\": an error on the server (\"\") has prevented the request from succeeding" output=
ts=2020-06-10T09:09:57.2779965Z caller=sync.go:167 component=daemon err="<cluster>:namespace/dev: running kubectl: error: unable to recognize \"STDIN\": an error on the server (\"\") has prevented the request from succeeding; <cluster>:namespace/prod: running kubectl: error: unable to recognize \"STDIN\": an error on the server (\"\") has prevented the request from succeeding; dev:service/hello-world: running kubectl: error: unable to recognize \"STDIN\": an error on the server (\"\") has prevented the request from succeeding"
ts=2020-06-10T09:09:57.2879489Z caller=images.go:17 component=sync-loop msg="polling for new images for automated workloads"
ts=2020-06-10T09:09:57.3002208Z caller=images.go:27 component=sync-loop msg="no automated workloads"
From what I understand, Flux passes the resource definitions to kubectl, which then applies them?
The way I interpret the error, kubectl isn't being passed anything. However, I opened a shell in the container and made sure Flux was in fact checking something out, which it was.
I tried raising the verbosity to 9, but it didn't return anything that I deemed relevant, just detailed outputs of the HTTP requests and responses against the Kubernetes API.
So what is happening here?
The problem was with the version of kubectl used in the Flux 1.19 release, so I fixed it by using a prerelease image: https://hub.docker.com/r/fluxcd/flux-prerelease/tags
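To switch a running daemon over without reinstalling, something like the following should work; the deployment name, container name, namespace, and tag are assumptions based on a default fluxctl install, so adjust them to your setup:
# Point the existing Flux deployment at a prerelease image (pick a tag from the link above)
kubectl -n flux set image deployment/flux flux=fluxcd/flux-prerelease:<tag>
# Wait for the rollout to finish
kubectl -n flux rollout status deployment/flux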

Openshift 3.11 cloud integration fails with lookup RequestError: send request failed\\ncaused by: Post https://ec2.eu-west-.amazonaws.com

Following the docs: https://docs.openshift.com/container-platform/3.11/install_config/configuring_aws.html#aws-cluster-labeling
I am configuring the cloud integration after the cluster build.
When the cluster services are restarted on the masters, looking up AWS instances fails:
22 16:32:10.112895 75995 server.go:261] failed to run Kubelet: could not init cloud provider "aws": error finding instance i-0c5cbd50923f9c6d2: "error listing AWS instances: \"Request.service: main process exited, code=exited, status=255/n/a Error: send request failed\\ncaused by: Post https://ec2.eu-west-.amazonaws.com/: dial tcp: lookup ec2.eu-west-.amazonaws.com: no such host\""
On closer inspection, it seems to be due to an incorrect hostname:
https://ec2.eu-west-.amazonaws.com/ vs https://ec2.eu-west-2.amazonaws.com/
So I double checked the config, which seems to be correct:
# cat /etc/origin/cloudprovider/aws.conf
[Global]
Zone = eu-west-2
I had a Google around and it seems to be a similar issue to this one:
https://github.com/kubernetes-sigs/kubespray/issues/4345
Is there a way to work around this? Moving off 3.11 isn't an option right now.
Thanks.
It looks as though it needs to be the availability zone, rather than the region.
# cat /etc/origin/cloudprovider/aws.conf
[Global]
Zone = eu-west-2a
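A quick way to confirm the value to put in aws.conf is to read the availability zone straight from the instance metadata on the node itself (standard IMDS endpoint):
# Prints the zone the instance is running in, e.g. eu-west-2a
curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone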

Failed to send instantiate transaction and get notifications within the timeout period. undefined[fabric1.0 k8s]

I am trying to deploy Hyperledger Fabric 1.0.5 on Kubernetes and am using the balance-transfer sample to test it. Everything works until instantiate-chaincode, where I get this:
[2019-01-02 23:23:14.392] [ERROR] instantiate-chaincode - Failed to send instantiate transaction and get notifications within the timeout period. undefined
[2019-01-02 23:23:14.393] [ERROR] instantiate-chaincode - Failed to order the transaction. Error code: undefined
and when I use kubectl logs to get peer0's log, it looks like this:
[ConnProducer] NewConnection -> ERRO 61a Failed connecting to orderer2.orderer1:7050 , error: context deadline exceeded
[ConnProducer] NewConnection -> ERRO 61b Failed connecting to orderer1.orderer1:7050 , error: context deadline exceeded
[ConnProducer] NewConnection -> ERRO 61c Failed connecting to orderer0.orderer1:7050 , error: context deadline exceeded
[deliveryClient] connect -> DEBU 61d Connected to
[deliveryClient] connect -> ERRO 61e Failed obtaining connection: Could not connect to any of the endpoints: [orderer2.orderer1:7050 orderer1.orderer1:7050 orderer0.orderer1:7050]
I checked the connectivity of orderer0:7050 and found no problem.
What should I do next?
Thanks for the help!
You didn't describe what runbook you followed to deploy Hyperledger Fabric, but it looks like your pods cannot find each other through DNS. If you are following Kubernetes conventions, your pods should be in the orderer1 namespace and, hopefully, you have Kubernetes Services for orderer0, orderer1, and orderer2.
You can read more about communication between the Fabric components in the "Communication between Fabric components" section, and also read the "Work around the chaincode sandbox" section, which shows a workaround using --dns-search.
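A quick way to test the DNS side is to resolve the orderer service names from a throwaway pod; a rough sketch, with the image choice and the service names (taken from the logs above) as assumptions:
# Resolve the orderer service name through cluster DNS (busybox 1.28 ships a working nslookup)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup orderer0.orderer1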
It looks like a firewall problem.
In my case, to get HLF running on k8s, I disabled the firewall service.