I had to rerun flux bootstrap... on my cluster after a colleague accidentally ran flux bootstrap... on their new cluster using the existing branch and cluster from the same flux repo.
Running kubectl get gitrepositories -A shows no errors:
flux-system flux-system ssh://git@git.group.net:7999/psmgsbb/flux.git stored artifact for revision 'master/252f6416c034bb67f06cc3e413e66704bc6b1069'
However, I am now seeing these errors when I run flux logs --level=error:
error ImagePolicy/post-processing-master-branch-policy.flux-system : Reconciler error cannot determine latest tag for policy version list argument cannot be empty
error HelmRelease/post-processing.post-processing-dev : Reconciler error previous release attempt remediation failed
error ImageRepository/post-processing-repository.flux-system : Reconciler error auth for "myacr.azurecr.io" not found in secret flux-system/psbombb-image-acr-auth-cc8mg5tk84
Regarding the secret above I ran:
kubectl get secret -n flux-system psbombb-image-acr-auth-cc8mg5tk84 -o yaml
which gave me
apiVersion: v1
data:
  .dockerconfigjson: ewoJImRhdGEiOiAie1xuICBcI...<redacted>
kind: Secret
which decodes to
"data": "{
"auths": {
"myacr.azurecr.io": {
"auth":
"YTNlMTNlOGItYWQwNi00M2IzLTkyMjgtMjA0ZmQ2ODllMD<redacted>"
}
}
}"
So the ACR above, myacr.azurecr.io, does match the ACR in the secret. This error doesn't make sense to me:
Reconciler error auth for "myacr.azurecr.io" not found in secret flux-system/psbombb-image-acr-auth-cc8mg5tk84
So basically, do you know why reconcile fails now after a flux bootstrap?
Thank you
When flux bootstrap... was accidentally run on the cluster, it upgraded the kustomize-controller to version 0.30.2. This caused an issue with the formatting of the encrypted dockerconfigjson secret being written to Kubernetes.
When the dockerconfigjson contents were base64-decoded, there were line feeds everywhere, which seems to have caused the reconciler error whereby it could not find the ACR reference myacr.azurecr.io.
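A quick way to check what actually got written is to decode the secret in one go (assuming jq is available; the secret name is the one from the question):
kubectl get secret -n flux-system psbombb-image-acr-auth-cc8mg5tk84 \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .
# if jq reports a parse error, the decoded dockerconfigjson is malformed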
I reverted the kustomize-controller version in gotk-components.yaml back to the version in use prior to the accidental flux bootstrap..., i.e. from v0.30.2 to v0.22.3.
Once the Kubernetes Secret was recreated with the correct dockerconfigjson format, reconciliation started working correctly.
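For reference, a minimal sketch of how such a secret can be recreated by hand (secret and registry names are taken from the question; the credential placeholders are hypothetical, and if the secret is produced by a kustomize secretGenerator the fix belongs in the generator input instead):
kubectl create secret docker-registry psbombb-image-acr-auth \
  --namespace flux-system \
  --docker-server=myacr.azurecr.io \
  --docker-username=<acr-username> \
  --docker-password=<acr-password>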
Related
I have the Terraform controller for Flux running with a GitHub provider; however, it seems to be picking up the wrong Terraform state, so it keeps trying to recreate the resources again and again (and fails because they already exist).
This is how it is configured:
apiVersion: infra.contrib.fluxcd.io/v1alpha1
kind: Terraform
metadata:
  name: saas-github
  namespace: flux-system
spec:
  interval: 2h
  approvePlan: "auto"
  workspace: "prod"
  backendConfig:
    customConfiguration: |
      backend "s3" {
        bucket         = "my-bucket"
        key            = "my-key"
        region         = "eu-west-1"
        dynamodb_table = "state-lock"
        role_arn       = "arn:aws:iam::11111:role/my-role"
        encrypt        = true
      }
  path: ./terraform/saas/github
  runnerPodTemplate:
    metadata:
      annotations:
        iam.amazonaws.com/role: pod-role
  sourceRef:
    kind: GitRepository
    name: infrastructure
    namespace: flux-system
Locally, running terraform init with a state.config file that has a similar/identical configuration works fine and detects the current state properly:
bucket = "my-bucket"
key = "infrastructure-github"
region = "eu-west-1"
dynamodb_table = "state-lock"
role_arn = "arn:aws:iam::111111:role/my-role"
encrypt = true
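For context, the local run that picks up the correct state is roughly this (file name as mentioned above):
terraform init -backend-config=state.config
terraform state list   # lists the existing resources, confirming the right state is being read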
Reading the documentation I also saw a configPath that could be used, so I tried to point it at the state config file, but then I got this error:
Failed to initialize kubernetes configuration: error loading config file couldn't get version/kind; json parse error
Which is weird: it seems to be trying to load a Kubernetes configuration, not a Terraform one, or at least it expects a JSON file, which my state configuration is not.
I'm running Terraform 1.3.1 both locally and on the tf-runner pod.
On the runner pod I can see the generated_backend_config.tf, which has the same configuration, and .terraform/terraform.tfstate also points to the bucket.
The only suspicious thing in the logs that I could find is this:
- Finding latest version of hashicorp/github...
- Finding integrations/github versions matching "~> 4.0"...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/github v5.9.1...
- Installed hashicorp/github v5.9.1 (signed by HashiCorp)
- Installing integrations/github v4.31.0...
- Installed integrations/github v4.31.0 (signed by a HashiCorp partner, key ID 38027F80D7FD5FB2)
- Installing hashicorp/aws v4.41.0...
- Installed hashicorp/aws v4.41.0 (signed by HashiCorp)
Partner and community providers are signed by their developers.
If you'd like to know more about provider signing, you can read about it here:
https://www.terraform.io/docs/cli/plugins/signing.html
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.
Warning: Additional provider information from registry
The remote registry returned warnings for
registry.terraform.io/hashicorp/github:
- For users on Terraform 0.13 or greater, this provider has moved to
integrations/github. Please update your source in required_providers.
It seems to install two GitHub providers, one from hashicorp and one from integrations... I have changed Terraform/provider versions during development and removed every reference to the hashicorp one, but this warning still appears.
However, it also happens locally, where it reads the correct state, so I don't think it is related.
I am executing the below-mentioned command to install Prometheus.
helm install my-kube-prometheus-stack prometheus-community/kube-prometheus-stack
I am getting the below error message. Please advise.
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [ValidationError(Alertmanager.spec): unknown field "alertmanagerConfigNamespaceSelector" in com.coreos.monitoring.v1.Alertmanager.spec, ValidationError(Alertmanager.spec): unknown field "alertmanagerConfigSelector" in com.coreos.monitoring.v1.Alertmanager.spec]
Hello @saerma and welcome to Stack Overflow!
@rohatgisanat might be right, but without seeing your current configs it's impossible to verify that. Please check if that was the case.
There are also two other things you should look for:
If there were any previous installations from other Prometheus-related manifest files, then delete the following CRDs (for example with kubectl delete crd, as sketched after the list):
crd alertmanagerconfigs.monitoring.coreos.com
crd alertmanagers.monitoring.coreos.com
crd podmonitors.monitoring.coreos.com
crd probes.monitoring.coreos.com
crd prometheuses.monitoring.coreos.com
crd prometheusrules.monitoring.coreos.com
crd servicemonitors.monitoring.coreos.com
crd thanosrulers.monitoring.coreos.com
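One way to delete them in a single command (names exactly as listed above):
# removing these CRDs also removes any ServiceMonitors etc. created from them
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com \
  alertmanagers.monitoring.coreos.com \
  podmonitors.monitoring.coreos.com \
  probes.monitoring.coreos.com \
  prometheuses.monitoring.coreos.com \
  prometheusrules.monitoring.coreos.com \
  servicemonitors.monitoring.coreos.com \
  thanosrulers.monitoring.coreos.com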
Also, check if there are any other Prometheus-related ConfigMaps with:
kubectl get configmap --all-namespaces
and also delete them.
Notice that deleting the CRDs will also delete any ServiceMonitors and so on that were previously created by other charts.
After that you can try to install again from scratch.
If installing fresh, run:
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/v0.45.0/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
as the CRDs changed with the newer version and you need to use the updated ones.
Source.
Looks like the indentation of alertmanagerConfigNamespaceSelector is wrong. It should be on the same level as alertmanagerConfigSelector. Check your values.yaml for this.
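A rough sketch of the expected layout in the kube-prometheus-stack values.yaml; the surrounding key path (alertmanager.alertmanagerSpec) is my assumption and may differ between chart versions:
alertmanager:
  alertmanagerSpec:
    alertmanagerConfigSelector: {}
    alertmanagerConfigNamespaceSelector: {}   # same indentation level as the selector above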
I currently have Flux and the helm operator installed in my cluster via the helm charts. The flux deployment is monitoring a git repo where I have a .flux.yaml which I pass a folder context via the flux deployment git-path flag. This is used to run kustomize to patch which values files I want to use for the deployment. Some of these environments have files that are encrypted via sops.
I have configured Flux with sops enabled. sops/helm-secrets is using an AWS KMS key, so locally I assume a role that I have granted access to encrypt/decrypt with the specified KMS ARN. The issue I am running into is getting these secrets decrypted prior to the Helm deploy; I currently end up with the encrypted values in the final Kubernetes resource. I can't seem to find any additional documentation about configuring AWS access/secret keys to be used by sops on the Flux side, nor anything on the helm operator to potentially do it via helm-secrets. Any tips would be greatly appreciated!
It turns out there was no issue with decrypting the secret. The flux pod runs sops using the node role (which I had granted access to decrypt with the necessary KMS key) and was successfully decrypting secrets. I tested this by exec-ing into the pod and running sops -d on the file containing my secrets.
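For anyone who wants to repeat that check, it was along these lines (the namespace, deployment name, and file path are assumptions; adjust them to your install):
kubectl exec -n flux -it deploy/flux -- sops -d path/to/secrets.enc.yaml
# if the decrypted YAML is printed, KMS access from the pod is working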
The issue ended up being that I wasn't actually passing the decrypted file to my HelmRelease. I ended up accomplishing this with the following .flux.yaml:
version: 1
patchUpdated:
  generators:
  - command: sops -d --output secrets.yaml secrets.enc.yaml && kustomize build .
  - command: rm secrets.yaml
  patchFile: ../base/flux-patch.yaml
I originally had my secrets file formatted like a Helm values file, but instead updated it to patch the values section of the base HelmRelease file with the decrypted values (see the sketch below). This results in all the decrypted values being consumed by the HelmRelease. The second command removes the decrypted secrets.yaml file so that it doesn't end up getting committed back to the repo.
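To illustrate, the decrypted secrets.yaml is shaped like a patch against the HelmRelease rather than a plain values file; a rough sketch (resource name, namespace, and value keys are placeholders):
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: my-app
  namespace: my-namespace
spec:
  values:
    credentials:
      password: <decrypted-value>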
Keep in mind that this results in the HelmRelease in the cluster containing all of your secrets, so you need to manage access to HelmRelease objects accordingly.
Is there some tool available that could tell me whether a K8s YAML configuration (to-be-supplied to kubectl apply) is valid for the target Kubernetes version without requiring a connection to a Kubernetes cluster?
One concrete use-case here would be to detect incompatibilities before actual deployment to a cluster, e.g. because some already-deprecated apiVersion has finally been dropped in a newer Kubernetes version, as happened for Helm with the switch to Kubernetes 1.16 (see Helm init fails on Kubernetes 1.16.0):
Dropped:
apiVersion: extensions/v1beta1
New:
apiVersion: apps/v1
I want to check these kinds of incompatibilities within a CI system, so that I can reject the configuration before even attempting to deploy it.
Just run the command below to validate the syntax:
kubectl create -f <yaml-file> --dry-run
In fact, the dry-run option validates both the YAML syntax and the object schema. You can capture the output in a variable and, if there is no error, rerun the command without dry-run.
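A rough sketch of that pattern in a CI step (the manifest file name is a placeholder; newer kubectl versions want the mode spelled out as --dry-run=client):
# validate first; only create for real if the dry run succeeds
if out=$(kubectl create -f manifests.yaml --dry-run=client 2>&1); then
  kubectl create -f manifests.yaml
else
  echo "validation failed: $out"
  exit 1
fi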
You could use kubeval
https://kubeval.instrumenta.dev/
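For example, kubeval can be pointed at the schema of a specific Kubernetes version (the file name is a placeholder):
kubeval --kubernetes-version 1.16.0 deployment.yaml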
I don't think kubectl supports client-side-only validation yet (02/2022).
I get the following error when installing Istio on GKE:
kubernetes ver = 1.11.2-gke.18
Istio ver = 1.0.4
kubectl = latest from the Google repo
Error from server (NotFound): error when creating
"`install/kubernetes/istio-demo-auth.yaml`":
the server could not find the requested resource
(post `gatewaies.networking.istio.io`)
I have tried to follow the tutorial on GCP:
https://cloud.google.com/kubernetes-engine/docs/tutorials/installing-istio
You are missing the CustomResourceDefinitions required by Istio and hence getting this error. You need to run the following command from the Istio folder:
kubectl apply -f install/kubernetes/helm/istio/templates/crds.yaml
This will create all the CRDs, like virtualservices, destinationrules, etc.
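You can verify that they were registered before retrying, for example:
kubectl get crd | grep 'istio.io'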
Try following the official Istio documentation to install it on GKE:
https://istio.io/docs/setup/kubernetes/quick-start-gke-dm/
I am also getting this issue when installing a custom Istio helm chart:
[tiller] 2019/11/15 21:50:52 failed install perform step: release test failed: the server could not find the requested resource (post gatewaies.networking.istio.io)
I've confirmed the Istio CRDs are installed properly. Note how the installed Gateway CRD explicitly lists the accepted plural name:
status:
  acceptedNames:
    categories:
    - istio-io
    - networking-istio-io
    kind: Gateway
    listKind: GatewayList
    plural: gateways
    shortNames:
    - gw
    singular: gateway
I created an issue on Helm to see if that is the culprit; otherwise, I can open an issue on Istio to see if the problem lies there. I'm very confused about where this issue could be coming from.
**Note:** The type of the Gateway resource is correct:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
Istio works by defining a series of CRDs (Custom Resource Definitions); for Istio to work, you first need to run a command like this:
kubectl apply -f install/kubernetes/helm/istio/templates/crds.yaml
For my version (Istio v1.2.0), the command is:
for i in install/kubernetes/helm/istio-init/files/crd*yaml; do kubectl apply -f $i; done
But even following the instructions from the documentation, I still get the annoying message:
Error from server (NotFound): error when creating "samples/bookinfo/networking/bookinfo-gateway.yaml": the server could not find the requested resource (post gatewaies.networking.istio.io)
As the hint implies, the requested resource "gatewaies.networking.istio.io" cannot be found, so I list the CRDs:
kubectl get crd
and I got a list like this:
(screenshot of the CRD list, showing gateways.networking.istio.io)
Inspecting this, I find something wrong.
The message issued by kubectl is (post gatewaies.networking.istio.io), but the CRD listed is gateways.networking.istio.io. Then everything is clear: the kubectl CLI issued the wrong plural for the word "gateway" (the correct form is gateways, not gatewaies), so to satisfy the requested form, the CRD must change.
And I edit this file:
vim install/kubernetes/helm/istio-init/files/crd-10.yaml
By changing the name from "gateways.networking.istio.io" to "gatewaies.networking.istio.io", everything is OK now.
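For completeness, a rough sketch of the relevant part of crd-10.yaml after the edit (fields abbreviated; both the name and the plural change, since a CRD's metadata.name must be <plural>.<group>):
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: gatewaies.networking.istio.io   # was gateways.networking.istio.io
spec:
  group: networking.istio.io
  names:
    kind: Gateway
    plural: gatewaies                   # was gateways
    singular: gateway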