AKS enable-pod-identity fails with DevOps error - kubernetes

I have a Node.js app running in AKS that needs to access a Key Vault. I used Deployment center in the K8s service to set up DevOps. By mistake I did the setup in Deployment center twice, which led to two copies of the .yml files (deploytoAksCluster.yml and deploytoAksCluster-1.yml). I have fixed this, but when I run the following command to enable pod identity I get an error.
az aks update -g $resource_group -n $k8s_name --enable-pod-identity
Error:
(BadRequest) Tag name cannot be hidden-DevOpsInfo:GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:59a0dfdb:my-akscluster:1646402646541.43;GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:13350477:my-akscluster:1646924094935.21; or be longer than 512 characters. Please see https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-group-using-tags for more details.
Currently I have only one workflow in GitHub (deploytoAksCluster.yml), but the error with reference to deploytoAksCluster-1.yml never goes away.
I have used this sample as inspiration: https://learn.microsoft.com/en-us/azure/aks/use-azure-ad-pod-identity#run-a-sample-application
What I have tried
removed the duplicate files
reintroduced the duplicate files
deleted the deployment
This is how AKS Deployment center looks.
[1]: https://i.stack.imgur.com/ziFEk.png
Update
59a0dfdb refers to a git commit. This commit resulted in a failed workflow. The workflow has been fixed and everything deploys nicely to K8s, but --enable-pod-identity keeps complaining with the above error. I have removed the commit from the GitHub history.
I have even removed the repository in GitHub.
There must be some git history somewhere in K8s that --enable-pod-identity is hung up on somehow?

Please retry deleting the deployment again, as in some cases a second attempt removes the resources that are no longer required.
Also check whether the corresponding resource group is deleted, and clear the cache.
Try updating your git version, depending on the type of OS you are using.
Also
NOTE: If you're using existing resources when you're creating a new cluster, such as an IP address or route table, az aks create overwrites the set of tags. If you delete that cluster later, any tags set by the cluster will be removed.
from Azure tags on an AKS cluster
To update the tags on an existing cluster, we need to run az aks update with the --tags parameter.
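For example (the tag names and values below are only placeholders), running az aks update with --tags should replace the cluster's entire tag set, which also clears the auto-generated hidden-DevOpsInfo tag:
az aks update -g $resource_group -n $k8s_name --tags "environment"="dev" "owner"="platform-team"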
Reference
Errors when trying to create, update, scale, delete or upgrade cluster

The tag name of the cluster was auto-generated as "hidden-DevOpsInfo:GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:59a0dfdb:my-akscluster:1646402646541.43;GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:13350477:my-akscluster:1646924094935.21".
The solution was in the error message: "Tag name cannot be..."
I got the tags by running:
az aks show -g $resource_group -n $k8s_name --query '[tags]'
and updated the tag with:
az aks update --resource-group $resource_group --name $k8s_name --tags "key"="Value"
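Once the offending tag is gone, re-running the original command should go through:
az aks update -g $resource_group -n $k8s_name --enable-pod-identity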

Related

Observing weird kubernetes behavior while deleting using yaml

When I run kubectl delete -f deployment.yaml, the CLI reports that the deployment is deleted. The pod also gets into the Terminating state. But a new pod is created again from the same Deployment and ReplicaSet.
On further digging I found out that the Deployment and ReplicaSet are not being removed. Any reason why the Deployment and ReplicaSet wouldn't be removed? Why would the pod be terminated if the deployment isn't removed?
Any leads are appreciated.
As the OP confirmed in the comments that they are running Argo CD, the recreation of the resources is expected behaviour if Argo CD is running in auto-sync mode for the impacted namespace.
Here is a short snippet from the documentation:
Argo CD has the ability to automatically sync an application when it detects differences between the desired manifests in Git, and the live state in the cluster. A benefit of automatic sync is that CI/CD pipelines no longer need direct access to the Argo CD API server to perform the deployment. Instead, the pipeline makes a commit and push to the Git repository with the changes to the manifests in the tracking Git repo.
Solution: you can disable auto-sync, monitor the delta, and approve the sync manually. This is something decided at the project level. You can read about it here.
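For instance, with the argocd CLI you can switch a single application back to manual sync (the application name here is a placeholder):
argocd app set my-application --sync-policy none
After that, a kubectl delete sticks, and the drift against Git just shows up as OutOfSync until you trigger the sync yourself.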

Best practice for sanity test a K8s cluster? (ideally all from command line)

I am new here. I tried to search for the topic before posting; this may have been discussed before, so please let me know before being too harsh on me :)
In my project, after performing some changes on either the DevOps tool sets or the infrastructure, we always do some manual sanity tests. This normally includes:
Building a new image and updating the helm chart
Pushing the image to Artifactory, performing a "helm update", and seeing if it runs.
I want to automate the whole thing and am trying to get advice from the community. Here are some requirements:
Validate the Jenkins agent is able to talk to the cluster (I can do this with kubectl get all -n <some_namespace_jenkins_user_has_access_to>)
Validate the cluster has access to GitHub (let's say I am using Argo CD to sync yamls)
Validate the cluster has access to Artifactory and is able to pull an image (I don't want to build a new image with a new tag and update the helm chart just to force the cluster to pull a new image)
All of the above should be doable from the command line (so that I can implement it using Jenkins Groovy)
Any suggestion is welcome.
Thanks guys
Your best bet is probably a combination of custom Jenkins scripts (i.e. running kubectl in Jenkins) and some in-cluster checks (e.g. using kuberhealthy).
So, when your Jenkins pipeline is triggered, it could do the following:
Check connectivity to the cluster
Build and push an image, etc.
Trigger in-cluster checks for testing if the cluster has access to GitHub and Artifactory, e.g. by launching a custom Job in the cluster, or creating a KuberhealthyCheck custom resource if you use kuberhealthy
During all this, the Jenkins pipeline writes the results of its test as metrics to a Pushgateway which is scraped by your Prometheus. The in-cluster checks also push their results as metrics to the Pushgateway, or expose them via kuberhealthy, if you decide to use it. In the end, you should have the results of all checks in the same Prometheus instance where you can react on them, e.g. creating Prometheus alerts or Grafana dashboards.
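A rough sketch of that last step (the Pushgateway host, job name, and metric name are made up for illustration): pushing a check result from the pipeline is a single curl call.
# Report the GitHub connectivity check as passed (1 = success)
echo "sanity_check_github_reachable 1" | curl --data-binary @- http://pushgateway.example.com:9091/metrics/job/cluster_sanity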

How to fix k8s namespace permissions in gitlab ci

As I'm playing around with K8s deployment and GitLab CI, my deployment got stuck in the state ContainerStarting.
To reset that, I deleted the K8s namespace using kubectl delete namespaces my-namespace.
Now my GitLab runner shows me
$ ensure_namespace
Checking namespace [MASKED]-docker-3
error: the server doesn't have a resource type "namespace"
error: You must be logged in to the server (Unauthorized)
I think that has something to do with RBAC and most likely Gitlab created that namespace with some arguments and permissions (but I don't know exactly when and how that happens), which are missing now because of my deletion.
Anybody got an idea on how to fix this issue?
In my case I had to delete the namespace in the GitLab database, so GitLab would re-add the service account and namespace:
On the GitLab machine or task runner, enter the PostgreSQL console:
gitlab-rails dbconsole -p
Then select the database:
\c gitlabhq_production
Next step is to find the namespace that was deleted:
SELECT id, namespace FROM clusters_kubernetes_namespaces;
Take the id of the namespace to delete it:
DELETE FROM clusters_kubernetes_namespaces WHERE id IN (6,7);
Now you can restart the pipeline and the namespace and service account will be readded.
Deleting the namespace manually caused the necessary secrets from GitLab to be removed. It seems they get auto-created on the first ever deployment and it's impossible to repeat that process.
I had to create a new repo and push to it. Now everything works.
Another solution is removing the cluster from GitLab (under Operations > Kubernetes in your repo) and re-adding it.
From GitLab 12.6 you can simply clear the cluster cache.
To clear the cache:
Navigate to your project’s Operations > Kubernetes page, and select your cluster.
Expand the Advanced settings section.
Click Clear cluster cache.
This avoids losing secrets and potentially affecting other applications.

How do I fix "The specified cluster could not be found." in my IBM Cloud Continuous Delivery pipeline's deploy stage?

I have a pipeline that deploys to a IBM Cloud Kubernetes cluster in a resource group other than the default. Recently, I have started seeing errors of the form...
The specified cluster could not be found. If you're using resource groups, make sure that you target the correct resource group.
What can I do to fix this?
This is likely happening because of a change that was made to the IBM Cloud command line (i.e. "ibmcloud ks" or "ibmcloud cs") that now requires the resource group to be set before doing a "cluster-config" command. If you are seeing this, for now you should be able to resolve it by going into your pipeline's deploy stage, selecting the correct resource group in the "Resource Group" field, and then saving the stage.
We are working on an update to respond to the change in the ibmcloud command, and will release it as soon as it's finished and tested.
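If you'd rather fix it in the deploy script itself, a minimal sketch (the variable names are placeholders for whatever your stage provides) is to target the resource group before fetching the cluster config:
ibmcloud target -g "$RESOURCE_GROUP"
ibmcloud ks cluster-config "$CLUSTER_NAME"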
Likely your local resource group does not match the cluster's resource group:
Change the targeted resource group as follows:
$ ibmcloud ks clusters
Name            ID               State    Created        Workers   Location     Version       Resource Group Name
mycluster-dev   236*******1071   normal   6 months ago   6         Washington   1.12.6_1521   My-Resource-Group
# Change the targeted resource group
$ ibmcloud target -g My-Resource-Group
$ ibmcloud ks cluster-config mycluster-dev
OK
The configuration for mycluster-dev was downloaded successfully.
Export environment variables to start using Kubernetes.
export KUBECONFIG=/home/.../.bluemix/plugins/.....yml

Auto update pod on every image push to GCR

I have a docker image pushed to Container Registry with docker push gcr.io/go-demo/servertime and a pod created with kubectl run servertime --image=gcr.io/go-demo-144214/servertime --port=8080.
How can I enable automatic update of the pod every time I push a new version of the image?
I would suggest switching to some kind of CI to manage the process and, instead of triggering on docker push, triggering the process on pushing the commit to the git repository. Also, if you switch to using a higher-level Kubernetes construct such as a Deployment, you will be able to run a rolling update of your pods to your new image version. Our process is roughly as follows:
git commit #triggers CI build
docker build -t yourimage:gitsha1 .
docker push yourimage:gitsha1
sed -i 's/{{TAG}}/gitsha1/g' deployment.yml
kubectl apply -f deployment.yml
Where deployment.yml is a template for our deployment that will be updated to the new tag version.
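As an illustration (the image name and tag handling are placeholders), the template only needs the {{TAG}} marker in the image field, which the sed step from the list above then rewrites:
grep 'image:' deployment.yml     # image: yourimage:{{TAG}}
sed -i "s/{{TAG}}/$(git rev-parse --short HEAD)/g" deployment.yml
grep 'image:' deployment.yml     # now shows the concrete git sha as the tag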
If you do it manually, it might be easier to simply update the image in an existing deployment by running kubectl set image deployment/yourdeployment <containernameinpod>=yourimage:gitsha1
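If you go that route, you can watch the rolling update complete with the usual rollout command (same placeholder deployment name as above):
kubectl rollout status deployment/yourdeployment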
I'm on the Spinnaker team.
Might be a bit heavy, but without knowing your other areas of consideration, Spinnaker is a CD platform from which you can trigger k8s deployments from registry updates.
Here's a codelab to get you started.
If you'd rather shortcut the setup process, you can get a starter Spinnaker instance with k8s and GCR integration pre-setup via the Cloud Launcher.
You can find further support on our slack channel (I'm #stevenkim).
It would need some glue, but you could use Docker Hub, which lets you define a webhook for each repository when a new image is pushed or a new tag created.
This would mean you'd have to build your own web API server to handle the incoming notifications and use them to update the pod. And you'd have to use Docker Hub, not Google Container Registry, which doesn't allow webhooks.
So, probably too many changes for the problem you're trying to solve.