How to fix K8s namespace permissions in GitLab CI - kubernetes

As I'm playing around with K8s deployments and GitLab CI, my deployment got stuck in the state ContainerStarting.
To reset that, I deleted the K8s namespace using kubectl delete namespaces my-namespace.
Now my GitLab runner shows me
$ ensure_namespace
Checking namespace [MASKED]-docker-3
error: the server doesn't have a resource type "namespace"
error: You must be logged in to the server (Unauthorized)
I think this has something to do with RBAC. Most likely GitLab created that namespace with certain service accounts and permissions (but I don't know exactly when and how that happens), which are now missing because of my deletion.
Anybody got an idea on how to fix this issue?

In my case I had to delete the namespace record in the GitLab database so that GitLab would re-add the service account and namespace:
On the GitLab machine or task runner, enter the PostgreSQL console:
gitlab-rails dbconsole -p
Then select the database:
\c gitlabhq_production
The next step is to find the namespace that was deleted:
SELECT id, namespace FROM clusters_kubernetes_namespaces;
Take the id(s) returned and delete the corresponding rows:
DELETE FROM clusters_kubernetes_namespaces WHERE id IN (6,7);
Now you can restart the pipeline, and the namespace and service account will be re-added.

Deleting the namespace manually also removed the secrets GitLab needs. They appear to be auto-created on the first ever deployment, and it's impossible to repeat that process.
I had to create a new repo and push to it. Now everything works.

Another solution is removing the cluster from GitLab (under Operations > Kubernetes in your repo) and re-adding it.

From GitLab 12.6 you can simply clear the cluster cache.
To clear the cache:
Navigate to your project’s Operations > Kubernetes page, and select your cluster.
Expand the Advanced settings section.
Click Clear cluster cache.
This avoids losing secrets and potentially affecting other applications.

Related

Observing weird kubernetes behavior while deleting using yaml

When I run kubectl delete -f deployment.yaml, the CLI reports that the deployment is deleted. The pod also goes into the Terminating state. But a new pod is created again from the same Deployment and ReplicaSet.
On further digging I found out that the Deployment and RS are not being removed. Any reason why the Deployment and RS wouldn't be removed? Why would the pod be terminated if the Deployment isn't removed?
Any leads are appreciated.
As the OP confirmed in the comments that they are running Argo CD, the recreation of the resources is expected behaviour if Argo CD is running in auto-sync mode for the impacted namespace.
Here is a short snippet from the documentation:
Argo CD has the ability to automatically sync an application when it detects differences between the desired manifests in Git, and the live state in the cluster. A benefit of automatic sync is that CI/CD pipelines no longer need direct access to the Argo CD API server to perform the deployment. Instead, the pipeline makes a commit and push to the Git repository with the changes to the manifests in the tracking Git repo.
Solution: you can disable auto-sync, monitor the delta, and approve syncs manually. This is something decided at the project level; you can read about it here.
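For reference, a minimal sketch of switching an application to manual sync with the argocd CLI (the application name my-app is an assumption):
# Disable automated sync so manual deletes are not immediately recreated
# (assumption: an application named my-app and a logged-in argocd CLI).
argocd app set my-app --sync-policy none
# Review the drift and sync manually when you are ready:
argocd app diff my-app
argocd app sync my-app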

AKS enable-pod-identity fails with DevOps error

I have a Node.js app running in AKS that needs to access a Key Vault. I have used the Deployment center in the Kubernetes service to set up DevOps. By mistake I did the setup in the Deployment center twice, which led to two copies of .yml files (deploytoAksCluster.yml and deploytoAksCluster-1.yml). I have fixed this, but when I run the following command to enable pod identity I get an error.
az aks update -g $resource_group -n $k8s_name --enable-pod-identity
Error:
(BadRequest) Tag name cannot be hidden-DevOpsInfo:GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:59a0dfdb:my-akscluster:1646402646541.43;GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:13350477:my-akscluster:1646924094935.21; or be longer than 512 characters. Please see https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-group-using-tags for more details.
Currently I have only one workflow in GitHub (deploytoAksCluster.yml), but the error referencing deploytoAksCluster-1.yml never goes away.
I have used this sample as inspiration: https://learn.microsoft.com/en-us/azure/aks/use-azure-ad-pod-identity#run-a-sample-application
What I have tried
removed the duplicate files
reintroduced the duplicate files
deleted the deployment
This is how AKS Deployment center looks.
[1]: https://i.stack.imgur.com/ziFEk.png
Update
59a0dfdb refers to a git commit. This commit resulted in a failed workflow. The workflow has been fixed and everything deploys nicely to K8s, but --enable-pod-identity keeps complaining with the above error. I have removed the commit from the GitHub history.
I have even removed the repository in github.
There must be some Git history somewhere that --enable-pod-identity is hung up on somehow?
Please retry deleting the deployment again, as in some cases a second attempt removes leftover resources that are no longer required.
Also check whether the corresponding resource group has been deleted, and clear the cache.
Try updating your Git version, depending on the type of OS you are using.
Also note:
If you're using existing resources when you're creating a new cluster, such as an IP address or route table, az aks create overwrites the set of tags. If you delete that cluster later, any tags set by the cluster will be removed.
(from Azure tags on an AKS cluster)
To update the tags on an existing cluster, we need to run az aks update with the --tags parameter.
Reference
errors when trying to create update scale delete or upgrade cluster
The tag name of the cluster was auto generated to "hidden-DevOpsInfo:GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:59a0dfdb:my-akscluster:1646402646541.43;GH:my-GithubOrg/myApplication:main:deploytoAksCluster-1.yml:deploytoAksCluster-1.yml:13350477:my-akscluster:1646924094935.21".
The solution was in the error message: "Tag name cannot be..."
I got the tags by running:
az aks show -g $resource_group -n $k8s_name --query '[tags]'
and updated the tag with:
az aks update --resource-group $resource_group --name $k8s_name --tags "key"="Value"
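Putting it together, a rough sketch of the full sequence (the replacement tag key/value is just a placeholder):
# Inspect the current tags, including the auto-generated hidden-DevOpsInfo entry
az aks show -g $resource_group -n $k8s_name --query '[tags]'
# az aks update --tags replaces the existing tag set, which drops the over-long tag
# ("key"="Value" is a placeholder)
az aks update --resource-group $resource_group --name $k8s_name --tags "key"="Value"
# Retry enabling pod identity
az aks update -g $resource_group -n $k8s_name --enable-pod-identity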

How to pull from private project's image registry using GitLab managed Kubernetes clusters

GitLab offers to manage a Kubernetes cluster, which includes (e.g.) creating the namespace, adding some tokens, etc. In GitLab CI jobs, one can directly use the $KUBECONFIG variable for contacting the cluster and e.g. creating deployments using helm. This works like a charm, as long as the GitLab project is public and therefore Docker images hosted by the GitLab project's image registry are publicly accessible.
However, when working with private projects, Kubernetes of course needs an ImagePullSecret to authenticate against GitLab's image registry to retrieve the image. As far as I can see, GitLab does not automatically provide an ImagePullSecret for repository access.
Therefore, my question is: What is the best way to access the image repository of private GitLab repositories in a Kubernetes deployment in a GitLab managed deployment environment?
In my opinion, these are the possibilities and why they are not eligible/optimal:
Permanent ImagePullSecret provided by GitLab: When doing a deployment on a GitLab managed Kubernetes cluster, GitLab provides a list of variables to the deployment script (e.g. Helm Chart or kubectl apply -f manifest.yml). As far as I can (not) see, there is a lot of stuff like ServiceAccounts and tokens etc., but no ImagePullSecret - and also no configuration option for enabling ImagePullSecret creation.
Using $CI_JOB_TOKEN: When working with GitLab CI/CD, GitLab provides a variable named $CI_JOB_TOKEN which can be used for uploading Docker images to the registry during job execution. This token expires after the job is done. It could be combined with helm install --wait, but when a rescheduling takes place to a new node which does not have the image yet, the token has expired and the node is no longer able to download the image. Therefore, this only works at the moment of deploying the app (a rough sketch of this option follows this list).
Creating an ImagePullSecret manually and adding it to the Deployment or the default ServiceAccount: This is a manual step, has to be repeated for each individual project, and just sucks; we're trying to automate things, and GitLab managed Kubernetes clusters are designed to avoid any manual step.
Something else but I don't know about it.
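To make option 2 concrete, here is a rough sketch of what creating the secret inside the CI job could look like (assuming kubectl is available in the job image and $KUBECONFIG is provided by the cluster integration); it inherits the expiry limitation described above:
# Create/refresh an ImagePullSecret from the short-lived job token.
# $CI_REGISTRY and $CI_JOB_TOKEN are standard GitLab CI variables.
kubectl create secret docker-registry gitlab-registry \
  --docker-server="$CI_REGISTRY" \
  --docker-username=gitlab-ci-token \
  --docker-password="$CI_JOB_TOKEN" \
  --dry-run=client -o yaml | kubectl apply -f -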
So, am I wrong in one of these points? Am I missing an eligible option in this listing?
Again: It's all about a seamless integration with the "Managed Cluster" features of GitLab. I know how to add tokens from GitLab as ImagePullSecrets in Kubernetes, but I want to know how to automate this with the Managed Cluster feature.
There is another way: you can bake the registry credentials into your container runtime configuration, whether that is Docker, containerd, or CRI-O.
Docker
As root, run docker login <your-private-registry-url>. A file /root/.docker/config.json should then be created/updated. Put that file on all your Kubernetes nodes and make sure your kubelet runs as root (which it typically does). Some background info.
The content of the file should look something like this:
{
  "auths": {
    "my-private-registry": {
      "auth": "xxxxxx"
    }
  },
  "HttpHeaders": {
    "User-Agent": "Docker-Client/18.09.2 (Linux)"
  }
}
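If you would rather not rely on the kubelet's root home directory, a hedged alternative is to copy the same file to a path the kubelet also searches (the path below is the kubelet's default root dir):
# The kubelet also looks for registry credentials in {--root-dir}/config.json,
# which defaults to /var/lib/kubelet/config.json.
sudo cp /root/.docker/config.json /var/lib/kubelet/config.json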
Containerd
Configure your containerd.toml file with something like this:
[plugins.cri.registry.auths]
  [plugins.cri.registry.auths."https://gcr.io"]
    username = ""
    password = ""
    auth = ""
    identitytoken = ""
CRI-O
Specify the global_auth_file option in your crio.conf file.
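As a sketch (assuming CRI-O's drop-in config directory is available and an auth file was already written on the node, e.g. with podman login --authfile):
# Point CRI-O at a node-level registry auth file (run as root).
cat > /etc/crio/crio.conf.d/10-registry-auth.conf <<'EOF'
[crio.image]
global_auth_file = "/etc/crio/auth.json"
EOF
systemctl restart crio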
✌️
Configure your account.
For example, for Kubernetes to pull images from gitlab.com, use the registry address registry.gitlab.com:
kubectl create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-name> --docker-password=<your-pword> --docker-email=<your-email>
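As a follow-up, a small sketch of making that secret take effect automatically (the secret name regcred is taken from the command above; run it in the namespace your deployments target):
# Attach the pull secret to the default ServiceAccount so Pods in the namespace
# can pull from the private registry without listing imagePullSecrets themselves.
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'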

Specifying K8s namespace for GitLab runner

I have a GitLab runner using the Kubernetes executor, but when running the pipeline I am getting the below error:
Checking for jobs... received job=552009999
repo_url=https://gitlab.com/deadbug/rns.git runner=ZuT1t3BJ
WARNING: Namespace is empty, therefore assuming 'default'. job=552009999 project=18763260
runner=ThT1t3BJ
ERROR: Job failed (system failure): secrets is forbidden: User "deadbug" cannot create resource
"secrets" in API group "" in the namespace "default" duration=548.0062ms job=552009999
From the error message, I understand the namespace needs to be updated. I specified the namespace in the GitLab variables.
But even after this, the pipeline fails with the above error message. How do I change the namespace for the runner?
This seems to be linked to the permissions of the service account rather than the namespace directly. If you use GitLab's Kubernetes integration, you should not override the namespace, as GitLab will create one for you.
Make sure the service account you added to GitLab has the correct role. From https://docs.gitlab.com/ee/user/project/clusters/add_remove_clusters.html:
When GitLab creates the cluster, a gitlab service account with cluster-admin privileges is created in the default namespace to manage the newly created cluster
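If that service account is missing its role (for example after the namespace or binding was deleted), a minimal sketch of restoring it manually (assuming the account is named gitlab and lives in the default namespace, as in the quote above):
# Re-grant cluster-admin to the gitlab service account
# (names are assumptions based on the quoted documentation).
kubectl create clusterrolebinding gitlab-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=default:gitlab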
You may be having the same issue I was having. Instead of installing the GitLab Runner into the existing Kubernetes cluster with helm install, I used helm template and another manager (kapp) to install it. This breaks the logic in the Helm template that sets the namespace to the one used in the helm install (see code). This led the runner to attempt to create the pods in the default namespace instead of the namespace I created. I was able to specify it manually in my values.yml file though:
runners:
  namespace: my-namespace
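For reference, a rough sketch of the template-and-apply flow described above with that values file (the chart, release, and app names are assumptions):
# Render the chart with the namespace pinned in values.yml, then apply it with kapp.
# Assumes the chart repo was added via: helm repo add gitlab https://charts.gitlab.io
helm template gitlab-runner gitlab/gitlab-runner -f values.yml \
  | kapp deploy -a gitlab-runner -f - --yes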

Deploying Spinnaker to Openshift fails at spin-redis-bootstrap stage

I'm trying to deploy Spinnaker into an OpenShift cluster (v3.10) using Halyard. Everything seems to deploy OK up until the deployment of spin-redis-bootstrap. The hal deploy apply command eventually times out, with the following error in the spin-redis-bootstrap pod logs:
Redis master data doesn't exist, data won't be persistent!
mkdir: cannot create directory '/redis-master-data': Permission denied
[7] 01 Oct 17:21:04.443 # Can't chdir to '/redis-master-data': No such file or directory
Seems like a permissions issue. This error does not occur when deploying directly to Kubernetes(v1.10).
Does halyard use a specific service account to deploy the Spinnaker services, that I would need to grant additional permissions to?
Any help would be appreciated.
I was able to get Redis for Spinnaker running by changing the Docker image to registry.access.redhat.com/rhscl/redis-32-rhel7 in the deployment config.
It was failing because of the stricter default permissions in OpenShift.
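A hedged sketch of one way to make that image change through Halyard's deployment config (the file path and the artifactId override are assumptions about the Halyard setup; adjust for your deployment name):
# Override the Redis image Halyard deploys, then re-apply the deployment.
mkdir -p ~/.hal/default/service-settings
cat > ~/.hal/default/service-settings/redis.yml <<'EOF'
artifactId: registry.access.redhat.com/rhscl/redis-32-rhel7
EOF
hal deploy apply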