gcloud credentials using wrong cluster

I'm using Container Builder with a cloudbuild.yaml. My problem is that an old cluster name, which no longer exists, is being used. I have tried deleting my service key and creating it again, to no avail.
Starting Step #3
Step #3: Already have image (with digest): gcr.io/cloud-builders/kubectl
Step #3: Running: gcloud container clusters get-credentials --project="amx-instance-1" --zone="australia-southeast1-a" "amx-cluster-au9"
Step #3: Fetching cluster endpoint and auth data.
Step #3: ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission(s) for "projects/amx-instance-1/zones/australia-southeast1-a/clusters/amx-cluster-au9". See https://cloud.google.com/kubernetes-engine/docs/troubleshooting#gke_service_account_deleted for more info.
The cluster name amx-cluster-au9 is an old cluster that no longer exists. What is causing this issue and how can I fix it?
Edit: here is the cloudbuild.yaml file:
steps:
- name: gcr.io/cloud-builders/wget
  args: [
    "-O",
    "go-cloud-debug",
    "https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug"
  ]
- name: 'gcr.io/cloud-builders/go'
  args: ["install", "-gcflags=-N", "-gcflags=-l", "."]
  env: ['PROJECT_ROOT=github.com/amalexpress/amx-server', 'CGO_ENABLED=0', 'GOOS=linux']
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '--tag=gcr.io/$PROJECT_ID/amx-img:$SHORT_SHA', '.']
- name: 'gcr.io/cloud-builders/kubectl'
  args:
  - set
  - image
  - deployment
  - echoserver
  - echoserver=gcr.io/$PROJECT_ID/amx-img:$SHORT_SHA
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=australia-southeast1-a'
  - 'CLOUDSDK_CONTAINER_CLUSTER=amx-cluster-au-2'
images: ['gcr.io/$PROJECT_ID/amx-img:$SHORT_SHA']
Basically I don't know why it keeps referencing a cluster that I've deleted and no longer use.
Here are the logs if it might help.

The issue was resolved by deleting the repository in Cloud Source Repositories. I have no idea why this fixed the issue. I should note that I deleted the GitHub repo and re-initialised it. It still seems like a bug, though, as there is no indication of the root cause at all. Furthermore, while the above fixed the cluster name issue, I had to follow the instructions here to give the proper role access to the cluster:
PROJECT="$(gcloud projects describe \
$(gcloud config get-value core/project -q) --format='get(projectNumber)')"
gcloud projects add-iam-policy-binding $PROJECT \
--member=serviceAccount:$PROJECT#cloudbuild.gserviceaccount.com \
--role=roles/container.developer
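To confirm the binding actually landed on the Cloud Build service account, a check along these lines can help (a sketch; the filter and format expressions are my own, adjust as needed):
gcloud projects get-iam-policy "$(gcloud config get-value core/project -q)" \
    --flatten="bindings[].members" \
    --filter="bindings.role:roles/container.developer" \
    --format="table(bindings.members)"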


Issue Deploying Git Sync DAG to Airflow on Kubernetes

I have been trying to deploy the Git Sync DAG (v3.4.0) to my instance of Airflow (v2.4.1 with Helm chart version 1.7.0) running on a Kubernetes cluster (v1.23.7+rke2r2).
I followed the deployment instructions from the Airflow documentation, which can be found here.
My override_values.yaml is the following.
dags:
  gitSync:
    enabled: true
    repo: git@github.com/MY_COMPANY_NAME/MY_COMPANY-dags.git
    branch: main
    subPath: ""
    sshKeySecret: airflow-ssh-secret
extraSecrets:
  airflow-ssh-secret:
    data: |
      gitSshKey: 'MY_PRIVATE_KEY_IN_B64'
Once airflow is stable, I use the following helm command to update my airflow deployment.
helm upgrade --install airflow apache-airflow/airflow --namespace airflow -f override-values.yaml
This succeeds, but the deployment never reaches a new stable state with the git-sync containers. The git-sync-init container repeatedly fails to complete. I have previously used this approach to deploy git-sync and it worked for months; however, it stopped working suddenly. When I attempt to check the logs of the git-sync-init container, they are empty, and there doesn't seem to be a verbosity attribute I can enable.
After reading through github issues on the git-sync repo, I also attempted to prepend the ssh:// scheme to the repo url, but that did not fix the issue.
Is there an alternative way for me to deploy a git-sync sidecar container to my Airflow deployment so that I can access code from private repos?
EDIT:
It appears that the issue was actually with the Rancher GUI. Whenever I used the GUI, the container logs and shell would not load or show anything. However, I was able to open a kubectl shell, query for the Airflow pods with kubectl get pods -n airflow, and query the specific init container logs with kubectl logs airflow-scheduler-65fcdbb58d-4pnzf git-sync -n airflow.
This yielded the following error.
"msg"="unexpected error syncing repo, will retry" "error"="Run(git submodule update --init --recursive --depth 2): exit status 128: { stdout: "", stderr: "fatal: No url found for submodule path 'COMPANY_NAME/PACKAGE_PATH/PACKAGE' in .gitmodules\n" }"
This pointed to a misconfigured .gitmodules that was not updated when the structure of our dag repo was changed.
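For reference, git-sync expects every submodule path it encounters to have a matching entry in .gitmodules, roughly like this (the path and URL below are placeholders, not our actual layout):
[submodule "COMPANY_NAME/PACKAGE_PATH/PACKAGE"]
    path = COMPANY_NAME/PACKAGE_PATH/PACKAGE
    url = git@github.com:MY_COMPANY_NAME/PACKAGE.git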

Unable to get ENV variables in GoCD Kubernetes using YAML config

GoCD Version: 19.12.0
I'm trying to use environment variables defined in the Kubernetes deployment (system level) in my GoCD YAML config, in order to pass GitHub authentication when pulling the material.
I've confirmed that I'm able to call the repository using a personal access token. (via https://[TOKEN]@github.com/[COMPANY]/[REPO].git)
This, of course, also works if I do the same for the actual YAML git field.
The GoCD secrets in K8s:
apiVersion: v1
data:
  GITHUB_ACCESS_KEY: base64EncodedKey
kind: Secret
type: Opaque
The GoCD deployment gets the secrets:
...
spec:
  containers:
  - env:
    - name: GOCD_PLUGIN_INSTALL_kubernetes-elastic-agents
      value: https://github.com/gocd/kubernetes-elastic-agents/releases/download/v3.4.0-196/kubernetes-elastic-agent-3.4.0-196.jar
    - name: GOCD_PLUGIN_INSTALL_docker-registry-artifact-plugin
      value: https://github.com/gocd/docker-registry-artifact-plugin/releases/download/v1.1.0-104/docker-registry-artifact-plugin-1.1.0-104.jar
    - name: GITHUB_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          key: GITHUB_ACCESS_KEY
          name: gocd-server
...
I've exec'd into the pod and echoed the variable, which returns the decoded value.
The YAML:
format_version: 9
pipelines:
  db-docker-build:
    group: someGroup
    label_template: ${COUNT}-${git[:8]}
    lock_behavior: unlockWhenFinished
    display_order: 1
    materials:
      git:
        git: 'https://$GITHUB_ACCESS_KEY@github.com/[COMPANY]/[REPO].git'
        shallow_clone: true
        auto_update: true
        branch: master
...
I'd half expect that to work, but it doesn't; it actually just uses the literal string $GITHUB_ACCESS_KEY as the value. The jobs defined in the pipeline stages are run using an elastic agent pod, which also has the required secrets defined. I've tried a few variations:
Setting env variables -
environment_variables:
  GIT_KEY: ${GITHUB_ACCESS_KEY}
and then using that variable
git: 'https://$GIT_KEY@github.com/[COMPANY]/[REPO].git'
Setting env variables and no quotes -
environment_variables:
  GIT_KEY: ${GITHUB_ACCESS_KEY}
and then using that variable
git: https://${GIT_KEY}@github.com/[COMPANY]/[REPO].git
No quotes - git: https://$GITHUB_ACCESS_KEY@github.com/[COMPANY]/[REPO].git
No quotes with brackets - git: https://${GITHUB_ACCESS_KEY}@github.com/[COMPANY]/[REPO].git
I've seen in some YAML documentation that it is recommended to use encrypted_password for the GitHub password, but this seems unnecessary since the GUI hides the token, and it is running in Kubernetes with secrets.
The team and I researched this a little further and found a workaround. Most issues and articles explain what is written in the docs: you really need access to /bin/bash -c in order to get at the variables.
The YAML plugin creator also uses secure, encrypted variables to store sensitive data, which is fine, but for our team it means that a lot of Kubernetes features go unused.
The workaround:
Use the GUI to create a pipeline in GoCD, enter the GitHub link, add a username and the personal access token for the user as the password, and test that the connection is OK. Once created, go to Admin -> Pipelines, click Download pipeline configuration and select YAML.
The generated YAML has the token encrypted with the GoCD server's private key.
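The git material in the downloaded YAML then looks roughly like this (a sketch; the username, repo and cipher text are placeholders):
materials:
  git:
    git: https://github.com/[COMPANY]/[REPO].git
    username: ci-bot
    encrypted_password: AES:PLACEHOLDER_IV:PLACEHOLDER_CIPHERTEXT
    branch: master
    shallow_clone: true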

Gitlab Runner is not able to resolve DNS of Gitlab Server

I'm facing a pretty strange problem.
First of all, my setup:
I have a private GitLab server which uses GitLab CI runners on Kubernetes to build Docker images. For that purpose I use the Kaniko image.
The runners are provisioned by GitLab itself via the built-in Kubernetes management. All of this runs behind a pfSense server.
Now to my problem:
Sometimes the Kaniko pods can't resolve the hostname of the GitLab server.
This leads to a failed git pull and therefore a failed build.
I would put the chance of failure at around 60%, which is way too high for us.
After retrying the build a few times, it will run without any problem.
The Kubernetes cluster running the GitLab CI is set up on CentOS 7.
SELinux and firewalld are disabled. All of the hosts can resolve the GitLab server, and the problem is not tied to a specific host server; I have seen it fail on all 5 servers, including the manager server. I also haven't seen this problem appear in other pods, but the other deployments in the cluster don't really make connections via DNS. I am sure the runner is able to use DNS at all, because it pulls the Kaniko image from gcr.io.
Has anyone ever seen this problem or know of a workaround?
I have already tried spawning pods that only do DNS requests against the domain (see the sketch below); I didn't see a single failure.
I also tried rebooting the whole cluster and the GitLab instance.
I tried a static overwrite of the DNS entry in pfSense. Still the same problem.
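For reference, the one-off DNS check mentioned above can be reproduced with something like this (the busybox image and hostname are just examples):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup git.mydomain.com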
Here is my CI config:
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo $REGISTRY_AUTH > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $REGISTRY_URL/$REGISTRY_IMAGE:$CI_JOB_ID
  only:
    - master
The following error happens:
Initialized empty Git repository in /builds/MYPROJECT/.git/
Fetching changes...
Created fresh repository.
fatal: unable to access 'https://gitlab-ci-token:[MASKED]@git.mydomain.com/MYPROJECT.git/': Could not resolve host: git.mydomain.com
We had the same issue for a couple of days. We tried changing the CoreDNS config, moving runners to a different k8s cluster, and so on. Finally, today I checked my personal runner and found that I was using a different version. The runners in the cluster had gitlab/gitlab-runner:alpine-v12.3.0, while mine had gitlab/gitlab-runner:alpine-v12.0.1. We added the line
image: gitlab/gitlab-runner:alpine-v12.1.0
in values.yaml and this solved the problem for us.
There is an env variable for gitlab-runner that can solve this problem:
- name: RUNNER_PRE_CLONE_SCRIPT
  value: "exec command before git fetch ..."
For example, edit /etc/hosts:
echo '127.0.0.1 git.demo.xxxx' >> /etc/hosts
or edit /etc/resolv.conf:
echo 'nameserver 8.8.8.8' > /etc/resolv.conf
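With the Kubernetes-managed runners this can be passed through the gitlab-runner Helm chart values, roughly like below (a sketch; the IP is a placeholder):
envVars:
  - name: RUNNER_PRE_CLONE_SCRIPT
    value: "echo '10.0.0.5 git.demo.xxxx' >> /etc/hosts"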
hope it works for you

Google Cloud get cluster credentials unauthorized

I've created a service account in the IAM page of the Google Cloud console, but unfortunately I'm unable to assign roles to this account, or I'm missing something.
When attempting to get the cluster credentials for kubectl, gcloud always responds with the following:
gcloud container clusters get-credentials api --zone europe-west1-b --project *****
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=403, message=Required "container.clusters.get" permission for "projects/*****/zones/europe-west1-b/clusters/api".
I've also added all the roles to the account as demonstrated here:
gcloud projects get-iam-policy project-tilas
bindings:
- members:
  - serviceAccount:travis@*****.iam.gserviceaccount.com
  role: roles/container.admin
- members:
  - serviceAccount:travis@*****.iam.gserviceaccount.com
  role: roles/editor
- members:
  - user:Tj****n@gmail.com
  role: roles/owner
- members:
  - serviceAccount:travis@*****.iam.gserviceaccount.com
  role: roles/viewer
etag: BwVqZB734TY=
version: 1
What am I missing?
Authentication is successful, and the project id/number match up to what I see in the GCloud dashboard...
While not a definitive answer, I actually solved this by just deleting the service account and recreating it.
Frustrating.
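For anyone retracing that, recreating the account and re-granting GKE access looks roughly like this (a sketch; the account name and the container.developer role are assumptions, use whatever your setup needs):
gcloud iam service-accounts create travis-ci --display-name="Travis CI"
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
    --member=serviceAccount:travis-ci@MY_PROJECT_ID.iam.gserviceaccount.com \
    --role=roles/container.developer
gcloud iam service-accounts keys create key.json \
    --iam-account=travis-ci@MY_PROJECT_ID.iam.gserviceaccount.com
gcloud auth activate-service-account --key-file=key.json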
I was facing the same issue; my mistake was that I was specifying the project name instead of the project id. I solved the issue with:
gcloud config set project <project_id>
and then:
gcloud container clusters get-credentials <cluster-name> --zone=<zone>
Just thought I'd leave this here as it may help someone in the future.

Is there a way to automatically deploy to GCE based on a new image being created in Google Container Registry?

I have a Kubernetes deployment on GCE, which I'd like to get automatically updated based on new images being created in Google Container Registry (ideally via a Build Trigger). Is there a way to do that?
Thanks in advance.
-Mark
I was able to do this using GCR and Cloud Builder with a cloudbuild.yaml file like the one below. For it to work, the service account with a name like xyz@cloudbuild.gserviceaccount.com had to have the IAM permissions assigned by clicking Project -> Editor. This is required so that the Cloud Build service can create SSH keys and add them to your GCE metadata, which allows Cloud Builder to SSH in. This SSHing is the big work-around that effectively lets you run any command on your GCE VM server.
steps:
# Build Docker image: docker build -f Dockerfile -t gcr.io/my-project/my-image:latest .
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-f', 'Dockerfile', '-t', 'gcr.io/my-project/my-image:latest', '.']
# Push to GCR: gcloud docker -- push gcr.io/my-project/my-image:latest
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/my-project/my-image:latest']
# Connect to GCE server and pull new image
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['compute', 'ssh', '$_SERVER', '--zone', '$_ZONE', '--command', 'gcloud docker -- pull gcr.io/my-project/my-image:latest']
# Connect to server and stop current container
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['compute', 'ssh', '$_SERVER', '--zone', '$_ZONE', '--command', 'docker stop my-image']
# Connect to server and remove current container
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['compute', 'ssh', '$_SERVER', '--zone', '$_ZONE', '--command', 'docker rm my-image']
# Connect to server and start new container
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['compute', 'ssh', '$_SERVER', '--zone', '$_ZONE', '--command', 'docker run --restart always --name my-image -d -p 443:443 --log-driver=gcplogs gcr.io/my-project/my-image:latest']
substitutions:
  _SERVER: 'my-gce-vm-server'
  _ZONE: 'us-east1-c'
Bonus Pro Tips:
The substitutions are nice in case you spin up a new server some day and want to use it instead.
Using --log-driver=gcplogs makes your Docker logs show up in your Google Cloud Console's Stackdriver Logging under the appropriate "GCE VM Instance". Just be sure to have "All logs" and "Any Log Level" selected, since Docker logs have no log level and are not syslog or activity_log messages.
Another option: some of our users use the kubectl build step to trigger a deployment at the end of their build.
You can call any kubectl command in your build step, provided that you have set up the proper IAM permissions to do so as part of a build. (See the README.) This example calls kubectl get pods.
Note that images are only automatically pushed at the end of a completed build, so to build an image and deploy it in one build, you'll need to insert your own docker push build step prior to your deployment step.
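Put together, a build-push-deploy sketch could look like the following (the deployment, container and cluster names are placeholders, not a prescribed setup):
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-image:$SHORT_SHA', '.']
# Push explicitly so the image exists before the deploy step runs
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'gcr.io/$PROJECT_ID/my-image:$SHORT_SHA']
# Point the deployment at the freshly pushed tag
- name: 'gcr.io/cloud-builders/kubectl'
  args: ['set', 'image', 'deployment/my-deployment', 'my-container=gcr.io/$PROJECT_ID/my-image:$SHORT_SHA']
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=us-east1-c'
  - 'CLOUDSDK_CONTAINER_CLUSTER=my-cluster'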
You can use Google Cloud pub/sub to listen to changes in Google Container Registry. This page gives an overview of this feature. You may want to use the push model for your application.
However, please note that this is an alpha feature and its behavior may change in future releases.
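As a rough illustration, a push subscription on the registry's Pub/Sub topic could be wired up like this (gcr is the topic Container Registry publishes to; the endpoint URL is a placeholder):
# Create the gcr topic manually if it doesn't exist yet
gcloud pubsub topics create gcr
# Push registry events to your deployer service
gcloud pubsub subscriptions create gcr-deploy-sub --topic=gcr \
    --push-endpoint=https://deployer.example.com/gcr-events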
If you don't want the external control provided by pub/sub, your build script should do the following (a minimal sketch follows the list):
Tag the image and upload it to Container Registry
Update the image version in the deployment scripts
Run the deployment script, which in turn will pull the latest image
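A minimal sketch of those three steps, assuming a plain kubectl-managed deployment and placeholder names:
#!/bin/sh
set -e
# Tag the image and upload it to Container Registry
TAG="gcr.io/my-project/my-image:$(git rev-parse --short HEAD)"
docker build -t "$TAG" .
docker push "$TAG"
# Update the image in the deployment; the rollout pulls the new tag
kubectl set image deployment/my-deployment my-container="$TAG"
kubectl rollout status deployment/my-deployment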