Google Cloud Build deploy to GKE Private Cluster - kubernetes

I'm running a Google Kubernetes Engine cluster with the "private-cluster" option.
I've also defined "master authorized networks" to be able to remotely access the environment - this works just fine.
Now I want to set up some kind of CI/CD pipeline using Google Cloud Build -
after successfully building a new Docker image, this new image should be automatically deployed to GKE.
When I first fired off the new pipeline, the deployment to GKE failed - the error message was something like: "Unable to connect to the server: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout".
As I suspected the "master authorized networks" option to be the root cause of the connection timeout, I added 0.0.0.0/0 to the allowed networks and started the Cloud Build job again - this time everything went well, and after the Docker image was created it was deployed to GKE. Good.
The only problem that remains is that I don't really want to allow the whole Internet to be able to access my Kubernetes master - that's a bad idea, isn't it?
Are there more elegant solutions that narrow down access using master authorized networks while still being able to deploy via Cloud Build?

It's currently not possible to add Cloud Build machines to a VPC. Similarly, Cloud Build does not announce the IP ranges of the build machines. So you can't do this today without creating an "SSH bastion instance" or a "proxy instance" on GCE within that VPC.
I suspect this will change soon. GCB existed before GKE private clusters, and private clusters are still a beta feature.

We ended up doing the following:
1) Remove the deployment step from cloudbuild.yaml
2) Install Keel inside the private cluster and give it Pub/Sub Editor privileges in the Cloud Build / registry project
Keel will monitor changes in images and deploy them automatically based on your settings.
This has worked out great: we now get SHA-tagged image updates pushed automatically, without adding VMs or setting up any kind of bastion/SSH host.
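For context, here is a minimal sketch (not from the original answer) of how a deployment can be opted into Keel-managed updates; "my-app" is a placeholder, and the exact labels/annotations should be checked against Keel's documentation:
# "force" makes Keel update the deployment whenever a new image is pushed, even for non-semver tags
kubectl label deployment my-app keel.sh/policy=force
# Optionally also poll the registry in case a Pub/Sub event is missed
kubectl label deployment my-app keel.sh/trigger=poll
kubectl annotate deployment my-app keel.sh/pollSchedule="@every 10m"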

Updated answer (02/22/2021)
Unfortunately, while the method below works, IAP tunnels seem to suffer from rate limiting. If a lot of resources are deployed via kubectl, the tunnel times out after a while. I had to use another trick, which is to dynamically whitelist the Cloud Build IP address via Terraform and then apply directly, which works every time.
Original answer
It is also possible to create an IAP tunnel inside a Cloud Build step:
- id: kubectl-proxy
  name: gcr.io/cloud-builders/docker
  entrypoint: sh
  args:
  - -c
  - docker run -d --net cloudbuild --name kubectl-proxy
      gcr.io/cloud-builders/gcloud compute start-iap-tunnel
      bastion-instance 8080 --local-host-port 0.0.0.0:8080 --zone us-east1-b &&
    sleep 5
This step starts a background Docker container named kubectl-proxy on the cloudbuild network, which is used by all of the other Cloud Build steps. The container establishes an IAP tunnel using the Cloud Build service account identity. The tunnel connects to a GCE instance with a SOCKS or HTTPS proxy pre-installed on it (an exercise left to the reader).
Inside subsequent steps, you can then access the cluster simply as
- id: setup-k8s
  name: gcr.io/cloud-builders/kubectl
  entrypoint: sh
  args:
  - -c
  - HTTPS_PROXY=socks5://kubectl-proxy:8080 kubectl apply -f config.yml
The main advantages of this approach compared to the others suggested above:
- No need to have a "bastion" host with a public IP - the kubectl-proxy host can be entirely private, thus maintaining the privacy of the cluster
- The tunnel connection relies on default Google credentials available to Cloud Build, and as such there's no need to store/pass any long-term credentials like an SSH key

I got Cloud Build working with my private GKE cluster by following this Google document:
https://cloud.google.com/architecture/accessing-private-gke-clusters-with-cloud-build-private-pools
This allows me to use Cloud Build and Terraform to manage a GKE cluster with authorized network access to the control plane enabled. I considered trying to maintain a ridiculous whitelist, but that would ultimately defeat the purpose of using authorized network access control to begin with.
I would note that Cloud Build private pools are generally slower than non-private pools, due to the serverless nature of private pools. I have not experienced the rate limiting that others have mentioned.
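For orientation, a rough sketch of the commands involved (pool, project, region, and network names are placeholders; the linked document covers the full VPC peering and routing setup):
# Create a private worker pool peered with the VPC that can reach the GKE control plane
gcloud builds worker-pools create my-private-pool \
  --region=us-east1 \
  --peered-network=projects/MY_PROJECT/global/networks/my-vpc
# Run the build on that pool instead of the default shared pool
gcloud builds submit --config=cloudbuild.yaml --region=us-east1 \
  --worker-pool=projects/MY_PROJECT/locations/us-east1/workerPools/my-private-pool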

Our workaround was to add steps in the CI/CD pipeline to whitelist Cloud Build's IP via master authorized networks.
Note: an additional role is needed for the Cloud Build service account:
Kubernetes Engine Cluster Admin
In cloudbuild.yaml, add the whitelist step before the deployment step(s).
This step fetches Cloud Build's external IP and then updates the cluster's authorized networks:
# Authorize Cloud Build to Access the Private Cluster (Enable Control Plane Authorized Networks)
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Authorize Cloud Build'
  entrypoint: 'bash'
  args:
    - -c
    - |
      apt-get install dnsutils -y &&
      cloudbuild_external_ip=$(dig @resolver4.opendns.com myip.opendns.com +short) &&
      gcloud container clusters update my-private-cluster --zone=$_ZONE --enable-master-authorized-networks --master-authorized-networks $cloudbuild_external_ip/32 &&
      echo $cloudbuild_external_ip
Since the Cloud Build worker has been whitelisted, deployments will proceed without the i/o timeout error.
This removes the complexity of setting up a VPN / private worker pools.
Disable the control plane authorized networks after the deployment:
# Disable Control Plane Authorized Networks after Deployment
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Disable Authorized Networks'
  entrypoint: 'gcloud'
  args:
    - 'container'
    - 'clusters'
    - 'update'
    - 'my-private-cluster'
    - '--zone=$_ZONE'
    - '--no-enable-master-authorized-networks'
This approach works well even in cross-project / cross-environment deployments.

Update: I suppose this won't work with production strength for the same reason as @dinvlad's update above, i.e., rate limiting in IAP. I'll leave my original post here because it does solve the network connectivity problem and illustrates the underlying networking mechanism.
Furthermore, even if we don't use it for Cloud Build, my method provides a way to tunnel from my laptop to a K8s private master node. Therefore, I can edit K8s YAML files on my laptop (e.g., using VS Code) and immediately execute kubectl from my laptop, rather than having to ship the code to a bastion host and execute kubectl inside it. I find this a big boost to development-time productivity.
Original answer
================
I think I might have an improvement to the great solution provided by @dinvlad above.
I think the solution can be simplified without installing an HTTP proxy server. A bastion host is still needed.
I offer the following Proof of Concept (without HTTP Proxy Server). This PoC illustrates the underlying networking mechanism without involving the distraction of Google Cloud Build (GCB). (When I have time in the future, I'll test out the full implementation on Google Cloud Build.)
Suppose:
- I have a GKE cluster whose master node is private, e.g., having an IP address 10.x.x.x.
- I have a bastion Compute Engine instance named my-bastion. It has only a private IP, not an external IP. The private IP is within the master authorized networks CIDR of the GKE cluster. Therefore, from within my-bastion, kubectl works against the private GKE master node. Because my-bastion doesn't have an external IP, my home laptop connects to it through IAP.
- My laptop at home, with my home internet's public IP address, doesn't readily have connectivity to the private GKE master node above.
The goal is for me to execute kubectl on my laptop against that private GKE cluster. From a network architecture perspective, my home laptop's position is like that of the Google Cloud Build server.
Theory: Knowing that gcloud compute ssh (and the associated IAP) is a wrapper for SSH, the SSH Dynamic Port Forwarding should achieve that goal for us.
Practice:
## On laptop:
LAPTOP~$ kubectl get ns
^C <<<=== Without setting anything up, this hangs (no connectivity to GKE).
## Set up SSH Dynamic Port Forwarding (SOCKS proxy) from laptop's port 8443 to my-bastion.
LAPTOP~$ gcloud compute ssh my-bastion --ssh-flag="-ND 8443" --tunnel-through-iap
In another terminal of my laptop:
## Without using the SOCKS proxy, this returns my laptop's home public IP:
LAPTOP~$ curl https://checkip.amazonaws.com
199.xxx.xxx.xxx
## Using the proxy, the same curl command above now returns a different IP address,
## i.e., the IP of my-bastion.
## Note: Although my-bastion doesn't have an external IP, I have a GCP Cloud NAT
## for its subnet (for purpose unrelated to GKE or tunneling).
## Anyway, this NAT is handy as a demonstration for our curl command here.
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 curl -v --insecure https://checkip.amazonaws.com
* Uses proxy env variable HTTPS_PROXY == 'socks5://127.0.0.1:8443' <<<=== Confirming it's using the proxy
...
* SOCKS5 communication to checkip.amazonaws.com:443
...
* TLSv1.2 (IN), TLS handshake, Finished (20): <<<==== successful SSL handshake
...
> GET / HTTP/1.1
> Host: checkip.amazonaws.com
> User-Agent: curl/7.68.0
> Accept: */*
...
< Connection: keep-alive
<
34.xxx.xxx.xxx <<<=== Returns the GCP Cloud NAT'ed IP address for my-bastion
Finally, the moment of truth for kubectl:
## On laptop:
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 kubectl --insecure-skip-tls-verify=true get ns
NAME          STATUS   AGE
default       Active   3d10h
kube-system   Active   3d10h
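As a side note, the laptop's kubeconfig still needs an entry pointing at the cluster's private control-plane endpoint for kubectl to work through the tunnel. A hedged sketch (cluster name and zone are placeholders):
# Writes the private (10.x.x.x) endpoint into kubeconfig instead of a public one,
# so kubectl traffic goes through the SOCKS tunnel to the private master.
gcloud container clusters get-credentials my-private-cluster --zone us-east1-b --internal-ip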

It is now possible to create a pool of VMs that are connected to your private VPC and can be accessed from Cloud Build - see the private pools Quickstart.

Related

I am not able to expose a service in a Kubernetes cluster to the internet

I have created a simple hello world service in my Kubernetes cluster. I am not using any cloud provider and have created the cluster from scratch on a plain Ubuntu 16.04 server.
I am able to access the service inside the cluster but now when I want to expose it to the internet, it does not work.
Here is the yml file - deployment.yml
And this is the result of the command - kubectl get all:
Now when I am trying to access the external IP with the port in my browser, i.e., 172.31.8.110:8080, it does not work.
NOTE: I also tried the NodePort Service Type, but then it does not provide any external IP to me. The state remains pending under the "External IP" tab when I do "kubectl get services".
How do I resolve this?
I believe you might have a mix of networking problems tied together.
First of all, 172.31.8.110 belongs to a private network and is not routable over the Internet. So make sure that the location you are browsing from can reach the destination (i.e. is on the same private network).
As a quick test you can make an ssh connection to your master node and then check if you can open the page:
curl 172.31.8.110:8080
In order to expose it to the Internet, you need to use a public IP for your master node, not the internal one. Then update your Service's externalIPs accordingly.
Also make sure that your firewall allows network connections from public Internet to 8080 on master node.
In any case, I suggest that you use this configuration for testing purposes only, as it is generally a bad idea to use the master node for service exposure: it puts extra network load on the master and widens the security surface. Use something like an Ingress controller (Nginx or another) plus an Ingress resource instead.
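As an aside (not part of the original answer), if you go the NodePort route that the question mentions, note that a NodePort Service never gets an external IP of its own; it is reached on the node's own IP at the allocated port. A rough sketch, assuming the deployment is named hello-world and listens on port 8080:
# Expose the deployment on a NodePort (Kubernetes allocates a port in the 30000-32767 range by default)
kubectl expose deployment hello-world --type=NodePort --port=8080
# Look up the allocated node port
kubectl get svc hello-world -o jsonpath='{.spec.ports[0].nodePort}'
# Reach it via the node's (public) IP, provided the firewall allows that port
curl http://<node-public-ip>:<node-port>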
One option is also to do SSH local port forwarding.
ssh -L <local-port>:<private-ip-on-your-server>:<remote-port> <ip-of-your-server>
So in your case for example:
ssh -L 8888:172.31.8.110:8080 <ip-of-your-ubuntu-server>
Then you can simply open http://localhost:8888 in your browser to access the site (no proxy configuration is needed for a plain -L port forward).

Is it possible to perform ssh to a VM within a pod?

I have a pod inside a Kubernetes cluster on GKE that remotely creates a Kubernetes cluster on Azure, and I want to ssh into the master VM of the Azure cluster from the pod so I can remotely run some commands on it. However, I encounter a timeout whenever I run ssh / scp inside the pod:
ssh: connect to host port 22: Connection timed out
I already installed OpenSSH client/server in my pod. I ensured that the VM has a public IP address and that the pod also has access to the private key of the VM. I tried to ssh into the Azure master VM from my laptop and it works just fine. Any ideas?
If you are running a private cluster in GKE, check their docs:
it says:
Private nodes do not have outbound Internet access because they don't have external IP addresses. Private Google Access provides private nodes and their workloads with limited outbound access to Google Cloud Platform APIs and services over Google's private network. For example, Private Google Access makes it possible for private nodes to pull container images from Google Container Registry, and to send logs to Stackdriver.
Check this other question => Kubernetes: Connect to the outside world from pod
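If missing outbound access is indeed the cause, a common remedy (a hedged sketch, not from the original answer; router and NAT names and the region are placeholders) is to give private nodes egress through Cloud NAT:
# Create a Cloud Router in the cluster's region and attach a NAT configuration to it,
# so private GKE nodes and their pods get outbound internet access (e.g. to reach port 22 on the Azure VM).
gcloud compute routers create nat-router --network=default --region=us-central1
gcloud compute routers nats create nat-config \
  --router=nat-router --region=us-central1 \
  --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges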
Follow the steps below (a short sketch of the commands follows the list):
1) Deploy a test pod that has the ssh binary in the Azure cluster.
2) Update the SSH certificates on the cluster nodes (ignore if you already have certs).
3) Copy the SSH certs into the test pod using the kubectl cp command.
4) Get into the test pod and ssh to any of the cluster nodes.
5) You should be able to run commands on the cluster node.
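A minimal sketch of steps 3-4 (the pod name, key path, user, and node IP are all placeholders):
# Copy the private key into the test pod and fix its permissions
kubectl cp ~/.ssh/id_rsa test-pod:/root/.ssh/id_rsa
kubectl exec -it test-pod -- chmod 600 /root/.ssh/id_rsa
# SSH from inside the pod to one of the cluster nodes
kubectl exec -it test-pod -- ssh -i /root/.ssh/id_rsa azureuser@<node-ip>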

Fail to connect the GKE with GCE on the same VPC?

I am new to Google Cloud Platform and the following context:
I have a Compute Engine VM running as a MongoDB server and a Compute Engine VM running as a NodeJS server already with Docker. Then the NodeJS application connects to Mongo via the default VPC internal IP. Now, I'm trying to migrate the NodeJS application to Google Kubernetes Engine, but I can't connect to the MongoDB server when I deploy the NodeJS application Docker image to the cluster.
All services, both GCE and GKE, are in the same region (us-east1).
As a hard test, I accessed a Kubernetes cluster node via SSH, deployed a simple MongoDB Docker image, and tried to connect to the remote MongoDB server from the command line, but the problem is the same: a timeout when trying to connect.
I have also checked the firewall settings on GCP as well as the bindIp setting on the MongoDB server, and nothing is blocking there.
Does anyone know what may be happening? Thank you very much.
In my case, traffic from GKE to the GCE VM was blocked by the Google firewall even though both are in the same network (default).
I had to whitelist the cluster pod network listed in the cluster details:
Pod address range 10.8.0.0/14
https://console.cloud.google.com/kubernetes/list
https://console.cloud.google.com/networking/firewalls/list
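For reference, a hedged gcloud equivalent of that console change (the rule name and MongoDB port are assumptions; the source range is the pod address range from the cluster details page):
# Allow traffic from the GKE pod range to the MongoDB VM on the default network
gcloud compute firewall-rules create allow-gke-pods-to-mongo \
  --network=default \
  --source-ranges=10.8.0.0/14 \
  --allow=tcp:27017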
By default, containers in a GKE cluster should be able to access GCE VMs of the same VPC through internal IPs. Just as you can access the internet (e.g., google.com) from GKE containers, GKE and the VPC know how to route the traffic. The problem must be with another configuration (firewall or your application).
You can do a test, start a simple HTTP server in the GCE VM, say the internal IP is 10.138.0.5:
python -m SimpleHTTPServer 8080
then create a GKE container and try to access the service:
kubectl run my-client -it --image=tutum/curl --generator=run-pod/v1 -- curl http://10.138.0.5:8080

Node to Pod communication doesn't work on GCP by default

I am doing the CKAD (Certified Kubernetes Application Developer) 2019 using GCP (Google Cloud Platform), and I am facing timeout issues when trying to curl a pod from another node. I set up a simple Pod with a simple Service.
It looks like the firewall is blocking something (IP/port/protocol), but I cannot find any documentation.
Any ideas?
So after some heavy investigation with tshark and the Google firewall, I was able to unblock myself.
If you add a new firewall rule to GCP allowing the ipip protocol for your node networks (in my case 10.128.0.0/9), the curl works!
sources: https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
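For reference, a hedged example of such a rule (the rule name is made up; the source range is the node network from the answer; IPIP is IP protocol 4, so use the number if your gcloud version doesn't accept the name):
gcloud compute firewall-rules create allow-ipip-node-to-node \
  --network=default \
  --source-ranges=10.128.0.0/9 \
  --allow=ipip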
Alternatively, you can create a NodePort service and use the command below to set a firewall rule.
gcloud compute firewall-rules create test-node-port --allow tcp:[NODE_PORT]
Then you can access the service even from outside the cluster.

Azure Container Service with Kubernetes - Containers not able to reach Internet

I created an ACS (Azure Container Service) cluster using Kubernetes by following this link: https://learn.microsoft.com/en-us/azure/container-service/container-service-kubernetes-windows-walkthrough and deployed my .NET 4.5 app by following this link: https://learn.microsoft.com/en-us/azure/container-service/container-service-kubernetes-ui . My app needs to access Azure SQL and other resources that are part of some other resource groups in my account, but my container is not able to make any outbound network calls - both inside Azure and to the internet. I opened some ports to allow outbound connections, but that is not helping either.
When I create an ACS cluster, does it come with a gateway or should I create one? How can I configure ACS so that it allows outbound network calls?
Thanks,
Ashok.
Outbound internet access works from an Azure Container Service (ACS) Kubernetes Windows cluster if you are connecting to IP addresses outside the range 10.0.0.0/16 (that is, you are not connecting to another service on your VNET).
Before Feb 22,2017 there was a bug where Internet access was not available.
Please try the latest deployment from ACS-Engine: https://github.com/Azure/acs-engine/blob/master/docs/kubernetes.windows.md, and open an issue there if you still see this, and we (Azure Container Service) can help you debug.
For communication with services running inside the cluster, you can use kube-dns, which allows you to access a service by its name. You can find more details at https://kubernetes.io/docs/admin/dns/
For external communication (internet), there is no need to create any gateway. By default, your containers inside a pod can make outbound connections. To verify this, you can run PowerShell in one of your containers and try to run:
wget http://www.google.com -OutFile testping.txt
Get-Content testping.txt
and see if it works.
To run powershell, ssh to your master node - instructions here
kubectl exec -it <pod_name> -- powershell