External IP of Google Cloud Dataproc cluster changes after cluster restart - google-cloud-dataproc

Google Cloud Dataproc has an option to stop (not delete) a cluster (master and worker nodes) and start it again, but when we do so the external IP addresses of the master and worker nodes change, which causes problems for Hue and other IP-based web UIs running on it.
Is there any option to persist the same IP after restart?

Though Dataproc doesn't currently provide a direct option for using static IP addresses, you can use the underlying Compute Engine interfaces to add a static IP address to your master node, possibly removing the previous "ephemeral IP address".
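As a rough sketch of the Compute Engine route described above, the small Python helper below only builds the gcloud command lines (it does not run them). The instance name, zone, region, address name and IP are hypothetical placeholders, and the access config name "external-nat" is an assumption: check what your master node actually uses with gcloud compute instances describe before running anything.

```python
import shlex
from typing import List


def reserve_address_command(address_name: str, region: str) -> List[str]:
    """Build the gcloud call that reserves a regional static external IP."""
    return ["gcloud", "compute", "addresses", "create", address_name,
            "--region", region]


def swap_external_ip_commands(instance: str, zone: str, static_ip: str,
                              access_config: str = "external-nat") -> List[List[str]]:
    """Build the two calls that replace an ephemeral IP with a reserved one.

    The access config name is an assumption; instances created through
    other tools may use a different name.
    """
    common = ["--zone", zone, "--access-config-name", access_config]
    return [
        # 1) Drop the access config holding the ephemeral address.
        ["gcloud", "compute", "instances", "delete-access-config",
         instance, *common],
        # 2) Add a new access config bound to the reserved static address.
        ["gcloud", "compute", "instances", "add-access-config",
         instance, *common, "--address", static_ip],
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(shlex.join(reserve_address_command("dataproc-master-ip", "us-central1")))
    for cmd in swap_external_ip_commands("my-cluster-m", "us-central1-a",
                                         "203.0.113.10"):
        print(shlex.join(cmd))
```

The printed commands can be reviewed and pasted into a shell; the static IP passed to the second step is the one reported after the reservation succeeds.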
That said, if you're accessing your UIs through external IP addresses, that presumably means you also had to manage your firewall rules to carefully limit the inbound IP ranges. Depending on what UIs you're using, if they're not using HTTPS/SSL then that's still not ideal even if you have firewall rules limiting access from other external sources.
The recommended way to access your Dataproc UIs is through SSH tunnels; you can even add the gcloud compute ssh and browser-launching commands to a shell script for convenience if you don't want to re-type all the SSH flags each time. This approach would also ensure that links work in pages like the YARN ResourceManager, since those will be using GCE internal hostnames which your external IP address would not work for.
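A minimal sketch of such a convenience script, written in Python rather than shell. It assumes a SOCKS proxy on local port 1080, a Chrome binary named google-chrome on the PATH, and the YARN ResourceManager on port 8088; the master hostname and zone are hypothetical placeholders. It only builds the two command lines so you can inspect them before running.

```python
import shlex
from typing import List


def tunnel_command(master: str, zone: str, socks_port: int = 1080) -> List[str]:
    """gcloud compute ssh call that opens a SOCKS proxy (-D) without
    running a remote command (-N). Everything after "--" goes to ssh."""
    return ["gcloud", "compute", "ssh", master, "--zone", zone,
            "--", "-D", str(socks_port), "-N"]


def browser_command(master: str, socks_port: int = 1080, ui_port: int = 8088,
                    browser: str = "google-chrome") -> List[str]:
    """Browser launch that sends traffic through the SOCKS proxy, so the
    internal hostname of the master resolves on the cluster side."""
    return [browser,
            f"--proxy-server=socks5://localhost:{socks_port}",
            # Separate profile directory so the proxy setting does not
            # leak into your normal browsing session.
            f"--user-data-dir=/tmp/{master}",
            f"http://{master}:{ui_port}"]


if __name__ == "__main__":
    # Hypothetical cluster name and zone.
    print(shlex.join(tunnel_command("my-cluster-m", "us-central1-a")))
    print(shlex.join(browser_command("my-cluster-m")))
```

Run the first command in one terminal and leave it open, then run the second. Because the browser addresses the master by its internal hostname, links inside the UIs keep working, and a changed external IP no longer matters.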

Related

How to assign a single static source IP address for all pods of a service or deployment in kubernetes?

Consider a microservice X which is containerized and deployed in a kubernetes cluster. X communicates with a Payment Gateway PG. However, the payment gateway requires a static IP for services contacting it as it maintains a whitelist of IP addresses which are authorized to access the payment gateway. One way for X to contact PG is through a third party proxy server like QuotaGuard which will provide a static IP address to service X which can be whitelisted by the Payment Gateway.
However, is there an inbuilt mechanism in kubernetes which can enable a service deployed in a kube-cluster to obtain a static IP address?
There's no mechanism in Kubernetes for this yet.
Other possible solutions:
If the nodes of the cluster are in a private network behind a NAT, then just add your network's default gateway to the PG's whitelist.
If the whitelist can accept a CIDR in addition to single IPs (86.34.0.0/24, for example), then add your cluster's network CIDR to the whitelist.
If every node of the cluster has a public IP and you can't add a CIDR to the whitelist, then it gets more complicated:
A naive way would be to add every node's IP to the whitelist, but that doesn't scale beyond tiny clusters with just a few nodes.
If you have administrative access to your network, then even though the nodes have public IPs, you can still set up a NAT for the network that targets only packets with the PG's IP as their destination.
If you don't have administrative access to the network, then another way is to allocate a machine with a static IP somewhere and make it act as a proxy using iptables NAT, similarly to the above. This introduces a single point of failure, though. To make it highly available, you could deploy it on a Kubernetes cluster with a few (2-3) replicas (this can be the same cluster where X is running: see below). Instead of using their node's IP to communicate with PG, the replicas would share a VIP using keepalived, and that VIP would be added to PG's whitelist. (You can have a look at easy-keepalived and either try to use it directly or learn from how it does things.) This requires high privileges on the cluster: you need to be able to grant the pods of your proxy the NET_ADMIN and NET_RAW capabilities so that they can add iptables rules and set up a VIP.
update:
While waiting for builds and deployments during last few days, I've polished my old VIP-iptables scripts that I used to use as a replacement for external load-balancers on bare-metal clusters, so now they can be used as well to provide egress VIP as described in the last point of my original answer. You can give them a try: https://github.com/morgwai/kevip
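To make the CIDR whitelist option above concrete, here is a small sketch using Python's standard ipaddress module. It checks whether a source address would match a whitelist that mixes single IPs and CIDR ranges. The function name and the sample addresses are illustrative only; how the payment gateway actually evaluates its whitelist is not known here.

```python
import ipaddress
from typing import Iterable


def is_whitelisted(source_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if source_ip matches any whitelist entry.

    Entries may be single addresses ("198.51.100.7") or CIDR ranges
    ("86.34.0.0/24"); a single address is treated as a /32 network.
    """
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(entry) for entry in whitelist)


if __name__ == "__main__":
    whitelist = ["86.34.0.0/24", "198.51.100.7"]
    for ip in ("86.34.0.17", "86.34.1.17", "198.51.100.7"):
        print(ip, is_whitelisted(ip, whitelist))
```

If all egress addresses of the cluster fall inside one range, a single CIDR entry covers every current and future node, which is why this option scales better than listing node IPs one by one.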
There are two answers to this question. For the pod IP itself, it depends on your CNI plugin; some allow it with special pod annotations. However, most CNI plugins also involve a NAT when talking to the internet, so the pod IP being static on the internal network is kind of moot: what you care about is the public IP the connection ends up coming from. So the second answer is "it depends on how your node networking and NAT are set up". This is usually up to the tool you used to deploy Kubernetes (or OpenShift in your case, I guess). With Kops it's pretty easy to tweak the VPC routing table.

Changing Kubernetes cluster IP to internal IP

I have created a Kubernetes cluster in Google Cloud. I have done it a few months ago and configured the cluster to have external IP address limited with authorized networks.
I want to change the cluster IP to internal IP. Is this possible without re-creating the cluster?
As documented here, you currently "cannot convert an existing, non-private cluster to a private cluster."
Having said that, you'll need to create a new private cluster from scratch, which will have both an external IP and an internal IP. However, you'll be able to disable access to the external IP or restrict access to it as per your needs. Have a look here for the different settings available.

How to make cluster nodes private on Google Kubernetes Engine?

I noticed every node in a cluster has an external IP assigned to it. That seems to be the default behavior of Google Kubernetes Engine.
I thought the nodes in my cluster should be reachable from the local network only (through its virtual IPs), but I could even connect directly to a mongo server running on a pod from my home computer just by connecting to its hosting node (without using a LoadBalancer).
I tried to make Container Engine not to assign external IPs to newly created nodes by changing the cluster instance template settings (changing property "External IP" from "Ephemeral" to "None"). But after I did that GCE was not able to start any pods (Got "Does not have minimum availability" error). The new instances did not even show in the list of nodes in my cluster.
After switching back to the default instance template with external IP everything went fine again. So it seems for some reason Google Kubernetes Engine requires cluster nodes to be public.
Could you explain why is that and whether there is a way to prevent GKE exposing cluster nodes to the Internet? Should I set up a firewall? What rules should I use (since nodes are dynamically created)?
I think Google not allowing private nodes is kind of a security issue... Suppose someone discovers a security hole on a database management system. We'd feel much more comfortable to work on fixing that (applying patches, upgrading versions) if our database nodes are not exposed to the Internet.
GKE recently added a new feature allowing you to create private clusters, which are clusters where nodes do not have public IP addresses.
This is how GKE is designed and there is no way around it that I am aware of. There is no harm in running kubernetes nodes with public IPs, and if these are the IPs used for communication between nodes you can not avoid it.
As for your security concern, if you run that example DB on Kubernetes, it would not be accessible even with public node IPs, as it would only be reachable on the internal pod-to-pod network, not on the nodes themselves.
As described in this article, you can use network tags to identify which GCE VMs or GKE clusters are subject to certain firewall rules and network routes.
For example, if you've created a firewall rule to allow traffic to port 27017, 27018, 27019, which are the default TCP ports used by MongoDB, give the desired instances a tag and then use that tag to apply the firewall rule that allows those ports access to those instances.
It is also possible to create a GKE cluster that applies GCE tags to all nodes in the new node pool, so the tags can be used in firewall rules to allow or deny traffic to the nodes. This is described in this article under the --tags flag.
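A hedged sketch of the tag-based approach from the last few paragraphs: the Python helpers below only assemble gcloud command lines for a firewall rule that opens the MongoDB ports to tagged instances, and for creating a cluster whose nodes carry that tag. The rule name, network, tag, cluster name and source range are hypothetical placeholders; review the printed commands before running them.

```python
import shlex
from typing import List


def mongo_firewall_rule(rule_name: str, network: str, tag: str,
                        source_range: str) -> List[str]:
    """Ingress rule allowing TCP 27017-27019 from source_range, applied
    only to instances that carry the given network tag."""
    return ["gcloud", "compute", "firewall-rules", "create", rule_name,
            "--network", network,
            "--direction", "INGRESS",
            "--allow", "tcp:27017-27019",
            "--target-tags", tag,
            "--source-ranges", source_range]


def tagged_cluster(cluster_name: str, tag: str) -> List[str]:
    """Cluster creation call that puts the tag on every node."""
    return ["gcloud", "container", "clusters", "create", cluster_name,
            "--tags", tag]


if __name__ == "__main__":
    # Hypothetical names; 10.128.0.0/9 stands in for an internal range.
    print(shlex.join(tagged_cluster("my-cluster", "mongo-node")))
    print(shlex.join(mongo_firewall_rule("allow-mongo-internal", "default",
                                         "mongo-node", "10.128.0.0/9")))
```

Keeping the source range internal means the MongoDB ports stay closed to the Internet even though the nodes have external IPs.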
The Kubernetes master runs outside your network and needs to access your nodes. This could be the reason for having public IPs.
When you create your cluster, there are some firewall rules created automatically. These are required by the cluster, and there's e.g. ingress from master and traffic between the cluster nodes.
The 'default' network in GCP comes with ready-made firewall rules. These allow all SSH and RDP traffic from the Internet and allow pinging of your machines. You can remove them without affecting the cluster, and your nodes will then no longer be visible from outside.

Joining an external Node to an existing Kubernetes Cluster

I have a custom Kubernetes Cluster (deployed using kubeadm) running on Virtual Machines from an IAAS Provider. The Kubernetes Nodes have no Internet-facing IP Addresses (except for the Master Node, which I also use for Ingress).
I'm now trying to join a Machine to this Cluster that is not hosted by my main IAAS provider. I want to do this because I need specialized computing resources for my application that are not offered by the IAAS.
What is the best way to do this?
Here's what I've tried already:
Run the Cluster on Internet-facing IP Addresses
I have no trouble joining the Node when I tell kube-apiserver on the Master Node to listen on 0.0.0.0 and use public IP Addresses for every Node. However, this approach is non-ideal from a security perspective and also leads to higher cost because public IP Addresses have to be leased for Nodes that normally don't need them.
Create a Tunnel to the Master Node using sshuttle
I've had moderate success by creating a tunnel from the external Machine to the Kubernetes Master Node using sshuttle, which is configured on my external Machine to route 10.0.0.0/8 through the tunnel. This works in principle, but it seems way too hacky and is also a bit unstable (sometimes the external machine can't get a route to the other nodes, I have yet to investigate this problem further).
Here are some ideas that could work, but I haven't tried yet because I don't favor these approaches:
Use a proper VPN
I could try to use a proper VPN tunnel to connect the Machine. I don't favor this solution because it would add a (admittedly quite small) overhead to the Cluster.
Use a cluster federation
It looks like kubefed was made specifically for this purpose. However, I think this is overkill in my case: I'm only trying to join a single external Machine to the Cluster. Using Kubefed would add a ton of overhead (Federation Control Plane on my Main Cluster + Single Host Kubernetes Deployment on the external machine).
I can't think of any better solution than a VPN here. Especially since you have only one isolated node, it should be relatively easy to make the handshake happen between this node and your master.
Routing the traffic from "internal" nodes to this isolated node is also trivial. Because all nodes already use the master as their default gateway, modifying the route table on the master is enough to forward the traffic from internal nodes to the isolated node through the tunnel.
You have to be careful with the configuration of your container network though. Depending on the solution you use to deploy it, you may have to assign a different subnet to the Docker bridge on the other side of the VPN.
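One way to sanity-check the subnet point above is to list every range involved and look for overlaps before bringing the tunnel up. The sketch below uses Python's standard ipaddress module; the subnet labels and values are assumptions for illustration (10.0.0.0/8 is the range mentioned in the question, 172.17.0.0/16 is the usual Docker bridge default, and the VPN range is made up).

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(subnets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of labels whose CIDR ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [(a, b) for (a, net_a), (b, net_b) in combinations(parsed.items(), 2)
            if net_a.overlaps(net_b)]


if __name__ == "__main__":
    # Hypothetical layout: cluster range, remote Docker bridge, VPN link.
    layout = {
        "cluster": "10.0.0.0/8",
        "remote-docker-bridge": "172.17.0.0/16",
        "vpn-link": "10.8.0.0/24",
    }
    for a, b in find_overlaps(layout):
        print(f"conflict: {a} overlaps {b}")
```

Any reported pair means traffic for one side could be routed to the other, so one of the two ranges should be moved before the node joins.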

Google Container Engine: assign static IP to nodes for outbound traffic

I am using Google Container Engine to launch a cluster that connects to remote services (in a different data center / provider). The containers that are connecting may not have a kubernetes service associated with them and don't need external in-bound ip addresses. However, I want to set up firewall rules on the remote machines and have a known subnet that the nodes will be within when I expand/reduce the cluster or if a node goes down and is re-built.
In looking at Google Networks they appear to be related to internal networks (e.g. 10.128.0.0, etc). The external IP lets me set up single static IP addresses but not a range and I don't see how to apply that to a node — applying to a load balancer won't change the outbound IP address.
Is there a way I can reserve a block of IP addresses for my cluster to use in my firewall rules on my remote servers? Or is there some other solution I'm missing for this kind of thing?
The proper solution for this is to use a VPN to connect the two networks. Google Cloud VPN allows you to create this on the Google side.