Getting ZooKeeper to run on Google's Compute Engine using external IPs - apache-zookeeper

I have been trying to set up a ZooKeeper cluster on Google Compute Engine and have run into some issues when using the external IPs of the machines. My cluster consists of 3 nodes, each on its own separate GCE instance.
When I configure each node to use the external IP of its instance, they seem unable to communicate with each other.
zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=externalIp1:2888:3888
server.2=externalIp2:2888:3888
server.3=externalIp3:2888:3888
If I configure them with their internal IPs, however, everything works perfectly fine. My guess is that when ZooKeeper starts up, it binds itself to the internal IP of the instance regardless of the configuration. Because of this, when each node tries to look for the other two using the external IPs it was configured with, it's unable to find them.
So my question is: is there any way to make ZooKeeper use the external IP of the machine instead of the internal one? I'm relatively new to the Google Cloud Platform and to setting up hardware in general, so I'm not really sure whether something like IP forwarding, firewall rules, or something else would achieve what I'm trying to do (assuming it's even possible).

According to the ZooKeeper 3.4.5 docs, you need to specify the following option:
clientPortAddress
New in 3.3.0: the address (ipv4, ipv6 or hostname) to listen for client connections; that is, the address that clients attempt to connect to. This is optional, by default we bind in such a way that any connection to the clientPort for any address/interface/nic on the server will be accepted.
That said, it appears that by default it binds to all available IPs on the server, so in theory it should have worked as you set it up.
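As a rough sketch (placeholder addresses), the relevant zoo.cfg might look like this. quorumListenOnAllIPs is a separate option from the same admin guide (available in the more recent 3.4.x releases, if I recall correctly) that makes the quorum/election ports listen on all interfaces, which can matter on GCE because the external IP is provided by one-to-one NAT and never appears on a local interface:
# Sketch only -- the server addresses are placeholders.
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
# Bind the client port on all interfaces (this is also the default behaviour):
clientPortAddress=0.0.0.0
# Make the quorum/election ports (2888/3888) listen on all interfaces as well:
quorumListenOnAllIPs=true
initLimit=5
syncLimit=2
server.1=externalIp1:2888:3888
server.2=externalIp2:2888:3888
server.3=externalIp3:2888:3888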
Important note: if the ZooKeeper instances talk to each other using external IPs rather than internal IPs, you will be charged for data egress, whereas if all communication is over the internal network (using internal IPs) within the same zone, you won't.

Related

Using a static IP for both ingress/egress in Kubernetes

I have a program which I'm trying to run in a Kubernetes cluster.
The program is a server that speaks a non-standard UDP-based protocol.
The protocol mostly consists of short request/reply pairs, similar to DNS.
One major difference from DNS is that both the "server" and the "clients" can send requests, i.e. the communication can be initiated by either party.
The clients are embedded devices configured with the server's IP address.
The clients send their requests to this IP.
They also check that incoming messages originate from this IP, discarding messages from other IPs.
My question is how I can use Kubernetes to set up the server such that
The server accepts incoming UDP messages on a specific IP.
Real client source IPs are seen by the server.
Any replies (or other messages) the server sends have that same IP as their source (so that the clients will accept them).
One thing I have tried that doesn't work is to set up a Service with type: LoadBalancer and externalTrafficPolicy: Local (the latter to preserve source IPs for requirement 2).
This setup fulfills requirements 1 and 2 above, but since outbound messages don't pass through the load balancer, their source IP is that of whatever node the pod containing the server is running on.
I'm running Kubernetes on Google Cloud Platform (GKE).
Please verify the solution described in the Kubernetes documentation under "Source IP for Services with Type=LoadBalancer":
- expose the deployment with --type=LoadBalancer
- set service.spec.externalTrafficPolicy to Local, i.e. patch the service with '{"spec":{"externalTrafficPolicy":"Local"}}' (see the sketch below)
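A minimal command-line sketch of those two steps (the deployment and service names are placeholders taken from that source-IP example):
# Expose the deployment behind a cloud load balancer (placeholder names and ports):
kubectl expose deployment source-ip-app --name=loadbalancer --port=80 --target-port=8080 --type=LoadBalancer
# Preserve the client source IP by routing only to endpoints on the node that received the traffic:
kubectl patch svc loadbalancer -p '{"spec":{"externalTrafficPolicy":"Local"}}'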
Using the "echoserver" image as described in that example, this returns my public address.

Change Kubernetes Instance Template to open HTTPS port

I was using NodePort to host a webapp on Google Container Engine (GKE). It allows you to point your domains directly at the node IP address, instead of an expensive Google load balancer. Unfortunately, instances are created with HTTP ports blocked by default, and an update locked down manually changing the nodes, as they are now created using an Instance Group and an immutable Instance Template.
I need to open port 443 on my nodes, how do I do that with Kubernetes or GCE? Preferably in an update resistant way.
Related github question: https://github.com/nginxinc/kubernetes-ingress/issues/502
Using port 443 on your Kubernetes nodes is not a standard practice. If you look at the docs, you can see the kube-apiserver option --service-node-port-range, which defaults to 30000-32767. You could change it to 443-32767 or something, but note that every port under 1024 is restricted to root.
In summary, it's not a good idea/practice to run your Kubernetes services on port 443. A more typical scenario would be an external nginx/haproxy proxy that sends traffic to the NodePorts of your service. The other option you mentioned is using a cloud load balancer but you'd like to avoid that due to costs.
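To illustrate that more typical setup, a hedged haproxy sketch (the node addresses and the 30443 NodePort are placeholders) that accepts traffic on 443 and forwards it to the service's NodePort on each node:
# Hypothetical external proxy in front of the cluster nodes.
frontend https-in
    bind *:443
    mode tcp
    default_backend k8s-nodeport
backend k8s-nodeport
    mode tcp
    balance roundrobin
    server node1 10.128.0.2:30443 check
    server node2 10.128.0.3:30443 check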
Update: a DaemonSet with a NodePort can handle the port opening for you. nginx/k8s-ingress has a NodePort on 443 which gets exposed by a custom firewall rule; the GCE UI will not show "Allow HTTPS traffic" as checked, because it's not using the default rule.
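That custom firewall rule can be created with a single gcloud command; a sketch, where the rule name and target tag are placeholders for whatever your node pool uses:
# Allow inbound TCP 443 to nodes carrying the https-server tag (placeholder names):
gcloud compute firewall-rules create allow-https-nodes --allow tcp:443 --target-tags https-server --source-ranges 0.0.0.0/0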
You can do everything you can do in the Google Cloud Console GUI using the Cloud SDK, most easily through the Google Cloud Shell. Here is the command for adding a network tag to a running instance. This works even though the GUI has disabled the ability to do so:
gcloud compute instances add-tags gke-clusty-pool-0-7696af58-52nf --zone=us-central1-b --tags https-server,http-server
This also works on the beta, meaning it should continue to work for a bit.
See https://cloud.google.com/sdk/docs/scripting-gcloud for examples on how to automate this. Perhaps consider running on a webhook when downtime is detected. Obviously none of this is ideal.
Alternatively, you can change the templates themselves. With this method you can also add a startup script to new nodes, which allows you to do things like fire a webhook with the new IP address for round-robin, low-downtime dynamic DNS (see the sketch after the source link below).
Source (he had the opposite problem, his problem is our solution): https://stackoverflow.com/a/51866195/370238
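A hedged sketch of creating such a template with the tags and a startup script baked in (the template name and the register-ip.sh script are hypothetical):
# New nodes built from this template come up with the firewall tags applied
# and run the startup script, e.g. to report their fresh external IP somewhere:
gcloud compute instance-templates create my-node-template --tags=https-server,http-server --metadata-from-file=startup-script=register-ip.sh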
If I understand correctly, if nodes can be destroyed and recreated at any time, how can you be sure that a service behind a given port stays reliably available in production without some sort of load balancer taking care of routing and diverting port traffic to the new node(s)?

Assign external ip to kubernetes pod

Context:
We're working on an integration with one of our clients
In order to get access to their systems, we need to establish a VPN connection
For security reasons, we need to bind this VPN connection to a static IP on our side (basically, a layer-4 security check enforced by a Juniper router; we use OpenSwan to connect to it).
To do that, we must be connecting from that IP; that is, we need to establish a socket connection where the source IP corresponds to that static IP from the router's perspective (and, of course, that needs to route back to our pod successfully)
Client's side has very limited resources ops-wise, so this security hoop is the only way to connect to their systems
Meanwhile, our current system runs on (AWS) Kubernetes, which:
Is made of transient pods and transient nodes with shifting IPs
Can assign an ExternalIP to a Service (which, in turn, can route it to a pod); however, by default that makes no guarantees about the originator IP of the traffic initiated by that pod
For this reason, we set up an external box and assigned an Elastic IP to it as a binding for the VPN, exposing endpoints and calling our Kubernetes Services. This introduces a single point of failure: if that box goes down, so does our integration.
Question: in what ways can this be made HA within the Kubernetes world, given the constraints in the first list above?

External IP of Google Cloud Dataproc cluster changes after cluster restart

Google Cloud Dataproc has an option to stop (not delete) the cluster (master + worker nodes) and start it again, but when we do so, the external IP addresses of the master and worker nodes change, which causes problems for using Hue and other IP-based web UIs on it.
Is there any option to persist the same IP after restart?
Though Dataproc doesn't currently provide a direct option for using static IP addresses, you can use the underlying Compute Engine interfaces to add a static IP address to your master node, possibly removing the previous "ephemeral IP address".
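For example, an in-use ephemeral address can be promoted to a static one with the Compute Engine CLI; a sketch with placeholder values:
# Promote the master's current ephemeral external IP (placeholder address and region) to a static address:
gcloud compute addresses create dataproc-master-ip --addresses=203.0.113.10 --region=us-central1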
That said, if you're accessing your UIs through external IP addresses, that presumably means you also had to manage your firewall rules to carefully limit the inbound IP ranges. Depending on what UIs you're using, if they're not using HTTPS/SSL then that's still not ideal even if you have firewall rules limiting access from other external sources.
The recommended way to access your Dataproc UIs is through SSH tunnels; you can even add the gcloud compute ssh and browser-launching commands to a shell script for convenience if you don't want to re-type all the SSH flags each time. This approach would also ensure that links work in pages like the YARN ResourceManager, since those will be using GCE internal hostnames which your external IP address would not work for.
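A sketch of that tunnel approach, adapted from the Dataproc web-interfaces documentation (the cluster name, zone, and ports are placeholders): open a SOCKS proxy over SSH to the master node, then point a browser profile at it.
# SOCKS proxy over SSH to the master node:
gcloud compute ssh my-cluster-m --zone=us-central1-b -- -D 1080 -N
# In another terminal, launch a separate browser profile that routes its traffic
# through the proxy, e.g. to reach the YARN ResourceManager on port 8088:
google-chrome --proxy-server="socks5://localhost:1080" --user-data-dir=/tmp/my-cluster-m http://my-cluster-m:8088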

Akka-cluster discovering other machines in local network

I'm trying to run http://typesafe.com/activator/template/akka-distributed-workers on few machines connected to local network.
I want the host configuration to be as transparent as possible, so in my project configuration I just set linux.local (as netty.tcp.hostname and in the seed nodes), and on each machine there is an Avahi daemon that resolves linux.local to the appropriate IP address.
Should akka-cluster/akka-remote discover the other machines automatically using the gossip protocol, or will the above configuration not work, meaning I need to explicitly set the IP address on each machine, e.g. by passing it as an argument?
You need to set the hostname configuration on each machine to be an address where that machine can be contacted by the other nodes in the cluster.
So unfortunately, the configuration does need to be different on each node. One way to do this is to override the host configuration programmatically in your application code.
The seed nodes list, however, should be the same for all the nodes, and also should be the externally accessible addresses.
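As an alternative to the programmatic override, one way to supply the per-machine hostname without touching application code is to pass it as a JVM system property at startup; a sketch with placeholder addresses, ports, actor system name, and jar name:
# Run this on the machine whose reachable address is 192.168.1.11; repeat on every
# other node with that node's own address. The seed-nodes list stays identical everywhere.
java -Dakka.remote.netty.tcp.hostname=192.168.1.11 \
     -Dakka.cluster.seed-nodes.0=akka.tcp://ClusterSystem@192.168.1.11:2551 \
     -Dakka.cluster.seed-nodes.1=akka.tcp://ClusterSystem@192.168.1.12:2551 \
     -jar distributed-workers.jar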