I have a 3-node system on which I have hosted a basic ArangoDB cluster with 3 DB servers. I am using the python-arango library, which says that to connect to the cluster we need to supply a list of IPs with port 8529 to the ArangoClient class. My Python code is running in a different pod.
Using arangosh, I can access the coordinator pods using the localhost:8529 configuration. However, with arangosh I can only access one coordinator pod at a time. Since the Arango arrangement is a cluster, I want to connect to all three coordinator pods at once in a round-robin manner. Now, as per the Kubernetes documentation, in the case of MongoDB I can supply a list of <pod_name.service_name> entries to MongoClient and it will connect. I am trying something similar with the Arango coordinator pods, but that does not seem to work. The examples in the Arango documentation only point to writing 'localhost-1', 'localhost-2', etc.
To summarize, the issue at hand is that I do not know how to identify the addresses of the Arango coordinator pods. I tried using the internal IPs of the coordinators, I tried using http://pod_name:8529, and I tried using the endpoint of the coordinator shown in the Web UI (which begins with ssl, though I replaced ssl with http), and yet had no success.
I would be grateful for any help in this context. Please let me know if additional information is needed. I have been stuck on this problem for a week now.
If you follow this: https://www.arangodb.com/2018/12/deploying-arangodb-3-4-on-kubernetes/
the YAML config files there create multiple pods. Your application should connect to ArangoDB through the Kubernetes Service.
So your application connects to the Kubernetes Service, and by default the Service handles round-robin load balancing across the pods behind it.
In the example you can see it creates three services:
my-arangodb-cluster ClusterIP 10.11.247.191 <none> 8529/TCP 46m
my-arangodb-cluster-ea LoadBalancer 10.11.241.253 35.239.220.180 8529:31194/TCP 46m
my-arangodb-cluster-int ClusterIP None <none> 8529/TCP 46m
The my-arangodb-cluster-ea service is publicly exposed, while the my-arangodb-cluster service can be accessed internally by your application.
Your flow will be something like
Application > Arango service (Round robin) > Arango Pods
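To make that concrete, here is a rough, hedged sketch of what the internal my-arangodb-cluster Service looks like; the selector labels are assumptions about how the example deployment labels its coordinator pods, so check the real object with kubectl get svc my-arangodb-cluster -o yaml:

apiVersion: v1
kind: Service
metadata:
  name: my-arangodb-cluster          # created for you by the ArangoDB deployment
  namespace: default                 # assumption; use whatever namespace you deployed into
spec:
  type: ClusterIP
  selector:                          # assumed labels; verify with: kubectl get pods --show-labels
    arango_deployment: my-arangodb-cluster
    role: coordinator
  ports:
  - name: server
    port: 8529
    targetPort: 8529
    protocol: TCP

With that in place, your Python client does not need pod IPs at all: point it at http://my-arangodb-cluster.default.svc.cluster.local:8529 (or simply http://my-arangodb-cluster:8529 from the same namespace) and each new connection is spread across the coordinator pods.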
I guess you can either connect to them through the LoadBalancer mentioned above, in which case Kubernetes will balance your requests across the coordinator pods,
or you can explicitly create a Service for each coordinator and use their addresses in your Python client configuration however you want (a sketch of one such per-coordinator Service follows).
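If you go the per-coordinator route, a minimal sketch of one such Service could look like this; the Service name and the selector label are hypothetical, so substitute a label that really does identify a single coordinator pod in your deployment (kubectl get pods --show-labels):

apiVersion: v1
kind: Service
metadata:
  name: arango-coordinator-0         # hypothetical name; create one Service per coordinator
spec:
  type: ClusterIP
  selector:
    # assumed label that uniquely matches one coordinator pod; adjust to your deployment
    coordinator-id: "0"
  ports:
  - port: 8529
    targetPort: 8529

Repeat for the other coordinators and pass the resulting DNS names (e.g. http://arango-coordinator-0:8529, http://arango-coordinator-1:8529, ...) as the host list in your Python client configuration.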
I have a cluster with 3 nodes. On each node I have a frontend application running in a Pod and a backend application running in a separate Pod.
I send data from the frontend application to the backend application; to do this I utilise a ClusterIP Service and the Kubernetes DNS.
I also have a function in my frontend where I send data to a separate service unrelated to my Kubernetes cluster. I send this data using a standard AJAX request to a URL with a payload, i.e. http://my-seperate-service-unrelated-tok8.com.
All of this works correctly and the cluster operates as I want - I have this cluster deployed to GKE.
I now want to run this cluster locally using minikube, which I have been able to do. However, when I am running locally I do not want to send data to my external service - instead I want to forward it to either a new Pod I will create, or just not send it at all.
The problem here is that I need a proxy to intercept outgoing network traffic, check whether the outgoing request is the one I am looking for, and if it is, redirect it.
I understand each node running in a cluster has a kube-proxy service running within the node, which is used to forward traffic to the relevant services in the cluster.
I would like to either extend this service or create a new proxy service where I can listen for outgoing traffic to a specific URL and redirect it.
Is this possible to do in a Kubernetes cluster? I assume there is a Service I can create to listen for all outgoing requests and redirect specific requests based on rules I set.
I wasn't sure if Kubernetes clusters have a Service already configured that I can simply add to - that's why I thought of kube-proxy. Would anyone be able to advise on this?
I wanted to add this proxy so I don't have to change my code when it's run locally in minikube or deployed to GKE.
Any help is greatly appreciated. Thanks!
I built a tool that helps you forward a service to another service, a local port, a service from another cluster, etc.
This way you can keep exactly the same URLs, ports and code, but the underlying services get "replaced" - if I understand correctly, this is what you are looking for.
Here is a quick example of a staging service being replaced with my local port 3000.
This is the repository with more info and examples: linker-tool
If you are interested, let me know if you need help or have any questions.
I'm trying to understand the concepts of ingress and ingress controllers in kubernetes. But I'm not so sure what the end product should look like. Here is what I don't fully understand:
Given I'm having a running Kubernetes cluster somewhere with a master node which runs the control plane and the etcd database. Besides that I'm having like 3 worker nodes - each of the worker nodes has a public IPv4 address with a corresponding DNS A record (worker{1,2,3}.domain.tld), and I have full control over my DNS server. I want my users to access my web application via www.domain.tld. So I point the www CNAME to one of the worker nodes (I saw that my ingress controller, for example, got scheduled to worker1, so I point it to worker1.domain.tld).
Now I schedule a workload consisting of 2 frontend pods and 1 database pod, with 1 service for the frontend and 1 service for the database. From what I understand right now, I need an ingress controller pointing to the frontend service to achieve some kind of load balancing. Two questions here:
Isn't running the ingress controller on only one worker node pointless for internally load balancing the two frontend pods via their service? Is it best practice to run an ingress controller on every worker node in the cluster?
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point will be at another IPv4 address, right? From the perspective of a user who tries to access the frontend via www.domain.tld, this DNS entry has to be updated, right? How so? Do I need to run a specific Kubernetes-aware DNS server somewhere? I don't understand the connection between the DNS server and the Kubernetes cluster.
Bonus question: if I run more ingress controller replicas (spread across multiple workers), do I use a DNS round-robin approach here, with multiple IPv4 addresses bound to one DNS entry? Or what's the best solution to achieve HA? I would rather not use load-balancing IP addresses where the workers share the same IP address.
Given I'm having a running Kubernetes cluster somewhere with a master node which runs the control plane and the etcd database. Besides that I'm having like 3 worker nodes - each of the worker nodes has a public IPv4 address with a corresponding DNS A record (worker{1,2,3}.domain.tld), and I have full control over my DNS server. I want my users to access my web application via www.domain.tld. So I point the www CNAME to one of the worker nodes (I saw that my ingress controller, for example, got scheduled to worker1, so I point it to worker1.domain.tld).
Now I schedule a workload consisting of 2 frontend pods and 1 database pod, with 1 service for the frontend and 1 service for the database. From what I understand right now, I need an ingress controller pointing to the frontend service to achieve some kind of load balancing. Two questions here:
Isn't running the ingress controller on only one worker node pointless for internally load balancing the two frontend pods via their service? Is it best practice to run an ingress controller on every worker node in the cluster?
Yes, it's a good practice. Having multiple pods for the load balancer is important to ensure high availability. For example, if you run the ingress-nginx controller, you should probably deploy it to multiple nodes.
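As a hedged sketch of that idea (names, labels and the image tag below are illustrative, not taken from an official manifest, and RBAC, args and probes are omitted): a Deployment with several replicas plus pod anti-affinity keeps the controller pods on different worker nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller       # illustrative name
  namespace: ingress-nginx
spec:
  replicas: 3                          # e.g. one replica per worker node
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
    spec:
      affinity:
        podAntiAffinity:               # never co-locate two controller replicas on one node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: ingress-nginx
            topologyKey: kubernetes.io/hostname
      containers:
      - name: controller
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.24.1   # use the version you actually deploy
        ports:
        - containerPort: 80
        - containerPort: 443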
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point will be at another IPv4 address, right? From the perspective of a user who tries to access the frontend via www.domain.tld, this DNS entry has to be updated, right? How so? Do I need to run a specific Kubernetes-aware DNS server somewhere? I don't understand the connection between the DNS server and the Kubernetes cluster.
Yes, the IP will change. And yes, this needs to be updated in your DNS server.
There are a few ways to handle this:
Assume clients will deal with outages. You can list all load balancer nodes in round-robin and assume clients will fall back. This works with some protocols, but mostly implies timeouts and problems and should generally not be used, especially since you still need to update the records by hand when Kubernetes decides to create or remove LB entries.
Configure an external DNS server automatically. This can be done with the external-dns project, which can sync against most of the popular DNS servers, including standard RFC2136 dynamic updates but also cloud providers like Amazon, Google, Azure, etc. (a sketch follows below).
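A hedged sketch of the external-dns approach: annotate the Service (or use an Ingress host) with the DNS name you want, and a small external-dns Deployment in the cluster keeps the record in sync with whatever IP the load balancer currently has. The hostname and labels below are examples only:

apiVersion: v1
kind: Service
metadata:
  name: frontend
  annotations:
    # external-dns watches this annotation and creates/updates the matching DNS record
    external-dns.alpha.kubernetes.io/hostname: www.domain.tld
spec:
  type: LoadBalancer
  selector:
    app: frontend            # assumed pod label
  ports:
  - port: 80
    targetPort: 8080         # assumed container port

external-dns itself is configured with your provider (for example --provider=rfc2136 for standard dynamic updates, or --provider=aws, --provider=google, --provider=azure) and then needs no per-record intervention.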
Bonus question: if I run more ingress controller replicas (spread across multiple workers), do I use a DNS round-robin approach here, with multiple IPv4 addresses bound to one DNS entry? Or what's the best solution to achieve HA? I would rather not use load-balancing IP addresses where the workers share the same IP address.
Yes, you should basically do DNS round-robin. I would assume external-dns would do the right thing here as well.
Another alternative is to do some sort of ECMP. This can be accomplished by having both load balancers "announce" the same IP space. That is an advanced configuration, however, which may not be necessary. There are interesting tradeoffs between BGP/ECMP and DNS updates, see this dropbox engineering post for a deeper discussion about those.
Finally, note that CoreDNS is looking at implementing public DNS records which could resolve this natively in Kubernetes, without external resources.
Isn't running the ingress controller on only one worker node pointless for internally load balancing the two frontend pods via their service? Is it best practice to run an ingress controller on every worker node in the cluster?
The number of ingress controller replicas will not affect the quality of load balancing, but for HA you can run more than one replica of the controller.
For whatever reason the worker which runs the ingress controller dies and it gets rescheduled to another worker. So the ingress point will be at another IPv4 address, right? From the perspective of a user who tries to access the frontend via www.domain.tld, this DNS entry has to be updated, right? How so? Do I need to run a specific Kubernetes-aware DNS server somewhere? I don't understand the connection between the DNS server and the Kubernetes cluster.
Right, it will be on another IPv4 address. Yes, DNS should be updated for that. There are no standard tools for this included in Kubernetes. You need to run an external DNS server and somehow manage its records yourself (with some tools or scripts).
The DNS server inside a Kubernetes cluster and your external DNS server are totally different things. The DNS server inside the cluster provides resolution only inside the cluster, for service discovery. Kubernetes does not know anything about access from external networks to the cluster, at least on bare metal. In a cloud, it can manage things like load balancers to automate external access management.
If I run more ingress controller replicas (spread across multiple workers), do I use a DNS round-robin approach here, with multiple IPv4 addresses bound to one DNS entry? Or what's the best solution to achieve HA?
DNS round-robin works in that case, but if one of the nodes goes down, your clients will have problems connecting to it, so you need to find some way to move or remove that node's IP.
The HA solution provided by @jjo is not the worst way to achieve what you want, if you can prepare an environment for it. If not, you should choose something else, but the best practice is to use a load balancer provided by the infrastructure. Whether it is based on several dedicated servers, load-balancing IPs, or something else does not matter.
The behavior you describe is actually a LoadBalancer (a Service with type=LoadBalancer in Kubernetes), which is "naturally" provided when you're running Kubernetes on top of a cloud provider.
From your description, it looks like your cluster is on bare metal (either true or virtual metal). A possible approach (that has worked for me) would be:
Deploy https://github.com/google/metallb
this is where your external IP will "live" (HA'd), via the speaker-xxx pods deployed as a DaemonSet to each worker node (a minimal config sketch follows after these steps)
depending on your external L2/L3 setup, you'll need to choose between L3 (BGP) or L2 (ARP) mode
FYI I've successfully used L2 mode + simple proxyarp at the border router
Deploy the nginx-ingress controller, with its Service as type=LoadBalancer
this will make MetalLB "land" (actually: L3 or L2 "advertise" ...) the assigned IP on the nodes
FYI I successfully tested it together with kube-router (as CNI) using --advertise-loadbalancer-ip; the effect is that e.g. <LB_IP>:80 will be redirected to the ingress-nginx Service NodePort
Point your DNS to ingress-nginx LB IP, i.e. what's shown by:
kubectl get svc --namespace=ingress-nginx ingress-nginx -ojsonpath='{.status.loadBalancer.ingress[].ip}{"\n"}'
FYI you can also quickly test it using fake DNS with http://A.B.C.D.xip.io/ (A.B.C.D being your public IP address)
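To make the MetalLB step concrete, here is a minimal L2-mode sketch in the ConfigMap format used by the MetalLB releases of that time (newer releases use CRDs instead); the address range is a placeholder and must be replaced with addresses that are free and routable on your L2 segment:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250   # placeholder range on your own network

With this pool in place, the ingress-nginx Service of type=LoadBalancer from step 2 gets one of those addresses assigned, which is exactly the IP the kubectl command above prints.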
Here is a Kubernetes DNS add-on that configures external DNS servers (AWS Route 53, Google Cloud DNS and others) for Kubernetes Ingresses and Services, handling DNS record updates for ingress load balancers. It keeps DNS records up to date according to the Ingress controller config.
How do I install the Kubernetes dashboard on an external IP address?
Is there any tutorial for this?
You can expose services and pods in several ways:
expose the internal ClusterIP service through Ingress, if you have that set up.
change the service type to use 'type: LoadBalancer', which will try to create an external load balancer.
change the service type to 'type: NodePort', which will allocate a port in the 30000-32767 range on all cluster machines (a sketch of this option follows the list below).
expose the pod directly using a hostPort in the pod spec.
If you have external IP addresses on your Kubernetes nodes, you can also expose the ports directly on the node hosts this way; however, I would avoid these options unless it's a small test cluster.
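As a hedged sketch of the NodePort option (the namespace, selector label and ports follow common dashboard deployments but may differ from yours, so check the existing kubernetes-dashboard Service first):

apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard-nodeport   # hypothetical extra Service alongside the default one
  namespace: kube-system                # assumption; newer releases use the kubernetes-dashboard namespace
spec:
  type: NodePort
  selector:
    k8s-app: kubernetes-dashboard       # assumed label on the dashboard pods
  ports:
  - port: 443
    targetPort: 8443                    # assumption; older HTTP-only dashboards listen on 9090
    nodePort: 30443                     # any free port in the 30000-32767 range

The dashboard is then reachable on https://<node-external-ip>:30443 - but as noted below, do not do this without protecting it first.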
Depending on your cluster type (kops-created, GKE, EKS, AKS and so on), different variants may not be set up. Hosted clusters typically support and recommend LoadBalancers, which they charge for, but may or may not have support for NodePort/HostPort.
Another, more important note is that you must ensure you protect the dashboard. Running an unprotected dashboard is a sure way of getting your cluster compromised; this recently happened to Tesla. A decent writeup on various ways to protect yourself was written by Joe Beda of Heptio.
Inside a Kubernetes cluster I am running 1 node with 2 deployments: a React front end and a .NET Core app. I also have a LoadBalancer service for the front-end app. (All working: I can port-forward to see the backend deployment working.)
Question: I'm trying to get the front end and API to communicate. I know I can do that with an external-facing load balancer, but is there a way to do that using the ClusterIPs and not have an external IP for the back end?
The reason we are interested in this is that it simply adds one more layer of security. By keeping the API vnet-only, we remove one more entry point.
If it helps, we are deploying in Azure with AKS. I know they sometimes do some weird deployment things.
Pods running on the cluster can talk to each other using a ClusterIP service, which is the default service type. You don't need a LoadBalancer service to make two pods talk to each other. According to the docs on this topic
ClusterIP exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
As explained in the Discovery documentation, if both Pods (frontend and API) are running in the same namespace, the frontend just needs to send requests to the name of the backend service.
If they are running in different namespaces, the frontend needs to use a fully qualified domain name to be able to talk to the backend.
For example, if you have a Service called "my-service" in Kubernetes Namespace "my-ns" a DNS record for "my-service.my-ns" is created. Pods which exist in the "my-ns" Namespace should be able to find it by simply doing a name lookup for "my-service". Pods which exist in other Namespaces must qualify the name as "my-service.my-ns". The result of these name lookups is the cluster IP.
You can find more info about how DNS works on kubernetes in the docs.
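A minimal sketch of that setup (the name backend-api, the namespace and the ports are assumptions for illustration): the API only needs a default ClusterIP Service, and the frontend addresses it by name.

apiVersion: v1
kind: Service
metadata:
  name: backend-api          # hypothetical name
  namespace: my-ns           # hypothetical namespace
spec:
  # type defaults to ClusterIP, so no external IP is ever allocated
  selector:
    app: backend-api         # assumed label on the .NET Core pods
  ports:
  - port: 80
    targetPort: 5000         # assumed port the API listens on

From a pod in my-ns the frontend can call http://backend-api, and from another namespace http://backend-api.my-ns.svc.cluster.local.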
The problem with this configuration is the assumption that the frontend app will reach the API via the internal cluster network. But it will not. My app, running in the client's browser, cannot reach services and pods in my cluster.
My cluster will need something like nginx or another external load balancer to allow my client-side API calls to reach my API.
You can alternatively use your front-end app as a proxy, but that is highly inadvisable!
I'm trying to get the front end and API to communicate
By API, if you mean the Kubernetes API server, first set up a service account and token for the front-end pod to communicate with the Kubernetes API server by following the steps here, here and here.
is there a way to do that using the clusterIPs and not have an external IP for the back end
Yes, this is possible and more secure if external access is not needed for the service. Service type ClusterIP will not have an ExternalIP and the pods can talk to each other using ClusterIP:Port within the cluster.
I have deployed two pods with hostNetwork set to true. When the pods are deployed on the same OpenShift node, everything works fine since they can discover each other using the node IP.
When the pods are deployed on different OpenShift nodes they can't discover each other; I get "no route to host" if I try to point one pod to another using the node IP. How do I fix this?
The uswitch/kiam (https://github.com/uswitch/kiam) service is a good example of a use case.
It has an agent process that runs on the host network of all worker nodes because it modifies a firewall rule to intercept API requests (from containers running on the host) to the AWS API.
It also has a server process that runs on the host network to access the AWS API, since the AWS API is on a subnet that is only available to the host network.
Finally, the agent talks to the server using gRPC, which connects directly to one of the IP addresses that are returned when looking up kiam-server.
So you have pods of the agent deployment running on the host network of node A trying to connect to the kiam server running on the host network of node B... which just does not work.
Furthermore, this is a private service; it should not be available from outside the network.
If you want the two containers to share the same physical machine and take advantage of loopback for quick communication, then you would be better off defining them together as a single Pod with two containers.
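A minimal sketch of that single-Pod option (names and images are placeholders): both containers share one network namespace, so they can talk over 127.0.0.1 without hostNetwork.

apiVersion: v1
kind: Pod
metadata:
  name: colocated-app        # hypothetical name
spec:
  containers:
  - name: web
    image: nginx:1.25        # placeholder image
    ports:
    - containerPort: 80
  - name: sidecar
    image: busybox:1.36      # placeholder image
    # the sidecar reaches the web container over loopback
    command: ["sh", "-c", "while true; do wget -qO- http://127.0.0.1:80 >/dev/null; sleep 10; done"]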
If the two containers are meant to float over a larger cluster and be more loosely coupled, then I'd recommend taking advantage of the Service construct within Kubernetes (under OpenShift) and using that for the appropriate discovery.
Services are documented at https://kubernetes.io/docs/concepts/services-networking/service/, and along with an internal DNS service (if implemented - common in Kubernetes 1.4 and later) they provide a means to let Kubernetes manage where things are, updating an internal DNS entry in the form of <servicename>.<namespace>.svc.cluster.local. So for example, if you set up a Pod with a service named "backend" in the default namespace, the other Pod could reference it as backend.default.svc.cluster.local. The Kubernetes documentation on the DNS portion of this is available at https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
This also avoids the "hostnetwork=true" complication, and lets OpenShift (or specifically Kubernetes) manage the networking.
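As a sketch of that Service-based approach (the selector label and port are assumptions), a Service named "backend" in the default namespace is all it takes to get the backend.default.svc.cluster.local name mentioned above:

apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: default
spec:
  selector:
    app: backend             # assumed label on the backend pods
  ports:
  - port: 8080               # assumed application port
    targetPort: 8080

The other pod then connects to backend.default.svc.cluster.local:8080 (or simply backend:8080 from within the default namespace), regardless of which node either pod is scheduled on.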
If you absolutely have to use hostNetwork, you should create a router and then use those routers for communication between the pods. You can create an HAProxy-based router in OpenShift; reference here: https://docs.openshift.com/enterprise/3.0/install_config/install/deploy_router.html