Should my database server be a Pod on the same Service - kubernetes

I need to provide a URL (to a Postgres database) for my web server. Should the Postgres database be behind its own Service, or is it okay for it to be a Pod on the same Service as the web server?
Can I configure a Pod to have a FQDN that doesn't change?

It's absolutely fine, and I would say recommended, to keep the database behind its own Service in k8s.
The database would need to be backed by a persistent volume as well.
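You can reference the Service from other web server or application Pods.
As long as you expose the Service properly, its FQDN, which has the form <service>.<namespace>.svc.cluster.local, should keep working even when the database Pod is replaced and gets a new IP.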
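To make the FQDN point concrete, here is a minimal Python sketch of how a web server could assemble its database URL from the Service name. The Service name `postgres`, the namespace, the credentials, and the database name are all made up for illustration, and `cluster.local` is only the usual default cluster domain, so treat those as assumptions about your cluster.

```python
from urllib.parse import quote


def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster DNS name of a Service.

    cluster_domain is assumed to be the common default; some clusters use another.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def postgres_url(user: str, password: str, host: str,
                 database: str, port: int = 5432) -> str:
    """Build a postgresql:// URL, percent-encoding the credentials."""
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{database}")


if __name__ == "__main__":
    # "postgres", "prod", "app" and "appdb" are hypothetical names.
    host = service_fqdn("postgres", namespace="prod")
    print(postgres_url("app", "p@ss", host, "appdb"))
```

The URL depends only on the Service name and namespace, never on a Pod IP, which is why it does not change when the database Pod is rescheduled.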
"This is one of the simpler methods, you could evolve based on your network design"

Related

CloudSQL Proxy on GKE : Service vs Sidecar

Does anyone know the pros and cons for installing the CloudSQL-Proxy (that allows us to connect securely to CloudSQL) on a Kubernetes cluster as a service as opposed to making it a sidecar against the application container?
I know that it is mostly used as a sidecar. I have used it as both (in non-production environments), but I never understood why a sidecar is preferred over a service. Can someone enlighten me please?
The sidecar pattern is preferred because it is the easiest and most secure option. Traffic to the Cloud SQL Auth proxy is not encrypted or authenticated, and relies on the user to restrict access to the proxy (typically by running it on localhost).
When you run the Cloud SQL proxy, you are essentially saying "I am user X and I'm authorized to connect to the database". When you run it as a service, anyone who connects to that service is connecting authorized as "user X".
You can see this warning in the Cloud SQL proxy example running as a service in k8s, or watch this video on Connecting to Cloud SQL from Kubernetes which explains the reason as well.
The Cloud SQL Auth proxy is the recommended way to connect to Cloud SQL, even when using private IP. This is because the Cloud SQL Auth proxy provides strong encryption and authentication using IAM, which can help keep your database secure.
When you connect using the Cloud SQL Auth proxy, the Cloud SQL Auth proxy is added to your pod using the sidecar container pattern. The Cloud SQL Auth proxy container is in the same pod as your application, which enables the application to connect to the Cloud SQL Auth proxy using localhost, increasing security and performance.
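As a rough illustration of the localhost point, here is a small Python sketch of an application that reads its proxy address from the environment and checks that it is a loopback address before connecting. The variable names `DB_PROXY_HOST` and `DB_PROXY_PORT` are hypothetical, and the loopback check is just one possible guard, not something the Cloud SQL Auth proxy itself requires.

```python
import ipaddress
import os


def proxy_endpoint(env=os.environ):
    """Return (host, port) of the proxy; defaults assume a sidecar in the same Pod."""
    host = env.get("DB_PROXY_HOST", "127.0.0.1")  # hypothetical variable name
    port = int(env.get("DB_PROXY_PORT", "5432"))  # hypothetical variable name
    return host, port


def is_loopback(host: str) -> bool:
    """True if host is 'localhost' or a loopback IP literal such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, for example the DNS name of a proxy run as a Service.
        return False


if __name__ == "__main__":
    host, port = proxy_endpoint()
    if not is_loopback(host):
        raise SystemExit(f"refusing to use non-local proxy at {host}:{port}")
    print(f"connecting through sidecar proxy at {host}:{port}")
```

With a sidecar, the unencrypted hop between the app and the proxy never leaves the Pod. With a proxy run as a Service, the same hop crosses the cluster network and is reachable by any Pod that can reach the Service.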
A sidecar is a container that runs in the same Pod as the application container. Because it shares the same volumes and network as the main container, it can “help” or enhance how the application operates. In Kubernetes, a Pod is a group of one or more containers with shared storage and network. A sidecar is a utility container in a Pod that’s loosely coupled to the main application container.
Sidecar Pros: Scales indefinitely as you increase the number of Pods. Can be injected automatically. Already used by service meshes.
Sidecar Cons: A bit difficult to adopt, as developers can't just deploy their app but must deploy a whole stack in a Deployment. It consumes many more resources, and it is harder to secure because every Pod must deploy the log aggregator to push the logs to the database or queue.
Refer to the documentation for more information.

My understanding of headless service in k8s and two questions to verify

I am learning the headless service of kubernetes.
I understand the following without question (please correct me if I am wrong):
A headless service doesn't have a cluster IP.
It is used for communicating with stateful apps.
When a client app container/pod communicates with a database pod via a headless service, the pod IP address is returned instead of the service's.
What I am not quite sure about:
Many articles on the internet explaining headless services are vague, in my opinion, because all I found were direct statements like:
If you don't need load balancing but want to directly connect to the pod (e.g. database) you can use headless service
But what does it mean exactly?
So, the following are my thoughts on headless services in k8s, and two questions with an example.
Let's say I have 3 replicas of a PostgreSQL database instance behind a service. If it is a regular service, I know that by default a request to the database would be routed in a round-robin fashion to one of the three database pods. That's indeed load balancing.
Question 1:
If I use a headless service instead, does the statement quoted above mean the headless service will stick with one of the three database pods and never change until that pod dies? I ask because otherwise, if it does not stick with one of the three pods, it would still be doing load balancing. Could someone please clarify?
Question 2:
I feel that whether it is a regular service or a headless service, the client application just needs to know the DNS name of the service to communicate with the database in the k8s cluster. Isn't that so? I mean, what's the point of using a headless service then? To me, a headless service only makes sense if the client application code really needs to know the IP address of the pod it connects to. So, as long as the client application doesn't need to know the IP address, it can always communicate with the database through either a regular service or a headless service via the service DNS name in the cluster. Am I right here?
A normal Service comes with a load balancer (even if it's a ClusterIP-type Service). That load balancer has an IP address. The in-cluster DNS name of the Service resolves to the load balancer's IP address, which then forwards to the selected Pods.
A headless Service doesn't have a load balancer. The DNS name of the Service resolves to the IP addresses of the Pods themselves.
This means that, with a headless Service, basically everything is up to the caller. If the caller does a DNS lookup, picks the first address it's given, and uses that address for the lifetime of the process, then it won't round-robin requests between backing Pods, and it will not notice if that Pod disappears. With a normal Service, so long as the caller gets the Service's (cluster-internal load balancer's) IP address, these concerns are handled automatically.
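To show what "everything is up to the caller" can look like, here is a minimal Python sketch of a client that resolves every address behind a name and rotates through them itself. It uses only the standard library. The rotation policy is my own illustration of client-side balancing, not something Kubernetes provides.

```python
import itertools
import socket
from typing import Iterable, List


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct IP the name resolves to, in resolver order.

    For a headless Service this would be the Pod IPs; for a normal Service,
    the single cluster IP.
    """
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(info[4][0] for info in infos))


class RoundRobin:
    """Client-side rotation over a snapshot of addresses."""

    def __init__(self, addresses: Iterable[str]):
        self._addresses = list(addresses)
        if not self._addresses:
            raise ValueError("no addresses to rotate over")
        self._cycle = itertools.cycle(self._addresses)

    def next(self) -> str:
        return next(self._cycle)


if __name__ == "__main__":
    # Numeric address used so the demo runs outside a cluster.
    picker = RoundRobin(resolve_all("127.0.0.1", 5432))
    print(picker.next())
```

The snapshot goes stale when Pods come and go, so a real client would also need to resolve again periodically or after a connection failure. That is exactly the work a normal Service does for you.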
A headless Service isn't specifically tied to stateful workloads, except that StatefulSets require a headless Service as part of their configuration. An individual StatefulSet Pod will actually be given a unique hostname connected to that headless Service. You can have both normal and headless Services pointing at the same Pods, though, and it might make sense to use a normal Service for cases where you don't care which replica is (initially) contacted.
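The per-Pod hostnames mentioned above follow a predictable pattern, sketched here in Python. The StatefulSet name, Service name, and namespace are hypothetical, and the sketch assumes the default ordinal numbering starting at 0 and the usual `cluster.local` cluster domain.

```python
from typing import List


def statefulset_pod_hosts(statefulset: str, headless_service: str, replicas: int,
                          namespace: str = "default",
                          cluster_domain: str = "cluster.local") -> List[str]:
    """List the stable DNS names of a StatefulSet's Pods.

    Pods are named <statefulset>-<ordinal>, and each gets a DNS record under
    the governing headless Service. Ordinals are assumed to start at 0.
    """
    return [
        f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # "postgres", "postgres-headless" and "db" are hypothetical names.
    for host in statefulset_pod_hosts("postgres", "postgres-headless", 3, "db"):
        print(host)
```

A client that must talk to one specific replica, for example a primary, can use such a name directly, while clients that do not care can keep using a normal Service.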
A headless service will return all Pod IPs that are associated through the selector. The order is not stable, so if a client is making repeated DNS queries and uses only the first returned IP, this will result in some kind of load balancing as well.
Regarding your second question: That is correct. In general, if a client does not need to know all instances - and handle the unstable IPs - a regular service provides more benefits.

How to connect to PostgreSQL cluster on DigitalOcean from CircleCI?

I have a Kubernetes cluster set up on DigitalOcean and a separate Postgres database instance there. In the database cluster settings there is a list of IP addresses that are allowed access to that database cluster (looks like a great idea).
I have a build and deploy process set up with CircleCI, and at the end of that process, after deploying a container to the K8s cluster, I need to run a database migration. The problem is that I don't know the CircleCI agent's IP address and cannot allow it in the DO settings. Does anybody know how we can access a DigitalOcean Postgres cluster from within CircleCI steps?
Unfortunately, when you use a distributed service like that, one that you don't manage, I would be very cautious about using the restricted IP approach. (Really you have three services you don't manage: Postgres, Kubernetes, and CircleCI.) I feel as if DigitalOcean has provided a really excellent security option for internal networking, since it can track changes in droplet IP, etc.
But when you are deploying from another service, especially if this is for production, and even if the part of your solution you're deploying is deployed (partially) on DigitalOcean infrastructure, I'd be very concerned that CircleCI will change IPs dynamically. DO has no way of knowing when this happens because, unlike Postgres and Kubernetes, they don't manage CircleCI even if they do host part of it.
Essentially I have to advise you to either get an assurance of a static IP from your CircleCI vendor/provider, or disable the IP limitation on Postgres.
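To see why a changing agent IP breaks the restricted IP approach, here is a toy Python model of an allowlist check. This is not how DigitalOcean implements its trusted sources. The addresses come from ranges reserved for documentation and stand in for whatever your CI agent happens to use.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """True if client_ip matches any entry; entries may be single IPs or CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False)
        for entry in allowlist
    )


if __name__ == "__main__":
    # Hypothetical allowlist: one fixed address plus one range.
    allowlist = ["203.0.113.10", "198.51.100.0/24"]
    print(is_allowed("203.0.113.10", allowlist))  # the address that was allowed
    print(is_allowed("203.0.113.11", allowlist))  # the same agent after an IP change
```

The check has no notion of who the caller is, only where the connection comes from, so an agent whose address changes between runs is rejected until the list is updated.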

Two services for the same Pod on GKE

Question
Is it problematic to create two Services for the same pod, one for internal access and the other for external access?
Context
I have a simple app running on GKE.
There are two pods, each with one container:
flask-pod, which runs a containerized flask app
postgres-pod, which runs a containerized postgres DB
The flask app accesses the postgres DB through a ClusterIP Service around the postgres DB.
Concern
I also have connected a client app, TablePlus (running on my machine), to the postgres DB through a LoadBalancer Service. Now I have 2 separate services to access my postgres DB. Is this redundant, or can this cause problems?
Thanks for your help.
It is perfectly fine. If you look at StatefulSets, you define one headless Service that is used for internal purposes and another Service to allow access from clients.
This approach is absolutely valid, there is nothing wrong with it. You can create as many Services per Pod as you like.
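As a sketch of what two Services over one Pod amount to, the Python snippet below builds both manifests as plain dictionaries and prints them as JSON, which kubectl accepts alongside YAML. The Service names and the `app: postgres` label are assumptions, so substitute whatever labels your postgres-pod actually carries.

```python
import json


def service_manifest(name: str, service_type: str, selector: dict, port: int = 5432) -> dict:
    """Build a minimal Service manifest that selects Pods by label."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            "selector": dict(selector),
            "ports": [{"port": port, "targetPort": port}],
        },
    }


# Hypothetical label; both Services select the same Pod through it.
selector = {"app": "postgres"}
internal = service_manifest("postgres-internal", "ClusterIP", selector)
external = service_manifest("postgres-external", "LoadBalancer", selector)

if __name__ == "__main__":
    bundle = {"apiVersion": "v1", "kind": "List", "items": [internal, external]}
    print(json.dumps(bundle, indent=2))
```

The two objects differ only in name and type. Neither knows about the other, and the Pod is unaware of both, which is why stacking Services this way does not conflict.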

How to connect different deployments in Kubernetes?

I have two back-end deployments, REST server and a database server, each running on some specific ports. The REST server internally calls a database server.
Now how do I refer my database server deployment in my REST server deployment so that they can communicate with each other?
First, define a Service for your DB server. That will create a sort of load balancer (an internal kube integration based on iptables in most cases). With that, you will be able to refer to it by Service name or by FQDN, like mydbsvc.namespace.svc.cluster.local, which will resolve to the "Cluster IP" of that load balancer.
Then it's just a matter of regular app config to point it to your DB on mydbsvc, preferably by means of an env variable, say DB_HOST=mydbsvc, set in your REST API deployment manifest (pod template envs).
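On the application side, reading that variable could look roughly like this in Python. `DB_HOST` is the name from the answer above. `DB_PORT` and its default are my own additions for illustration.

```python
import os


def db_settings(env=os.environ) -> dict:
    """Read database connection settings from environment variables.

    DB_HOST is required and would hold the Service name, e.g. "mydbsvc".
    DB_PORT is a hypothetical optional variable defaulting to 5432.
    """
    if "DB_HOST" not in env:
        raise RuntimeError("DB_HOST is not set; set it in the pod template env")
    return {"host": env["DB_HOST"], "port": int(env.get("DB_PORT", "5432"))}


if __name__ == "__main__":
    # A plain dict stands in for the real environment so the demo runs anywhere.
    print(db_settings({"DB_HOST": "mydbsvc"}))
```

Keeping the host in the environment means the same image can point at a different Service per namespace or cluster without a rebuild.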
Expose your deployments as Services. For example, kubectl expose ...
Allow these to communicate by creating network policies.
The Service object (of the database) will give you a virtual (stable) IP. Depending on the type of Service, your REST code can call the DB via clusterIP/externalName/externalIP/DNS.