How to connect to PostgreSQL cluster on DigitalOcean from CircleCI?

I have a Kubernetes cluster set up on DigitalOcean and a separate managed Postgres database cluster there. In the database cluster settings there is a list of trusted IP addresses that are allowed to access the database (which looks like a great idea).
I have a build-and-deploy process set up with CircleCI, and at the end of that process, after deploying a container to the K8s cluster, I need to run a database migration. The problem is that I don't know the CircleCI agent's IP address and so cannot add it to the trusted sources in the DO settings. Does anybody know how to access a DigitalOcean Postgres cluster from within CircleCI steps?

Unfortunately, when you use a distributed service like this that you don't manage, I would be very cautious about the restricted-IP approach. (Really you have three services you don't manage: Postgres, Kubernetes, and CircleCI.) DigitalOcean's IP restriction is an excellent security option for internal networking, since DO can track changes in droplet IPs, etc.
But when you are deploying from another service, especially for production, and even if part of your solution runs on DigitalOcean infrastructure, I'd be very concerned that CircleCI's IPs will change dynamically. DO has no way of knowing when this happens: unlike Postgres and Kubernetes, DO doesn't manage CircleCI, even if it hosts part of your stack.
Essentially, I have to advise you to either get an assurance of a static egress IP from your CircleCI vendor/provider, or disable the IP restriction on Postgres.
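If you do obtain a static egress IP from CircleCI, adding it to the cluster's trusted sources can be scripted, for example with doctl; the cluster UUID and IP below are placeholders, and you should check doctl databases firewalls --help for the exact syntax on your version:

    # Append a fixed CI egress IP to the database cluster's trusted sources.
    doctl databases firewalls append <your-db-cluster-uuid> --rule ip_addr:203.0.113.10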

Related

CloudSQL Proxy on GKE: Service vs Sidecar

Does anyone know the pros and cons of installing the CloudSQL Proxy (which allows us to connect securely to CloudSQL) on a Kubernetes cluster as a service, as opposed to running it as a sidecar alongside the application container?
I know that it is mostly used as a sidecar. I have used both approaches (in non-production environments), but I never understood why the sidecar is preferable to a service. Can someone enlighten me, please?
The sidecar pattern is preferred because it is the easier and more secure option. Traffic to the Cloud SQL Auth proxy is not encrypted or authenticated, and relies on the user to restrict access to the proxy (typically by having it listen only on localhost).
When you run the Cloud SQL proxy, you are essentially saying "I am user X and I'm authorized to connect to the database". When you run it as a service, anyone that can reach the service connects to the database authorized as "user X".
You can see this warning in the Cloud SQL proxy example for running it as a service in k8s, or watch the video on Connecting to Cloud SQL from Kubernetes, which explains the reasoning as well.
The Cloud SQL Auth proxy is the recommended way to connect to Cloud SQL, even when using private IP. This is because the Cloud SQL Auth proxy provides strong encryption and authentication using IAM, which can help keep your database secure.
When you connect using the Cloud SQL Auth proxy, the Cloud SQL Auth proxy is added to your pod using the sidecar container pattern. The Cloud SQL Auth proxy container is in the same pod as your application, which enables the application to connect to the Cloud SQL Auth proxy using localhost, increasing security and performance.
A sidecar is a container that runs in the same Pod as the application container; because it shares the same volumes and network as the main container, it can "help" or enhance how the application operates. In Kubernetes, a pod is a group of one or more containers with shared storage and network, and a sidecar is a utility container in a pod that is loosely coupled to the main application container.
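As a rough illustration of the sidecar pattern described here (the image tag, project and instance connection name are placeholders, not taken from the question; check the Cloud SQL Auth proxy documentation for the current invocation):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: app
              image: gcr.io/my-project/my-app:latest    # placeholder application image
              env:
                - name: DB_HOST
                  value: "127.0.0.1"                    # the app reaches the proxy over localhost
                - name: DB_PORT
                  value: "5432"
            - name: cloud-sql-proxy                     # sidecar in the same pod
              image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.8.0    # placeholder tag
              args:
                - "--port=5432"
                - "my-project:us-central1:my-instance"  # placeholder instance connection name
              securityContext:
                runAsNonRoot: true

Because only containers in the same pod can reach 127.0.0.1, the unauthenticated hop to the proxy never leaves the pod, which is exactly the property a cluster-wide Service loses.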
Sidecar pros: it scales as you increase the number of pods, can be injected automatically, and is already the pattern used by service meshes.
Sidecar cons: it is a bit harder to adopt, as developers can't just deploy their app but have to deploy a whole stack in each Deployment. It also consumes more resources and can be harder to secure, because every Pod must run its own copy of the sidecar (for example, a log aggregator pushing logs to a database or queue).
Refer to the documentation for more information.

How to whitelist an entire Kubernetes cluster on an external server

I have a Kubernetes cluster with several nodes, and it connects to a SQL server outside of the cluster. How can I whitelist these (potentially changing) nodes on the SQL server firewall, without having to whitelist each node's external IP individually?
Is there a clean solution for this? Perhaps some intra-cluster tooling to route all requests through a single node?
You would have to use a NAT. It is possible, but fiddly (we do this weekly in order to connect to a hosted service to make backups, and the hosted service only whitelists a specific IP).
We used Terraform to spin up a cluster and then deploy our backup job to it; since the traffic went out via the NAT IP, the remote host would allow the connection.
We used Cloud NAT via Terraform (as we were on GKE): https://registry.terraform.io/modules/terraform-google-modules/cloud-nat/google/latest
There are surely similar options for whichever Kubernetes provider you are using. If you are running bare metal, you'll need to set up the NAT routing yourself.
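For reference, a minimal sketch of the Cloud NAT approach using that module, assuming GKE on GCP; the variable names, module version and resource names are illustrative rather than taken from our actual setup:

    # Reserve a static address so the remote firewall rule stays valid.
    resource "google_compute_address" "nat_egress" {
      name   = "nat-egress-ip"
      region = var.region
    }

    resource "google_compute_router" "router" {
      name    = "nat-router"
      region  = var.region
      network = var.network
    }

    module "cloud_nat" {
      source     = "terraform-google-modules/cloud-nat/google"
      version    = "~> 5.0"    # pick the current release of the module
      project_id = var.project_id
      region     = var.region
      router     = google_compute_router.router.name
      nat_ips    = [google_compute_address.nat_egress.self_link]
    }

All egress from the cluster's nodes then leaves via the reserved address, which is the single IP to whitelist on the external SQL server.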

Good solutions to automate infrastructure deployment locally?

I have recently been reading more about infrastructure as a service (IaaS) and platform as a service (PaaS) and have some questions. When we opt for a PaaS solution, it is generally very easy to create the infrastructure, as the cloud provider handles that for us, and we can even automate the deployment using an infrastructure-as-code tool like Terraform.
But if we use an IaaS solution, or even a local on-premises cluster, it seems we lose a lot of the automation that PaaS allows. So I was curious: are there any good tools out there for automating infrastructure deployment on a local cluster that is not in the cloud?
The best thing I could think of was to run a local Kubernetes cluster and then Dockerize each of the infrastructure components, but this seems difficult as each node in the cluster will need its own specific configuration files.
From my basic Googling, it seems like there is not a good solution to this.
Edit:
I was not clear enough with my original intentions. I have two problems I am trying to solve.
How do I automate infrastructure deployment locally? For example, suppose I wanted to create a Hadoop HDFS cluster. I would need to configure one node to be the namenode with an accessible IP, and the other nodes to be datanodes that are aware of the namenode's IP. At the moment, I have to do this manually by logging into each node, checking its IP, and then configuring each one. How would I automate this? If I were to use a Kubernetes approach, how do I specify that one of the running pods needs to be the namenode and the others datanodes? How do I find the pods' IPs and make them aware of the namenode's IP?
The next problem is very similar to the first, with a slight modification: how would I deploy specific configuration files to each node? For instance, in Kafka the configuration file for one node requires the IPs of the ZooKeeper nodes, as well as the IP it should listen on. This may be different for every node in the cluster. Is there a good way to make these config files pod-specific, so that I do not have to do bash text processing to insert the correct contents into each pod's config files?
You can use Terraform for all of your on-premises infrastructure automation, and Ansible for configuration management.
Let's say you have three HPE servers. Install K8s or VMware on them using Ansible; then you can treat them as three availability zones in one region, the same as on AWS. From there, you can start deploying Dockerized apps or Helm charts using Terraform.
Summary:
Ansible for installing and configuring K8s.
Terraform for provisioning K8s.
Helm for installing apps on K8s.
After this you will have a basic automated on-premises infrastructure.
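To make the last two steps concrete, here is a minimal sketch of driving Helm from Terraform against the on-prem cluster; the provider configuration, chart and release name are illustrative placeholders, and the kubeconfig path assumes the Ansible/kubeadm step already produced it:

    provider "helm" {
      kubernetes {
        config_path = "~/.kube/config"    # kubeconfig written by the Ansible/kubeadm step
      }
    }

    # Terraform drives Helm to install an application chart on the on-prem cluster.
    resource "helm_release" "app" {
      name       = "my-app"                                # illustrative release name
      repository = "https://charts.bitnami.com/bitnami"    # illustrative chart repository
      chart      = "nginx"
      namespace  = "default"
    }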

How to access internal services in a production Kubernetes environment from a local machine? (Need: performance & security)

I have a Kubernetes cluster running my production environments. I have a bastion machine; my own computer can connect to the bastion, and the bastion can access the cluster machines. I want to connect to some internal services (i.e. not exposed to the public network), such as MySQL, Redis, Kibana, etc., from my own computer. I need enough performance (e.g. kubectl port-forward is too slow) and enough security.
I have tried kubectl port-forward, but it is very slow, and from what I have found it seems it is just slow by nature, so I guess I cannot make it faster.
I guess I could also expose every service as a NodePort and then use something like SSH port forwarding. However, I am worried the security would be low: because we have to create a NodePort, if an attacker can reach the cluster, they can use the NodePort to access my MySQL, Redis, Kafka, etc., which is terrible.
EDITED: In addition, I need not only my own computer but also my mobile phone to be able to reach some services, such as my Spring Boot internal admin URL. Currently I do SSH port forwarding and bind to 0.0.0.0, so my mobile phone can connect to my_computer_ip:the_port to use it. But how can I do this without SSH port forwarding?
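For illustration, the SSH port forward described above might look roughly like this, assuming the service is reachable as a NodePort on one of the nodes; the hostnames, IPs and ports are placeholders:

    # Bind on all interfaces so a phone on the same network can use my_computer_ip:8080,
    # and forward through the bastion to a node's NodePort.
    ssh -N -L 0.0.0.0:8080:10.0.0.12:30036 user@bastion.example.com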
Thank you!

Joining an external Node to an existing Kubernetes Cluster

I have a custom Kubernetes Cluster (deployed using kubeadm) running on Virtual Machines from an IaaS provider. The Kubernetes Nodes have no Internet-facing IP Addresses (except for the Master Node, which I also use for Ingress).
I'm now trying to join a Machine to this Cluster that is not hosted by my main IaaS provider. I want to do this because I need specialized computing resources for my application that this IaaS does not offer.
What is the best way to do this?
Here's what I've tried already:
Run the Cluster on Internet-facing IP Addresses
I have no trouble joining the Node when I tell kube-apiserver on the Master Node to listen on 0.0.0.0 and use public IP Addresses for every Node. However, this approach is not ideal from a security perspective and also leads to higher costs, because public IP Addresses have to be leased for Nodes that normally don't need them.
Create a Tunnel to the Master Node using sshuttle
I've had moderate success by creating a tunnel from the external Machine to the Kubernetes Master Node using sshuttle, which is configured on my external Machine to route 10.0.0.0/8 through the tunnel. This works in principle, but it seems far too hacky and is also a bit unstable (sometimes the external Machine can't get a route to the other Nodes; I have yet to investigate this further).
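For reference, the sshuttle invocation is roughly the following, run on the external Machine; the user and host are placeholders, and 10.0.0.0/8 is the range mentioned above:

    # Route the cluster's private range through an SSH tunnel to the Master Node.
    sshuttle -r user@master-public-ip 10.0.0.0/8 --daemon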
Here are some ideas that could work, but I haven't tried yet because I don't favor these approaches:
Use a proper VPN
I could try to use a proper VPN tunnel to connect the Machine. I don't favor this solution because it would add an (admittedly quite small) overhead to the Cluster.
Use a cluster federation
It looks like kubefed was made specifically for this purpose. However, I think this is overkill in my case: I'm only trying to join a single external Machine to the Cluster. Using Kubefed would add a ton of overhead (Federation Control Plane on my Main Cluster + Single Host Kubernetes Deployment on the external machine).
I can't think of a better solution than a VPN here. Especially since you have only one isolated node, it should be relatively easy to get the handshake working between this node and your master.
Routing the traffic from "internal" nodes to this isolated node is also trivial. Because all nodes already use the master as their default gateway, modifying the route table on the master is enough to forward the traffic from internal nodes to the isolated node through the tunnel.
You have to be careful with the configuration of your container network though. Depending on the solution you use to deploy it, you may have to assign a different subnet to the Docker bridge on the other side of the VPN.
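As a hedged illustration of the routing described above, assuming a VPN tunnel interface named wg0 on both ends and illustrative pod/service CIDRs (adjust them to whatever your container network actually assigns):

    # On the Master Node: send traffic for the external Node's pod subnet into the tunnel.
    ip route add 10.244.50.0/24 dev wg0

    # On the external Node: send cluster-internal traffic back through the tunnel.
    ip route add 10.244.0.0/16 dev wg0    # cluster pod CIDR (illustrative)
    ip route add 10.96.0.0/12 dev wg0     # service CIDR (illustrative)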