Migrating CockroachDB from a local machine to GCP Kubernetes Engine

I followed the instructions here to create a local 3-node secure cluster.
I got the Go example app running with the following DB connection string to connect to the secure cluster:
sql.Open("postgres", "postgresql://root#localhost:26257/dbname?sslmode=verify-full&sslrootcert=<location of ca.crt>&sslcert=<location of client.root.crt>&sslkey=<location of client.root.key>")
CockroachDB worked well locally, so I decided to move the DB (as in the DB solution, not the actual data) to GCP Kubernetes Engine using the instructions here.
Everything worked fine: the pods were created and I could use the built-in SQL client from the cloud console.
Now I want to use the previous example app to connect to this new cloud DB. I created a load balancer using the kubectl expose command and got a public IP to use in the code.
How do I get the new ca.crt, client.root.crt, client.root.key files to use in my connection string for the DB running on GCP?
We have 5+ developers, and the idea is to have them write code on their local machines and connect to the cloud DB using the connection strings and the certificates.
Or is there a better way to let 5+ developers use a single DEV DB cluster running on GCP?

The recommended way to run against a Kubernetes CockroachDB cluster is to have your apps run in the same cluster. This makes certificate generation fairly simple. See the built-in SQL client example and its config file.
The config above uses an init container to send a CSR for client certificates and makes them available to the container (in this case just the cockroach sql client, but it could be anything else).
If you wish to run a client outside the Kubernetes cluster, the simplest way is to copy the generated certs directly from the client pod. It's recommended to use a non-root user:
create the user through the SQL command
modify the client-secure.yaml config for your new user and start the new client pod
approve the CSR for the client certificate
wait for the pod to finish initializing
copy the ca.crt, client.<username>.crt and client.<username>.key from the pod onto your local machine
Note: the public DNS name or IP address of your Kubernetes cluster is most likely not included in the node certificates. You either need to modify the list of hostnames/addresses before bringing up the nodes, or change your connection URL to sslmode=verify-ca (see client connection parameters for details).
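For the Go example app from the question, that usually just means changing the sslmode parameter and pointing the URL at the load balancer. A minimal sketch, assuming the same postgres driver as in the question, with placeholders for the public IP, the new user, and the paths of the certs copied from the client pod:

package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq" // Postgres driver used by the example app
)

func main() {
    // sslmode=verify-ca still checks that the node certificate is signed by
    // your CA, but skips hostname verification, which would otherwise fail
    // because the load balancer's public IP is not in the node certificates.
    // <load-balancer-ip>, <username> and the certs/ paths are placeholders.
    db, err := sql.Open("postgres",
        "postgresql://<username>@<load-balancer-ip>:26257/dbname?sslmode=verify-ca"+
            "&sslrootcert=certs/ca.crt&sslcert=certs/client.<username>.crt&sslkey=certs/client.<username>.key")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := db.Ping(); err != nil {
        log.Fatal(err)
    }
}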
Alternatively, you could use password authentication, in which case you would only need the CA certificate.
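With password authentication, the connection string from the question would then shrink to something like the following (the 'dev' user and its password are hypothetical, and only ca.crt is still needed on the client):

sql.Open("postgres", "postgresql://dev:<password>@<load-balancer-ip>:26257/dbname?sslmode=verify-ca&sslrootcert=<location of ca.crt>")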

Related

CloudSQL Proxy on GKE: Service vs Sidecar

Does anyone know the pros and cons of installing the CloudSQL-Proxy (which allows us to connect securely to CloudSQL) on a Kubernetes cluster as a service, as opposed to making it a sidecar alongside the application container?
I know that it is mostly used as a sidecar. I have used it both ways (in non-production environments), but I never understood why the sidecar is preferable to a service. Can someone enlighten me please?
The sidecar pattern is preferred because it is the easiest and most secure option. Traffic to the Cloud SQL Auth proxy is not encrypted or authenticated, so the proxy relies on the user to restrict access to it (typically by having it listen only on localhost).
When you run the Cloud SQL proxy, you are essentially saying "I am user X and I'm authorized to connect to the database". When you run it as a service, anyone that can reach that service connects to the database authorized as "user X".
You can see this warning in the Cloud SQL proxy example for running as a service in k8s, or watch this video on Connecting to Cloud SQL from Kubernetes, which explains the reason as well.
The Cloud SQL Auth proxy is the recommended way to connect to Cloud SQL, even when using private IP. This is because the Cloud SQL Auth proxy provides strong encryption and authentication using IAM, which can help keep your database secure.
When you connect using the Cloud SQL Auth proxy, the Cloud SQL Auth proxy is added to your pod using the sidecar container pattern. The Cloud SQL Auth proxy container is in the same pod as your application, which enables the application to connect to the Cloud SQL Auth proxy using localhost, increasing security and performance.
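To illustrate what this buys the application: with the proxy as a sidecar, the app connects to plain localhost and lets the proxy handle encryption and IAM. A minimal Go sketch, assuming the sidecar listens on the default Postgres port 5432; the user, password and database name are placeholders:

package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq" // Postgres driver
)

func main() {
    // The Cloud SQL Auth proxy sidecar listens on localhost inside the pod,
    // so the app connects without TLS; the proxy encrypts traffic to Cloud SQL
    // and authenticates with IAM. appuser, <password> and appdb are placeholders.
    db, err := sql.Open("postgres",
        "host=127.0.0.1 port=5432 user=appuser password=<password> dbname=appdb sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    if err := db.Ping(); err != nil {
        log.Fatal(err)
    }
}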
A sidecar is a container that runs in the same Pod as the application container; because it shares the same volumes and network as the main container, it can “help” or enhance how the application operates. In Kubernetes, a pod is a group of one or more containers with shared storage and network. A sidecar is a utility container in a pod that’s loosely coupled to the main application container.
Sidecar Pros: It scales indefinitely as you increase the number of pods, it can be injected automatically, and it is already used by service meshes.
Sidecar Cons: It is a bit harder to adopt, as developers can't just deploy their app but have to deploy a whole stack in each deployment. It consumes more resources, and it is harder to secure, because every Pod must run the sidecar (in the classic example, a log aggregator that pushes logs to a database or queue).
Refer to the documentation for more information.

How to connect to a k8s cluster with bitnami postgresql-ha deployed?

My setup (running locally in two minikubes) has two k8s clusters:
frontend cluster is running a golang api-server,
backend cluster is running an ha bitnami postgres cluster (used bitnami postgresql-ha chart for this)
If I set the pgpool service to use NodePort and get the IP + port of the node that the pgpool pod is running on, I can hardwire this host + port into my database connector in the api-server (in the other cluster), and this works.
However, what I haven't been able to figure out is how to connect to the other cluster (e.g. to pgpool) generically, without using the IP address.
I also tried Skupper, which has an example of connecting to a backend cluster with postgres running on it, but their example doesn't use the bitnami postgresql-ha Helm chart, just a simple postgres install, so it is not at all the same.
Any ideas?
For those times when you have to, or purposely want to, connect pods/deployments across multiple clusters, Nethopper (https://www.nethopper.io/) is a simple and secure solution. The postgresql-ha scenario above is covered under their free tier. There is a two-cluster minikube 'how to' tutorial at https://www.nethopper.io/connect2clusters which is very similar to your frontend/backend use case. Nethopper is based on skupper.io, but the configuration is much easier and more user-friendly, and it is centralized, so it scales to many clusters if you need it to.
To solve your specific use case, you would:
First install your api server in the frontend and your bitnami postgresql-ha chart in the backend, as you normally would.
Go to https://mynethopper.com/ and
Register
Clouds -> define both clusters (clouds), frontend and backend
Application Network -> create an application network
Application Network -> attach both clusters to the network
Application Network -> install nethopper-agent in each cluster using the copy-and-paste instructions.
Objects -> import and expose pgpool (call the service 'pgpool') in your backend.
Objects -> distribute the service 'pgpool' to frontend, using a distribution rule.
Now, you should see the 'pgpool' service in the frontend cluster:
kubectl get service
When the API server pods in the frontend request service from pgpool, they will connect to pgpool in the backend, magically. It's like the 'pgpool' pod is now running in the frontend.
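In code, the frontend api-server then simply uses the service name as the host. For example, in Go (the user, password and database name are placeholders, and 5432 is assumed as the pgpool port):

sql.Open("postgres", "postgres://postgres:<password>@pgpool:5432/appdb?sslmode=disable")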
The nethopper part should only take 5-10 minutes, and you do NOT need IP addresses, TLS certs, K8s ingresses or load balancers, a VPN, or an Istio service mesh or sidecars.
After moving to a one-cluster architecture, it became easier to see how to connect to the bitnami postgresql-ha cluster; after trying a few different things, this finally worked:
-postgresql-ha-postgresql-headless:5432
(that's the host and port I'm using to call from my golang server)
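For reference, a minimal sketch of using that host and port from a Go server with the lib/pq driver; the credentials, database name and sslmode below are assumptions, and the exact service name depends on the Helm release name:

package main

import (
    "database/sql"
    "log"

    _ "github.com/lib/pq"
)

func main() {
    // Host is the headless service created by the bitnami postgresql-ha chart;
    // the exact name depends on the Helm release name. User, password, dbname
    // and sslmode are placeholders for however the chart was installed.
    db, err := sql.Open("postgres",
        "host=postgresql-ha-postgresql-headless port=5432 user=postgres password=<password> dbname=postgres sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
}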
Now I believe it should be fairly straightforward to also run the two-cluster case, using Skupper to bind to the headless service.

How to connect to PostgreSQL cluster on DigitalOcean from CircleCI?

I have a Kubernetes cluster set up on DigitalOcean and a separate Postgres database instance there. In the database cluster settings there is a list of allowed IP addresses that have access to that database cluster (which looks like a great idea).
I have a build-and-deploy process set up with CircleCI, and at the end of that process, after deploying a container to the K8s cluster, I need to run a database migration. The problem is that I don't know the CircleCI agent's IP address and cannot allow it in the DO settings. Does anybody know how we can access the DigitalOcean Postgres cluster from within CircleCI steps?
Unfortunately, when you use a distributed service like that which you don't manage, I would be very cautious about the restricted-IP approach. (Really, you have three services you don't manage: Postgres, Kubernetes, and CircleCI.) I feel as if DigitalOcean has provided a really excellent security option for internal networking, since it can track changes in droplet IPs, etc.
But when you are deploying from another service, especially if this is for production, and even if part of the solution you're deploying is hosted (partially) on DigitalOcean infrastructure, I'd be very concerned that CircleCI will change IPs dynamically. DO has no way of knowing when this happens: unlike Postgres and Kubernetes, they don't manage CircleCI, even if they do host part of your solution.
Essentially I have to advise you to either get an assurance of a static IP from your CircleCI vendor/provider, or disable the IP limitation on Postgres.

How to submit a Dask job to a remote Kubernetes cluster from local machine

I have a Kubernetes cluster set up using Kubernetes Engine on GCP. I have also installed Dask using the Helm package manager. My data are stored in a Google Storage bucket on GCP.
Running kubectl get services on my local machine yields the following output
I can open the dashboard and jupyter notebook using the external IP without any problems. However, I'd like to develop a workflow where I write code in my local machine and submit the script to the remote cluster and run it there.
How can I do this?
I tried following the instructions in Submitting Applications using dask-remote. I also tried exposing the scheduler using kubectl expose deployment with type LoadBalancer, though I do not know if I did this correctly. Suggestions are greatly appreciated.
Yes, if your client and workers share the same software environment then you should be able to connect a client to a remote scheduler using the publicly visible IP.
from dask.distributed import Client
# Address of the exposed scheduler; you may need to include the scheduler port
# as well, e.g. 'tcp://REDACTED_EXTERNAL_SCHEDULER_IP:8786' (8786 is the default).
client = Client('REDACTED_EXTERNAL_SCHEDULER_IP')

How to re-connect to Amazon kubernetes cluster after stopping & starting instances?

I created a cluster for trying out Kubernetes using cluster/kube-up.sh in Amazon EC2. I then stop it to save money when not using it. The next time I start the master & minion instances in Amazon, ~/.kube/config has old IPs for the cluster master, because EC2 assigns new public IPs to the instances.
Currently I haven't found a way to provide Elastic IPs to cluster/kube-up.sh so that consistent IPs would be kept in place between stopping & starting the instances. Also, the certificate in ~/.kube/config is for the old IP, so manually changing the IP doesn't work either:
Running: ./cluster/../cluster/aws/../../cluster/../_output/dockerized/bin/darwin/amd64/kubectl get pods --context=aws_kubernetes
Error: Get https://52.24.72.124/api/v1beta1/pods?namespace=default: x509: certificate is valid for 54.149.120.248, not 52.24.72.124
How can I make kubectl query the same Kubernetes master when it is running on a different IP after a restart?
If the only thing that has changed about your cluster is the IP address of the master, you can manually modify the master location by editing the file ~/.kube/config (look for the line that says "server" with an IP address).
This use case (pausing/resuming a cluster) isn't something that we commonly test for so you may encounter other issues once your cluster is back up and running. If you do, please file an issue on the GitHub repository.
I'm not sure which version of Kubernetes you were using, but in v1.0.6 you can pass the MASTER_RESERVED_IP environment variable to kube-up.sh to assign a given Elastic IP to the Kubernetes master node.
You can check all the available options for kube-up.sh in config-default.sh file for AWS in Kubernetes repository.