Kubernetes: Databases & DB Users - postgresql

We are planning to use Kube for Postgres deployments. Our applications will be microservices with separated schema (or logical database). For security sake, we'd like to have separate users for each schema/logical_db.
I suppose that the db/schema&user should be created by Kube, so the application itself does not need to have access to DB admin account.
In Stolon it seems there is just a possibility to create a single user and single database and this seems to be the case also for other HA Postgres charts.
Question: What is the preferred way in Microservices in Kube to create DB users?

When it comes to creating user, as you said, most charts and containers will have environment variables for creating a user at boot time. However, most of them do not consider the possibility of creating multiple users at boot time.
What other containers do is, as you said, have the root credentials in k8s secrets so they access the database and create the proper schemas and users. This does not necessarily need to be done in the application logic but, for example, using an init container that sets up the proper database for your application to run.
https://kubernetes.io/docs/concepts/workloads/pods/init-containers
This way you would have a pod with two containers: one for your application and an init container for setting up the DB.

Related

Expressing that a service requires another

I'm new to k8s, so this question might be kind of weird, please correct me as necessary.
I have an application which requires a redis database. I know that I should configure it to connect to <redis service name>.<namespace> and the cluster DNS will get me to the right place, if it exists.
It feels to me like I want to express the relationship between the application and the database. Like I want to say that the application shouldn't be deployable until the database is there and working, and maybe that it's in an error state if the DB goes away. Is that something you'd normally do, and if so - how? I can think of other instances: like with an SQL database you might need to create the tables your app wants to use at init time.
Is the alternative to try to connect early and exit 1, so that the cluster keeps on retrying? Feels like that would work but it's not very declarative.
Design for resiliency
Modern applications and Kubernetes are (or should be) designed for resiliency. The applications should be designed without single point of failure and be resilient to changes in e.g. network topology. Also see Twelve factor-app: IV. Backing services.
This means that your Redis typically should be a cluster of e.g. 3 instances. It also means that your app should retry connections if connections fails - this can also happens same time after running - since upgrades of a cluster (or rolling upgrade of an app) is done by terminating one instance at a time meanwhile a new instance at a time is launched. E.g. the instance (of a cluster) that your app currently is connected to might go away and your app need to reconnect, perhaps establish a connection to a different instance in the same cluster.
SQL Databases and schemas
I can think of other instances: like with an SQL database you might need to create the tables your app wants to use at init time.
Yes, this is a different case. On Kubernetes your app is typically deployed with at least 2 replicas, or more (for high-availability reasons). You need to consider that when managing schema changes for your app. Common tools to manage the schema are Flyway or Liquibase and they can be run as Jobs. E.g. first launch a Job to create your DB-tables and after that deploy your app. And after some weeks you might want to change some tables and launch a new Job for this schema migration.
As you've seen, YAML objects can not express such dependencies. As suggested by #fabian-lopez, your application container may include an initContainer that would wait for dependencies to be available, before starting their main container.
Now, if you want a state machine, capable to provision a database, initialize its schema, maybe import some records, and only then create your application: you're looking for an operator. Then, you may use the operator-sdk ( https://github.com/operator-framework/operator-sdk ), or pretty much anything integrating with some Kubernetes cluster API.
I think Init Containers is something you could leverage for this use case
This is up to your application code, not something Kubernetes helps nor hinders.

Multiple Database in a postgres cluster on Kubernetes?

Is it possible to have multiple databse in a cluster with Crunchydata (postgres)?
When I create a cluster with "pgo create cluster" command I can specify only one database.
-d, --database string If specified, sets the name of the initial database that is created for the user. Defaults to the value set in the PostgreSQL Operator configuration, or if that is not present, the name of the cluster
But I need multiple database per cluster, and I can't find any official way to create them.
Another question: How can I find the "superuser" username and password to login to PGOAmin Web?
Thanks a lot.
This could be useful, however it stated:
"It may make more sense to have each of your databases in its own
cluster if you want to have them spread out over your Kubernetes
topology."
https://github.com/CrunchyData/postgres-operator/issues/2655

Cloud Foundry for SaaS

I am implementing a service broker for my SaaS application on Cloud Foundry.
On create-service of my SaaS application, I create instance of another service (Say service-A) also ie. a new service instance of another service (service-A) is also created for every tenant which on-boards my application.
The details of the newly created service instance (service-A) is passed to my service-broker via environment variable.
To be able to process this newly injected environment variable, the service-broker need to be restaged/restarted.
This means a down-time for the service-broker for every new on-boarding customer.
I have following questions:
1) How these kind on use-cases are handled in Cloud Foundry?
2) Why Cloud Foundry chose to use environment variables to pass the info required to use a service? It seems limiting, as it requires application restart.
As a first guess, your service could be some kind of API provided to a customer. This API must store the data it is sent in some database (e.g. MongoDb or Mysql). So MongoDb or Mysql would be what you call Service-A.
Since you want the performance of the API endpoints for your customers to be independent of each other, you are provisioning dedicated databases for each of your customers, that is for each of the service instances of your service.
You are right in that you would need to restage your service broker if you were to get the credentials to these databases from the environment of your service broker. Or at least you would have to re-read the VCAP_SERVICES environment variable. Yet there is another solution:
Use the CC-API to create the services, and bind them to whatever app you like. Then use again the CC-API to query the bindings of this app. This will include the credentials. Here is the link to the API docs on this endpoint:
https://apidocs.cloudfoundry.org/247/apps/list_all_service_bindings_for_the_app.html
It sounds like you are not using services in the 'correct' manner. It's very hard to tell without more detail of your use case. For instance, why does your broker need to have this additional service attached?
To answer your questions:
1) Not like this. You're using service bindings to represent data, rather than using them as backing services. Many service brokers (I've written quite a few) need to dynamically provision things like Cassandra clusters, but they keep some state about which Cassandra clusters belong to which CF service in a data store of their own. The broker does not bind to each thing it is responsible for creating.
2) Because 12 Factor applications should treat backing services as attached, static resources. It is not normal to say add a new MySQL database to a running application.

In a containerized cluster, should mongodb servers be running on a worker or a core service?

I'm trying to implement an architecture that's similar to the coreos's production architecture (shown below)
Should I run the database as a central service or one or more of the workers?
I figured the database needs some kind of replication, which makes me think that putting it in the worker cluster makes more sense, but I'm just not sure.
This should be run as a worker. The central services are the basic things that come with CoreOS (mainly etcd). The workers host your applications, the database being one of them. You do have a persistence issue because your database will have state to remember between restarts. So, there is a bigger issue of how do you make that persistence? One was to do it is use a host file and give the database an affinity to that host and mount the host file. Another thing you might consider is running more than one database (if your db technology supports that) and replicate that database so you have two (or more) copies in different workers. (non-affinity). If your database creates transaction logs that can be applied to a backup, you can manage those transaction logs in a worker.
Another thing to consider is not using a container for your database. The database is a weird animal, its care and feeding is not like the rest of the applications. So it is reasonable (in my opinion) to have your database managed and maintained outside the scope of your cluster (but still reachable by the cluster).

How to replicate MySQL database to Cloud SQL Database

I have read that you can replicate a Cloud SQL database to MySQL. Instead, I want to replicate from a MySQL database (that the business uses to keep inventory) to Cloud SQL so it can have up-to-date inventory levels for use on a web site.
Is it possible to replicate MySQL to Cloud SQL. If so, how do I configure that?
This is something that is not yet possible in CloudSQL.
I'm using DBSync to do it, and working fine.
http://dbconvert.com/mysql.php
The Sync version do the service that you want.
It work well with App Engine and Cloud SQL. You must authorize external conections first.
This is a rather old question, but it might be worth noting that this seems now possible by Configuring External Masters.
The high level steps are:
Create a dump of the data from the master and upload the file to a storage bucket
Create a master instance in CloudSQL
Setup a replica of that instance, using the external master IP, username and password. Also provide the dump file location
Setup additional replicas if needed
VoilĂ !