Can many Postgres processes run with the same data directory?

I have an application running in multiple pods. You can imagine the app as a web application which connects to Postgres (so each container runs both the app and a Postgres process). I would like to mount the same volume into each pod at /var/lib/postgresql/data so that every app has the same state of the database, and they can all read and write at the same time.
This is just an idea of how I might go about it.
My question is: is there any concern I need to be aware of? Or is this the totally wrong way to go?
Or would it be better to separate Postgres from the app container into its own pod and let the app containers connect to that one pod?
If my questions reveal gaps in my knowledge, please provide links I should read. Thank you!

This will absolutely fail to work, and PostgreSQL will try to prevent you from starting several postmasters against the same data directory as best it can. If you still manage to do it, instant data corruption will ensue.
The correct way to do this is to have a single database server and have all your “pods” connect to that one. If you have many of these “pods”, you should probably use a connection pooler like pgbouncer to fight the problems caused by too many database connections.
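For illustration, here is a minimal sketch of that layout in Kubernetes terms, assuming one Postgres pod behind a Service that all app pods connect to by name. All the names here are placeholders, and the password handling is simplified; in practice you'd use a Secret and probably a StatefulSet:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postgres            # placeholder name
    spec:
      replicas: 1               # exactly one postmaster owns the data directory
      selector:
        matchLabels:
          app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
          - name: postgres
            image: postgres:15
            env:
            - name: POSTGRES_PASSWORD
              value: example    # use a Secret in real deployments
            ports:
            - containerPort: 5432
            volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data
          volumes:
          - name: pgdata
            persistentVolumeClaim:
              claimName: pgdata # one PVC (defined elsewhere), mounted by this single pod only
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: postgres            # app pods connect to postgres:5432
    spec:
      selector:
        app: postgres
      ports:
      - port: 5432

The app pods then use postgres:5432 (or a pgbouncer Service placed in front of it) as their connection target instead of a local socket.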

Expressing that a service requires another

I'm new to k8s, so this question might be kind of weird, please correct me as necessary.
I have an application which requires a redis database. I know that I should configure it to connect to <redis service name>.<namespace> and the cluster DNS will get me to the right place, if it exists.
It feels to me like I want to express the relationship between the application and the database. Like I want to say that the application shouldn't be deployable until the database is there and working, and maybe that it's in an error state if the DB goes away. Is that something you'd normally do, and if so - how? I can think of other instances: like with an SQL database you might need to create the tables your app wants to use at init time.
Is the alternative to try to connect early and exit 1, so that the cluster keeps on retrying? Feels like that would work but it's not very declarative.
Design for resiliency
Modern applications and Kubernetes are (or should be) designed for resiliency. Applications should be designed without a single point of failure and should be resilient to changes in e.g. network topology. Also see the Twelve-Factor App: IV. Backing services.
This means that your Redis typically should be a cluster of e.g. 3 instances. It also means that your app should retry failed connections - not only at startup but at any time while running - since upgrades of a cluster (or a rolling upgrade of an app) are done by terminating one instance at a time while a new instance is launched in its place. E.g. the instance (of a cluster) that your app is currently connected to might go away, and your app needs to reconnect, perhaps establishing a connection to a different instance in the same cluster.
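One Kubernetes-native way to surface "the backing service is gone" is a readiness probe against a health endpoint in your app that checks its Redis connection. This is only a sketch; /healthz, port 8080, and the image name are assumptions, not anything your app necessarily exposes:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app              # placeholder
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: app
            image: example/my-app:1.0   # placeholder image
            readinessProbe:
              httpGet:
                path: /healthz  # hypothetical endpoint that pings Redis
                port: 8080
              periodSeconds: 10
              failureThreshold: 3

While the probe fails, the pod is removed from the Service's endpoints rather than killed, which roughly matches the "error state if the DB goes away" behaviour you describe.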
SQL Databases and schemas
I can think of other instances: like with an SQL database you might need to create the tables your app wants to use at init time.
Yes, this is a different case. On Kubernetes your app is typically deployed with at least 2 replicas, or more (for high-availability reasons). You need to consider that when managing schema changes for your app. Common tools to manage the schema are Flyway and Liquibase, and they can be run as Kubernetes Jobs. E.g. first launch a Job to create your DB tables, and after that deploy your app. And after some weeks you might want to change some tables, and you launch a new Job for that schema migration.
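As a sketch of the Job approach (the image tag, credentials, connection URL, and ConfigMap name are all placeholders; the official Flyway image reads SQL migrations from /flyway/sql by default):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: schema-migration-v2     # a new Job name per migration run
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: flyway
            image: flyway/flyway:9  # official image; the entrypoint is the flyway CLI
            args:
            - migrate
            - -url=jdbc:postgresql://postgres:5432/mydb   # placeholder service/db
            - -user=myuser
            - -password=mypassword  # use a Secret in real deployments
            volumeMounts:
            - name: migrations
              mountPath: /flyway/sql          # Flyway's default SQL location
          volumes:
          - name: migrations
            configMap:
              name: db-migrations   # hypothetical ConfigMap holding the .sql files

Deploy (or roll) the app only after the Job has completed.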
As you've seen, YAML objects cannot express such dependencies. As suggested by @fabian-lopez, your application pod may include an initContainer that waits for its dependencies to be available before the main container starts.
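A minimal sketch of that pattern, assuming the database is reachable as a Service named postgres on port 5432 (the busybox probe loop is just one common way to wait; the app name and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-app              # placeholder
    spec:
      initContainers:
      - name: wait-for-db
        image: busybox:1.36
        # Block pod startup until the DB port accepts TCP connections.
        command: ['sh', '-c', 'until nc -z postgres 5432; do echo waiting for db; sleep 2; done']
      containers:
      - name: app
        image: example/my-app:1.0   # placeholder image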
Now, if you want a state machine capable of provisioning a database, initializing its schema, maybe importing some records, and only then creating your application: you're looking for an operator. Then you may use the operator-sdk ( https://github.com/operator-framework/operator-sdk ), or pretty much anything that integrates with the Kubernetes cluster API.
I think Init Containers are something you could leverage for this use case.
This is up to your application code; it's not something Kubernetes either helps or hinders.

Can two Postgres services share one common PGDATA folder, one at a time?

Can I share data between two Postgres services on separate machines (the PGDATA folder would be in a shared location) if only one service runs at a time?
PostgreSQL has a number of ways to make sure that you cannot start two postmaster processes on the same data directory, but if you mount a filesystem on two machines, these mechanisms will fail. So you would have to make very sure that you don't start servers on both machines; that would lead to data corruption. Moreover, you'd have to make sure that the remote file system is reliable. A Windows network share isn't, for example.
So, all in all, my only recommendation is "don't do that". For high availability, use a proven shared-nothing architecture like Patroni.
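For a flavour of what that looks like, here is a rough per-node Patroni configuration sketch, assuming etcd as the DCS. Every address, name, and credential below is a placeholder, and notably each node has its own local data directory rather than a shared one:

    scope: demo-cluster             # name shared by all members of the HA cluster
    name: node1                     # unique per member
    restapi:
      listen: 0.0.0.0:8008
      connect_address: 10.0.0.1:8008
    etcd3:
      host: 10.0.0.5:2379           # the DCS Patroni uses for leader election
    postgresql:
      listen: 0.0.0.0:5432
      connect_address: 10.0.0.1:5432
      data_dir: /var/lib/postgresql/data   # local to this node, never shared
      authentication:
        superuser:
          username: postgres
          password: secret          # placeholder credentials
        replication:
          username: replicator
          password: secret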

Dockerized MongoDB on Heroku?

I'm not sure if this is the right StackExchange to be asking this, but I'm in the process of setting up a MEAN stack application and I want to do it right from the get-go.
I really would like to use Docker and Heroku (due to their new pipelining groups and ease of deployment as the sole developer), but I can't find any guides on how to run MongoDB as a docker image on Heroku.
Is this even possible? I also don't really understand how you can put a database into a binary image (Docker image) anyway, yet every guide I've read says to separate the micro-services.
Has anyone else done this?
Thanks.
EDIT: Or is it just a better idea to leave Mongo undockerized and use MongoLabs and have two separate instances for Dev/Prod databases?
There is an official MongoDB Docker image which you can use; you just need to make sure you have Docker available on Heroku.
If you are concerned about data persistence, you can easily mount host directories into your container so you have physical access to your data. If you are worried about accessibility, you can easily expose ports from your container to the host so everything can connect to it.
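For example, a minimal docker-compose sketch showing both ideas, a host directory mounted at MongoDB's data path and the port published to the host (the host path and image version are placeholders):

    services:
      mongo:
        image: mongo:6            # official MongoDB image
        ports:
        - "27017:27017"           # expose the DB to the host
        volumes:
        - ./data/db:/data/db      # host directory holding the data files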
Having your database in a container means you only have to worry about the DB configuration and not the whole stack, so when something goes down you always know where to look.

In a containerized cluster, should mongodb servers be running on a worker or a core service?

I'm trying to implement an architecture that's similar to CoreOS's production architecture (shown below).
Should I run the database as a central service or on one or more of the workers?
I figured the database needs some kind of replication, which makes me think that putting it in the worker cluster makes more sense, but I'm just not sure.
This should be run as a worker. The central services are the basic things that come with CoreOS (mainly etcd). The workers host your applications, the database being one of them. You do have a persistence issue, because your database has state to remember between restarts. So there is a bigger question: how do you provide that persistence? One way to do it is to use a host directory, give the database an affinity to that host, and mount the host directory into the container (sketched below). Another thing you might consider is running more than one database instance (if your DB technology supports that) and replicating it, so you have two (or more) copies on different workers (anti-affinity). If your database creates transaction logs that can be applied to a backup, you can manage those transaction logs in a worker.
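As a sketch of the host-affinity idea, expressed here in Kubernetes terms rather than fleet unit files (the hostname, image, and paths are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: db
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-1   # pin the DB to one known worker
      containers:
      - name: db
        image: mongo:6                     # placeholder database image
        volumeMounts:
        - name: data
          mountPath: /data/db
      volumes:
      - name: data
        hostPath:
          path: /srv/db-data               # host directory that survives restarts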
Another thing to consider is not using a container for your database. The database is a weird animal, its care and feeding is not like the rest of the applications. So it is reasonable (in my opinion) to have your database managed and maintained outside the scope of your cluster (but still reachable by the cluster).

AWS app with Heroku Postgres database

Is it possible to have an app running on AWS EC2 and have its database running on Heroku Postgres?
In case it is, what are the downsides I should consider?
Since Heroku is hosted on AWS, is there a way to know the location of the machine running my database?
Would hosting my app in the same region as the database help performance?
I would like to hear some opinions about this; I've been searching the topic without much success.
You can determine the public-facing location of your Heroku DB at any given time with a traceroute ... but there's no guarantee that it'll stay at that location, or that there isn't any internal re-routing going on. You'd probably want to speak directly with Heroku support about ways to make sure your Heroku DB instances are local to your AWS application instances, as that certainly would benefit performance. See if you can find out which availability zone, or at least which major region, they run the DB in, and whether you can "pin" your database instance to a given region/zone.
Amazon's RDS looks OK, but doesn't support PostgreSQL. Please keep nagging them to.
I'd probably just run the DB on AWS if performance wasn't particularly important. Use a RAID 10 of provisioned-IOPS EBS volumes on an EBS-optimized instance and you'll get kind-of-OK performance (but at a really big price); alternatively, you can use non-crash-safe SSD-based instance-store servers and rely on replication and backups to keep your data safe.
I don't have any experience with Heroku PostgreSQL.
Generally, of course, you can run your own service on Amazon EC2 and use Heroku's managed database services.
Downsides might be:
nobody guarantees that Heroku exclusively uses AWS, and you probably can't determine the physical location of the Heroku service within the cloud, so you will have to deal with network latencies
in addition to your external traffic fees, you'll have to pay for the database traffic unless you talk to a server in the same availability zone of the same region
My suggestion (without knowing any details about the pros of Heroku):
Have a look at Amazon RDS if you don't want to run a database server on your own.
http://aws.amazon.com/de/rds/
I have been operating around 70 server instances on AWS, both RDS and EC2, for more than a year now, and I can't imagine any simpler way to keep your stuff running.