I have a big (300Gb) Postgres DB running on GKE cluster (Stateful Set, SSD Volume). I need to move this DB to another GKE cluster.
What is the easiest way to accomplish it?
I tried to do it with piping pg_dump/pg_restore, but it takes forever and for some reason, not all constraints/triggers were recreated.
Is there any proper way to gracefully "shutdown" Postgres server running in Kubernetes and copy the /pgdata folder directly (from one volume to another)?
Other ideas?
tnx
I got few ideas (listed from the most probable to the least) about how you could approach this:
Remember to use proper format when using pg_dump. Default plain format may not work properly with pg_restore. Try to specify different format with pg_dump or use psql -f xxx.tar instead of pg_restore. Remember that it might take a while.
You can use a tool to assist you with that. For example pghoard.
You can make a tared backup of you DB and try to copy as a object via Google Cloud Storage.
You can try to create PVCs manually, attach pods to those PVCs and than copy your dataset onto those pods.
Finally, you may try to create an Init container and use it later for your new cluster.
I suggest starting from point 1 as I think it is the most possible solution. If that would not be enough, try later points from the list.
Please let me know if that helped.
Related
I have a web-app that depends on a read-only MongoDB database. Through trial and error, I discovered that by far the fastest way to run the ETL pipeline that populates the database is to run a local copy of MongoDB, populate the database, stop the database, and tarball the state directory.
To deploy a high-availability "cluster," I create multiple instances (or containers) running the app, each with access to a copy of the state in locally mounted storage. Putting these behind a load balancer with regular health checks and autoscaling (or in a Kubernetes cluster as a ReplicaSet), I get isolation, redundancy, easy rollbacks (using versioned storage), and easy setup in virtually any environment.
The key idea here is that because the database is read-only, it is in a sense a "stateless" application. Thus, I can treat it like any other static provider of information
There are many apparent advantages to this setup. Nevertheless, I have always had a nagging feeling that I was missing something. Given a read-only context, is there still some reason why it might be better to run a "proper" MongoDB cluster?
If you don't mind outages when the single node goes down and you don't mind taking the system down during upgrades then this is probably an ok deployment. You might get a safer dump and restore using mongodump and mongorestore rather than tar but apart from that this setup should work for a read-only deployment.
I have an application (JHipster Gateway, UAA, Registry, 5 microservices) and each application source builds a Docker image and pushes to GitLab registry. Currently I'm running everything on Rancher using a Docker-Compose file. My volumes for Mongo databases are currently in each container.
I need advice about volume mounts. Here are my options as I see them.
Leave data in containers and monitor and backup
Use external mounts and monitor volumes on host.
If I leave Mongo data in the containers, do I just set up to just cluster and when the internal volumes fill, the database just scales? I am looking for some explanation to help my choice with Mongo database mounts, internal or external (on host)?
Thanks in advance,
David L. Whitehurst
Never store any data you care about directly in containers. There are good arguments in favor of both named volumes (native to Docker, some support in a multi-host Swarm environment, fewer host-specific dependencies) and host bind mounts (much easier to back up and maintain, possible to examine directly if needed) but use some sort of mounted storage.
The most important note here is that it's fairly routine to delete and recreate containers. If the software you're running or its underlying library stack has a security issue, you generally need to get (or build) an updated image, delete your existing container, and rebuild it against the new image. If data is stored only inside a container, then during this very routine delete-and-recreate operation, there's significant risk of losing data.
In principle, if you're really careful, and you have a replicated data store, you can roll this over without external volumes and not lose data. It's tricky, and takes a lot of patience; you'll be forced to take down one replica, wait for its data to be rebalanced across the other replicas, start up a new replica, wait for it to accept some of the data, and so on. If you can take a point release by stopping a container, deleting it, starting a new one with the same data store, and have it come up instantly with populated data, that's much easier to manage.
(The other corollary here is that you don't "back up containers", since they don't have any data you care about. You do back up the data stored on the host or in Docker named volumes, and you can always recreate the container from its image plus the external data.)
We are experimenting with Kubernetes and Confluence in the cloud and have deployed Confluence connected to a pgsql database. When applying an update, something happened that caused the pgsql pod to tank and lose the persistent volume connections.
Thankfully the volume was set to retain, so we have the volume and I have since been able to point a new pgsql instance to this volume, but I can't find a way to get Confluence to see this existing database. Confluence just proceeds to the initial fresh install screens. I've tried installing it on a temporary database and then modifying the confluence.cfg.xml file to point to the old data once completed but Confluence will not restart when I try this.
Any help is appreciated.
Using the web installer you should have a step to select "My own database". From there you can configure the database credentials and host. Go ahead and let the installer run, it will overwrite the default settings but will retain your existing data.
Also, you may want to get on the psql shell via console and check to make sure that your data actually exists and you haven't ended up with an empty database.
If you're still stuck, reach out here and we can check out the next steps.
In my case the original solution posted here is accurate:
However I had to do this in a non containerized environment. I installed Confluence on a VM using a blank database, then modified the confluence.cfg.xml file to point to the pgsql database in the kubernetes cluster and restarted confluence. I was able to see my data, so I then used confluence's XML export feature to grab the dataset. I then blew away the kubernetes environment and re-created it from scratch and imported the backed up XML into the new instance. Not a super clean way of doing it, but got where I needed to.
I'm not sure if this is the right StackExchange to be asking this, but I'm in the process of setting up a MEAN stack application and I want to do it right from the get-go.
I really would like to use Docker and Heroku (due to their new pipelining groups and ease of deployment as the sole developer), but I can't find any guides on how to run MongoDB as a docker image on Heroku.
Is this even possible? I also don't really understand how you can put a database into a binary image (Docker image) anyways, yet every guide I've read says to separate the micro-services.
Has anyone else done this?
Thanks.
EDIT: Or is it just a better idea to leave Mongo undockerized and use MongoLabs and have two separate instances for Dev/Prod databases?
There is an official mongodb docker image which you can use. you just need to make sure you have docker installed on heroku.
If you are concerned about the data persistance you can easily mount host directories into your container so you will have physical access to your data. if you are worried about accebility you can easily expose ports inside your comtainer to your host so everything can connect to it.
Having your database in a container makes you able to be worried only on the db configuration and not the ehole stack . so when something goes down you always know where to look.
I know that MongoDB can scale vertically. What about if I am running out of disk?
I am currently using EC2 with EBS. As you know, I have to assign EBS for a fixed size.
What if the MongoDB growth bigger than the EBS size? Do I have to create a larger EBS and Copy & Paste the files?
Or shall we start more MongoDB instance and each connect to different EBS disk? In such case, I could connect to a different instance for different databases.
If you're running out of disk, you obviously need to get a bigger disk.
There are several ways to migrate your data, it really depends on the type of up-time you need. First steps of course involve bundling the machine and creating the new volume.
These tips go from easiest to hardest.
Can you take the database completely off-line for several minutes?
If so, do this (migration by copy):
Mount new EBS on the server.
Stop your app from connecting to Mongo.
Shut down mongod and wait for everything to write (check the logs)
Copy all of the data files (and probably the logs) to the new EBS volume.
While the copy is happening, update your mongod start script (or config file) to point to the new volume.
Start mongod and check connection
Restart your app.
Can you take the database off-line for just a few minutes?
If so, do this (slaving and switch):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a --slave pointing at the current database. (you may need to re-start the current as --master)
The slave will do a fresh synchronization. Once the slave is up-to-date, you'll do a "switch" (next steps).
Turn off writes from the system.
Shut down the original mongod process.
Re-start the "new" mongod as a master instead of the slave.
Re-activate system writes pointing at the new master.
Done correctly those last three steps can happen in minutes or even seconds.
Can you not afford any down-time?
If so, do this (master-master):
Start up a new instance and mount the new EBS on that server.
Install / start mongod as a master and a slave against the current database. (may need to re-start current as master, minimal down-time?)
The new computer should do a fresh synchronization.
Once the new computer is up-to-date, switch the system to point at the new server.
I know it seems like this last version is actually the best, but it can be a little dicey (as of this writing). The reason is simply that I've honestly had a lot of issues with "Master-Master" replication, especially if you don't start with both active.
If you plan on using this method, I highly suggest a smaller practice run first. If something bombs here, Mongo might simply wipe all of your data files which will have the effect of taking more stuff down.
If you get a good version of this please post the commands, I'd like to see it in action.
Doesn't the E in EBS stand for elastic meaning something like resizing on the fly?
Currently the MongoDB team is working on finishining sharding which will allow you horizontal scaling by partitioning data separately on different servers. Give it a month or two and it will work fine. The developers are quite good at keeping their promises.
http://api.mongodb.org/wiki/current/Sharding%20Introduction.html
http://api.mongodb.org/wiki/current/Sharding%20Limits.html
You could slave the bigger disk off the smaller until it's caught up
or
fsync+lock and take a file system snapshot and copy it onto the bigger disk.
well, I am using Mongo DB now. I am pretty amazed the performance it generated, especially on some simple sorting.
I believe it's a good tool for simple web application logic. The remaining concern for is how to scale and backup. I will continue to explore.
The only disadvantage I have is that I didn't have any good tools to reveal the data stored inside. For example, I want to put my logging from MYSQL into Mongo as well. However, it's pretty difficult for me to view the log. Previously, i can use MYSQL query to fetch what I want easily.
Anyway, it's a good tool and I will continue to use it.