Is it a good idea to backup/restore Neo4j databases with Kubernetes VolumeSnapshot?

I have a Neo4j database running on Kubernetes. I want to make scheduled backups for the database. I know that Neo4j provides a set of tools for backup and restore. However, Kubernetes VolumeSnapshot also looks viable for backup and restore.
I wonder if it's a good idea to use Kubernetes VolumeSnapshot to back up/restore Neo4j databases. Will it cause problems like an inconsistent database state or disk corruption? Thanks.

Generally, if it is not supported by the database, then it is a bad idea.
Think of your database as being stored across:
Database files on disk
Page cache (in volatile memory)
Write ahead transaction logs on disk
A volume snapshot would not save enough information to get a consistent state of your database (unless the database is gracefully shut down).
Use the set of tools Neo4j provides for backup/restore.
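For example, a scheduled online backup can be a CronJob that wraps neo4j-admin backup. A minimal sketch, assuming Neo4j 4.x Enterprise with the backup listener enabled (dbms.backup.enabled=true, default port 6362), a Service named neo4j, and a PVC named neo4j-backups; all of these names are illustrative:

kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: neo4j-backup
spec:
  schedule: "0 3 * * *"              # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: backup
            image: neo4j:4.4-enterprise
            # command: overrides the image entrypoint, so this container
            # only runs the backup client against the running server.
            command: ["neo4j-admin", "backup",
                      "--from=neo4j:6362",
                      "--database=neo4j",
                      "--backup-dir=/backups"]
            volumeMounts:
            - name: backups
              mountPath: /backups
          volumes:
          - name: backups
            persistentVolumeClaim:
              claimName: neo4j-backups
EOF

Because neo4j-admin backup talks to the running server, it produces a consistent copy regardless of what is sitting in the page cache, which is exactly what a raw volume snapshot cannot guarantee.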

Related

Is there a way to restore a neo4j 3.5 database without a backup from a persistent volume?

I had a neo4j 3.5 enterprise edition running in a kubernetes cluster.
The cluster was deleted by mistake, without any chance to make a recent neo4j database backup.
The only things remaining from the old database are three Persistent Disks in Google Cloud Compute Engine.
Is it possible to recover or restore the data stored in them? How?
The disk detail:
{"kubernetes.io/created-for/pv/name":"pvc-fd4fe6eb-2c24-11ea-bd38-42010a8e0228",
"kubernetes.io/created-for/pvc/name":"datadir-neo4j-neo4j-core-2",
"kubernetes.io/created-for/pvc/namespace":"default"}
The Secret storing the old password is lost.
Thanks
You probably need to first identify which disk holds the <neo4j-home>/data directory. Then create a snapshot of this disk (to be safe). Finally, start a new neo4j pod by creating a volume from the snapshot and mounting it at <neo4j-home>/data.
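A rough sketch of that procedure with gcloud and kubectl; the disk and zone names are placeholders, and it assumes the official neo4j image, which keeps <neo4j-home>/data at /data:

# 1. Snapshot the disk behind datadir-neo4j-neo4j-core-2 first, to be safe.
gcloud compute disks snapshot <old-disk-name> \
    --snapshot-names=neo4j-rescue --zone=<zone>

# 2. Create a fresh disk from the snapshot so the original stays untouched.
gcloud compute disks create neo4j-rescue-disk \
    --source-snapshot=neo4j-rescue --zone=<zone>

# 3. Start a throwaway pod with that disk mounted at the data directory.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: neo4j-rescue
spec:
  containers:
  - name: neo4j
    image: neo4j:3.5-enterprise
    env:
    - name: NEO4J_ACCEPT_LICENSE_AGREEMENT
      value: "yes"
    # The Secret with the old password is lost, so disable auth while rescuing.
    - name: NEO4J_dbms_security_auth__enabled
      value: "false"
    volumeMounts:
    - name: data
      mountPath: /data             # <neo4j-home>/data in the official image
  volumes:
  - name: data
    gcePersistentDisk:
      pdName: neo4j-rescue-disk
      fsType: ext4
EOF

Once the pod is up, you can dump the data with neo4j-admin or cypher-shell and move it somewhere safer.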

How to back up a Postgres database inside a K8s cluster

I have set up a postgres database inside the Kubernetes cluster and now I would like to back it up, but I don't know how.
Can anyone help me get it done?
Thanks
Sure, you can back up your database. You can set up a CronJob to periodically run pg_dump and upload the dumped data to a cloud bucket. Check this blog post for more details.
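A sketch of such a CronJob; the image, Secret, host, and bucket names are all made up, and the image is assumed to contain both pg_dump and gsutil (in practice you would build one that does):

kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"              # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: backup
            image: example/pg-backup:latest   # hypothetical: has pg_dump + gsutil
            env:
            - name: PGPASSWORD               # read by pg_dump
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            command:
            - /bin/sh
            - -c
            - |
              pg_dump -h postgres -U postgres mydb | gzip > /tmp/dump.sql.gz
              gsutil cp /tmp/dump.sql.gz gs://my-db-backups/$(date +%F).sql.gz
EOF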
However, I recommend using a Kubernetes-native disaster recovery tool such as Velero, Stash, or Portworx PX-Backup.
If you use an operator to manage your database, such as zalando/postgres-operator, CrunchyData, or KubeDB, you can use its native database backup functionality.
Disclosure: I am one of the developers of the Stash tool.

Do I gain anything by using "proper" replicas for a read-only MongoDB database?

I have a web-app that depends on a read-only MongoDB database. Through trial and error, I discovered that by far the fastest way to run the ETL pipeline that populates the database is to run a local copy of MongoDB, populate the database, stop the database, and tarball the state directory.
To deploy a high-availability "cluster," I create multiple instances (or containers) running the app, each with access to a copy of the state in locally mounted storage. Putting these behind a load balancer with regular health checks and autoscaling (or in a Kubernetes cluster as a ReplicaSet), I get isolation, redundancy, easy rollbacks (using versioned storage), and easy setup in virtually any environment.
The key idea here is that because the database is read-only, it is in a sense a "stateless" application. Thus, I can treat it like any other static provider of information.
There are many apparent advantages to this setup. Nevertheless, I have always had a nagging feeling that I was missing something. Given a read-only context, is there still some reason why it might be better to run a "proper" MongoDB cluster?
If you don't mind outages when the single node goes down, and you don't mind taking the system down during upgrades, then this is probably an OK deployment. You might get a safer dump and restore using mongodump and mongorestore rather than tar, but apart from that this setup should work for a read-only deployment.
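For comparison, the dump-and-restore round trip is just two commands; the paths are illustrative, and both connect to localhost:27017 by default:

# Dump while mongod is running; unlike tar'ing the data directory,
# mongodump reads through the server, so the output is consistent.
mongodump --db=mydb --out=/backups/mydb-dump

# Later, seed a fresh instance from the dump.
mongorestore /backups/mydb-dump

The tradeoff is speed: restoring a dump rebuilds indexes, while untarring a data directory does not, which is probably why the tarball approach was fastest in your ETL experiments.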

Backup of ignite stateful set in Kubernetes

I’m trying to come up with a strategy to back up the data in my Apache Ignite cache, which is hosted as a stateful set in Google Cloud Kubernetes.
My Ignite deployment uses Ignite native persistence and runs a 3-node Ignite cluster backed by persistent volumes in Kubernetes.
I’m using a binaryConfiguration to store binary objects in cache.
I’m looking for a reliable way to back up my ignite data and be able to restore it.
So far I’ve tried backing up just the persistence files and then restoring them.
It hasn’t worked reliably yet.
The issue I’m facing is that after restore, the cache data which isn’t binary objects is restored properly, e.g. strings or numbers. I’m able to access numeric or string data just fine. But binary objects are not accessible. It seems the binary objects are restored, but I’m unable to fetch them.
The weird part is that after the restore, once I add a new binary object to the cache, all the restored data becomes accessible normally.
Can anyone please suggest a reliable way to back up and restore ignite native persistence data?
You should either back up the ${ignite.work.dir}/marshaller directory, or call ignite.binary().type(KeyOrValue.class) for every type you have in the cache to prime the binary marshaller.
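In shell terms, the first option just means archiving the marshaller (and binary metadata) directories together with the persistence files; the work directory path below is an assumption:

# Hypothetical Ignite work directory; adjust to your IGNITE_WORK_DIR.
WORK=/opt/ignite/work

# Back up the persistence files *and* the binary type metadata together.
# Restoring db/ without marshaller/ and binary_meta/ is what leaves binary
# objects unreadable until a new write re-registers their types.
tar czf ignite-backup.tar.gz \
    -C "$WORK" db marshaller binary_meta    # db/ includes the WAL by default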
Apache Ignite provides ACID transactions, which are pretty reliable. The cache also uses its own mechanism for primary and backup copies, and, assuming you have the WAL enabled, some data is kept in memory.
The most likely explanation is that after your restore, the moment you make an initial write, memory starts populating, allowing you to see what's on disk (the cache). This is not really a supported restore mechanism (there isn't one in the docs), but it could work that way: after the restore, you run a small, irrelevant sample write. I advise testing this thoroughly, though.

EBS snapshots vs. WAL-E for PostgreSQL on EC2

I'm getting ready to move our postgresql databases to EC2, but I'm a little unclear on the best backup and recovery strategy. The original plan was to build an EBS-backed server and set up WAL-E to handle WAL archiving and base backups to S3. I would take snapshots of the final production server volume to be used if the instance crashed. I also see that many people perform frequent snapshots of their EBS volumes for recovery purposes.
What is the recommended strategy? Is there a reason to archive with WAL and perform scheduled EBS snapshots?
EBS snapshots will give you a slightly different type of backup than the WAL-E backups. EBS snapshots back up the entire drive, which means that if your EC2 instance goes down, you can just restart it from your last snapshot and things will pick up right where you last snapshotted.
The frequency of your EBS snapshots determines how fresh your database backups are.
The appealing thing about WAL-E is the "continuous archiving". If I needed every DB transaction backed up, then WAL-E seems the right choice. Many apps I can envision cannot afford to lose transactions, so that seems a very prudent choice.
I think your plan to snapshot the production volumes as a baseline, then use WAL-E to continuously archive the database seems very reasonable. Personally I would likely add a periodic snapshot (once a day?) to that plan just to take a hard baseline and make your recovery process a bit easier.
The usual caveat of "Test your recovery plans!" applies here. You're mixing a number of technologies (EC2, EBS, Postgres, Snapshots, S3, WAL-E), so making sure you can actually recover - rather than just back up - is of critical importance.
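A sketch of that combined plan as two daily cron entries, with a placeholder volume ID and data directory, and assuming WAL-E's credentials live in /etc/wal-e.d/env as in its documentation:

# Continuous archiving is configured once in postgresql.conf:
#   archive_mode = on
#   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# Daily base backup via WAL-E (run as the postgres user).
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.6/main

# Daily hard baseline: snapshot the EBS volume backing the data directory.
aws ec2 create-snapshot \
    --volume-id vol-0123456789abcdef0 \
    --description "postgres baseline $(date +%F)"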
EBS snapshots will save the image of an entire disk, so you can back up all the disks in the server and recover it as a whole in case of data loss or disaster. Besides that, the block-level nature of EBS snapshots allows near-instant recovery: you can have a 1 TB database restored and up and running in a few minutes, whereas recovering a 1 TB database from scratch with a file-based solution (like WAL-E) requires copying the data from S3 first, a process that can take hours. Using WAL files for recovery is a good approach, since you can go back to any point in time, transaction by transaction, but snapshotting the entire server will include the WAL files as well, so you'll still have that option. The backup and rapid recovery process using EBS snapshots can be automated with scripts or EC2 backup solutions (for example, Backup solutions for AWS EC2 instances).