postgres db backup is small after crontab - postgresql

I'm trying to do a simple postgres backup with crontab. Here is the command I use:
# m h dom mon dow user command
49 13 * * * postgres /usr/bin/pg_dump store | bzip2 > /home/backups/postgres/$(date +"\%Y-\%m-\%d")_store.sq.bz2
A backup file is created but it is very small (looks like 14 bytes).
I can run this command just fine in the terminal (with a filesize that matches my db).
The log files don't mention any errors (grep CRON /var/log/syslog). Any idea what might be off?

The key to solving this is to realize that running the 'same' command in bash and running it via cron isn't the same thing!
For example, when run via cron the usual defaults (.bash_profile / .pgpass / default binary paths) don't apply, so what works in bash may not work in cron.
As a checklist:
Ensure bzip2 is replaced with its complete path (e.g. /usr/bin/bzip2 on CentOS / RHEL).
Ensure that the 'store' database is readable by the cron command (e.g. adding -U postgres would be a good addition). If the DB login depends on a .pgpass file, it wouldn't work with cron; in such scenarios you'd need to ensure pg_hba.conf is configured for this purpose (e.g. you could allow 'trust' authentication for a specific, known DB / machine / user combination).
Ensure that /home/backups/postgres/... is writable by the user the cron job runs as, for obvious reasons.
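Applying that checklist to the original entry, a corrected /etc/crontab line could look roughly like this (the binary paths are assumptions for a typical Debian/Ubuntu layout; verify them with which pg_dump and which bzip2, and make sure the postgres OS user can authenticate to the store database and write to the target directory):
# m h dom mon dow user command
49 13 * * * postgres /usr/bin/pg_dump -U postgres store | /usr/bin/bzip2 > /home/backups/postgres/$(date +"\%Y-\%m-\%d")_store.sq.bz2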

Related

Issues when upgrading and dockerising a Postgres v9.2 legacy database using pg_dumpall and pg_dump

I am using an official postgres v12 docker image that I want to initialise with two SQL dump files that are gathered from a remote legacy v9.2 postgres server during the docker build phase:
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" >> dump/a_globals.sql
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dump -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --create $REMOTE_DB_NAME" >> dump/b_db.sql
By placing both a_globals.sql and b_db.sql into the docker image folder docker-entrypoint-initdb.d, the database is initialised with the legacy SQL files when the v12 container starts (as described here). Docker is working correctly and the dump files are retrieved successfully. However, I am running into problems initialising the container's database and require guidance:
When the container starts to initialise its DB, it stops with ERROR: role $someDBRole does not exist. This is because the v9.2 dump SQL files DROP roles before reinstating them, and the container DB does not like this. Unfortunately it is not until v9.4 that pg_dumpall and pg_dump gained the --if-exists option (see the pg_dumpall v9.2 documentation). What would you suggest I do to remedy this? I could manually edit the SQL dump files, but this would be impractical as the snapshots of the legacy DB need to be automated. Is there a way to suppress this error during container startup?
If I want to convert from ASCII to UTF-8, is it adequate to simply set the encoding option for pg_dumpall and pg_dump? Or do I need to take into consideration other issues when upgrading?
Is there a way to suppress the removal and adding of the postgres superuser that is in the dump SQL?
In general, are there any other gotchas when containerising and/or updating a Postgres DB?
I'm not familiar with Docker so I don't know how straightforward it'll be to do these things, but in general, pg_dump/pg_dumpall output, when it's in SQL format, will work just fine after having gone through some ugly string manipulation.
Pipe it through sed -e 's/DROP ROLE/DROP ROLE IF EXISTS/', ideally when writing the .sqls, but it's fine to just run sed -i -e <...> to munge the files in-place after they're created if you don't have a full shell available. Make it sed -r -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' if you're worried about strings containing DROP ROLE in your data, at the cost of portability (AFAIK -r is a GNU addition to sed).
Yes. It's worth checking the data in pg12 to make sure it got imported correctly, but in the general case, pg_dump has been aware of encoding considerations since time immemorial, and a dump->load is absolutely the best way to change your DB encoding.
Sure. Find the lines that do it in your .sql, copy enough of it to be unique, and pipe it through grep -v <what you copied> :D
I can't speak to the containerizing aspect of things, but - and this is more of a general practice, not even really PG-specific - if you're dealing with a large DB that's getting migrated, prepare a small one, as similar as possible to the real one but omitting any bulky data, to test with. Get everything working against that, so the real migration is just a matter of changing some vars (I guess $REMOTE_HOST and $REMOTE_PORT in your case). If it's not large, then just be comfortable blowing away any pg12 containers that failed partway through the import, figure out and apply whatever fixes the failure, and start from the top again until it works end-to-end.
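Putting the sed and grep suggestions together, the filtering can happen in the same RUN step that fetches the globals dump, so the automated snapshots never need manual editing. A rough sketch (the grep pattern is an assumption; check your actual dump output to see exactly which lines mention the postgres superuser):
RUN ssh $REMOTE_USER@$REMOTE_HOST "pg_dumpall -w -U $REMOTE_DB_USER -h localhost -p $REMOTE_DB_PORT --clean --globals-only -l $REMOTE_DB_NAME" \
    | grep -Ev '^(CREATE|ALTER|DROP) ROLE postgres[ ;]' \
    | sed -e 's/^DROP ROLE/DROP ROLE IF EXISTS/' \
    >> dump/a_globals.sql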

Solve a permission problem found when running pgbackrest as a cron job

I have a permissions error on an Ubuntu host from the cron job set up to make a database backup using pgbackrest.
ERROR [041]: : unable to open /var/lib/postgresql/10/main/global/pg_control
The cron job is set up to run under my administrator account. The only option I see to fix this is to change the directory permissions on /var/lib/postgresql/10/main to allow my admin account in, and I don't want to do that.
Clearly only the postgres user has access to this directory, and I found that it's not possible to set up a cron job using that user.
i.e.
postgres@host110:~/$ crontab -e
You (postgres) are not allowed to use this program (crontab)
See crontab(1) for more information
What else can I do? There is no more information on this in the pgbackrest manual.
Only the PostgreSQL OS user (postgres) and its group are allowed to access the PostgreSQL data directory. See this code from the source:
/*
 * Check if the directory has correct permissions. If not, reject.
 *
 * Only two possible modes are allowed, 0700 and 0750. The latter mode
 * indicates that group read/execute should be allowed on all newly
 * created files and directories.
 *
 * XXX temporarily suppress check when on Windows, because there may not
 * be proper support for Unix-y file permissions. Need to think of a
 * reasonable check to apply on Windows.
 */
#if !defined(WIN32) && !defined(__CYGWIN__)
    if (stat_buf.st_mode & PG_MODE_MASK_GROUP)
        ereport(FATAL,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("data directory \"%s\" has invalid permissions",
                        DataDir),
                 errdetail("Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).")));
#endif
If the data directory allows the group in, the group will normally also have permissions for pg_control.
So you can allow that pgBackRest user in if you give it postgres' primary user group.
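A hedged sketch of that approach, assuming the backup runs under an OS account called backup_user (a placeholder) and that the data directory already has group access (mode 0750, as in the quoted check):
# add the pgBackRest OS user to postgres' primary group (usually also named postgres)
sudo usermod -aG postgres backup_user
# confirm the data directory actually grants the group read/execute
sudo ls -ld /var/lib/postgresql/10/main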
postgres is allowed to create a crontab if the system is configured accordingly.
From man crontab:
Running cron jobs can be allowed or disallowed for different users. For this purpose, use the cron.allow and cron.deny files. If the cron.allow file exists, a user must be listed in it to be allowed to use cron. If the cron.allow file does not exist but the cron.deny file does exist, then a user must not be listed in the cron.deny file in order to use cron. If neither of these files exists, only the super user is allowed to use cron. Another way to restrict access to cron is to use PAM authentication in /etc/security/access.conf to set up users, which are allowed or disallowed to use crontab or modify system cron jobs in the /etc/cron.d/ directory.
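So one option is to allow the postgres user explicitly; a minimal sketch, assuming your distro uses /etc/cron.allow (the path can differ, check man crontab on your system):
# let the postgres OS user manage its own crontab
echo postgres | sudo tee -a /etc/cron.allow
# then edit postgres' crontab from your admin account
sudo -u postgres crontab -e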
If you are able to sudo -u postgres at the prompt, you can do it in your cron job, too.
Your question doesn't reveal which actual commands you are trying to run, but to run thiscommand as postgres, simply
sudo -u postgres thiscommand
If you have su but not sudo, the adaptation is minor but not entirely trivial:
su -c thiscommand postgres
With sudo you can set fine-grained limitations on what exactly you can do as another user, so in that sense, it's safer than full unlimited su.
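For example, a line like the following in your administrator account's crontab would run the backup as postgres, assuming that account can sudo to postgres without a password prompt (e.g. via a NOPASSWD rule) and that your pgBackRest stanza is called main (a placeholder):
# nightly backup at 02:30, run as the postgres user
30 2 * * * sudo -u postgres pgbackrest --stanza=main backup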

PostgreSQL pg_dump through LAN

I'm looking for a way to back up a database through a LAN on a mounted drive on a workstation. Basically a Bash script on the workstation or the server to dump a single database into a path on that volume. The volume isn't mounted normally, so I'm not clear as to which box to put the script on, given username/password and mounted volume permissions/availability.
The problem I currently have is permissions on the workstation:
myfile='/volumes/Dragonfly/PG_backups/serverbox_PG_mydomain5myusername_'`date +%Y_%m_%d_%H_%M`'.sql'
pg_dump -h serverbox.local -U adminuser -w dbname > $myfile
Is there a syntax that I can provide for this? I've read the docs and there is no provision for a password, which is kind of expected. I also don't want to echo the password and keep it in a shell script. Or is there another way of doing this using rsync after the backups are done locally? Cheers
First, note the pg_dump command you are using includes the -w option, which means pg_dump will not issue a password prompt. This is indeed what you want for unattended backups (i.e. performed by a script). But you just need to make sure you have authentication set up properly. The options here are basically:
Set up a ~/.pgpass file on the host the dump is running from. Based on what you have written, you should keep this file in the home directory of the user this backup job runs as, not stored somewhere on the mounted volume. Based on the info in your example, the line in this file should look like:
serverbox.local:5432:database:adminuser:password
Remember to specify the database name that you are backing up! This was not specified in your example pg_dump command.
Fool with your Postgres server's pg_hba.conf file so that connections from your backup machine as your backup user don't require a password, but use something like trust or ident authentication. Be careful here of course, if you don't fully trust the host your backups are running on (e.g. it's a shared machine), this isn't a good idea.
Set environment variables on the server such as PGPASSWORD that are visible to your backup script. Using a ~/.pgpass file is generally recommended instead for security reasons.
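A sketch of the ~/.pgpass option, reusing the host, user, and database name from your example (the password is obviously a placeholder):
# ~/.pgpass must be private, or libpq will ignore it
echo 'serverbox.local:5432:dbname:adminuser:secretpassword' >> ~/.pgpass
chmod 600 ~/.pgpass
# pg_dump now authenticates without prompting
pg_dump -h serverbox.local -U adminuser -w dbname > "$myfile"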
Or is there another way of doing this using rsync after the backups are done locally?
Not sure what you are asking here -- you of course have to specify credentials for pg_dump before the backup can take place, not afterwards. And pg_dump is just one of many backup options, there are other methods that would work if you have SSH/rsync access to the Postgres server, such as file-system level backups. These kinds of backups (aka "physical" level) are complementary to pg_dump ("logical" level), you could use either or both methods depending on your level of paranoia and sophistication.
Got it to work with ~/.pgpass, pg_hba.conf on the server, and a script that includes the TERM environment variable (xterm) and the full path to pg_dump.
There is no login environment for the crontab, even as the current admin user, so it's running a bit blind.
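For reference, a minimal sketch of such a wrapper script, based on the command in the question (the pg_dump path and the simplified file name are assumptions; adjust them to your setup):
#!/bin/bash
# cron provides almost no environment, so set what the job needs explicitly
export TERM=xterm
PG_DUMP=/usr/bin/pg_dump
myfile="/volumes/Dragonfly/PG_backups/serverbox_PG_$(date +%Y_%m_%d_%H_%M).sql"
"$PG_DUMP" -h serverbox.local -U adminuser -w dbname > "$myfile"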

How to load fish configuration from a remote repository?

I have a zillion machines in different places (home network, cloud, ...) and I use fish on each of them. The problem is that I have to synchronize their configuration every time I change something in there.
Is there a way to load the configuration from a remote repository? (i.e. a place where it would be stored, not necessarily git, but ideally I would manage it on GitHub). In such a case I would just have a one-liner everywhere.
I do not care too much about startup time; loading the config each time would be acceptable.
I cannot push the configuration to the machines (via Ansible, for instance) - not all of them are reachable from everywhere directly - but all of them can reach the Internet.
There are two parts to your question. Part one is not specific to fish. For systems I use on a regular basis I use Dropbox. I put my ~/.config/fish directory in a Dropbox directory and symlink to it. For machines I use infrequently, such as VMs I use for investigating problems unique to a distro, I use rsync to copy from my main desktop machine. For example,
rsync --verbose --archive --delete -L --exclude 'fishd.*' krader#macpro:.config .
Note the exclusion of the fishd.* pattern. That's part two of your question and is unique to fish. Files in your ~/.config/fish directory named with that pattern are the universal variable storage and are currently unique for each machine. We want to change that -- see https://github.com/fish-shell/fish-shell/issues/1912. The problem is that file contains the color theme variables. So to copy your color theme requires exporting those vars on one machine:
set -U | grep fish_color_
Then doing set -U on the new machine for each line of output from the preceding command. Obviously if you have other universal variables you want synced you should just do set -U and import all of them.
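A rough sketch of that export/import flow (the temporary file name is arbitrary; values that themselves contain flags such as --bold may need manual adjustment):
# on the source machine: turn each colour variable into a 'set -U' command
set -U | grep fish_color_ | sed 's/^/set -U /' > /tmp/fish_colors.fish
# copy the file to the new machine, then load it there
source /tmp/fish_colors.fish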
Disclaimer: I wouldn't choose this solution myself. Using a cloud storage client as Kurtis Rader suggested, or a periodic cron job to pull changes from a git repository (+ symlinks), seems a lot easier and more robust.
On those systems where you can't or don't want to sync with your cloud storage, you can download the configuration file specifically, using curl for example. Some precious I/O time can be saved by utilizing HTTP cache-control mechanisms. With or without cache control, you will still need to open a connection to a remote server each time (or every X runs, or after Y time has passed), and that already wastes quite some time.
Following is a suggestion for such a fish script, to get you started:
#!/usr/bin/fish
set -l TMP_CONFIG /tmp/shared_config.fish
curl -s -o $TMP_CONFIG -D $TMP_CONFIG.headers \
    -H "If-None-Match: \"$SHARED_CONFIG_ETAG\"" \
    https://raw.githubusercontent.com/woj/dotfiles/master/fish/config.fish
if test -s $TMP_CONFIG
    mv $TMP_CONFIG ~/.config/fish/conf.d/shared_config.fish
    set -U SHARED_CONFIG_ETAG (sed -En 's/ETag: "(\w+)"/\1/p' $TMP_CONFIG.headers)
end
Notes:
Warning: Not tested nearly enough
Assumes fish v2.3 or higher.
sed behavior varies from platform to platform.
Replace woj/dotfiles/master/fish/config.fish with the repository, branch and path that apply to your case.
You can run this from a cron job, but if you insist on updating the configuration file on every init, change the script to place the configuration in a path that isn't already automatically loaded by fish, e.g.:
mv $TMP_CONFIG ~/.config/fish/shared_config.fish
and in your config.fish run this whole script file, followed by a
source ~/.config/fish/shared_config.fish
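If you go that route, the tail of your config.fish could look roughly like this (the script location fetch_shared_config.fish is a placeholder for wherever you saved the script above):
# refresh the shared configuration from the remote repository, then load it
source ~/.config/fish/fetch_shared_config.fish
source ~/.config/fish/shared_config.fish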

Automatic backup of PostgreSQL database in Ubuntu

How to take an automatic backup of a PostgreSQL database in Ubuntu?
Or is there a script available to take time-to-time PostgreSQL database backups?
You can use the following:
sudo crontab -e
at the end of the file add this:
0 6 * * * sudo pg_dump -U USERNAME -h REMOTE_HOST -p REMOTE_PORT NAME_OF_DB > LOCATION_AND_NAME_OF_BACKUP_FILE
This command will take an automated backup of your selected DB every day at 6:00 AM (after you change the command's options to fit your DB).
It is advisable to give each backup a new name so that you can restore data from a specific date. It is also good practice to send notifications in case the backups fail.
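For example, a dated-filename variant of the crontab line above (remember that % is special in crontab and must be escaped as \%; the output path is a placeholder):
0 6 * * * pg_dump -U USERNAME -h REMOTE_HOST -p REMOTE_PORT NAME_OF_DB > /path/to/backups/NAME_OF_DB_$(date +\%Y-\%m-\%d).sql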
Here is a good script for automatic backup, as well as general recommendations for automating backups:
How to Automate PostgreSQL Database Backups