I'd like to monitor the number of concurrent PostgreSQL connections on my Zabbix server, so I've created a cron job that writes the COUNT of rows in pg_stat_activity to a file, which Zabbix reads once a minute.
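A minimal sketch of that kind of cron entry (the monitoring role and output path here are placeholders, not my exact setup):
* * * * * psql -U zabbix -d postgres -Atc "SELECT count(*) FROM pg_stat_activity" > /tmp/pg_connections.count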
My problem is that I might have a scenario where I get a COUNT of, say, 10, then have a brief peak of 50 connections, drop back to 10, and only then do the COUNT again.
In this case the peak would not be noticed.
I've thought about a counter that is incremented/decremented on each connection/disconnection, but couldn't work out how to implement it.
I could also do the COUNT at a higher frequency and keep a per-minute average, but that does not solve the problem either.
Any thoughts on the matter?
Thanks,
Gabriel
Use log files. Here is a quick tutorial for Linux.
1)
Find out where the postgresql.conf file is located:
postgres=# show config_file;
┌──────────────────────────────────────────┐
│ config_file │
├──────────────────────────────────────────┤
│ /etc/postgresql/9.5/main/postgresql.conf │
└──────────────────────────────────────────┘
2)
Find and edit these parameters in it (keep a copy of the original somewhere first):
log_connections = on
log_disconnections = on
log_destination = 'csvlog'
logging_collector = on
3)
Restart PostgreSQL:
sudo service postgresql restart
(sudo service postgresql reload is not enough here: logging_collector can only be changed with a full restart)
4)
Find out where the logs are stored:
postgres=# show data_directory; show log_directory;
┌──────────────────────────────┐
│ data_directory │
├──────────────────────────────┤
│ /var/lib/postgresql/9.5/main │
└──────────────────────────────┘
┌───────────────┐
│ log_directory │
├───────────────┤
│ pg_log │
└───────────────┘
5)
Almost done. In the files /var/lib/postgresql/9.5/main/pg_log/*.csv you will find records about connections/disconnections. It is up to you how to deal with this info.
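For example, a quick way to count those events from the shell (the directory comes from the settings above; the exact file names depend on your log_filename setting):
grep -c "connection authorized" /var/lib/postgresql/9.5/main/pg_log/*.csv
grep -c "disconnection:" /var/lib/postgresql/9.5/main/pg_log/*.csv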
The gsutil command supports options (-J or -j ext) to compress during transport only, thereby saving network bandwidth and speeding up the copy itself.
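(For context, the upload-side usage looks like this; the file and bucket names are just examples:)
gsutil cp -J file.json gs://foo/bar/
gsutil cp -j json,txt file.json gs://foo/bar/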
Is there an equivalent way to do this when downloading from GCS to local machine? That is, if I have an uncompressed text file at gs://foo/bar/file.json, is there some equivalent to -J that will compress the contents of "file.json" during transport only?
The goal is to speed up a copy from remote to local, and not just for a single file but dozens. I'm already using -m to do parallel copies, but would like to transmit compressed data to reduce network transfer time.
I didn't find anything relevant in the docs, and including -J doesn't appear to do anything during downloads. I've tried the following, but the "ETA" numbers printed by gsutil look identical whether -J is present or absent:
gsutil cp -J gs://foo/bar/file.json .
This feature is not yet available.
As an alternative, you will need to implement your own solution for compressing, be it on App Engine, Cloud Functions or Cloud Run. Your application will need to compress your files while they are in Cloud Storage.
The ideal solution would be to use -m along with compressed files, i.e. making parallel copies of pre-compressed archives. Consider the following structures. If [1] is how your bucket is laid out, you are downloading each file individually. With [2], you would only download the compressed files, as in the example after listing [2].
[1]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ └───DropAll.sql
├───barconfig
│ ├───barRecreate.sh
│ └───reGenAll.sql
├───Baz
│ ├───BadBaz.sh
│ └───Drop.sh
...
[2]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ ├───DropAll.sql
│ └───FooScripts.zip
├───barconfig
│ ├───barRecreate.sh
│ ├───reGenAll.sql
│ └───barconfig.zip
├───Baz
│ ├───BadBaz.sh
│ ├───Drop.sh
│ └───Baz.zip
...
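With that layout, the download only has to fetch the archives, something along these lines (the bucket and object names are illustrative, matching the sketch above; GCS bucket names are lowercase in practice):
gsutil -m cp "gs://foo/**/*.zip" .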
Once your data has been downloaded, you should consider deleting the compressed files as they are no longer needed for your operations and you will be charged for them. Alternatively, you can raise a Feature Request on the Public Issue Tracker, which will be sent to the Cloud Storage team, who can look into the feasibility of this request.
I have a table EMPLOYEE.EMPLOYEE inside database HELLO which contains 3 records as listed below:
EMP_NO  BIRTH_DATE FIRST_NAME         LAST_NAME            GENDER HIRE_DATE  BANK_ACCOUNT_NUMBER PHONE_NUMBER
------- ---------- ------------------ -------------------- ------ ---------- ------------------- --------------
      1 06/05/1998 A                  B                    M      01/02/2019 026201521420        +91X
      2 10/14/1997 C                  D                    M      01/07/2019 034212323454        +91Y
      3 05/27/1997 E                  F                    F      01/14/2019 92329323123         +91Z
Then I first take an offline backup using the following commands
mkdir offlinebackup
db2 terminate
db2 deactivate database HELLO
db2 backup database HELLO to ~/offlinebackup/
After which I get this output:
Backup successful. The timestamp for this backup image is : 20190128115210
Now I take an online backup using the following commands
db2 update database configuration for HELLO using LOGARCHMETH1 'DISK:/database/config/db2inst1/onlinebackup'
db2 backup database HELLO online to /database/config/db2inst1/onlinebackup compress include logs
After this I get the output as:
Backup successful. The timestamp for this backup image is : 20190128115616
Now I go back to db2 and run CONNECT TO HELLO which connects me to my database. When I check for rows in the EMPLOYEE.EMPLOYEE table, I still get all my 3 rows.
Now I remove the row with EMP_NO 3. This gets successfully removed. Then I run quit from the db2 terminal.
Then I use this command to run the restore from my offline backup:
db2 restore db HELLO from ~/offlinebackup/ replace existing
It says DB20000I The RESTORE DATABASE command completed successfully
Now I try to connect to HELLO, it says SQL1117N A connection to or activation of database "HELLO" cannot be made because of ROLL-FORWARD PENDING. SQLSTATE=57019
To which I run db2 rollforward db HELLO to end of logs and stop
Then I connect to HELLO and try to find out the rows, I get only 2 rows, and not 3 as it was in the backup.
EMP_NO  BIRTH_DATE FIRST_NAME         LAST_NAME             GENDER HIRE_DATE  BANK_ACCOUNT_NUMBER PHONE_NUMBER
------- ---------- ------------------ --------------------- ------ ---------- ------------------- --------------
      1 06/05/1998 A                  B                     M      01/02/2019 026201521420        +91X
      2 10/14/1997 C                  D                     M      01/07/2019 034212323454        +91Y
The third record, which was present in the backup, is not visible. Can anyone figure out why I am not able to restore it from the backup?
The rollforward command that you ran:
db2 rollforward db HELLO to end of logs and stop
replayed all available logs, including the record corresponding to the delete statement.
If you wanted to restore the database to the state right after the backup was taken you could have run
db2 rollforward db HELLO to end of backup and stop
Alternatively, since you are restoring from an offline backup, the rollforward is not necessary at all and you could have used
db2 rollforward db HELLO stop
Alternatively, skip the rollforward completely (for offline backups only, of course):
db2 restore db HELLO from ~/offlinebackup/ replace existing without rolling forward
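For completeness, restoring from the online backup taken above would look roughly like this, using the timestamp from its output (depending on where the included logs ended up, you may also need the LOGTARGET option on the restore):
db2 restore db HELLO from /database/config/db2inst1/onlinebackup taken at 20190128115616 replace existing
db2 rollforward db HELLO to end of backup and stop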
We have a master-slave replication configuration as follows.
On the master:
postgresql.conf has replication configured as follows (commented line taken out for brevity):
max_wal_senders = 1
wal_keep_segments = 8
On the slave:
Same postgresql.conf as on the master. recovery.conf looks like this:
standby_mode = 'on'
primary_conninfo = 'host=master1 port=5432 user=replication password=replication'
trigger_file = '/tmp/postgresql.trigger.5432'
When this was initially setup, we performed some simple tests and confirmed the replication was working. However, when we did the initial data load, only some of the data made it to the slave.
Slave's log is now filled with messages that look like this:
< 2015-01-23 23:59:47.241 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:47.241 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
< 2015-01-23 23:59:52.259 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:52.260 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
< 2015-01-23 23:59:57.270 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:57.270 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
After some analysis and help on the #postgresql IRC channel, I've come to the conclusion that the slave cannot keep up with the master. My proposed solution is as follows.
On the master:
Set max_wal_senders=5
Set wal_keep_segments=4000. Yes, I know it is very high, but I'd like to monitor the situation and see what happens. I have room on the master.
On the slave:
Save configuration files in the data directory (i.e. pg_hba.conf pg_ident.conf postgresql.conf recovery.conf)
Clear out the data directory (rm -rf /var/lib/pgsql/9.3/data/*). This seems to be required by pg_basebackup.
Run the following command:
pg_basebackup -h master -D /var/lib/pgsql/9.3/data --username=replication --password
Am I missing anything? Is there a better way to bring the slave up to date without having to reload all the data?
Any help is greatly appreciated.
The two important options for dealing with the WAL for streaming replication:
wal_keep_segments should be set high enough to allow a slave to catch up after a reasonable lag (e.g. high update volume, slave being offline, etc...).
archive_mode enables WAL archiving which can be used to recover files older than wal_keep_segments provides. The slave servers simply need a method to retrieve the WAL segments. NFS is the simplest method, but anything from scp to http to tapes will work so long as it can be scripted.
# on master
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'
# on slave
restore_command = 'cp /path_to/archive/%f "%p"'
When the slave can't pull the WAL segment directly from the master, it will attempt to use the restore_command to load it. You can configure the slave to automatically remove segments using the archive_cleanup_command setting.
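A minimal sketch of that, using the pg_archivecleanup contrib tool (with the archive path from the example above):
# in recovery.conf on the slave
archive_cleanup_command = 'pg_archivecleanup /path_to/archive %r'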
If the slave comes to a situation where the next WAL segment it needs is missing from both the master and the archive, there will be no way to consistently recover the database. The only reasonable option then is to scrub the server and start again from a fresh pg_basebackup.
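When it does come to that, re-seeding a 9.3 standby might look roughly like this, run with the standby stopped and its data directory cleared (host, user and path taken from the question):
pg_basebackup -h master -U replication -D /var/lib/pgsql/9.3/data -P --xlog-method=stream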
You can configure replication slots so that Postgres keeps the WAL segments needed by the replica registered in the slot.
Read more at https://www.percona.com/blog/2018/11/30/postgresql-streaming-physical-replication-with-slots/
On the master server, run:
SELECT pg_create_physical_replication_slot('standby_slot');
On the slave server, add the following line to recovery.conf:
primary_slot_name = 'standby_slot'
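You can then check from the master that the slot exists and is in use (physical replication slots are available in PostgreSQL 9.4 and later):
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;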
Actually, to recover you don't have to drop the whole DB and start from scratch. Since the master has the up-to-date binary data, you can do the following to recover the slave and bring it back in sync:
psql -c "select pg_start_backup('initial_backup');"
rsync -cva --inplace --exclude=*pg_xlog* <data_dir> slave_IP_address:<data_dir>
psql -c "select pg_stop_backup();"
Note:
1. The slave has to be shut down first (service stop).
2. The master goes into backup mode while pg_start_backup is in effect.
3. The master can continue serving read-only queries in the meantime.
4. Bring the slave back up at the end of the steps.
I did this in prod; it works perfectly for me.
The slave and master are in sync and there is no data loss.
You will get that error if wal_keep_segments is set too low.
When you set the value for wal_keep_segments, consider how long the pg_basebackup takes.
Remember that segments are generated about every 5 minutes, so if the backup takes an hour, you need at least 12 segments saved. At 2 hours, you need 24, etc. I would set the value to about 12.2 segments/hour of backup.
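As a rough illustration of that arithmetic (the two-hour backup is just an example):
# postgresql.conf on the master: ~12 segments/hour * 2-hour base backup
wal_keep_segments = 24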
As Ben Grimm suggested in the comments, this is a question of making sure to set segments to the maximum possible value to allow the slave to catch up.
I am using the pg_trgm extension for fuzzy search. The default threshold is 0.3, as shown by:
# select show_limit();
show_limit
------------
0.3
(1 row)
I can change it with:
# select set_limit(0.1);
set_limit
-----------
0.1
(1 row)
# select show_limit();
show_limit
------------
0.1
(1 row)
But when I restart my session, the threshold is reset to default value:
# \q
$ psql -Upostgres my_db
psql (9.3.5)
Type "help" for help.
# select show_limit();
show_limit
------------
0.3
(1 row)
I want to execute set_limit(0.1) every time I start postgresql. Or in other words, I want to set 0.1 as default value for threshold of pg_trgm extension. How do I do that?
This has been asked before:
Set default limit for pg_trgm
The initial setting is hard coded in the source. One could hack the source and recompile the extension.
To address your comment:
You could put a command in your psqlrc or ~/.psqlrc file. A plain and simple SELECT command in a separate line:
SELECT set_limit(0.1)
Be aware that the additional module is installed per database, while psql can connect to any database cluster and any database within that cluster. It will cause an error message when connecting to any database where pg_trgm is not installed. Nothing bad will happen, though.
On the other hand, connecting with any other client will not set the limit, which may be a bit of a trap.
pg_trgm should really provide a config setting ...
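(For what it's worth, PostgreSQL 9.6 and later do expose the threshold as a setting, pg_trgm.similarity_threshold, which can be made a per-database default; this does not help on the 9.3 instance in the question:)
ALTER DATABASE my_db SET pg_trgm.similarity_threshold = 0.1;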
This is specifically about maintaining confidence in using various replication solutions that you'd be able to failover to the other server without data loss. Or in a master-master situation that you could know within a reasonable amount of time if one of the databases has fallen out of sync.
Are there any tools out there for this, or do people generally depend on the replication system itself to warn over inconsistencies? I'm currently most familiar with postgresql WAL shipping in a master-standby setup, but am considering a master-master setup with something like PgPool. However, as that solution is a little less directly tied with PostgreSQL itself (my basic understanding is that it provides the connection an app would use, thus intercepting the various SQL statements, and would then send them on to whatever servers were in its pool), it got me thinking more about actually verifying data consistency.
Specific requirements:
I'm not talking about just table structure. I'd want to know that actual record data is the same, so that I'd know if records were corrupted or missed (in which case, I would re-initialize the bad database with a recent backup + WAL files before bringing it back into the pool)
Databases are in the order of 30-50 GB. I'm doubting that raw SELECT queries would work very well.
I don't see the need for real-time checking (though it would, of course, be nice). Hourly or even daily would be better than nothing.
Block-level checking wouldn't work. It would be two databases with independent storage.
Or is this type of verification simply not realistic?
You can check the current WAL locations on both the machines...
If they represent the same value, that means your underlying databases are consistent with each other...
$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (do it on primary host)
pg_current_xlog_location
--------------------------
0/2000000
(1 row)
$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (do it on standby host)
pg_last_xlog_receive_location
-------------------------------
0/2000000
(1 row)
$ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (do it on standby host)
pg_last_xlog_replay_location
------------------------------
0/2000000
(1 row)
You can also check this with the help of the walsender and walreceiver processes:
[do it on primary] $ ps -ef | grep sender
postgres 6879 6831 0 10:31 ? 00:00:00 postgres: wal sender process postgres 127.0.0.1(44663) streaming 0/2000000
[ do it on standby] $ ps -ef | grep receiver
postgres 6878 6872 1 10:31 ? 00:00:01 postgres: wal receiver process streaming 0/2000000
If you want to check the whole table, you should be able to do something like this (assuming a table that quite easily fits in RAM):
SELECT md5(array_to_string(array_agg(mytable ORDER BY id), ' '))
FROM mytable;
That will give you a hash of the tuple representation of the table.
Note that you could break this down by ranges, etc. Depending on the type of replication you could even break it down by page range (for streaming replication).
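For example, a chunked variant of the same idea (the id column and the 10,000-row chunk size are illustrative):
SELECT id / 10000 AS chunk,
       md5(string_agg(mytable::text, ' ' ORDER BY id)) AS chunk_hash
FROM mytable
GROUP BY id / 10000
ORDER BY chunk;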