I have a few xlog questions that I'm not sure about.
1) I have two servers that were once slaves. How can I know if they were slaves of the same master? Is it possible to check whether they were split from the same source in the past? I know pg_rewind knows how to check this, but is it possible to easily check it without running pg_rewind in dry-run mode?
2) Is it true that if pg_last_xlog_replay_location is empty, this server was never a slave?
3) Is it possible to know from the database itself to which master the slave is connected? I know how to get this info from recovery.conf or from the process attributes, but is it written in some system tables as well?
Thanks
Avi
were slaves of the same master
Indirectly. You can compare the output of select xmin, ctid, oid, datname from pg_database. Of course, dropping and recreating the postgres and template databases will change those values, so this is very unreliable. But if you check them and find that ALL identifiers match, there's a good chance that the databases have the same source.
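For example, a minimal check (run it on both ex-slaves and compare the full output):
SELECT xmin, ctid, oid, datname
FROM pg_database
ORDER BY datname;
-- identical xmin/ctid/oid for ALL rows on both servers suggests,
-- but does not prove, a common origin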
A more reliable and sophisticated method is comparing the timeline history files. E.g., if both ex-slaves are on the same timeline, say 4 as in the example below:
-bash-4.2$ psql -d 'dbname=replication replication=true sslmode=require' -U replica -h 1.1.1.1 -c 'IDENTIFY_SYSTEM'
Password for user replica:
      systemid       | timeline |   xlogpos
---------------------+----------+--------------
 9999384298900975599 |        4 | F79/275B2328
(1 row)
You can then check the timeline history:
-bash-4.2$ psql -d 'dbname=replication replication=true sslmode=require' -U replica -h 1.1.1.1 -c 'TIMELINE_HISTORY 4'
Password for user replica:
     filename     |                       content
------------------+------------------------------------------------------
 00000004.history | 1       9E/C3000090     no recovery target specified+
                  |                                                      +
                  | 2       C1/5A000090     no recovery target specified+
                  |                                                      +
                  | 3       A52/DB2F98B8    no recovery target specified+
                  |
(1 row)
If both servers have the same timeline and the same xlog position at which that timeline was created, you can say with much reliability, I believe, that they came from the same source.
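To compare the full ancestry, you can fetch the history file from both servers and diff it (a sketch; the host names serverA/serverB and the replica role are assumed):
psql 'dbname=replication replication=true sslmode=require' -U replica -h serverA -c 'TIMELINE_HISTORY 4' > a.history
psql 'dbname=replication replication=true sslmode=require' -U replica -h serverB -c 'TIMELINE_HISTORY 4' > b.history
diff a.history b.history   # no output means identical timeline ancestry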
empty pg_last_xlog_replay_location
I would say so. It was never a slave and was never recovered from WALs. At least, I don't know how to reset pg_last_xlog_replay_location on a promoted master...
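The check itself is a one-liner (on 9.x; in PostgreSQL 10+ the function is named pg_last_wal_replay_lsn):
SELECT pg_last_xlog_replay_location();
-- NULL means this server has never replayed any WAL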
system tables to tell to which master the slave is connected
Nothing suitable comes to my mind. If you are a superuser, you can read recovery.conf even without shell access; if you're not, you probably would not be able to select from such a view anyway...
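For example (assuming superuser rights; pg_read_file() resolves relative paths against the data directory):
SELECT pg_read_file('recovery.conf');
-- the primary_conninfo line shows which master this slave connects to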
Related
I'm still a relative newbie to Postgresql, so pardon if this is simple ignorance.
I've set up an active/read-only Pacemaker cluster of Postgres v9.4 per the Cluster Labs documentation.
I'm trying to verify that both databases are indeed in sync. I'm doing the dump on both hosts and checking the diff between the output. The command I'm using is:
pg_dump -U myuser mydb > dump-node-1.sql
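Spelled out, the comparison presumably looks like this (file names assumed):
pg_dump -U myuser mydb > dump-node-1.sql    # run on node 1
pg_dump -U myuser mydb > dump-node-2.sql    # run on node 2
diff dump-node-1.sql dump-node-2.sql        # empty output = identical dumps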
Pacemaker shows the database status as 'sync' and querying Postgres directly also seems to indicate the sync is good... (Host .59 is my read-only standby node)
psql -c "select client_addr,sync_state from pg_stat_replication;"
+---------------+------------+
|  client_addr  | sync_state |
+---------------+------------+
| 192.16.111.59 | sync       |
+---------------+------------+
(1 row)
However, when I do a dump on the read-only host, I end up with 'public.' prefixed to all my table names. So table foo on the master node dumps as 'foo', whereas on the read-only node it dumps as 'public.foo'. I don't understand why this is happening... I had done a 9.2 PostgreSQL cluster in a similar setup and didn't see this issue. I don't have tables in the public schema on the master node...
Hope someone can help me understand what is going on.
Much appreciated!
Per a_horse_with_no_name, the security updates in 9.4.18 changed the way the dump is written compared to 9.4.15. I didn't catch that one node was still running an older version. The command that identified the problem was his suggestion to run:
psql -c "select version();"
After adding a node to test scaling and then removing that node with Cloudbreak, the ambari-server service won't restart.
The error at launch is:
DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.
Looking at the logs doesn't say much more. I tried restarting Postgres; sometimes it works, maybe 1 time in 10 (HOW is that possible?).
I went deeper in my reasoning rather than just restarting Postgres.
I opened the ambari database to look into it:
sudo su - postgres
psql ambari -U ambari -W -p 5432
(password is bigdata)
and when I looked in the tables topology_logical_request, topology_request and topology_hostgroup, I saw that the cluster never registered a remove request, only an add request:
ambari=> select * from topology_logical_request;
 id | request_id |                        description
----+------------+-----------------------------------------------------------
  1 |          1 | Logical Request: Provision Cluster 'sentelab-perf'
 62 |         51 | Logical Request: Scale Cluster 'sentelab-perf' (+1 hosts)
Check the ids to delete (track all requests involved in the node-adding operation) and delete them (order matters):
DELETE FROM topology_hostgroup WHERE id = 51;
DELETE FROM topology_logical_request WHERE id = 62;
DELETE FROM topology_request WHERE id = 51;
Close with \q, restart ambari-server, and it works!
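For safety, before running the DELETEs it's worth confirming exactly what they will remove (same ids as above):
SELECT * FROM topology_hostgroup WHERE id = 51;
SELECT * FROM topology_logical_request WHERE id = 62;
SELECT * FROM topology_request WHERE id = 51;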
I am very new to Postgres, and being new I got stuck at a point and need some help. Please pardon me if you find it silly.
I am setting up pgpool HA, and at the Postgres level I have streaming replication between 3 nodes of postgresql-9.5: 1 master and 2 slaves.
I was trying to configure auto failover, but when I switched back to my original master and restarted the postgres service, I got the following error:
slave 1-highest timeline 1 of the primary is behind recovery timeline 11
slave 2-highest timeline 1 of the primary is behind recovery timeline 10
slave 3-highest timeline 1 of the primary is behind recovery timeline 3
I tried deleting the pg_xlog files on the slaves and copying all the files from the master's pg_xlog into the slaves, and then did an rsync.
I also did a pg_rewind, but it says:
target server needs to use either data checksums or wal_log_hints = on
(I have wal_log_hints = on set in postgresql.conf already)
I've tried doing a pg_basebackup, but since the database server on the slaves is still starting up, it's not able to connect to the server.
Is there any way to bring the master and the slave at a same timeline?
In my case it happened because (experimentally) I had updated the standby database tables, and when I set up master-standby streaming replication again I got the same errors.
So once again I cleaned out the whole standby data directory and re-cloned the master database using a command like:
pg_basebackup -P -R -X stream -c fast -h 10.10.40.105 -U postgres -D standby/
I think something is wrong in your pgpool configuration. What tool have you been using for management of replication and master-slave control? Is it postmaster or repmgr?
I was trying to configure pgpool with 3 data nodes using a tutorial from http://jensd.be/591/linux/setup-a-redundant-postgresql-database-with-repmgr-and-pgpool and have done it correctly.
Also, you can learn about auto failover here.
(This question is obviously a duplicate of this one, so I'll repeat the answer here as well.)
I'm not sure what exactly you mean by "when i switched back to my original master", but it looks like you are doing the worst possible thing in PostgreSQL streaming replication - introducing a second master.
The most important thing you should know about PostgreSQL replication is that once a failover is performed, you cannot simply "switch back to the original master" - there's now a new master in the cluster, and the existence of two masters will cause damage.
After a slave is promoted to master, the only way for you to re-join the old master is to:
Destroy it (delete the data directory);
Join it as a slave.
If you want it to be the master again, you'll continue with the following:
Let it run for a while as a slave so that it can sync the data;
Kill temporary master and failover to old master;
Rejoin temporary master again as a slave.
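As a concrete sketch of the destroy-and-rejoin steps (paths, the new-master host name and the replica role are assumed; this wipes the old master's data directory):
pg_ctl stop -D /var/lib/pgsql/9.5/data
rm -rf /var/lib/pgsql/9.5/data
pg_basebackup -h new-master -U replica -D /var/lib/pgsql/9.5/data -X stream -R -P
pg_ctl start -D /var/lib/pgsql/9.5/data
# -R writes a recovery.conf pointing at new-master, so the node starts as a slave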
You cannot simply switch master servers! A master can be created ONLY by failover (promoting a slave).
You should also know that whenever you are performing failover (whenever the master is changed), all slaves (except for the one that is promoted) need to be reconfigured to target the new master.
I suggest reading this tutorial - it'll help.
I'm trying to move a database from server1 to server2. I read the Postgres documentation, and I think everything is right, except that after I dumped the db from server1, moved it, and restored it on server2, the sizes are different.
Server1
SELECT pg_size_pretty(pg_database_size('db_name'));
pg_size_pretty
----------------
118 MB
(1 row)
Server2
select pg_size_pretty(pg_database_size('db_name'));
pg_size_pretty
----------------
81 MB
(1 row)
I made the dump with the -a -Fc -Z9 flags and restored with pg_restore -U user -c -d db_name dump_file.dump
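Assembled, the two commands presumably look like this (user and file names assumed; note that -a makes it a data-only dump, so the schema must already exist on server2):
pg_dump -a -Fc -Z9 -U user db_name > dump_file.dump
pg_restore -U user -c -d db_name dump_file.dump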
My questions are:
Why are the sizes different?
What is the correct approach to moving a database like this if the application that accesses the db is a Rails one? (I mean, I want a restore that doesn't affect future Rails migrations.)
Do you have other ideas? Other documentation that I can read?
Thank you for reading this.
This is fine and normal.
Dump and reload produces a more compact database because there's no dead space in the tables and the b-tree indexes are newly reindexed so they're packed and well balanced. You'll find the size is the same or much closer if you:
VACUUM FULL;
REINDEX DATABASE mydb;
on the main DB.
On a side note, though, I strongly recommend restoring using the -1 option to pg_restore unless you need parallel restore. That way you'll either get an empty DB or a complete restore. Of course, you should also always check the return codes from pg_dump and pg_restore.
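For example (a sketch with assumed names; -1 wraps the whole restore in a single transaction, so an error rolls everything back):
pg_restore -U user -1 -d db_name dump_file.dump
echo "pg_restore exit code: $?"   # non-zero means the restore was rolled back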
No comment on the Rails part; I don't know what you're referring to. Please don't post multi-questions like this - they're hard to answer definitively, and you get different "correct" answers for different parts. Post a new SO question for a new question.
This is specifically about maintaining confidence in using various replication solutions that you'd be able to failover to the other server without data loss. Or in a master-master situation that you could know within a reasonable amount of time if one of the databases has fallen out of sync.
Are there any tools out there for this, or do people generally depend on the replication system itself to warn over inconsistencies? I'm currently most familiar with postgresql WAL shipping in a master-standby setup, but am considering a master-master setup with something like PgPool. However, as that solution is a little less directly tied with PostgreSQL itself (my basic understanding is that it provides the connection an app would use, thus intercepting the various SQL statements, and would then send them on to whatever servers were in its pool), it got me thinking more about actually verifying data consistency.
Specific requirements:
I'm not talking about just table structure. I'd want to know that actual record data is the same, so that I'd know if records were corrupted or missed (in which case, I would re-initialize the bad database with a recent backup + WAL files before bringing it back into the pool)
Databases are on the order of 30-50 GB. I doubt that raw SELECT queries would work very well.
I don't see the need for real-time checking (though it would, of course, be nice). Hourly or even daily would be better than nothing.
Block-level checking wouldn't work. It would be two databases with independent storage.
Or is this type of verification simply not realistic?
You can check the current WAL locations on both machines...
If they show the same value, your underlying databases are consistent with each other...
$ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (do it on primary host)
pg_current_xlog_location
--------------------------
0/2000000
(1 row)
$ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (do it on standby host)
pg_last_xlog_receive_location
-------------------------------
0/2000000
(1 row)
$ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (do it on standby host)
pg_last_xlog_replay_location
------------------------------
0/2000000
(1 row)
You can also check this with the help of the walsender and walreceiver processes:
[do it on primary] $ ps -ef | grep sender
postgres 6879 6831 0 10:31 ? 00:00:00 postgres: wal sender process postgres 127.0.0.1(44663) streaming 0/2000000
[ do it on standby] $ ps -ef | grep receiver
postgres 6878 6872 1 10:31 ? 00:00:01 postgres: wal receiver process streaming 0/2000000
If you want to check a whole table, you should be able to do something like this (assuming a table that quite easily fits in RAM):
SELECT md5(array_to_string(array_agg(mytable ORDER BY id), ' '))
FROM mytable;
That will give you a hash of the tuple representation of the table. (Note the ORDER BY has to go inside the aggregate; an outer ORDER BY on an aggregate query would be rejected and wouldn't guarantee a stable hash anyway.)
Note that you could break this down by ranges, etc. Depending on the type of replication, you could even break it down by page range (for streaming replication).
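A sketch of the range-based variant (chunk size assumed; a mismatching chunk_hash narrows the problem down without rescanning the whole table):
SELECT id / 100000 AS chunk,
       md5(string_agg(mytable::text, ' ' ORDER BY id)) AS chunk_hash
FROM mytable
GROUP BY id / 100000
ORDER BY chunk;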