I am trying to back up a live Postgres database using pg_dump, but whenever I attempt to do so, things blow up.
I have many live queries aggressively reading from a materialized view (it acts as a cache), which is refreshed frequently (at least once a minute).
I am starting to believe that pg_dump blocks the REFRESH MATERIALIZED VIEW from occurring, which blocks the reads from the materialized view, which causes things to explode.
Is this line of reasoning correct? What other operations does pg_dump block, and how should I do the backup?
From the documentation (emphasis mine):
pg_dump does not block other users accessing the database (readers or writers).
Your problem lies elsewhere.
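If you want to check that for yourself while the dump runs, you can look at what is actually blocking what. A minimal sketch, assuming PostgreSQL 9.6 or later (where pg_blocking_pids() exists):

```sql
-- List sessions that are currently blocked and the PIDs blocking them.
-- pg_blocking_pids() is available from PostgreSQL 9.6 on.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 60) AS query
FROM   pg_stat_activity
WHERE  cardinality(pg_blocking_pids(pid)) > 0;
```

If pg_dump really were the culprit, its backend PID would show up in blocked_by for the stuck readers.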
Related
I'm running a backup/restore on a schema every day and get this error every now and then:
pg_dump: Error message from server: ERROR: relation not found (OID 86157003)
DETAIL: This can be validly caused by a concurrent delete operation on this object.
pg_dump: The command was: LOCK TABLE myschema.products IN ACCESS SHARE MODE
How can this be avoided? It seems the table was being used at the time, or someone was running something against it. Can I just kill all connections to the DB before restoring, or is there another alternative?
As far as I understand, pg_dump should be able to run even while users are doing something with the table, but that doesn't seem to be the case.
Thanks,
It is somewhat buried, but the answer lies here:
https://www.postgresql.org/docs/current/app-pgdump.html
"
-j njobs
...
To detect this conflict, the pg_dump worker process requests another shared lock using the NOWAIT option. If the worker process is not granted this shared lock, somebody else must have requested an exclusive lock in the meantime and there is no way to continue with the dump, so pg_dump has no choice but to abort the dump.
"
This is borne out by this line in the error message:
"LOCK TABLE myschema.products IN ACCESS SHARE MODE"
ACCESS SHARE is compatible with all other lock modes except ACCESS EXCLUSIVE. ACCESS EXCLUSIVE is taken by DROP TABLE, TRUNCATE, REINDEX, etc. See the Locks documentation for more information. So you need to run the dump during a window when the operations listed under ACCESS EXCLUSIVE are known not to happen, or block/drop the connections that issue them.
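You can reproduce the conflict by hand in two psql sessions. This is only a sketch of the lock interaction, reusing the table name from the error message:

```sql
-- Session 1: take the same lock pg_dump takes on every table it dumps.
BEGIN;
LOCK TABLE myschema.products IN ACCESS SHARE MODE;

-- Session 2: anything that needs ACCESS EXCLUSIVE now queues behind session 1,
-- and every later reader of the table queues behind it in turn.
DROP TABLE myschema.products;  -- waits until session 1 commits or rolls back
```

Conversely, if the DROP TABLE wins the race and commits first, the table is gone by the time pg_dump tries to lock it, which is exactly the "relation not found" error above.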
Somebody dropped a table between the time pg_dump took an inventory of the tables and the time it tries to dump the table.
This can happen if your application is in the habit of dropping tables all the time.
This is not an answer to your main question, but a caution regarding:
As far as I understand, pg_dump could run even if users are doing something with the table but it doesn't seem to be the case.
Trusting such a dump assumes that the application performs every action in a single transaction. I have known of applications which accomplish some tasks using more than one.
I don't know exactly what the tasks were, or whether it was unavoidable that they use multiple transactions, but dumps could only be trusted when the application was idle or, better yet, when the service was stopped.
For the function those applications performed, it wasn't a big deal to work around downtime or to stop services.
I don't know how you'd determine this behaviour without being told by the developers. Just something to consider.
I am loading a large table (150 GB+) from Teradata into PostgreSQL 9.6 on AWS. A conventional load was extremely slow, so I set the table to "unlogged" and the load is going much faster. Once loaded, this database will be strictly read-only, for archival purposes. Is there any need to alter the table back to logged once it is loaded? I understand that process takes a lot of time, and I would like to avoid it if possible. We will be taking a backup of the data once this table is loaded and the data is verified.
Edit: I should note I am using the copy command reading from a named pipe for this.
If you leave the table UNLOGGED, it will be empty after PostgreSQL recovers from a crash. So that is a bad idea if you need these data.
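If you do decide to make the table crash-safe after the load, the switch (available since PostgreSQL 9.5, so it should work on 9.6) rewrites the whole table through WAL, which is exactly the slow step you are trying to avoid. A sketch, with an illustrative table name:

```sql
-- Rewrites the entire table and WAL-logs it; expect this to take a long
-- time on 150 GB+. Afterwards the table survives crash recovery.
ALTER TABLE archive_data SET LOGGED;
```

Alternatively, leave it UNLOGGED and rely on the verified backup you mention: after any crash you would restore the table from that backup rather than from WAL.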
We are using Postgres (9.3) Hot Standby to build a read-only copy of a database. We have a UI which reads from a materialized view.
When we try to read from the materialized view in the standby database, the query hangs.
The materialized view takes ~10 seconds to rebuild in the master database. We've waited over 30 minutes for the query in the standby database and it seems to never complete.
Notably, the materialized view does exist in the standby database. We can't refresh it, of course, since the DB is read-only.
We can't find anything in the documentation which indicates that materialized views can't be used in standby databases, but that appears to be the case.
Has anyone got this to work, and/or what is the recommended work-around?
According to PostgreSQL Documentation - Hot Standby, there is a way to handle query conflicts by assigning proper values to max_standby_archive_delay and max_standby_streaming_delay, which define the maximum allowed delay in WAL application. In your case a high value may be preferable.
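On a 9.3 standby these are set in postgresql.conf; the values below are only illustrative:

```
# postgresql.conf on the standby
max_standby_streaming_delay = 300s  # let queries delay WAL replay up to 5 minutes
max_standby_archive_delay = 300s
# -1 means wait forever: standby queries are never cancelled by replay,
# but the standby can fall arbitrarily far behind the master
```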
I installed PostgreSQL 9.1 on both machines.
Initially the DB size is 7052 MB; then I used the following command to copy it to the other server:
pg_dump -C dbname | bzip2 | ssh remoteuser@remotehost "bunzip2 | psql dbname"
After the copy completed successfully, I checked the size on the destination machine: it shows 6653 MB.
Then I checked the table counts; they are the same.
Has there been data loss? Is there missing data?
Note:
The two machines have the same hardware and software configuration.
To measure the size, I used:
SELECT pg_size_pretty(pg_database_size('dbname'));
One of PostgreSQL's most sophisticated features is the so-called Multi-Version Concurrency Control (MVCC), a standard technique for avoiding conflicts between reads and writes of the same object in a database. MVCC guarantees that each transaction sees a consistent view of the database by reading non-current data for objects modified by concurrent transactions. Thanks to MVCC, PostgreSQL has great scalability, a robust hot backup tool and many other nice features comparable to the most advanced commercial databases.
Unfortunately, there is one downside to MVCC: databases tend to grow over time, and sometimes that can be a problem. In recent versions of PostgreSQL there is a separate server process, the autovacuum daemon, whose purpose is to keep the database size reasonable. It does that by trying to recover reusable chunks of the database files. Still, there are many scenarios that will force the database to grow even if the amount of useful data in it doesn't really change. That happens typically if you have lots of UPDATE and/or DELETE statements in the applications that are using the database.
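You can watch how much of that reclaimable space each table is carrying, and when autovacuum last visited it, using the standard statistics views; a sketch:

```sql
-- Dead tuples are the reusable space autovacuum tries to reclaim.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  10;
```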
When you do a dump and restore, that extraneous space is not carried over, so your copied DB appears smaller.
That looks normal. Databases are often smaller after restore, because a newly created b-tree index is more compact than one that's been progressively built by inserts. Additionally, UPDATEs and DELETEs leave empty space in the tables.
So you have nothing to worry about. You'll find that if you diff an SQL dump from the old DB and a dump taken from the just-restored DB, they'll be the same except for comments.
I am considering log-shipping of Write Ahead Logs (WAL) in PostgreSQL to create a warm-standby database. However I have one table in the database that receives a huge amount of INSERT/DELETEs each day, but which I don't care about protecting the data in it. To reduce the amount of WALs produced I was wondering, is there a way to prevent any activity on one table from being recorded in the WALs?
Ran across this old question, which now has a better answer. Postgres 9.1 introduced "Unlogged Tables", which are tables that don't log their DML changes to WAL. See the docs for more info, but at least now there is a solution for this problem.
See Waiting for 9.1 - UNLOGGED tables by depesz, and the 9.1 docs.
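Creation is a one-word change; the table name here is just an illustration:

```sql
-- DML on this table is not WAL-logged: no log-shipping traffic and much
-- faster writes. The trade-offs: it is truncated after a crash, and it
-- is not replicated to standbys.
CREATE UNLOGGED TABLE high_churn_queue (
    id      bigserial PRIMARY KEY,
    payload text
);
```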
Unfortunately, I don't believe there is. The WAL logging operates on the page level, which is much lower than the table level and doesn't even know which page holds data from which table. In fact, the WAL files don't even know which pages belong to which database.
You might consider moving your high activity table to a completely different instance of PostgreSQL. This seems drastic, but I can't think of another way off the top of my head to avoid having that activity show up in your WAL files.
To offer one option to my own question: there are temp tables ("temporary tables are automatically dropped at the end of a session, or optionally at the end of the current transaction (see ON COMMIT below)"), which I think don't generate WAL. Even so, this might not be ideal, as the table creation and design will have to live in the application code.
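For completeness, a sketch of that temp-table variant (names are illustrative):

```sql
BEGIN;
-- Session-private; ON COMMIT DROP removes it at the end of this transaction.
CREATE TEMP TABLE scratch (item text) ON COMMIT DROP;
INSERT INTO scratch VALUES ('a'), ('b');
COMMIT;  -- scratch no longer exists
```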
I'd consider memcached for use-cases like this. You can even spread the load over a bunch of cheap machines too.