Why doesn't my restore finish? - PostgreSQL

At first I thought the restore was simply too big, so instead of a single 2 GB (compressed) database backup I split it into several backups, one per schema. This schema, map, is 600 MB. The next step would be to split it per table.
This one holds some spatial data for my country's map; I'm not sure if that is relevant.
As you can see, it has been running for almost 2 hours. The disks aren't really in use anymore: when the restore started, disk usage hit 100% several times, but for the last hour it has been flat at 0%.
And as you can see here, I can access the data in all the restored tables, so it looks like it is already done.
Is this normal?
Is there anything I can check to see what the restore is doing?
Hardware Setup:
Core i7 @ 3.4 GHz, 24 GB RAM
DB on a 250 GB SSD, backup files on a SATA disk
EDIT
SELECT application_name, query, *
FROM pg_stat_activity
ORDER BY application_name, query;

Yes, that seems perfectly normal.
Most likely you are observing index or constraint creation. Look at the output of
SELECT * FROM pg_stat_activity;
to confirm that (it should contain CREATE INDEX or ALTER TABLE).
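A narrower version of that check, as a sketch assuming PostgreSQL 9.2 or later (where pg_stat_activity has the pid, state and query columns): it hides idle sessions and the session running the check, and shows how long each statement has been running. The second query only works on PostgreSQL 12 and later, which added a progress view for index builds.

```sql
-- Non-idle backends other than this one, longest-running first.
-- Near the end of a restore you would expect CREATE INDEX or
-- ALTER TABLE ... ADD CONSTRAINT statements here.
SELECT pid,
       application_name,
       state,
       now() - query_start AS running_for,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND pid <> pg_backend_pid()
ORDER BY query_start;

-- PostgreSQL 12 and later only: progress of running CREATE INDEX / REINDEX.
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;
```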
It is too late now, but increasing maintenance_work_mem will speed up index creation.
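For the next restore, a minimal sketch of raising it server-wide, assuming PostgreSQL 9.4 or later (for ALTER SYSTEM) and superuser access. The 1GB value is an illustrative assumption for a 24 GB machine, not a tuned recommendation. Sessions that pg_restore opens after the reload pick up the new value.

```sql
-- Illustrative value: choose what the machine can spare during the restore.
ALTER SYSTEM SET maintenance_work_mem = '1GB';
SELECT pg_reload_conf();   -- apply without restarting the server

-- ...run the restore, then go back to the previous setting:
ALTER SYSTEM RESET maintenance_work_mem;
SELECT pg_reload_conf();
```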


PostgreSQL V12 create temp table runs out of shared memory, can I create on disk?

Running on RHEL 7
PostgreSQL Version 12
The system has 28 GB of memory and 12 GB of shared memory
The DB uses over 6 TB on disk
Some tables have around 300 million rows.
I moved my DB from version 9 to version 12 and am running tests on the new DB. We have a process that generates summary data in a temporary table, queries that temporary table for different things, and then drops it. This was done because it is much faster than running very similar queries multiple times.
The query is similar to this:
CREATE TEMPORARY TABLE
XXX
AS
SELECT
COUNT(t.id) AS count,
t.tagged AS tagged,
t.tag_state AS tag_state,
t.error AS error,
td.duplicate AS duplicate
FROM
ttt t
INNER JOIN tweet_data td ON (td.tweet_id = t.id)
GROUP BY
t.tagged,
t.tag_state,
t.error,
td.duplicate;
Note that this works fine on V9, but I have not watched it very carefully on V9 to see what it does. On V12, shared memory usage grows slowly, and then after about 15 minutes it kicks into high gear, grows to about 12 GB, tries to grow further, and fails:
The error is:
ERROR: could not resize shared memory segment "/PostgreSQL.868719775" to 2147483648 bytes: No space left on device
On a whim, we ran just the SELECT statement without creating the temporary table, and it also failed while shared memory was increasing, but this time the error message said that it was killed by an administrator.
I am currently running vacuum against the DB to see if that helps.
The largest concern is that this works with V9 but fails on V12. I also know that the query engine in V12 is very different from the one in V9.
I had some crazy hope that running vacuum in stages would make a difference. The data was migrated using pg_upgrade.
vacuumdb -U postgres -p 5431 --all --analyze-in-stages
I don't know whether the temporary table gets created or not, but after running vacuum we ran the full query again, creating the temp table, and it also failed.
Any thoughts? Is my only choice to try more shared memory?
These shared memory segments are used for communication between the worker processes of a parallel query.
PostgreSQL seems to be tight on resources, and while the error is a symptom rather than the cause of the problem, you can improve the situation by disabling parallel query for this statement:
SET max_parallel_workers_per_gather = 0;
Then your query will take more time, but use less resources, which might be enough to get rid of the problem.
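To keep that setting from leaking into the rest of the session, it can be scoped to a single transaction with SET LOCAL. This is a sketch built from the query in the question (xxx, ttt and tweet_data are the question's names). The EXPLAIN step is an optional check: with the setting in place, the printed plan should contain no Gather node.

```sql
BEGIN;
-- SET LOCAL lasts only until COMMIT or ROLLBACK.
SET LOCAL max_parallel_workers_per_gather = 0;

-- Optional: the plan printed here should not contain a Gather node.
EXPLAIN
SELECT COUNT(t.id), t.tagged, t.tag_state, t.error, td.duplicate
FROM ttt t
JOIN tweet_data td ON td.tweet_id = t.id
GROUP BY t.tagged, t.tag_state, t.error, td.duplicate;

CREATE TEMPORARY TABLE xxx AS
SELECT COUNT(t.id) AS count,
       t.tagged, t.tag_state, t.error,
       td.duplicate
FROM ttt t
JOIN tweet_data td ON td.tweet_id = t.id
GROUP BY t.tagged, t.tag_state, t.error, td.duplicate;
COMMIT;
-- The temporary table survives COMMIT (the default is ON COMMIT PRESERVE ROWS).
```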
In the long run, you should review your configuration, which might be too generous with memory or the number of connections, but I cannot diagnose that from here.
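As a starting point for such a review, this sketch lists settings that bound per-connection memory, parallelism, and how dynamic shared memory is allocated. The selection of settings is an editorial assumption, not an exhaustive checklist.

```sql
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'max_connections', 'max_worker_processes',
               'max_parallel_workers', 'max_parallel_workers_per_gather',
               'dynamic_shared_memory_type')
ORDER BY name;
```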

Heroku Postgres database size doesn't go down after deleting rows

I'm using a Dev-level database on Heroku that was about 63 GB and approaching 9.9 million rows (close to the 10 million limit for this tier). I ran a script that deleted about 5 million rows I didn't need, and now (a few days later) the Postgres control panel and pginfo:table-size show roughly 4.7 million rows, but it's still at 63 GB. 64 GB is the limit for the next tier, so I need to reduce the size.
I've tried vacuuming, but pginfo:bloat said the bloat was only about 3 GB. Any idea what's happening here?
If you have vacuumed the table, don't worry about the size on disk remaining unchanged. The space has been marked as reusable by new rows, so you can add roughly another 4.7 million rows without the size on disk growing.
The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained. In contrast, VACUUM FULL actively compacts tables by writing a complete new version of the table file with no dead space. This minimizes the size of the table, but can take a long time. It also requires extra disk space for the new copy of the table, until the operation completes.
If you want to shrink it on disk, you will need VACUUM FULL, which locks the tables and needs as much extra space as the size of the tables while the operation is in progress. So check your quota before you try this, and expect your site to be unresponsive while it runs.
Update:
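You can get a rough idea of the size of your data on disk by querying the pg_class table like this:
SELECT SUM(relpages::bigint * 8192) FROM pg_class;
relpages is only an estimate, refreshed by VACUUM and ANALYZE, and 8192 assumes the default 8 kB block size. The ::bigint cast avoids an "integer out of range" error for tables larger than 2 GB.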
Another method is a query of this nature:
SELECT pg_database_size('yourdbname');
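To see where the space actually is, and so which tables a VACUUM FULL would need the most extra room for, a sketch like this lists the largest tables. pg_total_relation_size includes indexes and TOAST data; the LIMIT of 10 is arbitrary.

```sql
-- Whole database, human-readable.
SELECT pg_size_pretty(pg_database_size(current_database()));

-- Ten largest ordinary tables, including their indexes and TOAST data.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples::bigint AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```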
The page https://www.postgresql.org/docs/9.5/static/disk-usage.html provides additional information on disk usage.

Database table size did not decrease proportionately

I am working with a PostgreSQL 8.4.13 database.
Recently I had around 86.5 million records in a table. I deleted almost all of them; only 5000 records are left. I ran REINDEX and VACUUM ANALYZE after deleting the rows, but the table still occupies a lot of disk space:
jbossql=> SELECT pg_size_pretty(pg_total_relation_size('my_table'));
pg_size_pretty
----------------
7673 MB
Also, the index values of the remaining rows are still pretty high, in the millions. I thought that after vacuuming and reindexing, the index values of the remaining rows would start from 1.
I read the documentation and it's pretty clear that my understanding of re-indexing was skewed.
Nonetheless, my intention is to reduce the table size after the delete and bring down the index values so that reads (SELECT) from the table do not take that long; currently it takes around 40 seconds to retrieve just one record from my table.
Update
Thanks Erwin. I have corrected the PostgreSQL version number.
VACUUM FULL worked for me. I have one follow-up question here:
Restart primary key numbers of existing rows after deleting most of a big table
To actually return disk space to the OS, run VACUUM FULL.
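On 8.4 specifically, the pre-9.0 VACUUM FULL compacts the table in place and tends to bloat indexes rather than shrink them, so a sketch of the full cleanup adds a REINDEX afterwards. my_table is the name from the question; both commands hold an exclusive lock on it while they run.

```sql
-- Both statements lock my_table exclusively while they run.
VACUUM FULL VERBOSE my_table;
REINDEX TABLE my_table;
ANALYZE my_table;

-- Compare with the 7673 MB reported before.
SELECT pg_size_pretty(pg_total_relation_size('my_table'));
```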
Further reading:
VACUUM returning disk space to operating system

Is killing a "CLUSTER ON index" dangerous for database?

The whole question is in the title: if we kill a CLUSTER query on a 100-million-row table, will it be dangerous for the database?
The query has been running for 2 hours now, and I need to access the table tomorrow morning (12 hours left, hopefully).
I thought it would be far quicker; my database is running on RAID SSDs and dual Xeon processors.
Thanks for your wise advice.
Sid
No, you can kill the CLUSTER operation without any risk. Until the operation is done, nothing has changed in the original table and index files. From the manual:
When an index scan is used, a temporary copy of the table is created that contains the table data in the index order. Temporary copies of each index on the table are created as well. Therefore, you need free space on disk at least equal to the sum of the table size and the index sizes.
When a sequential scan and sort is used, a temporary sort file is also created, so that the peak temporary space requirement is as much as double the table size, plus the index sizes.
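If you decide to stop it, a sketch of doing so from another session: find the backend running the CLUSTER and cancel its current query, which rolls the operation back and discards the temporary copies. The column names assume PostgreSQL 9.2 or later (older versions use procpid and current_query), and 12345 is a placeholder pid.

```sql
-- Find the backend that is running the CLUSTER.
SELECT pid, now() - query_start AS running_for, query
FROM pg_stat_activity
WHERE query ILIKE 'cluster%'
  AND pid <> pg_backend_pid();

-- Cancel it; replace 12345 with the pid returned above.
SELECT pg_cancel_backend(12345);
```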
As @Frank points out, it is perfectly fine to do so.
Assuming you want to run this query in the future and assuming you have the luxury of a service window and can afford some downtime, I'd tweak some settings to boost the performance a bit.
In your configuration:
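turn off fsync, for higher throughput to the file system
Fsync stands for file system sync. With fsync on, PostgreSQL waits for the operating system to confirm that writes have physically reached disk (the WAL at every commit, data files at checkpoints). With it off, a crash or power loss during the window can corrupt the cluster beyond recovery, so only do this with a usable backup, and turn it back on afterwards.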
maximize your maintenance_work_mem
It's OK to just take all the memory available, as it will not be allocated during production hours. I don't know how big your table and the index you are working on are, but things will run faster when they fit fully in main memory.
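A sketch of applying the memory advice to one session only, so nothing has to be reverted in postgresql.conf afterwards. The names big_table and big_table_idx are hypothetical, and the 1GB value is an illustrative assumption (unit suffixes need 8.2 or later); raise it to what is actually free during the maintenance window.

```sql
-- Session-local: affects only this connection.
SET maintenance_work_mem = '1GB';

-- Rewrite big_table (hypothetical) in the order of big_table_idx.
CLUSTER big_table USING big_table_idx;

-- Refresh planner statistics for the rewritten table.
ANALYZE big_table;

RESET maintenance_work_mem;
```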

PostgreSQL Long VACUUM

I am currently cleaning up a table with 2 indexes and 250 million active rows and approximately as many dead rows (or more). I issued the command VACUUM FULL ANALYSE from my client computer (laptop) to my server. It has been going about its business for the last 3-4 days or so; I am wondering if it will end anytime soon, as I have much work to do!
The server has a quad-core Xeon 2.66 GHz processor, 12 GB of RAM and a RAID controller attached to 2 x 10K rpm 146 GB SAS HDs in a RAID 1 configuration; it is running SUSE Linux. I am wondering...
Now, firstly, the VACUUM postmaster process seems to be using only one core. Secondly, I am not seeing a very high ratio of I/O writes to I/O idle time. Thirdly, from calling procinfo, I can extrapolate that the VACUUM process spends most of its time (88%) waiting for I/O.
So why isn't it utilizing more cores through threads in order to overload the RAID controller (get high I/O writes to idle ratio)? Why is it waiting for I/O if the I/O load isn't high? Why is it not going faster with all this power/resources at its fingers? It seems to me that VACUUM can and should be multithreaded, especially if it is working on a huge table and it is the only one working!
Also, is there a way to configure postgresql.conf to let it multithread such VACUUMs? Can I kill it and still benefit from its partial clean-up? I need to work on that table.
[I am using PostgreSQL 8.1]
Thx again
You don't say what version of PostgreSQL you are using. Is it possible it is pre-8.0?
I had this exact same scenario. Your best bet:
kill the vacuum
back up the table with pg_dump -t option
drop the table
restore the table
If you are using 8.x, look at the autovacuum options. VACUUM is single-threaded; there's nothing you can do to make it use multiple threads.
Some quick tips:
Run VACUUM FULL VERBOSE so you can see what is going on.
Drop all indexes before the VACUUM. It's faster to rebuild them than to vacuum them. You also need to rebuild them now and then because VACUUM FULL isn't good enough (especially on a PostgreSQL as old as 8.1).
Set maintenance_work_mem really high.
Use a newer PostgreSQL. Btw, 8.4 will bring a huge improvement in vacuuming.
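Put together for an 8.1-era server, the first three tips might look like this sketch. big_table, big_table_idx1, big_table_idx2, col_a and col_b are hypothetical; recreate your real index definitions (for example from pg_dump -s -t output). Indexes that back a primary key or unique constraint cannot be removed with DROP INDEX and need ALTER TABLE ... DROP CONSTRAINT instead.

```sql
-- Hypothetical index names; keep the original CREATE INDEX statements handy.
DROP INDEX big_table_idx1;
DROP INDEX big_table_idx2;

-- Value in kB (1 GB); 8.1 does not accept unit suffixes such as '1GB'.
SET maintenance_work_mem = 1048576;

-- VERBOSE prints a detailed activity report while it works.
VACUUM FULL VERBOSE big_table;

-- Rebuilding from scratch is faster than vacuuming the old indexes.
CREATE INDEX big_table_idx1 ON big_table (col_a);
CREATE INDEX big_table_idx2 ON big_table (col_b);
ANALYZE big_table;
```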
An alternative to VACUUM is to dump and restore.
Edit: Since 9.0 VACUUM FULL rewrites the whole table. It's basically the same thing as doing a dump + restore, so running REINDEX is unnecessary.
Are you sure nothing else is running that could lock the table and prevent VACUUM from proceeding?
(Anyway, it's best to use vacuum_cost_delay so that vacuum is not disruptive to production.)
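One way to check for such a lock, as a sketch: list every lock on the table and whether it has been granted. A row with granted = false for the VACUUM's pid means it is waiting, and the other pids holding locks are the likely blockers. big_table is a placeholder name; pg_locks and these columns exist in 8.1 as well as in current releases.

```sql
-- Waiting lock requests (granted = false) sort first.
SELECT l.pid, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'big_table'   -- placeholder table name
ORDER BY l.granted, l.pid;
```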
Old VACUUM FULL is a fossil. It's pretty slow too, and you have to REINDEX afterwards. Don't use it. If you really want to defragment a table, use CLUSTER, or this:
Let's say you have some disk space left; this is much faster than a dump and reload:
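-- Copy the live rows into a fresh, compact table (no dead rows).
CREATE TABLE newtable AS SELECT * FROM oldtable;
CREATE INDEX bla ON newtable( ... );
-- Swap the names in one transaction so clients never see the table missing.
BEGIN;
ALTER TABLE oldtable RENAME TO archive;
ALTER TABLE newtable RENAME TO oldtable;
COMMIT;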
Note this will not copy your constraints. You can use CREATE TABLE LIKE ... to copy them.
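A sketch of that variant, assuming PostgreSQL 9.0 or later for INCLUDING ALL (older releases accept only some of the INCLUDING options). It copies column defaults, NOT NULL and CHECK constraints, and index definitions into the empty new table before loading the data; foreign keys are never copied by LIKE and have to be re-added by hand.

```sql
-- Empty copy with the same columns, defaults, constraints and indexes.
CREATE TABLE newtable (LIKE oldtable INCLUDING ALL);

-- With a GENERATED ALWAYS identity column (10+), add OVERRIDING SYSTEM VALUE.
INSERT INTO newtable SELECT * FROM oldtable;

-- Note: a serial column's sequence stays owned by the renamed archive table.
BEGIN;
ALTER TABLE oldtable RENAME TO archive;
ALTER TABLE newtable RENAME TO oldtable;
COMMIT;
```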
So why isn't it utilizing more cores through threads
pg doesn't support this.