How can I determine whether a file in a PostgreSQL data directory is used in the database? - postgresql

I have a situation in which summing the size of the tables in a tablespace (using pg_class among others) reveals that there is 550G of datafiles in a particular tablespace in a particular database.
However, there is 670G of files in that directory on the server.
(FWIW, I don't know how that can be. No files have been written to that directory via any mechanism other than Postgres. My best guess is that the database crashed while an autovacuum was running, leaving orphaned files lying around. Does that sound plausible?)
So I've worked out a way to check: I read the output of an ls command into the database, strip off the numeric extensions that the files of tables > 1G in size carry, and compare the result with the contents of pg_class. I have, in fact, found about 120G of files not reflected in pg_class.
My question is, is it safe for me to delete these files, or could they be in active use by the database but not reflected in pg_class?
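A minimal sketch of the comparison described above, written in Python rather than SQL. The names filenode_of and unreferenced_files are made up for this illustration, and the list of known filenodes is assumed to come from a query such as SELECT pg_relation_filenode(oid) FROM pg_class run in the right database. It only lists candidate files; it says nothing about whether removing them is safe.

```python
import re

# Relation files are named after the filenode, optionally followed by a
# fork suffix (_fsm, _vm, _init) and a segment number for files past 1 GB,
# e.g. "16384", "16384.1", "16384_fsm".
_RELFILE = re.compile(r"^(\d+)(?:_(?:fsm|vm|init))?(?:\.\d+)?$")


def filenode_of(filename):
    """Return the filenode a file name refers to, or None if the name
    does not look like a relation file (e.g. PG_VERSION)."""
    match = _RELFILE.match(filename)
    return int(match.group(1)) if match else None


def unreferenced_files(filenames, known_filenodes):
    """List file names whose filenode is not in known_filenodes.

    known_filenodes is assumed to be the result of a pg_class query.
    Names that do not parse as relation files are skipped, not reported.
    """
    known = set(known_filenodes)
    candidates = []
    for name in filenames:
        node = filenode_of(name)
        if node is not None and node not in known:
            candidates.append(name)
    return sorted(candidates)
```

For example, unreferenced_files(os.listdir(path), filenodes) would take the directory listing directly instead of going through ls.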

Do not manually delete files in the PostgreSQL data directory.
This is not safe and will corrupt your database.
The safe way to purge any files that don't belong to the database is to perform a pg_dumpall, stop the server, remove the data directory and the contents of all tablespace directories, create a new cluster with initdb and restore the dump.
If you want to investigate the issue, you could try to create a new tablespace and move everything from the old to the new tablespace. I will describe that in the rest of my answer.
Move all the tables and indexes to the new tablespace. These statements only affect the database you are connected to, so run them in every database:
ALTER TABLE ALL IN TABLESPACE oldtblsp SET TABLESPACE newtblsp;
ALTER INDEX ALL IN TABLESPACE oldtblsp SET TABLESPACE newtblsp;
If oldtblsp is the default tablespace of a database:
ALTER DATABASE mydb SET TABLESPACE newtblsp;
Then run a checkpoint:
CHECKPOINT;
Make sure you have not forgotten any database:
SELECT datname
FROM pg_database d
JOIN pg_tablespace s
ON d.dattablespace = s.oid
WHERE s.spcname = 'oldtblsp';
Make sure that there are no objects in the old tablespace by running this query in all databases:
SELECT t.relname, t.relnamespace::regnamespace, t.relkind
FROM pg_class t
JOIN pg_tablespace s
ON t.reltablespace = s.oid
WHERE s.spcname = 'oldtblsp';
This should return no results.
Now the old tablespace should be empty and you can drop it:
DROP TABLESPACE oldtblsp;
If you still get the error
ERROR: tablespace "oldtblsp" is not empty
there might be some files left behind.
Delete them at your own risk...

Related

Restore or some recreate a tablespace to another new tablespace on the side or empty db with only

On DB2 LUW on Linux: is it possible to recreate a tablespace somewhere on the side, under a different tablespace name created only for the purpose of unloading data from the table? I'd like to be able to extract or restore data from a damaged table without having to recreate the entire database, which usually takes a lot of disk space and time.
You can't do it back into the original db, but you could create a separate db that contains just the tablespace with the data you're interested in:
db2 "restore db foo rebuild with tablespace (syscatspace, mytbsp)"
db2 "rollforward db foo to end of logs and stop"
db2 connect to foo
db2 "export to mytable.del of del select * from mytable"

What is stored in pg_default tablespace?

I have a ~2.5 TB database, which is divided into tablespaces. The problem is that ~250 GB are stored in the pg_default tablespace.
I have 3 tables and 6 tablespaces: one for each table and one for its index. Each tablespace directory is not empty, so there are no missing tablespaces for some tables/indexes. But the size of the data/main/base/OID_of_database directory is about 250 GB.
Can anyone tell me what is stored there, whether that is OK, and if not, how I can move it to a tablespace?
I am using PostgreSQL 10.
Inspect the base subdirectory of the data directory. It will contain numbered directories that correspond to your databases and perhaps a pgsql_tmp directory.
Find out which directory contains the 250GB. Map directory names to databases using
SELECT oid, datname
FROM pg_database;
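Finding the directory that holds the 250GB can be sketched in Python as follows. This is only an illustration: directory_sizes is a made-up helper name, and it assumes the script runs on the database server as a user that can read the data directory.

```python
import os


def directory_sizes(base_dir):
    """Return {subdirectory name: total bytes of the files below it}
    for every immediate subdirectory of base_dir."""
    sizes = {}
    for entry in os.scandir(base_dir):
        if not entry.is_dir(follow_symlinks=False):
            continue  # skip plain files directly under base_dir
        total = 0
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                try:
                    total += os.lstat(os.path.join(root, name)).st_size
                except FileNotFoundError:
                    # The server is running, so a file may disappear
                    # between listing it and asking for its size.
                    pass
        sizes[entry.name] = total
    return sizes
```

The keys of the result are the directory names, which for the numbered directories are the oid values returned by the pg_database query above.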
Once you have identified the directory, change into it and see what it contains.
Map the numbers to database objects using
SELECT relname, relkind, relfilenode
FROM pg_class;
(Make sure you are connected to the correct database.)
Now you know which objects take up the space.
If you had frequent crashes during operations like ALTER TABLE or VACUUM (FULL), the files may be leftovers from that. They can theoretically be deleted, but I wouldn't do that without consulting with a PostgreSQL expert.

check size of relation being built in Postgres

I have a loaded OLTP db. I am running ALTER TABLE.. ADD PK on a 100GB relation and want to check the progress. But until the PK is built, it is not visible in pg_catalog to other transactions, so I can't just select it.
I tried find ./base/14673648/ -ctime 1, and also -mtime, which returned hundreds of files. And then I thought: why do I think it has created a filenode? Just because it ate some space.
So forgive my ignorance and advise: how do I check the size of the PK being created so far?
Update: I can sum ./base/pgsql_tmp/pgsql_tmpPID.N, where PID is the pid of the session that creates the PK, as per the docs:
Temporary files (for operations such as sorting more data than can fit
in memory) are created within PGDATA/base/pgsql_tmp, or within a
pgsql_tmp subdirectory of a tablespace directory if a tablespace other
than pg_default is specified for them. The name of a temporary file
has the form pgsql_tmpPPP.NNN, where PPP is the PID of the owning
backend and NNN distinguishes different temporary files of that
backend.
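Summing the temporary files per backend, as described in the update, could look like the Python sketch below. It relies only on the pgsql_tmpPPP.NNN naming quoted above; temp_bytes_by_pid is a made-up name, and the directory path you pass in depends on your installation.

```python
import os
import re
from collections import defaultdict

# pgsql_tmpPPP.NNN: PPP is the backend PID, NNN numbers the files.
_TMPFILE = re.compile(r"^pgsql_tmp(\d+)\.(\d+)$")


def temp_bytes_by_pid(tmp_dir):
    """Return {backend PID: total bytes of its temporary files}
    for the files found directly in tmp_dir."""
    totals = defaultdict(int)
    for entry in os.scandir(tmp_dir):
        match = _TMPFILE.match(entry.name)
        if not match:
            continue  # not a temporary file of the quoted form
        try:
            totals[int(match.group(1))] += entry.stat().st_size
        except FileNotFoundError:
            # The backend may remove the file while we are scanning.
            pass
    return dict(totals)
```

Looking up the PID of the session that builds the PK in the result gives the current temporary file usage of that session only.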
New question: How can I get it from pg_catalog?
pondstats=# select pg_size_pretty(temp_bytes) from pg_stat_database where datid = 14673648;
pg_size_pretty
----------------
89 GB
(1 row)
This shows the sum of all temp files, not per relation.
A primary key is implemented with a unique index, and that has files in the data directory.
Unfortunately, before PostgreSQL 12 (which added the pg_stat_progress_create_index view) there is no way to check the progress of index creation, unless you know your way around the source and attach to the backend with a debugger.
You only need to concentrate on relation files that are not in the output of
SELECT relfilenode FROM pg_class
WHERE relfilenode <> 0
UNION
SELECT pg_relation_filenode(oid) FROM pg_class
WHERE pg_relation_filenode(oid) IS NOT NULL;
Once you know which file belongs to your index-in-creation (it should be growing fast, unless there is a lock blocking the statement) you can start guessing how long it has to go by comparing it to files belonging to a comparable index on a comparable table.
All pretty hand-wavy, I'm afraid.
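To make the guessing a bit more concrete, here is a rough Python sketch. It assumes the file grows at a constant rate and that the size of a comparable index is a fair target, neither of which is guaranteed; estimate_remaining_seconds and its parameters are made up for this illustration.

```python
def estimate_remaining_seconds(size_before, size_after,
                               interval_seconds, expected_final_size):
    """Linear extrapolation from two size samples of the growing file.

    size_before / size_after: file size in bytes at the two samples
    interval_seconds:         time between the samples
    expected_final_size:      size in bytes of a comparable index
    Returns None if the file did not grow between the samples,
    for example because a lock is blocking the statement.
    """
    growth = size_after - size_before
    if growth <= 0:
        return None
    rate = growth / interval_seconds  # bytes per second
    remaining = max(expected_final_size - size_after, 0)
    return remaining / rate
```

The two samples can be taken with os.path.getsize on the file (summing all of its 1 GB segments) a minute or so apart.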

default tablespace for schema in DB2

We have got some DDLs for our schema. These DDLs create tables and indexes in a given tablespace. Something like:
CREATE TABLE mySchema.myTable
(
someField1 CHAR(2) NOT NULL ,
someField2 VARCHAR(70) NOT NULL
)
IN MY_TBSPC
INDEX IN MY_TBSPC;
We want to reuse these DDLs to run some integration tests using Apache Derby. The problem is that Derby does not accept this syntax. Is there any way to define a kind of default tablespace for tables and indexes, so that we can remove these 'IN TABLESPACE' clauses?
There is no deterministic way of defining a "default" tablespace in DB2 (I'm assuming we're dealing with DB2 for LUW here). If the tablespaces are not explicitly indicated in the CREATE TABLE statement, the database manager will pick for table data the first tablespace with the suitable page size that you are authorized to use, and indexes will be stored in the same tablespace as data.
This means that if you only have one user tablespace it will always be used for both data and indexes, so in a way it becomes the default. However, if you have more than one tablespace with different page sizes you may end up with tables (and their indexes) in different tablespaces.
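If you decide to drop the clauses, you do not have to edit the DDL files by hand. The Python sketch below strips a trailing IN ... / INDEX IN ... from statements shaped like the one in the question. It is a regular expression, not a SQL parser, so it assumes the clauses come directly after the closing parenthesis of the column list and that the statement ends with a semicolon; strip_tablespace_clauses is a made-up name.

```python
import re

# Matches ")" followed by an optional "IN <name>", an optional
# "INDEX IN <name>" and the terminating ";".
_TABLESPACE_CLAUSES = re.compile(
    r"\)\s*(?:IN\s+\w+)?\s*(?:INDEX\s+IN\s+\w+)?\s*;",
    re.IGNORECASE,
)


def strip_tablespace_clauses(ddl):
    """Return ddl with the tablespace clauses at the end of each
    CREATE TABLE statement removed."""
    return _TABLESPACE_CLAUSES.sub(");", ddl)
```

Applied to the CREATE TABLE statement from the question, this leaves the column list untouched and ends the statement with ");".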

Where will the tablespace be stored?

I am creating a table with tablespace:
CREATE TABLE SALARY.....
IN ACCOUNTING INDEX IN ACCOUNT_IDX
Where will the Accounting and Account_IDX be created?
The script that you have above will create SALARY in the ACCOUNTING tablespace, with indexes for that table in ACCOUNT_IDX tablespace.
The ACCOUNTING and ACCOUNT_IDX tablespaces need to be created in a separate script that has CREATE TABLESPACE statements.
If you look at the syntax for CREATE TABLESPACE, the USING part of the statement will tell DB2 where to put the files for the tablespace.
DB2 Create Tablespace Reference