Is it possible to dis-connect and re-connect a POSTGRES tablespace and all the associated objects within that tablespace?
I have a Postgres database with two tablespaces, one on a high-speed SSD drive (I've named this FASTSPACE) , and the other on a slower, traditional magnetic HDD (named SLOWSPACE). The slower tablespace is reserved for large volumes of historic data which is rarely accessed.
Is it possible to temporarily disconnect SLOWSPACE, with the intention of reconnecting it at a later date? the DROP TABLESPACE documentation can only be used once all database objects within it have been dropped.
I'm aware that I can backup all the tables in SLOWSPACE, then delete them, and then DROP the tablespace, however this will take time (there are several Terabytes of data). If I then need the archived data again I'll have create a new version of the SLOWSPACE tablespace from blank, then re-create all the objects from the backups. Again, this will take time.
Is there any way of temporarily disconnecting SLOWSPACE from the database - whilst still leaving the rest of the database up and running?
Update - happy to accept Franks Heikens two letter answer - 'no'
Related
We are using pgbackrest to backup our database to Amazon S3. We do full backups once a week and an incremental backup every other day.
Size of our database is around 1TB, a full backup is around 600GB and an incremental backup is also around 400GB!
We found out that even read access (pure select statements) on the database has the effect that the underlying data files (in /usr/local/pgsql/data/base/xxxxxx) change. This results in large incremental backups and also in very large storage (costs) on Amazon S3.
Usually the files with low index names (e.g. 391089.1) change on read access.
On an update, we see changes in one or more files - the index could correlate to the age of the row in the table.
Some more facts:
Postgres version 13.1
Database is running in docker container (docker version 20.10.0)
OS is CentOS 7
We see the phenomenon on multiple servers.
Can someone explain, why postgresql changes data files on pure read access?
We tested on a pure database without any other resources accessing the database.
This is normal. Some cases I can think of right away are:
a SELECT or other SQL statement setting a hint bit
This is a shortcut for subsequent statements that access the data, so they don't have t consult the commit log any more.
a SELECT ... FOR UPDATE writing a row lock
autovacuum removing dead row versions
These are leftovers from DELETE or UPDATE.
autovacuum freezing old visible row versions
This is necessary to prevent data corruption if the transaction ID counter wraps around.
The only way to fairly reliably prevent PostgreSQL from modifying a table in the future is:
never perform an INSERT, UPDATE or DELETE on it
run VACUUM (FREEZE) on the table and make sure that there are no concurrent transactions
I have installed PostgreSQL 13.1 on two computers, a laptop and desktop.
I created database using a Tablespace on a portable SSD drive under F:\MyProject\DB\PG_13_202007201\20350. So I expect all the database files to be in there somewhere.
I want to work on the database on the SSD drive which is fine on the laptop that created the database, but when I go to my desktop I can't figure out how to attach (SQL Server style) the existing database/tablespace and continue working on it.
Is there a way to work like this with Postgresql?
You cannot do that.
A tablespace contains just the data files, but essential information is stored in the metadata (the table name, the columns and their data type, constraints, ...) and elsewhere (for example, the commit log determines which table entries you can see and which ones you cannot).
In short, a tablespace is not self-contained, and you cannot use a tablespace from one database cluster with another database cluster.
If you want to move data between databases, use pg_dump.
I am wanting to know how an AWS postgres RDS does replication where I rename schemas to "swap" them within the read/write instance of the database.
Does it replicate this action to the read-replicas by sending on the "alter schema" rename commands I gave to my read/write instance? Or after my renames, does it see wholly different sets of data in the schemas and do a whole new copy of each out to the read-replicas?
For example...
In my RDS instance I have a read/write instance of "my_mega_database" which I want to create read-replicas of for my applications to connect to.
Typically, in "my_mega_database" there are two schemas "my_data" and "my_data_old", whereby "my_data" contains data that was delivered last night, and "my_data_old" contains data from the previous night. Each contains many tables and huge amounts of data.
If I were to do the following...
ALTER SCHEMA my_data_old RENAME TO my_data_tmp;
ALTER SCHEMA my_data RENAME TO my_data_old;
ALTER SCHEMA my_data_tmp RENAME TO my_data;
... I have affectively swapped these around.
My expectation is that these actions are replicated via the postgres WAL (ie: it sends the rename commands out to the replicas) and AWS RDS replication won't try and waste time copying huge amounts of data all over the place.
Is this correct?
(Speaking about PostgreSQL here, but RDS is probably similar.)
Renaming a schema (or any other object) is a small update in a catalog table, and no data are moved. Internally PostgreSQL uses only the numeric object ID, which stays the same.
You might wrap the three statements in a transaction to make the whole magic atomic.
The same is true on the standby, it is a trivial (meta)data modification.
The only thing that might be a problem are concurrent sessions holding locks.
I have a mobile/web project, using pg9.3 as database, and linux as server.
The data won't be huge, but as time goes on, the data increase.
For long term considering, I want to know about:
Questions:
1. Is it necessary for me to create tablespace for my database, or just use the default one?
2. If I create new tablespace, what is the proper location on linux to create the folder, and why?
3. If I don't create it now, and wait until I have to, till then, will it be easy for me to migrate db with data to new tablespace?
Just use the default tablespace, do not create new tablespaces. Tablespaces are only useful if you have multiple physical disks, so you can define which data is stored on which physical disk. The directory where your data is located is not that important for the workings of postgres, so if you only have one disk it is useless to use tablespaces
Should your data grow beyond the capacity of 1 disk, you will have to perform a full data migration anyway to move it to another physical disk, so you can configure tablespaces at that time
The idea behind defining which data is located on which disk (with tablespaces) is that you can do things like putting a big table which is hardly used on a slow disk, and putting this very intensively used table on a separated faster disk. But I assume you're not there yet, so don't over complicate things
Installed postgres 9.1 in both the machine.
Initially the DB size is 7052 MB then i used the following command for copy to another server.
pg_dump -C dbname | bzip2 | ssh remoteuser#remotehost "bunzip2 | psql dbname"
After successfully copies, In destination machine i check size it shows 6653 MB.
then i checked for table count its same.
Has there been data loss? Is there missing data?
Note:
Two machines have same hardware and software configuration.
i used:
SELECT pg_size_pretty(pg_database_size('dbname'));
One of the PostgreSQL's most sophisticated features is so called Multi-Version Concurrency Control (MVCC), a standard technique for avoiding conflicts between reads and writes of the same object in database. MVCC guarantees that each transaction sees a consistent view of the database by reading non-current data for objects modified by concurrent transactions. Thanks to MVCC, PostgreSQL has great scalability, a robust hot backup tool and many other nice features comparable to the most advanced commercial databases.
Unfortunately, there is one downside to MVCC, the databases tend to grow over time and sometimes it can be a problem. In recent versions of PostgreSQL there is a separate server process called the autovacuum daemon (pg_autovacuum), whose purpose is to keep the database size reasonable. It does that by trying to recover reusable chunks of the database files. Still, there are many scenarios that will force the database to grow, even if the amount of the useful data in it doesn't really change. That happens typically if you have lots of UPDATE and/or DELETE statements in the applications that are using the database.
When you do a COPY, you recover extraneous space and so your copied DB appears smaller.
That looks normal. Databases are often smaller after restore, because a newly created b-tree index is more compact than one that's been progressively built by inserts. Additionally, UPDATEs and DELETEs leave empty space in the tables.
So you have nothing to worry about. You'll find that if you diff an SQL dump from the old DB and a dump taken from the just-restored DB, they'll be the same except for comments.