I have postgres database running on Amazon EC2 instance. I have few tablespaces created for
some monthly tables, such that each table is on individual tablespace. To get the maximum performance, I have created each tablespace on individual amazon ebs volume.
I want to move some of this tables to different instance and database. I will explain it with one example.
Lets say.
I have EC2 instance A with postgres setup as explained above.
I have another Amazon instance B running and I installed postgres on it as well.
I want to create the same table structure for some of the tables present in A on B. I want to detach the volumes from instance A and attach it to instance B.
Also, I want to create tablespaces on instance B, which will point to the newly attached volumes.
And when I start up this newly created postgres, I expect to see the tables populated with data from those volumes(database).
finally I will delete those tables from A
I know I am being rusty in writing, but couldn't find a better way to ask the question.
Is something along these lines is possible? Are there any pointers for achieving something like this?
No.
The data in the tablespace directory is only the data. You also need the metadata that's in the tables in the pg_catalog schema, as well as the information from pg_clog and pg_xlog to access it.
If you want to move things across using volumes, you must move the entire installation at once (all the tablespaces, including pg_default). Otherwise, you need to use pg_dump/pg_restore to transfer the data over.
Related
I have installed PostgreSQL 13.1 on two computers, a laptop and desktop.
I created database using a Tablespace on a portable SSD drive under F:\MyProject\DB\PG_13_202007201\20350. So I expect all the database files to be in there somewhere.
I want to work on the database on the SSD drive which is fine on the laptop that created the database, but when I go to my desktop I can't figure out how to attach (SQL Server style) the existing database/tablespace and continue working on it.
Is there a way to work like this with Postgresql?
You cannot do that.
A tablespace contains just the data files, but essential information is stored in the metadata (the table name, the columns and their data type, constraints, ...) and elsewhere (for example, the commit log determines which table entries you can see and which ones you cannot).
In short, a tablespace is not self-contained, and you cannot use a tablespace from one database cluster with another database cluster.
If you want to move data between databases, use pg_dump.
I have one postgres database deployed in kubernetes attached to a pvc [with RWX access mode]. What is the right way to update (create table) the database through my CI/CD instead of logging in to the pod and running queries [Without deleting the pvc] ?
My understanding is that the background reason for your question is how to deploy DB structure changes DB onto production with minimal downtime. For that I'd go with a Blue Green Deployments 1 .
After your comments I assume that you already have running instance of PostgreSQL and would like to modify the content of the DB by altering file structure directly on a "disk" (in this case PVC).
Modifying data structure directly on disk is not the best idea if we are speaking about data integrity, etc.
The reasons for that statement are explained in this article 2. It describes how exactly postgreSQL stores data on disk.
PostgreSQL (by default) writes blocks of data (what PostgreSQL calls pages) to disk in 8k chunks.
Additionally, there is a relation between table and file_path, so postgresql knows which exactly file stores which table.
SELECT pg_relation_filepath('test_data');
pg_relation_filepath
----------------------
base/20886/186770
In this example the file /database/base/20866/186770 contains the actual data for the table test_data.
What is the right way to update the database instead of logging in to the pod and running queries
However, if you sure that you have complete set of files for the DB to operate (like the one you are using during pg_dump / pg_restore) you can try placing that data on another PVC and recreate pod, however that will still result in a downtime.
Hope that helps.
I am wanting to know how an AWS postgres RDS does replication where I rename schemas to "swap" them within the read/write instance of the database.
Does it replicate this action to the read-replicas by sending on the "alter schema" rename commands I gave to my read/write instance? Or after my renames, does it see wholly different sets of data in the schemas and do a whole new copy of each out to the read-replicas?
For example...
In my RDS instance I have a read/write instance of "my_mega_database" which I want to create read-replicas of for my applications to connect to.
Typically, in "my_mega_database" there are two schemas "my_data" and "my_data_old", whereby "my_data" contains data that was delivered last night, and "my_data_old" contains data from the previous night. Each contains many tables and huge amounts of data.
If I were to do the following...
ALTER SCHEMA my_data_old RENAME TO my_data_tmp;
ALTER SCHEMA my_data RENAME TO my_data_old;
ALTER SCHEMA my_data_tmp RENAME TO my_data;
... I have affectively swapped these around.
My expectation is that these actions are replicated via the postgres WAL (ie: it sends the rename commands out to the replicas) and AWS RDS replication won't try and waste time copying huge amounts of data all over the place.
Is this correct?
(Speaking about PostgreSQL here, but RDS is probably similar.)
Renaming a schema (or any other object) is a small update in a catalog table, and no data are moved. Internally PostgreSQL uses only the numeric object ID, which stays the same.
You might wrap the three statements in a transaction to make the whole magic atomic.
The same is true on the standby, it is a trivial (meta)data modification.
The only thing that might be a problem are concurrent sessions holding locks.
What is the hierarchy of Database related objects in postgres SQL?
Should it be like, table space must be created at instance level unlike other RDBMS(where we have table space under database).
If so we create the table space at instance level, what is the purpose of database? and what is difference between table space and database on postgres server?
An instance (in PostgreSQL called cluster) is a data directory initialized with initdb with a PostgreSQL server process.
A tablespace is a directory outside the data directory where objects can also be stored. Tablespaces are useful for certain corner cases like distributing I/O or limiting space for a subset of the data.
A database is a container for objects with permissions, organized in schemas.
The difference is that tablespaces are a physical concept, it defines a space where the data are stored, while databases are a logical concept about how data are organized, what they mean, how they are related, who is allowed to access them and so on.
The two concepts are orthogonal.
A database can have tables in several tablespaces, and a tablespace can contain data from several databases.
Database is where you organize all your objects. Tablespace is just storage space for those object.
You can storage your db object in different Tablespace. For example one table is storage in a Tablespace in diskA but another Table use a Tablespace in diskB to improve the performance. Or maybe you need a tablespace for big tables and dont mind use a slow big HDD for those objects.
I have a mobile/web project, using pg9.3 as database, and linux as server.
The data won't be huge, but as time goes on, the data increase.
For long term considering, I want to know about:
Questions:
1. Is it necessary for me to create tablespace for my database, or just use the default one?
2. If I create new tablespace, what is the proper location on linux to create the folder, and why?
3. If I don't create it now, and wait until I have to, till then, will it be easy for me to migrate db with data to new tablespace?
Just use the default tablespace, do not create new tablespaces. Tablespaces are only useful if you have multiple physical disks, so you can define which data is stored on which physical disk. The directory where your data is located is not that important for the workings of postgres, so if you only have one disk it is useless to use tablespaces
Should your data grow beyond the capacity of 1 disk, you will have to perform a full data migration anyway to move it to another physical disk, so you can configure tablespaces at that time
The idea behind defining which data is located on which disk (with tablespaces) is that you can do things like putting a big table which is hardly used on a slow disk, and putting this very intensively used table on a separated faster disk. But I assume you're not there yet, so don't over complicate things