Difference between oid and relfilenode - postgresql

I am reading Internals of postgreSQL chp 1 and I am unable to understand the difference between object identifier and relfilenode.
Tables and indexes as database objects are internally managed by individual OIDs, while those data files are managed by the variable, relfilenode. The relfilenode values of tables and indexes basically but not always match the respective OIDs
I get that both these are the attributes of the system catalog 'pg_class' and OID can be thought of as the primary key of the table, so what is the purpose of relfilenode and how is it different from OID?

relfilenode is the prefix for the name of the files that make up the table. Initially it is identical to the immutable object ID (oid), but SQL statements that rewrite the table will modify it (for example VACUUM (FULL), CLUSTER, TRUNCATE or the variants of ALTER TABLE that rewrite the table).

Related

How to get the describe tables from the Redshift and ALTER it

I have create a redshift cluster and created a db inside.
My schema is new_schema
I have created 2 tables inside two tables inside table1, table2
My Question.
I want to list the datatypes of table1
I need to change the datatype of description which is inside the table1 which is of VARCHAR to TEXT
I have tried to list the datatypes of table1 with below query but nothing listing
SELECT * FROM PG_TABLE_DEF WHERE schemaname = 'new_schema';
A few possibilities as to why you are not seeing the expected results. Most likely is that new_schema isn't in your search_path. Pg_table_info only return info for tables in your search_path - see: https://docs.aws.amazon.com/redshift/latest/dg/r_PG_TABLE_DEF.html
Another possibility is that the tables have no data rows (no blocks assigned) and this can lead to incomplete info from some system tables.
Another possibility is that the tables were not committed by the creating session and being checked by a different session. Since you say that you are creating a new db this comes to mind.
Are the tables visible in svv_table_info?
Also the premise of changing varchar to text is a bit off. From https://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html#r_Character_types-text-and-bpchar-types
You can create an Amazon Redshift table with a TEXT column, but it is
converted to a VARCHAR(256) column that accepts variable-length values
with a maximum of 256 characters.
So it seems like the objective you are trying to achieve is a bit off.

Can two temporary tables with the same name exist in separate queries

I was wondering, if it is possible to have two temp tables with the same name in two separate queries without them conflicting when called upon later in the queries.
Query 1: Create Temp Table Tmp1 as ...
Query 2: Create Temp Table Tmp1 as ...
Query 1: Do something with Tmp1 ...
I am wondering if postgresql distinguishes between those two tables, maybe through addressing them as Query1.Tmp1 and Query2.Tmp1
Each connection to the database gets its own special temporary schema name, and temp tables are created in that schema. So there will not be any conflict between concurrent queries from separate connections, even if the tables have the same names. https://dba.stackexchange.com/a/5237 for more info
The PostgreSQL docs for creating tables states:
Temporary tables exist in a special schema, so a schema name cannot be given when creating a temporary table.

default tablespace for schema in DB2

We have got some DDLs for our schema. These DDLs create tables and indexes in a given tablespace. Something like:
CREATE TABLE mySchema.myTable
(
someField1 CHAR(2) NOT NULL ,
someField2 VARCHAR(70) NOT NULL
)
IN MY_TBSPC
INDEX IN MY_TBSPC;
We want to reuse this DDLs to run some integration tests using APACHE DERBY. The problem is that such syntax is not accepted by DERBY. Is there any way to define a kind of default tablespace for tables and indexes, so we can remove this 'IN TABLESPACE' statements.
There is no deterministic way of defining a "default" tablespace in DB2 (I'm assuming we're dealing with DB2 for LUW here). If the tablespaces are not explicitly indicated in the CREATE TABLE statement, the database manager will pick for table data the first tablespace with the suitable page size that you are authorized to use, and indexes will be stored in the same tablespace as data.
This means that if you only have one user tablespace it will always be used for both data and indexes, so in a way it becomes the default. However, if you have more than one tablespace with different page sizes you may end up with tables (and their indexes) in different tablespaces.

At what level do Postgres index names need to be unique?

In Microsoft SQL Server and MySQL, index names need to unique within the table, but not within the database. This doesn't seem to be the case for PostgreSQL.
Here's what I'm doing: I made a copy of a table using CREATE TABLE new_table AS SELECT * FROM old_table etc and need to re-create the indexes.
Running a query like CREATE INDEX idx_column_name ON new_table USING GIST(column_name) causes ERROR: relation "idx_column_name" already exists
What's going on here?
Indexes and tables (and views, and sequences, and...) are stored in the pg_class catalog, and they're unique per schema due to a unique key on it:
# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
----------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
...
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
Per #wildplasser's comment, you can omit the name when creating the index, and PG will assign a unique name automatically.
Names are unique within the schema. A schema is basically a namespace for {tables,constraints}, (and indexes, functions,etc).
cross-schema-constraints are allowed
Indexes share their namespace ( :=schema) with tables. (for Postgres: an index is a table).
(IIRC) the SQL standard does not define indexes; use constraints whenever you can (The GIST index in the question is probably an exception)
Ergo You'll need to invent another name.
or omit it: the system can invent a name if you dont supply one.
The downside of this: you can create multipe indices with the same definition (their names will be suffixed with _1, _2, IIRC)

How to add a sort key to an existing table in AWS Redshift

In AWS Redshift, I want to add a sort key to a table that is already created. Is there any command which can add a column and use it as sort key?
UPDATE:
Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.
source: https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-redshift-supports-changing-table-sort-keys-dynamically/
At the moment I think its not possible (hopefully that will change in the future). In the past when I ran into this kind of situation I created a new table and copied the data from the old one into it.
from http://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html:
ADD [ COLUMN ] column_name
Adds a column with the specified name to the table. You can add only one column in each ALTER TABLE statement.
You cannot add a column that is the distribution key (DISTKEY) or a sort key (SORTKEY) of the table.
You cannot use an ALTER TABLE ADD COLUMN command to modify the following table and column attributes:
UNIQUE
PRIMARY KEY
REFERENCES (foreign key)
IDENTITY
The maximum column name length is 127 characters; longer names are truncated to 127 characters. The maximum number of columns you can define in a single table is 1,600.
As Yaniv Kessler mentioned, it's not possible to add or change distkey and sort key after creating a table, and you have to recreate a table and copy all data to the new table.
You can use the following SQL format to recreate a table with a new design.
ALTER TABLE test_table RENAME TO old_test_table;
CREATE TABLE new_test_table([new table columns]);
INSERT INTO new_test_table (SELECT * FROM old_test_table);
ALTER TABLE new_test_table RENAME TO test_table;
DROP TABLE old_test_table;
In my experience, this SQL is used for not only changing distkey and sortkey, but also setting the encoding(compression) type.
To add to Yaniv's answer, the ideal way to do this is probably using the CREATE TABLE AS command. You can specify the distkey and sortkey explicitly. I.e.
CREATE TABLE test_table_with_dist
distkey(field)
sortkey(sortfield)
AS
select * from test_table
Additional examples:
http://docs.aws.amazon.com/redshift/latest/dg/r_CTAS_examples.html
EDIT
I've noticed that this method doesn't preserve encoding. Redshift only automatically encodes during a copy statement. If this is a persistent table you should redefine the table and specify the encoding.
create table test_table_with_dist(
field1 varchar encode row distkey
field2 timestam pencode delta sortkey);
insert into test_table select * from test_table;
You can figure out which encoding to use by running analyze compression test_table;
AWS now allows you to add both sortkeys and distkeys without having to recreate tables:
TO add a sortkey (or alter a sortkey):
ALTER TABLE data.engagements_bot_free_raw
ALTER SORTKEY (id)
To alter a distkey or add a distkey:
ALTER TABLE data.engagements_bot_free_raw
ALTER DISTKEY id
Interestingly, the paranthesis are mandatory on SORTKEY, but not on DISTKEY.
You still cannot inplace change the encoding of a table - that still requires the solutions where you must recreate tables.
I followed this approach for adding the sort columns to my table table_transactons its more or less same approach only less number of commands.
alter table table_transactions rename to table_transactions_backup;
create table table_transactions compound sortkey(key1, key2, key3, key4) as select * from table_transactions_backup;
drop table table_transactions_backup;
Catching this query a bit late.
I find that using 1=1 the best way to create and replicate data into another table in redshift
eg:
CREATE TABLE NEWTABLE AS SELECT * FROM OLDTABLE WHERE 1=1;
then you can drop the OLDTABLE after verifying that the data has been copied
(if you replace 1=1 with 1=2, it copies only the structure - which is good for creating staging tables)
it is now possible to alter a sort kay:
Amazon Redshift now supports changing table sort keys dynamically
Amazon Redshift now enables users to add and change sort keys of existing Redshift tables without having to re-create the table. The new capability simplifies user experience in maintaining the optimal sort order in Redshift to achieve high performance as their query patterns evolve and do it without interrupting the access to the tables.
Customers when creating Redshift tables can optionally specify one or more table columns as sort keys. The sort keys are used to maintain the sort order of the Redshift tables and allows the query engine to achieve high performance by reducing the amount of data to read from disk and to save on storage with better compression. Currently Redshift customers who desire to change the sort keys after the initial table creation will need to re-create the table with new sort key definitions.
With the new ALTER SORT KEY command, users can dynamically change the Redshift table sort keys as needed. Redshift will take care of adjusting data layout behind the scenes and table remains available for users to query. Users can modify sort keys for a given table as many times as needed and they can alter sort keys for multiple tables simultaneously.
For more information ALTER SORT KEY, please refer to the documentation.
documentation
as for the documentation itself:
ALTER DISTKEY column_name or ALTER DISTSTYLE KEY DISTKEY column_name A
clause that changes the column used as the distribution key of a
table. Consider the following:
VACUUM and ALTER DISTKEY cannot run concurrently on the same table.
If VACUUM is already running, then ALTER DISTKEY returns an error.
If ALTER DISTKEY is running, then background vacuum doesn't start on a table.
If ALTER DISTKEY is running, then foreground vacuum returns an error.
You can only run one ALTER DISTKEY command on a table at a time.
The ALTER DISTKEY command is not supported for tables with interleaved sort keys.
When specifying DISTSTYLE KEY, the data is distributed by the values in the DISTKEY column. For more information about DISTSTYLE, see CREATE TABLE.
ALTER [COMPOUND] SORTKEY ( column_name [,...] ) A clause that changes
or adds the sort key used for a table. Consider the following:
You can define a maximum of 400 columns for a sort key per table.
You can only alter a compound sort key. You can't alter an interleaved sort key.
When data is loaded into a table, the data is loaded in the order of the sort key. When you alter the sort key, Amazon Redshift reorders the data. For more information about SORTKEY, see CREATE TABLE.
According to the updated documentation it is now possible to change a sort key type with:
ALTER [COMPOUND] SORTKEY ( column_name [,...] )
For reference (https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_TABLE.html):
"You can alter an interleaved sort key to a compound sort key or no sort key. However, you can't alter a compound sort key to an interleaved sort key."
ALTER TABLE table_name ALTER SORTKEY (sortKey1, sortKey2 ...etc)