How to describe tables in Redshift and ALTER them - amazon-redshift

I have created a Redshift cluster and created a DB inside it.
My schema is new_schema.
I have created two tables inside it: table1 and table2.
My question:
I want to list the datatypes of table1.
I need to change the datatype of the description column, which is inside table1, from VARCHAR to TEXT.
I have tried to list the datatypes of table1 with the query below, but nothing is listed:
SELECT * FROM PG_TABLE_DEF WHERE schemaname = 'new_schema';

There are a few possibilities as to why you are not seeing the expected results. The most likely is that new_schema isn't in your search_path. PG_TABLE_DEF only returns info for tables in your search_path - see: https://docs.aws.amazon.com/redshift/latest/dg/r_PG_TABLE_DEF.html
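If the search_path is the culprit, adding the schema for the session should make the rows show up (a minimal sketch, assuming the schema really is named new_schema):

-- Add new_schema to this session's search path
SET search_path TO new_schema, public;

-- PG_TABLE_DEF can now see tables in new_schema
SELECT * FROM PG_TABLE_DEF WHERE schemaname = 'new_schema';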
Another possibility is that the tables have no data rows (no blocks assigned), which can lead to incomplete info from some system tables.
Another possibility is that the tables were not committed by the creating session and are being checked from a different session. Since you say that you are creating a new db, this comes to mind.
Are the tables visible in svv_table_info?
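For example, a quick existence check (note that SVV_TABLE_INFO may omit empty tables, which ties into the no-data-rows point above):

-- List the tables Redshift knows about in the schema
SELECT "schema", "table"
FROM svv_table_info
WHERE "schema" = 'new_schema';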
Also, the premise of changing VARCHAR to TEXT is a bit off. From https://docs.aws.amazon.com/redshift/latest/dg/r_Character_types.html#r_Character_types-text-and-bpchar-types
You can create an Amazon Redshift table with a TEXT column, but it is converted to a VARCHAR(256) column that accepts variable-length values with a maximum of 256 characters.
So the objective you are trying to achieve seems a bit off: in Redshift, TEXT is just shorthand for VARCHAR(256), and converting to it gains you nothing.
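If the underlying goal is simply a longer column, recent Redshift versions can grow a VARCHAR in place, which avoids the TEXT detour entirely (a sketch, assuming description is currently VARCHAR and 1000 is the length you actually need):

-- Increase the maximum length of an existing VARCHAR column
ALTER TABLE new_schema.table1 ALTER COLUMN description TYPE VARCHAR(1000);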

Related

Difference between oid and relfilenode

I am reading Internals of PostgreSQL, chapter 1, and I am unable to understand the difference between the object identifier and relfilenode.
Tables and indexes as database objects are internally managed by individual OIDs, while their data files are managed by the variable relfilenode. The relfilenode values of tables and indexes basically but not always match the respective OIDs.
I get that both of these are attributes of the system catalog pg_class, and OID can be thought of as the primary key of the table; so what is the purpose of relfilenode, and how is it different from OID?
relfilenode is the prefix for the name of the files that make up the table. Initially it is identical to the immutable object ID (oid), but SQL statements that rewrite the table will modify it (for example VACUUM (FULL), CLUSTER, TRUNCATE or the variants of ALTER TABLE that rewrite the table).
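This is easy to watch in action (a minimal sketch using a throwaway table named demo):

-- oid and relfilenode start out equal
CREATE TABLE demo (id int);
SELECT oid, relfilenode FROM pg_class WHERE relname = 'demo';

-- Rewrite the table: the oid is unchanged, but relfilenode now differs
VACUUM FULL demo;
SELECT oid, relfilenode FROM pg_class WHERE relname = 'demo';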

pg_largeobject huge, but no tables have OID column type

PostgreSQL noob, PG 9.4.x, no access to application code, developers, or anyone knowledgeable about it.
User database CT has a 427 GB pg_largeobject (PGLOB) table; the next largest table is 500-ish MB.
Per this post (Does Postgresql use PGLOB internally?), a very reputable member said PostgreSQL does not use PGLOB internally.
I have reviewed the schema of all user tables in the database, and none of them have a column of type OID (or lo) - the value used by PGLOB rows to tie a collection of blob chunks back to a referencing table row. I think this means I cannot use vacuumlo to delete orphaned PGLOB rows, because that utility searches user tables for those two data types.
I HAVE identified a table with an integer field whose values match LOID values in PGLOB. This seems to indicate that the developers somehow got their blobs into PGLOB and kept track of them via integer values stored in user table rows.
QUESTION: Is that last statement possible?
A) If it is not, what could be adding all this data to the PGLOB table?
B) If it is possible, is there a way I can programmatically search ALL tables for integer values that might represent rows in PGLOB?
NEED: I DESPERATELY need to reduce the size of the PGLOB table, as we are running out of disk space. And no, we cannot add space to the existing disk, per the admin. So I somehow need to determine whether there are LOID values in PGLOB that do NOT exist in ANY user table as an integer-type field, and then run lo_unlink to remove those rows. That could get me more usable 8K pages in the table.
BTW, I have also run pg_freespace on PGLOB, and it identified that most of the pages in PGLOB do not contain enough free space to insert another blob chunk.
THANKS FOR THE ASSISTANCE!
Not really an answer, but thinking out loud:
As you found, all large objects are stored in a single table. The oid field you refer to is something you add to a table so you can have a pointer to a particular LO oid in pg_largeobject. That being said, there is nothing compelling you to store that info in a table; you can just create LOs in pg_largeobject. From the looks of it, and just a guess, the developers stored the oids as integers with the intent of doing integer::oid to get a particular LO back as needed. I would look at what other information is stored in that table to see if it helps determine what the LOs are for.
Also, you might join the integer::oid values to the loid values in pg_largeobject to see if that table accounts for all of them.
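A sketch of that check, using hypothetical names tbl1 and noteid for the user table and its integer column:

-- Large objects that no tbl1.noteid value accounts for
SELECT DISTINCT l.loid
FROM pg_largeobject l
LEFT JOIN tbl1 t ON t.noteid::oid = l.loid
WHERE t.noteid IS NULL;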
I was able to do a detailed analysis of all user tables in the database, find all columns that contained numeric data with no decimals, and then run a query against pg_largeobject with a NOT EXISTS clause for every matching table, comparing pg_largeobject.loid against the appropriate field(s) in the user tables.
I found 25,794 LOIDs that could be DELETEd from the PGLOB table, totaling 3.4M rows.
SELECT DISTINCT loid
INTO OrphanedBLOBs
FROM pg_largeobject l
WHERE NOT EXISTS (SELECT * FROM tbl1 cn WHERE cn.noteid = l.loid)
  AND NOT EXISTS (SELECT * FROM tbl1 d WHERE d.document = l.loid)
  AND NOT EXISTS (SELECT * FROM tbl1 d WHERE d.reportid = l.loid);
I used that table to execute lo_unlink(loid) for each of the LOIDs.
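The unlink step can be done in one pass (a sketch; note that a subsequent VACUUM of pg_largeobject is what actually makes the freed 8K pages reusable):

-- Remove every orphaned large object collected above
SELECT lo_unlink(loid) FROM OrphanedBLOBs;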

PostgreSQL shows wrong sequence details in the table's information schema

Recently I saw a strange scenario with my PostgreSQL DB. The information schema of my database is showing a different sequence name than the one actually allocated for the column of my table.
The issue is:
I have a table tab_1
id | name
---+------
1  | emp1
2  | emp2
3  | emp3
Previously, the id column (integer) of the table was an auto-generated field where the sequence number was generated at run time via JPA (sequence name: tab_1_seq).
We made a change and updated the table's id column to bigserial, so the sequence is now maintained at the column level (newly allocated sequence: tab_1_temp_seq) and no longer handled by JPA.
After this change everything worked fine for a few months, and then we hit an error: "the sequence "tab_1_temp_seq" is not yet defined in this session".
On analyzing the issue, I found that there is a mismatch between the sequences allocated for the table:
in the table structure we are shown the sequence tab_1_temp_seq, while information_schema associates the column with the old sequence, tab_1_seq.
I am not sure what triggered this, as we do not manage the database system ourselves. If you have faced an issue like this, kindly let me know its root cause.
Queries:
SELECT table_name, column_name, column_default FROM information_schema.columns WHERE table_name = 'tab_1';
Result:
table_name | column_name | column_default
-----------+-------------+--------------------------------
tab_1      | id          | nextval('tab_1_seq'::regclass)
Below are the details found in the table structure/properties:
id   nextval('tab_1_temp_seq'::regclass)
name varchar
Perhaps you are suffering from data corruption, but most likely you are suffering from a bad tool for visualizing your database objects. Whatever program shows you the "table structure/properties" might be confused.
To find out the truth (which DEFAULT value PostgreSQL uses), run:
SELECT pg_get_expr(adbin, adrelid)
FROM pg_attrdef
WHERE adrelid = 'tab_1'::regclass;
This is also what information_schema.columns will show, but I added the naked query for clarity.
This DEFAULT value will be used whenever the INSERT statement either doesn't specify the id column or fills it with the special value DEFAULT.
Perhaps the confusion is also caused by different programs that may set default values in their own way. The way I showed above is PostgreSQL's way, but nothing keeps a third-party tool from using its own sequence to fill the id.
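If the catalog really does still point at the old sequence, repointing the column default is a one-liner (a sketch, assuming tab_1_temp_seq is the sequence you want the column to use):

-- Make the column default draw from the intended sequence
ALTER TABLE tab_1 ALTER COLUMN id SET DEFAULT nextval('tab_1_temp_seq');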

Alter the column type over several tables

In a PostgreSQL db I'm working on, half of the tables have one particular column, always named the same, that is of type varchar(5). The size became a bit too restrictive and I want to change it to varchar(10).
The number of tables in my particular case is actually very manageable to do it by hand. But I was wondering how one could script this with a query for larger dbs. It generally should be possible in just a few steps.
Identify all the tables in the schema, then (?) filter by the condition that the column is present.
Create ALTER TABLE statements for each table found.
I have some idea of how to write a query that identifies all tables in the schema, but I wouldn't know how to filter them. And if I didn't filter them, I assume the generated ALTER TABLE statements would break.
Would be great if someone could share their knowledge on this.
Thanks to Abelisto for providing some guidance. Eventually, this is how I did it.
First, I created a query that in turn creates the ALTER TABLE statements. MyDB and MyColumn are placeholders and need to reflect actual values.
SELECT
  'ALTER TABLE '||columns.table_name||' ALTER COLUMN '||columns.column_name||' TYPE varchar(10);'
FROM
  information_schema.columns
WHERE
  columns.table_catalog = 'MyDB' AND
  columns.table_schema = 'public' AND
  columns.column_name = 'MyColumn';
Then it was just a matter of executing the output as a new query. All done.
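For larger databases where copy-pasting the output is impractical, the generation and execution can be combined in an anonymous DO block (a sketch, keeping the MyColumn placeholder; format() with %I also quotes identifiers safely):

DO $$
DECLARE
  stmt text;
BEGIN
  FOR stmt IN
    SELECT format('ALTER TABLE %I.%I ALTER COLUMN %I TYPE varchar(10)',
                  table_schema, table_name, column_name)
    FROM information_schema.columns
    WHERE table_schema = 'public'
      AND column_name = 'MyColumn'
  LOOP
    -- Run each generated ALTER TABLE immediately
    EXECUTE stmt;
  END LOOP;
END
$$;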

What is the difference between these two T-SQL statements?

In an SSIS package at work there are some SQL tasks that create staging tables for holding import data. All the statements take this form:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.tbNewTable') AND type in (N'U'))
BEGIN
TRUNCATE TABLE dbo.tbNewTable
END
ELSE
BEGIN
CREATE TABLE dbo.tbNewTable (
ColumnA VARCHAR(10) NULL,
ColumnB VARCHAR(10) NULL,
ColumnC INT NULL
) ON [PRIMARY]
END
In Itzik Ben-Gan's T-SQL Fundamentals I see a different form of statement for creating a table:
IF OBJECT_ID('dbo.tbNewTable', 'U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbNewTable
END
CREATE TABLE dbo.tbNewTable (
ColumnA VARCHAR(10) NULL,
ColumnB VARCHAR(10) NULL,
ColumnC INT NULL
) ON [PRIMARY]
Each of these appears to do the same thing: after execution, there will be an empty table called tbNewTable in the dbo schema.
Are there any practical or theoretical differences between the two? What implications might they have?
The first one assumes that if the table exists, it has the same columns as those it would create. The second one does not make that assumption. So if a table with that name happened to exist and had a different set of columns, the two would have very different results.
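A contrived illustration of that divergence (LegacyCol is a hypothetical leftover column):

-- Suppose the staging table already exists with an old layout:
CREATE TABLE dbo.tbNewTable (
    ColumnA VARCHAR(10) NULL,
    LegacyCol INT NULL
) ON [PRIMARY]

-- The TRUNCATE form keeps this layout: LegacyCol survives and ColumnB/ColumnC never appear.
-- The DROP/CREATE form rebuilds the table with exactly the new definition.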
The first will not actually DROP the table - it merely TRUNCATEs all the data in said table; hence the CREATE is guarded.
Thus the form with the DROP will allow the subsequent CREATE to change the schema (when the new table is created) even if tbNewTable previously existed.
Because the DROP/CREATE alters the database schema, it may also not be allowed in all cases. For instance, a view created WITH SCHEMABINDING will prevent the table from being dropped. (The same holds true for more general FK relationships, should any exist.)
...when SCHEMABINDING is specified, the base table or tables cannot be modified in a way that would affect the view definition.
The TRUNCATE should be marginally faster in one of those constant "don't care" ways: there should be no performance consideration given to one over the other.
There are also permission differences. TRUNCATE only requires the ALTER permission.
The minimum permission required is ALTER on table_name. TRUNCATE TABLE permissions default to the table owner...
Happy coding.
These are very different.
The first does an existence check against the sys.objects catalog view, looking for a matching table name. If one is found, it truncates the table, removing all rows while keeping the table structure itself - i.e. the actual table is never dropped.
In the second, the check that the table exists is done implicitly via the OBJECT_ID() function. If the table exists, it is dropped completely - rows and structure.
If other tables have foreign key constraints referencing this table, you'll certainly have issues dropping it completely... and TRUNCATE will fail there too: SQL Server will not truncate a table that is referenced by a foreign key constraint.
I tend to dislike either construction in an SSIS package. I create the tables in a deployment script, and I want the package to fail if one of the tables I use is missing later on, because then something has gone drastically wrong and I want to investigate before I try putting data anywhere.