db2 alter column statement from char to varchar - db2

Is it possible, using some DB2 trick, to alter a column from type CHAR to type VARCHAR while trimming the trailing white space?
I know that it is possible to alter the column type from CHAR to VARCHAR (to extend its size),
but DB2 leaves the padding spaces on the right, so I need to issue an UPDATE statement after the ALTER TABLE statement to trim them.
But we also have tables with 400 million records, and the UPDATE statement has a significant cost in terms of time.
I ask this question after reading the DB2 documentation for the ALTER TABLE statement: it seems that nothing exists that lets me change the type and right-trim the values at the same time.

As far as I know it is not possible to do what you ask for. Have you considered export and load replace?

As you said, you can do the UPDATE after the ALTER. You have the spaces now. In a sense, you are simply making a small improvement to the data. It may take a while, but how much does it matter if that process chugs away for a while in the background?

Doing an export+load would mean significant downtime for the table.
If you use the alter-to-VARCHAR-and-update approach, you could minimize the log volume and the number of locks held by performing the update within a compound SQL statement or stored procedure that commits after every X rows updated.
This would still run for quite a long time, but locks won't be held for as long.
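The commit-after-X-rows idea can be sketched roughly as follows. This is only an illustration, not DB2 code: it uses Python's built-in sqlite3 module as a stand-in database, and the function name rtrim_in_batches, the table t and its columns are all made up for the example. On DB2 the same loop would live in a stored procedure or compound SQL block, as described above.

```python
import sqlite3

def rtrim_in_batches(conn, table, column, key, batch_size=1000):
    """Trim trailing spaces from `column`, committing after each batch.

    Hypothetical helper: sqlite3 stands in for the real database here.
    Rows are selected by comparing lengths, because an engine that pads
    with blanks when comparing strings would treat the trimmed and
    untrimmed values as equal.
    """
    total = 0
    while True:
        cur = conn.execute(
            f"UPDATE {table} SET {column} = rtrim({column}) "
            f"WHERE {key} IN (SELECT {key} FROM {table} "
            f"WHERE length({column}) <> length(rtrim({column})) LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # end the transaction so each batch is committed on its own
        if cur.rowcount == 0:
            break  # nothing left to trim
        total += cur.rowcount
    return total

# Small demonstration with made-up data: 2500 padded values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (name) VALUES (?)", [("abc" + " " * 7,)] * 2500)
conn.commit()

trimmed = rtrim_in_batches(conn, "t", "name", "id", batch_size=1000)
remaining = conn.execute(
    "SELECT count(*) FROM t WHERE length(name) <> 3"
).fetchone()[0]
```

Because already-trimmed rows no longer match the WHERE clause, the loop can be stopped and restarted at any point without redoing work.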

Related

SQL Server Express 2008 10GB Size Limit

I am approaching the 10 GB limit that Express has on the primary database file.
The main problem appears to be some fixed length char(500) columns that are never near that length.
I have two tables with about 2 million rows between them. These two tables add up to about 8 GB of data with the remainder being spread over another 20 tables or so. These two tables each have 2 char(500) columns.
I am testing a way to convert these columns to varchar(500) and recover the trailing spaces.
I tried this:
Alter Table Test_MAILBACKUP_RECIPIENTS
Alter Column SMTP_address varchar(500)
GO
Alter Table Test_MAILBACKUP_RECIPIENTS
Alter Column EXDN_address varchar(500)
This quickly changed the column type but obviously didn’t recover the space.
The only way I can see to do this successfully is to:
Create a new table in tempdb with the varchar(500) columns,
Copy the information into the temp table trimming off the trailing spaces,
Drop the real table,
Recreate the real table with the new varchar(500) columns,
Copy the information back.
I’m open to other ideas here, as I’ll have to take my application offline while this process completes.
Another thing I’m curious about is the primary key identity column.
This table has a Primary Key field set as an identity.
I know I have to use Set Identity_Insert on to allow the records to be inserted into the table and turn it off when I’m finished.
How will recreating a table affect new records being inserted into the table after I’m finished? Or is this just “Microsoft Magic” and I don’t need to worry about it?
The problem with your initial approach was that you converted the columns to varchar but didn't trim the existing whitespace (which is preserved by the conversion). After changing the data type of the columns, you should run:
update Test_MAILBACKUP_RECIPIENTS set
SMTP_address=rtrim(SMTP_address), EXDN_address=rtrim(EXDN_address)
This will eliminate all trailing spaces from your table, but note that the actual size on disk will stay the same: SQL Server doesn't shrink database files automatically, it just marks that space as unused and available for other data.
You can use this script from another question to see the actual space used by data in the DB files:
Get size of all tables in database
Usually shrinking a database is not recommended, but when there is a large difference between used space and disk size you can do it with dbcc shrinkdatabase:
dbcc shrinkdatabase (YourDatabase, 10) -- leaving 10% of free space for new data
OK I did a SQL backup, disabled the application and tried my script anyway.
I was shocked that it ran in under 2 minutes on my slow old server.
I re-enabled my application and it still works. (Yay)
Looking at the reported size of the table now, it went from 1.4GB to 126MB! So at least that has bought me some time.
[Screenshots: table size before and after, with the Data size in KB circled]
My next problem is the MailBackup table which also has two char(500) columns.
It is shown as 6.7GB.
I can't use the same approach, as this table contains a FileStream column which has around 190GB of data, and tempdb does not support FileStream as far as I know.
Looks like this might be worth a new question.

Implications of using ADD COLUMN on large dataset

Docs for Redshift say:
ALTER TABLE locks the table for reads and writes until the operation completes.
My question is:
Say I have a table with 500 million rows and I want to add a column. This sounds like a heavy operation that could lock the table for a long time - yes? Or is it actually a quick operation since Redshift is a columnar db? Or does it depend on whether the column is nullable / has a default value?
I find that adding (and dropping) columns is a very fast operation even on tables with many billions of rows, regardless of whether there is a default value or it's just NULL.
As you suggest, I believe this is a feature of it being a columnar database, so the rest of the table is undisturbed. It simply creates empty (or nearly empty) column blocks for the new column on each node.
I added an integer column with a default to a table of around 65M rows in Redshift recently and it took about a second to process. This was on a dw2.large (SSD type) single node cluster.
Just remember you can only add a column to the end (right) of the table; you have to use temporary tables etc. if you want to insert a column somewhere in the middle.
Personally, I have found that rebuilding the table works best.
I do it in the following way:
Create a new table N_OLD_TABLE
Define the datatype/compression encoding in the new table
Insert data into N_OLD_TABLE(old_columns) select(old_columns) from OLD_TABLE
Rename OLD_TABLE to OLD_TABLE_BKP
Rename N_OLD_TABLE to OLD_TABLE
This is a much faster process. It doesn't block any table, and you always have a backup of the old table in case anything goes wrong.
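The rebuild-and-rename steps above might look roughly like this. It is a sketch only: Python's built-in sqlite3 module stands in for Redshift (so there is no compression encoding to define), and rebuild_and_swap, old_table and the flag column are names invented for the example.

```python
import sqlite3

def rebuild_and_swap(conn, table, new_ddl, columns):
    """Copy `table` into a freshly defined table, then swap the names.

    Hypothetical helper: `new_ddl` is a CREATE TABLE statement with a
    {name} placeholder; `columns` lists the columns carried over.
    """
    new_name = f"n_{table}"
    backup_name = f"{table}_bkp"
    col_list = ", ".join(columns)

    # 1. Create the new table with the desired definition.
    conn.execute(new_ddl.format(name=new_name))
    # 2. Copy the existing data across.
    conn.execute(
        f"INSERT INTO {new_name} ({col_list}) SELECT {col_list} FROM {table}"
    )
    # 3. Keep the old table as a backup and put the new one in its place.
    conn.execute(f"ALTER TABLE {table} RENAME TO {backup_name}")
    conn.execute(f"ALTER TABLE {new_name} RENAME TO {table}")
    conn.commit()

# Demonstration with made-up data: add a `flag` column via a rebuild.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table (id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)
conn.commit()

rebuild_and_swap(
    conn,
    "old_table",
    "CREATE TABLE {name} (id INTEGER, note TEXT, flag INTEGER DEFAULT 0)",
    ["id", "note"],
)
rows = conn.execute("SELECT id, note, flag FROM old_table ORDER BY id").fetchall()
backup_count = conn.execute("SELECT count(*) FROM old_table_bkp").fetchone()[0]
```

Rows written to the old table between the copy and the rename would be missed, so writes need to be paused for that window.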

ADD COLUMN with DEFAULT value to a huge table

I have a PostgreSQL DB and a table with almost a billion rows.
When I try to add a new column with a default value:
ALTER TABLE big_table
ADD COLUMN some_flag integer NOT NULL DEFAULT 0;
The transaction goes on for 30+ min and the DB logs start to show warnings.
Is there any way to optimize the query?
Besides doing it in batches (which will still take a while):
You could dump the table as COPY statements and write a script to edit the contents of the COPY statements to insert another column (COPY can be CSV IIRC).
Then you just reload your altered COPY dump and it should in theory be faster than the ALTER because COPY will not log transactions.
The other option is to turn off fsync while you run the command... just remember to turn it back on.
You can also do both of the above in batches.
Starting from PostgreSQL 11 this behaviour will change.
Waiting for PostgreSQL 11 – Fast ALTER TABLE ADD COLUMN with a non-NULL default:
So, for the longest time, when you did:
alter table x add column z text;
it was virtually instantaneous. Get a lock on table, add information about new column to system catalogs, and it's done.
But when you tried:
alter table x add column z text default 'some value';
then it took a long time. How long depended on the size of the table.
This was because postgresql was actually rewriting the whole table, adding the column to each row, and filling it with default value.
"What happens if you want to set the column to NOT NULL also? Are we back to the slow version in that case or does this handle that as well?"
not null doesn’t change anything. it is a constraint for new rows. so adding a column with “not null default ‘xxx'” will be fast.
I'd consider creating the column without the default and manually updating the rows in batches with intermittent commits to apply the default.
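A rough sketch of that approach: add the column without a default, then fill it in key ranges with a commit between ranges. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and backfill_in_ranges is a made-up helper name. On PostgreSQL you would run the equivalent UPDATE statements from a script and set the DEFAULT and NOT NULL constraint once the backfill is done.

```python
import sqlite3

def backfill_in_ranges(conn, table, column, value, key="id", step=1000):
    """Set `column` to `value` where it is NULL, one key range at a time.

    Hypothetical helper; returns the number of batches committed.
    """
    lo, hi = conn.execute(
        f"SELECT min({key}), max({key}) FROM {table}"
    ).fetchone()
    if lo is None:
        return 0  # empty table, nothing to backfill

    batches = 0
    start = lo
    while start <= hi:
        conn.execute(
            f"UPDATE {table} SET {column} = ? "
            f"WHERE {key} >= ? AND {key} < ? AND {column} IS NULL",
            (value, start, start + step),
        )
        conn.commit()  # intermittent commit: each range is its own transaction
        batches += 1
        start += step
    return batches

# Demonstration with made-up data: 2500 rows, ids 1..2500.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO big_table (id) VALUES (?)", [(i,) for i in range(1, 2501)]
)
conn.commit()

# Adding the column without a default only touches the catalog.
conn.execute("ALTER TABLE big_table ADD COLUMN some_flag integer")
batches = backfill_in_ranges(conn, "big_table", "some_flag", 0, step=1000)
nulls_left = conn.execute(
    "SELECT count(*) FROM big_table WHERE some_flag IS NULL"
).fetchone()[0]
```

Walking the key in ranges keeps each UPDATE on an index range scan instead of repeatedly searching the whole table for NULLs.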

How to avoid fragmented database storage by very often updates?

When I have the following table:
CREATE TABLE test
(
"id" integer NOT NULL,
"myval" text NOT NULL,
CONSTRAINT "test-id-pkey" PRIMARY KEY ("id")
)
When doing a lot of queries like the following:
UPDATE "test" set "myval" = "myval" || 'foobar' where "id" = 12345
Then the value of myval in that row will get larger and larger over time.
What will postgresql do? Where will it get the space from?
Can I prevent PostgreSQL from needing more than one seek to read a particular myval column?
Will postgresql do this automatically?
I know that normally I should try to normalize the data much more. But I need to read the value with one seek. Myval will grow by about 20 bytes with each update (that adds data). Some columns will have 1-2 updates, some 1000 updates.
Normally I would just use one new row instead of an update. But then selecting is getting slow.
So I came to the idea of denormalizing.
Change the FILLFACTOR of the table to reserve space for future updates. Because the text field has no index, these can also be HOT updates, which makes the updates faster and lowers autovacuum overhead, since HOT updates use a microvacuum. The documentation for the CREATE TABLE statement has some information about FILLFACTOR.
ALTER TABLE test SET (fillfactor = 70);
-- rebuild the table to create some free space in your current table:
VACUUM FULL ANALYZE test;
-- start testing
The value 70 is not the perfect setting, it depends on your unique situation. Maybe you're fine with 90, it could also be 40 or something else.
This is related to this question about TEXT in PostgreSQL, or at least the answer is similar. PostgreSQL stores large columns away from the main table storage:
Very long values are also stored in background tables so that they do not interfere with rapid access to shorter column values.
So you can expect a TEXT (or BYTEA or large VARCHAR) column to always be stored away from the main table and something like SELECT id, myval FROM test WHERE id = 12345 will take two seeks to pull both columns off the disk (and more seeks to resolve their locations).
If your UPDATEs really are causing your SELECTs to slow down then perhaps you need to review your vacuuming strategy.

PostgreSQL v7.4 ALTER TABLE to change column

I have a need to change the length of CHAR columns in tables in a PostgreSQL v7.4 database. This version did not support the ability to directly change the column type or size using the ALTER TABLE statement. So, directly altering a column from a CHAR(10) to CHAR(20) for instance isn't possible (yeah, I know, "use varchars", but that's not an option in my current circumstance). Anyone have any advice/tricks on how to best accomplish this? My initial thoughts:
-- Save the table's data in a new "save" table.
CREATE TABLE save_data AS SELECT * FROM table_to_change;
-- Drop the columns from the first column to be changed on down.
ALTER TABLE table_to_change DROP column_name1; -- for each column starting with the first one that needs to be modified
ALTER TABLE table_to_change DROP column_name2;
...
-- Add the columns back, using the new size for the CHAR column
ALTER TABLE table_to_change ADD column_name1 CHAR(new_size); -- for each column dropped above
ALTER TABLE table_to_change ADD column_name2...
-- Copy the data back from the "save" table
UPDATE table_to_change
SET column_name1=save_data.column_name1, -- for each column dropped/readded above
column_name2=save_data.column_name2,
...
FROM save_data
WHERE table_to_change.primary_key=save_data.primary_key;
Yuck! Hopefully there's a better way? Any suggestions appreciated. Thanks!
Not PostgreSQL, but in Oracle I have changed a column's type by:
Add a new column with a temporary name (e.g. TMP_COL) and the new data type (e.g. CHAR(20))
Run an update query: UPDATE TBL SET TMP_COL = OLD_COL;
Drop OLD_COL
Rename TMP_COL to OLD_COL
I would dump the table contents to a flat file with COPY, drop the table, recreate it with the correct column setup, and then reload (with COPY again).
http://www.postgresql.org/docs/7.4/static/sql-copy.html
Is it acceptable to have downtime while performing this operation? Obviously what I've just described requires making the table unusable for a period of time; how long depends on the data size and hardware you're working with.
Edit: But COPY is quite a bit faster than INSERTs and UPDATEs. According to the docs you can make it even faster by using BINARY mode. BINARY makes it less compatible with other PGSQL installs but you won't care about that because you only want to load the data to the same instance that you dumped it from.
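The dump, drop, recreate and reload sequence can be sketched like this. It is only an illustration of the flow: Python's built-in sqlite3 and csv modules stand in for PostgreSQL and COPY, the dump goes to an in-memory buffer rather than a flat file, and dump_recreate_reload and the table names are made up. SQLite does not enforce CHAR lengths, so the CHAR(10) to CHAR(20) change is purely illustrative here, and a CSV round trip like this would not preserve NULLs.

```python
import csv
import io
import sqlite3

def dump_recreate_reload(conn, table, columns, new_ddl):
    """Dump `table` to CSV text, recreate it from `new_ddl`, and reload.

    Hypothetical helper standing in for COPY TO / DROP / CREATE / COPY FROM.
    """
    col_list = ", ".join(columns)

    # 1. Dump the current contents to a flat (CSV) buffer.
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(
        conn.execute(f"SELECT {col_list} FROM {table}")
    )

    # 2. Drop the table and recreate it with the corrected column setup.
    conn.execute(f"DROP TABLE {table}")
    conn.execute(new_ddl)

    # 3. Reload the dumped rows into the new table.
    buf.seek(0)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        csv.reader(buf),
    )
    conn.commit()

# Demonstration with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_to_change (id INTEGER, code CHAR(10))")
conn.executemany(
    "INSERT INTO table_to_change VALUES (?, ?)", [(1, "alpha"), (2, "beta")]
)
conn.commit()

dump_recreate_reload(
    conn,
    "table_to_change",
    ["id", "code"],
    "CREATE TABLE table_to_change (id INTEGER, code CHAR(20))",
)
reloaded = conn.execute(
    "SELECT id, code FROM table_to_change ORDER BY id"
).fetchall()
```

The table is missing between the drop and the reload, which is the downtime mentioned above.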
The best approach to your problem is to upgrade pg to something less archaic :)
Seriously. 7.4 is going to be removed from "supported versions" pretty soon, so I wouldn't wait for it to happen with 7.4 in production.