DB2 error code -670 when adding a new column programmatically - db2

I'm developing against a DB2 database, and at some point I get an error code "-670" when trying to add a new column.
The error code indicates that the tablespace page size is too small. I ran a DESCRIBE command and estimate that the table width is no more than 17K (I just added up the numeric values in the "Length" column), but I'm not sure of that estimate since I have many BLOB columns. Is there a SQL command (or DB2 command line utility) I could use to retrieve the exact table width?

The sum of the LENGTH values in the output of the DESCRIBE TABLE command is a fairly accurate gauge of row width if you don't count the BLOB, CLOB, or LONG VARCHAR columns, which are not stored inline with the rest of the columns. There is a small amount of overhead bytes that aren't shown in that report, but it's usually not a significant portion of the table. DB2 has historically stored large objects separately to improve manageability and performance of the rest of the data in the table. DB2 has recently supported storing large objects inline in order to make use of compression and buffering, but I haven't seen it used widely and I doubt it will become a popular approach.
It sounds like it's time for you to relocate your table to a tablespace with a larger page size. Unless you're maxed out at a 32K page already, you have the option of doubling your page size by migrating your table to a larger bufferpool and tablespace, which will give you more room for additional columns. If you need to keep the data from the old table, loading from a cursor is a quick way to copy a large amount of data from one table to another within the same database. Your other option is to export the table's contents to a flatfile so you can drop and recreate the table in the wider tablespace and load the data back in.

Answering my own question: these queries give a good estimate of the table width currently in use (and hence an idea of the remaining free space). Each query covers a different set of column types, so add the three results together:
select SUM(300) from sysibm.syscolumns where tbname = 'MY_TABLE' and (typename = 'BLOB' or typename = 'DBCLOB')
select 2 * SUM(length) from sysibm.syscolumns where tbname = 'MY_TABLE' and typename = 'VARGRAPHIC'
select SUM(length) from sysibm.syscolumns where tbname = 'MY_TABLE' and typename != 'BLOB' and typename != 'DBCLOB' and typename != 'VARGRAPHIC'
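For illustration, the arithmetic behind the three queries above can be sketched in Python. This is only a sketch: the column list is made up, the 300-byte allowance per BLOB/DBCLOB and the doubling of VARGRAPHIC lengths are taken from the queries, and the per-page-size row limits are the values I recall from the DB2 LUW SQL limits, so check them against the documentation for your version.

```python
# Hypothetical (name, type, length) triples, as DESCRIBE TABLE or
# sysibm.syscolumns would report them; not taken from a real table.
COLUMNS = [
    ("ID", "INTEGER", 4),
    ("NAME", "VARCHAR", 100),
    ("TITLE", "VARGRAPHIC", 50),
    ("PHOTO", "BLOB", 1048576),
]

# Assumption: maximum row length in bytes for each tablespace page size.
# Verify these against the SQL limits documentation for your DB2 version.
MAX_ROW_BYTES = {4096: 4005, 8192: 8101, 16384: 16293, 32768: 32677}


def estimate_row_width(columns):
    """Mirror the three queries: 300 bytes per BLOB/DBCLOB,
    2 * length for VARGRAPHIC, and the plain length otherwise."""
    total = 0
    for _name, typename, length in columns:
        if typename in ("BLOB", "DBCLOB"):
            total += 300
        elif typename == "VARGRAPHIC":
            total += 2 * length
        else:
            total += length
    return total


def smallest_page_size(row_width):
    """Return the smallest page size whose assumed row limit fits the
    estimate, or None if even the largest page is too small."""
    for page_size in sorted(MAX_ROW_BYTES):
        if row_width <= MAX_ROW_BYTES[page_size]:
            return page_size
    return None


width = estimate_row_width(COLUMNS)
print(width, smallest_page_size(width))
```

Under these assumed limits, the 17K estimate from the question (smallest_page_size(17000)) would already call for a 32K page.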

Related

PostgreSQL Database size is not equal to sum of size of all tables

I am using an AWS RDS PostgreSQL instance. I am using the query below to get the size of all databases.
SELECT datname, pg_size_pretty(pg_database_size(datname))
from pg_database
order by pg_database_size(datname) desc
One database's size is 23 GB, but when I ran the query below to sum the sizes of all individual tables in that database, the total was only around 8 GB.
select pg_size_pretty(sum(pg_total_relation_size(table_schema || '.' || table_name)))
from information_schema.tables
As it is an AWS RDS instance, I don't have rights on pg_toast schema.
How can I find out which database object is consuming size?
Thanks in advance.
The documentation says:
pg_total_relation_size ( regclass ) → bigint
Computes the total disk space used by the specified table, including all indexes and TOAST data. The result is equivalent to pg_table_size + pg_indexes_size.
So TOAST tables are covered, and so are indexes.
One simple explanation could be that you are connected to a different database than the one that is shown to be 23GB in size.
Another likely explanation would be materialized views, which consume space, but do not show up in information_schema.tables.
Yet another explanation could be that there have been crashes that left some garbage files behind, for example after an out-of-space condition during the rewrite of a table or index.
This is of course harder to debug on a hosted platform, where you don't have shell access...

Most efficient way to DECODE multiple columns -- DB2

I am fairly new to DB2 (and SQL in general) and I am having trouble finding an efficient method to DECODE columns.
Currently, the database has a number of tables, most of which store a significant number of their columns as numbers. These numbers correspond to rows in a table that holds the real values. We are talking about 9,500 different values (e.g. '502=yes' or '1413= Graduate Student').
Normally I would just use a WHERE clause to match them up, but since there are 20-30 columns that need to be decoded per table, I can't really do this (that I know of).
Is there a way to effectively just display the corresponding value from the other table?
Example:
SELECT TEST_ID, DECODE(TEST_STATUS, 5111, 'Approved', 5112, 'In Progress') TEST_STATUS
FROM TEST_TABLE
The above works fine.......but I manually look up the numbers and review them to build the statements. As I mentioned, some tables have 20-30 columns that would need this AND some need DECODE statements that would be 12-15 conditions.
Is there anything that would allow me to do something simpler like:
SELECT TEST_ID, DECODE(TEST_STATUS = *TableWithCodeValues*) TEST_STATUS
FROM TEST_TABLE
EDIT: Also, to be more clear, I know I can do a ton of INNER JOINS, but I wasn't sure if there was a more efficient way than that.
From a logical point of view, I would consider splitting the lookup table into several domain/dimension tables. Not sure if that is possible to do for you, so I'll leave that part.
As mentioned in my comment I would stay away from using DECODE as described in your post. I would start by doing it as usual joins:
SELECT a.TEST_STATUS
, b.TEST_STATUS_DESCRIPTION
, a.ANOTHER_STATUS
, c.ANOTHER_STATUS_DESCRIPTION
, ...
FROM TEST_TABLE as a
JOIN TEST_STATUS_TABLE as b
ON a.TEST_STATUS = b.TEST_STATUS
JOIN ANOTHER_STATUS_TABLE as c
ON a.ANOTHER_STATUS = c.ANOTHER_STATUS
JOIN ...
If things are too slow there are a couple of things you can try:
Create a statistical view that can help determine cardinalities from the joins (may help the optimizer creating a better plan):
https://www.ibm.com/support/knowledgecenter/sl/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0021713.html
If your license permits it, you can experiment with Materialized Query Tables (MQT). Note that there is a penalty for modifications of the base tables, so if you have more of an OLTP workload, this is probably not a good idea:
https://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/index.html
A third option, if your lookup table is fairly static, is to cache the lookup table in the application. Read TEST_TABLE from the database and look up the descriptions in the application. A further improvement may be to add triggers that invalidate the cache when the lookup table is modified.
If you don't want to do all these joins, you could create your own LOOKUP function.
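As a rough sketch of that application-side cache, here is what the idea could look like in Python. Everything in it is hypothetical: the lookup rows are hard-coded instead of being read from the database, STUDENT_TYPE is an invented column name, and cache invalidation is left out entirely.

```python
# Hypothetical contents of the lookup table. In a real application these
# pairs would be read from the database once, e.g. at startup.
LOOKUP_ROWS = [
    (5111, "Approved"),
    (5112, "In Progress"),
    (502, "yes"),
    (1413, "Graduate Student"),
]


def build_cache(lookup_rows):
    """Turn (id, text) pairs into a dict for constant-time lookups."""
    return dict(lookup_rows)


def decode_row(row, coded_columns, cache):
    """Return a copy of row with each coded column replaced by its text.
    Codes missing from the cache become None."""
    decoded = dict(row)
    for column in coded_columns:
        decoded[column] = cache.get(row[column])
    return decoded


cache = build_cache(LOOKUP_ROWS)
# One row as it might come back from TEST_TABLE, still holding raw codes.
row = {"TEST_ID": 1, "TEST_STATUS": 5111, "STUDENT_TYPE": 1413}
print(decode_row(row, ["TEST_STATUS", "STUDENT_TYPE"], cache))
```

With roughly 10,000 id/text pairs the dict stays small, so the trade-off is mostly about keeping it in sync with the lookup table.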
create or replace function lookup(IN_ID INTEGER)
returns varchar(32)
deterministic reads sql data
begin atomic
declare OUT_TEXT varchar(32);--
set OUT_TEXT=(select text from test.lookup where id=IN_ID);--
return OUT_TEXT;--
end;
With a table TEST.LOOKUP like
create table test.lookup(id integer, text varchar(32))
containing some id/text pairs. This will return the text value corresponding to an id, or NULL if the id is not found.
With the roughly 10k id/text pairs you mention and an index on the ID field, this shouldn't be a performance issue, as that amount of data should easily be cached in the corresponding bufferpool.

Select * from table_name is running slow

The table contains around 700,000 rows. Is there any way to make the query run faster?
This table is stored on a server.
I have tried to run the query by taking the specific columns.
If select * from table_name is unusually slow, check for these things:
Network speed. How large is the data and how fast is your network? For large queries you may want to think about your data in bytes instead of rows. Run select bytes/1024/1024/1024 gb from dba_segments where segment_name = 'TABLE_NAME'; and compare that with your network speed.
Row fetch size. If the application or IDE is fetching one-row-at-a-time, each row has a large overhead with network lag. You may need to increase that setting.
Empty segment. In a few weird cases the table's segment size can increase and never shrink. For example, if the table used to have billions of rows, and they were deleted but not truncated, the space would not be released. Then a select * from table_name may need to read a lot of empty extents to get to the real data. If the GB size from the above query seems too large, run alter table table_name move; to rebuild the table and possibly save space.
Recursive query. A query that simple almost couldn't have a bad execution plan. It's possible, but rare, for a recursive query to have a bad execution plan. While the query is running, look at select * from gv$sql where users_executing > 0;. There might be a data dictionary query that's really slow and needs to be tuned.
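To make the first two points concrete, here is a back-of-the-envelope sketch in Python. The 1 ms round-trip latency, the fetch sizes, and the 100 Mbit/s link are made-up numbers, and the model ignores protocol overhead and server time; it only shows why fetching one row at a time hurts.

```python
import math


def round_trip_wait_ms(total_rows, fetch_size, latency_ms):
    """Time spent purely waiting on network round trips, in milliseconds,
    when the client fetches fetch_size rows per round trip."""
    round_trips = math.ceil(total_rows / fetch_size)
    return round_trips * latency_ms


def transfer_seconds(size_gb, bandwidth_mbit_per_s):
    """Raw transfer time for size_gb gigabytes over the given link."""
    size_megabits = size_gb * 1024 * 8
    return size_megabits / bandwidth_mbit_per_s


# 700,000 rows with an assumed 1 ms round trip:
print(round_trip_wait_ms(700_000, 1, 1))    # one row per round trip
print(round_trip_wait_ms(700_000, 500, 1))  # 500 rows per round trip
# An assumed 1 GB segment over an assumed 100 Mbit/s link:
print(transfer_seconds(1, 100))
```

In this toy model, raising the fetch size from 1 to 500 cuts the waiting time from about 700 seconds to about 1.4 seconds, which is why the fetch size setting is worth checking before anything else.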

Evaluate how much space will be freed by VACUUM in Redshift

According to AWS doc:
Amazon Redshift does not automatically reclaim and reuse space that is freed when you delete rows and update rows.
Before running VACUUM, is there a way to know or evaluate how much space will be free from disk by the VACUUM?
Thx
References:
http://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html
http://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html
You can calculate the amount of storage that will be freed up from a vacuum command by looking up the tbl_rows column in the svv_table_info view. This includes rows that are marked for deletion. Compare that to a select count(*) from the same table and you'll have a ratio. Something like this on a theoretical table named factsales.
select (select cast(count(*) as numeric(12,0)) from factsales) /
cast(tbl_rows as numeric(12,0))
as "percentage of non deleted rows"
from svv_table_info where "table" = 'factsales'
There doesn't appear to be a straightforward way to execute dynamic SQL and cursors, so to get this same ratio across all tables you'd have to execute the code from an external source or programming language such as Python.
It's not an extremely accurate way, but you can query svv_table_info and look for the column deleted_pct. This will give you a rough idea, in percentage terms, of what fraction of the table needs to be rebuilt using vacuum.
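A minimal sketch of that external step in Python, assuming you have already collected tbl_rows from svv_table_info and the count(*) for each table by whatever client you use. The numbers and the dimdate table are invented for illustration, and the result is a fraction of rows, which only approximates the disk space that VACUUM would free.

```python
def deleted_fraction(tbl_rows, visible_rows):
    """Fraction of stored rows that are marked for deletion.
    tbl_rows comes from svv_table_info, visible_rows from count(*)."""
    if tbl_rows == 0:
        return 0.0
    return 1 - visible_rows / tbl_rows


# Hypothetical (tbl_rows, count(*)) pairs, keyed by table name.
TABLE_COUNTS = {
    "factsales": (1_000_000, 750_000),
    "dimdate": (3650, 3650),
}

for table, (tbl_rows, visible_rows) in TABLE_COUNTS.items():
    print(table, f"{deleted_fraction(tbl_rows, visible_rows):.1%}")
```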
You can run it for all the tables in your system to get this estimate for the whole system.

select * or individual columns

Is there any difference in performance between
select *
from table name
and
select [col1]
,[col2]
......
,[coln]
from table name
It is a SQL antipattern to use SELECT *. It is faster for the database when you specify columns (it doesn't have to look them up) and, more importantly, you should never select more columns than you actually need. If you have a join in your query, you have at least one column you don't need (the join column), and thus SELECT * is always slower to return records in a query with a join since it is returning more information than necessary.
Now all this sounds like a small improvement and in a small system it might be, but as the database grows and gets busy, the performance implications become bigger. There is no excuse for using SELECT *.
SELECT * is also bad for maintenance especially if you use it to insert records - always specify both the columns you are inserting to and the fields from the select in an INSERT statement that uses a SELECT. It will break if you change the table structure. You may also end up showing the user columns you don't want them to see such as the GUID added for replication.
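The INSERT breakage is easy to reproduce. The sketch below uses Python's built-in sqlite3 module as a stand-in (not SQL Server, so the exact error will differ), with made-up table and column names: an INSERT ... SELECT * stops working once a column is added to the source table, while the version with explicit column lists keeps working.

```python
import sqlite3


def insert_select_star_breaks():
    """Return (star_failed, row_count) after adding a column to the source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO source VALUES (1, 'a')")

    # Works while both tables have the same two columns.
    conn.execute("INSERT INTO target SELECT * FROM source")

    # Someone changes the table structure, e.g. adds a replication GUID.
    conn.execute("ALTER TABLE source ADD COLUMN guid TEXT")

    try:
        conn.execute("INSERT INTO target SELECT * FROM source")
    except sqlite3.OperationalError:
        star_failed = True  # column count no longer matches
    else:
        star_failed = False

    # Explicit column lists are unaffected by the new column.
    conn.execute("INSERT INTO target (id, name) SELECT id, name FROM source")
    row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return star_failed, row_count


print(insert_select_star_breaks())
```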
If you use SELECT * in a view (at least in SQL Server) the view will still not automatically update in the case of a change to the underlying tables. If someone gets the silly idea to re-arrange the column order in the table (yeah I know you shouldn't but people do this on occasion) using SELECT * might mean that data will show up in the wrong columns in reports or inserts which can cause problems where things might be misinterpreted. I can think of one case where two columns were swapped in a staging table and the social security number became the amount we intended to pay the person giving a speech. You can see how that might really muck up the accounting except we didn't use SELECT * so we were safe because the columns retained the same name.
I want to note that you don't even save much development time by using SELECT *. It takes me a max of about 15 seconds more to use the column names, even in a large table, as I drag them over from the object browser in SQL Server (you can get all the columns in one step).
I suppose select * takes a few CPU cycles less since SQL Server parses your statements, but I don't think it will make a noticeable difference.