When I alter a table in db2, I have to reorganize it
so I execute the next query:
Call Sysproc.admin_cmd ('reorg Table myTable');
I m searching an appropriate solution to reorganize a table when it s altered, or reorganize all the schema after making various modifications
You can determine when tables will require a REORG by looking at SYSIBMADM.ADMINTABINFO:
select tabschema, tabname
from sysibmadm.admintabinfo
where reorg_pending = 'Y'
You may also want to look at the NUM_REORG_REC_ALTERS column as this may show you additional tables that don't require reorganization due to various ALTER TABLE statements.
The reorg operation is similar to a defrag in hard disk. It frees empty spaces in pages, and eventually it could reorganize data according to an index. Depending on the features, it creates the compression dictionary and compress data.
As you can see, reorg operation is an administrative task, and it is not necessary each time data is modified. A database could run without reorg.
It order to ease this, DB2 included autonomic features like automatic backup, however this doesn't answer you own question. This will only trigger reorg on tables that need that.
To reorg a table explicitly you need to execute the command reorg http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001966.html
or via the admin_cmd http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0023582.html
in db2 config we have:
Automatic reorganization (AUTO_REORG) = OFF
we can set auto_reorg to on
Related
If CLUSTER is set on a table, then is it applied by pg_dump?
Specifically, the following:
Is it used to order the rows in the dump? If not, is there a way to do this?
Is it set on the table when using pg_restore? If not, is there a way to do this?
The dump will contain the statement
ALTER TABLE mytable CLUSTER ON anindex;
Restoring the dump will execute that statement. As the documentation explains,
This form selects the default index for future CLUSTER operations. It does not actually re-cluster the table.
I want to load many rows from a CSV file.
The files contain data like these "article_name,article_time,start_time,end_time"
There is a contraint on the table: for the same article name, i don't insert a new row if the new article_time falls in an existing range [start_time,end_time] for the same article.
ie: don't insert row y if exists [start_time_x,end_time_x] for which time_article_y inside range [start_time_x,end_time_x] , with article_name_y = article_name_x
I tried with psycopg by selecting the existing article names ad checking manually if there is an overlap --> too long
I tried again with psycopg, this time by setting a condition 'exclude using...' and tryig to insert with specifying "on conflict do nothing" (so that it does not fail) but still too long
I tried the same thing but this time trying to insert many values at each call of execute (psycopg): it got a little better (1M rows processed in almost 10minutes), but still not as fast as it needs to be for the amount of data I have (500M+)
I tried to parallelize by calling the same script many time, on different files but the timing didn't get any better, I guess because of the locks on the table each time we want to write something
Is there any way to create a lock only on rows containing the same article_name? (and not a lock on the whole table?)
Could you please help with any idea to make this parallellizable and/or more time efficient?
Lots of thanks folks
Your idea with the exclusion constraint and INSERT ... ON CONFLICT is good.
You could improve the speed as follows:
Do it all in a single transaction.
Like Vao Tsun suggested, maybe COPY the data into a staging table first and do it all with a single SQL statement.
Remove all indexes except the exclusion constraint from the table where you modify data and re-create them when you are done.
Speed up insertion by disabling autovacuum and raising max_wal_size (or checkpoint_segments on older PostgreSQL versions) while you load the data.
I am using the following commands below in postgresql 9.1.3 to move data from a temp staging table to a table being used in a webapp (geoserver) all in the same db. Then dropping the temp table.
TRUNCATE table_foo;
INSERT INTO table_foo
SELECT * FROM table_temp;
DROP TABLE table_temp;
I want to wrap this in a transaction to allow for concurrency. The data-set is small less than 2000 rows and truncating is faster than delete.
What is the best way to run these commands in a transaction?
Is creating a function advisable or writing a UPSERT/MERGE etc in a CTE?
Would it be better to DELETE all rows then bulk INSERT from temp table instead of TRUNCATE?
In postgres which would allow for a roll back TRUNCATE or DELETE?
The temp table is delivered daily via an ETL scripted in arcpy how could I automate the truncate/delete/bulk insert parts within postgres?
I am open to using PL/pgsql, PL/python (or the recommended py for postgres)
Currently I am manually executing the sql commands after the temp staging table is imported into my DB.
Both, truncate and delete can be rolled back (which is clearly documented in the manual).
truncate - due to its nature - has some oddities regarding the visibility.
See the manual for details: http://www.postgresql.org/docs/current/static/sql-truncate.html (the warning at the bottom)
If your application can live with the fact that table_foo is "empty" during that process, truncate is probably better (again see the big red box in the manual for an explanation). If you don't want the application to notice, you need to use delete
To run these statements in a transaction simply put them into one:
begin transaction;
delete from table_foo;
insert into ....
drop table_temp;
commit;
Whether you do that in a function or not is up to you.
truncate/insert will be faster (than delete/insert) as that minimizes the amount of WAL generated.
I am getting this error when I ran:
alter table tablename add column columnname varchar(1) default 'N';
DB2 SQL Error: SQLCODE=-911, SQLSTATE=40001, SQLERRMC=68
How to solve it?
The alter statement wants to get an X lock on this row in SYSIBM.SYSTABLES. There is an open transaction that has this row/index value in an incompatible lock state. This lock that caused the timeout could even be from an open cursor that reads this row with an RS or RR isolation level.
Terminate any other SQL currently trying to query SYSTABLES and any utilities that may be trying to update SYSTABLES like reorg and runstats then try the alter again.
See DB2 Info center (I picked the one for DB2 10, most likely this error code is the same in other versions, but doublecheck!).
Seems there is a transaction open on your table, that prevents your alter command from execution.
after you have Altered a table you need to Reorg: reade up on it here:
Run the runstats script, which is a DB2 script, at regular intervals and set the script to gather RUNSTATS WITH DISTRIBUTION AND DETAILED INDEXES ALL.
In addition to running the runstats scripts regularly, you can perform the following tasks to avoid the problem:
Use REOPT ONCE or REOPT ALWAYS with the command-line interface (CLI ) packages to change the query optimization behavior.
In the DB2 database, change the table to make it volatile. Volatile tables indicate to the DB2 optimizer that the table cardinality can change significantly at run time (from empty to large and vice versa). Therefore, DB2 uses an index to access a table rather than a table scan.
I am Loading large no of rows into a table from a csv data file . For every 10000 records I want to update the indexs on the table for optimization (update statistics ). Any body tell me what is the command i can use? Also what is SQL Server "UPDATE STATISTICS" equivalent in Oracle.is Update statistics means index optimization or gatehring statistics. I am using Oracle 10g and 11g. Thanks in advance.
Index optimization is a tricky question. You can COALESCE an index to eliminate adjacent empty blocks, and you can REBUILD an index to completely trash and recreate it. In my opinion, what you may wish to do for the period of your data load, is make the indexes UNUSABLE, then when you're done, REBUILD them.
ALTER INDEX my_table_idx01 DISABLE;
-- run loader process
ALTER INDEX my_table_idx01 REBUILD;
You only want to gather statistics once when you're done, and that's done with a call to DBMS_STATS, like so:
EXEC DBMS_STATS.GATHER_TABLE_STATS ('my_schema', 'my_table');
I would recommend taking a different approach. I would drop the index(es), load the data and then recreate the index. After enabling it Oracle will build a good index on the data you just loaded. Two things are accomplished here, the records will load faster and the index will be rebuilt with a properly balanced tree. (Note: Be careful here, if the table is a really big table, you may need to declare a temporary tablespace for it to work in.)
drop index my_index;
-- uber awesome loading process
create index my_index on my_table(my_col1, my_col2);