What is the command for Index optimization and update statistics for Oracle 10g and 11g? - oracle10g

I am Loading large no of rows into a table from a csv data file . For every 10000 records I want to update the indexs on the table for optimization (update statistics ). Any body tell me what is the command i can use? Also what is SQL Server "UPDATE STATISTICS" equivalent in Oracle.is Update statistics means index optimization or gatehring statistics. I am using Oracle 10g and 11g. Thanks in advance.

Index optimization is a tricky question. You can COALESCE an index to eliminate adjacent empty blocks, and you can REBUILD an index to completely trash and recreate it. In my opinion, what you may wish to do for the period of your data load, is make the indexes UNUSABLE, then when you're done, REBUILD them.
ALTER INDEX my_table_idx01 DISABLE;
-- run loader process
ALTER INDEX my_table_idx01 REBUILD;
You only want to gather statistics once when you're done, and that's done with a call to DBMS_STATS, like so:
EXEC DBMS_STATS.GATHER_TABLE_STATS ('my_schema', 'my_table');

I would recommend taking a different approach. I would drop the index(es), load the data and then recreate the index. After enabling it Oracle will build a good index on the data you just loaded. Two things are accomplished here, the records will load faster and the index will be rebuilt with a properly balanced tree. (Note: Be careful here, if the table is a really big table, you may need to declare a temporary tablespace for it to work in.)
drop index my_index;
-- uber awesome loading process
create index my_index on my_table(my_col1, my_col2);

Related

incremental update of a postgres index

I have inserted a lot of data (more than 2 millions documents) in a table and created an a full text search index using GIN and it works great. I can query the database and retrieve the apropriate documents rapidly.
Regularly, I collect new data that I can insert in the database. What I would like to do is to update my index with the new data only, but I have failed so far. I don't want to drop the index and recreate it because it takes ages to recreate it. I basically would like to do an incremental update of the index. I can do that on the fly when data is being inserted but this is very very slow. I read that creating an index on inserted data was faster (true) so I guessed that updating an index on the new data could be done. But I can't do it so far.
I use postgresql 12.
Can anybody help me, please?
There is no way to suspend adding values to the index while you load data.
But GIN indexes already have a feature to optimize that: the GIN fast update technique.
If you set the gin_pending_list_limit storage parameter to the index to a high value. Once you are done with the bulk load, VACUUM the table to integrate the pending list into the main index.
An alternative approach is to use partitioning and load a partition at once. Then create the index on the partition and attach it to the partitioned table.

Postgres parallel/efficient load huge amount of data psycopg

I want to load many rows from a CSV file.
The file​s​ contain​ data like these​ "article​_name​,​article_time,​start_time,​end_time"
There is a contraint on the table: for the same article name, i don't insert a new row if the new ​article_time falls in an existing range​ [start_time,​end_time]​ for the same article.
ie: don't insert row y if exists [​start_time_x,​end_time_x] for which time_article_y inside range [​start_time_x,​end_time_x] , with article_​name_​y = article_​name_​x
I tried ​with psycopg by selecting the existing article names ad checking manually if there is an overlap --> too long
I tried again with psycopg, this time by setting a condition 'exclude using...' and tryig to insert with specifying "on conflict do nothing" (so that it does not fail) but still too long
I tried the same thing but this time trying to insert many values at each call of execute (psycopg): it got a little better (1M rows processed in almost 10minutes)​, but still not as fast as it needs to be for the amount of data ​I have (500M+)
I tried to parallelize by calling the same script many time, on different files but the timing didn't get any better, I guess because of the locks on the table each time we want to write something
Is there any way to create a lock only on rows containing the same article_name? (and not a lock on the whole table?)
Could you please help with any idea to make this parallellizable and/or more time efficient?
​Lots of thanks folks​
Your idea with the exclusion constraint and INSERT ... ON CONFLICT is good.
You could improve the speed as follows:
Do it all in a single transaction.
Like Vao Tsun suggested, maybe COPY the data into a staging table first and do it all with a single SQL statement.
Remove all indexes except the exclusion constraint from the table where you modify data and re-create them when you are done.
Speed up insertion by disabling autovacuum and raising max_wal_size (or checkpoint_segments on older PostgreSQL versions) while you load the data.

PostgreSQL - reindex when add new index

I have a table with 100k records without indexes. I created a new index on column that is used for left join.
Do I need to reindex my table?
Creation of an index took a few ms. So I am guessing that query can not use this index (no data) until I reindex my table (in case I would have other indexes I would reindex only index - I read the manual).
I can't find any information when new index is populated with data? Is this done automatically? When?
Once CREATE INDEX has finished, the index is ready to be used. There is no need to run REINDEX after that.
From the REINDEX documentation page:
REINDEX is similar to a drop and recreate of the index in that the index contents are rebuilt from scratch. However, the locking considerations are rather different. REINDEX locks out writes but not reads of the index's parent table.
That means REINDEX behaves similar to CREATE after DROP.
And from the CREATE INDEX documentation page:
Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished.
I think this unambiguously explains that creation implies indexation.
Whether or not a specific query uses the index depends on many different things though. If your query doesn't use the index, you need to post the query, the table definitions (e.g. as a create table statement), the index you have defined and the output of explain (analyze, verbose) of your query.

db2 reorganize a table

When I alter a table in db2, I have to reorganize it
so I execute the next query:
Call Sysproc.admin_cmd ('reorg Table myTable');
I m searching an appropriate solution to reorganize a table when it s altered, or reorganize all the schema after making various modifications
You can determine when tables will require a REORG by looking at SYSIBMADM.ADMINTABINFO:
select tabschema, tabname
from sysibmadm.admintabinfo
where reorg_pending = 'Y'
You may also want to look at the NUM_REORG_REC_ALTERS column as this may show you additional tables that don't require reorganization due to various ALTER TABLE statements.
The reorg operation is similar to a defrag in hard disk. It frees empty spaces in pages, and eventually it could reorganize data according to an index. Depending on the features, it creates the compression dictionary and compress data.
As you can see, reorg operation is an administrative task, and it is not necessary each time data is modified. A database could run without reorg.
It order to ease this, DB2 included autonomic features like automatic backup, however this doesn't answer you own question. This will only trigger reorg on tables that need that.
To reorg a table explicitly you need to execute the command reorg http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001966.html
or via the admin_cmd http://publib.boulder.ibm.com/infocenter/db2luw/v10r1/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0023582.html
in db2 config we have:
Automatic reorganization (AUTO_REORG) = OFF
we can set auto_reorg to on

Best use of indices on temporary tables in T-SQL

If you're creating a temporary table within a stored procedure and want to add an index or two on it, to improve the performance of any additional statements made against it, what is the best approach? Sybase says this:
"the table must contain data when the index is created. If you create the temporary table and create the index on an empty table, Adaptive Server does not create column statistics such as histograms and densities. If you insert data rows after creating the index, the optimizer has incomplete statistics."
but recently a colleague mentioned that if I create the temp table and indices in a different stored procedure to the one which actually uses the temporary table, then Adaptive Server optimiser will be able to make use of them.
On the whole, I'm not a big fan of wrapper procedures that add little value, so I've not actually got around to testing this, but I thought I'd put the question out there, to see if anyone had any other approaches or advice?
A few thoughts:
If your temporary table is so big that you have to index it, then is there a better way to solve the problem?
You can force it to use the index (if you are sure that the index is the correct way to access the table) by giving an optimiser hint, of the form:
SELECT *
FROM #table (index idIndex)
WHERE id = #id
If you are interested in performance tips in general, I've answered a couple of other questions about that at some length here:
Favourite performance tuning tricks
How do you optimize tables for specific queries?
What's the problem with adding the indexes after you put data into the temp table?
One thing you need to be mindful of is the visibility of the index to other instances of the procedure that might be running at the same time.
I like to add a guid to these kinds of temp tables (and to the indexes), to make sure there is never a conflict. The other benefit of this approach is that you could simply make the temp table a real table.
Also, make sure that you will need to query the data in these temp tables more than once during the running of the stored procedure, otherwise the cost of index creation will outweigh the benefit to the select.
In Sybase if you create a temp table and then use it in one proc the plan for the select is built using an estimate of 100 rows in the table. (The plan is built when the procedure starts before the tables are populated.) This can result in the temp table being table scanned since it is only "100 rows". Calling a another proc causes Sybase to build the plan for the select with the actual number of rows, this allows the optimizer to pick a better index to use. I have seen significant improvedments using this approach but test on your database as sometimes there is no difference.