Adding Default value to oracle database with high volumes of data

Adding Default value to oracle database with high volumes of data - oracle10g

I am trying to add a new column to a table with upwards of 9 million records.
This issue is the column needs to be default value of 'N'. When updating the table the database is getting an issue with the temp data being filled. Also, it is taking a huge amount of time.
I was wondering if anyone knows of anyway to make this faster or a better way of doing this to avoid problems with the temp data filling up.
The database is Oracle10g.

If you could move to 11g and the column was NOT NULL, Oracle has an optimization where the default value doesn't need to be stored in each row so you can add the column very quickly. Unfortunately, it sounds like you're stuck with a depricated version of Oracle where that isn't available.
Most likely, you don't have a lot of really good options other than waiting. It may be more efficient, assuming you're doing this during a period of downtime, to create a new table with the new column, do a direct-path insert of all the data from the old table to the new table, rename the tables, and re-point any constraints at the new table. Whether this is actually more efficient than waiting for the update will depend on your hardware and your table but an INSERT is likely to be more efficient than an UPDATE. On the other hand, for a new single-character column that isn't going to create a lot of migrated rows, you're probably better off waiting for the UPDATE rather than going to this level of effort-- there are a lot of things that could potentially go wrong that you'd need to test and validate (i.e. making sure that you updated all the constraints correctly).

Related

Can I configure a table such that inserted rows always have a greater primary key

I would like to configure a table in Postgres to behave like an append only log. This table will have an automatically generated primary ID.
Workers will work on the items in the table in order and should only need to store the last row ID that they have completed.
How can i prevent rows being written to the table (perhaps by some transactions taking longer than others) where the row ID is less than the greatest value in the table?

There is no way to prevent concurrent inserts in the table (short of locking the table, which is a bad idea, because it breaks autovacuum).
So there is no way to to guarantee that rows are inserted in a certain order. The order in which rows are inserted isn't really a meaningful concept in PostgreSQL.
If you really want that, you have to use a different mechanism to serialize inserts, for example using PostgreSQL advisory locks or synchronization mechanisms on the client side.

Except the numbers assigned are session specific, so a session that starts earlier but lasts longer can write to the table with an id that is less then one that started later but finished sooner. So either you create your own number sequence generation that involves locking or you use an INSERT timestamp.

DB2 updated rows since last check

I want to periodically export data from db2 and load it in another database for analysis.
In order to do this, I would need to know which rows have been inserted/updated since the last time I've exported things from a given table.
A simple solution would probably be to add a timestamp to every table and use that as a reference, but I don't have such a TS at the moment, and I would like to avoid adding it if possible.
Is there any other solution for finding the rows which have been added/updated after a given time (or something else that would solve my issue)?

There is an easy option for a timestamp in Db2 (for LUW) called
ROW CHANGE TIMESTAMP
This is managed by Db2 and could be defined as HIDDEN so existing SELECT * FROM queries will not retrieve the new row which would cause extra costs.
Check out the Db2 CREATE TABLE documentation
This functionality was originally added for optimistic locking but can be used for such situations as well.
There is a similar concept for Db2 z/OS - you have to check that out as I have not tried this one.
Of cause there are other ways to solve it like Replication etc.

That is not possible if you do not have a timestamp column. With a timestamp, you can know which are new or modified rows.
You can also use the TimeTravel feature, in order to get the new values, but that implies a timestamp column.
Another option, is to put the tables in append mode, and then get the rows after a given one. However, this option is not sure after a reorg, and affects the performance and space utilisation.
One possible option is to use SQL replication, but that needs extra tables for staging.
Finally, another option is to read the logs, with the db2ReadLog API, but that implies a development. Also, just appliying the archived logs into the new database is possible, however the database will remain in roll forward pending.

Change tracking in postgres

I want to track changes on some tables in Postgres. Is there any native means in PG for doing this? Probably not, because this requirement may be very specific depending on use-case.
I would like to observe only some columns in table for changes, and make a copy of the row, if such a filed changes. The row should be copied in another, similar table, which additionally has a column for current user, change time-stamp, id of the original row and of cause own primary key column.
Are there any good patterns for doing this? Which native Postgres tools could I use, and what should I implement myself?

How should i keep track of the delete operations in database without using triggers?

The appliocation polls the database after certain intervals of time. On each polling, the application would read all the tables.
As a part of optimization, we want that application should read the table only if any INSERT/UPDATE/DELETE has happened. So i want to use the timestamp concept.
Having a seperate timestamp column can help me in tracking any row modifications.
While querying on a table i can check if the in-memory stored timestamp is lesser than the max-of-TimeStamp in the table. if yes, it means that some row has been modified.
But if certain row gets deleted, then the latest timestamp assosiated with this row is no more pressent. Hence the above algorithm fails in this case since the max-of-timestamp does not give the correct value.
Is there a way in which i can track the delete operations as well without using triggers?
Any help would be highly appreciated.
I am using Sybase ASA database.

Maybe you could implement a logical deletion. Instead of removing a record you simply mark it as deleted with a specific flag for example.
You still have the max timestamp and you can exclude the flagged records from the selection queries (maybe create some views on top of the table to do the job automatically).

How do I ALTER a set of partitioned tables in Postgres?

I created a set of partitioned tables in Postgres, and started inserting a lot of rows via the master table. When the load process blew up on me, I realized I should have declared the id row BIGSERIAL (BIGINT with a sequence, behind the scenes), but inadvertently set it as SERIAL (INTEGER). Now that I have a couple of billion rows loaded, I am trying to ALTER the column to BIGINT. The process seems to be working, but is taking a long time. So, in reality, I don't really know if it is working or it is hung. I'd rather not restart the entire load process again.
Any suggestions?

When you update a row to alter it in PostgreSQL, that writes out a new copy of the row and then does some cleanup later to remove the original. This means that trying to fix the problem by doing updates can take longer than just loading all the data in from scratch again--it's more disk I/O than loading a new copy, and some extra processing time too. The only situation where you'd want to do an update instead of a reload is when the original load was very inefficient, for example if a slow client programs is inserting the data and it's the bottleneck on the process.
To figure out if the process is still working, see if it's using CPU when you run top (UNIX-ish systems) or the Task Manager (Windows). On Linux, "top -c" will even show you what the PostgreSQL client processes are doing. You probably just expected it to take less time than the original load, which it won't, and it's still running rather than hung up.

Restart it (clarifying edit: restart the entire load process again).
Altering a column value requires a new row version, and all indexes pointing to the old version to be updated to point to the new version.
Additionally, see how much of the advise on populating databases you can follow.
Correction from #archnid:
altering the type of the column will trigger a table rewrite, so the row versioning isn't a big problem, but it will still take lots of disk space temporarily. you can usually monitor progress by looking at which files in the database directory are being appended to...

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Adding Default value to oracle database with high volumes of data - oracle10g

Related

Can I configure a table such that inserted rows always have a greater primary key

DB2 updated rows since last check

Change tracking in postgres

How should i keep track of the delete operations in database without using triggers?

How do I ALTER a set of partitioned tables in Postgres?

Categories

Resources