How should I keep track of delete operations in a database without using triggers? - sybase-asa

The application polls the database at fixed intervals. On each poll, the application reads all the tables.
As an optimization, we want the application to read a table only if an INSERT/UPDATE/DELETE has happened on it. So I want to use the timestamp concept.
Having a separate timestamp column can help me track any row modifications.
While querying a table, I can check whether the timestamp stored in memory is less than the maximum timestamp in the table. If it is, some row has been modified.
But if a row gets deleted, the latest timestamp associated with that row is no longer present. Hence the above algorithm fails in this case, since the maximum timestamp does not reflect the deletion.
Is there a way to track delete operations as well without using triggers?
Any help would be highly appreciated.
I am using Sybase ASA database.

Maybe you could implement logical deletion. Instead of removing a record, you simply mark it as deleted, for example with a specific flag.
Because marking a row as deleted is an UPDATE that also sets the row's timestamp, the maximum timestamp still moves forward, and you can exclude the flagged records from the selection queries (maybe create some views on top of the table to do this automatically).
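A minimal sketch of the difference between a physical and a logical delete for the max-timestamp polling check. It uses Python's built-in `sqlite3` as a stand-in for Sybase ASA, and the table `items`, columns `row_ts`/`is_deleted`, and view `live_items` are all hypothetical names; an increasing counter replaces a real clock so the run is deterministic.

```python
import itertools
import sqlite3

# Stand-in for the ASA table (hypothetical names): "row_ts" plays the timestamp
# column, "is_deleted" is the logical-deletion flag.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        row_ts     INTEGER NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Readers query the view, so logically deleted rows stay hidden.
    CREATE VIEW live_items AS SELECT id, name FROM items WHERE is_deleted = 0;
""")
clock = itertools.count(1)  # deterministic replacement for a real timestamp

def max_ts():
    return conn.execute("SELECT COALESCE(MAX(row_ts), 0) FROM items").fetchone()[0]

def insert(item_id, name):
    conn.execute("INSERT INTO items (id, name, row_ts) VALUES (?, ?, ?)",
                 (item_id, name, next(clock)))

def hard_delete(item_id):
    conn.execute("DELETE FROM items WHERE id = ?", (item_id,))

def soft_delete(item_id):
    # The logical delete is an UPDATE, so it also moves the row's timestamp forward.
    conn.execute("UPDATE items SET is_deleted = 1, row_ts = ? WHERE id = ?",
                 (next(clock), item_id))

insert(1, "a")
insert(2, "b")
seen = max_ts()                 # the poller remembers 2

hard_delete(1)
print(max_ts() > seen)          # False: a physical DELETE never raises MAX(row_ts)

soft_delete(2)
print(max_ts() > seen)          # True: the flag update set row_ts to 3
print(conn.execute("SELECT * FROM live_items").fetchall())  # []
```

The design choice is that deletion becomes just another modification, so the single "has MAX(timestamp) grown?" check covers inserts, updates, and deletes alike.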

Related

Azure Data Factory - Copy Data Upsert only updating a single row at a time

I'm using Data Factory (well, Synapse pipelines) to ingest data from sources into a staging layer. I am using the Copy Data activity with UPSERT. However, I found the performance of incrementally loading large tables particularly slow, so I did some digging.
My incremental load brought in 193k new/modified records from the source. These get stored in the transient staging/landing table that the Copy Data activity creates in the database in the background. It adds a column called BatchIdentifier to this table; however, the batch identifier value is different for every row.
Profiling the load, I can see individual statements issued for each BatchIdentifier, so effectively it's processing the incoming data row by row rather than as a batch.
I tried setting the sink writeBatchSize property on the Copy Data activity to 10k, but that doesn't make any difference.
Has anyone else come across this, or found a better way to perform a dynamic upsert without having to specify all the columns in advance (which I'm really hoping to avoid)?
This is the SQL statement issued 193k times on my load as an example.
It checks whether the record exists in the target table; if so, it performs an update, otherwise an insert. The logic makes sense, but it is performed row by row when it could just be done in bulk.
Is your primary key definition in the source the same as in the sink?
I just ran into this same behavior when the source and destination tables used different key columns.
It also appears that ADF/Synapse does not use MERGE for upserts but its own IF EXISTS THEN UPDATE ELSE INSERT logic, so there may be something behind the scenes making it select single rows for those BatchIdentifier executions.
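To illustrate the row-by-row versus bulk distinction being discussed, here is a small sketch using Python's `sqlite3` as a stand-in database; it does not reproduce the SQL that ADF actually generates (which isn't shown above), and the `target`/`staging` tables are hypothetical. On SQL Server, the set-based counterpart would be a MERGE, or an UPDATE ... FROM followed by an INSERT ... WHERE NOT EXISTS.

```python
import sqlite3

def make_conn():
    # Hypothetical target table plus a staging table holding the incoming batch.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target  (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE staging (id INTEGER PRIMARY KEY, val TEXT);
        INSERT INTO target  VALUES (1, 'old'), (2, 'old');
        INSERT INTO staging VALUES (2, 'new'), (3, 'new');
    """)
    return conn

def upsert_row_by_row(conn):
    # Same shape as IF EXISTS ... UPDATE ... ELSE INSERT: round trips per staged row.
    for row_id, val in conn.execute("SELECT id, val FROM staging").fetchall():
        if conn.execute("SELECT 1 FROM target WHERE id = ?", (row_id,)).fetchone():
            conn.execute("UPDATE target SET val = ? WHERE id = ?", (val, row_id))
        else:
            conn.execute("INSERT INTO target (id, val) VALUES (?, ?)", (row_id, val))

def upsert_set_based(conn):
    # One statement for the whole batch (SQLite UPSERT, 3.24+). "WHERE true" avoids
    # SQLite's parsing ambiguity when ON CONFLICT follows INSERT ... SELECT.
    conn.execute("""
        INSERT INTO target (id, val)
        SELECT id, val FROM staging WHERE true
        ON CONFLICT(id) DO UPDATE SET val = excluded.val
    """)

conn = make_conn()
upsert_set_based(conn)
print(conn.execute("SELECT * FROM target ORDER BY id").fetchall())
# [(1, 'old'), (2, 'new'), (3, 'new')]
```

Both functions produce the same final table; the difference is that the set-based version lets the engine process the whole staging table in one statement instead of two per row.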

Can I configure a table such that inserted rows always have a greater primary key?

I would like to configure a table in Postgres to behave like an append only log. This table will have an automatically generated primary ID.
Workers will work on the items in the table in order and should only need to store the last row ID that they have completed.
How can I prevent rows from being written to the table (perhaps because some transactions take longer than others) whose row ID is less than the greatest value already in the table?
There is no way to prevent concurrent inserts in the table (short of locking the table, which is a bad idea, because it breaks autovacuum).
So there is no way to guarantee that rows are inserted in a certain order. The order in which rows are inserted isn't really a meaningful concept in PostgreSQL.
If you really want that, you have to use a different mechanism to serialize inserts, for example using PostgreSQL advisory locks or synchronization mechanisms on the client side.
Except that the numbers are assigned per session when the row is inserted, not when it commits, so a session that starts earlier but lasts longer can write a row to the table with an id that is less than that of a session that started later but finished sooner. So either you create your own number sequence generation that involves locking, or you use an INSERT timestamp.
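The locking idea above can be modelled in a few lines. This is an in-process Python analogy, not PostgreSQL code: the class `AppendOnlyLog` and helper `rows_after` are hypothetical, and the single `threading.Lock` stands in for something like a transaction-level advisory lock (`pg_advisory_xact_lock`) taken before drawing the id and released at commit. It assumes ids are not cached per session (sequence `CACHE 1`, PostgreSQL's default).

```python
import threading
import time

class AppendOnlyLog:
    """Hypothetical stand-in for a table whose inserts are serialized by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = 1
        self.committed = []  # ids in the order they became visible to readers

    def insert(self, work_seconds):
        # The lock is held from id allocation until "commit", like an advisory lock
        # taken at the start of the transaction and released when it ends.
        with self._lock:
            row_id = self._next_id
            self._next_id += 1
            time.sleep(work_seconds)        # a slow transaction body
            self.committed.append(row_id)   # commit: the row becomes visible
        return row_id

def rows_after(log, last_seen_id):
    # A worker only needs its last processed id, because visibility order == id order.
    return [i for i in log.committed if i > last_seen_id]

log = AppendOnlyLog()
threads = [threading.Thread(target=log.insert, args=(d,)) for d in (0.05, 0.0, 0.02, 0.0)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log.committed == sorted(log.committed))  # True
```

The cost of this guarantee is that inserts no longer run concurrently: every writer waits for the previous one to commit.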

DB2 updated rows since last check

I want to periodically export data from Db2 and load it into another database for analysis.
In order to do this, I would need to know which rows have been inserted/updated since the last time I've exported things from a given table.
A simple solution would probably be to add a timestamp to every table and use that as a reference, but I don't have such a TS at the moment, and I would like to avoid adding it if possible.
Is there any other solution for finding the rows which have been added/updated after a given time (or something else that would solve my issue)?
There is an easy option for a timestamp in Db2 (for LUW) called ROW CHANGE TIMESTAMP.
This is managed by Db2 and can be defined as IMPLICITLY HIDDEN, so existing SELECT * FROM queries will not retrieve the new column, which would otherwise cause extra costs.
Check out the Db2 CREATE TABLE documentation.
This functionality was originally added for optimistic locking but can be used for such situations as well.
There is a similar concept for Db2 for z/OS; you would have to check that out, as I have not tried it.
Of course, there are other ways to solve it, like replication.
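Once such a column exists, the export itself is a watermark loop: remember the newest change timestamp you exported and ask only for rows changed after it. The sketch below uses Python's `sqlite3`, which has no ROW CHANGE TIMESTAMP, so the `row_changed` column is set explicitly; the table `orders`, its columns, the timestamps, and the helper `export_changes` are hypothetical. The Db2 DDL in the comment is an approximation to check against the Db2 documentation.

```python
import sqlite3

# On Db2 LUW the column would be maintained by the database, roughly:
#   ALTER TABLE orders ADD COLUMN row_changed TIMESTAMP NOT NULL
#     GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
#     IMPLICITLY HIDDEN
# SQLite has no equivalent, so this stand-in sets "row_changed" by hand.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, row_changed TEXT NOT NULL);
    INSERT INTO orders VALUES (1, 10, '2024-01-01 10:00:00'), (2, 20, '2024-01-01 11:00:00');
""")

def export_changes(conn, watermark):
    """Return rows changed after `watermark` and the new watermark to store."""
    rows = conn.execute(
        "SELECT id, amount, row_changed FROM orders WHERE row_changed > ? ORDER BY row_changed",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

rows, wm = export_changes(conn, "")   # first run exports everything
print(len(rows), wm)                  # 2 2024-01-01 11:00:00

conn.execute("UPDATE orders SET amount = 15, row_changed = '2024-01-02 09:00:00' WHERE id = 1")
rows, wm = export_changes(conn, wm)   # next run exports only the modified row
print(rows)                           # [(1, 15, '2024-01-02 09:00:00')]
```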
That is not possible if you do not have a timestamp column. With a timestamp, you can tell which rows are new or modified.
You can also use the Time Travel feature to get the new values, but that implies a timestamp column.
Another option is to put the tables in append mode and then get the rows after a given one. However, this option is not reliable after a REORG, and it affects performance and space utilisation.
One possible option is to use SQL replication, but that needs extra tables for staging.
Finally, another option is to read the logs with the db2ReadLog API, but that implies development work. It is also possible to just apply the archived logs to the new database; however, the database will then remain in roll-forward pending state.

What is the safe limit for the number of rows in postgresql?

I have a cron job that fires a query every three minutes. With each run, data is inserted into my database, so the database keeps growing after each interval. However, I am only interested in the latest row. Is this safe? Will PostgreSQL truncate the oldest entries automatically?
If you post the gist of your cron job you could get a better answer.
PostgreSQL will not remove old rows on its own. If it's a direct insert, execute a TRUNCATE beforehand to delete the old, unwanted data. DELETE is also possible, but you will end up with a lot of dead tuples and will need to vacuum the table on a regular basis.
UPDATE is a good option, but it depends on how much of the data is static and how much is not; e.g. if you repeat values in any columns, go for UPDATE. This is also subject to dead tuples and vacuuming.
If you are loading from an external source, e.g. CSV, JSON or XML, there are methods to overwrite existing data automatically. pgloader may be an option here.
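Both approaches above keep the table at a single row. Here is a small sketch using Python's `sqlite3` as a stand-in for PostgreSQL; the table `latest_reading` and its columns are hypothetical. SQLite has no TRUNCATE, so the first variant uses DELETE where PostgreSQL would use TRUNCATE (which is transactional there), and the second variant uses INSERT ... ON CONFLICT ... DO UPDATE, which PostgreSQL also supports (9.5+).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK pins the key to 1, so the table can never hold more than one row.
conn.execute(
    "CREATE TABLE latest_reading (id INTEGER PRIMARY KEY CHECK (id = 1), reading REAL, taken_at TEXT)"
)

def replace_all(conn, reading, taken_at):
    # Truncate-then-insert variant, done in one transaction so readers never see an empty table.
    with conn:
        conn.execute("DELETE FROM latest_reading")  # PostgreSQL: TRUNCATE latest_reading
        conn.execute("INSERT INTO latest_reading (id, reading, taken_at) VALUES (1, ?, ?)",
                     (reading, taken_at))

def upsert_latest(conn, reading, taken_at):
    # Update-in-place variant: always write the same key.
    with conn:
        conn.execute(
            """INSERT INTO latest_reading (id, reading, taken_at) VALUES (1, ?, ?)
               ON CONFLICT(id) DO UPDATE SET reading = excluded.reading, taken_at = excluded.taken_at""",
            (reading, taken_at),
        )

replace_all(conn, 1.0, "10:00")
upsert_latest(conn, 2.0, "10:03")
upsert_latest(conn, 3.0, "10:06")
print(conn.execute("SELECT COUNT(*), MAX(reading) FROM latest_reading").fetchone())  # (1, 3.0)
```

As noted above, both variants still leave dead tuples behind in PostgreSQL, so (auto)vacuum remains relevant.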

Adding a default value to an Oracle database with high volumes of data

I am trying to add a new column to a table with upwards of 9 million records.
The issue is that the column needs a default value of 'N'. When updating the table, the database runs into a problem with the temp data filling up. It is also taking a huge amount of time.
I was wondering if anyone knows of any way to make this faster, or a better way of doing it that avoids the temp data filling up.
The database is Oracle10g.
If you could move to 11g and the column was NOT NULL, Oracle has an optimization where the default value doesn't need to be stored in each row, so you can add the column very quickly. Unfortunately, it sounds like you're stuck with a deprecated version of Oracle where that isn't available.
Most likely, you don't have many really good options other than waiting. Assuming you're doing this during a period of downtime, it may be more efficient to create a new table with the new column, do a direct-path insert of all the data from the old table into the new table, rename the tables, and re-point any constraints at the new table. Whether this is actually more efficient than waiting for the UPDATE will depend on your hardware and your table, but an INSERT is likely to be more efficient than an UPDATE.
On the other hand, for a new single-character column that isn't going to create a lot of migrated rows, you're probably better off waiting for the UPDATE than going to this level of effort: there are a lot of things that could potentially go wrong that you'd need to test and validate (e.g. making sure that you updated all the constraints correctly).
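The create-copy-rename steps can be sketched end to end with Python's `sqlite3` as a stand-in; this is not Oracle code, and the table `accounts` and column `active` are hypothetical. On Oracle, the copy step is where an `INSERT /*+ APPEND */ ... SELECT` would give the direct-path load, and on 11g and later `ALTER TABLE ... ADD (... DEFAULT 'N' NOT NULL)` is the fast metadata-only path mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    INSERT INTO accounts VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

conn.executescript("""
    -- 1. New table with the extra column; constraints must be declared again by hand.
    CREATE TABLE accounts_new (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        active TEXT NOT NULL DEFAULT 'N'
    );
    -- 2. Bulk copy, filling the new column in the same pass instead of a later UPDATE.
    INSERT INTO accounts_new (id, name, active) SELECT id, name, 'N' FROM accounts;
    -- 3. Swap names, keeping the old table until the result has been validated.
    ALTER TABLE accounts RENAME TO accounts_old;
    ALTER TABLE accounts_new RENAME TO accounts;
""")
print(conn.execute("SELECT * FROM accounts ORDER BY id").fetchall())
# [(1, 'a', 'N'), (2, 'b', 'N'), (3, 'c', 'N')]
```

Step 1 is where the validation risk mentioned above lives: anything not re-declared on the new table (constraints, indexes, grants, references from other tables) is silently lost in the swap.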