Does DB2 CURRENT TIMESTAMP on z/OS return unique values?

We have a DB2 running on z/OS, and some tables use a timestamp as a primary key.
My opinion is that two transactions calling CURRENT TIMESTAMP in the same nanosecond might get back exactly the same timestamp.
My colleague thinks that CURRENT TIMESTAMP on the same database is always unique.
The DB2 documentation is not very clear here.
Is there an official statement from IBM that proves one thesis or the other? I found only a statement for DB2 on UNIX, which may not be applicable to z/OS.
Thank you.

There are instances when it won't be unique. They are:
Datetime special registers are stored in an internal format. When two or more of these registers are implicitly or explicitly specified in a single SQL statement, they represent the same point in time.
If the SQL statement in which a datetime special register is used is in a user-defined function or stored procedure that is within the scope of a trigger, DB2 uses the timestamp for the triggering SQL statement to determine the special register value.
Source: http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.db2.doc.sqlref/xfbb68.htm#xfbb68
You should use GENERATE_UNIQUE() if you want a unique, timestamp-based value. Good example here: http://www.mainframesupport.dk/tips/tip0925.html
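A minimal sketch of how that might look; the table and column names are made up. Note that GENERATE_UNIQUE() returns a CHAR(13) FOR BIT DATA value rather than a readable timestamp, but the point in time can be recovered from it with the TIMESTAMP() function:

-- Hypothetical table keyed on a DB2-generated unique value
CREATE TABLE event_log (
    event_key CHAR(13) FOR BIT DATA NOT NULL PRIMARY KEY,
    payload   VARCHAR(200)
);

INSERT INTO event_log (event_key, payload)
    VALUES (GENERATE_UNIQUE(), 'example row');

-- Recover the creation time encoded in the key:
SELECT TIMESTAMP(event_key), payload FROM event_log;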

There is no guarantee that CURRENT TIMESTAMP will return a unique value.
I have seen many examples of DB2 SQL INSERT statements in a z/OS environment failing on duplicate key when CURRENT TIMESTAMP was used to populate a column defined as unique.
Once upon a time CURRENT TIMESTAMP had a fine enough granularity that the probability of a collision was extremely small. This led to quite a few applications treating its values as unique identifiers. Processors are faster and parallelism has increased tremendously over the years. Any process that expects unique values from CURRENT TIMESTAMP today is likely to crash and burn on a very regular basis.
Your colleague is running a bit behind the times (on a couple of levels).

Related

Postgres table partitioning based on table name

I have a table that stores information about weather for specific events and for specific timestamps. I do insert, update and select (more often than delete) on this table. All of my queries filter on timestamp and event_id. Since this table is blowing up, I was considering doing table partitioning in Postgres.
I could also think of having multiple tables named table_<event_id>_<timestamp> to store information for specific timestamps, instead of using Postgres declarative/inheritance partitioning. But I noticed that no one on the internet has done or written about any approach like this. Is there something I am missing?
I see that in Postgres partitioning, the data is kept both in the master and in the child tables. Why keep it in both places? Doing inserts and updates this way seems less efficient to me.
Is there a general limit on the number of tables at which Postgres will start to choke?
Thank you!
re 1) Don't do it. Why re-invent the wheel when the Postgres devs have already done it for you by providing declarative partitioning?
re 2) You are mistaken. The data is only kept in the partition to which it belongs. It just looks as if it is stored in the "master".
re 3) There is no built-in limit, but anything beyond a "few thousand" partitions is probably too much. It will still work, but query planning in particular will become slower. Query execution might sometimes suffer as well, because runtime partition pruning is no longer as efficient.
Given your description you probably want to do hash partitioning on the event ID and then create range sub-partitions on the timestamp value (so each partition for an event is again partitioned on the range of the timestamps).
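A minimal sketch of that layout, assuming Postgres 11 or later and made-up table and column names:

CREATE TABLE weather (
    event_id bigint      NOT NULL,
    ts       timestamptz NOT NULL,
    data     jsonb
) PARTITION BY HASH (event_id);

-- One hash partition (repeat for remainders 1..3), itself range-partitioned:
CREATE TABLE weather_h0 PARTITION OF weather
    FOR VALUES WITH (MODULUS 4, REMAINDER 0)
    PARTITION BY RANGE (ts);

CREATE TABLE weather_h0_2024 PARTITION OF weather_h0
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

A query filtering on both event_id and ts can then be pruned down to a single sub-partition.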

Is it a better practice to use Postgres default value or generate them before passing to Postgres?

For example, is it better to use DEFAULT CURRENT_TIMESTAMP in Postgres or is it better to get current time on server and then pass to Postgres?
I would say to go with DEFAULT.
At least you have a backup value if, for whatever reason, you don't pass (or forget to pass) anything to the database.
Moreover, when you pass data between back-end and front-end that you believe is of a certain type (for example, a string, or TEXT in Postgres) when in reality it is NULL, this may very well lead to the client app throwing an exception.
If you provide a default value for every column, you can always assume that at least this value is present when you query data from the table. This is particularly useful when you are working with strings and want to check, for example, the length of a bio in a profile: you can skip the null checks entirely and avoid possible exceptions.
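For illustration, a minimal sketch with a made-up table: the defaults guarantee that neither column is ever NULL, even when the application forgets to supply a value:

CREATE TABLE profile (
    id         bigserial   PRIMARY KEY,
    bio        text        NOT NULL DEFAULT '',
    created_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- length(bio) is safe without any null check:
SELECT id, length(bio) FROM profile;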

DB2 updated rows since last check

I want to periodically export data from db2 and load it in another database for analysis.
In order to do this, I would need to know which rows have been inserted/updated since the last time I've exported things from a given table.
A simple solution would probably be to add a timestamp column to every table and use that as a reference, but I don't have such a timestamp at the moment, and I would like to avoid adding one if possible.
Is there any other solution for finding the rows which have been added/updated after a given time (or something else that would solve my issue)?
There is an easy option for a timestamp in Db2 (for LUW) called ROW CHANGE TIMESTAMP.
This is managed by Db2 and can be defined as HIDDEN, so existing SELECT * FROM queries will not retrieve the new column, which would otherwise cause extra costs.
Check out the Db2 CREATE TABLE documentation.
This functionality was originally added for optimistic locking but can be used in situations like this as well.
There is a similar concept for Db2 for z/OS - you will have to check that out yourself, as I have not tried it.
Of course there are other ways to solve this, like replication etc.
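A minimal sketch of the ROW CHANGE TIMESTAMP approach, assuming Db2 LUW syntax and made-up table and column names (a hidden column can still be referenced explicitly by name):

-- Add a Db2-maintained, implicitly hidden change timestamp:
ALTER TABLE orders
    ADD COLUMN changed_at TIMESTAMP NOT NULL
        GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
        IMPLICITLY HIDDEN;

-- Rows inserted or updated since the last export:
SELECT * FROM orders WHERE changed_at > ?;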
That is not possible if you do not have a timestamp column. With a timestamp, you can know which rows are new or modified.
You can also use the Time Travel feature in order to get the new values, but that implies a timestamp column as well.
Another option is to put the tables in append mode and then fetch the rows after a given one. However, this approach is not reliable after a reorg, and it affects performance and space utilisation.
One possible option is to use SQL replication, but that needs extra tables for staging.
Finally, another option is to read the logs with the db2ReadLog API, but that implies development effort. Also, simply applying the archived logs to the new database is possible; however, the database will then remain in roll-forward pending state.

Making Entity Framework generate SQL with column=GetUtcDate()

Our app runs on many web servers. The time of these web servers can get skewed over time, as is to be expected. The database is a single separate machine with its own time. We're using EF 5.0 and have a table that needs very precise and consistent times in multiple columns. I would like to be sure the date columns in this table always use the database server's time.
In SQL I would just set the column to GetUtcDate(). Simple, the date is computed and set on the database server, done. But how can I do this with EF on an insert or update? To be clear I need the SQL generated by EF to set the column to the function GetUtcDate() so that the value comes from the database server. I do not want the date being calculated on the web server. Some ideas I've seen and considered and why they don't work for me:
1) I could use default values on the columns in the schema. But I have many update scenarios where I also need consistent dates, not just inserts.
2) I could use triggers in the database. But we currently have zero logic in our database (we are using an ORM after all) and I don't want to set that precedent if I can avoid it. It also is tricky to determine when to update these columns on the database end.
3) I can get the database server time manually (separate query as in the example below), set the column to that value, then do the update. But this is very inefficient as it requires an extra call to the database. In a tight loop this is way too much overhead. Plus the time is now less accurate since I got the time milliseconds earlier, though it is at least consistent.
CreateQuery<DateTime>("CurrentUtcDateTime()").Execute().First();
So what is the right way to do this? Or is it even possible to make EF do the right thing here?
This question really asks: can I tell EF to get the date/time from the DB / underlying provider? As far as I know, this isn't possible with EF statements, no.
You should first run a simple SQL statement to get the DB time - see the T-SQL GETDATE / GETUTCDATE functions and choose the preferred date option:
// Ask the database server for its current UTC time once, up front:
var serverDate = context.Database
                        .SqlQuery<DateTime>("select GETUTCDATE();")
                        .Single();
Now use serverDate in your EF LINQ statement.
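If the value must be computed on the database server at the moment of the write rather than fetched ahead of time, the workaround with EF 5 is raw SQL (e.g. via ExecuteSqlCommand); the table and column names below are made up:

-- Let the server compute the timestamp at update time:
UPDATE Orders
   SET ModifiedUtc = GETUTCDATE()
 WHERE OrderId = @OrderId;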

How should i keep track of the delete operations in database without using triggers?

The application polls the database at certain intervals of time. On each poll, the application reads all the tables.
As part of an optimization, we want the application to read a table only if an INSERT/UPDATE/DELETE has happened on it. So I want to use the timestamp concept.
Having a separate timestamp column can help me track row modifications.
While querying a table I can check whether the in-memory stored timestamp is less than the max-of-timestamp in the table. If yes, it means that some row has been modified.
But if a certain row gets deleted, the latest timestamp associated with this row is no longer present. Hence the above algorithm fails in this case, since the max-of-timestamp does not give the correct value.
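In SQL terms, the check amounts to something like this (table and column names are made up):

-- Re-read the table only if something changed since the last poll:
SELECT MAX(last_modified) FROM orders;
-- Compare the result with the timestamp remembered in memory.
-- A DELETE removes the row's timestamp entirely, so MAX() never reflects it.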
Is there a way in which I can track the delete operations as well, without using triggers?
Any help would be highly appreciated.
I am using Sybase ASA database.
Maybe you could implement logical deletion: instead of removing a record, you simply mark it as deleted with a specific flag, for example.
You still have the max timestamp, and you can exclude the flagged records from the selection queries (maybe create some views on top of the table to do the job automatically).
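A minimal sketch, assuming Sybase ASA syntax and made-up table and column names:

-- Add a deletion flag instead of physically removing rows:
ALTER TABLE orders ADD deleted BIT DEFAULT 0 NOT NULL;

-- "Delete" by flagging the row; touching last_modified keeps MAX() moving:
UPDATE orders
   SET deleted = 1,
       last_modified = CURRENT TIMESTAMP
 WHERE order_id = 42;

-- A view that hides deleted rows from normal queries:
CREATE VIEW active_orders AS
    SELECT * FROM orders WHERE deleted = 0;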