In many examples I've read of SQL timestamp usage, a timestamp column is added to prevent a kind of race condition in which a user changes data that has lost its integrity because another user 'got in there first'.
More specifically, before issuing an update on a row, the business logic checks that the row's timestamp still matches the value it read earlier, so that there is no mix-up between row versions.
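The check described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, assuming an integer `version` column in place of a real timestamp/rowversion column; the `accounts` table and the `update_balance` helper are hypothetical. The UPDATE succeeds only if the version still matches the value the caller read earlier.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical table: 'version' plays the role of the timestamp column.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, "
        "version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")
    return conn


def update_balance(conn, account_id, new_balance, expected_version) -> bool:
    """Apply the update only if nobody else changed the row since we read it."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount == 0 means the version no longer matched: another writer got in first.
    return cur.rowcount == 1


conn = make_db()
_, _, v = conn.execute("SELECT id, balance, version FROM accounts WHERE id = 1").fetchone()
print(update_balance(conn, 1, 150, v))  # True: first writer succeeds
print(update_balance(conn, 1, 175, v))  # False: stale version is rejected
```

If `update_balance` returns False, the caller would typically re-read the row and decide whether to retry or report a conflict.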
Question
Why wouldn't DATETIME suffice for this task? In fact, by that logic - why wouldn't any unique data type be appropriate instead? NEWID() every time an update is issued, for example?
In MySQL, TIMESTAMP takes less physical storage than DATETIME. In addition, TIMESTAMP values are stored in UTC (converted from and to the session time zone), so they do not depend on any particular time zone. For international products, this is important.
IDs are not recommended, as they are often generated at the point of insert/update.
It appears I missed the fundamental feature of timestamp: it updates automatically.
So calling UPDATE on a row will automatically change its TIMESTAMP column without my setting it manually.
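For databases without such a column type, the auto-update can be emulated. Here is a minimal sketch in SQLite via Python's sqlite3 module, assuming a hypothetical `items` table whose `version` column is bumped by a trigger. It only approximates what SQL Server's rowversion (formerly timestamp) type or MySQL's ON UPDATE CURRENT_TIMESTAMP clause do natively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, version INTEGER NOT NULL DEFAULT 1);
-- Bump version whenever a row changes and the caller did not set version itself.
CREATE TRIGGER items_bump_version AFTER UPDATE ON items
WHEN NEW.version = OLD.version
BEGIN
    UPDATE items SET version = OLD.version + 1 WHERE id = NEW.id;
END;
INSERT INTO items (id, name) VALUES (1, 'draft');
""")


def current_version(item_id: int) -> int:
    return conn.execute("SELECT version FROM items WHERE id = ?", (item_id,)).fetchone()[0]


conn.execute("UPDATE items SET name = 'final' WHERE id = 1")  # no manual version change
print(current_version(1))  # 2
```

The WHEN guard keeps the trigger from firing again on its own inner UPDATE, since that update changes `version`.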
I'll leave this answer here just in case anybody has comments about what else I may have missed.
Related
Basically, I have the same problem as this and use this solution too: https://community.tableau.com/s/question/0D54T00000C6aS6/datediff-in-lod
I want to compute the time difference between consecutive transactions for each user. My problem is basically just like this one.
As there, the interaction id corresponds to the user id in my real data, so I want to know the time difference between transaction dates for each user.
Based on that, I made this calculation:
DATEDIFF('day', LOOKUP(MIN([Created At]), -1), MIN([Created At]))
And here are the results:
What confuses me is why the first transaction of each user always shows a time difference. It should be empty, because there is no earlier transaction to compare against. How can I make it not appear?
Create a table calculation field named First:
FIRST()=0
Edit the table calculation so it restarts for every user id.
Now filter for False. You can remove it from Rows; I just kept it there for the demo. Your calculation will still work.
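The logic of the FIRST()=0 filter can be illustrated outside Tableau. This Python sketch is not Tableau code; it assumes the rows are (user id, date) pairs and that the table calculation is ordered by date and restarted per user. The first row of each user has no previous transaction, so its difference is empty and it is flagged as first.

```python
from datetime import date
from itertools import groupby


def days_since_previous(rows):
    """rows: (user_id, created_at) pairs. Returns (user_id, created_at, diff_days, is_first)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user_id, group in groupby(ordered, key=lambda r: r[0]):
        previous = None
        for _, created_at in group:
            # Like FIRST()=0 restarted per user: the first row of each partition has no predecessor.
            is_first = previous is None
            diff = None if is_first else (created_at - previous).days
            out.append((user_id, created_at, diff, is_first))
            previous = created_at
    return out


rows = [("u1", date(2023, 1, 1)), ("u1", date(2023, 1, 5)),
        ("u2", date(2023, 1, 3)), ("u2", date(2023, 1, 10))]
for r in days_since_previous(rows):
    if not r[3]:  # keep only rows where "First" is False
        print(r)
```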
I am using Microsoft Access 2016. I am trying to find out how many years exist from the current year until a future year. I have a column that is end_date. I am trying to create a calculated field that is essentially YEAR(end_date) - YEAR(current_year). I tried to use YEAR(DATE()) but DATE() is not allowed to be used in a calculated field apparently.
Is there no way to do a calculation like this?
Nope. Calculated fields are cached and static, so they are NEVER allowed to contain ANY information that can change over time, depends on system settings, or is not directly entered in that row.
However, you should not be using calculated fields anyway. See http://allenbrowne.com/casu-14.html, among many posts advocating for not using calculated fields.
Instead, use queries to do calculations. That way, you won't have any trouble using the current date, and won't have to deal with the possible errors and portability issues calculated fields come with.
I changed my approach and now calculate this in a form. It does not seem like good practice to have a field in a DB that changes every day.
In a form, you can use this expression as the ControlSource of a textbox:
=DateDiff("yyyy",Date(),[EndDate])
However, that returns the difference in calendar years. To find the count of full years, use a function like AgeSimple and this expression:
=AgeSimple([EndDate])
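The difference between the two expressions can be illustrated in Python. The source does not show AgeSimple's code, so `full_years` below is a hypothetical helper that only mimics the idea of counting whole years; `calendar_year_diff` corresponds to counting year boundaries, as DateDiff with "yyyy" does.

```python
from datetime import date


def calendar_year_diff(start: date, end: date) -> int:
    # Counts year boundaries crossed, like DateDiff("yyyy", start, end).
    return end.year - start.year


def full_years(start: date, end: date) -> int:
    # Hypothetical helper: whole years elapsed, one less if the anniversary hasn't been reached yet.
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


today = date(2024, 11, 25)
end_date = date(2026, 3, 1)
print(calendar_year_diff(today, end_date), full_years(today, end_date))  # 2 1
```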
Our web based app with 100,000 concurrent users has a use case where we auto-save the user's activity every 5 seconds. Consider a table like this:
create table essays
(
id uuid not null constraint essays_pkey primary key,
userId text not null,
essayparts jsonb default '{ }' :: jsonb,
create_date timestamp with time zone default now() not null,
modify_date timestamp with time zone default now() not null
);
create index essays_create_idx on essays ("create_date");
create index essays_modify_idx on essays ("modify_date");
This works well for us, as everything related to a user's essay, such as title, brief byline, requestor, full essay body, etc., is stored in the essayparts column as JSON. For auto-saving the essay, though, we don't keep inserting new rows. We update each ID (each essay) with all its components.
So there are plenty of updates per essay, as writing one is a time-consuming and thoughtful activity. With an auto-save every 5 seconds, if a user writes for half an hour, we update her essay around 360 times.
This would be fine with PostgreSQL's HOT (heap-only tuples) functionality. We're using v10, so we have it. However, the challenge is that we also update the modify_date column every time the essay is saved, and this column has an index too. Since a HOT update is only possible when no indexed column changes, these updates do not benefit from HOT, and a lot of fragmentation occurs.
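The HOT condition mentioned above can be expressed as a small predicate. This Python sketch is a deliberate simplification of PostgreSQL's actual rules (which have more cases): it assumes an update qualifies only if it changes no indexed column and the new row version fits on the same heap page.

```python
def is_hot_candidate(indexed_columns: set[str], changed_columns: set[str],
                     fits_on_same_page: bool) -> bool:
    """Simplified model: HOT is possible only if no indexed column changes
    and the new row version fits on the same heap page as the old one."""
    return fits_on_same_page and not (indexed_columns & changed_columns)


indexed = {"id", "create_date", "modify_date"}
# Each auto-save touches essayparts and modify_date; modify_date is indexed.
print(is_hot_candidate(indexed, {"essayparts", "modify_date"}, True))                      # False
# Same update after dropping the modify_date index.
print(is_hot_candidate(indexed - {"modify_date"}, {"essayparts", "modify_date"}, True))   # True
```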
I suppose in the web or mobile world this is not an unusual pattern. Many services seem to auto-save content. Are they insert only? If so, if the user logs out and comes back in, how do they show the records, by looking at the max(modify_date)? Or is there any other mechanism to leverage HOT updates while also updating an indexed column in the table?
Appreciate any pointers, thank you!
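The insert-only variant asked about in the question can be sketched with SQLite via Python's sqlite3 module. The `essay_versions` table, its columns and the helper functions are hypothetical; JSON is stored as text, and a real system would also need to prune old versions. The current state of an essay is simply the row with the greatest modify_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE essay_versions (
    essay_id    TEXT NOT NULL,
    modify_date TEXT NOT NULL,   -- ISO-8601 text, so lexical order = time order
    essayparts  TEXT NOT NULL,   -- JSON stored as text in this sketch
    PRIMARY KEY (essay_id, modify_date)
);
""")


def autosave(essay_id: str, modify_date: str, parts_json: str) -> None:
    # Insert-only: every save appends a new version instead of updating in place.
    conn.execute("INSERT INTO essay_versions VALUES (?, ?, ?)",
                 (essay_id, modify_date, parts_json))


def latest(essay_id: str) -> str:
    # The current state is the version with the greatest modify_date.
    return conn.execute(
        "SELECT essayparts FROM essay_versions WHERE essay_id = ? "
        "ORDER BY modify_date DESC LIMIT 1",
        (essay_id,),
    ).fetchone()[0]


autosave("e1", "2024-01-01T10:00:00", '{"title": "Draft"}')
autosave("e1", "2024-01-01T10:00:05", '{"title": "Draft 2"}')
print(latest("e1"))  # {"title": "Draft 2"}
```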
Performing an update every 5 seconds with 100,000 concurrent users produces 20,000 updates per second. That is quite challenging in itself, and you would need a good system to pull it off, but autovacuum will never be able to keep up if those updates are not HOT.
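The arithmetic behind these figures, as a small Python sketch (the 30-second interval in the last line is only an illustrative assumption):

```python
def updates_per_second(concurrent_users: int, save_interval_s: float) -> float:
    # Each user issues one update per save interval.
    return concurrent_users / save_interval_s


def saves_per_session(session_minutes: float, save_interval_s: float) -> float:
    # Number of auto-saves during one writing session.
    return session_minutes * 60 / save_interval_s


print(updates_per_second(100_000, 5))   # 20000.0
print(saves_per_session(30, 5))         # 360.0
print(updates_per_second(100_000, 30))  # a 30-second interval cuts the load by a factor of 6
```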
You have several options:
Choose a relational database management system other than PostgreSQL that updates rows in place.
Do not index modify_date and hope that HOT will do the trick.
Perform these updates way less often than once every 5 seconds (who needs auto-save every 5 seconds anyway?).
Auto-save the data somewhere other than the database.
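Option 3 could be implemented by throttling writes on the server side. This Python sketch assumes a hypothetical `write` callback that performs the actual UPDATE and passes the clock in explicitly so the behaviour is easy to follow. It keeps only the latest unsaved content per essay and writes at most once per interval, plus a final flush (e.g. on logout).

```python
class ThrottledSaver:
    """Write at most once per min_interval seconds per essay,
    keeping only the latest pending content in between (hypothetical helper)."""

    def __init__(self, write, min_interval: float):
        self.write = write
        self.min_interval = min_interval
        self.last_write = {}  # essay_id -> time of last write
        self.pending = {}     # essay_id -> latest unsaved content

    def save(self, essay_id, content, now: float):
        self.pending[essay_id] = content
        last = self.last_write.get(essay_id)
        if last is None or now - last >= self.min_interval:
            self.flush(essay_id, now)

    def flush(self, essay_id, now: float):
        # Called from save() once the interval has elapsed, or explicitly (e.g. on logout).
        if essay_id in self.pending:
            self.write(essay_id, self.pending.pop(essay_id))
            self.last_write[essay_id] = now


writes = []
saver = ThrottledSaver(lambda eid, c: writes.append((eid, c)), min_interval=30)
for t in range(0, 60, 5):  # client auto-saves every 5 seconds for a minute
    saver.save("e1", f"text at {t}s", now=t)
saver.flush("e1", now=60)  # final flush, e.g. on logout
print(writes)  # 3 database writes instead of 12
```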
I am trying to create an indexed view using the following code (so that I can publish it to replication as a table):
CREATE VIEW lc.vw_dates
WITH SCHEMABINDING
AS
SELECT DATEADD(day, DATEDIFF(day, 0, GETDATE()), number) AS SettingDate
FROM lc.numbers
WHERE number<8
GO
CREATE UNIQUE CLUSTERED INDEX
idx_LCDates ON lc.vw_dates(SettingDate)
lc.numbers is simply a table with one column (number) that holds the values 1 to 100, one per row.
However, I keep getting the error:
Column 'SettingDate' in view 'lc.vw_dates' cannot be used in an index or statistics or as a partition key because it is non-deterministic.
I realize that GETDATE() is non-deterministic. But, is there a way to make this work?
I am using SQL Server 2012.
Edit: The hope was to be able to convert GETDATE() to make it deterministic (it seems like it should be, once the time is stripped off). If nobody knows of a method to do this, I will close this question and mark the suggestion to create a calendar table as correct.
The definition of a deterministic function (from MSDN) is:
Deterministic functions always return the same result any time they are called with a specific set of input values and given the same state of the database. Nondeterministic functions may return different results each time they are called with a specific set of input values even if the database state that they access remains the same.
Note that this definition does not involve any particular span of time over which the result must remain the same. It must be the same result always, for a given input.
Any function you can imagine that returns the date at the point it is called will, by definition, return a different result if you run it one day and then again the next day (regardless of the state of the database).
Therefore, it is impossible for a function that returns the current date to be deterministic.
The only possible interpretation of this question that could enable a deterministic function, is if you were happy to pass as input to the function some information about what day it is.
Something like:
select fn_myDeterministicGetDate('2015-11-25')
But I think that would defeat the point as far as you're concerned.
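The same distinction can be shown with a Python analogy (not T-SQL). `setting_dates` is a hypothetical function that mirrors the view's DATEADD over number 1 to 7, but takes the day as an argument: it always returns the same list for the same input, so it is deterministic. Calling it with `date.today()` simply moves the non-determinism to the caller.

```python
from datetime import date, timedelta


def setting_dates(as_of: date, days: int = 7) -> list[date]:
    # Deterministic: the result depends only on the arguments, never on the clock.
    # Mirrors DATEADD(day, number, <today>) for number = 1..7 (number < 8 in the view).
    return [as_of + timedelta(days=n) for n in range(1, days + 1)]


print(setting_dates(date(2015, 11, 25))[:2])
```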
I am trying to create a way to convert bulk date queries into incremental queries. For example, if a query has its WHERE condition specified as
WHERE date > now()::date - interval '365 days' and date < now()::date
this will fetch a year's data if executed today. If the same query is executed tomorrow, 365 days of data will be fetched again, even though I already have the last 364 days of data from the previous run. I just want a single day's data to be fetched and a single day's data to be deleted from the system, so that I end up with 365 days of data with better performance. This data is to be stored in a separate temp table.
To achieve this, I create an incremental query that will be executed in the next run. However, deleting a single day's data is proving tricky when the "date" column does not appear in the SELECT clause but only in the WHERE condition, because then the temp table schema will not have the "date" column.
So I thought of executing the bulk query in chunks and assigning an ID to each chunk. This way, I can delete one chunk and add another while the rest of the data remains unaffected.
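The chunk idea can be sketched by keying each chunk on the day it covers, so the day itself serves as the chunk ID and the row data does not need the date column. This Python sketch illustrates the idea and is not a PostgreSQL or Greenplum feature; `fetch_day` is a hypothetical stand-in for running the query for a single day, and the window is assumed to be the `days` whole days before `today` (the bounds would need to match the real WHERE clause exactly).

```python
from datetime import date, timedelta


class DailyWindow:
    """One chunk per day; the chunk key acts as the chunk ID."""

    def __init__(self, days: int):
        self.days = days
        self.chunks: dict[date, list] = {}

    def refresh(self, today: date, fetch_day) -> list[date]:
        wanted = {today - timedelta(days=n) for n in range(1, self.days + 1)}
        # Drop chunks that have slid out of the window.
        for day in list(self.chunks):
            if day not in wanted:
                del self.chunks[day]
        # Fetch only the days we don't hold yet (one day on a daily rerun).
        fetched = sorted(wanted - set(self.chunks))
        for day in fetched:
            self.chunks[day] = fetch_day(day)
        return fetched


calls = []


def fetch_day(day: date) -> list:  # hypothetical stand-in for the per-day query
    calls.append(day)
    return [f"row for {day}"]


w = DailyWindow(days=365)
w.refresh(date(2024, 1, 1), fetch_day)  # first run: full 365-day load
calls.clear()
w.refresh(date(2024, 1, 2), fetch_day)  # next day: one chunk added, one dropped
print(calls, len(w.chunks))  # [datetime.date(2024, 1, 1)] 365
```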
Is there a way to achieve this in PostgreSQL or Greenplum, such as some built-in functionality? I went through the whole documentation but could not find any.
Also, if not, is there a better solution to this problem?
I think this is best handled with something like an aggregates table (I assume the issue is that you have heavy aggregates to compute over a lot of data). This doesn't necessarily cause normalization problems (and data warehouses often denormalize anyway). In this regard, the aggregates you need can be stored per day, so you can cut down to one record per day for the closed data, plus the non-closed data. Restricting the aggregates to data that cannot change is what avoids the usual insert/update anomalies that normalization prevents.
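A per-day aggregates table might look like the following SQLite sketch via Python's sqlite3 module, standing in for Greenplum. The `events` and `daily_totals` tables and the helper functions are hypothetical; aggregating the still-open current day on the fly is left out.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_date TEXT NOT NULL, amount INTEGER NOT NULL);
-- One row per closed day: the pre-aggregated result.
CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total INTEGER NOT NULL, n INTEGER NOT NULL);
""")


def close_day(day: date) -> None:
    # Aggregate a day once it can no longer change; re-running it is harmless.
    conn.execute(
        "INSERT OR REPLACE INTO daily_totals (day, total, n) "
        "SELECT ?, COALESCE(SUM(amount), 0), COUNT(*) FROM events WHERE event_date = ?",
        (day.isoformat(), day.isoformat()),
    )


def window_total(today: date, days: int) -> int:
    # Sum over pre-aggregated closed days instead of rescanning raw events.
    start = (today - timedelta(days=days)).isoformat()
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM daily_totals WHERE day >= ? AND day < ?",
        (start, today.isoformat()),
    ).fetchone()
    return row[0]


conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7), ("2024-01-03", 1)])
for d in (date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 3)):
    close_day(d)
print(window_total(date(2024, 1, 3), days=2))  # 22
```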