Azure Data Factory fails with UPSERT for every table with a TIMESTAMP column

My Azure Data Factory throws the error "Cannot update a timestamp column" for every table with a TIMESTAMP column:
ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed. Please search error to get more details.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=Cannot update a timestamp column.,Source=.Net SqlClient Data Provider,SqlErrorNumber=272,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=272,State=1,Message=Cannot update a timestamp column.,},],'
I do not want to update the column itself, but even when I delete it from the column mapping, the copy activity still fails. (The original post included a screenshot of the mapping with the column not yet deleted.)
I understand that TIMESTAMP is not a simple datetime: it is just an incrementing number that is updated automatically whenever another column in that row is updated, and it does not preserve a date or a time.
But how do I solve this problem?

I tried to reproduce the issue, and in my ADF, if I remove the timestamp column from the mapping, the pipeline runs with no errors.
But since this doesn't work for you, here are two workaround options (sketched after this list):
Option 1 - on the source, use a query and leave the timestamp column out of it.
Option 2 - I tried to reproduce your error and found that it only happens on upsert. If I use insert, it runs with no error (though it ignores the insert on the timestamp column and increments the timestamp itself). So you can insert into a staging table and then update only the columns you want in SQL.
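A rough sketch of both options in T-SQL, with made-up table and column names (dbo.Orders as the target with a rowversion column RowVer, stg.Orders as a staging table with the same shape minus RowVer):

-- Option 1 (sketch): a source query that simply leaves the rowversion column out
SELECT Id, CustomerName, Amount        -- every column except RowVer
FROM dbo.Orders;

-- Option 2 (sketch): let the copy activity do a plain insert into stg.Orders,
-- then upsert in T-SQL without ever mentioning RowVer;
-- SQL Server increments it automatically on update.
UPDATE t
SET t.CustomerName = s.CustomerName,
    t.Amount = s.Amount
FROM dbo.Orders AS t
JOIN stg.Orders AS s ON s.Id = t.Id;

INSERT INTO dbo.Orders (Id, CustomerName, Amount)
SELECT s.Id, s.CustomerName, s.Amount
FROM stg.Orders AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Orders AS t WHERE t.Id = s.Id);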

Related

Range check error when creating GIST index on tsrange value

Fixing out-of-range data and then creating a GiST tsrange index raises an exception. My guess is that PostgreSQL still sees the old versions of the records and takes them into account when creating the GiST index.
You can reproduce it using this script:
BEGIN;
CREATE TABLE _test_gist_range_index("from" timestamp(0) WITHOUT time zone, "till" timestamp(0) WITHOUT time zone);
--let's enter some invalid data
INSERT INTO _test_gist_range_index VALUES ('2021-01-02', '2021-01-01');
--let's fix the data
DELETE FROM _test_gist_range_index;
CREATE INDEX idx_range_idx_test2 ON _test_gist_range_index USING gist (tsrange("from", "till", '[]'));
COMMIT;
The result is:
SQL Error [22000]: ERROR: range lower bound must be less than or equal to range upper bound
I've tested this on all versions of PostgreSQL starting from v9.5 and ending with v13 using db<>fiddle. The result is the same on all of them.
The same error occurs if we fix the data using UPDATE instead.
Is there a way to fix the data and have range index on it? Maybe there is a way to clean the table somehow from those old values?..
EDIT
It seems that the exception is raised only if the data-correcting statement (DELETE in my example) and the CREATE INDEX statement are in the same transaction. If I DELETE and COMMIT first, creating the index afterwards succeeds.
That is working as expected and not a bug.
When you delete a row in a PostgreSQL table, it is not actually removed, but marked as invisible. Similarly, updating a row creates a new version of the row, and the old version is retained and marked invisible. This is how PostgreSQL implements multiversioning: concurrent transactions can still see the "invisible" data. Eventually, invisible rows are reclaimed by VACUUM.
Now a B-tree or GiST index contains one entry for each row version in the table, unless the row version is visible to nobody (is dead). This explains why a deleted row can still cause an error if its data don't form a valid range.
If you run the statements in autocommit mode on an otherwise idle database, the deleted rows are dead, and no index entry has to be created.
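So, per the asker's EDIT, committing the correction before building the index is enough. A minimal reworking of the script above (assuming an otherwise idle database, so the deleted row is dead by the time the index is built):

BEGIN;
CREATE TABLE _test_gist_range_index("from" timestamp(0) without time zone, "till" timestamp(0) without time zone);
INSERT INTO _test_gist_range_index VALUES ('2021-01-02', '2021-01-01');  -- invalid range data
DELETE FROM _test_gist_range_index;  -- fix the data
COMMIT;  -- the deleted row version is now dead
CREATE INDEX idx_range_idx_test2 ON _test_gist_range_index USING gist (tsrange("from", "till", '[]'));  -- succeeds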

Is it possible to update a column (automatically) with "current_timestamp" in PostgreSQL using "Generated Columns"?

Is it possible to automatically update a column with "current_timestamp" in PostgreSQL using "Generated Columns", whenever the row gets updated?
At present, I am using a trigger to update the audit field last_update_date, but I am planning to switch to a generated column:
ALTER TABLE test ADD COLUMN last_update_date timestamp without time zone
GENERATED ALWAYS AS (current_timestamp) STORED;
I get an error while altering the column:
ERROR: generation expression is not immutable
No, that won't work, for the reason specified in the error.
Functions used in generated columns must always return the same value for the same arguments, that is, depend on nothing but the current database row. current_timestamp obviously is not of that kind.
If PostgreSQL did allow such functions to be used in generated columns, then the value of the column would change if the database is restored from a pg_dump, for example.
Use a BEFORE INSERT OR UPDATE trigger for this purpose.
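A minimal sketch of such a trigger, assuming a table test with the audit column last_update_date (EXECUTE FUNCTION needs PostgreSQL 11 or later; use EXECUTE PROCEDURE on older versions):

CREATE FUNCTION set_last_update_date() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   NEW.last_update_date := current_timestamp;  -- overwrites any value the client supplied
   RETURN NEW;
END;$$;

CREATE TRIGGER test_last_update_date
   BEFORE INSERT OR UPDATE ON test
   FOR EACH ROW
   EXECUTE FUNCTION set_last_update_date();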

Created date, last modified date fields in Postgres

In PostgreSQL, is there a way to add columns that will automatically record the creation date and latest updated date of a row?
for the table creation date, look to event triggers
for the insertion date, look into a DEFAULT value for a timestamptz column (works only if you don't explicitly supply a value); see the sketch below
for the last modification date, use a FOR EACH ROW trigger BEFORE DELETE/UPDATE
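A rough illustration of the DEFAULT approach, with made-up names; the modified_at column would then be kept current by a BEFORE UPDATE trigger like the one in the previous answer:

CREATE TABLE audit_example (
    id          bigserial PRIMARY KEY,
    payload     text,
    created_at  timestamptz NOT NULL DEFAULT now(),  -- set once, on insert
    modified_at timestamptz NOT NULL DEFAULT now()   -- maintained by a BEFORE UPDATE trigger
);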
The idea - a robust way of adding created and modified fields for data we add to the database, through db triggers:
Update modified_by and modified_on (or modified_at) on every db transaction.
Copy created_on and created_by (or created_at) from the modified details whenever you insert a row into a table.
For trigger function, check this repo https://github.com/charan4ks/created_fields.git

Importing csv into Postgres database with improper date value

I have a query which has a date field with values that look like this in the query results window:
2013-10-01 00:00:00
However, when I save the results to csv, it gets saved like this:
2013-10-01T00:00:00
This is causing a problem when I'm trying to COPY the csv into a table in Redshift, where it gives me an error stating that the value is not a valid timestamp (the field I'm importing to is a timestamp field).
How can I either strip out the time component completely, leaving just the date, or at least have the "T" removed from the results?
I'm exporting results to csv using Aginity SQL Workbench for Redshift.
According to this knowledgebase article:
After import, add new TIMESTAMP columns and use the CAST() function to populate them:
ALTER TABLE events ADD COLUMN received_at TIMESTAMP DEFAULT NULL;
UPDATE events SET received_at = CAST(received_at_raw as timestamp);
ALTER TABLE events ADD COLUMN generated_at TIMESTAMP DEFAULT NULL;
UPDATE events SET generated_at = CAST(generated_at_raw as timestamp);
Finally, if you foresee no more imports to this table, the raw VARCHAR timestamp columns may be removed. If you foresee importing more events from S3, do not remove these columns. To remove the columns, run:
ALTER TABLE events DROP COLUMN received_at_raw;
ALTER TABLE events DROP COLUMN generated_at_raw;
Hope that helps...
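Alternatively, if the goal is just to get Redshift to accept the ISO-8601 value directly, it may be enough to tell COPY how to parse it. A sketch with a hypothetical bucket and IAM role (TIMEFORMAT 'auto' also recognizes the 2013-10-01T00:00:00 form):

COPY events
FROM 's3://my-bucket/results.csv'                       -- hypothetical path
IAM_ROLE 'arn:aws:iam::123456789012:role/my-copy-role'  -- hypothetical role
CSV
TIMEFORMAT 'auto';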

Postgres pg_dump - stored procedure now fails because of boolean

I have a stored procedure that has started to fail for no reason. Well, there must be one, but I can't find it!
This is the process I have followed a number of times before with no problem.
The source server works fine!
I do a pg_dump of the database on the source server and import it onto another server - this is fine; I can see all the data and run updates.
Then I run a stored procedure on the imported database, which has 2 identical schemas, that does the following -
For each table in schema1
Truncate table in schema2
INSERT INTO schema2."table" SELECT * FROM schema1."table" WHERE "Status" in ('A','N');
Next
However, this now gives me an error when it did not before -
The error is
*** Error ***
ERROR: column "HBA" is of type boolean but expression is of type integer
SQL state: 42804
Hint: You will need to rewrite or cast the expression.
Why am I getting this? The only difference between the last time I followed this procedure and this time is that the table in question now has an extra column, so the "HBA" boolean column is no longer the last field. But then why would it still work in the original database?
I have tried removing all the data and dropping and rebuilding the table; these all fail.
However, if I drop the column and add it back in, it works - is there something about boolean fields that means they need to be the last field?
Any help greatly appreciated.
Using Postgres 9.1
The problem here: the tables in the different schemas had a different column order.
If you do not explicitly specify the column list and order in INSERT INTO table(...), or if you use SELECT *, you are relying on the column order of the table (and now you see why that is a bad thing).
You were trying to do something like
INSERT INTO schema2.table1(id, bool_column, int_column) -- based on the order of columns in schema2.table1
select id, int_column, bool_column -- based on the order of columns in schema1.table1
from schema1.table1;
Such a query causes a cast error because the column types don't match.
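The usual fix, sketched with the hypothetical names from the example above: name the columns explicitly on both sides, so the physical column order no longer matters:

INSERT INTO schema2.table1 (id, bool_column, int_column)
SELECT id, bool_column, int_column   -- names, not positions, decide what goes where
FROM schema1.table1
WHERE "Status" IN ('A', 'N');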