Copying timestamp format from Avro to Redshift - amazon-redshift

I am trying to copy an Avro file to Redshift using the COPY command. The file has a column of this type:
{'name': 'timestamp',
 'type': ['null', {'logicalType': 'timestamp-millis', 'type': 'long'}]}
Redshift column type: "timestamp" timestamptz
When I run the following command, the COPY fails:
COPY table_name
from 'fil_path.avro'
iam_role 'the_role'
FORMAT AS avro 'auto'
raw field value: 1581306474335
Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SSOF]
However, if I add the following line, it works:
timeformat 'epochmillisecs'
I tried putting my timestamp in microseconds, which should be the base supported epoch resolution, but that fails as well, and I didn't find an appropriate format name for it (epochmicrosecs didn't seem to do the job).
My question is: why is this so?
Furthermore, I have another field that is causing a problem: a date field, which is apparently saved in the Avro file as a number of days (7305), gives the following error:
Redshift column type: "birthdate" date
Avro schema: {'name': 'date_of_birth', 'type': ['null', {'type': 'int', 'logicalType': 'date'}]}
Invalid Date Format - length must be 10 or more

Firstly, about the Time Format:
As the docs state:
COPY command attempts to implicitly convert the strings in the source data to the data type of the target column. If you need to specify a conversion that is different from the default behavior, or if the default conversion results in errors, you can manage data conversions by specifying the following parameters.
First Solution:
Redshift doesn't recognize epoch time by default, so it can't convert an epoch value to the timestamp format; that is, it can't extract the year, month, day, etc. from it. As the docs state:
If your source data is represented as epoch time, that is the number of seconds or milliseconds since January 1, 1970, 00:00:00 UTC, specify 'epochsecs' or 'epochmillisecs'.
These are the supported formats that Redshift can convert using automatic recognition.
A timestamp needs the source value in a form like YYYYMMDD HHMISS (e.g. 19960108 040809) for the parts to be extracted correctly; that is what the error Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SSOF] is complaining about. Epoch time, by contrast, is just a count of seconds or milliseconds since January 1, 1970, and Redshift doesn't know how to extract date parts from it unless you say so with TIMEFORMAT.
Microseconds are not supported as a TIMEFORMAT parameter in Redshift, which is why epochmicrosecs fails.
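Putting that together, a minimal sketch of a working COPY (reusing the placeholders from the question):

COPY table_name
from 'fil_path.avro'
iam_role 'the_role'
FORMAT AS avro 'auto'
timeformat 'epochmillisecs'  -- tells COPY the source values are epoch milliseconds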
Second Solution:
With this approach you don't pass TIMEFORMAT to the COPY command at all; instead, you load the epoch time into your staging tables as VARCHAR or TEXT.
Then, when inserting the epoch time from your staging tables into the schema tables, convert it like this: TIMESTAMP 'epoch' + epoch_time/1000 * interval '1 second' AS time
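A sketch of that approach (table and column names are illustrative):

-- stage the epoch value as text so COPY performs no timestamp conversion
create table staging_events (event_time varchar(20));

-- convert epoch milliseconds to a timestamp while loading the schema table
insert into events (event_time)
select timestamp 'epoch' + event_time::bigint / 1000 * interval '1 second'
from staging_events;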
Secondly, about the date field:
The DATE data type holds a calendar date (year, month, day), as stated by the docs. As a result, the incoming text can't be a bare number of days and must be at least 10 characters long (like 2021-03-04), and that is what the error tells us: Invalid Date Format - length must be 10 or more.
The solution for the date field:
You need a workaround: pass the number of days as VARCHAR or TEXT to your staging tables.
When loading the schema tables from the staging tables, clean the data by converting the number of days since the epoch (1970-01-01) into a DATE, for example with DATEADD: DATEADD(day, days_col::INT, '1970-01-01'::DATE)::DATE. (A TO_DATE/TO_CHAR round trip can't work here, because a value like 7305 encodes a day count, not a year-month-day.)
As a result, the number of days becomes a valid DATE in your schema tables.
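A sketch of the date workaround (table and column names are illustrative):

-- stage the Avro 'date' logical value (days since 1970-01-01) as text
create table staging_people (date_of_birth varchar(10));

-- convert the day count to a DATE while loading the schema table
insert into people (birthdate)
select dateadd(day, date_of_birth::int, '1970-01-01'::date)::date
from staging_people;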

Related

Azure Data Factory - data flow expression date and timestamp conversions

Using a derived column, I am adding 3 columns: 2 columns for date and 1 for timestamp. For the date columns I am passing a string as a parameter, e.g. 21-11-2021; for the timestamp I am using the currentTimestamp() function.
I wrote expressions in the derived columns to convert them to date and timestamp datatypes, and also into the format the target table needs, which is dd-MM-yyyy and dd-MM-yyyy HH:mm:ss respectively.
For date:
Expression used: toDate($initialdate, 'dd-MM-yyyy')
Data preview output: 2021-01-21 (not in the format I want)
After a pipeline debug run, the value in the target DB (Azure SQL Database) column:
2021-01-21T00:00:00 (in the table it shows like this; I don't understand why)
For the timestamp conversion:
Expression used:
toTimestamp(toString(currentTimestamp(), 'dd-MM-yyyy HH:mm:ss', 'Europe/Amsterdam'), 'dd-MM-yyyy HH:mm:ss')
Data preview output: 2021-11-17 19:37:04 (not in the format I want)
After a pipeline debug run, the value in the target DB (Azure SQL Database) column:
2021-11-17T19:37:04:932 (again, in the table it shows like this; I don't understand why)
Question 1: Why am I not getting values in the format the target requires? They should be in the DATE and Datetime2 datatypes respectively, so no string conversions.
Question 2: After the debug run, why do the inserted table values look different from the data preview?
Kindly let me know if I have written any wrong expressions.
(Apologies, I am not able to post pictures.)
toDate() converts an input date string to a date, with the default format yyyy-[M]M-[d]d. Accepted formats are: [ yyyy, yyyy-[M]M, yyyy-[M]M-[d]d, yyyy-[M]M-[d]dT* ].
The same goes for toTimestamp(); the default pattern is yyyy-[M]M-[d]d hh:mm:ss[.f...].
In Azure SQL Database as well, the default date and datetime2 formats are YYYY-MM-DD and YYYY-MM-DD HH:mm:ss.
But if your column datatypes are string (varchar), then you can change the output format of date and DateTime in the Azure Data Flow mappings, and the values arrive in Azure SQL Database in that mapped format. Note: this only happens when the datatype is varchar.
If the datatype is date in the Azure SQL database, you can convert it to the required format using date conversions, as in:
select id, col1, date1, convert(varchar(10),date1,105) as 'dd-MM-YYYY' from test1
Azure SQL Database always follows the UTC time zone. Use "AT TIME ZONE" to convert to another, non-UTC time zone:
select getdate() as a, getdate() AT TIME ZONE 'UTC' AT TIME ZONE 'Central Standard Time' as b
You can also refer to the sys.time_zone_info view to check current UTC offset information.
select * from sys.time_zone_info
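For example, to check the offset of a single zone (using the zone name from the query above):

select name, current_utc_offset, is_currently_dst
from sys.time_zone_info
where name = 'Central Standard Time'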

Got the following error while declaring timestamp as data type in PostgreSQL

My timestamp values have a format like 20200203160857. I am declaring my time_stamp column as TIMESTAMP:
time_stamp TIMESTAMP NULL,
and I am copying in a CSV file in which the timestamp values look like the one above. While inserting, I got the following error:
date/time field value out of range: "20200203160857"
HINT: Perhaps you need a different "datestyle" setting.
Use the to_timestamp() function as below to handle your timestamp format:
to_timestamp('20200203160857', 'YYYYMMDDHH24MISS')
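If the values come in via COPY from the CSV, one workaround (a sketch; the table, column, and file path are illustrative) is to stage the raw value as text and convert it while loading the real table:

-- stage the raw value as text so COPY does not try to parse it
create table staging_events (time_stamp_raw text);
copy staging_events from '/path/to/file.csv' with (format csv);  -- or \copy from psql

-- convert while loading the real table
insert into events (time_stamp)
select to_timestamp(time_stamp_raw, 'YYYYMMDDHH24MISS') from staging_events;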

How do I convert a Julian date stored as a double-precision value to a timestamp with at least one-minute resolution?

I exported data from an SQLite table to a CSV file. The data includes a timestamp with at least one-minute resolution: "2019-11-15 01:30:06". The data is actually stored as a Julian date, in this case 2458802.35424295. I imported the data into a double-precision field. I need to convert that number into a timestamp with time zone. I tried casting the double-precision number to text and then using to_timestamp(), but that appears to work only with integer days: I can get a timestamp, but it is always at midnight of the correct date. I also tried passing my number straight to to_timestamp(), but that interprets it as epoch time (seconds since 1970-01-01).
I could try to take the fractional part of my Julian date value, calculate the number of milliseconds since midnight that represents, use the to_timestamp(text,text) method to get the date I need, and then add the epoch since midnight to that date. But that's awfully cumbersome. Isn't there a better way?
I'm using PostgreSQL 9.3.
NOTE: The simple answer to my problem, which occurred to me just before I clicked the Post button, is to export the data in the form I want, using SQLite's datetime() function to convert the number to a date string during export. But I remain curious: I would have thought there would be a standard way to do this conversion.
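There is in fact a direct conversion, since Julian day 2440587.5 corresponds to the Unix epoch (1970-01-01 00:00:00 UTC): turn the value into epoch seconds and pass it to to_timestamp(). A sketch (the table and column names are illustrative):

select to_timestamp((julian_col - 2440587.5) * 86400.0) as ts
from imported_data;
-- 2458802.35424295 yields 2019-11-14 20:30:06.59+00 (in UTC; shift with AT TIME ZONE if needed)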

Convert Epoch to Date with select value

I'm trying to convert an epoch timecode to a date in Pentaho Spoon. I use a text-file input to extract fields from, and I want to export the fields to a database, but there is a timestamp field that contains epoch timestamps like "1480017396"; the datatype is set to Integer and the field is named timestamp. I want to convert it with Select value.
So I go to the next step and use the Select value option to select the field and change the datatype to Date with a format of dd/MM/yyyy. The result gives me all kinds of dates in the 18-01-1970 range. I tried everything (different formats, etc.) but I just can't seem to solve it.
Any guesses?
Pentaho's Date conversion expects the epoch value in milliseconds, not seconds, so take your number, multiply it by 1000, and then convert it to a Date.
Notice that if you divide the value, the date goes back toward 1970, while multiplying by 1000 gives the correct date, because the value is then a proper millisecond timestamp.
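To see the arithmetic (illustrated in PostgreSQL, whose to_timestamp() takes epoch seconds):

select to_timestamp(1480017396);           -- 2016-11-24 19:56:36+00: the value read as seconds
select to_timestamp(1480017396 / 1000.0);  -- 1970-01-18 03:06:57+00: the same value read as milliseconds

Reading a seconds value as milliseconds is exactly what produces the 18-01-1970 dates in the preview.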

Format of date / time values in database tables

I am reading a CSV file with date fields formatted as mm/dd/yyyy. I expected the same kind of format from the Postgres table after the import, but I see yyyy-mm-dd hh:mm:ss.
The date fields in my table are defined with the timestamp without time zone data type.
How do I maintain the same format of data? I am using PostgreSQL 9.3.
PostgreSQL only stores the value; it doesn't store formatting (which would waste space).
You can use the to_char function in your query if you would like the output formatted in a particular way. Details are in the manual.
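For example (the table and column names are illustrative):

select to_char(date_field, 'MM/DD/YYYY') as formatted_date
from my_table;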