Spark Scala - convert Timestamp with milliseconds to Timestamp without milliseconds

I have a column in Timestamp format that includes milliseconds.
I would like to reformat my timestamp column so that it does not include milliseconds. For example, if my Timestamp column has values like 2019-11-20T12:23:13.324+0000, I would like the reformatted column to have values like 2019-11-20T12:23:13.
Is there a straightforward way to perform this operation in spark-scala? I have found lots of posts on converting a string to a timestamp, but none on changing the format of a timestamp.

You can try date_trunc with "second" as the unit; plain trunc only truncates dates to units like month or year, while date_trunc also supports hour, minute, and second.
See more examples: https://sparkbyexamples.com/spark/spark-date-functions-truncate-date-time/
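For example, something along these lines should work (a minimal sketch; the SparkSession setup and the raw/ts column names are just for illustration):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_timestamp, date_trunc, date_format}

val spark = SparkSession.builder.appName("truncate-millis").master("local[*]").getOrCreate()
import spark.implicits._

// One-row DataFrame holding the example value from the question.
val df = Seq("2019-11-20T12:23:13.324+0000").toDF("raw")
  .withColumn("ts", to_timestamp(col("raw"), "yyyy-MM-dd'T'HH:mm:ss.SSSZ"))

// date_trunc("second", ...) keeps the timestamp type but zeroes out the milliseconds.
val truncated = df.withColumn("ts_no_millis", date_trunc("second", col("ts")))

// If you literally want the text 2019-11-20T12:23:13, render it as a string instead.
truncated.withColumn("ts_text", date_format(col("ts"), "yyyy-MM-dd'T'HH:mm:ss")).show(false)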

Related

How to reset time part of timestamp to 00:00:00 in Hive?

I am new to hive.
I have a column in one of my seed tables, say seed_timestamp.
Example:
seed_timestamp = '28/04/2020 12:30:54'. From this timestamp I want to create a new timestamp such that new_timestamp = '28/04/2020 00:00:00'.
I want to use these timestamps in the WHERE clause of my query, so that I check the data from midnight up to seed_timestamp.
In Hive, you can use to_date() to truncate the time part of a timestamp, so just:
to_date(seed_timestamp)
From the documentation:
to_date(string timestamp): returns the date part of a timestamp string
Concatenate the date with ' 00:00:00.0':
concat(to_date(seed_timestamp),' 00:00:00.0')
This produces a string that can be compared with timestamps. You can also cast it to a timestamp explicitly:
cast(concat(to_date(seed_timestamp),' 00:00:00.0') as timestamp)
but the comparison should work without the cast.
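If it helps to see the same truncation outside of Hive, here is a plain-Scala illustration of the idea (just a sketch using java.time and the example value from the question, not Hive code):

import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss")
val seed = LocalDateTime.parse("28/04/2020 12:30:54", fmt)
// Drop the time part and restart the day at midnight: 28/04/2020 00:00:00
val midnight = seed.toLocalDate.atStartOfDay
println(midnight.format(fmt))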

Copying timestamp format from avro to redshift

I am trying to copy an Avro file to Redshift using the COPY command. The file has a column of the type:
{'name': 'timestamp',
 'type': ['null', {'logicalType': 'timestamp-millis', 'type': 'long'}]}
Redshift column type: "timestamp" timestamptz
When I run the following COPY command:
COPY table_name
from 'fil_path.avro'
iam_role 'the_role'
FORMAT AS avro 'auto'
it fails with:
raw field value: 1581306474335
Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SSOF]
However, if I add the following line, it works:
timeformat 'epochmillisecs'
I tried putting my timestamp in microseconds, which should be the base supported epoch resolution, but that fails as well; I couldn't find an appropriate format name for it (epochmicrosecs didn't seem to do the job).
My question is: why is that?
Furthermore, I have another field that is causing problems: a date field which is apparently saved as a number of days in the Avro file (7305), and gives the following error:
Redshift column type: "birthdate" date
Avro: {'name': 'date_of_birth', 'type': ['null', {'type': 'int', 'logicalType': 'date'}]}
Invalid Date Format - length must be 10 or more
Firstly, about the time format:
As the docs state:
COPY command attempts to implicitly convert the strings in the source data to the data type of the target column. If you need to specify a conversion that is different from the default behavior, or if the default conversion results in errors, you can manage data conversions by specifying the following parameters.
First Solution:
Redshift doesn't recognize epoch time by default, so it can't extract the year, month, day, etc. from an epoch value to build a timestamp. As the docs state:
If your source data is represented as epoch time, that is the number of seconds or milliseconds since January 1, 1970, 00:00:00 UTC, specify 'epochsecs' or 'epochmillisecs'.
These are the formats that Redshift can convert using automatic recognition.
A timestamp needs a format such as YYYYMMDD HHMISS (e.g. 19960108 040809) for Redshift to extract its parts, which is what the error Invalid timestamp format or value [YYYY-MM-DD HH24:MI:SSOF] is telling you; an epoch value is just a count of seconds or milliseconds since January 1, 1970, so Redshift doesn't know how to pull the date parts out of it without the TIMEFORMAT hint.
Microseconds are not supported as a TIMEFORMAT parameter in Redshift, which is why epochmicrosecs fails.
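As a quick sanity check that the raw field value really is epoch milliseconds (which is why timeformat 'epochmillisecs' works), you can decode it outside Redshift; a small Scala sketch using the value from the error message:

import java.time.Instant

// 1581306474335 read as milliseconds since 1970-01-01 gives a sensible date:
println(Instant.ofEpochMilli(1581306474335L)) // 2020-02-10T03:47:54.335Z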
Second Solution:
You don't need to pass TIMEFORMAT to the COPY command at all; instead, load the epoch time into your staging tables as VARCHAR or TEXT.
Then, when inserting the epoch time from your staging tables into the schema tables, convert it like this:
TIMESTAMP 'epoch' + epoch_time/1000 * interval '1 second' AS time
Secondly, about the date field:
The DATE data type is specified as a calendar date (year, month, day), as stated by the docs. As a result it can't be a bare number of days, and its text form can't be shorter than 10 characters (as in 2021-03-04), which is what the error Invalid Date Format - length must be 10 or more tells us.
The solution for the date field:
You need a similar work-around: pass the number of days to your staging tables as VARCHAR or TEXT.
When loading the schema tables from the staging tables, clean the data by converting the day count into a DATE. The Avro date logical type counts days since 1970-01-01, and Redshift can add an integer number of days to a date:
DATE '1970-01-01' + date_of_birth::int AS birthdate
As a result, the number of days becomes a valid DATE in your schema tables.
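To see why this conversion is the right one, note that Avro's date logical type stores days since 1970-01-01; a quick Scala check with the value from the question:

import java.time.LocalDate

// 7305 days after the epoch:
println(LocalDate.ofEpochDay(7305)) // 1990-01-01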

Convert Epoch to Date with select value

I'm trying to convert an epoch timestamp to a date in Pentaho Spoon. I use a text file input step to extract fields from. I want to export the fields to a database, but there is a timestamp field that contains epoch timestamps like "1480017396"; the data type is set to Integer and the field is named timestamp. I want to convert it with a Select values step.
So I go to the next step and use the Select values option to select the field and change the data type to Date with a format of dd/MM/yyyy. The result gives me all kinds of dates in the 18-01-1970 range. I tried everything (different formats, etc.) but I just can't seem to solve it.
Any guesses?
The epoch value is being interpreted as milliseconds, but yours is in seconds, so take your number, multiply it by 1000, and then convert it to a Date.
If a value that is really seconds is read as milliseconds, the date collapses back to mid-January 1970, which is exactly the 18-01-1970 range you are seeing; multiplying by 1000 first gives the correct date.
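You can see both interpretations side by side outside Pentaho; a small Scala sketch using the value from the question:

import java.time.Instant

val epoch = 1480017396L
// Read as milliseconds, the value lands in January 1970 (the dates you are seeing):
println(Instant.ofEpochMilli(epoch))   // 1970-01-18T03:06:57.396Z
// Read as seconds (i.e. multiplied by 1000 first), it gives the real date:
println(Instant.ofEpochSecond(epoch))  // 2016-11-24T19:56:36Z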

pyspark converting unix time to date

I am using the following code to convert a column of unix time values into dates in pyspark:
transactions3=transactions2.withColumn('date', transactions2['time'].cast('date'))
The column transactions2['time'] contains the unix time values. However, the date column I create here has no values in it (date = None for all rows). Any idea why this would be?
Casting a numeric column straight to date yields null; use from_unixtime to turn the epoch seconds into a timestamp string first, then cast:
from pyspark.sql.functions import from_unixtime
transactions3 = transactions2.withColumn('date', from_unixtime(transactions2['time']).cast('date'))

How to get Hours and Minute using Extract function as single result in Postgresql

I have a timestamp with time zone column in one of my tables. I need to extract both the hours and the minutes from it using the extract function, but I am unable to.
I tried this,
extract(hour_minute from immi_referral_user_tb.completed_time) >= '06:30'
but I am getting a syntax error.
I am using the extract function in a WHERE clause, and
immi_referral_user_tb.completed_time is a timestamp with time zone column.
Is there any other way to accomplish this?
You can cast the column to a time data type and compare that to a time value:
immi_referral_user_tb.completed_time::time >= time '06:30'