String to Date conversion in wrangler - google-cloud-data-fusion

My raw data is in the format '2019-10-10' in a CSV file.
After reading the file, I loaded it into Wrangler for transformation.
My target column has the data type DATE.
I applied the transformation below:
set-column TODATE TODATE=UNIX_DATE('2019-10-10')
Here UNIX_DATE('2019-10-10') should convert the date into a Unix timestamp, and after that Wrangler should take care of it while loading into the target table.
It is giving an error like:
Pipeline faile: jexl transformation wrong.
The expected result should be in 2019-10-10 format in the target table.
Please help further.
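For comparison, the Data Fusion answer further down this page handles the same string-to-date case with parse-as-simple-date; adapted to this yyyy-MM-dd input, a sketch (assuming the source string column is also named TODATE) would be:
parse-as-simple-date TODATE yyyy-MM-dd
set-column TODATE TODATE.toLocalDate()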

Related

AWS Glue: How to filter out data from DynamicFrame when date format is wrong or bad data

In AWS Glue, after extracting data into a DynamicFrame, I'm converting the datetime format to UTC. But if the date format is wrong, e.g. an invalid value for the date, it breaks the entire Glue flow.
So I want to filter out this bad data from the DynamicFrame before processing it further.
I'm using Filter.apply for filtering, and my date is present in this format: "Date": "2022-01-01T12:11:27.251Z".
You can parse the Date field to check if it has the expected format. Example:
from datetime import datetime

date_str = "2022-01-01T12:11:27.251Z"
try:
    datetime_obj = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    # date_str has the correct format, continue processing the row
except ValueError:
    # date_str does not have the correct format, do something...
    pass
You can include this logic in the implementation of Filter.apply(). For example, if the Date field has an invalid format, the row can be filtered out.
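A minimal sketch of such a filter, assuming the incoming frame is named dynamic_frame (a placeholder) and the field is called Date, as in the question:

from datetime import datetime

from awsglue.transforms import Filter

def has_valid_date(rec):
    # Keep the row only if its Date field parses in the expected format.
    try:
        datetime.strptime(rec["Date"], "%Y-%m-%dT%H:%M:%S.%fZ")
        return True
    except (KeyError, TypeError, ValueError):
        return False

# Rows with a missing or malformed Date field are dropped from the frame.
filtered_frame = Filter.apply(frame=dynamic_frame, f=has_valid_date)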

Incorrect date format generated in sink in Azure data factory

I'm using the Copy Data utility in Azure Data Factory to copy data from a REST source to a CSV file. When I preview the source data in ADF, the date format is the correct ISO format; however, when it is written to a CSV file or a database table, the format changes to something that looks a bit like a Unix timestamp, e.g. '/Date(340502400000)/'.
Source: preview of the data from the source in ADF
Destination: actual data written to the CSV file
I've been trying to figure out how to change this so the date is written to the file in ISO format, but I'm getting nowhere. Any assistance will be greatly appreciated.
I tested the same string date with your source and everything works well on my side.
You could manually set the sink data type, e.g. String or Datetime.
CSV output (Datetime):
After doing this, if the issue still exists, the best approach is to ask Azure support for deeper help. Only they can tell you what happened in your pipeline.
HTH.
Thanks for the answer, but still no joy; all I get is a type conversion error.
Error
Operation on target Copy_Lake_CSV failed: Failure happened on 'Sink' side. ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'lastModifiedDateTime' contains an invalid value '/Date(1615206773000+0000)/'. Cannot convert '/Date(1615206773000+0000)/' to type 'DateTime'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'
UPDATE:
I logged a call with Microsoft who initially didn't know how to resolve this issue either. Eventually it was escalated all the way up to the ADF product team (developers) who came back with a formula to use in a Data Flow.
To convert the date format, first use a Copy Data task to dump the data into a "raw" dataset; then, using a Data Flow task, you can transform the date values with a Derived Column. In the derived column, use the formula below to convert the JSON date format to an ISO date format, then write it to the destination dataset.
toTimestamp("1970-01-01 00:00:00") + seconds(toInteger((ltrim(rtrim([SourceColumn], "+0000)/"),"/Date("))))

Convert to date in cloud datafusion

How do we convert a string to a date in Cloud Data Fusion?
I have a column with a value such as 20191120 (format yyyyMMdd), and I want to load it into a BigQuery table as a date. The table column's datatype is also DATE.
What I have tried so far: I converted the string to a timestamp using "parse-as-simple-date", and then tried to convert its format using format-date to "yyyy-MM-dd", but that step converts it back to a string and the final load fails. I have even tried to explicitly mark the column as date in the output schema, but it fails at runtime.
I also tried keeping it as a timestamp in the pipeline and loading it into the BigQuery DATE type.
I noticed the error that came up was that field dt_1 is incompatible with the Avro integer type. Is Data Fusion internally converting the extract into Avro before loading? Avro does not have a date datatype, which may be causing the issue.
Adding answer for posterity:
You can try doing this:
Go to the LocalDateTime column in Wrangler
Open the dropdown and click on "Custom Transform"
Type timestamp.toLocalDate() (timestamp being the column name)
After the last step it should be converted into the LocalDate type, which you can write to BigQuery. Hope this helps.
For this specific date format, the Wrangler Transform directive would be:
parse-as-simple-date date_field_dt yyyyMMdd
set-column date_field_dt date_field_dt.toLocalDate()
The second line is required if the destination is of type Date.
Skip empty values:
set-column date_field_dt empty(date_field_dt) ? date_field_dt : date_field_dt.toLocalDate()
References:
https://github.com/data-integrations/wrangler/blob/develop/wrangler-docs/directives/parse-as-simple-date.md
https://github.com/data-integrations/wrangler/blob/develop/wrangler-docs/directives/parse-as-date.md
You could try to parse your input data with Data Fusion using Wrangler.
In order to test it out, I have replicated a workflow where a Data Fusion pipeline is fed with data coming from BigQuery. This data is then parsed to the proper type and exported back again to BigQuery. Note that the public dataset is "austin_311" and I have used the '311_request' table, as some of its columns are of TIMESTAMP type.
The steps I have done are the following:
I have queried a public dataset that contained TIMESTAMP data using:
select * from `bigquery-public-data.austin_311.311_request`
limit 1000;
I have uploaded it to Google Cloud Storage.
I have created a new Data Fusion batch pipeline following this.
I have used Wrangler to parse the CSV data with a custom 'Simple date' format of yyyy-MM-dd HH:mm:ss (a directive sketch follows these steps).
I have exported Pipeline results to BigQuery.
This qwiklab has helped me through the steps.
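For step 4, the parse step might be expressed as a directive along these lines (a sketch only; created_date stands in for whichever timestamp column is being parsed):
parse-as-simple-date created_date yyyy-MM-dd HH:mm:ss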
Result:
Following the above procedure I have been able to export Data Fusion data to BigQuery and the DATE fields are exported as TIMESTAMP, as expected.

parse date time from string in spotfire

One column in my CSV file is a date that is read as a string, and it follows this pattern: 2018-09-19 10:27:28.409Z. I am struggling to convert the column from string to date.
The conversion options in Spotfire didn't allow me to change the column type. However, I found the solution: at the moment of importing the data set (file), you need to specify the type (DateTime), and Spotfire magically manages the conversion.

Convert String to ISO date format in Talend

I have Excel data and am trying to insert it into MongoDB using Talend Open Studio for Big Data. This is my job:
tFileInputExcel --> tMap --> tMongoDBOutput
In the Excel sheet, I have a date column in the format 7/13/2017 (MM/dd/yyyy) as string type, and I am trying to insert this column's value in ISO format, ISODate("2017-07-13T00:00:00.000Z"), into MongoDB.
This is my Job:
tFileInputExcel:
tMap:
tMongoDBOutput:
When I execute this job, I get the error below.
Error:
When I change the parse format like this: TalendDate.parseDate("MM/dd/yyyy", row1.ClosingDate), I get a SimpleDateFormat error.
How to resolve this issue?
You can do it simply if your MongoDB column schema is Date:
TalendDate.parseDate("MM/dd/yyyy", row3.newColumn)
That will automatically convert the date to the date model that your MongoDB column has.
You can change the date model in your Talend schema, e.g. "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'".
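For illustration only, here is the same parse-then-format round trip in plain Python (the Talend job itself generates Java; the sample value comes from the question):

from datetime import datetime

closing_date = "7/13/2017"  # string value read from the Excel column
parsed = datetime.strptime(closing_date, "%m/%d/%Y")
# Render it in the ISO form MongoDB stores, e.g. ISODate("2017-07-13T00:00:00.000Z")
print(parsed.strftime("%Y-%m-%dT%H:%M:%S.000Z"))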
This is a very common mistake: reading data without understanding the underlying data types.
I have blogged about this especially for Talend: https://www.tobiasmaasland.de/2017/07/20/using-date-in-talend-etl-jobs/
But let me explain a bit.
Sometimes Excel tries to convert data in the cell even if one might think the cell type is set to String. Instead, it is set to Date. As such, no conversion is needed and the type needs to be Date in the input component.
If it is a String and an error occurs, then the structure of the String is either not the same everywhere or some cells are empty (null). So you might be lucky with
TalendDate.parseDate("MM/dd/yyyy", (row1.ClosingDate == null), "01/01/1970", row1.ClosingDate)
I just assumed you might want to use a placeholder date instead of having null.
This heavily depends on the actual data type in the cells, if every cell has the same data type and if all the data is formatted correctly.
To sum up one of the facts in my blog post: Don't use String for dates. Use Date for dates in Excel. It makes everything easier.