I am dealing with large CSV files which have a Date column in the following format:
Date
06/14/2022 20:47:02.681660678
06/14/2022 20:47:07.683097678
06/14/2022 20:47:12.725898678
06/14/2022 20:47:17.728014678
I would like to transform this to this format:
Date
2022-06-10 03:42:56
2022-06-10 03:42:59
2022-06-10 03:43:02
2022-06-10 03:43:05
Any ideas how I can do this in EME Editor directly, without going back and forth to Excel?
Much appreciated. :-)
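One way to sketch the transformation is as a regular-expression replace. The pattern and replacement below are an assumption, not an editor feature confirmed in this thread. They are shown in Python only so they can be run, and a similar pattern might be adaptable to an editor's regex Find/Replace. The sketch assumes the input is MM/dd/yyyy and that the fractional seconds are simply dropped rather than rounded. The name reformat_date is hypothetical.

```python
import re

# Hypothetical pattern: MM/dd/yyyy HH:mm:ss followed by fractional seconds.
DATE_PATTERN = re.compile(r"^(\d{2})/(\d{2})/(\d{4}) (\d{2}:\d{2}:\d{2})\.\d+$")


def reformat_date(value: str) -> str:
    """Rewrite 'MM/dd/yyyy HH:mm:ss.fffffffff' as 'yyyy-MM-dd HH:mm:ss'.

    Values that do not match (such as the 'Date' header) are returned unchanged.
    The fractional seconds are dropped, not rounded (an assumption).
    """
    return DATE_PATTERN.sub(r"\3-\1-\2 \4", value)


print(reformat_date("06/14/2022 20:47:02.681660678"))  # 2022-06-14 20:47:02
```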
Related
I have a column which contains dates as strings in many formats, such as dd/MM/yy, dd/MMM/yyyy, etc. I am using the following code to convert all strings to one specific date format (yyyy-MM-dd) in Hive:
select
from_unixtime(unix_timestamp('31/02/2021','dd/MM/yyyy'),'yyyy-MM-dd')
but this gives me 2021-03-03 in Hive.
Is there any other way to identify such invalid dates and return null instead?
Assume you recognized the format correctly, it is exactly 'dd/MM/yyyy', and the date is an invalid one such as '31/02/2021'.
In that case the unix_timestamp function rolls the date over into the next month, and there is no way to change its behavior. But you can check whether the date, converted from the original string to a timestamp and back to the original format, is the same as the input. If it is not the same, the date is invalid.
case
-- check that the double-converted date is the same as the original string
when from_unixtime(unix_timestamp(date_col,'dd/MM/yyyy'),'dd/MM/yyyy') = date_col
-- convert to yyyy-MM-dd if the date is valid
then from_unixtime(unix_timestamp(date_col,'dd/MM/yyyy'),'yyyy-MM-dd')
else null -- null if the date is invalid
end as date_converted
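The round-trip check above can also be sketched outside Hive. Python's own strptime rejects 31/02/2021 outright, so this sketch imitates the lenient rollover by hand to show why the comparison works. The helpers lenient_parse and to_iso_or_none are hypothetical names, and the input is assumed to be zero-padded dd/MM/yyyy.

```python
from datetime import date, timedelta
from typing import Optional


def lenient_parse(text: str) -> date:
    """Parse dd/MM/yyyy leniently: days past the end of the month roll over."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, 1) + timedelta(days=day - 1)


def to_iso_or_none(text: str) -> Optional[str]:
    """Return yyyy-MM-dd if the round trip reproduces the input, else None."""
    try:
        parsed = lenient_parse(text)
    except ValueError:
        # Not three integers, or a month outside 1..12.
        return None
    # An invalid date rolls over, so formatting it back gives a different string.
    if parsed.strftime("%d/%m/%Y") != text:
        return None
    return parsed.isoformat()


print(lenient_parse("31/02/2021"))   # 2021-03-03, the rolled-over date
print(to_iso_or_none("31/02/2021"))  # None
print(to_iso_or_none("28/02/2021"))  # 2021-02-28
```

Note that a string comparison like this also rejects valid but differently written dates such as 1/2/2021, because the formatted value is zero-padded.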
My raw data is in the format '2019-10-10' in a CSV file.
After reading the file, I loaded it into Wrangler for transformation.
My target column has the data type DATE.
I applied the transformation below:
set-column TODATE TODATE=UNIX_DATE('2019-10-10')
Here UNIX_DATE('2019-10-10') should convert the date into a Unix timestamp, after which Wrangler should take care of it when writing to the target table.
But it gives an error like
Pipeline faile: jexl transformation wrong.
The expected result in the target table should be in the 2019-10-10 format.
Please help further.
I have a large data set in which one field looks like Wed Sep 15 19:17:44 +0100 2010, and I need to insert that field into Hive.
I am having trouble choosing a data type. I tried both timestamp and date but get null values when loading from the CSV file.
The data type is a String as it is text. If you want to convert it, I would suggest a TIMESTAMP. However you will need to do this conversion yourself while loading the data or (even better) afterwards.
To convert to a timestamp, you can use the following syntax:
CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(<date_column>,'FORMAT')) as TIMESTAMP)
Your format seems complex though. My suggestion is to load it as a string and then just do a simple query on the first record until you get it working.
SELECT your_column as string_representation,
CAST(FROM_UNIXTIME(UNIX_TIMESTAMP(your_column,'FORMAT')) as TIMESTAMP) as timestamp_representation
FROM your_table
LIMIT 1
You can find more information on the format here: http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html
My advice would be to concat some substrings first and try to convert only the day, month, year part before you look at time and timezone et cetera.
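To illustrate how each part of the sample value maps to a format directive, here is a minimal sketch in Python rather than Hive. The directives below are Python's, not SimpleDateFormat's. A SimpleDateFormat pattern along the lines of 'EEE MMM dd HH:mm:ss Z yyyy' may be the Hive counterpart, but that is an untested assumption. The sketch also assumes English day and month names (the default C locale).

```python
from datetime import datetime

RAW = "Wed Sep 15 19:17:44 +0100 2010"

# Weekday name, month name, day, time, UTC offset, year (Python directives).
PYTHON_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_raw_timestamp(text: str) -> datetime:
    """Parse the text field into a timezone-aware datetime."""
    return datetime.strptime(text, PYTHON_FORMAT)


parsed = parse_raw_timestamp(RAW)
print(parsed.isoformat())  # 2010-09-15T19:17:44+01:00
```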
So I have one big file (13 million rows) with a date formatted as
2009-04-08T01:57:47Z. I would like to split it into 2 columns,
one with just the date as dd-MM-yyyy and the other with the time only as hh:MM.
How do I do it?
You can simply use tMap and parseDate/formatDate to do what you want. It is neither necessary nor recommended to implement your own date parsing logic with regexes.
First of all, parse the timestamp using the format yyyy-MM-dd'T'HH:mm:ss'Z'. Then you can use the parsed Date to output the formatted date and time information you want:
dd-MM-yyyy for the date
HH:mm for the time (Note: you mixed up the case in your question, MM stands for the month)
If you put that logic into a tMap, you will get the following:
Input:
timestamp 2009-04-08T01:57:47Z
Output:
date 08-04-2009
time 01:57
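The parse-then-format approach can be sketched in Python for comparison. This is not Talend code: the directives are Python's equivalents of the patterns named above, and split_timestamp is a hypothetical helper name.

```python
from datetime import datetime
from typing import Tuple


def split_timestamp(timestamp: str) -> Tuple[str, str]:
    """Parse an ISO-style timestamp once, then format the date and time parts."""
    # 'T' and 'Z' are matched as literal characters, as in the Talend pattern.
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%d-%m-%Y"), parsed.strftime("%H:%M")


date_part, time_part = split_timestamp("2009-04-08T01:57:47Z")
print(date_part)  # 08-04-2009
print(time_part)  # 01:57
```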
NOTE
Note that when you parse the timestamp with the mentioned format string (yyyy-MM-dd'T'HH:mm:ss'Z'), the time zone information is not parsed, because 'Z' is treated as a literal. Many applications do not set the time zone information properly anyway and always use 'Z' instead, so this can be safely ignored in most cases.
If you need proper time zone handling and by any chance are able to use Java 7, you may use yyyy-MM-dd'T'HH:mm:ssXXX instead to parse your timestamp.
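The difference between treating 'Z' as a literal and parsing it as a time zone can be sketched in Python. This is an analogy, not Talend or Java code, and it assumes Python 3.7 or later, where the %z directive accepts 'Z' as UTC.

```python
from datetime import datetime

STAMP = "2009-04-08T01:57:47Z"

# 'Z' matched as a literal: the result carries no time zone information.
naive = datetime.strptime(STAMP, "%Y-%m-%dT%H:%M:%SZ")

# 'Z' parsed as a UTC offset: the result is timezone-aware.
aware = datetime.strptime(STAMP, "%Y-%m-%dT%H:%M:%S%z")

print(naive.tzinfo)       # None
print(aware.utcoffset())  # 0:00:00
```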
I'm guessing Talend is falling over on the T and Z part of your date time stamp but this is easily resolved.
As your date time stamp is in a regular pattern we can easily extract the date and time from it with a tExtractRegexFields component.
You'll want to use "^([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}):[0-9]{2}Z" as your regex, which will capture the date in yyyy-MM-dd format and the time as HH:mm (you'll want to replace the date time field with a date field and a time field in the schema).
Then to format your date to your required format you'll want to use a tMap and use TalendDate.formatDate("dd-MM-yyyy",TalendDate.parseDate("yyyy-MM-dd",row7.date)) to return a string in the dd-MM-yyyy format.
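To show what the two capture groups contain, the same regex can be tried in Python. This is a sketch only, not the tExtractRegexFields component itself, and extract_date_and_time is a hypothetical helper name.

```python
import re
from typing import Tuple

# Same pattern as above: group 1 is the date, group 2 is hours and minutes.
TIMESTAMP_PATTERN = re.compile(
    r"^([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}):[0-9]{2}Z"
)


def extract_date_and_time(timestamp: str) -> Tuple[str, str]:
    """Return (dd-MM-yyyy, HH:mm) extracted from an ISO-style timestamp."""
    match = TIMESTAMP_PATTERN.match(timestamp)
    if match is None:
        raise ValueError(f"unexpected timestamp: {timestamp!r}")
    iso_date, time_part = match.groups()
    # Reorder yyyy-MM-dd into dd-MM-yyyy.
    year, month, day = iso_date.split("-")
    return f"{day}-{month}-{year}", time_part


print(extract_date_and_time("2009-04-08T01:57:47Z"))  # ('08-04-2009', '01:57')
```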
I exported data in CSV format from a SQL Server database. It contains 5 columns, one of which holds date and time values. When I checked those values, I found that they are in the wrong format. I added a filter, but the filter is not applied to some of the data. I tried to put all the data into the same format, but the formatting is not applied. I have tried everything to fix the issue, without success.
I have attached the sample data, please check it on your end.
7/12/2013 14:50
8/12/2013 20:14
9/12/2013 11:38
10/12/2013 15:31
13/12/2013 12:45:50
13/12/2013 14:35:42
13/12/2013 14:37:40
14/12/2013 17:00:10
18/12/2013 14:57:35
The rows starting from 13/12/2013 12:45:50 do not change to the date time format.
The trouble is that your dates are in the French format (dd/MM/yyyy). You can force them to datetime with the following lines:
[datetime]::ParseExact("7/12/2013 14:50", "d/MM/yyyy HH:mm", $null)
[datetime]::ParseExact("13/12/2013 12:45:50", "d/MM/yyyy HH:mm:ss", $null)
Be careful: in your case you sometimes have seconds, and sometimes a double space between the date and the time.
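Handling both variants (with and without seconds, single or double space) can be sketched by normalizing the whitespace and trying each format in turn. The sketch is in Python rather than PowerShell, and parse_french_datetime is a hypothetical helper name.

```python
from datetime import datetime

# Day-first formats, tried in order: with seconds, then without.
CANDIDATE_FORMATS = ("%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M")


def parse_french_datetime(text: str) -> datetime:
    """Parse d/MM/yyyy values with optional seconds and irregular spacing."""
    # Collapse runs of whitespace so a double space does not matter.
    normalized = " ".join(text.split())
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(normalized, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date time value: {text!r}")


print(parse_french_datetime("7/12/2013 14:50"))       # 2013-12-07 14:50:00
print(parse_french_datetime("13/12/2013  12:45:50"))  # 2013-12-13 12:45:50
```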