Can we extract a date from the name of a source file and add it to a DB column using Talend?

So I have a source file containing 10 columns and my target contains 11 columns, the extra column being of type Date. The source file has a name like 'cust20201212'; I want to extract only the date part and add it to the Date column of my target table. Is it possible to achieve this using Talend? I just want to extract the date as 2020-12-12 or 2020-12-01 and store it in the Date column of an Oracle table.
Can we use tregexextract in this scenario?

First, you need to get your filename into the flow, or into a variable. Do you have it in a context variable, or does it come from a tFileList?
If you have a tFileList in your job, you can access the current filename with the global variable:
((String)globalMap.get("tFileList_1_CURRENT_FILE"))
When you have this filename, you have to parse it to extract the date:
TalendDate.parseDate("yyyyMMdd",StringHandling.LEFT(StringHandling.RIGHT(*PLACE_HERE_FILENAME*,12),8))
StringHandling.RIGHT gets the last 12 characters of your filename (8 date characters + 4 for the extension, e.g. ".csv", = 12).
StringHandling.LEFT then takes the first 8 characters of that expression (e.g. 20201212).
TalendDate.parseDate converts the string representing your date into an actual Date.
Then you can pass this new value on to your Oracle DB.
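Put together in a tMap or tJava expression, the whole thing looks like this (a minimal sketch; the tFileList_1 name and the 4-character ".csv" extension are assumptions based on your example):
// hypothetical: filename "cust20201212.csv" -> java.util.Date 2020-12-12
TalendDate.parseDate("yyyyMMdd",
    StringHandling.LEFT(
        StringHandling.RIGHT(((String)globalMap.get("tFileList_1_CURRENT_FILE")), 12),
        8))
Map the result of this expression to your extra Date column in the output schema, and the Oracle output component (e.g. tOracleOutput) can write it into the DATE column.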

Related

Data Factory - Can I use the date field in a CSV to determine the destination folder in Copy Activity

I have some CSV files that I want to copy to a specific folder in ADLS based on the date column within the file.
i.e. CSV file has a column named "date" that reads "2022-02-23" on all rows. I want to copy that file to a folder that has the corresponding year and month, such as "/curated/UK/ProjectABC/2022/02"
I've got a Lookup activity that's pointing to the source CSV file and populating a Set Variable activity with the month using this dynamic content - #substring(string(activity('Lookup1').output.firstrow.date),5,2)
Would this be the right approach, to use a variable?
I can't use variables in the Directory portion of the Sink Dataset, as far as I know.
Have you come across this situation before?
Sounds like you're on the right path. You can absolutely use Dataset parameters:
Then populate them in your pipeline using a variable (or parameter, or expression):
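For example (a sketch; the parameter, activity, and folder names here are placeholders), a dataset parameter named folderPath could be populated with dynamic content that builds the year/month path from the Lookup output:
@concat('curated/UK/ProjectABC/',
    substring(string(activity('Lookup1').output.firstRow.date), 0, 4), '/',
    substring(string(activity('Lookup1').output.firstRow.date), 5, 2))
Then, in the sink dataset, reference the parameter in the Directory field as @dataset().folderPath.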

Convert string timestamp (DataStage) to Timestamp (DB2)

I am working on an ETL job in DataStage, a simple one: Source ---> Transformer ---> Destination.
The source is a CSV file and the destination is a DB2 database. The problem is that the CSV file contains a string timestamp which I need to load into the DB2 stage; my table was created with a script, and the Transformer maps the column across. When I run the job, an error appears, which in English means:
update_or_insert, 3: Unhandled conversion error in the "SEC_DAT_DATE_INSERT" zone from the source
type "timestamp" to the target type "timestamp [microseconds]":
source value = "*****************". The result does not accept a NULL value and there is no
handle_null to specify a default value
I don't know what it means. That's the problem; if anyone could help, that would be nice. Thanks.
First off, verify how Excel has handled the timestamp. Change the display format so that it conforms to ISO 8601, namely YYYY-MM-DD HH:NN:SS format, before you export it to CSV. Check the CSV file using a text editor (Notepad or Wordpad) to confirm that the timestamp format is correct.
Then change the StringToTimestamp() function so that it reflects the new format (or leave out the format entirely if this is your default format).
Note that the Else part of your expression uses a string. Perhaps you need to wrap that in a StringToTimestamp() function.
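For example, the full derivation might end up looking something like this (a sketch in parallel Transformer syntax; the input link name lnk_in and the default timestamp are assumptions):
If lnk_in.SEC_DAT_DATE_INSERT = "" Then StringToTimestamp("1970-01-01 00:00:00", "%yyyy-%mm-%dd %hh:%nn:%ss") Else StringToTimestamp(lnk_in.SEC_DAT_DATE_INSERT, "%yyyy-%mm-%dd %hh:%nn:%ss")
The Else branch does the real conversion, and the Then branch supplies an explicit default so the non-nullable target column never receives NULL, which is what the "there is no handle_null to specify a default value" part of the error is complaining about.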
I suggest you check whether you have marked that column as a key in the source (it can happen by mistake); if so, deselect the key checkbox. Also check whether Nullable is set to YES for that column in the source; if not, try running with Nullable set to YES for that column. Hope this helps.

pandas automatically converting date

I am extracting data from an Excel file and storing it in a PostgreSQL table. The Excel file has a column with time values that can range beyond 23:59:59. When I extract the data into a pandas DataFrame, these values are automatically converted to a different format.
For example:
If the Excel sheet has a time value of '31:15:45', pandas converts it to '1900-01-01T07:15:45.437Z' and the Postgres table stores '07:15:45'.
I have tried keeping the value as a string using dtype, and I have also tried converters, but they were of no help; the time always gets converted.
I want the time value to be taken as-is, i.e. '31:15:45'.
Thanks in advance.
Sorry, I think this is more what you wanted, so I edited my original answer.
If your column is of datetime type, you could do the following:
df[new_col] = df[name_of_date_col].dt.hour.astype(str) + ':' + df[name_of_date_col].dt.minute.astype(str) + ':' + df[name_of_date_col].dt.second.astype(str)
otherwise just convert it to datetime type with
df[name_of_date_col] = pd.to_datetime(df[name_of_date_col])
Then do the stuff above
Leave it as a string. When you are reading an Excel file into pandas you can specify the data type for a column; it is called converters.
Check the docs:
pandas.read_excel(my_file, converters = {my_str_column: str})
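Putting that together (a minimal sketch; the file and column names are placeholders, and this assumes the cell is stored as text in Excel rather than already converted to a datetime by Excel itself):
import pandas as pd

# read the column as raw text so pandas does not reinterpret it, e.g. '31:15:45'
df = pd.read_excel('my_file.xlsx', converters={'duration': str})

# optional: values beyond 23:59:59 are durations, not times of day,
# so parse them as timedeltas instead of datetimes
df['duration_td'] = pd.to_timedelta(df['duration'])  # '31:15:45' -> 1 days 07:15:45
print(df[['duration', 'duration_td']].head())
A timedelta also maps naturally onto a PostgreSQL interval column, which can hold '31:15:45' as-is, unlike a time column.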

Data Type Cast Won't Stick in SSIS

I'm trying to automate a process with SSIS that exports data into a flat file (.csv) that is then saved to a directory, where it will be scanned and imported by some accounting software. The software (unfortunately) only recognizes dates in MM/DD/YYYY format. I have tried every which way to cast or convert the data pulled from SQL into MM/DD/YYYY, but somehow the data is always recognized as either a DT_Date or DT_dbDate data type in the flat file connection, and saved down as YYYY-MM-DD.
I've tried various combinations of data conversion, derived columns, and changing the properties of the flat file columns to string in hopes that I can at least use substring operations to get this formatted correctly, but it never fails to save down as YYYY-MM-DD. It is truly baffling. The preview in the OLE DB source will show the dates as "MM/DD/YYYY" but somehow it always changes to "YYYY-MM-DD" when it hits the flat file.
I've tried to look up solutions (for example, here: Stubborn column data type in SSIS flat flat file connection manager won't change. :() but with no luck. Amazingly if I merely open the file in Excel and save it, it will then show dates in a text editor as "MM/DD/YYYY", only adding more mystery to this Bermuda Triangle-esque caper.
If there are any tips, I would be very appreciative.
This is a date formatting issue.
In SQL and in SSIS, dates have one literal string format, and that is YYYY-MM-DD. Ignore the way they appear to you in the data previewer and/or Excel. Dates are displayed to you based upon your Windows regional preferences.
For example, unlike the US, folks in the UK will see all dates as DD/MM/YYYY. The way we are shown dates is NOT the way they are stored on disk. When you open the file in Excel, it does this conversion as a favor. It's not until you SAVE that the dates are stored, as text, according to your regional preferences.
In order to get dates to always display the same way, we need to save them not as dates, but as strings of text. To do this, we have to get the data out of a date column (DT_DATE or DT_DBDATE) and into a string column (DT_STR or DT_WSTR), then map this new string column into your csv file. There are two ways to do this "date-to-string" conversion...
First, have SQL do it. Update your OLE DB Source query and add one more column...
SELECT
    *,
    -- style 101 formats the date as MM/DD/YYYY
    CONVERT(VARCHAR(10), MyDateColumn, 101) AS MyFormattedDateColumn
FROM MyTable
The other way is to let SSIS do it. Add a Derived Column component with this expression (which assumes the column already reaches it as a YYYY-MM-DD string):
SUBSTRING([MyDateColumn],6,2) + "/" + SUBSTRING([MyDateColumn],9,2) + "/" + SUBSTRING([MyDateColumn],1,4)
Map the string columns into your csv file, NOT the date columns. Hope this helps.
It's been a while but I just came across this today because I had the same issue and hope to be able to spare someone the trouble of figuring it out. What worked for me was adding a new field in the Derived Column transform rather than trying to change the existing field.
Edit
I can't comment on Troy Witthoeft's answer, but wanted to note that if you have a Date type input, you wouldn't be able to do SUBSTRING. Instead, you could use something like this:
(DT_WSTR,255)(MONTH([Visit Date])) + "/" + (DT_WSTR,255)(DAY([Visit Date])) + "/" + (DT_WSTR,255)(YEAR([Visit Date]))
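Note that MONTH(), DAY(), and YEAR() return integers, so January through September come out as a single digit ("2/3/2022" rather than "02/03/2022"). If the accounting software insists on two-digit months and days, a zero-padded variant of the same idea should work:
RIGHT("0" + (DT_WSTR,2)MONTH([Visit Date]),2) + "/" + RIGHT("0" + (DT_WSTR,2)DAY([Visit Date]),2) + "/" + (DT_WSTR,4)YEAR([Visit Date])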

How to load file with time (Date) type into Hive table?

I'm using Hive to create a table and I'm trying to load file content into it.
There's a column of type "Date", and the date format in the file is dd/mm/yyyy, for example: 01/12/2013.
But when I try to load the data from the file into the table, the column values corresponding to the "Date" column are always NULL, as if the date content failed to load.
I put the column content into a txt file and uploaded it to HDFS, so the columns may be:
id, name, birthdate
and the corresponding values are:
1, "Joan", 04/05/1989
But the "04/05/1989" seems can't be read into the table, always null.
Please tell me if the format in my txt file is wrong or I need some specific grammar when loading date type data into Hive table.
Thanks!
The Hive Date data type format is YYYY-MM-DD. You need to format the field accordingly.
More details on
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-date
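If you can't change the file, a common workaround (sketched below; the table and column names are made up for illustration) is to load the field as a STRING and convert it at query time with Hive's built-in date functions:
-- load the raw value as text, since dd/MM/yyyy is not a valid Hive DATE literal
CREATE TABLE people_raw (id INT, name STRING, birthdate_str STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- convert dd/MM/yyyy -> yyyy-MM-dd, then cast to DATE
SELECT id, name,
       CAST(from_unixtime(unix_timestamp(birthdate_str, 'dd/MM/yyyy'), 'yyyy-MM-dd') AS DATE) AS birthdate
FROM people_raw;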