How to get the time difference in Talend?

How do I get the difference in time by comparing a value with the previous one and getting the result? Say, for example, there are:
2017-01-01 13:00:00
2017-01-01 13:15:00
I need the difference to come out as 15 minutes. How do I do it?

Firstly, you'll have to use TalendDate.diffDate(column1,column2,"pattern") to get the time difference.
Then, if you want to compare the current value with the previous one (in the same column), you can add a sequence to your flow; it will help you identify which row is the previous one. After that, you just have to read your flow twice and do an inner join between the current sequence and the current sequence - 1 to get the current date and the previous date.
First subjob :
YourFlow -> tMap -> tHashOutput
In the tMap, add a new "sequence" column to your schema and set its expression to Numeric.sequence("s1",1,1).
This way all lines will have an ID.
Then, read your hash twice and join the flows on "sequence - 1":
tHashInput_1----|
                |--tMap--->Output
tHashInput_2----|
Put the TalendDate.diffDate() call in the output expression, using the two Date fields.
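For example, the output expression of that tMap could look like the line below: whole minutes elapsed between the previous timestamp and the current one. This is only a sketch; main and lookup stand for the two tHashInput flows (with the join on sequence - 1, lookup holds the previous row), date is your timestamp column, and the cast depends on your output column type.
(int) ((main.date.getTime() - lookup.date.getTime()) / (60 * 1000))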

Here is an alternative :
Start by storing the job's start time, this way (here in a tJava, but you can also use a tSetGlobalVar component):
globalMap.put("startDate", TalendDate.getDate("CCYY/MM/DD hh:mm:ss"));
The following code is used later in the job inside a tJava :
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat("yyyy/MM/dd HH:mm:ss"); // same pattern as above, in Java syntax
String endDate = TalendDate.getDate("CCYY/MM/DD hh:mm:ss");
long executionTime = format.parse(endDate).getTime() - format.parse((String) globalMap.get("startDate")).getTime();
System.out.println("Execution Time : " + (executionTime / (60 * 60 * 1000)) + " Hour(s) " + (executionTime / (60 * 1000) % 60) + " Minute(s) " + (executionTime / 1000 % 60) + " second(s).");

Related

TALEND STUDIO - Get number of Non-working Days between start_date and end_date using TMAP in Talend

I am a beginner in Talend and I need to achieve the following:
For example, I have two input tables in the TMAP component.
Table 1:
Start_Date    End_Date
25/8/2022     1/9/2022
Table 2 (Lookup Table):
Non_working_days    Remark
27/8/2022           Weekend
28/8/2022           Weekend
31/8/2022           Weekend
I would want my output to count the number of non-working days from the lookup table.
For example:
Start_Date    End_Date    No_of_non_working_days
25/8/2022     1/9/2022    3
Can this be achieved using the expression editor in the TMAP component, or will I need to create a routine?
Thanks.
This is doable with a subjob; it is a bit complex but an interesting one:
Main idea: generate all dates between startDate and endDate, then compare each of these dates to the content of table 2, and finally count the number of matching dates (a plain-Java sketch of this counting logic follows the step list below).
tFixedFlow1 (table 1): place your input table 1 here.
tFlowToIterate: this creates global variables for startDate and endDate, which will be important for the next steps.
tLoop: the aim is to generate all the dates contained between startDate and endDate.
tIterateToFlow: once all dates between startDate and endDate have been created, regroup the iteration flow into a single flow.
tLogRow: just so you can check the content.
tMap + table 2: join the input flow with a lookup on your table 2. Make it an inner join.
tAggregate: count the number of lines in the output.
tLogRow: print the result.
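As mentioned above, the same counting logic can also be sketched as a plain Java routine (a hypothetical helper, not a Talend built-in) that you could call from the tMap expression editor once the lookup dates have been loaded into a list:
// counts how many dates from nonWorkingDays fall inside [start, end] (inclusive)
public static int countNonWorkingDays(java.util.Date start, java.util.Date end, java.util.List<java.util.Date> nonWorkingDays) {
    int count = 0;
    for (java.util.Date d : nonWorkingDays) {
        if (!d.before(start) && !d.after(end)) {
            count++;
        }
    }
    return count;
}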

Connecting BigQuery and Google Sheets - DATE parameter issue

Following [1] I started creating a spreadsheet which reads data from BigQuery, but I'm having an issue handling parameters related to date values.
In the first sheet, I created 2 cells with 2 parameters, the start and the end of a date interval, with proper values. Both cells are formatted as "Date" value.
In the second sheet I configured the BigQuery connector; for this example, I'm using a public dataset with dates: bigquery-public-data.utility_eu.date_greg
From the BigQuery connector wizard I added:
"STARTDATE" as "PARAMETERS!B1"
"ENDDATE" as "PARAMETERS!B2"
After this configuration, this is the resulting query:
SELECT
date,
date_str,
date_int
FROM `bigquery-public-data.utility_eu.date_greg`
WHERE date > DATE(#STARTDATE) AND date < DATE(#ENDDATE)
LIMIT 10
I'm getting an error directly from the editor with this message:
> Error BigQuery: No matching signature for function DATE for argument types: INT64. Supported signatures: DATE(TIMESTAMP, [STRING]); DATE(DATETIME); DATE(INT64, INT64, INT64) at [8:14]
As far as I can understand, the "date" cells are retrieved as a number, so the direct parse is not working. After a couple of tests, I understood that the given int value is the number I obtain if I change the cell format to "number".
If you convert cell value from DATE to NUMBER you get this value:
01/05/2019 -> 43.586
31/05/2019 -> 43.616
What is this number? It is not milliseconds, and it increases by 1 every day. In order to create a proper query that can parse this int, I need to understand what this int is (of course I could handle the cell as "text" and write the timestamp value directly, but I would prefer to keep the native date format so I can use the built-in calendar).
My guess (with simple math) is that this number is the number of days since 30/12/1899, but it is very odd (also, every date BEFORE that day is always 0), so I'm asking you directly how to handle this value. Based on my understanding of when the counter starts (30/12/1899), I created this query, which adds the number retrieved from the cell:
SELECT *
FROM `bigquery-public-data.utility_eu.date_greg`
WHERE
date >= DATE_ADD(DATE("1899-12-30"), INTERVAL #DATAINIZIO DAY)
AND date <= DATE_ADD(DATE("1899-12-30"), INTERVAL #DATAFINE DAY)
It is working... but I think I'm doing a workaround that is not the proper way of doing this.
Also, is there any full documentation related to this BigQuery connection provided by Spreadsheets? Besides the presentation in [1], I'm unable to find any specific documentation.
Spreadsheets (Google, Excel, ...) store dates as the number of days passed since a starting date, with a fractional part representing the time of day.
From here: "Excel stores dates and times as a number representing the number of days since 1900-Jan-0, plus a fractional portion of a 24 hour day: ddddd.tttttt . This is called a serial date, or serial date-time."
Now, you have two ways to filter by date in your query:
In the query, you can use DATE_ADD to add your number of days (the cell value) to the base date. (Careful: DATE_ADD takes an INT, and the date value is a float, so it needs prior casting.)
(preferred) In your spreadsheet, use TEXT(cell, "yyyy-mm-dd") so you can then use DATE() in the BigQuery query.
I use the second method: though you need that extra cell (unless you directly store the date as YYYY-MM-DD), it keeps the query cleaner than having a cast and DATE_ADD in there. It also saves you from the "1904 problem" explained in the link above.
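As an illustration of that serial format, here is a minimal Java sketch; it assumes the 1899-12-30 epoch discussed in the question, and 43586 is the example value from above:
// convert a spreadsheet serial number (days since 1899-12-30) into a calendar date
java.time.LocalDate epoch = java.time.LocalDate.of(1899, 12, 30);
long serial = 43586; // cell value when formatted as a number
System.out.println(epoch.plusDays(serial)); // prints 2019-05-01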
> What is this number? It is not milliseconds, it increases by 1 every next day.
This is a so-called serial number, which represents the number of days since the "very beginning".
Google Spreadsheet's date calendar starts from 1900-01-01, which is treated as the "very beginning".
> In order to create the proper query that can parse this int, I need to understand what is this int
Armed with the above info, you can adjust your date calculations to be in sync with what BigQuery expects.
You mentioned that your fields are already in Date format; maybe you are doing extra parsing in your query.
Try to do it without the DATE functions.
Also, I found this other doc, not strictly related to the connection, but it might be helpful: Getting info from Spreadsheets with BigQuery.

SSIS For Loop Container with Date Variable

I want to create a monthly package that executes a daily query against an ODBC source and writes an output file.
More specifically the query must be first executed for the first day of the previous month (e.g. '01/11/2018') then the next one ('02/11/2018') until the last day of the previous month ('30/11/2018').
The date variables are currently saved as Strings and I also want to have a string variable with Oracle date format to be inserted into the query. How should it be organised? Is there a way that I could use the string variables in the expressions?
Break it into parts as follows:
Declare variables to store previous month start and end date as follows:
start_date(datetime) = (DT_DATE)((DT_WSTR,4)YEAR(DATEADD("MM",-1,GETDATE()))+"-"+RIGHT("0"+(DT_WSTR,2)MONTH(DATEADD("MM",-1,GETDATE())),2)+"-01")
end_date(datetime) = DATEADD("D", -(DAY(GETDATE())),GETDATE())
Declare a variable Counter (datetime).
Create a For Loop container; a typical set of loop expressions is sketched below.
The rest of the Data Flow Task should sit inside the For Loop container and will create the output file. You can use the variable Counter in the SQL to parameterize it.
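The loop settings themselves are not shown here; a typical configuration (only a sketch, assuming Counter drives the loop and using the variables declared earlier) would be:
InitExpression:   @[User::Counter] = @[User::start_date]
EvalExpression:   @[User::Counter] <= @[User::end_date]
AssignExpression: @[User::Counter] = DATEADD("D", 1, @[User::Counter])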
In fact, I figured out that all I wanted to use in my loop was the day part of the date, so I created two extra int variables that contain:
1) the first day of the month (1)
2) the last day of the month (28,30,31)
I used those two variables in the For Loop expressions and converted the index to a string so I could add it to the query. There may well be a better way, and it would be welcome.

How to process files only for the past hour using Talend?

I have continuous sensor data coming in every 5 minutes in the form of files. I want to pick only the files from the past hour and do the required processing.
For example: if the Talend job runs at 12:01 pm, it should pick only the files from 11:00 am to 12:00 pm.
Can anyone please suggest the approach I should take to make this happen within Talend? Is there any built-in component that can pick files from the previous hour?
Here is the flow.
Use tFileProperties, which gives you a built-in schema with a column named mtime_string. Using this column you get the last modified time of the file, and in a tJava or tJavaRow you can check whether this time lies within the past hour using TalendDate functions.
Iterate over all the files and write this code in the tJavaRow:
Date lastModifiedDate = TalendDate.parseDate("EEE MMM dd HH:mm:ss zzz yyyy", input_row.mtime_string);
Date current_date = TalendDate.getCurrentDate();
if(TalendDate.diffDate(current_date, lastModifiedDate,"HH") <= 1) {
output_row.abs_path = input_row.abs_path;
}
This way you will get all the files from the past hour.
Hope this helps.
Here is the complete job design:
tFileList--->(iterate)---->tFileProperties---->(row1 main)---->tJavaRow---->if---->tFileInputDelimited---->main----->tMap---->main----->tFileOutput
For the context variable you set in the tJavaRow, check its nullability in the If condition:
context.getProperty("file") != null && !context.getProperty("file").isEmpty()
After this, use the context as you are doing.
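If you need the strict previous clock hour (so a run at 12:01 pm only picks files modified between 11:00 am and 12:00 pm), a variation of the tJavaRow check can compute the window boundaries explicitly. This is only a sketch, and it assumes the numeric mtime column (epoch milliseconds) of the tFileProperties schema:
// boundaries of the previous full clock hour (e.g. 11:00:00 - 12:00:00 for a run at 12:01)
java.util.Calendar cal = java.util.Calendar.getInstance();
cal.set(java.util.Calendar.MINUTE, 0);
cal.set(java.util.Calendar.SECOND, 0);
cal.set(java.util.Calendar.MILLISECOND, 0);
long windowEnd = cal.getTimeInMillis();            // top of the current hour
long windowStart = windowEnd - 60L * 60L * 1000L;  // one hour earlier
// keep the file only if its modification time falls inside the window
if (input_row.mtime >= windowStart && input_row.mtime < windowEnd) {
    output_row.abs_path = input_row.abs_path;
}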
There is no built-in component that will give you files based on time.
However, you can accomplish this by using tFileList --> tFileProperties. Configure tFileList to sort by last modified date, then tFileProperties will give you the modified date. From there, you can filter based on the date value: if it is older than an hour, stop; otherwise, process it.

CMIS Query : how to get results for one date only

I want to get data from one date only, example: 2014-06-16
From the CMIS reference I know that we can use the = (equal) operator, but I think the time must then be specified exactly.
The alternative I thought of is to do it like below:
First:
SELECT * FROM cmis:document WHERE cmis:creationDate >= TIMESTAMP '2014-06-16T00:00:00.000Z' AND cmis:creationDate < TIMESTAMP '2014-06-17T00:00:00.000Z'
Second:
SELECT P.tsi:DATENUM as date_traitement, L.tsi:type as type, P.tsi:statut as statut
FROM tsi:lot AS L JOIN tsi:pli AS P ON L.cmis:name = P.tsi:lot
WHERE
(P.tsi:DATENUM >= TIMESTAMP '2014-06-16T00:00:00.000Z' AND P.tsi:DATENUM < TIMESTAMP '2014-06-17T00:00:00.000Z')
The first one runs perfectly: I get data from June 16 only. BUT with the second one, I don't know why, I still get data from 2014-06-17.
Note: tsi:DATENUM type is datetime
So could you say what's wrong OR how to get data from ONE date only?
The second one should work. The timestamps you are using are in GMT. If your timestamps are stored with a time zone offset, that could be the reason why you are seeing times from 6/17 when you expect to see only times from 6/16.
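A minimal sketch of that adjustment in Java, assuming (purely for illustration) that the repository stores timestamps with a +02:00 offset:
// UTC boundaries of the local day 2014-06-16 for a repository running in UTC+2
java.time.ZoneOffset offset = java.time.ZoneOffset.ofHours(2);
java.time.OffsetDateTime from = java.time.LocalDate.of(2014, 6, 16).atStartOfDay().atOffset(offset);
java.time.OffsetDateTime to = from.plusDays(1);
System.out.println(from.withOffsetSameInstant(java.time.ZoneOffset.UTC)); // 2014-06-15T22:00Z
System.out.println(to.withOffsetSameInstant(java.time.ZoneOffset.UTC));   // 2014-06-16T22:00Z
// use these two instants as the TIMESTAMP bounds in the CMIS query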