Tracking job progress in Talend

I have to copy data from Excel sheets to SQL Server tables.
I want to track my job's progress: after each table completes, I would like an output message saying 'data has been loaded in tableX'.
I tried tLogRow, but it outputs every row being copied.
Which component should I use, and how do I do it?
I also want the messages to be printed when the job is run from the command line.

You can do this by logging to the console from a tJava component for each of your tMSSqlOutput components, linking each one with an OnComponentOk trigger.
To print to the console you can use System.out.println("data has been loaded in tableX");.
You'll then see this output in your Run tab, and also in any logs produced when the job is run, just as you would with a tLogRow component.
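As a rough sketch, the contents of each tJava component might look like the following. The table name is yours to fill in, and the NB_LINE global variable key assumes the preceding output component is called tMSSqlOutput_1, so adjust it to match your job:

    // Runs once the linked tMSSqlOutput component has finished (OnComponentOk).
    // The global variable key below is an assumption based on the component name.
    Integer rowsInserted = (Integer) globalMap.get("tMSSqlOutput_1_NB_LINE");
    System.out.println("data has been loaded in tableX"
            + (rowsInserted != null ? " (" + rowsInserted + " rows)" : ""));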
A slightly lengthier approach, but one that avoids writing even this small snippet of Java, is to link a tFixedFlowInput to your database output component with an OnComponentOk trigger. In the tFixedFlowInput, define a single row of data with a single column, "message" (or whatever you want to call it), and put your message there. From there, just link it to a tLogRow as normal.

Related

Issue with Copy Activity Metadata

The Copy Data activity, which used to show the number of rows written, isn't showing it any more.
Is there any option in the copy activity to make sure it reflects the number of rows written?
Whether it is a debug run of the pipeline or a triggered pipeline run, you can check the output of the Copy Data activity to see whether the data read equals the data written.
Let's say it is a triggered pipeline run. Navigate to the Monitor section and click on the pipeline.
The activity run dialog opens up, and from the activity output you can check whether the data read is equal to the data written.
NOTE: The above is for a blob-to-blob copy. For your source and sink, there will be similar activity output data that should contain the required information (such as rows read and rows written), for example for an Azure SQL Database to blob copy.

Append datasets from xlsx and database in Talend

I have three Excel files and one database connection which I need to append as a part of my flow. All four datasets in the pre-append stage have just one column.
When I try to use tUnite, I get an error on tFileInputExcel (see the screenshot). Moreover, I cannot join the database connection with tUnite.
What am I doing wrong?
I think the problem is with the tFileExist components (I think that's what these are on the left with the "if" links coming out), because each of them is trying to start a new flow. Once you're joining the flows with tUnite, there can be only one start to the flow, and that start goes to the first branch of the merge order.
You can move the "if" logic elsewhere. Another idea is to put the output from each of the Excel inputs into a tHashOutput (linked together), then use a tHashInput to write to your DB.
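As a rough sketch of that second idea (component names are only placeholders), the layout might look something like this:

    tFileInputExcel_1 --main--> tHashOutput_1
    tFileInputExcel_2 --main--> tHashOutput_2 (linked to tHashOutput_1)
    tFileInputExcel_3 --main--> tHashOutput_3 (linked to tHashOutput_1)
    tDBInput_1        --main--> tHashOutput_4 (linked to tHashOutput_1)
    -- then, on an OnSubjobOk trigger --
    tHashInput_1 (reading tHashOutput_1) --main--> tDBOutput_1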

How to increment a number from a csv and write over it

I'm wondering how to increment a number "extracted" from a field in a csv, and then rewrite the file with the number incremented.
I need this counter in a tMap.
Is the design below a good way to do it?
EDIT: I'm trying a new method (see the design of my subjob below), but I get an error when I link the tJavaRow to the main tMap in the main job:
Exception in component tMap_1
java.lang.NullPointerException
at mod_file_02.file_02_0_1.FILE_02.tFileList_1Process(FILE_02.java:9157)
at mod_file_02.file_02_0_1.FILE_02.tRowGenerator_5Process(FILE_02.java:8226)
at mod_file_02.file_02_0_1.FILE_02.tFileInputDelimited_2Process(FILE_02.java:7340)
at mod_file_02.file_02_0_1.FILE_02.runJobInTOS(FILE_02.java:12170)
at mod_file_02.file_02_0_1.FILE_02.main(FILE_02.java:11954)
2014-08-07 12:43:35|bm9aSI|bm9aSI|bm9aSI|MOD_FILE_02|FILE_02|Default|6|Java
Exception|tMap_1|java.lang.NullPointerException:null|1
[statistics] disconnected
You should be able to do this mid-flow in a tMap or a tJavaRow.
Simply read the number in as an integer (or other numeric data type) and then add your increment to it.
A really simple example: a tFixedFlowInput with some hard-coded values for the job is run through a tMap that adds 1 to the age column, and the result is output to the console as a table.
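If you go the tJavaRow route instead, the code can be just as small. A minimal sketch, assuming an Integer column called counter in both the input and output schema:

    // Pass the row through, adding the increment to the assumed "counter" column.
    output_row.counter = input_row.counter + 1;
    // If the column arrives as a String instead, parse it first, e.g.:
    // output_row.counter = String.valueOf(Integer.parseInt(input_row.counter) + 1);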
EDIT:
As Gabriele B has pointed out, this doesn't exactly work when reading from and writing to the same flat file, because Talend claims an exclusive read-write lock on the file when reading and keeps it open throughout the job.
Instead you would have to write the incremented data somewhere else, such as a temporary file, a database or even just the buffer, and then read that data into a separate job which would then output the file you want and clean up anything temporary.
The problem with that is that you can't do the output in the same process. I've just tried reading the file in one child job, passing the data back to a parent job using a tBufferOutput, then passing that data to another child job as a context variable and trying to output to the file. Unfortunately the file lock remains, so you can't do this all in one self-contained job (even using a parent job and several child jobs).
If this sounds horrible to you (it is) and you absolutely need this to happen (I'd suggest a database table is a better match for this functionality than a flat file), then you could raise a feature request on the Talend Jira for tFileInputDelimited not to hold the file open, or not to insist on an exclusive read-write lock on the file.
Once again, I strongly recommend that you move to using a database table for this because even without the file lock issue, this is definitely not the right use of a flat file and this use case perfectly fits a database, even something as lightweight as an embedded H2 database.
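Purely as an illustration of that last suggestion, a counter kept in an embedded H2 table could be read and bumped from a tJava component with plain JDBC, along these lines (this assumes the H2 driver jar has been added to the job, for example with tLibraryLoad; the database path, table name and global variable key are made up):

    // Open (or create) a local H2 database file and increment a single-row counter table.
    try {
        java.sql.Connection conn = java.sql.DriverManager.getConnection("jdbc:h2:./counter_db", "sa", "");
        java.sql.Statement st = conn.createStatement();
        st.execute("CREATE TABLE IF NOT EXISTS counter (id INT PRIMARY KEY, value INT)");

        // Read the current value (0 if the row does not exist yet).
        int current = 0;
        boolean rowExists = false;
        java.sql.ResultSet rs = st.executeQuery("SELECT value FROM counter WHERE id = 1");
        if (rs.next()) { current = rs.getInt(1); rowExists = true; }
        rs.close();

        // Write the incremented value back.
        int next = current + 1;
        if (rowExists) {
            st.executeUpdate("UPDATE counter SET value = " + next + " WHERE id = 1");
        } else {
            st.executeUpdate("INSERT INTO counter (id, value) VALUES (1, " + next + ")");
        }
        System.out.println("Counter is now " + next);

        // Make the value available to the rest of the job.
        globalMap.put("counter_value", next);

        st.close();
        conn.close();
    } catch (java.sql.SQLException e) {
        throw new RuntimeException("Could not update the counter table", e);
    }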

How to log progress of tasks in Talend Open Studio?

I have some sample jobs that migrate data from one database to another, and I would like to have some information about the current progress, like the one you get when the job is run interactively from the Studio itself (I export the job and run it from the command line).
I use tFlowMeter and tStatCatcher, but all I get is the overall time and the overall number of records processed (e.g. 4657 sec, 50,000,000 rows).
Is there any solution to get a decent log?
Your solution is to add a conditional clause to the logging: something that is true for one row out of every, say, 50,000. This condition, using a sequence, should work:
Numeric.sequence("log_seq",1,1) % 50000 == 0
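If you prefer plain Java, the same condition can go inside a tJava or tJavaRow component. A minimal sketch (the message text is just an example):

    // Log a progress message once every 50,000 rows.
    if (Numeric.sequence("log_seq", 1, 1) % 50000 == 0) {
        System.out.println("Another 50000 rows processed");
    }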
You can use the custom component bcLogBack to output your log through an slf4j facade stack. The component has an option called "Conditional logging" to send the message only when the condition evaluates to true.
Alternatively, if you don't like the idea of installing a custom component, you can end your subjob with the standard tLogRow (or tWarn, tDie or whatever), preceded by a tFilter with the same expression as its advanced condition. This way you let the stream pass (and the log message be triggered) just once every 50,000 rows. Here's a very basic job diagram:
//---->tMySqlOutput--->tFilter-----//filter--->tWarn (or tLogRow)
As far as I know, tLogRow outputs to the console, so you can easily plug an output into it.
If tLogRow isn't enough, you can plug your output into a tJavaFlex component. There you could use something like log4j or any custom output.
You can also use tFileOutputDelimited as a log file. This component has a nice "append" option that works like a charm for this use case.
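As a rough sketch of the tJavaFlex idea, assuming a log4j 1.x jar has been added to the job (the logger name, threshold and messages are made up):

    // Start code - runs once before the first row.
    org.apache.log4j.Logger progressLogger = org.apache.log4j.Logger.getLogger("job.progress");
    long processedRows = 0;

    // Main code - runs for every row.
    processedRows++;
    if (processedRows % 50000 == 0) {
        progressLogger.info(processedRows + " rows processed so far");
    }

    // End code - runs once after the last row.
    progressLogger.info("Finished: " + processedRows + " rows in total");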
For your question above about how to obtain the log information:
From experience, I can tell that some components pass the flow on. For example, tMysqlOutput passes on the successfully inserted rows.
Generally, to log the information I use the tReplicate component, which lets me send a copy of the flow to a log file:
tMysqlInput ---- tReplicate ---- tMap ---- tMysqlOutput (insert in DB)
                     |
                     +---------- tMap ---- tFileOutputDelimited (log info)
You can also use tWarn in combination with tLogCatcher:
tMySqlOutput ---- tFilter ---- tWarn
tLogCatcher ---- tMap ---- tLogRow
tFilter prevents you from logging a progress message on every row (see Gabriele B's answer). tWarn holds the actual message you want to log.
tLogCatcher picks up the messages from all of the tWarns, tMap transforms each row from the tLogCatcher into an output row, and tLogRow logs it.
That answer is described in more detail (with pictures): http://blog.wdcigroup.net/2012/05/error-handling-in-talend-using-tlogcatcher/
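For illustration, the "Warn message" of one of the tWarn components could be an expression along these lines (the component name inside the global variable key is an assumption, so adjust it to your job):

    // Hypothetical tWarn message expression:
    "Finished loading tableX (" + ((Integer) globalMap.get("tMysqlOutput_1_NB_LINE")) + " rows inserted)"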

Keeping track of refreshes in Crystal Reports 2008

I am curious to know if there is a way to tell whether a report has been printed or run. For example, the user enters an inspection number, hits Apply, then clicks Print and prints the report. Can I know if the report has been printed? Is there a way to use local variables to track that, some sort of loop?
I've never tested this, but here's a theory you can try.
In your Database Expert, go to your Current Connections and Add Command. Use this to write a SQL query that saves the usage data to a table in your data source. (If your data source is read-only, just add a delimited text file as an additional data source and output your usage data to that instead.)
The best example I have of this is http://www.scribd.com/doc/2190438/20-Secrets-of-Crystal-Reports. On page 39, you'll see a method for creating a table of contents that more or less uses this approach.