How to collect all information about the current job in Talend Open Studio - MongoDB

When I run a job, I want to log information such as:
job name
source and destination details (file name/table name)
number of input records and number of records processed or saved
I want to log all of the above information and insert it into MongoDB using Talend Open Studio components. Please also explain which components I need to perform that task. I need some serious responses, thanks.

You can use the tJava component as below: get the row counts for the source and destination along with the source and target names, then redirect those details to a file in tJava.
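As a rough sketch of what that tJava code could look like - the component names (tFileInputDelimited_1, tMysqlOutput_1) and the source/target values are assumptions, so replace them with the actual names from your job. Each Talend component publishes its row count into globalMap once its subjob finishes, so run this tJava on an OnSubjobOk trigger:

// tJava code - runs inside the generated job, where jobName and globalMap are in scope.
// The component names below are placeholders for your real components.
Integer rowsRead = (Integer) globalMap.get("tFileInputDelimited_1_NB_LINE");
Integer rowsSaved = (Integer) globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED");

String logLine = "job=" + jobName
    + ";source=orders.csv"        // or take these from context variables
    + ";target=orders_table"
    + ";rowsRead=" + rowsRead
    + ";rowsSaved=" + rowsSaved;

System.out.println(logLine);      // or write the line to your log file

To land the same values in MongoDB rather than a file, one option is to feed them through a tFixedFlowInput (one column per attribute) into a tMongoDBOutput component, assuming your TOS edition includes the MongoDB components.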
For more about the logging functionality, go through the tutorial below:
https://www.youtube.com/watch?v=SSi8BC58v3k&list=PL2eC8CR2B2qfgDaQtUs4Wad5u-70ala35&index=2

I'd consider using log4j, which captures most of this information. Using MDC you could enrich the log messages with custom attributes. Log4j has a JSON layout, and there seems to be a MongoDB appender as well.
It might take a bit more time to configure (I'd suggest adding the dependencies via a routine), but once configured it requires no per-job configuration at all. Using log4j you can also create filters, etc.
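As a minimal sketch of the MDC idea with log4j 2 - the class, attribute names and values here are illustrative, not part of any Talend API:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.ThreadContext;

public class JobStatsLogger {
    private static final Logger LOG = LogManager.getLogger(JobStatsLogger.class);

    public static void logStats(String jobName, String source, String target,
                                int rowsIn, int rowsSaved) {
        // ThreadContext is log4j 2's MDC: attributes put here are attached to
        // subsequent log events, so a JSON layout or a MongoDB appender can
        // persist them as separate document fields.
        ThreadContext.put("jobName", jobName);
        ThreadContext.put("source", source);
        ThreadContext.put("target", target);
        ThreadContext.put("rowsIn", String.valueOf(rowsIn));
        ThreadContext.put("rowsSaved", String.valueOf(rowsSaved));
        LOG.info("job statistics");
        ThreadContext.clearAll();
    }
}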

Related

File Endpoint for Citrus Framework

I'm currently looking at using Citrus for our integration testing; however, our integration software uses, amongst others, file messages - files are written to an inbound folder, picked up and processed, which results in a new file message being written to an outbound folder or data being written to SQL.
I was wondering if Citrus can write a file with a certain payload to an inbound folder and then monitor for a file to appear in a certain outbound folder and/or for entries to appear in a SQL table.
Example Test Case:
file()
    .folder(todoInboundFolder)
    .write()
    .payload(new ClassPathResource("templates/todo.xml"));

file()
    .folder(todoOutboundFolder)
    .read()
    .validate("/t:todo/t:correlationId", "${todocorrelationId}")
    .validate("/t:todo/t:title", "${todoName}");

query(todoDataSource)
    .statement("select count(*) as cnt from todo_entries where correlationid = '${todocorrelationId}'")
    .validate("cnt", "1");
Additionally, is there a way to specify the timeout to wait for the file/SQL entries to appear?
There is no direct implementation of a file endpoint in Citrus yet. There was a feature request, but it was closed due to inactivity: https://github.com/citrusframework/citrus/issues/151
You can solve this problem, though, by using a simple Apache Camel route to do the file transfer. Citrus is able to call the Camel route and use its outcome very easily. Read more about this here: https://citrusframework.org/citrus/reference/2.8.0/html/index.html#apache-camel
This is the workaround that can help right now. Other than that, you can reopen or contribute to the issue.
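As a rough sketch of what the Camel side could look like - the folder paths and endpoint names are made up, and the Citrus wiring itself is covered by the reference linked above:

import org.apache.camel.builder.RouteBuilder;

public class TodoFileRoutes extends RouteBuilder {
    @Override
    public void configure() {
        // A Citrus send action targets seda:todoInbound; Camel then writes
        // the payload as a file into the inbound folder of the system under test.
        from("seda:todoInbound")
            .to("file:/data/todo/inbound?fileName=todo.xml");

        // Camel polls the outbound folder and buffers each result file on
        // seda:todoOutbound, where a Citrus receive action can pick it up
        // and run its XPath validations (receive timeouts are configurable
        // on the Citrus side).
        from("file:/data/todo/outbound")
            .to("seda:todoOutbound");
    }
}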

Talend - Stats and Logs - On database - error

I have a job that inserts data from SQL Server into MySQL. I have set the project settings as follows:
Checked the checkboxes for Use statistics (tStatCatcher), Use logs (tLogCatcher), and Use volumetrics (tFlowMeterCatcher).
Selected 'On Databases' and put in the table names (stats_table, logs_table, flowmeter_table) as well. These tables were created beforehand; their schemas were determined using the tCreateTable component.
The problem is that when I run the job, data is inserted into stats_table but not into flowmeter_table.
My job is as follows:
tMSSqlInput --> tMap --> tMysqlOutput
I have not included tStatCatcher, tLogCatcher, or tFlowMeterCatcher; the stats and logs for this job are taken from the project settings.
My question: why is no data entered in flowmeter_table? Should I include tStatCatcher, tLogCatcher, and tFlowMeterCatcher explicitly in the job for it to run fine?
I am using TOS.
Thanks in advance,
Rathi
Using the flow meter requires you to manually configure the flows you want to monitor.
On every flow you want to monitor, right-click the row > Parameters > Advanced settings > Monitor this connection.
Then you should be able to see data in your flowmeter table.
If you are using the project settings, you don't need to add the *Catcher components to your job.
You need to use the tStatCatcher, tLogCatcher and tFlowMeterCatcher components in the job directly.
The components already have their schemas defined, so you just need to add a tMap and redirect the rows into the table you want.
Moreover, in order to use tLogCatcher you need to put some tDie or tWarn components in your job.

Conditional routing in Apache NiFi

I'm using NiFi to get data from an Oracle database and put some of this data into Kafka (using the PutKafka processor). I want to route records conditionally.
Example: only route data where the attribute "id" contains "aaabb".
Is that possible in Apache NiFi? How can I do it?
This should definitely be possible; the flow might be something like this:
1) ExecuteSQL or QueryDatabaseTable to get the data from the database; these produce Avro
2) ConvertAvroToJSON processor to convert the Avro to JSON
3) EvaluateJsonPath to extract the id field into an attribute
4) RouteOnAttribute to route flow files where the id attribute contains "aaabb" (see the sketch below)
5) PutKafka to deliver any of the matching results from RouteOnAttribute
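As a sketch of step 4: in RouteOnAttribute you add a dynamic property whose value is a NiFi Expression Language condition; the property name (here "matched", an arbitrary choice) becomes an outgoing relationship:

matched = ${id:contains('aaabb')}

Flow files for which the expression evaluates to true are routed to the "matched" relationship, which you then connect to PutKafka.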
To add on to Bryan's example flow, I wanted to point you to some great documentation that should help introduce you to Apache NiFi.
Firstly, I would suggest checking out the NiFi documentation. It is very good and should help a lot. In addition to providing details on each of the processors Bryan mentioned, it also has general documentation for every type of user.
For a basic introduction to building a NiFi flow, check out this video.
For example templates, check out this repo. It has an Excel file at its root level with a description and a list of processors for each template.

Add metric name in OTSDB via API

I am adding data into OpenTSDB from different sources, but I give the metric name for each data point using an XML file. Also, I don't have any terminal access to OpenTSDB to create metric names.
I have referred to the links below:
API PUT
GitHub Issue
In the GitHub issue, I couldn't understand how to use --auto-metric.
I know how to create a metric using the terminal; here I am creating the abxcs metric:
./tsdb mkmetric abxcs
But how do I create a metric using the API?
FYI: please suggest a solution using Java.
Thanks for the help in advance.
In order to have metric names auto-created on the fly, you'll need to set
tsd.core.auto_create_metrics = true
in the OpenTSDB configuration file. Ref: http://opentsdb.net/docs/build/html/user_guide/configuration.html
The documentation describes this option as: "Whether or not a data point with a new metric will assign a UID to the metric. When false, a data point with a metric that is not in the database will be rejected and an exception will be thrown."
The CLI equivalent is to pass the --auto-metric switch when starting the tsd process.
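For the API route: OpenTSDB exposes an HTTP endpoint, /api/put, and with auto_create_metrics enabled the metric UID is assigned the first time you write a data point for it. A minimal Java sketch using only the standard library - the host, port, metric name and tag are assumptions for illustration:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OpenTsdbPut {
    public static void main(String[] args) throws Exception {
        // One data point for the metric "abxcs"; the metric is auto-created
        // on first write when tsd.core.auto_create_metrics = true.
        String json = "{\"metric\":\"abxcs\",\"timestamp\":"
                + (System.currentTimeMillis() / 1000)
                + ",\"value\":42,\"tags\":{\"source\":\"xmlfeed\"}}";

        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:4242/api/put").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode()); // 204 on success
    }
}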

JasperServer using REST to run a report with data source specified at run time

I have no problem executing a report on JasperServer using the RESTful API when the report unit has a predefined data source.
What I need to do, though, is allow my customers to select which database they want to run the report against when they are getting ready to execute it. I assumed that when I make the PUT request to run the report I could simply include the data source resource descriptor in the ReportUnit resource descriptor passed in the PUT, but it doesn't seem to work.
I even went as far as pulling the resource descriptor for the ReportUnit while it had the data source predefined, and tested that passing that resource descriptor in the PUT worked. Then I removed the predefined data source and tried executing the report again using the exact resource descriptor I had pulled previously, and it did not work.
Is this possible?
I may be wrong, without having read much into it, but I think you can create the data source and domain via the resource services.
To update the report file using the resource service, you may have to change the domainQuery node.
I pulled out the JRXML for my JSON-based report file and it looked something like this:
<resourceDescriptor name="domainQuery.xml" wsType="xml" uriString="/adhoc/topics/myjsonposts_files/domainQuery.xml" isNew="false">
Hope this will help you find your solution.