IBM DataStage integration with Java

We have DataStage jobs and want to use a Java class that reads a file and returns some data. Can someone explain the steps needed to do this?

There are Java Transformer and Java Client stages in the Real Time section of the palette.
You will need to study the API that DataStage uses to work with Java.
Simply write a Java class that reads the file; you can then call that class from DataStage. A minimal sketch of such a class follows.
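For instance, the class itself can be plain Java; a minimal sketch (class name and path handling are illustrative):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    // Plain Java class that reads a file and returns its lines; this is the
    // kind of class you would package into a jar and invoke from a Java stage.
    public class FileDataReader {

        public List<String> read(String path) throws IOException {
            return Files.readAllLines(Paths.get(path));
        }

        public static void main(String[] args) throws IOException {
            new FileDataReader().read(args[0]).forEach(System.out::println);
        }
    }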

The Java Integration Stage is a DataStage Connector through which you can call a custom Java application from InfoSphere DataStage and QualityStage parallel jobs. The Java Integration Stage is available in IBM InfoSphere Information Server version 9.1 and higher. It can be used in the following topologies: as a source, as a target, as a transformer, and as a lookup stage. For more information on the Java Integration Stage, see Related topics.
The DataStage Java Pack is a collection of two plug-in stages, Java Transformer and Java Client, through which you can call Java applications from DataStage. The Java Pack is available from DataStage version 7.5.x and higher.
The Java Transformer stage is an active stage that can be used to call a Java application that reads incoming data, transforms it, and writes it to an output link defined in a DataStage job. The Java Client stage is a passive stage that can be used as a source, a target, or a lookup: as a source it produces data, as a target it consumes data, and as a lookup it performs lookup functions.
For more information on the Java Pack Stages, see Related topics.
https://www.ibm.com/developerworks/data/library/techarticle/dm-1305handling/index.html
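To make the Java Integration Stage route concrete, here is a rough sketch of a source-style stage class. The Processor, Configuration, OutputLink, and OutputRecord types come from the Java Integration Stage API (com.ibm.is.cc.javastage.api) covered in the article above; the exact method signatures are assumptions to verify against your Information Server release, and other optional callbacks are omitted.

    import com.ibm.is.cc.javastage.api.Configuration;
    import com.ibm.is.cc.javastage.api.OutputLink;
    import com.ibm.is.cc.javastage.api.OutputRecord;
    import com.ibm.is.cc.javastage.api.Processor;

    import java.io.BufferedReader;
    import java.io.FileReader;

    // Sketch of a source-style Java Integration Stage class: reads a text file
    // and emits one row per line on the stage's first output link.
    public class FileReaderProcessor extends Processor {

        private OutputLink outputLink;

        public boolean validateConfiguration(Configuration configuration, boolean isRuntime)
                throws Exception {
            // Grab the first output link defined on the stage in the job design.
            outputLink = configuration.getOutputLink(0);
            return true;
        }

        public void process() throws Exception {
            // Path is hard-coded for illustration; in a real job you would
            // pass it in as a stage property.
            try (BufferedReader reader = new BufferedReader(new FileReader("/tmp/input.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    OutputRecord record = outputLink.getOutputRecord();
                    record.setValue(0, line); // column 0 of the output link
                    outputLink.writeRecord(record);
                }
            }
        }
    }

Roughly, you compile this against the stage's API jar, package it, and point the Java Integration Stage at the class name in the job design.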

Related

Can Talend generate Scala code for Apache Spark?

I am new to the Talend ETL tool. I want to use Talend to generate Spark batch jobs.
Can Talend generate Scala code instead of Java, or is there a way to plug a Scala-based batch job into Talend?
No, it generates Java only.
That should not matter if you are using Talend as a graphical ETL tool at a higher abstraction level, though.

How to write task information in the Spring Data Flow UI manually

I have integrated Spring Batch into a RESTful controller of a Spring Boot application, which means we now operate the Spring Batch program by sending a REST call. In this case, we cannot make a jar and register it on the Spring Cloud Data Flow server. So my question is: how do we register a task if we don't have a jar?
You've asked a few similar questions today.
My recommendation is to start with the reference guides for Spring Cloud Task and Spring Cloud Data Flow. Specifically, pay attention to the Spring Batch sections.
Once you understand what to do, you can build the batch job as a Spring Cloud Task application and run it standalone successfully.
If it runs locally as expected, you can switch to SCDF and register the JAR using the REST API, the shell, or the GUI. You'd need a physical uber-jar of the application for this. With that registered, you can then build a task definition with it and launch it from SCDF.
If you want to do all of the above programmatically, please have a look at the acceptance-test suite for examples.
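If you do end up with an uber-jar, registration does not have to go through the GUI: the shell commands app register, task create, and task launch do the same thing, and so does the REST API. A rough Java sketch against the REST endpoints (the server URL, app name, and Maven coordinates are placeholders; verify the endpoint paths against your SCDF version's REST API docs):

    import org.springframework.http.HttpEntity;
    import org.springframework.http.HttpHeaders;
    import org.springframework.http.MediaType;
    import org.springframework.util.LinkedMultiValueMap;
    import org.springframework.util.MultiValueMap;
    import org.springframework.web.client.RestTemplate;

    // Registers a task app's uber-jar with a running SCDF server, creates a
    // task definition, and launches it, all over the REST API.
    public class RegisterAndLaunchTask {

        public static void main(String[] args) {
            String server = "http://localhost:9393";
            RestTemplate rest = new RestTemplate();

            // 1. Register the uber-jar (maven:// or file:// URI) as a task app.
            post(rest, server + "/apps/task/my-batch-task",
                    form("uri", "maven://com.example:my-batch-task:1.0.0"));

            // 2. Create a task definition that uses the registered app.
            post(rest, server + "/tasks/definitions",
                    form("name", "my-task", "definition", "my-batch-task"));

            // 3. Launch the task.
            post(rest, server + "/tasks/executions", form("name", "my-task"));
        }

        private static void post(RestTemplate rest, String url, MultiValueMap<String, String> body) {
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_FORM_URLENCODED);
            rest.postForEntity(url, new HttpEntity<>(body, headers), String.class);
        }

        private static MultiValueMap<String, String> form(String... kv) {
            MultiValueMap<String, String> map = new LinkedMultiValueMap<>();
            for (int i = 0; i < kv.length; i += 2) {
                map.add(kv[i], kv[i + 1]);
            }
            return map;
        }
    }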

Spring Batch flow job vs. Spring composed task

I want to execute my apps using Spring composed tasks, and I have already built complex Spring Batch flow jobs which execute perfectly fine.
Could you please explain the difference between a Spring Batch flow job and a Spring composed task, and which is best among them?
A composed task within Spring Cloud Data Flow is actually built on Spring Batch in that the transition from task to task is managed by a dynamically generated Spring Batch job. This model allows the decomposition of a batch job into reusable parts that can be independently tested, deployed, and orchestrated at a level higher than a job. This allows for things like writing a single step job that is reusable across multiple workflows.
They are really complementary. You can use a composed task within Spring Cloud Data Flow to orchestrate both Spring Cloud Tasks as well as Spring Batch jobs (run as tasks). It really depends on how you want to slice up your process. If you have processes that are tightly coupled, package them as a single job. From there, you can orchestrate them with Spring Cloud Data Flow's composed-task functionality.
In general, there's not one that's "better". It's going to be dependent on your use case and requirements.
Spring Batch is a nice framework to run batch processing applications.
Spring Cloud Task is a wrapper that allows you to run short-lived microservices using Spring Cloud along with Spring Boot. Once you set up an application with @EnableTask, it will launch your *Runner (a CommandLineRunner or ApplicationRunner). The framework also comes with Spring Batch integration points, and ComposedTaskRunner helps facilitate that integration.
I'd start with the Spring Cloud Task batch documentation and then come back to ask more specific questions.
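As a starting point, a single-step Spring Batch job packaged as a Spring Cloud Task is the kind of unit a composed task orchestrates; once such apps are registered with SCDF, a composed-task definition such as task-a && task-b chains them. A minimal sketch (class, step, and job names are illustrative):

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;
    import org.springframework.context.annotation.Bean;

    // A single-step Spring Batch job packaged as a Spring Cloud Task: the
    // reusable unit a composed task can orchestrate.
    @EnableTask
    @EnableBatchProcessing
    @SpringBootApplication
    public class StepOneTaskApplication {

        @Bean
        public Job stepOneJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
            Step step = steps.get("stepOne")
                    .tasklet((contribution, chunkContext) -> {
                        System.out.println("step one ran");
                        return RepeatStatus.FINISHED;
                    })
                    .build();
            return jobs.get("stepOneJob").start(step).build();
        }

        public static void main(String[] args) {
            SpringApplication.run(StepOneTaskApplication.class, args);
        }
    }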

Automated way to see queries in all Oracle Connectors of all jobs in DataStage

Is there a way to see all the queries that are in the Oracle Connector stages of my DataStage project? I am using DS 11.3.
Not natively, no. You could export your project and parse the export for all of the SQL statements (this could of course be done by a DataStage job), or you might be able to query it if you have IGC (Information Governance Catalog) in place. A parsing sketch follows.
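As a sketch of the parsing route: the export is just text, so even a simple scan for SQL-looking lines gets you a first inventory. The property names a connector stores its SQL under vary by stage and version, so treat the pattern as a heuristic to tune against your export format:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.regex.Pattern;

    // Heuristic scan of a DataStage project export (.dsx/.xml) for SQL
    // statements; prints any line that looks like SQL.
    public class ExportSqlScanner {

        private static final Pattern SQL = Pattern.compile(
                "(?i)\\b(SELECT\\s|INSERT\\s+INTO|UPDATE\\s|DELETE\\s+FROM|MERGE\\s+INTO)");

        public static void main(String[] args) throws IOException {
            Path export = Paths.get(args.length > 0 ? args[0] : "project_export.dsx");
            // ISO-8859-1 accepts any byte sequence; switch to UTF-8 if your
            // export is encoded that way.
            for (String line : Files.readAllLines(export, StandardCharsets.ISO_8859_1)) {
                if (SQL.matcher(line).find()) {
                    System.out.println(line.trim());
                }
            }
        }
    }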

Can Spring XD be used as a platform for comprehensive Spring batch workflows?

Spring XD provides a platform for launching batch jobs. Does that cover comprehensive workflows for all batch job use cases, or is it meant to be used only within the context of Spring XD use cases?
For example, someone who wants to use just spring-batch, and not necessarily all of the data-ingestion/real-time-analytics features: will they still benefit from setting up the Spring XD DIRT just for executing batch workflows? In that case, are there any limitations in not being able to use all batch workflows supported by spring-batch?
In short, yes, it can be used as a comprehensive batch platform. Spring XD currently provides a number of compelling features, with more coming in the future. Features Spring XD provides for batch solutions:
Job orchestration - Spring Batch explicitly avoids the problem of job orchestration so that the developer can use whatever tool they want. Spring XD brings orchestration in a distributed environment via scheduling of jobs, executing ad hoc jobs, and executing jobs on the result of some form of logic (polling a directory for a file for example).
Abstraction of Spring Batch and Spring Integration - Spring Batch and Spring Integration are commonly used together in solutions that address more complex scenarios. For example, if you need to FTP a file to a server, then kick off a batch job once it's there, you'd use Spring Integration for the FTP piece and to kick off the job, with Spring Batch handling the processing of the job. Spring XD provides an elegant abstraction of those components to allow easy assembly into more robust solutions (a sketch of the manual wiring this replaces appears after this list).
Simplification of remote partitioning - Spring XD provides facilities to simplify the wiring of the communication aspects of remote partitioning within Spring Batch.
Interaction with jobs via UI, shell, or REST - Spring XD exposes a number of metrics and functions that are consumable via its web-based UI, the interactive shell, or REST-based endpoints.
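To illustrate the abstraction point above: here is roughly the manual Spring Integration plus Spring Batch wiring (poll a directory, launch a job per arriving file) that Spring XD packages up for you. It assumes a Spring Boot app with spring-integration-file and spring-batch on the classpath; the directory, channel name, and importJob bean are illustrative:

    import java.io.File;

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.integration.annotation.InboundChannelAdapter;
    import org.springframework.integration.annotation.Poller;
    import org.springframework.integration.annotation.ServiceActivator;
    import org.springframework.integration.core.MessageSource;
    import org.springframework.integration.file.FileReadingMessageSource;
    import org.springframework.messaging.MessageHandler;

    @Configuration
    public class FilePollingJobLaunch {

        // Poll /data/inbox every 5 seconds for new files.
        @Bean
        @InboundChannelAdapter(value = "files", poller = @Poller(fixedDelay = "5000"))
        public MessageSource<File> fileSource() {
            FileReadingMessageSource source = new FileReadingMessageSource();
            source.setDirectory(new File("/data/inbox"));
            return source;
        }

        // Launch the batch job with the file path as a job parameter.
        @Bean
        @ServiceActivator(inputChannel = "files")
        public MessageHandler launchJob(JobLauncher jobLauncher, Job importJob) {
            return message -> {
                File file = (File) message.getPayload();
                try {
                    jobLauncher.run(importJob, new JobParametersBuilder()
                            .addString("input.file", file.getAbsolutePath())
                            .toJobParameters());
                } catch (Exception e) {
                    throw new IllegalStateException("Job launch failed for " + file, e);
                }
            };
        }
    }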
The main limitation as of Spring XD 1.0 for batch processing is the inability to execute nested jobs (using a JobStep). I believe this will be part of Spring XD 1.1 (https://jira.spring.io/browse/XD-1972).
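For reference, a nested job is a parent job that runs an existing child job as one of its steps via a JobStep, roughly like this (the childJob bean is assumed to be defined elsewhere in the context):

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    // A parent job that wraps a whole child job in a single step (JobStep);
    // this is the construct Spring XD 1.0 could not execute.
    @Configuration
    public class NestedJobConfig {

        @Bean
        public Job parentJob(JobBuilderFactory jobs, StepBuilderFactory steps, Job childJob) {
            Step childJobStep = steps.get("childJobStep")
                    .job(childJob) // runs the entire child job as one step
                    .build();
            return jobs.get("parentJob").start(childJobStep).build();
        }
    }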
Looking forward, other features I would expect in future versions of Spring XD are around high availability for jobs. Currently, if a job is deployed on a node and the node goes down, the job will be redeployed automatically; in future releases, the ability to automatically restart the job upon redeployment would be possible.