Trying to do following scheme in Talend ESB:
tREST_request ---- tXMLMap -- Split it to two DB_Inputs -- tXMLMap --- tREST_response
getting ID from RESTrequest,
then get some info from two different DBs by this ID
and finally combine the result REST responses from both DB inputs
Talend do not allow me to combine both DB inputs to single XMLmap,
as i understand may be only one Main flow.
Are there any other way to do it?
Found a workaround by serializing the process, use global variable and hashes.
Not sure if it is a best solution but at least it is working.
Attached image with flow.
Related
Here is a different scenario for GET or POST confusion. I am working on a web application built with spring-boot microservice architecture where there is a need of validate and update some bulk data from excel sheet.
There can be 500-1000 records in excel sheet with 6 different columns for bulk processing. Once UI submits the excel sheet to server from then the total process is asynchronous. There are microservice to microservice calls which I am getting confused to have GET or POST.
Here is a problem: I have 4 microservices (let's say orchestra-service,A-service,B-service and C-service).
OrchestraService creates a DTO list from excel sheet which will be used in further calls. Orchestra calls 'A'. 'A' validates the data with DB and marks success and failure records in DTO list object and returns the list back to orchestra. Again orchestra calls 'B', it does the similar job like 'A' and returns back to orchestra.
Now orchestra calls 'C' which will update success records into database, updates the file status on database and also creates a new resultant excel sheet with error messages per row which will be emailed to the user later(small report kind of thing).
In above microservice to microservice calls only C is updating database and creating resource on server. All above calls I used POST method because I need the request body to pass my input list to all services.
According to HTTP Standards am I doing right?
https://www.rfc-editor.org/rfc/rfc7231#section-4.3.3
Providing a block of data, such as the fields entered into an HTML form, to a data-handling process it should be a POST call.
Please advice me whether:
I should use POST for only 'C' and GET for others or
It should be POST for all as other process involves in data filtering process.
NOTE: service A,B, and C not all services uses all the columns of excel but some of them in combinations. One column having 18 characters long data so I think it can be a problem with GET header limit for bulk operation.
Http Protocol
There is no actual violation on passing information on GET and if that request doesn't mutate between identical requests, then it's fine.
Microservice wise
Now for clarification, are Service A and Service B actually needed ?
Aren't they the same Domain as Service C, and can reside inside of him ?
It's more then good practice to have a Microservice validate its own domain and return a collection of success and failure with the relevant messages.
I had the similar question few years back and here is the possible solution for the first part of your question.
As mentioned by #Oreal Eraki in his answer, I would also question whether you need services A and B. If its just validation and data transformation it can be done in the same domain where the data is actually stored.
I would like to ask a question about the dss.
I'm creating queries to expose services rest by retrieving data from a db postrgresql.
I would like to expose a service that, based on an incoming parameter (table name), retrieves all of that table. But I have a problem.
I do not know a priori the number and the name of the columns in this table. So I wonder how I can map the output (maybe in a generic object)? It's possible to do it? Or do I need to know the name of the columns absolutely?
Thanks a lot
The DSS requires you to define what the response will look like beforehand. It is not dynamic like that.
I receive an error from time and time due to incompatible data in my source data set compared to my target data set. I would like to control the action that the pipeline determines based on error types, maybe output or drop those particulate rows, yet completing everything else. Is that possible? Furthermore, is it possible to get a hold of the actual failing line(s) from Data Factory without accessing and searching in the actual source data set in some simple way?
Copy activity encountered a user error at Sink side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'Timestamp' contains an invalid value '11667'. Cannot convert '11667' to type 'DateTimeOffset'.,Source=Microsoft.DataTransfer.Common,''Type=System.FormatException,Message=String was not recognized as a valid DateTime.,Source=mscorlib,'.
Thanks
I think you've hit a fairly common problem and limitation within ADF. Although the datasets you define with your JSON allow ADF to understand the structure of the data, that is all, just the structure, the orchestration tool can't do anything to transform or manipulate the data as part of the activity processing.
To answer your question directly, it's certainly possible. But you need to break out the C# and use ADF's extensibility functionality to deal with your bad rows before passing it to the final destination.
I suggest you expand your data factory to include a custom activity where you can build some lower level cleaning processes to divert the bad rows as described.
This is an approach we often take as not all data is perfect (I wish) and ETL or ELT doesn't work. I prefer the acronym ECLT. Where the 'C' stands for clean. Or cleanse, prepare etc. This certainly applies to ADF because this service doesn't have its own compute or SSIS style data flow engine.
So...
In terms of how to do this. First I recommend you check out this blog post on creating ADF custom activities. Link:
https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/
Then within your C# class inherited from IDotNetActivity do something like the below.
public IDictionary<string, string> Execute(
IEnumerable<LinkedService> linkedServices,
IEnumerable<Dataset> datasets,
Activity activity,
IActivityLogger logger)
{
//etc
using (StreamReader vReader = new StreamReader(YourSource))
{
using (StreamWriter vWriter = new StreamWriter(YourDestination))
{
while (!vReader.EndOfStream)
{
//data transform logic, if bad row etc
}
}
}
}
You get the idea. Build your own SSIS data flow!
Then write out your clean row as an output dataset, which can be the input for your next ADF activity. Either with multiple pipelines, or as chained activities within a single pipeline.
This is the only way you will get ADF to deal with your bad data in the current service offerings.
Hope this helps
We are currently using visjs version 3 to map the dependencies of our custom built workflow engine. This has been WONDERFUL because it helps us to visualize the flow and find invalid or missing dependencies. What we want to do next is simplify the process of building the dependencies using the visjs manipulation feature. The idea would be that we would display a large group of nodes and allow the user to order them correctly. We then want to be able to submit that json structure back to the server for processing.
Would this be possible?
Yes, this is possible.
Vis.js dispatches various events that relate to user interactions with graph (e.g. manipulations, or position changes) for which you can add handlers that modify or store the data on change. If you use DataSets to store nodes and edges in your network, you can always use the DataSets' get() function to retrieve all elements in you handler in JSON format. Then in your handler, just use an ajax request to transmit the JSON to your server to store the entire graph in your DB or by saving the JSON as a file.
The oppposite for loading the graph: simply query the JSON from your server and inject it into the node and edge DataSets' using the set method.
You can also store the networks current options using the network's getOptions method, which returns all applied options as json.
I m working of a project of Enterprise application architecture using software talend
i have this table : User(Id_user, name_user, Email)
what i want to do is select Data from this table and sending email to each user using Tsendemail component
i could so far make a connection to Database using TMssinput and send a single email using Tsendemail
but i dont know how to select values of Row and use them as "email" for Tsendemail
Can someone help me pls ? and thank you
As tSendMail component is not a processing component (ie, it cannot handles more than one vector in input) but a starting component, the best way to do so is to use the good-ol' tFlowToIterate as we did here. Your job will almost look like:
tMssInput---row---->tFlowtoIterate--->Iterate---->tSendEmail
Inside the tFlowToIterate instance you're going to put everything you need from row into the globalMap. Every data-processing operation should be done before that, on the row context (for example, filtering out users you won't the mail to be sent, etc.).