Load multiple multischema delimited file from same directories - talend

Could I know does it have any method to load multiple files that are multi schema delimited files which store in same directories in Talend?
I have tried use the tFileInputMSDelimited component before, but unable to link with tFilelist component to loop through the files inside the directory.
Does anyone have idea how to solve this problem?
To make clearer, each file only contain one batch line but contain multiple header line and it comes with a bunch of transaction line. As showing at the sample data below.

The component tFileOutputMSDelimited should suit your needs.
You will need multiple flows going into it.
You can either keep the files and read them or use tHashInput/tHashOutput to get the data directly.
Then you direct all the flows to the tFileOutputMSDelimited (example with tFixedFlowInput, adapt with your flows) :
In it, you can configure which flow is the parent flow containing your ID.
Then you can add the children flows and define the parent and the ID to recognize the rows in the parent flow :

Related

In Power Query, when duplicating the source query should I duplicate the Transform File folder as well?

My apologies in advance if this question has already been asked, if so I cannot find it.
So, I have this huge data base divided by country where I need to import from each country data base individually and then, in Power Query, append the queries as one.
When I imported the US files, the Power Query automatically generated a Transform File folder with 4 helper queries:
Then I just duplicated the query US - Sales and named it as UK - Sales pointing it to the UK sales folder:
The Transform File folder didn't duplicate, though.
Everything seems to be working just fine right now, however I'd like to know if this could be problem in the near future, because I still have several countries to go. Should I manually import new queries as new connections instead of just duplicating them or it just doesn't matter?
Many thanks!
The Transform Files Folder group contains the code that is called to transform a list of files. It is re-usable code. You can see the Sample File, which serves as the template for the transform actions.
As long as the file that is arrived at for the Sample File has the same structure as the files that you are feeding into the command, then you can use any query with any list of files.
One thing you need to make sure is that the Sample File is not removed from your data source. You may want to create a new dummy file just for that purpose, make sure it won't be deleted, and then point the Sample File query to pull just that file.
The Transform Helper Queries are special queries that you may edit the queries, but you cannot delete and recreate your own manually. They are automatically created by PQ when combining list of contents and are inherently linked to the parent query.
That said, you cannot replicate them, and must use the Combine function provided by PQ to create the helper queries.
You may however, avoid duplicating the queries, instead replicate your steps in the parent query, and use table union to join the list before combining the contents with the same helper queries.

How can I pass output from a filter activity directly to a copy activity in ADF?

I have 4000 files each averaging 30Kb in size landing in a folder on our on premise file system each day. I want to apply conditional logic (several and/or conditions) against details in their file names to only move files matching the conditions into another folder. I have tried linking a meta data activity which gets all files in the source folder with a filter activity which applies the conditional logic with a for each activity with an embedded copy activity. This works but it is taking hours to process the files. When running the pipeline in debug the output window appears to list each file copied as a line item. I’ve increased the batch count setting in the for each to 50 but it hasn’t improved things. Is there a way to link the filter activity directly to the copy activity without using for each activity? Ie pass the collection from the filter straight into copy’s source. Alternatively, some of our other pipelines just use the copy activity pointing to a source folder and we configure its filefilter setting with a simple regex using a combination of * and ?, which is extremely fast. However, in this particular scenario, my conditional logic is more complex and I need to compare attributes in each file’s name with values to decide if the file should be moved. The filefilter setting allows dynamic content so I could remove the filter activity completely, point the copy to the source folder and put the conditional logic in the filefilter’s dynamic content area but how would I get a reference to the file name to do the conditional checks?
Here is one solution:
Write array output as text to a .json in Blob Storage (or wherever). Here are the steps to make that work:
Copy Data Source:
Copy Data Sink:
Write the json (array output) to a text file that has the name of the files you want to copy.
Copy Activity Source (to get it from JSON to .txt):
Sink will be .txt file in your Blob.
Use that text file in your main copy activity and use the following setting:
This should copy over all the files that you identified in your Filter Activity.
I realize this is a work around, but really is the only solution for what you are asking. Otherwise there is no way to link a filter activity straight to a copy activity.

How to add an attribute in a json file via thmap?

I am a beginner on Talend, I have a problem processing a json file via talend. I have a json file with several levels and containing tables on different levels (or depths) of json. I just want to add an attribute in a json area located at a given depth via thmap. So in input I have the json file and in output the same json file with the new attribute. I have no idea how to configure the thmap although it is dedicated to simplify complex mappings.
difficult to answer without more information can you create a screen grab of your TMAP usually it's quite simple in the output field to on the left cell you add it there

Two different mappings to ONE XML output file

I'm working on a talend job where I have a excel file and a couple of database fields that gets mapped to an XML file.
The working job looks like this:
Problem: I want to, with the same input of the excel file and the database fields, make another mapping that outputs to the same working XML file mentioned ealier. So I will have ONE XML file with TWO different mappings. How can I achieve this?
Update
I have done this mapping:
which in the end gets exported like this:
but I'm unsure on how to use this mapping in the tAdvancedFileOutputXML
If I understood correctly, you want to have a single XML file containing two different XMLs (the second one appended to the first one). In the shown Job add a OnSubJobOk link to point to a duplicate of your document flow which has a different mapping. In the second flow rather than using tFileOutputXML component to write the XML file, you can use the tAdvancedFileOutputXML with Append Source XML File marked to add to the file generated from the first flow. Also make sure to configure the XML tree. Check the following link for further information https://help.talend.com/reader/~hSvVkqNtFWjDbBHy0iO_w/h3wZegFH1_1XfusiUGtsPg
Hope this helps.

Using Talend Open Studio DI to extract extract value from unique 1st row before continuing to process columns

I have a number of excel files where there is a line of text (and blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because a. it is not dynamic, you can add more files to process but you need to know how many files in advance of building your job, and b. it shows the basic concept, but would require more work to suite your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that data out to obtain the machine name and the date.
But here is how may sample works. Each Excel is setup as two inputs. For the header the tFileInput_Excel is configured to read only the first line while the body tFileInput_Excel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for the Machine A Excel and Machine B excels, then those tMaps are combined with a tUnite for the final output.
As you can see in the log row the data is combined and includes the header info.