Using Talend Open Studio DI to extract a value from a unique first row before continuing to process columns - talend

I have a number of Excel files where there is a line of text (and a blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And I want to end up with the combined data from multiple similar files:
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013

I put together a small, crude sample of how it can be done. I call it crude because (a) it is not dynamic: you can add more files to process, but you need to know how many files there are before building your job, and (b) it shows the basic concept but would require more work to suit your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that data out to obtain the machine name and the date.
Here is how my sample works. Each Excel file is set up as two inputs. For the header, the tFileInput_Excel is configured to read only the first line, while the body tFileInput_Excel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for both the Machine A and Machine B Excel files, then those tMaps are combined with a tUnite for the final output.
As you can see in the log row the data is combined and includes the header info.
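The parsing step mentioned above is not part of the sample. Here is a minimal sketch of it in plain Java, assuming the header line always follows the wording of the example in the question ("... created on machine A on 01/02/2013"). The class name HeaderLineParser and the regular expression are illustrative assumptions, not something taken from the original job.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeaderLineParser {
    // Assumed header wording, based only on the example line in the question.
    private static final Pattern HEADER =
        Pattern.compile("created on (machine \\S+) on (\\d{2}/\\d{2}/\\d{4})");

    /** Returns {machine, date}, or null when the line does not match the assumed wording. */
    static String[] parse(String headerLine) {
        if (headerLine == null) {
            return null;
        }
        Matcher m = HEADER.matcher(headerLine);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = parse("This file was created on machine A on 01/02/2013");
        // Append the two extracted values to a body row, as in the desired output.
        System.out.println("0102|4550|6 per minute|" + parts[0] + "|" + parts[1]);
    }
}
```

In a job, logic along these lines could probably sit in a tJavaRow or a routine called from the tMap, so that the machine and date end up as two extra output columns.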

Related

Append datasets from xlsx and database - Talend

I have three Excel files and one database connection which I need to append as a part of my flow. All four datasets in the pre-append stage have just one column.
When I try to use tUnite, I get the error for tFileInputExcel - see the screenshot. Moreover, I cannot join the database connection with tUnite.
What am I doing wrong?
I think the problem is with the tFileExist components (I think that's what these are on the left with the "if" links coming out), because each of them is trying to start a new flow. Once you join them with the tUnite, there can be only one start to the flow, and it goes to the start of the first branch in the merge order.
You can move the if logic elsewhere. Another idea is to put the output from each of the Excel inputs into a tHashOutput (linked together), then use a tHashInput to write to your DB.

Exporting the results of an Anylogic experiment

I have built my model and run the experiment. I cannot seem to find where the data is stored.
I now need to conduct several runs and compare the results. I am using normally distributed repair times, so the results should vary between runs without modifying parameters.
How can I keep the results of each run and then present them all in the same data set?
There are two main options for getting data out of your simulation
Using the internal AnyLogic database
Using external files like Excel or txt
Step 1: Set up your objects
Internal Database
Create an empty table with the columns you require
External object
Set up either an Excel or text file using the objects provided by AnyLogic in the Connectivity palette
Step 2: Saving your data
In both cases you need to write your data to the object of your choosing, either as the data gets generated or at the end of the simulation run.
Using the internal DB
The best option is to write data using the following command:
insertInto(table_name)
.columns(column_name)
.values(value);
This inserts a new row into a database table that you created. You can save multiple values to multiple columns by passing comma-separated entries to columns() and values().
e.g.
insertInto(temperature_output_table)
.columns(scenario_name, time, temperature)
.values("scenario1", 10.5, 102);
External files
2.1) Using Excel
filename.setCellValue(value, sheetName, row, column);
or even better you can write out an entire dataset
excelFile.writeDataSet(dataset, sheetName, row, column);
2.2) Using a text file
fileName.println("value" + "\t" + " value 2");
You can use whatever separator you want: "\t" for tab-separated, "," for comma-separated, and so on.
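To show what "one line per run with a separator of your choice" looks like, here is a minimal sketch in plain Java. It deliberately uses only java.io from the standard library, not the AnyLogic text file object, so it runs outside a model. The class name RunResultWriter, the column names, and the numbers are made up for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class RunResultWriter {
    /** Writes one row per run: the run index, then each value, joined by the separator. */
    static void writeRun(PrintWriter out, String separator, int runIndex, double... values) {
        StringBuilder line = new StringBuilder(Integer.toString(runIndex));
        for (double v : values) {
            line.append(separator).append(v);
        }
        out.print(line);
        out.print('\n');
    }

    public static void main(String[] args) {
        // A StringWriter stands in for the output file so the sketch has no side effects.
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.print("run\trepair_time\tthroughput\n"); // hypothetical column names
            writeRun(out, "\t", 1, 4.2, 118.0);           // illustrative values only
            writeRun(out, "\t", 2, 5.1, 109.0);
        }
        System.out.print(buffer);
    }
}
```

Because every run appends a row tagged with its run index to the same output, all runs end up in one data set that can be compared afterwards.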
Step 3: Finish and export data
Internal Database
At the end of a simulation run, you can simply export the data
See help here https://anylogic.help/anylogic/connectivity/export-excel.html#exporting-data-to-ms-excel-workbook
P.S. It is possible to automate this with some effort
External file
On Excel you need to call .writeFile() to finish.
On both objects, you need to call .close() for them to be closed and saved to memory.
FYI
Excel has the option to save on termination.
Read more on using Excel here - https://anylogic.help/anylogic/connectivity/excel-file.html#writing-to-excel-file
And on text file here
https://anylogic.help/anylogic/connectivity/text-file.html#replicated
There is also an example model

How to get the row count of an Excel file in Unix or Linux in DataStage

Is there any way I can get the row count of an Excel file in Unix or Linux?
I tried creating a server routine and I am able to get the output link count, but I am unable to get the return value into a file or variable.
Please suggest.
There is a lot of information (besides the data itself) that can be extracted from an Excel file - check out the documentation.
I could imagine two options for your goal:
check a certain cell which would be filled/exist if the file is not empty
check the filesize of an empty file and if the filesize is bigger than that it is not empty
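The second option can be sketched in a few lines. This is a rough illustration in plain Java rather than a DataStage routine; the class name WorkbookSizeCheck is hypothetical, and the baseline size of an empty workbook is something you would have to measure yourself from a known-empty file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WorkbookSizeCheck {
    /** True when the file is larger than the size measured for a known-empty workbook. */
    static boolean looksNonEmpty(Path workbook, long emptyWorkbookBytes) throws IOException {
        return Files.size(workbook) > emptyWorkbookBytes;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: WorkbookSizeCheck <file> <empty-workbook-size-in-bytes>");
            return;
        }
        Path workbook = Paths.get(args[0]);
        long baseline = Long.parseLong(args[1]); // measured beforehand, not a universal constant
        System.out.println(looksNonEmpty(workbook, baseline) ? "not empty" : "empty");
    }
}
```

This only tells you whether the file holds any data, not how many rows it has.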
The best approach would be:
Use the 'Unstructured Data' stage from the File section of the palette to read the xlsx (Excel) file.
Use an Aggregator stage to count the rows, based on any column that will always be populated when data arrives in the file.
Write this count into a file.
Use the count as your logic requires.
After the Aggregator stage you can also use a Transformer stage to handle the count logic, whichever you find suitable.

Extract data from two connected Excel files, as if one is a function and the other its parameter values

My question has 2 parts.
Part 1:
I have 3 Excel files, each with multiple sheets. I have loaded all the files and their sheets into MATLAB using a small piece of code. The files are loaded in a structured form, i.e. one main structure with the 3 files as separate fields, each holding its sheets as values.
Part 2:
Now I have another, separate Excel file with a single sheet and a single column. The column of this Excel file has entries (values/names) which are linked to the files of part 1. In other words, each Excel file of part 1 holds the data for one entry in the column of the Excel file in part 2.
My problem is that I want to link the two. This is important because in later processing I want my program to work so that when I specify an entry from the Excel file of part 2, it takes the corresponding data from the Excel file of part 1. Is there any way of doing this?
This block diagram may help to show what I mean. Block Diagram

tFileList picks up only one of the 6 files

I am trying to display results from several files in a directory. I use a tFileList and two tFileInputDelimited components, both linked to the tFileList. I don't know why, but at the end of processing my results come from just one of the 6 files I want. They appear to be the results from the last file in the directory.
Each tFileInputDelimited has ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) as its file name.
Here is my TMap:
Your job is set up so that your lookup is iterative, which causes problems: Talend only seems to use the last iteration for the lookup, rather than doing what you might expect and iterating through every step for everything it needs (although this might be more complicated than you first think).
One option is to rework the job so that the iterate part of the job is the main input to the tMap rather than the lookup.
Alternatively, you could iterate the data into a tBufferOutput component and then, OnSubjobOk, link the job as before but replace the iterative part with a tBufferInput component, which will hold all of the data from all of the files iterated through.
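The difference between the two behaviours can be pictured with a small plain-Java analogy. It is only a sketch of the idea, not how Talend implements tFileList or the buffer components; the class name IterationBufferSketch and the row values are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IterationBufferSketch {
    /** Mimics the symptom: each iteration replaces what the downstream step can see. */
    static List<String> lastIterationOnly(List<List<String>> rowsPerFile) {
        List<String> visible = new ArrayList<>();
        for (List<String> rows : rowsPerFile) {
            visible = new ArrayList<>(rows); // earlier files are discarded
        }
        return visible;
    }

    /** Mimics the buffer approach: every iteration appends, and processing happens once afterwards. */
    static List<String> bufferAll(List<List<String>> rowsPerFile) {
        List<String> buffer = new ArrayList<>();
        for (List<String> rows : rowsPerFile) {
            buffer.addAll(rows);
        }
        return buffer;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("a1", "a2"), Arrays.asList("b1"), Arrays.asList("c1", "c2"));
        System.out.println(lastIterationOnly(files)); // [c1, c2]
        System.out.println(bufferAll(files));         // [a1, a2, b1, c1, c2]
    }
}
```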