How to get the row count of an Excel file in Unix or Linux in DataStage

Is there any way to get the row count of an Excel file in Unix or Linux?
I tried creating a server routine, and I am able to get the output link count, but I am unable to get the return value into a file or variable.
Please suggest an approach.

A lot of information (besides the data itself) can be extracted from an Excel file; check the documentation.
I can imagine two options for your goal:
Check a certain cell that would be filled (or exist) if the file is not empty.
Check the file size of an empty file; if the actual file size is bigger than that, the file is not empty.

The best approach would be:
1. Use the 'Unstructured Data' stage from the File section of the palette to read the xlsx (Excel) file.
2. Use an Aggregator stage to count the rows; base the count on any column that is always populated whenever data arrives in the file.
3. Write this count to a file.
4. Use it as your logic requires.
After point 2 you can also use a Transformer stage to handle the count logic, whichever you find suitable.
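If a quick check outside DataStage is acceptable, the row count can also be read straight from the .xlsx file on Linux. The following is a minimal sketch in Python (not part of the original answers), assuming the workbook is an .xlsx (a zip archive of XML parts) and that the sheet of interest is stored as xl/worksheets/sheet1.xml; count_xlsx_rows and sheet_path are hypothetical names. It counts the rows stored in the sheet, so a header row is included and completely empty rows may not be counted.

```python
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by worksheet XML inside an .xlsx archive.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def count_xlsx_rows(source, sheet_path="xl/worksheets/sheet1.xml"):
    """Count <row> elements in one worksheet of an .xlsx file.

    `source` is a path or a binary file-like object. `sheet_path` is an
    assumption: the first sheet is usually stored under this name, but a
    workbook may use a different one.
    """
    with zipfile.ZipFile(source) as archive:
        with archive.open(sheet_path) as sheet:
            # iterparse streams the XML instead of loading the whole sheet.
            return sum(
                1 for _, elem in ET.iterparse(sheet) if elem.tag == NS + "row"
            )
```

Printing the result of count_xlsx_rows("input.xlsx") from a small script and redirecting it to a file would give a value that a later step could read back; the file name is only a placeholder.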

Logging a counter value to a batch name in Siemens TIA Portal

I need to create a program for 1214 PLC in TIA Portal and a Comfort HMI that counts several products using a count up and stores that value to a specific batch name.
For every new batch, the operator would enter a new batch name, and the counter will count the products for that specific batch.
The count needs to be displayed on the HMI screen along with the history of batches and the associated final count number.
So basically, I need a way to attach a name (batch_id) to a final count and log that pair for later reference.
Can someone give me some advice as to how I would do that?
To clarify, I need help with storing and displaying the counter value and batch names, not with the counting itself.
I appreciate any help you can provide.
There are a few ways to do this (yes, you can use PLC data logs and no they don't have to create a separate file for each batch), but I am posting here what I would do, because it's convenient for data backups, I have taken this approach before, and know it works.
Write the count value (generated in the PLC), the batch value and the timestamp to a CSV file on a USB drive inserted into the Comfort HMI, using VBScripts on the HMI.
Split the files regularly - e.g. daily, weekly or monthly, to minimize the risk of any single file becoming corrupt and you losing the data. More detail follows.
Data Storage:
Count is calculated in the PLC. Batch ID and timestamp can be stored in the PLC (if you want it to be retentive after a power cut), or in the HMI.
You will have Comfort HMI tags representing each of these three values. Once a batch is complete, call a VB script that writes these three values to a CSV file. There are application examples and forum entries on SIOS about this.
Data display as a table:
Read the CSV file values according to your filter criteria (day, time range, batch ID, batch ID range, etc) using a VB script. Write to internal HMI tags.
Display these internal HMI tags as IO fields on a Comfort panel screen. This is your custom-built table and yes it's the only way to do it unless you want to create a custom control and install it on the panel.
Backing up:
Disable logging and check USB is not in use using a script, e.g. this: https://support.industry.siemens.com/cs/document/89855157
Remove the USB, copy the files, re-insert it and activate logging again.
(you implement the 'disable' and 'activate' logging features, e.g. using an internal BOOL tag that prevents a script from executing).
There is a lot of info on SIOS about these topics, as Application Examples, FAQs and forum entries.
support.industry.siemens.com
The PLC log method works, but data backup and especially display can become a pain.
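The CSV layout described above (timestamp, batch ID and count per row, with the files split by period) can be sketched as follows. On the Comfort panel this would be written as a VBScript; the Python below only illustrates the file layout and the filtered read-back. The function names, the batches_YYYY_MM.csv naming and the column headers are assumptions, not something SIOS prescribes.

```python
import csv
from datetime import datetime
from pathlib import Path


def log_batch(log_dir, batch_id, count, when=None):
    """Append one (timestamp, batch_id, count) row to a per-month CSV file."""
    when = when or datetime.now()
    # One file per month keeps any single file small (hypothetical naming).
    path = Path(log_dir) / f"batches_{when:%Y_%m}.csv"
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["timestamp", "batch_id", "count"])
        writer.writerow([when.isoformat(timespec="seconds"), batch_id, count])
    return path


def read_batches(log_dir, batch_id=None):
    """Return logged rows from all monthly files, optionally for one batch ID."""
    rows = []
    for path in sorted(Path(log_dir).glob("batches_*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if batch_id is None or row["batch_id"] == batch_id:
                    rows.append(row)
    return rows
```

The rows returned by read_batches correspond to what the display script would copy into the internal HMI tags behind the IO fields.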

Using Talend Open Studio DI to extract a value from a unique 1st row before continuing to process columns

I have a number of Excel files where there is a line of text (and a blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because (a) it is not dynamic: you can add more files to process, but you need to know how many files there are before building your job, and (b) it shows the basic concept but would require more work to suit your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that line to obtain the machine name and the date.
Here is how my sample works. Each Excel file is set up as two inputs. For the header, the tFileInput_Excel is configured to read only the first line, while the body tFileInput_Excel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for both the Machine A and Machine B Excel files, then those tMaps are combined with a tUnite for the final output.
As you can see in the log row the data is combined and includes the header info.
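The parsing step mentioned above (pulling the machine name and date out of the first line and attaching them to every data row) might look like the sketch below. It is plain Python rather than a Talend component, it works on lines already exported as pipe-delimited text, and the regular expression only matches the wording of the example line in the question; HEADER and append_header_columns are hypothetical names.

```python
import re

# Pattern for the example header line; the wording comes from the question
# and would need adjusting for files phrased differently.
HEADER = re.compile(r"created on (?P<machine>.+?) on (?P<date>\d{2}/\d{2}/\d{4})")


def append_header_columns(lines):
    """Read one file's lines once and append Machine and Date to each data row."""
    match = HEADER.search(lines[0])
    if match is None:
        raise ValueError("unrecognised header line: " + lines[0])
    # Drop blank lines after the first line, then skip the column header row.
    body = [line for line in lines[1:] if line.strip()]
    data_rows = body[1:]
    return [f"{row}|{match['machine']}|{match['date']}" for row in data_rows]
```

Because the first line and the data rows come from the same pass over the file, each file is read only once, which is what the question asked for.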

How to increment a number from a csv and write over it

I'm wondering how to increment a number "extracted" from a field in a csv, and then rewrite the file with the number incremented.
I need this counter in a tMap.
Is the design below a good way to do it?
EDIT: I'm trying a new method (see the design of my subjob below), but I get an error when I link the tJavaRow to the main tMap in the main job:
Exception in component tMap_1
java.lang.NullPointerException
at mod_file_02.file_02_0_1.FILE_02.tFileList_1Process(FILE_02.java:9157)
at mod_file_02.file_02_0_1.FILE_02.tRowGenerator_5Process(FILE_02.java:8226)
at mod_file_02.file_02_0_1.FILE_02.tFileInputDelimited_2Process(FILE_02.java:7340)
at mod_file_02.file_02_0_1.FILE_02.runJobInTOS(FILE_02.java:12170)
at mod_file_02.file_02_0_1.FILE_02.main(FILE_02.java:11954)
2014-08-07 12:43:35|bm9aSI|bm9aSI|bm9aSI|MOD_FILE_02|FILE_02|Default|6|Java
Exception|tMap_1|java.lang.NullPointerException:null|1
[statistics] disconnected
You should be able to do this mid flow in a tMap or a tJavaRow.
Simply read the number in as an integer (or other numeric data type) and then add your increment to it.
A really simple example might look like this:
Here we have a tFixedFlowInput that has some hard coded values for the job:
And we run it through a tMap where we add 1 to the age column:
And finally, we output it to the console in a table:
EDIT:
As Gabriele B has pointed out, this doesn't exactly work when reading and writing to the same flat file as Talend claims an exclusive read-write lock on the file when reading and keeps it open throughout the job.
Instead you would have to write the incremented data to some other place such as a temporary file, a database or even just to the buffer and then read that data in to a separate job which would then output the file you want and clean up anything temporary.
The problem with that is you can't do the output in the same process. I've just tested reading the file in one child job, passing the data back to a parent job using a tBufferOutput, passing that data to another child job as a context variable, and then trying to output to the file. Unfortunately the file lock remains, so you can't do this all in one self-contained job (even using a parent job and several child jobs).
If this sounds horrible to you (it is) and you absolutely need this to happen (I'd suggest a database table sounds like a better match for this functionality than a flat file) then you could raise a feature request on the Talend Jira for the tFileInputDelimited to not hold the file open or to not insist on an exclusive read-write lock on the file.
Once again, I strongly recommend that you move to using a database table for this because even without the file lock issue, this is definitely not the right use of a flat file and this use case perfectly fits a database, even something as lightweight as an embedded H2 database.
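For comparison, outside Talend the "write the incremented data somewhere temporary, then produce the final file" idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not a Talend job: increment_field is a hypothetical name, the file is assumed to have a header row, and the column is assumed to hold integers (such as the age column in the example above).

```python
import csv
import os
import tempfile


def increment_field(path, column, step=1):
    """Add `step` to an integer column in every row and rewrite the CSV."""
    directory = os.path.dirname(os.path.abspath(path))
    # Read everything first so the source file is closed before writing.
    with open(path, newline="") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames
        rows = list(reader)
    for row in rows:
        row[column] = str(int(row[column]) + step)
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as target:
        writer = csv.DictWriter(target, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_path, path)
```

The original file is never open for reading and writing at the same time, which is the situation the file lock described above prevents inside a single Talend job.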

tFileList picks up only one of the 6 files

I am trying to display results from several files in a directory. I use a tFileList and two tFileInputDelimited components, both linked to the tFileList. I don't know why, but at the end of processing my results come from just one of the 6 files I want. It appears that the results come from the last file in the directory.
Each tFileInputDelimited has ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) as its file name.
Here is my TMap:
Your job is set up so that the lookup is iterative, which causes problems: Talend only seems to use the last iteration, rather than iterating through every step for everything it needs as you might expect (although this might be more complicated than you first think).
One option is to rework the job so you use your iterate part of the job as the main input to the tMap rather than the lookup.
Alternatively, you could iterate the data into a tBufferOutput component and then OnSubjobOk you could link the job as before but replace the iterative part with a tBufferInput component as it will store all of the data from all of the files iterated through.
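The buffering idea can be pictured with a small sketch outside Talend: every file in the directory is read in turn and its rows are appended to one in-memory list, so the later step sees rows from all files rather than only the last one. This is a Python analogy for what tBufferOutput and tBufferInput do, not Talend code; buffer_rows, the *.csv pattern and the semicolon delimiter are assumptions.

```python
import csv
from pathlib import Path


def buffer_rows(directory, pattern="*.csv", delimiter=";"):
    """Collect the rows of every matching file into one list, tagged by file name."""
    buffered = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.reader(handle, delimiter=delimiter):
                # Keep the source file name so rows can still be told apart.
                buffered.append([path.name] + row)
    return buffered
```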

How to skip the header while processing a message in Mirth

I am using Mirth 3.0. I have a txt file with thousands of records and a three-line header. I have to skip this header. How can I do this?
I am not supposed to use the batch file option.
Thanks.
The answer is a simple settings change.
I assume your inbound source data type is Delimited Text.
Go to your channel -> Summary tab -> Set Data Types -> Source 1 Inbound Properties and set Number of Header Records to 3.
Mirth will then skip the first 3 records of the file, treating them as headers.
If there is some method of identifying the header records in the file, you can add a source filter that uses a regular expression to identify and ignore those records.
The same result can be achieved using the Attachment script on the Summary tab. There you deal with the message in its raw format. Thus, if your file contains three lines of comments and the first message then starts with the MSH segment, you can use regular JavaScript functions to strip everything up to MSH. The same is true of the Preprocessor script, and it is even more logical to do such a transformation there. The difference is that Mirth does not store the message before it hits the Attachment handler, but it does store it before the Preprocessor handles the message.
Contrary to that, the source filter deals with the message serialized to the E4X XML object, where the serialization process may fail because of the header (it depends on the inbound message data type settings).
For further reading, I would recommend the "Unofficial Mirth Connect Developer's Guide".
(Disclaimer: I'm the author of this book.)
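The raw-text approach described above comes down to simple string handling. In Mirth it would be written in JavaScript inside the Attachment or Preprocessor script; the Python below is only a sketch of the logic, with hypothetical function names. It assumes either that the header length is known in advance, or that the text "MSH|" does not occur inside the header lines themselves.

```python
def strip_leading_header(raw, header_lines=3):
    """Drop the first `header_lines` lines of a raw message."""
    return "".join(raw.splitlines(keepends=True)[header_lines:])


def strip_before_msh(raw):
    """Drop everything before the first MSH segment.

    The text is returned unchanged if no MSH segment is found.
    """
    index = raw.find("MSH|")
    return raw[index:] if index >= 0 else raw
```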
In my implementation the header content stays the same, so I know in advance how many lines the header takes. Inside the source filter I use the following code.
delete msg["row"][1];delete msg["row"][1];return true;
I use the delete statement twice because after the first delete executes, msg has one row fewer, so if the header spans more than a single row a second delete statement is required.