Iteration among components in talend 5.6 - talend

I have a filename and I have to check if it is there in physical location. If it is there then I have to increment like filename_1,filename_2 etc and check If filename_1 exists in physical location and in database if it is there in any one of these then increment again and check in both physical location,database until I get a file name that is not there in both places.
But any link is not going back for iterating though components in talend.
when i found a filename not in both places then i have to create a file with that name in physical location and update in database.

Use these components :
tLoop : for loop, from 1 to 9999 (as infinity)
tFileExists : check for "filename_"+((Integer)globalMap.get("tLoop_1_CURRENT_VALUE"))
if not exists : write new file with this name "filename_"+((Integer)globalMap.get("tLoop_1_CURRENT_VALUE")) and update the tLoop_1_CURRENT_VALUE beyond the max of the loop to get off the loop, use a tJava and this line of code globalMap.put("tLoop_1_CURRENT_VALUE",9999))
Dont forget to check the first file filename separatly before the loop.

Related

Is there a difference in performance between set/save when saving columns to tables?

I have a small utility that checks for new columns for an intraday hdb and adds new columns.
At the moment I am using :
.[set;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]
where path is :
`:path_to_hdb/2022.03.31/table01/newDummyThree
and
?[data;();();cls] // just an exec statement
Would it make any difference to use save instead:
.[save;(pth;?[data;();();cls]);{[p;e] .log.error[.z.h;"Failed to save to path [",string[p],"] with error :",e]}[pth;]]
Yes. If you are adding entire columns to a table then you might want to store it splayed, i.e. as a directory of column files rather than as a single table file. This means using set rather than save.
https://code.kx.com/q/kb/splayed-tables/
But test actual example updates.
As mentioned in the documentation for save:
Use set instead to save
a variable to a file of a different name
local data
So set has the advantage of not requiring a global and you can name the file a different name to the name of your in-memory global variable.
There is no difference in how they serialise/write the data. In fact, save uses set under the covers anyway:
q)save
k){$[1=#p:`\:*|`\:x:-1!x;set[x;. *p]; x 0:.h.tx[p 1]#.*p]}'
By the way - you can't use save in the way that you've suggested in your post. save takes a symbol as input and this symbol is the symbol name of your global variable containing the data you want to write.

Azure-data-Factory Copy data If a certain file exists

I have many files in a blob container. However I wanted to run a Stored procedure only IF a certain file (e.g. SRManifest.csv) exists on the blob Container. I used Get metadata and IF Condition on Data Factory. Can you please help me with the dynamic script for this. I tried this #bool(startswith(
activity('Get Metadata1').output.childitems.ItemName,
'SRManifest.csv')). It doesnt work.
Then I thought, what if i used #greaterOREquals(activity('Get Metadata1').output.LastModified,adddays(utcnow(),-2))But this checks the last modified within 2 days of the Bloob not the file exist. Thank you.
Please see below my diagram
I have understood your requirement differently I think.
I wanted to run a Stored procedure only IF a certain file (e.g. SRManifest.csv) exists on the blob Container
1 Change your metadata activity to look for existence of sentinel file (SRManifest.csv)
2 Follow with an IF activity, use this condition:
3 Put your sp in the True part of the IF activity
If you also needed the file list passed to the sp then you'll need the GetMetadata with childitems option inside the IF-True activity
Based on your diagram, since you are looping over all the blob names already, you can add a Boolean variable to the pipeline and set its default value to false:
Inside the ForEach activity, you only want to attempt to set the variable if the value is still false, and if the blob name is found, set it to true. Since Set Variable cannot be self-referential, do this inside the False branch of an If activity:
This will only attempt to process if the value is false (so the file name has not been found yet), and will do nothing if the value is true. Now set the variable based on your file name:
[NOTE: This value can be hard coded, parameterized, or based on a variable]
When you execute the pipeline, you'll see the Set Variable stops attempting once the value is set to true:
In the main pipeline, after the ForEach activity has completed, you can use the variable to set the condition of your final If activity. If the blob is never found, it will still be false, so put the Stored Procedure activity inside the True branch.

Matlab loop: ignores variable definition when loading something from directories

I am new to MATLAB and try to run a loop within a loop. I define a variable ID beforehand, for example ID={'100'}. In my loop, I then want to go to the ID's directory and then load the matfile in there. However, whenever I load the matfile, suddenly the ID definition gets overridden by all possible IDs (all folders in the directory where also the ID 100 is). Here is my code - I also tried fullfile, but no luck so far:
ID={'100'}
for subno=1:length(ID) % first loop
try
for sessno=1:length(Session) % second loop, for each ID there are two sessions
subj_name = ([ID{subno} '_' Session{sessno} '_vectors_SC.mat']);
cd(['C:\' ID{subno} '\' Session{sessno}]);
load(subj_vec_name) % the problem occurs here, when loading, not before
end
end
end
When I then check the length of the IDs, it is now not 1 (one ID, namely 100), but there are all possible IDs within the directory where 100 also lies, and the loop iterates again for all other possible IDs (although it should stop after ID 100).
You should always specify an output to load to prevent over-writing variables in your workspaces and polluting your workspace with all of the contents of the file. There is an extensive discussion of some of the strange potential side-effects of not doing this here.
Chances are, you have a variable named ID inside of the .mat file and it over-writes the ID variable in your workspace. This can be avoided using the output of load.
The output to load will contain a struct which can be used to access your data.
data = load(subj_vec_name);
%// Access variables from file
id_from_file = data.ID;
%// Can still access ID from your workspace!
ID
Side Note
It is generally not ideal to change directories to access data. This is because if a user runs your script, they may start in a directory they want to be in but when your program returns, it dumps them somewhere unexpected.
Instead, you can use fullfile to construct a path to the file and not have to change folders. This also allows your code to work on both *nix and Windows systems.
subj_name = ([ID{subno} '_' Session{sessno} '_vectors_SC.mat']);
%// Construct the file path
filepath = fullfile('C:', ID{subno}, Session{sessno}, subj_name);
%// Load the data without changing directories
data = load(filepath);
With the command load(subj_vec_name) you are loading the complete mat file located there. If this mat file contains the variable "ID" it will overwrite your initial ID.
This should not cause your outer for-loop to execute more than once. The vector 1:length(ID) is created by the initial for-loop and should not get overwritten by subsequent changes to length(ID).
If you insert disp(ID) before and after the load command and post the output it might be easier to help.

IDCAMS LISTCAT deleting VSAM file when next step is IEFBR14

I have a requirement wherein I need to check if a VSAM file exists or not. If it is not present then I need to create it like TEST.FILE2. My JCL is as :
//STEP01 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
LISTCAT ENTRIES('BRTEST.FILE1')
/*
//STEP02 EXEC PGM=IEFBR14,COND=(4,GT)
//DD01 DD DSN=BRTEST.FILE1,
// DISP=(,CATLG,DELETE),
// LIKE=BRTEST.FILE2
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
But a stange thing is happening. Whenever I execute this JCL, STEP001 return a return code as 004 even though the file is already present, and a new file is created in STEP02. So if I submit this JCL twice, a new file is created both the times. I am not able to understand how the file is getting deleted. And the strange thing is if I run the JCL without STEP02 then it gives MAXCC as 0 saying that the file was found in catalog.
I was able to achieve my requirement by following code, but would still like to understand why and how my VSAM file gets deleted for LISTCAT.
//STEP02 EXEC PGM=IEFBR14,COND=(4,GT)
//DD01 DD DSN=BRTEST.FILE1,
// DISP=(MOD,CATLG,CATLG),
// LIKE=BRTEST.FILE2
//SYSPRINT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
Here is the SYSPRINT when only STEP01 is executed:
IDCAMS SYSTEM SERVICES TIME: 03:47:44
LISTCAT ENTRIES('BRTEST.FILE1')
CLUSTER ------- BRTEST.FILE1
IN-CAT --- CATALOG.TEST03
DATA ------- BRTEST.FILE1.DATA
IN-CAT --- CATALOG.TEST03
INDEX ------ BRTEST.FILE1.INDEX
IN-CAT --- CATALOG.TEST03
IDCAMS SYSTEM SERVICES TIME: 03:47:44
THE NUMBER OF ENTRIES PROCESSED WAS:
AIX -------------------0
ALIAS -----------------0
CLUSTER ---------------1
DATA ------------------1
GDG -------------------0
INDEX -----------------1
NONVSAM ---------------0
PAGESPACE -------------0
PATH ------------------0
SPACE -----------------0
USERCATALOG -----------0
TAPELIBRARY -----------0
TAPEVOLUME ------------0
TOTAL -----------------3
THE NUMBER OF PROTECTED ENTRIES SUPPRESSED WAS 0
IDC0001I FUNCTION COMPLETED, HIGHEST CONDITION CODE WAS 0
IDC0002I IDCAMS PROCESSING COMPLETE. MAXIMUM CONDITION CODE WAS 0
And when both steps are executed:
IDCAMS SYSTEM SERVICES TIME: 03:48:35
LISTCAT ENTRIES('BRTEST.FILE1')
IDC3012I ENTRY BRTEST.FILE1 NOT FOUND
IDC3009I ** VSAM CATALOG RETURN CODE IS 8 - REASON CODE IS IGG0CLEG-42
IDC1566I ** BRTEST.FILE1 NOT LISTED
IDCAMS SYSTEM SERVICES TIME: 03:48:35
THE NUMBER OF ENTRIES PROCESSED WAS:
AIX -------------------0
ALIAS -----------------0
CLUSTER ---------------0
DATA ------------------0
GDG -------------------0
INDEX -----------------0
NONVSAM ---------------0
PAGESPACE -------------0
PATH ------------------0
SPACE -----------------0
USERCATALOG -----------0
TAPELIBRARY -----------0
TAPEVOLUME ------------0
TOTAL -----------------0
THE NUMBER OF PROTECTED ENTRIES SUPPRESSED WAS 0
IDC0001I FUNCTION COMPLETED, HIGHEST CONDITION CODE WAS 4
IDC0002I IDCAMS PROCESSING COMPLETE. MAXIMUM CONDITION CODE WAS 4
The value for ZOS390RL variable is z/OS 02.01.00 and ZENVIR is ISPF 7.1MVS TSO.
May have an answer for you. Didn't think of it because it is a VSAM dataset, and the way you are trying to do it is unusual (to me).
There is/was a product called UCC11. I think it is now marketed by Computer Associates, and called CA-11 (or somesuch). I think you are using this product or something similar at your site.
If executed at the beginning of a JOB it would look for files specified as NEW and CATLG, and look to see if there was an existing file of the same name in the catalog. If there was, the existing file would be deleted.
This would obviate the need for an initial IEFBR14 step to delete such files.
I think that you are using this product, or something similar. Your file is being automatically deleted when it exists, so your IDCAMS step, which is reading data from a file (even if it is SYSIN and DD *) is not known to the product, so your VSAM file is deleted before your IDCAMS step is run.
Changing the file to MOD as the initial disposition (MOD will add to an existing file and create a new file if none exists) will not cause such a product to delete the file.
Using the LIKE for a VSAM file will not obtain the CA-size and CI-SIZE from the model dataset. You will get default values for those, which may well impact on the performance of your programs. You cannot specify these values when defining a VSAM file in JCL. You also won't get buffer values from the model dataset, but you can specify those separately in the JCL (which you haven't).
Here is a description of what LIKE does for you: http://publibfp.dhe.ibm.com/cgi-bin/bookmgr/BOOKS/iea2b680/12.40?DT=20080604022956
The following attributes are copied from the model data set to the
new data set:
Data set organization
Record organization (RECORG) or
Record format (RECFM)
Record length (LRECL)
Key length (KEYLEN)
Key offset (KEYOFF)
Type, PDS, PDSE, basic format, extended format, large format, or HFS (DSNTYPE)
Space allocation (AVGREC and SPACE)
Unless you explicitly code the SPACE parameter for the new data set,
the system determines the space to be allocated for the new data
set by adding up the space allocated in the first three extents of the
model data set. Therefore, the space allocated for the new data set
will generally not match the space that was specified for the model
data set. Note that regardless of the units in which the model data
set was allocated, the new data set will be allocated in tracks. This
assumes that space was not specified on the JCL and is being picked up
from the model data set.
There are some other little "gotchas", like in the last paragraph, detailed in the link as well.
Unless you have strong reasons otherwise, I'd strongly suggest doing the whole thing in one IDCAMS step (as below).
I suspected it was going to be 1.12, 1.13 or 2.1 (2.01). IEFBR14 is, subtly, part of the OS now.
Exactly why you get this effect, I don't know. I don't have access to 2.1, so can't investigate myself.
IEFBR14 has changed, LIKE is not really intended for VSAM datasets (you'll get a lot of default values for things you may or may not want), it's not really a "usual" way to do this. See Suggestions below.
Try adding a DDname to your IDCAMS step which just references the VSAM dataset. See if that changes anything. Use that DDname in an IDCAMS statement. See if that changes anything.
Take all your results to your Sysprogs, and see if they can spot anything.
If not, it'll be PMR-time: http://www-01.ibm.com/support/docview.wss?uid=swg21507639
If you do raise a PMR, please update by adding an Answer with the resolution once you receive it.
Suggestions.
Find out how this task is done by other people at your site.
Have you tried using the VSAM file you have defined in that way? You should LISTCAT TEST.FILE1 and TEST.FILE2 and compare. If you look up LIKE in the JCL Reference, you will see that there are things which a VSAM DEFINE can do which you can't do for a VSAM file defined in the JCL using LIKE.
Unless there is some reason otherwise, I'd suggest you do the whole thing in one step with IDCAMS. See if the file exists, use IDCAM's IF to to test the CC from that and only DEFINE if the file does not exist. You can use a MODEL (for instance on your TEST.FILE2) to get everything which is similar to another file, and just override anything different that you need.
If you have a look here, http://pic.dhe.ibm.com/infocenter/zos/v1r13/index.jsp?topic=%2Fcom.ibm.zos.r13.idai200%2Fdefclu.htm, you will find a number of Modal Commands for IDCAMS which will give you everything you need to define if it is not there, and do something different (set the Condition Code, for instance) if it is.
Please still supply the requested information. It is an interesting question on the face of it, which may have a simple solution. But even with a solution, I don't think it is what you want.
What you want to do can be done entirely in the IDCAMS step. You can inspect the return code from a previous operation (ie, the LISTCAT) and do something (like define a new cluster) if the code is greater than 0. If the 2nd step works, then set the MAXCC to return 0 to tell your JCL that this step completed OK (and to let your Ops folks know this too).
Look for the IDCAMS 'IF'.

Talend How To Pass Last Modified File Into TFileInputDelimited?

I have searched all over, and read this post.
But it doesn't seem complete and doesn't work.
The situation: I need to get the last modified file from a directory on the local machine. I then need to pass that file into the fileinputdelimited component.
I currently have:
tfilelist --> iterate --> titeratetoflow --> tsamplerow
-->tflowtoiterate -> tinpufiledelimited ---> tlogrow (just to make sure its pulling the right file)
But it doesn't work. I have configured it. so that titeratetoflow has a column called
"FileName" with "((String)globalMap.get("CURRENT_FILE"))" as the value,
"FileDirectory" with ((String)globalMap.get("CURRENT_FILEDIRECTORY")) as value, and
"FileAndDirectory" with ((String)globalMap.get("CURRENT_FILEPATH")) as value.
The tsamplerow is limited to "1".
The tiflowtoiterate is set so that
"FileNameOnly" is value of "FileName"
"FileDirectoryOnly" is "FileDirectory" and
"FilePathComplete" is "FileAndDirectory"
In the File location field of the tinputfiledelimited, I have "((String)globalMap.get("FilePathComplete"))"
When it runs I get an error saying cannot find file or path. If I cut out the fileinput component and have it send straight to the tlogrow, it shows a single line of blank entry.
Any ideas?
I'm not sure if you've just slightly misconfigured the job here but it seems to work fine for me.
Here's a few screenshots showing my job design:
The only thing I can think of just by looking at your post is that you might have slightly messed up the key value pair combinations in the tFlowToIterate. I tend to find that the default settings there work fine pretty much all of the time and it makes it a little more obvious what it's doing as well.
EDIT: Actually, it looks like you might be using the wrong values in your tIterateToFlow. The tFileList will throw the values for the file paths etc in to the global map but it will preface it with the unique component name. If you hit ctrl+space in the value window it should prompt you with a list of available values (these are also specified in the "Outline" tab of the studio). It typically makes an implicit conversion to String but for this you will need to explicitly convert it so use .toString() instead of (String).
Another way to get last modified file is as below
tFileList(sorted DESC by file modified date) ------> tFixedFlowInput (schema - filename, filenumber) ----->tHashOutput
here in tFixedFlowInput
filename = file(String)globalMap.get("tFileList_1_CURRENT_FILEPATH")+"/"+(String)globalMap.get("tFileList_1_CURRENT_FILE")
filenumber = (Integer)globalMap.get("tFileList_1_NB_FILE")
What above will accomplish is get list of all files in the directory with their number/rank - where the file last modified will have file number =1 and next to that will have 2...and so on.
Now on SubJobOK of above tFileList you can have tHashInput which will read from above tHashOutput and filter only row where filenumber==1 - which means the last modified file.
tHashInput (link to tHashoutput) ---->tFilterRow(filenumber==1)------>tLogRow
One reason why you are getting null is probably you have used globalMap.get("CURRENT_FILEPATH) instead of globalMap.get("tFileList_1_CURRENT_FILEPATH")
The Simple Solution for above problem could be as below:
tFileList(sorted ASC by file modified date)--> tIterateToFlow --> tJava( just to end the subjob).
Then on
subjob ok --> tfileinput ( use (String)globalMap.get("tFileList_1_CURRENT_FILE") or (String)globalMap.get("tFileList_1_CURRENT_FILEPATH") as a file name/file path)
Explanation:
Since tFileList iterates all the files in ASC order, it will always have Latest file name stored in globalMap for the last iteration. The list is only iterated till tIterateToFlow hence after this component (String)globalMap.get("tFileList_1_CURRENT_FILE") will always give the last file name from the iterated list, which is the latest file in out case.
Main Flow :
Component View: