JCL ICEMAN: how many sort work files are needed?

I am working with JCL, and there is a program called ICEMAN which is invoked when the IBM sort utility DFSORT is used. DFSORT can be used to SORT, COPY or MERGE files, amongst other things. In the example below the output is from a SORT. My question is: how many sort work files (//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,30)) are needed? They always seem to vary in number when I see them in JCL. Is there a formula for figuring out how many SORTWKnn data sets are needed and how large they should be?
JCL Code:
//STEP5 EXEC PGM=ICEMAN,COND=(4,LT)
//SYSOUT DD SYSOUT=1
//SYSIN DD DSN=CDP.PARMLIB(cardnumberhere),DISP=SHR
//SORTIN DD DSN=filename,DISP=SHR
//SORTOUT DD DSN=filename,DISP=(OLD,KEEP),
// DCB=(LRECL=5000,RECFM=FB),
// SPACE=(CYL,30)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,30)
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,30)
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,30)
//SORTWK04 DD UNIT=SYSDA,SPACE=(CYL,30)

It is common for JCL to be copied from one jobstream to the next, and the next, and the next, resulting in replicative fade.
According to the documentation...
//SORTWKdd DD
Defines intermediate storage data sets. Usually needed for a sorting
application unless dynamic allocation is requested. Will not
be used for a copying or merging application.
Dynamic allocation is requested via the DYNALLOC option. Some shops have this set as the default.
If you want, you can manually calculate the required work space. Typically 1.5 to 2 times the size of the input file is adequate. Always allocate more than 1 SORTWKdd DD statement for efficiency. Avoid allocating a large number of SORTWKdd DD statements.
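For example, following that rule of thumb: if the input file occupies roughly 60 cylinders, then 90 to 120 cylinders of total work space spread over a few data sets would be in the right range (the figures below are purely illustrative, not taken from the question):
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(40,10))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(40,10))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(40,10))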

EXEC PGM=ICEMAN
and
EXEC PGM=SORT
will give you the same result. They are ALIASes of each other, and the same program is executed whichever PGM= is specified.
As cschneid has indicated well, SORTWKnn are "sort work datasets", and the tendency to copy JCL without reference to existing "standard" datasets leads to a lot of overallocation of work dataset space.
Workspace for SORT can be specified in two ways, either manually (putting in the SORTWKnn files, and the maximum number is far in excess of 15) or dynamically using DYNALLOC.
DYNALLOC is the recommended approach, as workspace will be allocated based on what SORT understands to be needed. Look up the associated installation options/overrides on the OPTION statement as well.
Typically, there will be default DYNALLOC values which will deal with the majority of SORT steps, and then specific OPTION parameters will be provided for exceptionally large SORTs.
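As a sketch of what such a control statement can look like (the device name, data-set count and the SORT fields here are illustrative; your installation defaults may already cover this):
  OPTION DYNALLOC=(SYSDA,4)
  SORT FIELDS=(1,10,CH,A)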
Manual definition of SORTWKnn datasets in a jobstep will "turn off" any dynamic allocation for that step.
Specific definition of SORTWKnn datasets is sometimes convenient, but not often. The space needed is probably closer to 1.2 times the input file these days. You can check the SYSOUT from a typical run of a particular jobstep to see how much space was actually used, adjusting the primary SORTWKnn space or the number of datasets to a better fit if there is over- or under-allocation.
It is often a good idea to specify additional information (average record-length, estimated number of records) when DYNALLOC is used for a SORT invoked by a programming language. This is because SORT may not be able to "see" the input dataset, so does not have much information for estimating the workspace required.
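For instance, an OPTION statement along these lines supplies those estimates (the values are illustrative; check the DFSORT documentation for the keywords supported at your level):
  OPTION DYNALLOC,FILSZ=E2000000,AVGRLEN=5000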
Separately, it is best to leave all DCB information off output files. SORT will provide correct DCB information from the input dataset, taking into account any manipulation of the data by the SORT control cards. If you leave DCB information in the JCL (the LRECL, RECFM) you have two places to change it whenever the file changes, rather than one.
In your actual example, over 100 cylinders of space are allocated unnecessarily whilst the step is running. This kind of thing, applied across many JOBs, can lead to failures in other JOBs and even to purchasing, or being charged for, additional DASD (disk space) which is not needed.
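Putting that together, a sketch of the same step relying on dynamic allocation and letting SORT supply the output DCB information (this assumes DYNALLOC is in effect at your installation, or is requested on an OPTION statement in the control cards):
//STEP5 EXEC PGM=ICEMAN,COND=(4,LT)
//SYSOUT DD SYSOUT=1
//SYSIN DD DSN=CDP.PARMLIB(cardnumberhere),DISP=SHR
//SORTIN DD DSN=filename,DISP=SHR
//SORTOUT DD DSN=filename,DISP=(OLD,KEEP)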

Related

How to free up memory during an Experiment?

I am constructing an experiment in AnyLogic which saves data, in the Parameter Variation tab, in a custom-class list. The model needs to perform a lot of simulations and repetitions to optimize for setting variables in the model itself. After x iterations, I use a Python connector to run some code that finds new candidate parameters for the underlying model.
The problem I am having right now is that around simulation run number 200, memory usage hits its maximum (4 GB) and everything proceeds to run very slowly. I have found some interesting ways to cut memory usage, but I believe there is only one thing that could help me right now: letting the system free the memory used for past iterations. After each iteration the data of a simulation is stored, so I am fine with AnyLogic deleting the logs of that specific simulation afterwards.
Is such a thing possible? If so, how can I implement that?
Java makes use of a garbage collector to manage memory, and you have no direct control over it. Every now and then, based on some internal logic, it will collect and remove all instances of classes in memory that no longer have any active references.
Thus to reduce memory you must ensure that any instances that are no longer needed are not referenced by any of the objects currently active in your model.
To identify these, use a Java profiler like JProfiler, or one of the free alternatives.
This will show you exactly which classes are using up your memory, and with some deep diving you should be able to identify what is holding references to them.
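As a minimal sketch of the idea (plain Java, not AnyLogic API; all names are illustrative): keep only the summary you need from each run and clear the detailed data afterwards, so that nothing holds a reference to it any more.
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a per-run log whose detailed data is dropped once a
// summary has been taken, making the old arrays eligible for garbage collection.
public class RunLog {
    private final List<double[]> samples = new ArrayList<>();

    public void record(double[] sample) {
        samples.add(sample);
    }

    public double summarize() {
        double total = 0;
        for (double[] s : samples) {
            for (double v : s) {
                total += v;
            }
        }
        return total;
    }

    // Clearing the list removes the only references to the sample arrays;
    // the garbage collector can then reclaim them on its next cycle.
    public void clearDetails() {
        samples.clear();
    }
}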

How to create a list of hex numbers in 8085 assembler?

I need to know how to create a list of numbers in 8085 assembler and store the list in successive memory locations.
Most assemblers will have a set of define statements and a way to specify different bases.
For example, the values zero, one and forty-two, along with a short nul-terminated string, may be created with something like:
some_vals: db 0, 1, 2Ah, 'hello', 0
How your assembler does it is probably in the documentation somewhere. Without more specific details on which assembler you're using, there is not much more help I can give.
Pseudo-ops like data definition (db, dw, ds), address specification (org) or label setting (mylabel:) do not generally form part of the processor documentation itself, rather they're a function of the assembler.
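As a sketch (the exact directive names vary between assemblers; org, db and labels as shown are typical, and the values are just examples):
        org  2000h              ; assemble the table at address 2000h
count:  db   4                  ; number of entries in the list
table:  db   0Ah, 14h, 1Eh, 28h ; 10, 20, 30, 40 in successive locations
msg:    db   'OK', 0            ; a short nul-terminated string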
See, for example, chapter 4 of this document. I particularly love the fact that we used to buy these 200-page books for $3.95 whereas now you'll be shelling out a hundred bucks for a digital copy with no incremental cost of production :-)

Why does the immediate offset in RISC-V's JAL instruction have its bit order changed?

The J-type bit field (from the RISC-V base ISA spec) encodes the immediate as imm[20|10:1|11|19:12] in inst[31:12], with rd in inst[11:7] and the opcode in inst[6:0].
I don't see the point of doing this re-ordering of the bit field.
Is there a special kind of manipulation when RISC-V processor is executing this instruction?
The purpose of the shuffling is to reduce the number of muxes involved in constructing the full-sized operand from the immediates across the different instruction types.
For example, the sign-extend bit (which drives a lot of wires) is always in the same place (inst[31]). You can also see that imm[10] is almost always in the same place too, across I-type, S-type, B-type, and J-type instructions.
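To see the mapping concretely, here is an illustrative decode of the J-type immediate (a sketch in Java, not from the question; the bit positions are those given in the RISC-V base ISA spec). Every immediate field is still a contiguous slice of the instruction word; the shuffle only changes which immediate bit each slice feeds, so the same wires can be reused across formats.
public class JalImm {
    // Reassemble the JAL offset from a 32-bit instruction word.
    static int jalOffset(int inst) {
        int imm20     = (inst >>> 31) & 0x1;   // inst[31]    -> imm[20] (sign bit, same slot in all formats)
        int imm10to1  = (inst >>> 21) & 0x3FF; // inst[30:21] -> imm[10:1]
        int imm11     = (inst >>> 20) & 0x1;   // inst[20]    -> imm[11]
        int imm19to12 = (inst >>> 12) & 0xFF;  // inst[19:12] -> imm[19:12]
        int imm = (imm20 << 20) | (imm19to12 << 12) | (imm11 << 11) | (imm10to1 << 1);
        return (imm << 11) >> 11;              // sign-extend from bit 20
    }
}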

Watson Retrieve and Rank / Discovery Service always returns the table of contents with the high(est) score

Background:
I'm using Watson Retrieve and Rank or the Discovery Service to retrieve information from user manuals. I performed the training with an example washing-machine manual in PDF format.
My target is to receive the best passages from the document where a specific natural-language string occurs (like "Positioning the drain hose"). This is working in general.
My problem is that the table of contents is almost always the passage with the highest score.
As a result, the first results are just the table of contents instead of the relevant text passage. (See the example results below.)
"wrong" result (table of content):
Unpacking the washing machine ----------------------------------------------------2 Overview of the washing machine --------------------------------------------------2 Selecting a location -------------------------------------------------------------------- 3 Adjusting the leveling feet ------------------------------------------------------------3 Removing the shipping bolts --------------------------------------------------------3 Connecting the water supply hose ------------------------------------------------- 3 Positioning the drain hose ----------------------------------------------------------- 4 Plugging in the machine
"correct" result
Positioning the drain hose The end of the drain hose may be positioned in three ways: Over the edge of a sink The drain hose must be placed at a height of between 60 and 90 cm. To keep the drain hose spout bent, use the supplied plastic hose
Possible solutions:
ignoring the table of content during training process
offset parameter to e.g. ignore the first 3 results
find out whether the result is part of table of content and ignore if YES
These approaches are static and not applicable to multiple documents with varying structures (table of contents at the beginning, at the end, no table of contents, ...).
Does anyone have an idea for a better approach to this topic?
At this time, passage retrieval results are not affected by relevancy training. As passage retrieval always searches the entire corpus, unfortunately the only reliable way of excluding passage retrieval results from a table of contents is to remove the table of contents.

How should I store my large MATLAB data files during analysis?

I am having issues with 'data overload' while processing point cloud data in MATLAB. This is what I am currently doing:
I begin with my raw data files, each on the order of ~30 MB.
I then do initial processing on them to extract n individual objects and remove outlying points, which are all combined into a 1 x n structure, testset, saved into testset.mat (~100 MB).
So far so good. Now things become complicated:
For each point in each object in testset, I will compute one of a number of features, which ends up being a matrix of some size (for each point). The size of the matrix, and some other properties of the computation, are parameters of the calculations. I save these computed features in a 1 x n cell array, each cell of which contains an array of the matrices for each point.
I then save this cell array in a .mat file, where the name specifies the parameters, the name of the test data used and the types of features extracted. For example:
testset_feature_type_A_5x5_0.2x0.2_alpha_3_beta_4.mat
Now for each of these files, I then do some further processing (using a classification algorithm). Again there are more parameters to set.
So now I am in a tricky situation, where each final piece of the initial data has come through some path, but the path taken (and the parameters set along that path) are not intrinsically held with the data itself.
So my question is:
Is there a better way to do this? Can anyone who has experience in working with large datasets in MATLAB suggest a way to store the data and the parameter settings more efficiently, and more integrally?
Ideally, I would be able to look up a certain piece of data without having to use regex on the file strings—but there is also an incentive to keep individually processed files separate to save system memory when loading them in (and to help prevent corruption).
The time taken for each calculation (some ~2 hours) prohibits computing data 'on the fly'.
For a similar problem, I have created a class structure that does the following:
Each object is linked to a raw data file
For each processing step, there is a property
The set method of the properties saves the data to file (in a directory with the same name as the raw data file), stores the file name, and updates a "status" property to indicate that this step is done.
The get method of the properties loads the data if the file name has been stored and the status indicates "done".
Finally, the objects can be saved/loaded, so that I can do some processing now, save the object, later load it and I immediately know how far along the particular data set is in the processing pipeline.
Thus, the only data in memory is the data that is currently being worked on, and you can easily know which data set is at which processing stage. Furthermore, if you set up your methods to accept arrays of objects, you can do very convenient batch processing.
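A minimal sketch of that idea in MATLAB (a handle class, using explicit set/get-style methods rather than property access methods for brevity; all property, method and file names here are illustrative):
classdef DataSet < handle
    % Illustrative sketch: each object wraps one raw data file and caches
    % processed results on disk rather than in memory.
    properties
        rawFile             % path to the raw data file this object wraps
        featureFile = ''    % where the computed features were saved
        featureStatus = 'not done'
    end
    methods
        function obj = DataSet(rawFile)
            obj.rawFile = rawFile;
        end
        function setFeatures(obj, features, params)
            % Save the step's output next to the raw file, remember where
            % it went, and mark the step as done.
            [folder, name] = fileparts(obj.rawFile);
            outDir = fullfile(folder, name);
            if ~exist(outDir, 'dir'), mkdir(outDir); end
            obj.featureFile = fullfile(outDir, 'features.mat');
            save(obj.featureFile, 'features', 'params');
            obj.featureStatus = 'done';
        end
        function [features, params] = getFeatures(obj)
            % Load the saved output only when it is actually needed.
            if strcmp(obj.featureStatus, 'done')
                s = load(obj.featureFile);
                features = s.features;
                params = s.params;
            else
                error('Features have not been computed for %s yet.', obj.rawFile);
            end
        end
    end
end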
I'm not completely sure if this is what you need, but the save command allows you to store multiple variables inside a single .mat file. If your parameter settings are, for example, stored in an array, then you can save this together with the data set in a single .mat file. Upon loading the file, both the dataset and the array with parameters are restored.
Or do you want to be able to load the parameters without loading the file? Then I would personally opt for the cheap solution of having a second set of files with just the parameters (but similar filenames).
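For example (file, variable and field names are illustrative; computeFeatures is a placeholder for your own feature code):
params = struct('featureType', 'A', 'window', [5 5], ...
                'step', [0.2 0.2], 'alpha', 3, 'beta', 4);
features = computeFeatures(testset, params);          % your 1 x n cell array
save('testset_features.mat', 'features', 'params');   % both go into one .mat file

% Later, the parameters can be read back selectively without assigning the
% (large) features variable:
info = load('testset_features.mat', 'params');
disp(info.params)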