How to read the size of a PDS member using LMINIT, LMMFIND..? - rexx

I want to read the size of the members of a PDS.
For example:
My PDS name is my.pds.cics.
If I browse this PDS, I see details like the following:
Name     Prompt   Size    Created
PDS1              0051e   25/03/2016
PDS2              006be   25/03/2016
PDS3              0078e   25/03/2016
PDS4              0051g   25/03/2016
I want to read the size of all the members of this PDS and store each in a variable, e.g. var1 = 0051e.
Please help me do it. I tried using LMMFIND. Can anyone help me with the code in REXX?

Have you looked at the variables available from LMMFIND?
If it is a load module, ZLSIZE should hold the size.
If it is an FB file and ISPF stats are set, ZLCNORC will hold the number of records, and
size = NumberOfRecords * RecordLength
If it is VB, you are stuffed.
The other option is to treat the file as RECFM=U, write a program, and read the raw data. You should be able to calculate the approximate size from the member start/end positions. There are IBM manuals that document the format.
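For example, here is a rough, untested REXX sketch that walks the member list with LMMLIST (instead of calling LMMFIND per member) and picks up ZLCNORC for each member; the dataset name is the one from the question:
/* REXX - record count for every member of the PDS */
ADDRESS ISPEXEC
"LMINIT DATAID(DID) DATASET('MY.PDS.CICS') ENQ(SHR)"
"LMOPEN DATAID("did") OPTION(INPUT)"
member = ''
"LMMLIST DATAID("did") OPTION(LIST) MEMBER(MEMBER) STATS(YES)"
do while rc = 0
  say member 'records:' zlcnorc   /* only set when ISPF stats exist */
  "LMMLIST DATAID("did") OPTION(LIST) MEMBER(MEMBER) STATS(YES)"
end
"LMMLIST DATAID("did") OPTION(FREE)"
"LMCLOSE DATAID("did")"
"LMFREE DATAID("did")"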

Related

Which attributes are missing according to the HDF5 specification and the metadata in group h5md?

I have an HDF5-format file containing molecular dynamics simulation data. For quick inspection, the h5ls tool is handy. For example:
h5ls -d xaa.h5/particles/lipids/positions/time | less
Now, my question is based on a comment I received on the data format: which attributes are missing according to the HDF5 specification and the metadata in the group?
Are you trying to get the value of the Time attribute from a dataset? If so, you need to use h5dump, not h5ls. And, the attributes are attached to each dataset, so you have to include the dataset name on the path. Finally, attribute names are case sensitive; Time != time. Here is the required command for dataset_0000 (repeat for 0001 thru 0074):
h5dump -d /particles/lipids/positions/dataset_0000/Time xaa.h5
You can also get attributes with Python code. Simple example below:
import h5py
with h5py.File('xaa.h5','r') as h5f:
    for ds, h5obj in h5f['/particles/lipids/positions'].items():
        print(f'For dataset={ds}; Time={h5obj.attrs["Time"]}')
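To see which attributes are actually present on the group and its datasets (and therefore which ones a validator would consider missing), both command-line tools can report attributes; for example:
# Print attributes only, without the raw data
h5dump -A xaa.h5 | less
# Verbose listing of one group, including its attributes
h5ls -v xaa.h5/particles/lipids/positions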

How can I get the offset of a field in a FlatBuffers binary file?

I am using a library that requires me to provide the offset of the desired data in a file, so it can use mmap to read the data (I can't edit the library's code; I can only provide the offset).
I want to use FlatBuffers to serialize my data because there is no packing or unpacking step in FlatBuffers, which (I think) means it should be easy to get the offset of the desired part of the binary file.
But I don't know how to get the offset. I tried loading the binary file and calculating the offset from the pointer to the desired field: for example, if the address of the root is 1111 and the address of the desired field is 1222, then the offset of the field in the binary file should be 1222 - 1111 = 111 (because there is no unpacking step). But in fact, the difference I get is a huge negative number.
Could someone help me with this problem? Thanks in advance!
FlatBuffers is indeed very suitable for mmap. There are no offsets to be computed, since the generated code does that all for you. You should simply mmap the whole FlatBuffers file, and then use the field accessors as normal, starting from auto root = GetRoot<MyRootType>(my_mmapped_buffer). If you want to get a direct pointer to the data in a larger field such as a string or a vector, again simply use the provided API: root->my_string_field()->c_str() for example (which will point to inside your mmapped buffer).
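A minimal C++ sketch of that pattern (the schema header, root type, and field names are hypothetical, and error handling is omitted):
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include "my_schema_generated.h"   // assumed output of flatc

int main() {
  int fd = open("data.bin", O_RDONLY);
  struct stat st;
  fstat(fd, &st);
  // Map the whole FlatBuffers file; nothing is unpacked or copied.
  const char* buf = static_cast<const char*>(
      mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
  auto root = flatbuffers::GetRoot<MyRootType>(buf);
  const char* s = root->my_string_field()->c_str();  // points into the mapping
  // If the library really needs a byte offset into the file, subtract the base:
  std::ptrdiff_t offset = s - buf;
  close(fd);
}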

Get offset and length of a subset of a WAT archive from Common Crawl index server

I would like to download a subset of a WAT archive segment from Amazon S3.
Background:
Searching the Common Crawl index at http://index.commoncrawl.org yields results with information about the location of WARC files on AWS S3. For example, searching for url=www.celebuzz.com/2017-01-04/*&output=json yields JSON-formatted results, one of which is
{
  "urlkey": "com,celebuzz)/2017-01-04/watch-james-corden-george-michael-tribute",
  ...
  "filename": "crawl-data/CC-MAIN-2017-34/segments/1502886104631.25/warc/CC-MAIN-20170818082911-20170818102911-00023.warc.gz",
  ...
  "offset": "504411150",
  "length": "14169",
  ...
}
The filename entry indicates which archive segment contains the WARC file for this particular page. This archive file is huge; but fortunately the entry also contains offset and length fields, which can be used to request the range of bytes containing the relevant subset of the archive segment (see, e.g., lines 22-30 in this gist).
My question:
Given the location of a WARC file segment, I know how to construct the name of the corresponding WAT archive segment (see, e.g., this tutorial). I only need a subset of the WAT file, so I would like to request a range of bytes. But how do I find the corresponding offset and length for the WAT archive segment?
I have checked the API documentation for the Common Crawl index server, and it isn't clear to me that this is even possible. But in case it is, I'm posting this question.
The Common Crawl index does not contain offsets into WAT and WET files. So the only way is to search the whole WAT/WET file for the desired record/URL. At best, you could estimate the offset, because the record order in WARC and WAT/WET files is the same.
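A sketch of such a scan, assuming the warcio package and a locally downloaded WAT file (the file name and target URL below are placeholders):
from warcio.archiveiterator import ArchiveIterator

target = 'http://www.celebuzz.com/2017-01-04/some-page'  # placeholder URL
with open('CC-MAIN-20170818082911-20170818102911-00023.warc.wat.gz', 'rb') as stream:
    for record in ArchiveIterator(stream):
        if record.rec_headers.get_header('WARC-Target-URI') == target:
            print(record.content_stream().read())
            break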
After much trial and error, I managed to fetch a range from a WARC file with Python and boto3 in the following way:
# You have this from the index
offset, length, filename = 2161478, 12350, "crawl-data/[...].warc.gz"
import boto3
from botocore import UNSIGNED
from botocore.client import Config
# Boto3 anonymous login to Common Crawl
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
# Compute the byte range (the end is inclusive)
offset_end = offset + length - 1
byte_range = 'bytes={offset}-{end}'.format(offset=offset, end=offset_end)
gzipped_text = s3.get_object(Bucket='commoncrawl', Key=filename, Range=byte_range)['Body'].read()
# The requested record, still gzip-compressed; write it out as bytes
with open("file.gz", 'wb') as f:
    f.write(gzipped_text)
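If you would rather parse the record in memory than write it to disk, the fetched range should be a complete gzip member (Common Crawl compresses each record individually, as far as I can tell), so it can be decompressed directly:
import gzip
record_bytes = gzip.decompress(gzipped_text)   # the raw WARC/WAT record
print(record_bytes[:200])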
The rest is optimisation... Hope it helps! :)

Get info about a stack with the Tiff class in Matlab

I'm trying to open a stack of TIFF images with the Matlab Tiff class and get information (height, width, number of images) about the stack and the images in it.
My question is the following: I would like to know how to access information from the "class" object, i.e. t in the example below.
Let's say my stack is named 'OriginalStack.tif', then when I type
t = Tiff('OriginalStack.tif','r')
I get the following output:
t =

  TIFF File: '/Users/catherine/Documents/MATLAB/OriginalStack.tif'
  Mode: 'r'
  Current Image Directory: 1
  Number Of Strips: 1
  SubFileType: Tiff.SubFileType.Default
  Photometric: Tiff.Photometric.RGB
  ImageLength: 364
  ImageWidth: 460
  RowsPerStrip: 364
  BitsPerSample: 8
  Compression: Tiff.Compression.None
  SampleFormat: Tiff.SampleFormat.UInt
  SamplesPerPixel: 3
  PlanarConfiguration: Tiff.PlanarConfiguration.Chunky
  ImageDescription: ImageJ=1.48v
                    images=20
                    slices=20
                    loop=false
  Orientation: Tiff.Orientation.TopLeft
The height, width, and number of images shown are all correct (obviously); however, I don't know how to get them from t, and I have to use imfinfo('OriginalStack.tif') to get this information. Sorry if I'm not using the right terminology here.
Thanks!
By consulting the Tiff class documentation, you would use a method called getTag on the Tiff object. What you specify as input is a string naming the field of the Tiff instance you want. For example, if you wanted the samples per pixel, you would do:
samplesPerPixel = t.getTag('SamplesPerPixel');
Make sure you type the field name exactly as it is spelled; it is case sensitive. For more information, check out this link: http://www.mathworks.com/help/matlab/import_export/importing-images.html#br_c8to-1 . It has some great examples of reading TIFF stacks and extracting their metadata. Because you already have all of the fields displayed from imfinfo, just pick out the field you want and pass its name to getTag as a string.
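For instance, here is a short, untested sketch that reads the height and width with getTag and counts the images by stepping through the file's image directories with the Tiff class methods nextDirectory and lastDirectory:
t = Tiff('OriginalStack.tif','r');
imgHeight = t.getTag('ImageLength');   % 364 in this stack
imgWidth  = t.getTag('ImageWidth');    % 460 in this stack
% Count the images by walking the IFDs (one directory per image)
nImages = 1;
while ~t.lastDirectory()
    t.nextDirectory();
    nImages = nImages + 1;
end
t.setDirectory(1);   % rewind to the first image
t.close();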

Reading structured variable from MAT file

I am performing an analysis that involves simulating over 1000 cases, and I am extracting lots of data for each case as well (about 70 MB). Currently I am saving the results for each case as:
Vessel.TotalForce
Vessel.WindForce
Vessel.CurrentForce
Vessel.WaveForce
Vessel.ConnectionForce
...
Line1.EffectiveTension
Line1.X
Line1.Y
Line2.EffectiveTension
Line2.X
Line2.Y
...
save('CaseNo1.mat')
Now, I need to perform my analysis for CaseNo1.mat through CaseNo1000.mat. Initially I planned to create a Database.mat file by loading all the cases into it and then accessing any variable using h5read. That way Matlab wouldn't need to load all the data at once. However, I am now concerned that my database file will be too big.
Is there any way I can read the structured variables from an individual case file, for example CaseNo1.mat, without loading the whole file into memory?
The Matlab examples show loading individual variables directly from a MAT file without loading the whole file, but I am not sure how to read structure fields the same way:
x=load('CaseNo1.mat','Line1.X')
says Line1.X is not found, but it is there; this is just not the correct command to access the data. I also tried h5read, but it says CaseNo1.mat is not an HDF5 file.
Can anyone help with this?
Apart from this, I would also appreciate any suggestions about performing such data-intensive analysis.
I was wrong! I'm leaving my old answer for context, though I've edited it to reference this one. I thought I had used matfile() in that way before, but I hadn't. I just did a thorough search and ran a few test cases. You've actually run into a limitation of the way Matlab handles and references structures stored in .mat files. There is, however, a solution. It does involve some refactoring of your original code, but it shouldn't be too egregious: instead of nested structures, save flat variables:
Vessel_TotalForce
Vessel_WindForce
Vessel_CurrentForce
Vessel_WaveForce
Vessel_ConnectionForce
...
Line1_EffectiveTension
Line1_X
Line1_Y
Line2_EffectiveTension
Line2_X
Line2_Y
...
save('CaseNo1.mat')
Then to access, just use matfile (or load) as you were before. Like so:
Vessel_WaveForce = load('CaseNo1.mat', 'Vessel_WaveForce')
It's important to note that this restriction doesn't appear to be caused by anything you've chosen to do in your program, but rather is imposed by the way Matlab interacts with its native storage files when they contain structures.
EDIT: This answer runs, but doesn't actually solve the problem posed in the OP's question. I thought I had used matfile to generate a handle whose structure fields I could access, but I was wrong. See my other answer for details.
You could use matfile, like so:
myMatFileHandle = matfile('caseNo1.mat');
thisVessel = myMatFileHandle.Vessel;
Also, from the little bit I can see, you seem to be on the right track for high-volume analysis. Just remember to use sparse when applicable, and generally avoid conditionals inside of loops if possible.
Good luck!
The objective of storing data in a structured format is:
To be organized
To make scripting of post-processors easy where looping through data under one data set is required
What was sought: to store a structured data set containing integer, floating-point, and string variables in a MAT file, and to be able to read just the required variable using the h5read command. The Matlab load command cannot read variables beyond the first level of the data stored in a MAT file, and h5write cannot write string variables, hence the need for a workaround.
To do this I have used following method:
filename = 'myMatFile';
Vessel.TotalForce = %store some data
Vessel.WindForce = %store some data
Vessel.CurrentForce = %store some data
Vessel.WaveForce = %store some data
Vessel.ConnectionForce = %store some data
...
Line1.LineType = 'Wire'
Line1.ArcLength_0.EffectiveTension = %store some data
Line1.ArcLength_50.EffectiveTension= %store some data
Line1.ArcLength_100.EffectiveTension= %store some data
Line2.LineType = 'Chain'
Line2.ArcLength_0.EffectiveTension= %store some data
Line2.ArcLength_50.EffectiveTension= %store some data
Line2.ArcLength_100.EffectiveTension= %store some data
save([filename '_temp.mat']);
PointToMat = matfile([filename '.mat'],'Writable',true);   % matfile writes a v7.3 (HDF5-based) file
PointToMat.(char(filename)) = load([filename '_temp.mat']);
delete([filename '_temp.mat']);
Now to read from the MAT file created, we can use h5read as usual. To extract the EffectiveTension for Line1, ArcLength_0:
EffectiveTension = h5read([filename '.mat'],['/' filename '/Line1/ArcLength_0/EffectiveTension']);
For string variables, h5read returns decimal values corresponding to each character. To obtain the actual string I used:
name = char(h5read([filename '.mat'],['/' filename '/Line1/LineType']));
I tried this method on my data set, which is about 200 MB, and I could process it pretty fast. Hope this helps someone someday.
Short answer:
Having saved the data into a MAT file with the '-v7.3' option, use something like h5read(filename, '/Line2/X') to read just one structure field. You can even read an array partially, for example:
s.a = 1:100;
save('test.mat', '-v7.3', 's');
clear
h5read('test.mat', '/s/a', [1 10], [1 5], [1 3])
returns every third element of the 1:100 array, starting at the 10th element and returning 5 values:
10 13 16 19 22
Long answer:
See the answer by @Amitava for the more elaborate code and topic coverage.
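If the group and dataset paths inside a '-v7.3' file are not known in advance, they can be enumerated with h5info, since v7.3 MAT files are HDF5 underneath; a small sketch against the test.mat created above:
info = h5info('test.mat');              % valid HDF5 when saved with -v7.3
disp({info.Groups.Name})                % structure variables, e.g. '/s'
disp({info.Groups(1).Datasets.Name})    % fields inside, e.g. 'a'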