Accessing Individual Images in Multipage TIFF in Mathematica - import

I have 12 large (1 GB each) multi-page TIFF files containing 1500 images that represent a time series of 3D data.
To keep memory consumption down, I would like to read only individual images from the multi-page TIFF files, instead of reading everything and then selecting the required image.
Is there an option to Import that I'm missing or is there another approach?
Thanks,

Try for example:
pageNbr = 3;
Import["C:\\test1.tif", {"ImageList", pageNbr}]

Related

Tesseract training with multipage tiff

What does the box file need to look like if I use a multipage tiff to train Tesseract?
More precisely: how do the Y-coordinates in a box file correspond to Y-coordinates within pages?
The last (6th) column in the box file is the zero-based page number.
https://github.com/tesseract-ocr/tesseract/wiki/Make-Box-Files
Update:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract
Each font should be put in a single multi-page tiff and the box file
can be modified to specify the page number for each character after
the coordinates. Thus an arbitrarily large amount of training data may
be created for any given font, allowing training for large
character-set languages.
Although the training text can be as large as you want, a very large text can result in an unnecessarily large image and hence slow down training.

Taking the left and right channels of two wav files and joining them into one separate stereo file

I have two stereo wav files. I would like to take the left channel of the first file and the right channel of the second file and join them into one new stereo wav file.
Here's an image of what I'm trying to do.
I know I can read files into matlab / octave and get the separate left right channels with the code below:
% wavread returns one column per channel (column 1 = left, column 2 = right)
[imported_sig_1, fs_rate, nbitsraw] = wavread('/tmp/01a.wav');
imported_sig_L = imported_sig_1(:,1);   % left channel of the first file
[imported_sig_2, fs_rate, nbitsraw] = wavread('/tmp/02a.wav');
imported_sig_R = imported_sig_2(:,2);   % right channel of the second file
I can then write the new channels that I want out using the code
% both channels must have the same number of samples
wavwrite([imported_sig_L imported_sig_R], fs_rate, 16, 'newfile.wav');
The problem I'm running into is the time it takes to import the files and the size of the arrays they take up. The files I'm importing are about 1-4 hours long, so importing takes a while and the arrays use a lot of memory. Is there a way to avoid importing the full files before exporting them?
I'm using Octave 3.8.1 on Ubuntu 14.04, which is like MATLAB, but I also have access to sox.
I assume the bottleneck is your hard drive and that your system has sufficient memory to keep all the files in memory at the same time. If so, you won't gain any speed by reading only one channel. With a 16-bit wav your HDD would have to skip 2 bytes, read 2 bytes, skip 2 bytes, read 2 bytes... For such a read operation it is much faster to copy the full file into memory and remove the unwanted channels afterwards.
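If memory rather than disk speed turns out to be the limit, one possible sketch is to read the samples in their native integer type instead of double, which for 16-bit files should need roughly a quarter of the memory. This assumes audioread and audiowrite are available (MATLAB R2012b or later, or Octave 4.0 or later, so not the Octave 3.8.1 mentioned in the question), that both files are stereo, and that they share a sample rate. The file names are the ones from the question.

```matlab
% Read as int16 (for a 16-bit WAV) instead of the default double
[sig1, fs1] = audioread('/tmp/01a.wav', 'native');
left = sig1(:, 1);            % left channel of the first file
clear sig1                    % release the unused channel early

[sig2, fs2] = audioread('/tmp/02a.wav', 'native');
right = sig2(:, 2);           % right channel of the second file
clear sig2

assert(fs1 == fs2, 'Sample rates differ');

% Trim to the shorter file so both columns have the same length
n = min(numel(left), numel(right));
audiowrite('newfile.wav', [left(1:n), right(1:n)], fs1);
```

Since sox is available, it may also be possible to avoid Octave entirely with something like sox -M /tmp/01a.wav /tmp/02a.wav newfile.wav remix 1 4, where -M merges the inputs into one four-channel stream and remix picks channels 1 and 4; this is untested against the files in question.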

Compression of large figures in .fig format in MATLAB

My MATLAB script generates a figure from time-series data that, when saved, is over 200 MB in size. Is there a way to compress the figure to a smaller size in '*.fig' format? The compression has to be lossless so that I can zoom in and view the details in the figure. The figure has to be saved in *.fig format so that the axis property relations between subplots are preserved and I can use the data cursor tool.
The *.fig format cannot be saved as is in compressed form. The format is just not capable of it. But in MATLAB you can use the zip function to compress files created by savefig, and unzip them before passing them to openfig. This way you can create a simple script to save and load zipped figs. Of course you will need to use a temp file, which has to be cleaned up as well.
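A minimal sketch of such a save/load round trip could look like the following. The file name myfigure and the placeholder plot are made up for the example, and savefig requires MATLAB R2013b or later.

```matlab
% Placeholder figure; replace with the real one
fig = figure('Visible', 'off');
plot(cumsum(randn(1, 1e4)));

% Save: write a temporary .fig, zip it, then remove the temporary folder
tmpDir = tempname;
mkdir(tmpDir);
savefig(fig, fullfile(tmpDir, 'myfigure.fig'));
zip('myfigure.zip', 'myfigure.fig', tmpDir);   % third argument is the root folder
rmdir(tmpDir, 's');
close(fig);

% Load: unzip into a temporary folder, open the figure, clean up
outDir = tempname;
names = unzip('myfigure.zip', outDir);         % cell array of extracted file paths
fig2 = openfig(names{1});
rmdir(outDir, 's');
```

Wrapping the two halves in a pair of small functions would keep the temp-file handling out of the main script.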

Matlab .mat file saving

I have identical MATLAB code and identical data that were run on two different computers. Both are Win 7 64-bit. Both MATLAB installations are version R2014a. After the code finishes its run, I save the variables using the save command, which outputs a .mat file.
Is it possible to get two very different file sizes for these files? Like one being 170 MB, and the other being 2.4 GB? This is absurd because when I check the variables in MATLAB they add up to maybe 1.5 GB at most. What can be the reason for this?
Does saving to .mat file compress the variables (still with the regular .mat extension)? I think it does because when I check the individual variables they add up to around 1.5 GB.
So why would one output smaller file size, but the other just so huge?
The most recent MAT-file version (-v7.3) is based on HDF5, which includes gzip compression. Probably the default MAT-file format on one PC has been changed to an old version which does not support compression. Try specifying the version explicitly when saving; both PCs should then produce files of the same size.
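Specifying the version explicitly might look like the sketch below, which saves the same variable in three formats and prints the resulting file sizes. The variable and file names are placeholders; -v6 is uncompressed, -v7 is compressed, and -v7.3 is the HDF5-based format. The default used by a plain save call can be checked under Preferences > General > MAT-Files.

```matlab
data = zeros(500);                       % placeholder, highly compressible

save('test_v6.mat',  'data', '-v6');     % no compression
save('test_v7.mat',  'data', '-v7');     % compressed
save('test_v73.mat', 'data', '-v7.3');   % HDF5-based

files = {'test_v6.mat', 'test_v7.mat', 'test_v73.mat'};
for k = 1:numel(files)
    info = dir(files{k});
    fprintf('%-14s %10d bytes\n', info.name, info.bytes);
end
```

Running this on both computers with the real variables should show which format accounts for the size difference.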
I found the reason for this based on the following stackoverflow thread: MATLAB: Differences between .mat versions
Apparently one of the computers was using the -v7 format, which produces much smaller files; -v7.3 just inflates the files significantly. This is ironic in my opinion, since -v7.3 enables saving files larger than 2 GB, which means they will be much, much larger when saved in a .mat file.
Anyway this link is very useful.
Update:
I implemented the serialization mentioned in the above link, and it increased the file size. In my case the best option will be using -v7 format since it provides the smallest file size, and is also able to save structures and cell arrays that I use a lot.

Read and represent mp3 files using memmapfile in matlab

I have to analyze bioacoustic audio files using MATLAB. Eventually I want to be able to find anomalies in the audio. That's the reason I need to find a way to represent the audio so that I can extract and compare features. I'm dealing with mp3 files up to 150 MB. These files are too large for MATLAB to read into its memory. Therefore I want to use the memmapfile() function. I used the following code and a small mp3 file to find out how it actually works.
[testR, ~] = audioread('test.mp3');
testM = memmapfile('test.mp3');
disp(testM.Data);
disp(testR);
The actual values of the testM.Data and testR are different. Audioread() returns a 7483391 x 2 matrix and memmapfile() a 4113874 x 1 matrix.
I'm not really sure how memmapfile() works; I expected the two results to be equal. Is there a way to read mp3 files with memmapfile() in the same format that audioread() returns? And what does memmapfile actually return in the case of an audio file? Maybe the vector format is also usable for anomaly detection?
Thanks in advance!
NOTE: The original files were in wav IMA ADPCM format with sizes from 1.5 up to 2.5 GB. Since MATLAB can't deal with that format or with the size of the files, I converted them to 8-bit mp3 files.
I think the problem is that memmapfile reads data in uint8 format by default, while audioread reads the data in a different way.
As you can see in the documentation, you can specify the data format when you read with memmapfile, so try experimenting with different values. According to the documentation the data can be read in double format, so try making the memmapfile data format match the audioread data format.
Finally, memmapfile by default organizes the data as an N x 1 vector, so to recover the original layout you need to use something like reshape.
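As an illustration of the format option, here is a small sketch on a raw, uncompressed file. One caveat to keep in mind: memmapfile maps the bytes on disk and does not decode MP3, so this applies only to uncompressed sample data. The file name test.raw and the field name x are made up for the example.

```matlab
% Write a small raw file: interleaved 16-bit stereo samples (L R L R ...)
nFrames = 1000;
samples = int16([1:nFrames; -(1:nFrames)]);   % 2 x nFrames, one row per channel
fid = fopen('test.raw', 'w');
fwrite(fid, samples, 'int16');                % column-major order gives interleaving
fclose(fid);

% Map the file as a 2 x nFrames int16 array instead of the default uint8 vector
m = memmapfile('test.raw', 'Format', {'int16', [2 nFrames], 'x'});

% Only the requested frames are read from disk; transpose to frames x channels
block = m.Data.x(:, 101:200)';
disp(size(block));
```

Giving the dimensions in the Format cell array makes a separate reshape call unnecessary, and indexing into m.Data.x pulls in only the part of the file that is needed.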
In any case, if you work with big data, I suggest trying something other than memmapfile, because it is very slow.
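One alternative, sketched below under assumptions, is to let audioread decode the file one block of samples at a time so that only a single block is in memory. The file name test.mp3 is the one from the question; the 10-second block length and the RMS feature are arbitrary choices for illustration, and reading sample ranges from MP3 files may be slow or slightly inexact on some platforms.

```matlab
info = audioinfo('test.mp3');              % metadata only, no sample data
blockLen = 10 * info.SampleRate;           % 10 seconds per block (arbitrary)
nBlocks = ceil(info.TotalSamples / blockLen);
rmsPerBlock = zeros(nBlocks, info.NumChannels);

for k = 1:nBlocks
    first = (k - 1) * blockLen + 1;
    last = min(k * blockLen, info.TotalSamples);
    block = audioread('test.mp3', [first last]);   % samples x channels, double
    rmsPerBlock(k, :) = sqrt(mean(block.^2, 1));   % example feature per channel
end
```

Each row of rmsPerBlock then holds one feature vector per block, which can be compared across blocks without ever loading the whole file.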