Array of structures importing - memory preallocation problem - MATLAB

I have a few .mat files, each of which contains an array of structures (of unknown length) called DATA. I want to import all of these structures into a single array, but I don't want to use this code:
FileNames = strcat('file',num2str((1:N)'),'.mat');
DATATemp = [];
for int = 1:length(FileNames)
    load(FileNames(int,:));
    DATATemp = [DATATemp DATA];   % grows the array on every iteration
end
DATA = DATATemp;
because it does not preallocate the memory for the array.
Are there any clever ways of doing that?

If the total length is small enough that you can afford to over-allocate memory, you can do something like this: pick an array size that is far larger than anything you will ever see, then trim it back down after you are done.
FileNames = strcat('file',num2str((1:N)'),'.mat');
DATATemp = zeros(1e6,1);              % over-allocate far beyond the expected total
idx = 1;
for int = 1:length(FileNames)
    load(FileNames(int,:));
    idx_end = idx + length(DATA) - 1;
    DATATemp(idx:idx_end) = DATA;     % copy this file's block into place
    idx = idx_end + 1;
end
DATA = DATATemp(1:idx_end);           % trim the unused tail
However, if you are talking about a LOT of data, or just want to cover all your bases, a more rigorous solution is to allocate in chunks
FileNames = strcat('file',num2str((1:N)'),'.mat');
CHUNK_SIZE = 1e6;
DATATemp = zeros(CHUNK_SIZE,1);
idx = 1;
for int = 1:length(FileNames)
    load(FileNames(int,:));
    idx_end = idx + length(DATA) - 1;
    if idx_end > length(DATATemp)
        % grow the buffer by one more chunk when it runs out of room
        DATATemp = [DATATemp; zeros(CHUNK_SIZE,1)];
    end
    DATATemp(idx:idx_end) = DATA;
    idx = idx_end + 1;
end
DATA = DATATemp(1:idx_end);
Just make sure that your CHUNK_SIZE is significantly larger than the size of a typical individual file. I picked 1e6 here; that is what I would pick if I were loading ~20 files with an average size of 1e5 elements each. That way, although I still occasionally concatenate more space, it happens much less often. This may not be very clever, but I hope it helps.
Also note that loading files from a network drive will slow things down immensely.
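One caveat worth adding (my addition, not part of the answer above): DATA in the question is an array of structures, so a numeric zeros buffer will not accept struct assignments. A sketch that sidesteps preallocation sizing entirely is to collect each file's DATA into a cell array and concatenate once at the end:
FileNames = strcat('file',num2str((1:N)'),'.mat');
DATACell = cell(1, size(FileNames,1));    % one cell per file
for int = 1:size(FileNames,1)
    S = load(FileNames(int,:), 'DATA');   % load only the DATA variable
    DATACell{int} = S.DATA;
end
DATA = [DATACell{:}];                     % single concatenation at the end
This avoids guessing a chunk size and works for struct arrays as well as numeric vectors.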

Related

How can I avoid this for-loop even though every element has to be checked individually?

Using MATLAB R2019a, is there any way to avoid the for-loops in the following code, even though the dimensions contain different elements so that each element has to be checked individually? M is a vector of indices, and Inpts.payout is a 5-D array of numerical data.
for m = 1:length(M)-1
    for power = 1:noScenarios
        for production = 1:noScenarios
            for inflation = 1:noScenarios
                for interest = 1:noScenarios
                    if Inpts.payout(M(m),power,production,inflation,interest)<0
                        Inpts.payout(M(m+1),power,production,inflation,interest)=...
                            Inpts.payout(M(m+1),power,production,inflation,interest)...
                            +Inpts.payout(M(m),power,production,inflation,interest);
                        Inpts.payout(M(m),power,production,inflation,interest)=0;
                    end
                end
            end
        end
    end
end
It is quite simple to remove the inner 4 loops. This will be more efficient unless you have a huge matrix Inpts.payout, as a new indexing matrix must be generated.
The following code extracts the two relevant 'planes' from the input data, does the logic on them, then writes them back:
for m = 1:length(M)-1
    payout_m = Inpts.payout(M(m),:,:,:,:);
    payout_m1 = Inpts.payout(M(m+1),:,:,:,:);
    indx = payout_m < 0;
    payout_m1(indx) = payout_m1(indx) + payout_m(indx);
    payout_m(indx) = 0;
    Inpts.payout(M(m),:,:,:,:) = payout_m;
    Inpts.payout(M(m+1),:,:,:,:) = payout_m1;
end
It is possible to avoid extracting the 'planes' and writing them back by working directly with the input data matrix, but this yields more complex code. We can, however, easily avoid some of the indexing operations this way:
payout_m = Inpts.payout(M(1),:,:,:,:);
for m = 1:length(M)-1
    payout_m1 = Inpts.payout(M(m+1),:,:,:,:);
    indx = payout_m < 0;
    payout_m1(indx) = payout_m1(indx) + payout_m(indx);
    payout_m(indx) = 0;
    Inpts.payout(M(m),:,:,:,:) = payout_m;
    payout_m = payout_m1;
end
Inpts.payout(M(m+1),:,:,:,:) = payout_m1;
It seems like there is no way to avoid this. I am assuming that each for loop independently changes a parameter used in the main calculation, so this many loops are required. My only suggestion is to turn your nested loops into a function if you're concerned about appearance; I'm not sure whether this will help run time.
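For what it's worth, here is a minimal sketch of that suggestion (the helper name carryNegativePayout is hypothetical, not from the question); it simply wraps the plane-wise update from the answer above in a function file of its own:
% carryNegativePayout.m - wraps the plane-wise update shown above
function payout = carryNegativePayout(payout, M)
    for m = 1:length(M)-1
        payout_m  = payout(M(m),:,:,:,:);
        payout_m1 = payout(M(m+1),:,:,:,:);
        indx = payout_m < 0;                            % negative entries to carry forward
        payout_m1(indx) = payout_m1(indx) + payout_m(indx);
        payout_m(indx) = 0;
        payout(M(m),:,:,:,:)   = payout_m;
        payout(M(m+1),:,:,:,:) = payout_m1;
    end
end
Called as Inpts.payout = carryNegativePayout(Inpts.payout, M); it keeps the calling script tidy, though it should not change the run time.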

How to transfer an Excel formula to Octave?

As you can see in the image (Excel file), I would like to use that formula in Octave to get the desired result. I have also uploaded a picture of the Octave code and of the workspace. In the workspace, the values of my storage variable should match the values in Excel (the storage column). I suspect that the last part of the code (the if statement using i-1) is the source of the error.
Can someone help me figure it out? Let me know if any further clarification is required. I am also posting my code below:
BM_max = 1236;
virtual_feed_max = 64;
operation = dlmread ('2020Operation.csv');
BM = ones (size (operation, 1), 1);
for i=1:size(operation,1)
    if operation(i,1)==1
        BM(i,1)=BM_max;
    else
        BM(i,1)=0;
    end
end
virtual_feed = ones(size(operation,1),1);
virtual_feed(:,1) = 64;
storage = ones(size(BM,1),1);
c = ones(size(BM,1),1);
for i=1:size(BM,1)
    c=(BM(:,1)-virtual_feed(:,1));
end
for i=1:size(BM,1)
    if ((i=1)&& c)<0
        storage(:,1)=0;
    elseif ((i=1)&& c)>0
        storage(:,1)=c;
    else
        # Issue is below (Taking the value from subsequent row is the problem)
        if (c+(storage(i-1,1)))<0
            storage(:,1)=0;
        elseif (c+(storage(i-1,1)))>0
            storage(:,1)=(c+(storage(i-1,1)));
        end
    end
end
[Workspace and Excel screenshots]
I think what you want is the following (as seen from your Excel screenshot)
BM_max = 1236;
virtual_feed_max = 64;
operation = [0; 1; 1; 1; 1; 1; 1; 1; 0; 0; 0; 0; 0];
BM = BM_max * operation;
virtual_feed = repmat (virtual_feed_max, size (operation));
storage = zeros (size (operation));
for i=2:numel (storage)
    storage (i) = max (BM(i) - virtual_feed(i) + storage(i-1), 0);
endfor
storage
which outputs:
storage =
0
1172
2344
3516
4688
5860
7032
8204
8140
8076
8012
7948
7884
I leave the vectorization, to make it faster, as an exercise for you (hint: have a look at cumsum).
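As a rough illustration of that hint (my addition; it matches the loop only when the running balance never has to be clamped at zero after the first step, which happens to hold for this example data):
net = BM - virtual_feed;                 % per-step change in storage
storage_vec = [0; cumsum(net(2:end))];   % running balance, first entry forced to 0
storage_vec = max (storage_vec, 0);      % harmless here; the loop clamps at every step
For data where the balance can actually dip below zero, stick with the loop above.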
From this point on
for i=1:size(BM,1)
    if ((i=1)&& c)<0
        storage(:,1)=0;
    elseif ((i=1)&& c)>0
        storage(:,1)=c;
    else
        # Issue is below (Taking the value from subsequent row is the problem)
        if (c+(storage(i-1,1)))<0
            storage(:,1)=0;
        elseif (c+(storage(i-1,1)))>0
            storage(:,1)=(c+(storage(i-1,1)));
        end
    end
end
you are not changing a single value in storage but the whole column, so on each iteration the entire column is overwritten instead of a single "cell".
You should use something like this:
storage(i,1) = 0;
BTW, a lot of those 'for' loops can be changed to vector operations. Example:
for i=1:size(BM,1)
    c=(BM(:,1)-virtual_feed(:,1));
end
can be replaced with:
c = BM - virtual_feed;

Group variables based on lengths of specific arrays

I have a long list of variables in a dataset which contains multiple time channels with different sampling rates, such as time_1, time_2, TIME, Time, etc. There are also multiple other variables that are dependent on either of these times.
I'd like to list all channels whose names contain 'time' (a case-insensitive partial string search within the workspace), match each remaining variable to an item of this time list based on the sizes of the variables, and then group them in a structure together with the variables' values for later analysis.
For example:
Name Size Bytes Class
ENGSPD_1 181289x1 1450312 double
Eng_Spd 12500x1 100000 double
Speed 41273x1 330184 double
TIME 41273x1 330184 double
Time 12500x1 100000 double
engine_speed_2 1406x1 11248 double
time_1 181289x1 1450312 double
time_2 1406x1 11248 double
In this case, I have 4 time channels with different names & sizes and 4 speed channels which belong to each of these time channels.
The whos function is case-sensitive, and it only returns the names of the variables rather than their values.
As a preamble I'm going to echo my comment from above and earlier comments from folks here and on your other similar questions:
Please stop trying to manipulate your data this way.
It may have made sense at the beginning but, given the questions you've asked on SO to date, this isn't the first time you've encountered issues trying to pull everything together and if you continue this way it's not going to be the last. This approach is highly error prone, unreliable, and unpredictable. Every step of the process requires you to make assumptions about your data that cannot be guaranteed (size of data matching, variables being present and named predictably, etc.). Rather than trying to come up with creative ways to hack together the data, start over and output your data predictably from the beginning. It may take some time but I guarantee it's going to save time in the future and it will make sense to whoever looks at this in 6 months trying to figure out what is going on.
For example, there is absolutely no significant effort needed to output your variables as:
outputstructure.EngineID.time = sometimeseries;
outputstructure.EngineID.speed = somedata;
Where EngineID can be any valid variable name. This is simple and it links your data together permanently and robustly.
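To illustrate why that layout pays off later (my sketch, with hypothetical field names), downstream analysis can simply walk the structure without any size matching:
engines = fieldnames(outputstructure);   % one field per engine
for ii = 1:numel(engines)
    d = outputstructure.(engines{ii});
    plot(d.time, d.speed, 'DisplayName', engines{ii});   % time and speed are paired by construction
    hold on
end
legend show
Any engine can be added or removed without touching the analysis code.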
That being said, the following will bring a marginal amount of sanity to your data set:
% Build up a totally amorphous data set
ENGSPD_1 = rand(10, 1);
Eng_Spd = rand(20, 1);
Speed = rand(30, 1);
TIME = rand(30, 1);
Time = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1 = rand(10, 1);
time_2 = rand(5, 1);
% Identify time and speed variable using regular expressions
% Assumes time variables contain 'time' (case insensitive)
% Assumes speed variables contain 'spd', 'sped', or 'speed' (case insensitive)
timevars = whos('-regexp', '[T|t][I|i][M|m][E|e]');
speedvars = whos('-regexp', '[S|s][P|p][E|e]{0,2}[D|d]');
% Pair timeseries and data arrays together. Data is only coupled if
% the number of rows in the timeseries is exactly the same as the
% number of rows in the data array.
timesizes = vertcat(timevars(:).size);   % Concatenate timeseries sizes
speedsizes = vertcat(speedvars(:).size); % Concatenate speed array sizes
% Find intersection and their locations in the structures returned by whos
% By using intersect we only get the data that is matched
[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));
% Preallocate structure
ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];
% Unavoidable (without saving/loading data) eval loop :|
for ii = 1:ndata
    groupeddata(ii).time = eval(timevars(timeidx(ii)).name);
    groupeddata(ii).speed = eval(speedvars(speedidx(ii)).name);
end
A non-eval method, by request:
ENGSPD_1 = rand(10, 1);
Eng_Spd = rand(20, 1);
Speed = rand(30, 1);
TIME = rand(30, 1);
Time = rand(20, 1);
engine_speed_2 = rand(5, 1);
time_1 = rand(10, 1);
time_2 = rand(5, 1);
save('tmp.mat')
oldworkspace = load('tmp.mat');
varnames = fieldnames(oldworkspace);
timevars = regexpi(varnames, '.*time.*', 'match', 'once');
timevars(cellfun('isempty', timevars)) = [];
speedvars = regexpi(varnames, '.*spe{0,2}d.*', 'match', 'once');
speedvars(cellfun('isempty', speedvars)) = [];
timesizes = zeros(length(timevars), 2);
for ii = 1:length(timevars)
    timesizes(ii, :) = size(oldworkspace.(timevars{ii}));
end
speedsizes = zeros(length(speedvars), 2);
for ii = 1:length(speedvars)
    speedsizes(ii, :) = size(oldworkspace.(speedvars{ii}));
end
[sizes, timeidx, speedidx] = intersect(timesizes(:,1), speedsizes(:,1));
ndata = length(sizes);
groupeddata(ndata).time = [];
groupeddata(ndata).speed = [];
for ii = 1:ndata
    groupeddata(ii).time = oldworkspace.(timevars{timeidx(ii)});
    groupeddata(ii).speed = oldworkspace.(speedvars{speedidx(ii)});
end
See this gist for timing.

fprintf Octave - Data corruption

I am trying to write data to .txt files. Each of the files is around 170MB (after writing data to it).
I am using Octave's fprintf function with '%.8f' to write floating-point values to a file. However, I am noticing a very weird error: a subset of entries in some of the files are getting corrupted. For example, one of the lines in a file is this:
0.43529412,0.}4313725,0.43137255,0.33233533,...
That "}" should have been "4". How did Octave's fprintf write that "}" with the '%.8f' format in the first place? What is going wrong?
Another example is,
0.73289\8B987,...
how did that "\8B" get there?
I have to process a very large data set with 360 million points in total. This error in a subset of rows in some files is becoming a big problem. What is causing it?
Also, this corruption does not occur at random. For example, if a file has 1.1 million rows, where each row is a vector representing one data instance, then the problem occurs in at most about 100 rows, and these 100 rows are clustered together. They might, say, be spread from row 8000 to row 8150; it is never the case that, out of 100 corrupted rows, the first 50 are located near row 10000 and the rest near row 20000. They always form a cluster.
Note: the code below is the block responsible for extracting the data and writing it to files. Some variables in the code, like K_Cell, have been computed earlier and play virtually no role in the data-writing process.
mf = fspecial('gaussian',[5 5], 2);
fidM = fopen('14_01_2016_Go_AeossRight_ClustersM_wLAMRD.txt','w');
fidC = fopen('14_01_2016_Go_AeossRight_ClustersC_wLAMRD.txt','w');
fidW = fopen('14_01_2016_Go_AeossRight_ClustersW_wLAMRD.txt','w');
kIdx = 1;
featMat = [];
% - Generate file names to print the data to
featNo = 0;
fileNo = 1;
filePath = 'wLRD10_Data_Road/featMat_';
fileName = [filePath num2str(fileNo) '.txt'];
fidFeat = fopen(fileName, 'w');
% - Compute the global means and standard deviations
gMean = zeros(1,13); % - Global mean
gStds = zeros(1,13); % - Global variance
gNpts = 0; % - Total number of data points
fidStat = fopen('wLRD10_Data_Road/featStat.txt','w');
for i=1600:10:10000
    if (featNo > 1000000)
        % - If more than 1m points, close the file and open new one
        fclose(fidFeat);
        % - Get the new file name
        fileNo = fileNo + 1;
        fileName = [filePath num2str(fileNo) '.txt'];
        fidFeat = fopen(fileName, 'w');
        featNo = 0;
    end
    imgName = [fAddr num2str(i-1) '.jpg'];
    img = imread(imgName);
    Ir = im2double(img(:,:,1));
    Ig = im2double(img(:,:,2));
    Ib = im2double(img(:,:,3));
    imgR = filter2(mf, Ir);
    imgG = filter2(mf, Ig);
    imgB = filter2(mf, Ib);
    I = im2double(img);
    I(:,:,1) = imgR;
    I(:,:,2) = imgG;
    I(:,:,3) = imgB;
    I = im2uint8(I);
    [Feat1, Feat2] = funcFeatures1(I);
    [Feat3, Feat4] = funcFeatures2(I);
    [Feat5, Feat6, Feat7] = funcFeatures3(I);
    [Feat8, Feat9, Feat10] = funcFeatures4(I);
    ids = K_Cell{kIdx};
    pixVec = zeros(length(ids),13); % - Get the local image features
    for s = 1:length(ids) % - Extract features
        pixVec(s,:) = [Ir(ids(s,1),ids(s,2)) Ig(ids(s,1),ids(s,2)) Ib(ids(s,1),ids(s,2)) Feat1(ids(s,1),ids(s,2)) Feat2(ids(s,1),ids(s,2)) Feat3(ids(s,1),ids(s,2)) Feat4(ids(s,1),ids(s,2)) ...
            Feat5(ids(s,1),ids(s,2)) Feat6(ids(s,1),ids(s,2)) Feat7(ids(s,1),ids(s,2)) Feat8(ids(s,1),ids(s,2))/100 Feat9(ids(s,1),ids(s,2))/500 Feat10(ids(s,1),ids(s,2))/200];
    end
    kIdx = kIdx + 1;
    for s=1:length(ids)
        featNo = featNo + 1;
        fprintf(fidFeat,'%d,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f\n', featNo, pixVec(s,:));
    end
    % - Compute the mean and variances
    for s = 1:length(ids)
        gNpts = gNpts + 1;
        delta = pixVec(s,:) - gMean;
        gMean = gMean + delta./gNpts;
        gStds = gStds*(gNpts-1)/gNpts + delta.*(pixVec(s,:) - gMean)/gNpts;
    end
end
Note that the code block:
for s=1:length(ids)
    featNo = featNo + 1;
    fprintf(fidFeat,'%d,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f\n', featNo, pixVec(s,:));
end
is the only part of the code that writes the data-points to the files.
The earlier code-block,
if (featNo > 1000000)
    % - If more than 1m points, close the file and open new one
    fclose(fidFeat);
    % - Get the new file name
    fileNo = fileNo + 1;
    fileName = [filePath num2str(fileNo) '.txt'];
    fidFeat = fopen(fileName, 'w');
    featNo = 0;
end
opens a new file for writing once the currently open file exceeds the limit of 1 million data points.
Furthermore, note that the pixVec variable cannot contain anything other than float/double values, or Octave would throw an error.

Background frame loop in MATLAB

I'm trying to make a loop that applies the same operation to a lot of .mov files in MATLAB. The code I have right now looks like this:
close all
clear all
clc
movFiles = dir('*.mov');
numFiles = length(movFiles);
mydata = cell(1,numFiles);
% mydata = zeros(numFiles);
for k = 1:numFiles
    mydata{1,k} = VideoReader(movFiles(k).name);
end
for k=1:numFiles
    bk_downsample = 5; %The downsample factor for frame averaging
    %disp('Opening video...') %lower number =longer computation time
    vob = mydata;
    frame = vob.read(inf); %Reads to end = vob knows the number of frames
    vidHeight = vob.Height;
    vidWidth = vob.Width;
    nFrames = vob.NumberOfFrames;
    %% First-iteration background frame
    background_frame = double(frame*0);
    disp('Calculating background...')
    for k = 1:bk_downsample:nFrames
        background_frame = background_frame + double(read(vob, k));
        disp(k/(nFrames)*100)
    end
    %background_frame = uint8(bk_downsample*background_frame/(nFrames));
    background_frame = bk_downsample*background_frame/(nFrames);
    %imshow(background_frame)
    %% Second-iteration background frame
    %This section re-calculates the background frame while attempting to
    %minimize the effect of moving objects in the calculation
    background_frame2 = double(frame*0);
    pixel_sample_density = im2bw(double(frame*0));
    diff_frame = double(frame*0);
    stream_frame = diff_frame(:,:,1);
    bk_downsample = 10;
    figure
    hold on
    for k = 1:bk_downsample:nFrames
        diff_frame = imabsdiff(double(read(vob, k)), background_frame);
        diff_frame = 1-im2bw(uint8(diff_frame),.25);
        pixel_sample_density = pixel_sample_density + diff_frame;
        stream_frame = stream_frame + (1-diff_frame)/(nFrames/bk_downsample);
        nonmoving = double(read(vob, k));
        nonmoving(:,:,1) = nonmoving(:,:,1).*diff_frame;
        nonmoving(:,:,2) = nonmoving(:,:,2).*diff_frame;
        nonmoving(:,:,3) = nonmoving(:,:,3).*diff_frame;
        background_frame2 = background_frame2 + nonmoving;
        %pause
        disp(k/(nFrames)*100)
    end
    background_frame2(:,:,1) = background_frame2(:,:,1)./pixel_sample_density;
    background_frame2(:,:,2) = background_frame2(:,:,2)./pixel_sample_density;
    background_frame2(:,:,3) = background_frame2(:,:,3)./pixel_sample_density;
    imshow(uint8(background_frame2))
    %imshow(stream_frame)
    filename = ['Ring_' num2str(k) '_background_' num2str(img) '.jpg'];
    imwrite((uint8(background_frame2)),filename)
end
I know that the error starts with vob = mydata;, but I'm not sure how to correct it. I hope someone is able to help me, since it would save me a lot of time in my data analysis.
Have a great day! :)
Your code doesn't make much sense... You're creating a cell array:
mydata = cell(1,numFiles);
%// . . .
mydata{1,k} = . . .
but then you try to access it like a structure:
vob = mydata;
frame = vob.read(inf);
If I had to guess, your error stems from forgetting to index into the cell array, i.e.:
vob = mydata{k};
Another programming oddity I noticed in your code is that you're using the same loop variable k in nested for loops: the outer one runs over k=1:numFiles and the inner ones over k=1:bk_downsample:nFrames. Don't do that unless you're trying to drive yourself crazy figuring out why your outer loop executes only once. Name them k1 for the outer loop and k2 for the inner loops and you'll be happier.
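To make both fixes concrete, here is a minimal sketch (mine, trimmed to the first background pass only):
for k1 = 1:numFiles
    vob = mydata{1,k1};                            % index into the cell array
    nFrames = vob.NumberOfFrames;
    bk_downsample = 5;
    background_frame = double(read(vob, 1)) * 0;   % zero image of the right size
    for k2 = 1:bk_downsample:nFrames               % inner loop gets its own variable
        background_frame = background_frame + double(read(vob, k2));
    end
    background_frame = bk_downsample * background_frame / nFrames;
end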
I'm no expert in video processing, but to me it looks like your line should be:
vob=mydata{1,k};
That's why that error shows up: you are treating a cell array of structs as if it were a single struct.