I am running a very large meta-simulation where I sweep over two hyperparameters (let's say x and y), and for each pair of hyperparameters (x_i & y_j) I run a modest-sized subsimulation. Thus:
for x = 1:I
    for y = 1:J
        subsimulation(x,y)
    end
end
For each subsimulation, however, about 50% of the data is common to every other subsimulation, i.e. subsimulation(x_1,y_1).commondata = subsimulation(x_2,y_2).commondata.
This is very relevant since so far the total simulation results file is ~10 GB! Obviously, I want to save the common subsimulation data only once to save space. However, the obvious solution of saving it in one place would break my plotting function, since it directly accesses subsimulation(x,y).commondata.
I was wondering whether I could do something like
subsimulation(x,y).commondata=% pointer to 1 location in memory %
If that can't work, what about this less elegant solution:
subsimulation(x,y).commondata='variable name' %string
and then adding
if ~isstruct(subsimulation(x,y).commondata)
    subsimulation(x,y).commondata = eval(subsimulation(x,y).commondata);
end
What solution do you guys think is best?
Thanks
DankMasterDan
You could do this fairly easily by defining a handle class. See also the documentation.
An example:
classdef SimulationCommonData < handle
    properties
        someData
    end
    methods
        function this = SimulationCommonData(someData)
            % Constructor
            this.someData = someData;
        end
    end
end
Then use it like this:
commonData = SimulationCommonData(something);
subsimulation(x, y).commondata = commonData;
subsimulation(x, y+1).commondata = commonData;
% These now point to the same reference (handle)
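Because SimulationCommonData is a handle class, both struct fields refer to the same underlying object, so a change made through one reference is visible through the other. A quick sanity check (using the class defined above; the values are just for illustration):
commonData = SimulationCommonData(zeros(1,5));
subsimulation(1,1).commondata = commonData;
subsimulation(1,2).commondata = commonData;
subsimulation(1,1).commondata.someData(1) = 42;                       % modify through one reference
disp(subsimulation(1,2).commondata.someData(1))                       % prints 42: same underlying object
disp(subsimulation(1,1).commondata == subsimulation(1,2).commondata)  % 1: identical handles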
As per my comment, as long as you do not modify the common data, you can pass it as a third input and still not copy the array in memory on each iteration (a very good read is Internal Matlab memory optimizations). A trace of memory usage over time makes this clear:
In that trace, the first jump in memory is due to the creation of common and the second one to the allocation of the output c. If the data were copied on each iteration, you would see many more memory fluctuations: a third jump, then a decrease, then back up again, and so on.
The code follows (I added a pause between iterations to make it clearer that no big jumps occur during the loop):
common = rand(1e7, 1);   % create the shared data (size chosen only for illustration)
for ii = 1:10; c = foo(ii, ii+1, common); pause(2); end

function out = foo(a, b, common)
out = a + b + common;
end
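For contrast, a minimal sketch (not part of the original answer) of a variant that does write to its input: the first assignment into common inside the function triggers MATLAB's copy-on-write, so the whole array is duplicated and you would see an extra memory jump on every call.
function out = fooWrite(a, b, common)
% Writing to the shared input breaks copy-on-write: MATLAB copies the whole
% array before the assignment, roughly doubling memory use for this call.
common(1) = a;
out = a + b + common;
end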
More of a blue-sky question here: if I have some code like
A = [1,2,3,4,5,6]; %input data
B = sort(A); %step one
C = B(1,1) + 10; %step two
Is there a line of code I can use to remove "B" to save memory before doing something else with C?
clear B
This will remove the variable B from memory.
See the documentation here for more info.
There is no need to assign each result to a new variable. For example, you could write:
A = [1,2,3,4,5,6]; %input data
A = sort(A); %step one
A = A(1,1) + 10; %step two
Especially if A is large, it is much more efficient to write A = sort(A) than B = sort(A), because then sort can work in-place, avoiding the need to create a secondary array. The same is true for many other functions. Working in-place means that the cache can be used more effectively, speeding up operations. The reduced memory usage is also a plus for very large arrays, and in-place operations tend to avoid memory fragmentation.
In contrast, things like clear B tend to slow down the interpreter, as they make things more complicated for the JIT. Furthermore, as can be seen in the documentation,
On UNIX® systems, clear does not affect the amount of memory allocated to the MATLAB process.
That is, the variable is cleared from memory, but the memory itself is not returned to the system.
As an aside, as @obchardon said in a comment, your code can be further simplified by realizing that min does the same thing as keeping only the first value of the result of sort (but much more efficiently).
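In other words, a one-line sketch of that simplification:
C = min(A) + 10;   % same result as sorting and taking the first element, without the full sort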
As an example, I've put three operations in a row that can work in-place, and used timeit to time the execution time of these two options: using a different variable every time and clearing them when no longer needed, or assigning into the same variable.
N = 1000;
A = rand(1,N);
disp(timeit(@()method1(A)))
disp(timeit(@()method2(A)))
function D = method1(A)
B = sort(A);
clear A
C = cumsum(B);
clear B
D = cumprod(C);
end
function A = method2(A)
A = sort(A);
A = cumsum(A);
A = cumprod(A);
end
Using MATLAB Online I see these values:
different variables + clear: 5.8806e-05 s
re-using same variable: 4.4185e-05 s
MATLAB Online is not the best environment for timing tests, as so many other things happen on the server at the same time, but it gives a good indication. I've run the test multiple times and seen similar values most of those times.
I have a piece of MATLAB code that works fine, but I wanted to know if there is any faster way of performing the same task, where each .csv file contains a 768×768 matrix.
Current code:
for k = 1:143
matFileName = sprintf('ang_thresholded%d.csv', k);
matData = load(matFileName);
imshow(matData)
end
Any help in this regard will be very helpful. Thank You!
In general, it's better to separate the loading, the computation, and the graphics.
If you have enough memory, you should try to change your code to:
n_files = 143;
% If you know the size of your images a priori:
matData = zeros(768, 768, n_files); % preallocate for speed
for k = 1:n_files
    matFileName = sprintf('ang_thresholded%d.csv', k);
    matData(:,:,k) = load(matFileName);
end

seconds = 0.01;
for k = 1:n_files
    %clf; % Not needed in your case, but needed if you want to plot more than one thing (hold on)
    imshow(matData(:,:,k));
    pause(seconds); % control "framerate"
end
Note the use of pause().
Here is another option using MATLAB's datastores, which are designed to work with large datasets or lots of smaller sets. The TabularTextDatastore is specifically for this kind of text-based data.
Something like the following. However, note that since I don't have any test files, it is a somewhat notional example...
ttds = tabularTextDatastore('.\yourDirPath\*.csv'); % Create the datastore
while ttds.hasdata        % This turns false after reading the last file.
    temp = read(ttds);    % Returns a MATLAB table
    imshow(temp.Variables)
end
Since it looks like your filenames' numbering is not zero-padded (e.g. 1 instead of 001), the file order might get messed up, so that may need to be addressed as well (see the sketch below). Anyway, I thought this might be a good alternative approach worth considering depending on what else you want to do with the data and how much of it there might be.
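If the ordering does matter, one workaround (a sketch, assuming the same ang_thresholded%d.csv naming as in the question) is to build the file list explicitly in numeric order and pass it to tabularTextDatastore instead of using a wildcard:
n_files = 143;
fileList = cell(n_files, 1);
for k = 1:n_files
    fileList{k} = fullfile('yourDirPath', sprintf('ang_thresholded%d.csv', k));
end
ttds = tabularTextDatastore(fileList); % files are read in the order given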
When I have to display the variable value every n iterations of a for loop I always do something along these lines:
for ii=1:1000
if mod(ii,100)==0
display(num2str(ii))
end
end
I was wondering if there is a way to move the if condition outside the loop in order to speed up the code. Or also if there is something different I could do.
You can use nested loops:
N = 1000;
n = 100;
for ii = n:n:N
for k = ii-n+1:ii-1
thingsToDo(k);
end
disp(ii)
thingsToDo(ii);
end
where thingsToDo() gets the relevant counter (if needed). This is a little messier, but it can save a lot of if testing.
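For concreteness, a runnable sketch of the same pattern with a trivial stand-in for thingsToDo (the helper is hypothetical, just to make the snippet executable):
N = 1000;
n = 100;
thingsToDo = @(k) k^2;   % placeholder for the real per-iteration work
for ii = n:n:N
    for k = ii-n+1:ii-1
        thingsToDo(k);
    end
    disp(ii)             % printed once every n iterations, with no if inside the loop
    thingsToDo(ii);
end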
Unless the number of tested values is much larger than the number of printed values, I would not blame the if-statement. It may not seem this way at first, but printing is indeed a fairly complex task: a variable needs to be converted and sent to an output stream, which is then printed in the terminal. If you need to speed the code up, reduce the amount of printed data.
Most MATLAB functions take vector inputs as well. This is the case for disp and display, so printing many values needs only a single function call. Further, conversion to string is unnecessary before printing: MATLAB has to send the data to some kind of stream anyway (which may indeed take an argument of type char, but that is not the same char MATLAB uses), so the conversion is probably just a waste of time. On top of that, num2str does a lot of work to ensure a type-safe conversion. You already know that display is type-safe, so all those checks are redundant.
Try this instead,
q = (1:1000)'; % assuming q is some real data in your case
disp(q(mod(q,100)==0)) % this requires a single call to disp
Instead of concatenating results like this, is there any other way to do the following? I mean, the loop will persist, but can vector=[vector,sum(othervector)]; be written in some other way?
vector=[];
while a - b ~= 0
othervector = sum(something') %returns a vector like [ 1 ; 3 ]
vector=[vector,sum(othervector)];
...
end
vector=vector./100
Well, this really depends on what you are trying to do. Starting from this code, you might need to think about the actions you are performing and whether you can change that behavior. Since the snippet of code you present shows few dependencies (i.e. how a, b, something and vector are related), I think we can only present vague solutions.
I suspect you want to get rid of this code to avoid the cost of constantly moving the array around in memory as you concatenate new results into it.
First of all, just make sure that the slowest portion of your application is actually caused by this. Take a look at the MATLAB profiler. If that portion of your code is not a major time hog, don't bother spending a lot of time on improving it (and just tell mlint to ignore that line of code).
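A minimal way to check that (the standard profiler workflow, nothing specific to this code):
profile on
% ... run the loop you want to inspect ...
profile viewer   % opens a report showing where the time is actually spent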
If you can analyse your code enough to ensure that you have a constant number of iterations, you can preallocate your variables and prevent any performance penalty (i.e. write a for loop in the worst case, or better yet truly vectorized code). Or if you can 'factor out' some variables, this might also help (move any loop invariants outside of the loop). That might look something like this:
iIteration = 1;
vector = zeros(1,100);
while a - b ~= 0
    othervector = sum(something);
    vector(iIteration) = sum(othervector);
    iIteration = iIteration + 1;
end
If the nature of your code doesn't allow this (e.g. you are iterating to attain convergence; in that case, beware of checking equality of doubles: always include a tolerance), there are some tricks you can perform to improve performance, but most of them are just rules of thumb or trying to make the best of a bad situation. In this last case, you might add some maintenance code to get slightly better performance (but what you gain in time consumption, you lose in memory usage).
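As an aside on that tolerance point, a sketch of what a tolerance-based loop condition looks like (tol is just an illustrative value):
tol = 1e-9;              % tolerance chosen for illustration
while abs(a - b) > tol
    % loop body as before
end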
Let's say you expect the code to run about 100*n iterations most of the time; then you might try something like this:
iIteration = 0;
expectedIterations = 100;
vector = [];
while a - b ~= 0
if mod(iIteration,expectedIterations) == 0
vector = [vector zeros(1,expectedIterations)];
end
iIteration = iIteration + 1;
vector(iIteration) = sum(sum(something));
...
end
vector = vector(1:iIteration); % throw away uninitialized
vector = vector/100;
It might not look pretty, but instead of resizing the array every iteration, the array only gets resized every 100th iteration. I haven't run this piece of code, but I've used very similar code in a former project.
If you want to optimize for speed, you should preallocate the vector and have a counter for the index as #Egon answered already.
If you just want to have a different way of writing vector=[vector,sum(othervector)];, you could use vector(end + 1) = sum(othervector); instead.
I'm going to write a program in MATLAB that takes a function, steps the value D from 10 to 100 (the for loop), integrates the function with Simpson's rule to within a tolerance (the while loop), and then displays the result. Now, this works fine for the first 7-8 values of D, but then it takes longer and longer and eventually I run out of memory, and I don't understand the reason for this. This is the code so far:
global D;
s = 200;
tolerance = 9*10^(-5);
for D = 10:1:100
    r = Simpson(@f, 0, D, s);
    error = 1;
    while (error > tolerance)
        s = 2*s;
        error = (1/15)*(Simpson(@f, 0, D, s) - r);
        r = Simpson(@f, 0, D, s);
    end
    clear error;
    disp(r)
end
mtrw's comment probably already answers the question in part: s should be reinitialized inside the for loop. The posted code makes s increase irreversibly every time the error is too large, so for larger values of D the largest s reached so far will be used.
Additionally, since the code re-evaluates the entire integration instead of reusing the previous result on [0, D-1], you waste a lot of resources (unless you explicitly want to study the error tolerance of your Simpson function): s will have to increase a lot for large D to maintain the same low error, since integrating over a larger range means summing up more points.
Finally, your implementation of Simpson could of course do funny stuff as well, which no one can tell without seeing it...
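A minimal sketch of that first fix, reinitializing s at the top of each for iteration (Simpson and f are the question's own functions; the abs() and the single Simpson call per refinement are small extra cleanups, not part of the original code):
global D;                               % kept from the question in case f reads D as a global
tolerance = 9*10^(-5);
for D = 10:1:100
    s = 200;                            % reset the refinement level for each D
    r = Simpson(@f, 0, D, s);
    err = 1;                            % avoid shadowing the built-in error()
    while abs(err) > tolerance
        s = 2*s;
        rNew = Simpson(@f, 0, D, s);    % evaluate the finer integration only once
        err = (1/15)*(rNew - r);
        r = rNew;
    end
    disp(r)
end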