Synchronization in MATLAB

I'm processing data in parallel using parfor in this way:
iteration = 10;
result = zeros(1, iteration);
matlabpool open local 2
parfor i = 1:iteration
data = generate_data();
result(i) = process_data(data);
end
matlabpool close
It works fine, but I have one problem. My function generate_data should generate unique data (i.e. 0, 1, 2, 3, 4, ...), but in practice I sometimes get the same value twice (I get 0, 1, 1, 2, 3, 4, 4, 5, ...). In simplified form my function looks like this:
function data = generate_data()
persistent counter generated_data;
if(isempty(counter))
counter = 1;
generated_data = [0 1 2 3 4 5 6 7 8 9];
end
data = generated_data(counter);
counter = counter + 1;
How can I fix this?

If I've understood correctly, you want to ensure that your generate_data doesn't return the same value to two iterations of your PARFOR loop. Unfortunately, you cannot do this directly in the PARFOR loop, since no communication is allowed. Your options are basically: either call generate_data on the MATLAB client; or run two PARFOR loops perhaps like this:
parfor ii = 1:iteration
generated(ii) = generate_data();
end
% omit duplicated values - perhaps you might wish to generate
% some more here too...
generated = unique(generated);
parfor ii=1:numel(generated)
result(ii) = process_data(generated(ii));
end
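The first option (calling generate_data on the MATLAB client) might look roughly like the sketch below. It is only an illustration: source_data and the doubling step are hypothetical stand-ins for generate_data and process_data, so that the snippet runs on its own.

```matlab
% Stand-in for the values generate_data would hand out (assumption).
source_data = 0:9;
iteration = 10;

% Serial loop on the client: a single counter, so no value is handed out twice.
generated = zeros(1, iteration);
for ii = 1:iteration
    generated(ii) = source_data(ii);   % would be generate_data() in the real code
end

% Only the processing step runs in parallel.
result = zeros(1, iteration);
parfor ii = 1:iteration
    result(ii) = 2 * generated(ii);    % stand-in for process_data(generated(ii))
end
```

Because generated is filled before the PARFOR loop starts, the workers never need to share a counter.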

I tried this:
function data = generate_data(id)
persistent generated_data;
if(isempty(generated_data))
generated_data = [0 1 2 3 4 5 6 7 8 9];
end
data = generated_data(mod(id - 1, length(generated_data)) + 1);
On each call I can generate a data set on demand using id. I have to remember that each data set can be reached by only one id. It works, but it does not solve my problem. I would like to eliminate the parameter id and use an internal counter, as in the first post.

Related

MATLAB: Using a for loop within another function

I am trying to concatenate several structs. What I take from each struct depends on a function that requires a for loop. Here is my simplified array:
t = 1;
for t = 1:5 %this isn't the for loop I am asking about
a(t).data = t^2; %it just creates a simple struct with 5 data entries
end
Here I am doing concatenation manually:
A = [a(1:2).data a(1:3).data a(1:4).data a(1:5).data] %concatenation function
As you can see, the range (1:2), (1:3), (1:4), and (1:5) can be looped, which I attempt to do like this:
t = 2;
A = [for t = 2:5
a(1:t).data
end]
This results in an error "Illegal use of reserved keyword "for"."
How can I do a for loop within the concatenate function? Can I do loops within other functions in Matlab? Is there another way to do it, other than copy/pasting the line and changing 1 number manually?
You were close to getting it right! This will do what you want.
A = []; %% note: no need to initialize t, the for-loop takes care of that
for t = 2:5
A = [A a(1:t).data]
end
This seems strange though...you are concatenating the same elements over and over...in this example, you get the result:
A =
1 4 1 4 9 1 4 9 16 1 4 9 16 25
If what you really need is just the .data elements concatenated into a single array, then that is very simple:
A = [a.data]
A couple of notes about this: why are the brackets necessary? Because the expressions
a.data, a(1:t).data
don't return all the numbers in a single array, like many functions do. They return a separate answer for each element of the structure array. You can test this like so:
>> [b,c,d,e,f] = a.data
b =
1
c =
4
d =
9
e =
16
f =
25
Five different answers there. But MATLAB gives you a cheat -- the square brackets! Put an expression like a.data inside square brackets, and all of a sudden those separate answers are compressed into a single array. It's magic!
Another note: for very large arrays, the for-loop version here will be very slow. It would be better to allocate the memory for A ahead of time. In the for-loop here, MATLAB is dynamically resizing the array each time through, and that can be very slow if your for-loop has 1 million iterations. If it's less than 1000 or so, you won't notice it at all.
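A minimal sketch of the preallocated version, assuming the same five-element struct array a from the question (built here with struct and num2cell so the snippet is self-contained; pos and counts are names introduced for illustration):

```matlab
% Rebuild the example struct array: a(t).data = t^2 for t = 1..5.
a = struct('data', num2cell((1:5).^2));

counts = 2:5;                      % how many elements each pass contributes
A = zeros(1, sum(counts));         % allocate the full result once
pos = 0;                           % index of the last element written so far
for t = counts
    A(pos+1 : pos+t) = [a(1:t).data];
    pos = pos + t;
end
```

The total length is known up front here (2 + 3 + 4 + 5 = 14), which is what makes preallocation possible.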
Finally, the reason that HBHB could not run your struct creating code at the top is that it doesn't work unless a is already defined in your workspace. If you initialize a like this:
%% t = 1; %% by the way, you don't need this, the t value is overwritten by the loop below
a = []; %% always initialize!
for t = 1:5 %this isn't the for loop I am asking about
a(t).data = t^2; %it just creates a simple struct with 5 data entries
end
then it runs for anyone the first time.
As an appendix to gariepy's answer:
The matrix concatenation
A = [A k];
as a way of appending to it is actually pretty slow. You end up reassigning N elements every time you concatenate to an N size vector. If all you're doing is adding elements to the end of it, it is better to use the following syntax
A(end+1) = k;
In MATLAB this is optimized such that on average you only need to reassign about 80% of the elements in a matrix. This might not seem like much, but for 10k elements it adds up to roughly an order of magnitude of difference in time (at least for me).
Bear in mind that this works only in MATLAB 2012b and higher, as described in this thread: Octave/Matlab: Adding new elements to a vector
This is the code I used. tic/toc syntax is not the most accurate method for profiling in MATLAB, but it illustrates the point.
close all; clear all; clc;
t_cnc = []; t_app = [];
N = 1000;
for n = 1:N;
% Concatenate
tic;
A = [];
for k = 1:n;
A = [A k];
end
t_cnc(end+1) = toc;
% Append
tic;
A = [];
for k = 1:n;
A(end+1) = k;
end
t_app(end+1) = toc;
end
t_cnc = t_cnc*1000; t_app = t_app*1000; % Convert to ms
% Fit a straight line on a log scale
P1 = polyfit(log(1:N),log(t_cnc),1); P_cnc = @(x) exp(P1(2)).*x.^P1(1);
P2 = polyfit(log(1:N),log(t_app),1); P_app = @(x) exp(P2(2)).*x.^P2(1);
% Plot and save
loglog(1:N,t_cnc,'.',1:N,P_cnc(1:N),'k--',...
1:N,t_app,'.',1:N,P_app(1:N),'k--');
grid on;
xlabel('log(N)');
ylabel('log(Elapsed time / ms)');
title('Concatenate vs. Append in MATLAB 2014b');
legend('A = [A k]',['O(N^{',num2str(P1(1)),'})'],...
'A(end+1) = k',['O(N^{',num2str(P2(1)),'})'],...
'Location','northwest');
saveas(gcf,'Cnc_vs_App_test.png');

Calling and working with variables in a loop in Matlab whose names depend on the loop index?

I need to work with variables at each iteration of a loop in Matlab whose names depend on the loop index h (e.g. if h=1 I want to use data1 to create other variables). Is there a way to do it? I cannot use cells, because the variables are very large matrices and I have memory problems using cells.
Example:
data1=[1,2,3];
data2=[4,5,6];
data3=[7,8,9]; %they are in the workspace
for h=1:3
% A`h'=data`h'+6
% save A`h'
end
I think you should consider using structure with dynamic field names (see more details here).
For example
for h=1:n
dataName = sprintf('data%d', h); %// dynamic name
resultName = sprintf('res%d', h); %// dynamic name
base.(resultName) = myFunction( base.(dataName) ); %// process data and save to result
end
The nice thing about this approach (especially if you run into memory problems) is that save and load supports this approach:
for h=1:n
dataName = sprintf('data%d', h); %// dynamic name
base = load( 'myHugeMatFile.mat', dataName ); %// loads only one variable from the file
%// now the variable is a field in base
resultName = sprintf('res%d', h); %// dynamic name
base.(resultName) = myFunction( base.(dataName) ); %// process data and save to result
save( 'myResultsFile.mat', '-struct', 'base', '-append' ); %// '-struct' must be followed by the struct's name; please verify '-append' behaves as you need
end
Note how save and load can treat struct fields as different variables when needed.
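To make the dynamic field names concrete, here is a small self-contained sketch using the three vectors from the question as fields of a struct. The field names A1, A2, A3 and the "+6" step mirror the question's pseudocode; storing the inputs in base is an assumption made for illustration.

```matlab
% The question's workspace variables, held as fields of one struct.
base.data1 = [1 2 3];
base.data2 = [4 5 6];
base.data3 = [7 8 9];

for h = 1:3
    dataName   = sprintf('data%d', h);   % e.g. 'data2'
    resultName = sprintf('A%d', h);      % e.g. 'A2'
    base.(resultName) = base.(dataName) + 6;
end
```

After the loop, base.A1, base.A2 and base.A3 hold the results, and no eval is needed.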
As far as I understand, you either want to make a matrix:
A = [1 2 3;
4 5 6;
7 8 9];
or a vector:
A =[1; 2; 3; 4; 5; 6; 7; 8; 9];
In the first case, just writing
A = [data1;data2;data3];
should do the trick. Otherwise, look into horzcat for a horizontal vector and vertcat for a vertical one:
A = horzcat(data1,data2,data3);
A = vertcat(data1',data2',data3');

Modify script from another script

I have a script that is run inside a loop (some constants are modified at each iteration). Is there a way to comment out a line of the script without modifying the .m file?
UPDATE:
Following the answers from Floris and Matthew Simoneau, I made a function that tries to do the same thing (and it works). skipLineParameter is a string naming a base workspace variable that has a value of 0 (don't skip the line) or 1 (skip the line):
function skipline(skipLineParameter, parameter, default)
try
a = evalin('base', skipLineParameter);
if ~a
assignin('base', parameter, default);
end
catch
assignin('base', parameter, default);
end
end
This is a possible approach - using a condition that is set in the main program to decide whether to execute a particular line in the script.
If your main program is
for ii = 1:9
skipLine3 = (mod(ii,3)==0);
runSub
end
And runSub.m looks like this:
A = 1;
B = 2;
% modified lines to trap condition where 'skipLine3' doesn't exist:
if ~exist('skipLine3', 'var') skipMe = false; else skipMe = skipLine3; end
if ~skipMe, B=B*2; end
fprintf(1, 'for iteration %d B is %d\n', ii, B)
Then the output will be:
for iteration 1 B is 4
for iteration 2 B is 4
for iteration 3 B is 2
for iteration 4 B is 4
for iteration 5 B is 4
for iteration 6 B is 2
for iteration 7 B is 4
for iteration 8 B is 4
for iteration 9 B is 2
As you can see - the skipLine3 parameter, which is set in the main loop (every third iteration), affects whether line 3 (B=B*2) is executed in the script.
I think what you're looking for is a function. Here's how to turn runSub into a function:
function runSub(ii,skip)
A = 1;
B = 2;
if ~skip, B=B*2; end
fprintf(1, 'for iteration %d B is %d\n', ii, B);
You can access it in the loop like this:
for ii = 1:9
skipLine3 = (mod(ii,3)==0);
runSub(ii,skipLine3)
end

find first and last value for unique julian date

I have a data set similar to the following:
bthd = sort(floor(1+(10-1).*rand(10,1)));
bthd2 = sort(floor(1+(10-1).*rand(10,1)));
bthd3 = sort(floor(1+(10-1).*rand(10,1)));
Depth = [bthd;bthd2;bthd3];
Jday = [repmat(733774,10,1);repmat(733775,10,1);repmat(733776,10,1)];
temp = 10+(30-10).*rand(30,1);
Data = [Jday,Depth,temp];
where I have a matrix similar to 'Data' with Julian date in the first column, depth in the second, and temperature in the third. I would like to find the first and last temperature values for each unique Jday. This can be obtained by:
Data = [Jday,Depth,temp];
[~,~,b] = unique(Data(:,1),'rows');
for j = 1:length(unique(b));
top_temp(j) = temp(find(b == j,1,'first'));
bottom_temp(j) = temp(find(b == j,1,'last'));
end
However, my data set is extremely large and using this loop results in long running time. Can anyone suggest a vectorized solution to do this?
use diff:
% for example
Jday = [1 1 1 2 2 3 3 3 5 5 6 7 7 7];
last = find( [diff(Jday) 1] );
first = [1 last(1:end-1)+1];
top_temp = temp(first) ;
bottom_temp = temp(last);
Note that this solution assumes Jday is sorted. If this is not the case, you may sort Jday prior to the suggested procedure.
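The example above uses a row vector, while the Jday in the question is a column vector, so the concatenations need semicolons. A sketch of the column version, assuming sorted Jday and a made-up temp vector for illustration (firstIdx and lastIdx are names introduced here):

```matlab
% Sorted day numbers as a column vector, as in the question's layout.
Jday = [1 1 1 2 2 3 3 3 5 5 6 7 7 7]';
temp = 10 * (1:numel(Jday))';            % hypothetical temperatures: 10, 20, ..., 140

% A row is the last of its day when the next day differs (or it is the final row).
lastIdx  = find([diff(Jday) ~= 0; true]);
% Each day starts right after the previous day's last row.
firstIdx = [1; lastIdx(1:end-1) + 1];

top_temp    = temp(firstIdx);
bottom_temp = temp(lastIdx);
```

Comparing diff(Jday) against zero explicitly also keeps the logic readable when consecutive days are more than one apart.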
You should be able to accomplish this using the occurrence option of the unique function:
[~, topidx, ~] = unique(Data(:, 1), 'first', 'legacy');
[~, bottomidx, ~] = unique(Data(:, 1), 'last', 'legacy');
top_temp = temp(topidx);
bottom_temp = temp(bottomidx);
The legacy option is needed if you're using MATLAB R2013a. You should be able to remove it if you're running R2012b or earlier.

Matlab: Recursion to get decision tree

I am trying to implement a decision tree with recursion. So far I have written the following:
1. From a given data set, find the best split and return the branches. To give more detail: my data has features as the columns of a matrix, and the last column indicates the class of the data (1 or -1).
2. Based on 1, I have the best feature to split on along with the branches under that split. For example, if information gain says feature 9 is the best split, the unique values in feature 9, {1,3,5}, are the branches of 9.
3. I have figured out how to get the data related to each branch; next I need to iterate over each branch's data to get the next set of splits. I am having trouble figuring out this recursion.
Here is the code that I have so far. The recursion that I am doing right now doesn't look right. How can I fix this?
function [indeces_of_node, best_split] = split_node(X_train, Y_train)
%cell to save split information
feature_to_split_cell = cell(size(X_train,2)-1,4);
%iterate over features
for feature_idx=1:(size(X_train,2) - 1)
%get current feature
curr_X_feature = X_train(:,feature_idx);
%identify the unique values
unique_values_in_feature = unique(curr_X_feature);
H = get_entropy(Y_train); %This is actually H(X) in slides
%temp entropy holder
%Storage for feature element's class
element_class = zeros(size(unique_values_in_feature,1),2);
%conditional probability H(X|y)
H_cond = zeros(size(unique_values_in_feature,1),1);
for aUnique=1:size(unique_values_in_feature,1)
match = curr_X_feature(:,1)==unique_values_in_feature(aUnique);
mat = Y_train(match);
majority_class = mode(mat);
element_class(aUnique,1) = unique_values_in_feature(aUnique);
element_class(aUnique,2) = majority_class;
H_cond(aUnique,1) = (length(mat)/size((curr_X_feature),1)) * get_entropy(mat);
end
%Getting the information gain
IG = H - sum(H_cond);
%Storing the IG of features
feature_to_split_cell{feature_idx, 1} = feature_idx;
feature_to_split_cell{feature_idx, 2} = max(IG);
feature_to_split_cell{feature_idx, 3} = unique_values_in_feature;
feature_to_split_cell{feature_idx, 4} = element_class;
end
%set feature to split zero for every fold
feature_to_split = 0;
%getting the max IG of the fold
max_IG_of_fold = max([feature_to_split_cell{:,2:2}]);
%vector to store values in the best feature
values_of_best_feature = zeros(size(15,1));
%Iterating over the cell to get the index and the values under the best
%split feature.
for i=1:length(feature_to_split_cell)
if (max_IG_of_fold == feature_to_split_cell{i,2});
feature_to_split = i;
values_of_best_feature = feature_to_split_cell{i,4};
end
end
display(feature_to_split)
display(values_of_best_feature(:,1)')
curr_X_feature = X_train(:,feature_to_split);
best_split = feature_to_split
indeces_of_node = unique(curr_X_feature)
%testing
for k = 1 : length(values_of_best_feature)
% Condition to stop the recursion: if the classes are pure then we are
% done splitting; if both classes have the same number of attributes
% then we are done splitting.
if (sum(values_of_best_feature(:,2) == -1) ~= sum(values_of_best_feature(:,2) == 1))
if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
[indeces_of_node, best_split] = split_node(mat1, Y_train);
end
end
end
end
Here is the output of my code. It looks like somewhere in my recursion I only go down one branch, and after that I never go back to the rest of the branches:
feature_to_split =
5
ans =
1 2 3 4 5 6 7 8 9
feature_to_split =
9
ans =
3 5 7 8 11
feature_to_split =
21
feature_to_split =
21
feature_to_split =
21
feature_to_split =
21
If you are interested in running this code: git
After multiple rounds of debugging, I figured out the answer. I hope someone will benefit from this:
for k = 1 : length(values_of_best_feature)
% Condition to stop the recursion: if the classes are pure then we are
% done splitting; if both classes have the same number of attributes
% then we are done splitting.
if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
X_train(:,feature_to_split) = [];
mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
%if(level >= curr_level)
split_node(mat1, Y_train, 1, 2, level-1);
%end
end
end
return;
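For readers who want to see the overall shape of a recursion that visits every branch, here is a minimal sketch. It is not the code from this thread: grow_tree is a hypothetical function, and it always splits on the first remaining column instead of choosing the feature by information gain. The point it illustrates is that each recursive call receives the rows and the labels of its own branch, and that each branch's result is stored separately rather than overwriting a shared output.

```matlab
function tree = grow_tree(X, Y, max_depth)
% GROW_TREE  Recursive sketch; splits on column 1 as a placeholder choice.
%   X: feature matrix, Y: column of class labels (1 or -1).
tree.label = mode(Y);            % majority class at this node
tree.feature_values = [];        % branch values (empty for a leaf)
tree.children = {};              % one subtree per branch value

% Stop when the node is pure, the depth budget is spent, or no features remain.
if numel(unique(Y)) <= 1 || max_depth == 0 || size(X, 2) == 0
    return;
end

vals = unique(X(:, 1));
tree.feature_values = vals;
tree.children = cell(numel(vals), 1);
for k = 1:numel(vals)
    rows = X(:, 1) == vals(k);   % rows belonging to branch k
    % Recurse on this branch only: filtered rows, filtered labels,
    % and the used feature column removed.
    tree.children{k} = grow_tree(X(rows, 2:end), Y(rows), max_depth - 1);
end
end
```

A call such as grow_tree([1 1; 1 2; 2 1; 2 2], [1; 1; -1; 1], 5) returns a root with two children, the first of which is a pure leaf.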