How can I split data into ranges - MATLAB

I have a list of numerical values C that represent hours and minutes: the first column is hours, the second column is minutes.
C=[19 44;15 57;15 19;0 21;20 21;20 20;0 6;22 0;21 17;17 47;23 51;22 27;21 39;21 36]
I want to split them into ranges:
ranges= {[0 0; 3 59] [4 0; 7 59] [8 0; 11 59] [12 0; 15 59] [16 0; 19 59] [20 0; 23 59]}
Can you help me?

You can use arrayfun to achieve this. Try the following code:
times = randi(20,1,30) + rand(1,30); % Example data.
s = arrayfun(@(n) times(times >= 4*n & times < 4*(n+1)), 0:(24/4-1), 'UniformOutput', false)'
celldisp(s)
s{1} =
1.2963 2.4468 2.7948 1.5328 1.3507
s{2} =
5.4868 5.6443 4.9390
s{3} =
9.5470 10.6868 10.1835 8.7802 8.7757 9.4359 8.3786 10.5870
s{4} =
12.8176 13.8759 13.6225
s{5} =
16.9294 17.8116
s{6} =
20.5108
If you want your values sorted:
s = arrayfun(@(n) sort(times(times >= 4*n & times < 4*(n+1))), 0:(24/4-1), 'UniformOutput', false)'
celldisp(s)
s{1} =
1.2963 1.3507 1.5328 2.4468 2.7948
s{2} =
4.9390 5.4868 5.6443
s{3} =
8.3786 8.7757 8.7802 9.4359 9.5470 10.1835 10.5870 10.6868
s{4} =
12.8176 13.6225 13.8759
s{5} =
16.9294 17.8116
s{6} =
20.5108
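The arrayfun answer above works on a vector of fractional hours, while the question's C keeps hours and minutes in separate columns. A minimal sketch, assuming the six 4-hour ranges from the question: convert each row to minutes past midnight, then integer-divide by 240 minutes to get a range number from 1 to 6.

```matlab
C = [19 44;15 57;15 19;0 21;20 21;20 20;0 6;22 0;21 17;17 47;23 51;22 27;21 39;21 36];
minutesOfDay = C(:,1)*60 + C(:,2);     % 0 .. 1439 minutes past midnight
bin = floor(minutesOfDay / 240) + 1;   % 240 min = 4 h, so ranges 1..6
% groups{k} holds the rows of C that fall in range k (0x2 if the range is empty)
groups = arrayfun(@(k) C(bin == k, :), 1:6, 'UniformOutput', false);
celldisp(groups)
```

Keeping the original [hours minutes] rows in each cell means nothing has to be converted back afterwards.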

The easiest way would be to use histogram() and histcounts().
As mentioned by user4694, those aren't doubles but either durations or timestamps.
Either way, you have to transform them into doubles first, e.g. with minutes() for durations, and create the bins the same way. This is coded for durations:
X=[duration(0,0,0) duration(4,0,0) duration(3,15,0)]; %and so on
bins=[duration(0,0,0) duration(4,0,0) duration(8,0,0)];
% if you just want the histogram (bins are used as bin edges here)
histogram(minutes(X),minutes(bins));
% if you want to know which element in X goes to which bin try
[amount_in_bin,Bins,which_bin]=histcounts(minutes(X),minutes(bins));
%or just go for the last one
[~,~,which_bin]=histcounts(minutes(X),minutes(bins));
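Applied to the question's C, the histcounts route could look like this sketch. It builds the durations with hours() and minutes() and uses the edges 0 h, 4 h, ..., 24 h; histcounts counts a value equal to the final edge in the last bin, so 23:59 still lands in range 6.

```matlab
C = [19 44;15 57;15 19;0 21;20 21;20 20;0 6;22 0;21 17;17 47;23 51;22 27;21 39;21 36];
X = hours(C(:, 1)) + minutes(C(:, 2));   % one duration per row of C
bins = hours(0:4:24);                     % bin edges 0h, 4h, ..., 24h
% counts(k) = number of times in range k; which_bin(i) = range of row i of C
[counts, ~, which_bin] = histcounts(minutes(X), minutes(bins));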

Related

For loop for storing values in many different matrices

I have written code that stores data in matrices, but I want to shorten it so that it loops instead of repeating the same block.
The number of matrices to create is a known variable. If it was 3, the code would be:
for i = 1:31
if idx(i) == 1
C1 = [C1; Output2(i,:)];
end
if idx(i) == 2
C2 = [C2; Output2(i,:)];
end
if idx(i) == 3
C3 = [C3; Output2(i,:)];
end
end
If I understand correctly, you want to extract rows from Output2 into new variables based on idx values? If so, you can do as follows:
Output2 = rand(5, 10); % example
idx = [1,1,2,2,3];
% get the rows of Output2 whose idx value equals the given group number
C1 = Output2(find(idx==1),:);
C2 = Output2(find(idx==2),:);
C3 = Output2(find(idx==3),:);
Similar to Marcin, I have another solution. Here I predefine my_C as a cell array. Output2 and idx are randomly generated, and instead of find I just use logical indexing. You have to wrap the data in a cell {} when assigning with ():
Output2 = round(rand(31,15)*10);
idx = uint8(round(1+rand(1,31)*2));
my_C = cell(1,3);
my_C(1,1) = {Output2(idx==1,:)};
my_C(1,2) = {Output2(idx==2,:)};
my_C(1,3) = {Output2(idx==3,:)};
If you want to get your data back, just use e.g. my_C{1,1} for the first group.
If you have n resulting matrices instead of 3, you can use:
Output2 = round(rand(31,15)*10);
idx = uint8(round(1+rand(1,31)*(n-1)));
my_C = cell(1,n);
for k=1:n
my_C(1,k) = {Output2(idx==k,:)};
end
where n is a positive integer.
I would recommend a slightly different approach. Besides making the rest of the code more maintainable, it may also slightly speed up execution. This is because MATLAB uses a JIT compiler, and code passed to eval has to be recompiled every time it runs. Try this:
nMatrices = 3;
C = cell(1, nMatrices); % preallocate one cell per group
for k = 1:nMatrices
C{k} = Output2(idx==k,:);
end
As patrik said in the comments, naming variables like this is poor practice. You would be better off using a cell array, M{1}=C1, or if all the Ci are the same size, even just a 3D array M, for example, where M(:,:,1)=C1.
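A short sketch of both containers suggested above. The Output2 and idx values are hypothetical, chosen so that every group has two rows, because the 3-D form only works when all groups are the same size.

```matlab
Output2 = reshape(1:24, 6, 4);   % hypothetical 6x4 data
idx = [2 1 3 2 1 3];             % group label of each row; two rows per group
n = 3;
M = cell(1, n);
for k = 1:n
    M{k} = Output2(idx == k, :); % rows whose label is k, in original order
end
% Stack the groups along the third dimension: M3(:,:,k) equals M{k}.
% cat errors if the groups differ in size, so keep the cell array in that case.
M3 = cat(3, M{:});
```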
If you really want to use C1, C2, ... as your variable names, I think you will have to use eval, as arielnmz mentioned. One way to do this in MATLAB is
for i = 1:length(idx)
eval(['C' num2str(idx(i)) '=[C' num2str(idx(i)) ';Output2(' num2str(i) ',:)];'])
end
Edited to add test code:
idx=[2 1 3 2 2 3];
Output2=rand(6,4);
C1a=[];
C2a=[];
C3a=[];
for i = 1:length(idx)
if idx(i) == 1
C1a = [C1a; Output2(i,:)];
end
if idx(i) == 2
C2a = [C2a; Output2(i,:)];
end
if idx(i) == 3
C3a = [C3a; Output2(i,:)];
end
end
C1=[];
C2=[];
C3=[];
for i=1:length(idx)
eval(['C' num2str(idx(i)) '=[C' num2str(idx(i)) ';Output2(' num2str(i) ',:)];'])
end
all(C1a(:)==C1(:))
all(C2a(:)==C2(:))
all(C3a(:)==C3(:))

Writing cellfun output using Matlab code for a folded data set

I have a data file like this:
0 -7.09381e-10 7.88112e-09
1 -3.365e-09 3.96397e-08
2 -1.74014e-09 1.3715e-08
3 -6.79327e-10 4.74787e-09
4 -1.92799e-10 1.56609e-09
5 6.53422e-11 5.09169e-10
6 5.21863e-11 1.73983e-10
7 5.64361e-11 6.29614e-11
0 -9.44027e-10 8.14559e-09
1 -2.02866e-09 4.29019e-08
2 -2.2109e-10 1.57419e-08
3 4.55366e-11 5.97503e-09
4 1.70868e-10 2.28134e-09
5 1.90134e-10 8.52557e-10
6 4.4223e-11 3.2142e-10
7 7.2096e-12 1.22047e-10
followed by another 100 sets of data in the same sequence, one after another. The first column is the time index. I fold the data and then calculate the ratio of columns 2 and 3 using the following MATLAB code:
data_jknife =dlmread('datafile.txt',' ');
metadata = data_jknife(:,1); % all elements in the first column of the 2-D array data_jknife
data1 = data_jknife(:,2); % all elements in the second column
data2 = data_jknife(:,3);
groupedMetaData = arrayfun(@(x) metadata(x:4:end), 1:4 ,'UniformOutput',false );
groupedData1 = arrayfun(@(x) data1(x:4:end), 1:4 ,'UniformOutput',false ); % grouping data from the second column
groupedData2 = arrayfun(@(x) data2(x:4:end), 1:4 ,'UniformOutput',false );
flippedData1 = fliplr(groupedData1);
flippedData1 = flippedData1(1:2);
foldedData1 = cellfun(@(x,y) mean([x y],2), flippedData1 ,groupedData1(1:numel(flippedData1)),'UniformOutput',false);
flippedData2 = fliplr(groupedData2);
flippedData2 = flippedData2(1:4);
foldedData2 = cellfun(@(x,y) mean([x y],2), flippedData2 ,groupedData2(1:numel(flippedData2)),'UniformOutput',false);
foldedData = cellfun(@rdivide, foldedData1, foldedData2,'UniformOutput',false);
So the output of the foldedData should be like this:
0 R(0)
1 R(1)
0 R'(0)
1 R'(1)
2 R'(2)
where R is the 2nd column divided by the 3rd column of the folded data for the corresponding time slices. Now I would like to write the output to a file in the above format, but I don't know how to do that. Could anybody please help me with that? Thanks in advance. Here are the numerical values of the operation.
OK. The folding works like this for the first sequence of the data set:
2nd column elements (I take the average of the t = 0, 3, 4, 7 data):
((-7.09381*10^-10) + (-6.79327*10^-10) + (-1.92799*10^-10) + (5.64361*10^-11))/4 =
-3.81268*10^-10
3rd column elements:
((7.88112*10^-09) + (4.74787*10^-09) + (1.56609*10^-09) + (6.29614*10^-11))/4 =
3.56451*10^-9
Then I take the average of the t = 1, 2, 5, 7 data. So the 2nd column is:
((-3.365*10^-09) + (-1.74014*10^-09) + (6.53422*10^-11) + (5.64361*10^-11))/4=
-1.24584*10^-9
3rd column:
((3.96397*10^-08) + (1.3715*10^-08) + (5.09169*10^-10) + (1.73983*10^-10))/4=
1.35095*10^-8
So for the first sequence of data the output is :
R0 = (-3.81267725*10^-10)/(3.56451035*10^-9) = -0.106962
R1 = (-1.245840425*10^-9)/(1.35095*10^-8) = -0.0922198
Therefore, the desired output for the first sequence is:
0 -0.106962
1 -0.0922198
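One way to check the hand calculation of R0 is to reproduce it from the first data set listed above. This sketch hard-codes those eight rows and uses only the t = 0, 3, 4, 7 group; MATLAB row indices are t+1.

```matlab
% First data set from the question: columns are [t, col2, col3].
d = [0 -7.09381e-10 7.88112e-09
     1 -3.365e-09   3.96397e-08
     2 -1.74014e-09 1.3715e-08
     3 -6.79327e-10 4.74787e-09
     4 -1.92799e-10 1.56609e-09
     5 6.53422e-11  5.09169e-10
     6 5.21863e-11  1.73983e-10
     7 5.64361e-11  6.29614e-11];
g0 = [0 3 4 7] + 1;          % rows for t = 0, 3, 4, 7
m = mean(d(g0, 2:3), 1);     % [mean of col2, mean of col3]
R0 = m(1) / m(2);
fprintf('0 %g\n', R0);       % prints 0 -0.106962
```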
Code
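% input_filepath and output_filepath are the paths to the input and
% output files
d1 = dlmread(input_filepath,' ');
% 24x1xS: page s holds data set s (8 rows x 3 columns), flattened row by row
t1 = permute(reshape(d1',24,[]),[1 3 2]);
% back to 8x3xS: row t+1 of page s is [t col2 col3] of data set s
d1 = permute(reshape(t1,3,8,[]),[2 1 3]);
d2 = d1([0 3 4 7]+1,[2 3],:);   % rows for t = 0,3,4,7
d22 = d1([1 2 5 7]+1,[2 3],:);  % rows for t = 1,2,5,7
t1 = [mean(d2) ; mean(d22)];    % 2x2xS: folded means of columns 2 and 3
t2 = t1(:,1,:)./t1(:,2,:);      % 2x1xS: ratio R for each fold and data set
out = [repmat((0:size(t1,1)-1)',size(t2,3),1) t2(:)];
datacell = cellstr(num2str(out));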
fid1 = fopen(output_filepath,'w');
for k = 1:size(datacell,1)
fprintf(fid1,'%s\n',datacell{k,:});
end
fclose(fid1);
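num2str applied to the whole out matrix chooses one shared format for both columns, which may show fewer significant digits than the ratios above. A hedged alternative is to let fprintf format each row directly. The file name folded_output.txt and the two sample rows are hypothetical stand-ins for output_filepath and the real out matrix.

```matlab
out = [0 -0.106962; 1 -0.0922198];       % e.g. the two rows expected for the first data set
fid = fopen('folded_output.txt', 'w');   % hypothetical output file name
if fid == -1
    error('Could not open the output file.');
end
% fprintf consumes its data column by column, so transpose out to write one row per line
fprintf(fid, '%d %.6g\n', out.');
fclose(fid);
```

The format '%d %.6g' keeps the time index as an integer and prints six significant digits of each ratio; widen the precision if more are needed.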

Find first and last value for each unique Julian date

I have a data set similar to the following:
bthd = sort(floor(1+(10-1).*rand(10,1)));
bthd2 = sort(floor(1+(10-1).*rand(10,1)));
bthd3 = sort(floor(1+(10-1).*rand(10,1)));
Depth = [bthd;bthd2;bthd3];
Jday = [repmat(733774,10,1);repmat(733775,10,1);repmat(733776,10,1)];
temp = 10+(30-10).*rand(30,1);
Data = [Jday,Depth,temp];
where I have a matrix similar to 'Data' with the Julian date in the first column, depth in the second, and temperature in the third column. I would like to find the first and last values for each unique Jday. This can be obtained by:
Data = [Jday,Depth,temp];
[~,~,b] = unique(Data(:,1),'rows');
for j = 1:length(unique(b));
top_temp(j) = temp(find(b == j,1,'first'));
bottom_temp(j) = temp(find(b == j,1,'last'));
end
However, my data set is extremely large and using this loop results in long running time. Can anyone suggest a vectorized solution to do this?
Use diff:
% for example
Jday = [1 1 1 2 2 3 3 3 5 5 6 7 7 7]'; % column vector, like Data(:,1)
last = find( [diff(Jday); 1] );
first = [1; last(1:end-1)+1];
top_temp = temp(first) ;
bottom_temp = temp(last);
Note that this solution assumes Jday is sorted. If this is not the case, you may sort Jday prior to the suggested procedure.
You should be able to accomplish this using the occurrence option of the unique function:
[~, topidx, ~] = unique(Data(:, 1), 'first', 'legacy');
[~, bottomidx, ~] = unique(Data(:, 1), 'last', 'legacy');
top_temp = temp(topidx);
bottom_temp = temp(bottomidx);
The legacy option is needed if you're using MATLAB R2013a. You should be able to remove it if you're running R2012b or earlier.
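The diff approach needs Jday to be sorted. If the rows are not grouped by day, one vectorized alternative (swapped in here, not taken from either answer) is accumarray over row indices: @min gives the first row of each day and @max the last. The small Data matrix below is hypothetical, with the days deliberately out of order.

```matlab
% Hypothetical Data = [Jday, Depth, temp], days unsorted on purpose.
Data = [733775 1 1; 733774 2 2; 733774 3 3; 733775 4 4; 733776 5 5; 733774 6 6];
[~, ~, g] = unique(Data(:, 1));                % group number of each row (1 = earliest day)
rowIdx = (1:size(Data, 1))';
firstRow = accumarray(g, rowIdx, [], @min);    % first row of each Jday
lastRow  = accumarray(g, rowIdx, [], @max);    % last row of each Jday
top_temp = Data(firstRow, 3);
bottom_temp = Data(lastRow, 3);
```

"First" and "last" here mean first and last in the original row order, which matches the loop in the question.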

Blocking a matrix in MATLAB

Suppose that I have a non-square matrix, such as 30x35, and I want to split it into blocks, e.g. 4 blocks of 15x18, filling the added cells with zeros. Can that be done in MATLAB?
You can do it by copying the matrix (twice) and then setting the part you don't want to 0:
m = rand([30 35]);
mLeft = m;
mLeft(1:15, :) = 0;
mRight = m;
mRight(16:end, :) = 0;
Or you can do it the other way around: first create a matrix full of 0's and then copy in the content you are interested in.
mLeft = zeros(size(m));
mLeft(16:end, :) = m(16:end, :);
A generalisation could be done as:
% find the splits: the rows where the blocks start (last entry is one past the final row)
splits = round(linspace(1, numRows+1, numBlocks+1));
% and for each block
for s = 1:length(splits)-1
% create matrix with 0s the size of m
mAux = zeros(size(m));
% copy the content only in block you are interested on
mAux( splits(s):splits(s+1)-1, : ) = m( splits(s):splits(s+1)-1, : );
% do whatever you want with mAux before it is overwritten on the next iteration
end
So with the 30x35 example (numRows = 30), and assuming you want 6 blocks (numBlocks = 6), splits will be:
splits = [1 6 11 16 21 26 31]
meaning that the i-th block starts at row splits(i) and finishes at row splits(i+1)-1.
Then you create an empty matrix:
mAux = zeros(size(m));
And copy the content of m from row splits(s) to splits(s+1)-1:
mAux( splits(s):splits(s+1)-1, : ) = m( splits(s):splits(s+1)-1, : );
This example illustrates subdivisions that span all the columns. If you want subsets of rows AND columns, you will have to find the splits in both directions and then use 2 nested loops:
for si = 1:length(splitsI)-1
for sj = 1:length(splitsJ)-1
mAux = zeros(size(m));
mAux( splitsI(si):splitsI(si+1)-1, splitsJ(sj):splitsJ(sj+1)-1 ) = ...
m( splitsI(si):splitsI(si+1)-1, splitsJ(sj):splitsJ(sj+1)-1 );
end
end
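The nested loop above leaves splitsI and splitsJ undefined. A runnable sketch, assuming 2 blocks in each direction and building both split vectors with the same linspace rule as the 1-D case; each zero-filled copy is kept in a cell array instead of being overwritten.

```matlab
m = reshape(1:30*35, 30, 35);   % example 30x35 matrix with distinct, nonzero values
numBlocksI = 2;                 % hypothetical number of blocks along the rows
numBlocksJ = 2;                 % hypothetical number of blocks along the columns
splitsI = round(linspace(1, size(m, 1) + 1, numBlocksI + 1));   % [1 16 31]
splitsJ = round(linspace(1, size(m, 2) + 1, numBlocksJ + 1));   % [1 19 36]
blocks = cell(numBlocksI, numBlocksJ);
for si = 1:length(splitsI)-1
    for sj = 1:length(splitsJ)-1
        rIdx = splitsI(si):splitsI(si+1)-1;
        cIdx = splitsJ(sj):splitsJ(sj+1)-1;
        mAux = zeros(size(m));            % same size as m, zeros outside the block
        mAux(rIdx, cIdx) = m(rIdx, cIdx);
        blocks{si, sj} = mAux;
    end
end
```

Because the blocks do not overlap, adding all of blocks back together reproduces m.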
Have you looked at blockproc?
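None of the snippets above pads the matrix, which the question asks for (35 columns do not split evenly into 18-column blocks). A minimal sketch, assuming 15x18 blocks: zero-pad m up to the next multiple of the block size in each dimension, then cut it with mat2cell.

```matlab
m = rand(30, 35);                 % example matrix from the question
blockRows = 15;                   % block size from the question
blockCols = 18;
% Zero-pad so both dimensions become multiples of the block size (30x36 here).
padded = zeros(ceil(size(m, 1) / blockRows) * blockRows, ...
               ceil(size(m, 2) / blockCols) * blockCols);
padded(1:size(m, 1), 1:size(m, 2)) = m;
nR = size(padded, 1) / blockRows;
nC = size(padded, 2) / blockCols;
% blocks{i,j} is the (i,j)-th 15x18 block of the padded matrix.
blocks = mat2cell(padded, repmat(blockRows, 1, nR), repmat(blockCols, 1, nC));
```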

Matlab: Recursion to get decision tree

I am trying to implement a decision tree with recursion. So far I have done the following:
1. From a given data set, find the best split and return the branches. To give more detail, let's say I have data with features as the columns of a matrix, and the last column indicates the class of the data (1 or -1).
2. Based on 1, I have the best feature to split on, along with the branches under that split. Let's say that, based on information gain, feature 9 is the best split, and the unique values in feature 9, {1,3,5}, are the branches of 9.
3. I have figured out how to get the data related to each branch; then I need to iterate over each branch's data to get the next split. I am having trouble figuring out this recursion.
Here is the code that I have so far, the recursion that I am doing right now doesn't look right: How can I fix this?
function [indeces_of_node, best_split] = split_node(X_train, Y_train)
%cell to save split information
feature_to_split_cell = cell(size(X_train,2)-1,4);
%iterate over features
for feature_idx=1:(size(X_train,2) - 1)
%get current feature
curr_X_feature = X_train(:,feature_idx);
%identify the unique values
unique_values_in_feature = unique(curr_X_feature);
H = get_entropy(Y_train); %This is actually H(X) in slides
%temp entropy holder
%Storage for feature element's class
element_class = zeros(size(unique_values_in_feature,1),2);
%conditional probability H(X|y)
H_cond = zeros(size(unique_values_in_feature,1),1);
for aUnique=1:size(unique_values_in_feature,1)
match = curr_X_feature(:,1)==unique_values_in_feature(aUnique);
mat = Y_train(match);
majority_class = mode(mat);
element_class(aUnique,1) = unique_values_in_feature(aUnique);
element_class(aUnique,2) = majority_class;
H_cond(aUnique,1) = (length(mat)/size((curr_X_feature),1)) * get_entropy(mat);
end
%Getting the information gain
IG = H - sum(H_cond);
%Storing the IG of features
feature_to_split_cell{feature_idx, 1} = feature_idx;
feature_to_split_cell{feature_idx, 2} = max(IG);
feature_to_split_cell{feature_idx, 3} = unique_values_in_feature;
feature_to_split_cell{feature_idx, 4} = element_class;
end
%set feature to split zero for every fold
feature_to_split = 0;
%getting the max IG of the fold
max_IG_of_fold = max([feature_to_split_cell{:,2:2}]);
%vector to store values in the best feature
values_of_best_feature = zeros(size(15,1));
%Iterating over cell to get get the index and the values under best
%splited feature.
for i=1:length(feature_to_split_cell)
if (max_IG_of_fold == feature_to_split_cell{i,2});
feature_to_split = i;
values_of_best_feature = feature_to_split_cell{i,4};
end
end
display(feature_to_split)
display(values_of_best_feature(:,1)')
curr_X_feature = X_train(:,feature_to_split);
best_split = feature_to_split
indeces_of_node = unique(curr_X_feature)
%testing
for k = 1 : length(values_of_best_feature)
% Condition to stop the recursion: if the classes are pure, we are
% done splitting; if both classes have the same number of attributes,
% we are also done splitting.
if (sum(values_of_best_feature(:,2) == -1) ~= sum(values_of_best_feature(:,2) == 1))
if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
[indeces_of_node, best_split] = split_node(mat1, Y_train);
end
end
end
end
Here is the output of my code. It looks like my recursion only goes down one branch, and after that it never goes back to the rest of the branches:
feature_to_split =
5
ans =
1 2 3 4 5 6 7 8 9
feature_to_split =
9
ans =
3 5 7 8 11
feature_to_split =
21
feature_to_split =
21
feature_to_split =
21
feature_to_split =
21
If you are interested in running this code: git
After multiple rounds of debugging, I figured out the answer. I hope someone will benefit from this:
for k = 1 : length(values_of_best_feature)
% Condition to stop the recursion: if the classes are pure, we are
% done splitting; if both classes have the same number of attributes,
% we are also done splitting.
if((sum(values_of_best_feature(:,2) == -1) ~= 0) || (sum(values_of_best_feature(:,2) == 1) ~= 0))
X_train(:,feature_to_split) = [];
mat1 = X_train(X_train(:,5)== values_of_best_feature(k),:);
%if(level >= curr_level)
split_node(mat1, Y_train, 1, 2, level-1);
%end
end
end
return;
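Both the question's code and the posted fix appear to pass the full Y_train into the next call while X_train has already been filtered, so the labels no longer line up with the rows. The question's version also overwrites its outputs on every call. The sketch below shows the same ID3-style idea with two deliberate swaps: an explicit work list (stack) in place of recursion, so every branch is visited, and labels filtered by the same row indices as X. The toy data, the entropy helper H, and the leaf format are hypothetical, not the poster's code.

```matlab
% Toy data (hypothetical): two categorical features, labels 1 / -1.
X = [1 1; 1 2; 2 1; 2 2; 3 1; 3 2];
Y = [1; 1; -1; 1; -1; -1];
% Entropy of a label vector (every value in unique(y) occurs, so no log2(0)).
H = @(y) -sum(arrayfun(@(c) mean(y == c) * log2(mean(y == c)), unique(y)));
% Nodes still to expand: rows reaching the node, unused features, path label.
stack = {struct('rows', (1:numel(Y))', 'feats', 1:size(X, 2), 'path', '')};
leaves = cell(0, 2);               % one row per leaf: {path, predicted class}
while ~isempty(stack)
    node = stack{end};
    stack(end) = [];
    y = Y(node.rows);              % labels filtered with the same rows as X
    % Stop when the node is pure or no features remain.
    if numel(unique(y)) == 1 || isempty(node.feats)
        leaves(end+1, :) = {node.path, mode(y)};
        continue;
    end
    % Information gain of every remaining feature.
    gains = zeros(1, numel(node.feats));
    for j = 1:numel(node.feats)
        xj = X(node.rows, node.feats(j));
        Hcond = 0;
        for v = unique(xj)'
            mask = xj == v;
            Hcond = Hcond + mean(mask) * H(y(mask));
        end
        gains(j) = H(y) - Hcond;
    end
    [~, best] = max(gains);
    f = node.feats(best);
    % Queue one child per value of the chosen feature, so no branch is skipped.
    xf = X(node.rows, f);
    for v = unique(xf)'
        child = struct('rows', node.rows(xf == v), ...
                       'feats', node.feats(node.feats ~= f), ...
                       'path', sprintf('%sx%d=%d;', node.path, f, v));
        stack{end+1} = child;
    end
end
disp(leaves)
```

A recursive function works just as well if each call receives both the filtered rows of X_train and the matching rows of Y_train, and returns its subtree instead of overwriting the caller's outputs.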