I have a dataset consisting of 80k rows. It's stored in a cell.
In the third column the values should go -3 in the first row, -2 in the second, -1 in the third, and so on through the whole dataset.
As :
-3
-2
-1
-3
-2
-1
...
Now I want to check whether this number sequence is actually followed throughout the whole dataset. I know for a fact it isn't, and therefore I want to make some kind of loop that automatically removes the whole rows of data that don't follow the -3, -2, -1 steps.
My initial thought was to use the diff command to find the faulty indices, but I can't seem to get it right.
My second was to create a loop that would remove data every time it didn't follow the specific number sequence.
for i = 1:length(Dataset)
if Dataset{272,1}(i,3) == -3
continue
else
eraseidx = Dataset{272,1}(i,3)
if Dataset{272,1}(i+1,3) == -2
continue
else
eraseidx = Dataset{272,1}(i,3)
if Dataset{272,1}(i+2,3) == -1
continue
else
eraseidx = Dataset{272,1}(i,3)
end
end
end
end
(Reason for choosing Dataset{272,1} is that I know there is a fault).
Anyone have a method for solving this?
Your loop has one main problem: the conditions are all nested, so you check i+1 and i+2 only if the first condition was false, and then you check those same rows again on the following iterations. If Dataset{272,1}(i+1,3) == -2, you already know that on the next iteration the comparison of Dataset{272,1}(i,3) to -3 will be false. So is that row valid or not? In your loop the answer depends on which iteration asks.
Here are two ways to correct this while still using a loop. The first is to loop over all k (I replaced i with k so as not to shadow the imaginary unit in MATLAB) and compare Dataset{272,1}(k,3) to a different expected value on each iteration:
c = [-1 -3 -2];
for k = 1:length(Dataset)
if Dataset{272,1}(k,3) ~= c(mod(k,3)+1);
eraseidx = Dataset{272,1}(k,3);
end
end
Another option is to compare the data in triplets:
for k = 1:3:length(Dataset)
if Dataset{272,1}(k,3) ~= -3
eraseidx = Dataset{272,1}(k,3);
end
if Dataset{272,1}(k+1,3) ~= -2
eraseidx = Dataset{272,1}(k+1,3);
end
if Dataset{272,1}(k+2,3) ~= -1
eraseidx = Dataset{272,1}(k+2,3);
end
end
Right now you only save the problematic records to eraseidx, but you overwrite it each time any of the conditions is fulfilled, so you need to also index eraseidx:
eraseidx = zeros(length(Dataset),1);
c = [-1 -3 -2];
for k = 1:length(Dataset)
if Dataset{272,1}(k,3) ~= c(mod(k,3)+1);
eraseidx(k) = Dataset{272,1}(k,3);
end
end
However, if you only want to delete those records you can save the k alone as logical indexing, and not all the record. Also, it seems to me that your loop should run on 1:length(Dataset{272,1}) so:
eraseidx = false(length(Dataset{272,1}),1);
c = [-1 -3 -2];
for k = 1:length(Dataset{272,1})
eraseidx(k) = Dataset{272,1}(k,3) ~= c(mod(k,3)+1);
end
And this could be easily vectorised to:
c = [-1 -3 -2];
k = 1:length(Dataset{272,1});
eraseidx = Dataset{272,1}(k,3) ~= c(mod(k,3)+1).';
So now eraseidx will be a logical vector, so that Dataset{272,1}(eraseidx,:) will be all the records from Dataset{272,1} to be deleted. Edit: To delete them you just write: Dataset{272,1}(eraseidx,:) = [].
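To see the vectorised check in action, here is a minimal sketch on a made-up 9-row matrix. The name M and its values are only illustrative and stand in for Dataset{272,1}:

```matlab
% Toy stand-in for Dataset{272,1}; column 3 should cycle -3, -2, -1.
M = [(1:9).', zeros(9,1), [-3 -2 -1 -3 -1 -1 -3 -2 -2].'];

c = [-1 -3 -2];                                % c(mod(k,3)+1) yields -3, -2, -1, ...
k = 1:size(M,1);                               % row numbers
expected = reshape(c(mod(k,3)+1), [], 1);      % expected value for every row
eraseidx = M(:,3) ~= expected;                 % true where the pattern is broken

badRows = find(eraseidx);                      % rows 5 and 9 in this toy data
M(eraseidx,:) = [];                            % delete the offending rows
```

Note that this check is positional: it flags rows whose value disagrees with what is expected at that row number. If a row is missing entirely, every row after it is out of phase and would be flagged too, so it may be worth inspecting find(eraseidx) before deleting anything.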
Vectorize on all Dataset
In case all the data in all cells has the same size and shape, you can use cell2mat to convert it to a ND-array, and then perform a vectorized operation on all data at once:
data = reshape(cell2mat(Dataset.'),[],3,numel(Dataset)); % convert to 3D array
c = [-1 -3 -2];
k = 1:size(data,1);
correct = c(mod(k,3)+1).'; % the correct values to compare
eraseidx = squeeze(bsxfun(@ne,data(:,3,:),correct));
Now each column of eraseidx corresponds to the third column of one cell in Dataset, so the result for Dataset{k,1} is given in eraseidx(:,k).
To delete all true records in eraseidx you can use a simple loop:
for k = 1:numel(Dataset)
Dataset{k}(eraseidx(:,k),:) = [];
end
Notice that you cannot delete the records directly from data because arrays cannot have different number of rows in each element of the third dimension.
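As a rough illustration of the whole-dataset approach, the sketch below runs it on two made-up 3-by-3 cells. The values are invented for the example, and it assumes every cell has the same number of rows and exactly three columns:

```matlab
% Two toy cells of identical size (assumption: 3 columns each).
Dataset = cell(2,1);
Dataset{1} = [1 0 -3; 2 0 -2; 3 0 -2];   % row 3 breaks the pattern
Dataset{2} = [1 0 -2; 2 0 -2; 3 0 -1];   % row 1 breaks the pattern

data = reshape(cell2mat(Dataset.'), [], 3, numel(Dataset));  % rows x 3 x cells
c = [-1 -3 -2];
k = 1:size(data,1);
correct = reshape(c(mod(k,3)+1), [], 1);                     % expected column
eraseidx = squeeze(bsxfun(@ne, data(:,3,:), correct));       % rows x cells

for n = 1:numel(Dataset)
    Dataset{n}(eraseidx(:,n),:) = [];    % drop the flagged rows cell by cell
end
```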
If Dataset has to be a cell-array...
If you're using cells to keep the data in Dataset because it is not all the same size and shape, you can extend the loop above to do the whole process:
c = [-1 -3 -2];
for k = 1:numel(Dataset)
eraseidx = Dataset{k,1}(:,3) ~= c(mod(1:length(Dataset{k,1}),3)+1).';
Dataset{k,1}(eraseidx,:) = [];
end
Or you can use cellfun (which is basically a compact loop):
c = [-1 -3 -2];
delfun = @(M) M(M(:,3)==c(mod(1:length(M),3)+1).',:);
Dataset_fixed = cellfun(delfun,Dataset,'UniformOutput',false);
delfun is an anonymous function that retrieves only the wanted records from each cell in Dataset. Now Dataset_fixed is Dataset without the deleted rows.
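A small sketch of the cellfun variant on invented data, with cells of different heights. The name keepfun and the toy values are assumptions for illustration only:

```matlab
c = [-1 -3 -2];
% Keep only the rows whose third column matches the expected -3, -2, -1 cycle.
keepfun = @(M) M(M(:,3) == reshape(c(mod(1:size(M,1),3)+1), [], 1), :);

Dataset = cell(2,1);
Dataset{1} = [1 0 -3; 2 0 -2; 3 0 -1; 4 0 -2];   % row 4 should be -3
Dataset{2} = [1 0 -3; 2 0 -3; 3 0 -1];           % row 2 should be -2

Dataset_fixed = cellfun(keepfun, Dataset, 'UniformOutput', false);
```

Using size(M,1) rather than length(M) is a defensive choice here: length returns the largest dimension, which would be the column count for a cell with fewer than three rows.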
Related
I have a problem as follows -
Script: Using ’If’ condition inside a ’for’ loop
Create a new script and save it using your name and matric ID.
Use ’for’ loop to create two matrices A and B. The size of the matrices are same, and the matrices are of 5 by 4 size.
Each element of the matrix A will be determined in the ’for’ loop using the following formula A(i,j) = a*i +b*j; where a and b are
the last two digits of your matric ID (here a = 2 and b = 5).
i and j are the row number and column number of the matrices respectively.
The elements of the matrix B will be changed from 0 to 1 ’if’ the corresponding element of the matrix A is even number.
I tried to solve in this way but it won't work. What is the correct way to check if an element is even or not in MATLAB matrices?
clc
% clear all
A = zeros(5,4);
B = zeros(5,4);
for j = 1:5
for i = 1:4
A(i,j) = 2*i + 5 * j;
if mod(B(i,j),2) == 0
A(i,j) = 1;
end
end
end
These lines in your code
if mod(B(i,j),2) == 0
A(i,j) = 1;
end
set A(i,j) to 1 if mod(B(i,j),2) == 0. This is always true since you have initialized B with zeros. You should do it the other way around: test whether mod(A(i,j),2) == 0 and, if so, set B(i,j) = 1.
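Put together, a corrected version might look like the sketch below. It assumes a = 2 and b = 5 as in the assignment text, and it also lets i run over the 5 rows and j over the 4 columns, since the loop bounds in the question are swapped for a 5-by-4 matrix:

```matlab
a = 2;               % values taken from the assignment text
b = 5;
A = zeros(5,4);
B = zeros(5,4);
for i = 1:5          % rows
    for j = 1:4      % columns
        A(i,j) = a*i + b*j;
        if mod(A(i,j),2) == 0   % test the element of A ...
            B(i,j) = 1;         % ... and mark the matching element of B
        end
    end
end
```

Because 2*i is always even, A(i,j) is even exactly when j is even, so B ends up with ones in columns 2 and 4.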
I have a 1x24 vector (a). I should define a command in Matlab which compare all 24 values of a vector (a) with a certain value (mean (b)) and if the vector (a) item is greater than certain value (mean (b)), ''I'' sets 1 and if the vector item is less than certain value ''I'' sets 0. I wrote the below code:
for i=1:length(a)
if a(i) >= mean(b)
I = 1;
else
I = 0;
end
end
But it implements the comparison only for the last index of vector a and sets I=0. How can I fix the command that do the comparison for all indexes of vector a?
In MATLAB, you can use the following syntax to do so:
I = a >= mean(b);
If you want to use your code for doing so, you'll need to initialize I as a vector, and modify its indices as follows:
I = zeros(length(a),1);
for ii=1:length(a)
if a(ii) >= mean(b)
I(ii) = 1;
else
I(ii) = 0;
end
end
You should read about logical indexing in matlab. You don't need for loops for what you are doing. For example, if you have,
rng(5);
a = rand(1,10);
b = 0.5;
then, I = a > b; will return a logical array with zeros and ones, where one indicates the position in the array where the given condition is satisfied,
I =
0 1 0 1 0 1 1 1 0 0
Using these indices, you can modify your original array. For example, if you wish to change all values of a greater than b to be 10, you would simply do,
a(a > b) = 10;
Specifically, if you need indices where the condition is satisfied, you can use, find(a > b), which in this example will give you,
ans =
2 4 6 7 8
I have the following code and it is taking a very long time to run. I don't think it's an infinite loop, but it is as follows:
Y = zeros(1069,30658);
D1 = LagOp({0,1,1,1},'Lags',[0,1,2,1]);
for n = 2:30658;
for j = 2:1063
if filter(D1,Ret((D1.Degree + j),n),'Initial',Ret(2:D1.Degree,n)) < 0;
Y(j+3,n) = -1*Ret(j+3,n);
else
Y(j+3,n)=Ret(j+3,n) ;
end
end
end
Basically I want to flip the sign of the current element in the matrix if the previous 3 elements before it add up to being less than 0. Otherwise to leave it alone. Could it be the ... else statement causing the trouble here ?
Edited: I figured out a more efficient way but I am working on figuring out how to change it to 3 values ahead instead with the following code:
for n = 1:30658
Y(:,n) = RET(:,n);
t = conv(Y(:,n), [1 1 1], 'valid');
mask = [false(3,1); t(1:end-1)<0];
Y(mask,n) = -Y(mask,n);
end
So, for example, if I have some numbers in an array given by -1 -2 -3 -4 -5 -6 -1 -2 3 -2 -1, then at the number -4 it would look at the sum of the previous 3 values (-1 - 2 - 3 < 0) and then change the sign of the value 3 ahead, so -1 becomes positive. This would continue, but the changes in sign should not have any effect on the sums as they iterate over every row.
Thanks,
Okay, so I have a script that will produce my vector of repeated integers of a certain interval, but now there's a particular instance where I need to make sure that once it is shuffled, the numbers do not repeat. So for example, I produced a vector of repeating 1-5, 36 times, shuffled. How do I ensure that there are no repeated numbers after shuffling? And to make things even more complex, I need to produce two such vectors that do not ever have the same value at the same index. For example, let's say 1:5 was repeated twice for these vectors, so then this would be what I'm looking for:
v1 v2
4 2
2 4
3 2
5 3
4 5
1 4
5 1
1 5
3 1
2 3
I made that right now by taking an example of 1 vector and just shifting it off by 1 to create another vector that will satisfy the requirements, but in my situation, that won't actually work because I can't have them be systematically dependent like that.
So I tried a recursive technique to make the script start over if the vectors did not make the cut and as expected, that did not go over so well. I hit my maximum recursive iterations and I've realized this is clearly not the way to go. Is there some other alternative?
EDIT:
So I found a way to satisfy some of the conditions I needed above in the following code:
a = nchoosek(1:5,2);
b = horzcat(a(:,2),a(:,1));
c = vertcat(a,b);
cols = repmat(c,9,1);
cols = cols(randperm(180),:);
I just need to find a way to shuffle cols that will also enforce no repeating numbers in columns, such that cols(i,1) ~= cols(i+1,1) and cols(i,2) ~= cols(i+1,2)
This works, but it probably is not very efficient for a large array:
a = nchoosek(1:5, 2);
while (any(a(1: end - 1, 1) == a(2: end, 1)) ...
|| any(a(1: end - 1, 2) == a(2: end, 2)))
random_indices = randperm(size(a, 1));
a = a(random_indices, :);
end
a
If you want something faster, the trick is to logically insert each row in a place where your conditions are satisfied, rather than randomly re-shuffling. For example:
n1 = 5;
n2 = 9;
a = nchoosek(1:n1, 2);
b = horzcat(a(:,2), a(:,1));
c = vertcat(a, b);
d = repmat(c, n2, 1);
nRows = size(d, 1); % d has more than n1 * n2 rows, so use its actual height
d = d(randperm(nRows), :);
% Perform an "insertion shuffle"
for k = 2: nRows
% Grab row k from array d. Walk down the rows until a position is
% found where row k does not repeat with its upstairs or downstairs
% neighbors.
m = 1;
while (any(d(k,:) == d(m,:)) || any(d(k,:) == d(m+1,:)))
m = m + 1;
end
% Insert row k in the proper position.
if (m < k)
ind = [ 1: m k m+1: k-1 k+1: nRows ];
else
ind = [ 1: k-1 k+1: m k m+1: nRows ];
end
d = d(ind,:);
end
d
One way to solve this problem is to think of both vectors as being created as follows:
For every row of arrays v1 and v2
Shuffle the array [1 2 3 4 5]
Set the values of v1 and v2 at the current row with the first and second value of the shuffle. Both values will always be different.
Code:
s = [1 2 3 4 5];
Nrows = 36;
solution = zeros(Nrows,2);
for k=1:Nrows
% obtain indexes j for shuffling array s
[x,j] = sort(rand(1,5));
%row k takes the first two values of shuffled array s
solution(k,1:2) = s(j(1:2));
end
v1 = solution(:,1);
v2 = solution(:,2);
With this method there is no time wasted in re-rolling repeated numbers because the first and second value of shuffling [1 2 3 4 5] will always be different.
Should you need more than two arrays with different numbers the changes are simple.
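For instance, a sketch for three vectors could look like this. Nvec and the use of randperm(n,k) to draw the distinct indices are choices made for this illustration, not part of the code above:

```matlab
s = [1 2 3 4 5];
Nrows = 36;
Nvec = 3;                              % hypothetical number of vectors wanted
solution = zeros(Nrows, Nvec);
for k = 1:Nrows
    j = randperm(numel(s), Nvec);      % Nvec distinct positions in s
    solution(k,:) = s(j);              % so row k holds Nvec different values
end
v1 = solution(:,1);
v2 = solution(:,2);
v3 = solution(:,3);
```

This only guarantees that the vectors differ from one another at every index. It does not, on its own, stop the same value from appearing in two consecutive rows of a single vector.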
This is half a question and half a challenge to the matlab gurus out there:
I'd like to have a function take in a logical array (false/true) and give the beginning and ending of all the contiguous regions containing trues, in a struct array.
Something like this:
b = getBounds([1 0 0 1 1 1 0 0 0 1 1 0 0])
should return
b = 3x1 struct array with fields:
beg
end
and
>> b(2)
ans =
beg: 4
end: 6
I already have an implementation, but I don't really know how to deal with struct arrays well so I wanted to ask how you would do it - I have to go through mat2cell and deal, and when I have to deal with much larger struct arrays it becomes cumbersome. Mine looks like this:
df = diff([0 foo 0]);
a = find(df==1); l = numel(a);
a = mat2cell(a',ones(1,l))
[s(1:l).beg] = deal(a{:});
b = (find(df==-1)-1);
b = mat2cell(b',ones(1,l))
[s(1:l).end] = deal(b{:});
I don't see why you are using mat2cell, etc. You are making too much of the problem.
Given a boolean row vector V, find the beginning and end points of all groups of ones in the sequence.
V = [1 0 0 1 1 1 0 0 0 1 1 0 0];
You get most of it from diff. Thus
D = diff(V);
b.beg = 1 + find(D == 1);
This locates the beginning points of all groups of ones, EXCEPT for possibly the first group. So add a simple test.
if V(1)
b.beg = [1,b.beg];
end
Likewise, every group of ones must end before another begins. So just find the end points, again worrying about the last group if it will be missed.
b.end = find(D == -1);
if V(end)
b.end(end+1) = numel(V);
end
The result is as we expect.
b
b =
beg: [1 4 10]
end: [1 6 11]
In fact though, we can do all of this even more easily. A simple solution is to always append a zero to the beginning and end of V, before we do the diff. See how this works.
D = diff([0,V,0]);
b.beg = find(D == 1);
b.end = find(D == -1) - 1;
Again, the result is as expected.
b
b =
beg: [1 4 10]
end: [1 6 11]
By the way, I would avoid the use of end here, even as a structure field name. Using MATLAB keywords as variable names is a bad habit to get into, even if they are only field names.
This is what I went with:
df = diff([0 foo 0]);
s = struct('on',num2cell(find(df==1)), ...
'off',num2cell(find(df==-1)-1));
I forgot about num2cell and the nice behavior of struct with cell arrays.
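For reference, here is that approach run on the example vector from the question, as a small self-contained sketch. The variable V stands in for foo:

```matlab
V = [1 0 0 1 1 1 0 0 0 1 1 0 0];
df = diff([0 V(:).' 0]);                       % pad with zeros so edge runs are caught
s = struct('on',  num2cell(find(df == 1)), ... % one struct element per run of ones
           'off', num2cell(find(df == -1) - 1));

starts = [s.on];                               % collect a field back into a vector
stops  = [s.off];
```

Passing cell arrays to struct creates a struct array with one element per cell, which is what avoids the mat2cell and deal steps.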