I'm trying to prune any sequence of length 3 or more from a vector of numbers in Matlab (or Octave). For example, given the vector dataSet,
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
removing all sequences of length 3 or more would yield prunedDataSet:
prunedDataSet = [7 9 11 13 22 28 30 31 ];
I can brute force a solution, but I suspect there is a more succinct (and perhaps more efficient) way to do it using vector/matrix operations. I always get confused about whether something yields an index or the value at said index. Suggestions?
Here's the brute force method I came up with:
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
benign = [];
for i = 1:size(dataSet,2)-2
    if dataSet(i) == dataSet(i+1)-1 && dataSet(i) == dataSet(i+2)-2
        benign = [benign i];
    end
end
remove = [];
for i = 1:size(benign,2)
    remove = [remove benign(i) benign(i)+1 benign(i)+2];
end
remove = unique(remove);
prunedDataSet = setdiff(dataSet, dataSet(remove));
Here's a solution using DIFF and STRFIND:
%# define dataset
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
%# take the difference. Whatever is part of a sequence will have difference 1
dds = diff(dataSet);
%# sequences of 3 lead to two consecutive ones. Sequences of 4 are like two sequences of 3
seqIdx = strfind(dds,[1 1]);
%# remove start, start+1, start+2
dataSet(bsxfun(@plus,seqIdx,[0;1;2])) = []
dataSet =
7 9 11 13 22 28 30 31
Here's an attempt using vector-matrix notation:
s1 = [(dataSet(1:end-1) == dataSet(2:end)-1), false];
s2 = [(dataSet(1:end-2) == dataSet(3:end)-2), false, false];
s3 = s1 & s2;
s = s3 | [false, s3(1:end-1)] | [false, false, s3(1:end-2)];
dataSet(~s)
The idea is: s1 is true for all positions where a number a appears directly before a+1. s2 is true for all positions where a appears two positions before a+2. Then s3 is true where both of these conditions are met, that is, at the start of every 3-sequence. Finally, we build s by propagating every true value in s3 to its two successors.
Finally, dataSet(~s) keeps all the values for which the above conditions are false, that is, it keeps numbers that are not part of a 3-sequence.
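Another way to look at the problem, offered as a sketch alongside the answers above rather than a replacement for them: label each run of consecutive integers, measure its length, and keep only the elements that belong to runs shorter than 3. The names runId and runLen are illustrative, and the sketch assumes dataSet is a sorted row vector of integers.

```matlab
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];

% A new run starts at the first element and wherever the step is not 1.
runId = cumsum([true, diff(dataSet) ~= 1]);

% Count how many elements carry each run label.
runLen = accumarray(runId(:), 1).';

% Keep only elements whose run has fewer than 3 members.
prunedDataSet = dataSet(runLen(runId) < 3)
```

Changing the threshold 3 to another value prunes runs of a different minimum length without touching the rest of the code.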
I want to convert alphabetic text to numerical values and then transform it back to letters using some mathematical technique, such as a fast Fourier transform, in MATLAB.
Example:
The following is the text saved in "text2figure.txt" file
Hi how r u am fine take care of your health
thank u very much
am 2.0
Reading it in MATLAB:
data=fopen('text2figure.txt','r')
d=fscanf(data,'%s')
temp = fileread( 'text2figure.txt' )
temp = regexprep( temp, ' {6}', ' NaN' )
c=cellstr(temp(:))'
Now I wish to convert the text, including spaces, to numerical values/integers:
coding = 'abcdefghijklmnñopqrstuvwxyz .,;'
str = temp %// example text
[~, result] = ismember(str, coding)
y=result
result =
Columns 1 through 18
0 9 28 8 16 24 28 19 28 22 28 1 13 28 6 9 14 5
Columns 19 through 36
28 21 1 11 5 28 3 1 19 5 28 16 6 28 26 16 22 19
Columns 37 through 54
28 8 5 1 12 21 8 28 0 0 21 8 1 14 11 28 22 28
Columns 55 through 71
23 5 19 26 28 13 22 3 8 0 0 1 13 28 0 29 0
Now I wish to convert the numerical values back to letters:
Hi how r u am fine take care of your health
thank u very much
am 2.0
How can I write MATLAB code that converts the numerical values in the variable result back to letters?
Most of the code in the question doesn't have any useful effects. These three lines are the ones that lead to result:
str = fileread('text2figure.txt');
coding = 'abcdefghijklmnñopqrstuvwxyz .,;';
[~, result] = ismember(str, coding);
ismember returns, in its second output argument, the index into coding for each element of str. Thus, result contains indices that we can use to index into coding:
out = coding(result);
However, this does not work because some elements of str do not occur in coding, and for those elements ismember returns 0, which is not a valid index. We can replace the zeros with a new character:
coding = ['*',coding];
out = coding(result+1);
Basically, we're shifting each code by one and adding a new code, 1, that stands for any character not in the original table.
One of the characters we're missing here is the newline character. Thus the three lines have become one line. You can add a code for the newline character by adding it to the coding table:
str = fileread('text2figure.txt');
coding = ['abcdefghijklmnñopqrstuvwxyz .,;',char(10)]; % char(10) is the newline character
[~, result] = ismember(str, coding);
coding = ['*',coding];
out = coding(result+1);
All of this is easier to achieve just using the ASCII code table:
str = fileread('text2figure.txt');
result = double(str);
out = char(result);
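To see both approaches side by side, here is a minimal round-trip sketch. It uses a short literal string instead of reading text2figure.txt, drops the ñ from the table to avoid encoding issues, and introduces the name lookup for the shifted table; all three are assumptions made for illustration.

```matlab
% Sample text with a newline and two characters ('2', '0') missing from the table.
str = sprintf('hi how r u\nam 2.0');

% Coding table plus newline; index 0 from ismember means "not in table".
coding = ['abcdefghijklmnopqrstuvwxyz .,;', char(10)];
[~, result] = ismember(str, coding);

% Shift by one so that code 0 maps to the placeholder '*'.
lookup = ['*', coding];
out = lookup(result + 1);      % unknown characters come back as '*'

% The character-code route is lossless for every character.
codes = double(str);
back = char(codes);
```

The table-based route is lossy for anything outside coding, while the double/char route returns the original text exactly.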
I want to create a matrix which has distinct rows selected from another matrix.
For Example, I have a 10x3 matrix A
A =
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24
25 26 27
28 29 30
Now I want to create a new matrix B of size 2x3 from A in an iterative process, such that B consists of different rows in each iteration (max iterations = 5).
My Pseudo-code:
for j=1:5
create matrix 'B' by selecting 2 rows randomly from 'A', which should be different
end
You could use randperm to shuffle the row indices randomly and then take two rows at a time, in order, in each iteration.
iterations = 4;
permu = randperm(size(A,1));
out = A(permu(1:iterations*2),:);
for ii = 1:iterations
B = out(2*ii - 1:2*ii,:)
end
Results:
B =
22 23 24
25 26 27
B =
1 2 3
13 14 15
B =
19 20 21
16 17 18
B =
7 8 9
10 11 12
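The same idea covers the full five iterations from the question, since 5 iterations of 2 rows use each of the 10 rows exactly once. The following is a sketch under that assumption; the names nIter, rowsPerB and pairs are illustrative and not part of the answer above.

```matlab
A = reshape(1:30, 3, 10).';          % the 10x3 example matrix
nIter = 5;                           % number of iterations
rowsPerB = 2;                        % rows drawn per iteration

% Shuffle the row indices once, then split them into one column per iteration.
permu = randperm(size(A, 1));
pairs = reshape(permu(1:nIter*rowsPerB), rowsPerB, nIter);

for k = 1:nIter
    B = A(pairs(:, k), :);           % 2x3 matrix, rows never reused
    disp(B)
end
```

Because the indices come from a single permutation, no row can appear in two different B matrices.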
I made a matrix in Matlab, say,
A = magic(5);
A =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
Now I found the indices I want using the find function as:
ind = find(A(:,5)>3 & A(:,4)>= 8);
ind =
1
2
3
Now if I want to get a subset of matrix A for those indices using B = A(ind) function, I only get the first column of the matrix:
B = A(ind)
B =
17
23
4
How can I get all the columns as subset??
Oops ... I got it
B = A(ind,:);
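For context, A(ind) with a single subscript uses linear (column-major) indexing, which is why only entries from the first column came back. A short sketch of the row-subscript form, together with a variant that skips find and indexes with the logical mask directly (the names mask and B2 are illustrative):

```matlab
A = magic(5);

% Logical mask: one true/false entry per row of A.
mask = A(:,5) > 3 & A(:,4) >= 8;

% Row subscript plus ':' selects whole rows.
ind = find(mask);
B = A(ind, :);

% The mask can be used as the row subscript without calling find.
B2 = A(mask, :);
```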
I have a routine that returns a list of integers as a vector.
Those integers come from groups of sequential numbers; for example, it may look like this:
vector = 6 7 8 12 13 14 15 26 27 28 29 30 55 56
Note that above, there are four 'runs' of numbers (6-8, 12-15, 26-30 & 55-56). What I'd like to do is forward the longest 'run' of numbers to a new vector. In this case, that would be the 26-30 run, so I'd like to produce:
newVector = 26 27 28 29 30
This calculation has to be performed many, many times on various vectors, so the more efficiently I can do this the better! Any wisdom would be gratefully received.
You can try this:
v = [ 6 7 8 12 13 14 15 26 27 28 29 30 55 56];
x = [0 cumsum(diff(v)~=1)];
v(x==mode(x))
This results in
ans =
26 27 28 29 30
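To make the trick above easier to follow, this sketch prints the intermediate label vector. Each element of x is the number of breaks seen so far, so all members of one run share a label, and mode picks the label that occurs most often. The name newVector is taken from the question; the sketch assumes v is a sorted row vector of integers.

```matlab
v = [6 7 8 12 13 14 15 26 27 28 29 30 55 56];

% diff(v) ~= 1 marks the breaks between runs; cumsum turns them into run labels.
x = [0 cumsum(diff(v) ~= 1)]

% The most frequent label belongs to the longest run.
newVector = v(x == mode(x))
```

When two runs tie for the longest, mode returns the smallest label, so the earliest of the tied runs is selected.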
Here is a solution to get the ball rolling . . .
vector = [6 7 8 12 13 14 15 26 27 28 29 30 55 56]
d = [diff(vector) 0]
maxSequence = 0;
maxSequenceIdx = 0;
lastIdx = 0; % position of the previous break; 0 so that the first run is measured from element 1
while lastIdx~=find(d~=1, 1, 'last')
idx = find(d~=1, 1);
if idx-lastIdx > maxSequence
maxSequence = idx-lastIdx;
maxSequenceIdx = lastIdx;
end
d(idx) = 1;
lastIdx=idx;
end
output = vector(1+maxSequenceIdx:maxSequenceIdx+maxSequence)
In this example, the diff command is used to find consecutive numbers. When numbers are consecutive, the difference is 1. A while loop is then used to find the longest group of ones, and the index of this consecutive group is stored. However, I'm confident that this could be optimised further.
Without loops using diff:
vector = [6 7 8 12 13 14 15 26 27 28 29 30 55 56];
seqGroups = [1 find([1 diff(vector)]~=1) numel(vector)+1]; % start index of each group, plus an end sentinel
[~, groupIdx] = max( diff(seqGroups)); % index of the largest group
output = vector( seqGroups(groupIdx):seqGroups(groupIdx+1)-1)
The output vector is
output =
26 27 28 29 30
Without loops - should be faster
temp = [0, find([diff(vector) == 1, 0] == 0)]; % end position of each run; the leading 0 lets the first run be counted too
[len,ind]=max(temp(2:end)-temp(1:end-1));
vec_out = vector(temp(ind)+1:temp(ind)+len)
Quick MATLAB question.
What would be the best/most efficient way to select a certain number of elements, 'n', in windows offset by 'm'? In other words, I want to select the first 50 elements of a sequence, then elements 10-60, then elements 20-70, etc.
Right now, my sequence is in vector format(but this can easily be changed).
EDIT:
The sequences that I am dealing with are too long to be stored in my RAM. I need to be able to create the windows, and then call upon the window that I want to analyze or perform another command on.
Do you have enough RAM to store a 50-by-nWindow array in memory? In that case, you can generate your windows in one go, and then apply your processing on each column
%# idxMatrix has 1:50 in first col, 11:60 in second col etc
idxMatrix = bsxfun(@plus,(1:50)',0:10:length(yourVector)-50);
%# reshapedData is a 50-by-numberOfWindows array
reshapedData = yourVector(idxMatrix);
%# now you can do processing on each column, e.g.
maximumOfEachWindow = max(reshapedData,[],1);
To complement Kerrek's answer: if you want to do it in a loop, you can use something like
n = 50;
m = 10;
for i = 1:m:length(v)-n+1
    w = v(i:i+n-1); % window of n elements starting at index i
    % Do something with w
end
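If only some windows are needed, one option is to compute a window's indices on demand rather than looping over all of them. This is a sketch, not part of the answer above: getWindow and nWindows are hypothetical names, and v = 1:100 stands in for the real data.

```matlab
v = 1:100;        % stand-in for the real sequence
n = 50;           % window length
m = 10;           % offset between window starts

% Number of complete windows that fit in v.
nWindows = floor((numel(v) - n) / m) + 1;

% Return the k-th window (k = 1 gives elements 1..n).
getWindow = @(k) v((k-1)*m + (1:n));

w3 = getWindow(3);    % elements 21..70
```

Only the requested window is materialized, so memory use stays at one window regardless of how many windows the sequence contains.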
There's a slight issue with the description of your problem. You say that you want "to select the first 50 elements of a sequence, then elements 10-60..."; however, this would translate to selecting elements:
1-50
10-60
20-70
etc.
That first sequence should be 0-50 to fit the pattern, which of course would not make sense in MATLAB since arrays use one-based indexing. To address this, the algorithm below uses a variable called startIndex to indicate which element to start the sequence sampling from.
You could accomplish this in a vectorized way by constructing an index array. Create a vector consisting of the starting indices of each sequence. For reusability, I put the length of the sequence, the step size between sequence starts, and the start of the last sequence in variables. In the example you describe, the length of the sequence should be 50, the step size should be 10, and the start of the last sequence depends on the size of the input data and your needs.
>> startIndex = 10;
>> sequenceSize = 5;
>> finalSequenceStart = 20;
Create some sample data:
>> sampleData = randi(100, 1, 28)
sampleData =
Columns 1 through 18
8 53 10 82 82 73 15 66 52 98 65 81 46 44 83 9 14 18
Columns 19 through 28
40 84 81 7 40 53 42 66 63 30
Create a vector of the start indices of the sequences:
>> sequenceStart = startIndex:sequenceSize:finalSequenceStart
sequenceStart =
10 15 20
Create an array of indices to index into the data array:
>> index = cumsum(ones(sequenceSize, length(sequenceStart)))
index =
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
>> index = index + repmat(sequenceStart, sequenceSize, 1) - 1
index =
10 15 20
11 16 21
12 17 22
13 18 23
14 19 24
Finally, use this index array to reference the data array:
>> sampleData(index)
ans =
98 83 84
65 9 81
81 14 7
46 18 40
44 40 53
Use (start : step : end) indexing: v(1:1:50), v(10:1:60), etc. If the step is 1, you can omit it: v(1:50).
Consider the following vectorized code:
x = 1:100; %# an example sequence of numbers
nwind = 50; %# window size
noverlap = 40; %# number of overlapping elements
nx = length(x); %# length of sequence
ncol = fix((nx-noverlap)/(nwind-noverlap)); %# number of sliding windows
colindex = 1 + (0:(ncol-1))*(nwind-noverlap); %# starting index of each window
%# indices to put sequence into columns with the proper offset
idx = bsxfun(@plus, (1:nwind)', colindex)-1;
%# apply the indices on the sequence
slidingWindows = x(idx)
The result (truncated for brevity):
slidingWindows =
1 11 21 31 41 51
2 12 22 32 42 52
3 13 23 33 43 53
...
48 58 68 78 88 98
49 59 69 79 89 99
50 60 70 80 90 100
In fact, the code was adapted from the now deprecated SPECGRAM function from the Signal Processing Toolbox (just do edit specgram.m to see the code).
I omitted parts that zero-pad the sequence in case the sliding windows do not evenly divide the entire sequence (for example x=1:105), but you can easily add them again if you need that functionality...
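For illustration, one way the padding could be added back is sketched below. This is an assumption about how to do it, not the SPECGRAM code itself: it rounds the window count up, appends zeros so the last window is complete, and reuses the variable names from the code above (step and padded are new, illustrative names). It assumes the sequence is at least nwind elements long.

```matlab
x = 1:105;                 % length that does not divide evenly into windows
nwind = 50;                % window size
noverlap = 40;             % number of overlapping elements
step = nwind - noverlap;   % offset between window starts

% Round up so the trailing partial window is kept.
ncol = ceil((numel(x) - noverlap) / step);

% Append zeros so that the last window has nwind elements.
padded = [x, zeros(1, (ncol-1)*step + nwind - numel(x))];

% Same index construction as above, applied to the padded sequence.
idx = bsxfun(@plus, (1:nwind)', (0:ncol-1)*step);
slidingWindows = padded(idx);
```

With x = 1:105 this yields 7 columns, and the last column holds elements 61 to 105 followed by five zeros.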