Rolling-window matrix with different intervals between columns - matlab

I have a vector of daily data covering 21 years and want to create a rolling window of 365 days such that each window starts one month (30 days) after the previous one starts. Here, n_interval is the offset between the first data point of one window and the first data point of the next.
Let's assume my daily data start on Jan. 1, 2000. Then the first column would cover Jan. 1, 2000 to Jan. 1, 2001, the second column would start on Feb. 1, 2000 and end on Feb. 1, 2001, and so on; the last column would cover Jan. 1, 2017 to Jan. 1, 2018. For example, if:
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
for a given variable n_interval = 3, with window_size=5, the output matrix should look like:
mat = [[1 4 7 10 13],
[2 5 8 11 14],
[3 6 9 12 15],
[4 7 10 13 16],
[5 8 11 14 17]]

Given your example vector
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17];
we can create an indexing scheme as follows:
First, we need to determine how many rows there will be in mat. (Here each row of mat holds one window, so the result is the transpose of the layout shown in the question.) Assuming we want every element of vec to appear in mat at least once, we need to make sure that the last index in the last row is greater than or equal to the size of vec. It's fairly easy to see that the last index in the last row of mat is given by
last_index = n_interval*(n_rows-1) + n_columns
We want to ensure that last_index >= numel(vec). Substituting in the above expression into the inequality and solving for n_rows gives
n_rows >= (numel(vec) - n_columns)/n_interval + 1
We assign n_rows to be the ceil of this bound so that it is the smallest integer which satisfies the inequality. Now that we know the number of rows we generate the list of starting indices for each row
start_index = 1:n_interval:(n_interval*(n_rows-1)+1);
In the index matrix we want each column to be 1 plus the previous column. In other words, we want to offset the columns according to the array index_offset = 0:(n_columns-1).
Using bsxfun we generate the index matrix by computing the sums of all pairs between the start_index and index_offset arrays
index = bsxfun(@plus, index_offset, start_index');
The final thing we need to worry about is going out of bounds. To handle this we apply the mod function to wrap the out-of-bounds indices:
index_wrapped = mod(index-1, numel(vec))+1;
Then we simply sample the vector according to index_wrapped
mat = vec(index_wrapped);
The complete code is
n_interval = 3;
n_columns = 5;
vec = 1:17;
n_rows = ceil((numel(vec)-n_columns)/n_interval + 1);
start_index = 1:n_interval:(n_interval*(n_rows-1)+1);
index_offset = 0:(n_columns-1);
index = bsxfun(@plus, index_offset, start_index');
index_wrapped = mod(index-1, numel(vec))+1;
mat = vec(index_wrapped);
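As a minimal sketch, assuming MATLAB R2016b or later (where implicit expansion is available), the same index matrix can be built without bsxfun. The name mat_by_column is introduced here for illustration only; it is the transpose of mat, which puts one window in each column as in the question's expected output.

```matlab
% Sketch: same indexing scheme using implicit expansion (assumes R2016b+).
n_interval = 3;
n_columns = 5;                         % window length
vec = 1:17;
n_rows = ceil((numel(vec) - n_columns)/n_interval + 1);
start_index = 1:n_interval:(n_interval*(n_rows-1) + 1);
index_offset = 0:(n_columns-1);
index = start_index.' + index_offset;  % n_rows-by-n_columns, one window per row
index_wrapped = mod(index - 1, numel(vec)) + 1;
mat = vec(index_wrapped);              % each row is one window
mat_by_column = mat.';                 % each column is one window (illustrative name)
```

For this example the last index is exactly numel(vec), so no wrapping actually takes place.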

Related

finding most frequent words in matlab

I have a matrix like below:
temp=[1 1 6;
1 2 6;
1 3 7;
1 4 1;
2 1 1;
2 2 2;
2 3 5;
2 4 6;
3 1 4;
3 2 3;
3 3 5;
3 4 7;];
The first column represents the document_id, the second column represents the word_id, and the third column represents the number of occurrences of that word in the document.
I want to find the top 3 words in terms of their total frequency across all documents. Rather than just using lots of loops, what is a better way to do this in MATLAB?
I have an initial idea of using:
sorted=sortrows(temp, 2)
I guess histcounts or accumarray could help me, but I'm not sure how.
WoW! This was the answer I was looking for:
sortrows(splitapply(@sum, sorted(:, 3), findgroups(sorted(:, 2))), -1)
ans =
17
14
11
11
https://www.mathworks.com/help/matlab/ref/splitapply.html
Update 1: Actually, no, because it doesn't tell me which word_id values from the second column produce these sums.
Update 2: While I can get the word_id with the highest frequency, I cannot get the top 3 word_ids using the following method:
>> [max_val, index] = max(splitapply(@sum, sorted(:, 3), findgroups(sorted(:, 2))))
max_val =
17
index =
3
Correct final answer:
>> [frequencies, original_positions] = sort(splitapply(@sum, sorted(:, 3), findgroups(sorted(:, 2))), 'descend')
frequencies =
17
14
11
11
original_positions =
3
4
1
2
If you're interested in a solution with accumarray (as you expected), there you go:
[Pos, ~, ind] = unique(temp(:,2)); %Finding unique word IDs (not yet ordered by frequency)
freq = accumarray(ind, temp(:,3)); %Total frequency per word ID (not yet ordered)
To get the top 3, sort the frequencies in descending order and extract the values at the first three indices (or sort in ascending order and extract the values at the last three indices).
PosFreq = sortrows([Pos freq], 2, 'descend'); %Sorting according to frequencies
Top3PosFreq = PosFreq(1:3,:); %Extracting top three frequencies
Result:
Top3PosFreq =
3 17
4 14
1 11
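If you prefer to stay with findgroups and splitapply, here is a rough sketch of how the group numbers can be mapped back to word IDs by using the second output of findgroups. The variable names (groups, word_ids, top3) are chosen here for illustration.

```matlab
temp = [1 1 6; 1 2 6; 1 3 7; 1 4 1; 2 1 1; 2 2 2; ...
        2 3 5; 2 4 6; 3 1 4; 3 2 3; 3 3 5; 3 4 7];
% word_ids(k) is the word_id that group number k stands for
[groups, word_ids] = findgroups(temp(:,2));
freq = splitapply(@sum, temp(:,3), groups);      % total occurrences per word
[freq_sorted, order] = sort(freq, 'descend');    % highest totals first
top3 = [word_ids(order(1:3)), freq_sorted(1:3)]  % columns: word_id, total frequency
```

In this data the word IDs happen to be 1 to 4, so group numbers and word IDs coincide; the extra output matters when the IDs are not consecutive integers starting at 1.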

What's wrong with the following code?

I have a 20x120 matrix. For each column in the matrix I need to find the maximum value among all the values, and then sum the remaining values. Then I need to divide the maximum value by the sum of the remaining values. I tried the following code but the result was not correct. What is the problem?
s = 1:z %z=120
for i = 1:x %x=20
maximss = max(Pres_W); %maximum value
InterFss = (sum(Pres_W))-maximss; %remaining values
SIRk(:,s) = (maximss(:,s))./(InterFss(:,s));
end
Instead of answering "what's wrong", I'll first provide a solution explaining how this should be done:
Say we have an example matrix m as follows:
m =
8 5 9 14 10 7 5
10 8 12 11 9 9 12
10 3 7 7 8 4 6
13 11 6 15 13 11 9
Find the maximum value of each column:
col_max = max(m, [], 1)
col_max =
13 11 12 15 13 11 12
Sum all elements in each column, and subtract the maximum values:
col_sum = sum(m, 1) - col_max
col_sum =
28 16 22 32 27 20 20
Divide the maximum value by the sum of the other elements:
col_max ./ col_sum
ans =
0.46429 0.68750 0.54545 0.46875 0.48148 0.55000 0.60000
Or, as a one-liner:
max(m,[],1)./(sum(m,1)-max(m,[],1))
ans =
0.46429 0.68750 0.54545 0.46875 0.48148 0.55000 0.60000
By the way: your code does exactly what you describe. It returns the maximum value divided by the sum of all values except the maximum value.
Notes regarding best practice:
Vectorize things like this, no need for loops.
max(m, [], 1) is the same as max(m) for 2D arrays with more than one row. However, if your matrix for some reason has only one row, max(m) returns the maximum value of that row (a single number), whereas max(m, [], 1) still returns one value per column.
sum(m,1) is the same as sum(m) for 2D arrays with more than one row. However, if your matrix for some reason has only one row, sum(m) returns the sum of that row (a single number), whereas sum(m,1) still returns one value per column.
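A small sketch of the single-row case mentioned in the notes above, using a made-up row vector v (not taken from the question):

```matlab
v = [3 1 2];                 % hypothetical matrix with a single row
max_default = max(v);        % works along the row: 3
max_dim1 = max(v, [], 1);    % works down each column: [3 1 2]
sum_default = sum(v);        % works along the row: 6
sum_dim1 = sum(v, 1);        % works down each column: [3 1 2]
```

Spelling out the dimension keeps the per-column behaviour the same regardless of how many rows the input has.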

Finding middle point for consecutive number

Let's say I have
A=[1 3 4 5 6 7 9 12 15 16 17 18 20 23 24 25 26];
My interest is how to find the middle value between consecutive numbers using Matlab.
For example, first group of consecutive numbers is
B=[3 4 5 6 7];
so the answer should be 5. The 2nd group of consecutive numbers (i.e. [15 16 17 18]) should give 16, etc.
At the end, my final answer is
[5 16 24]
Here is a vectorized approach:
d = [diff(A) == 1, 0];
subs = cumsum([diff(d) == 1, 0]).*(d | [0, diff(d) == -1]) + 1
temp = accumarray(subs', A', [], @median)
final = floor(temp(2:end))
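To make the grouping step easier to follow, here is the same approach with the intermediate pieces named and the values for this particular A written out as comments. The names in_run and run_id are introduced here for illustration, and the commented values were worked out for this example only.

```matlab
A = [1 3 4 5 6 7 9 12 15 16 17 18 20 23 24 25 26];
d = [diff(A) == 1, 0];                 % 1 where A(k+1) directly follows A(k)
% d marks every member of a run except its last element,
% so the next line adds the run ends back in.
in_run = d | [0, diff(d) == -1];
run_id = cumsum([diff(d) == 1, 0]);    % increases by 1 just before each run begins
subs = run_id .* in_run + 1;           % 1 = not in a run, 2 and up = run number + 1
% subs is [1 2 2 2 2 2 1 1 3 3 3 3 1 4 4 4 4]
temp = accumarray(subs', A', [], @median);
final = floor(temp(2:end)).'           % drop group 1 (the isolated values)
% final is [5 16 24]
```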
Here is some sample code which does what you are looking for. I'll let you play with the different outputs to see what they do exactly, although I wrote some comments to follow:
clear
clc
A=[1 3 4 5 6 7 9 12 15 16 17 18 20 23 24 25 26]
a=diff(A); %// Check the diff array to identify occurrences different than 1.
b=find([a inf]>1);
NumElements=diff([0 b]); %//Number of elements in the sequence
LengthConsec = NumElements((NumElements~=1)) %// Get sequences with >1 values
EndConsec = b(NumElements~=1) %// Check end values to deduce starting values
StartConsec = EndConsec-LengthConsec+1;
%// Initialize a cell array containing the sequences (they can have different
%// lengths, so a regular array is not recommended) and an array containing the
%// median values.
ConsecCell = cell(1,numel(LengthConsec));
MedianValue = zeros(1,numel(LengthConsec));
for k = 1:numel(LengthConsec)
ConsecCell{1,k} = A(StartConsec(k):1:EndConsec(k));
MedianValue(k) = floor(median(ConsecCell{1,k}));
end
%//Display the result
MedianValue
Giving the following:
MedianValue =
5 16 24
diff + strfind based approach -
loc_consec_nums = diff(A)==1 %// locations of consecutive (cons.) numbers
starts = strfind([0 loc_consec_nums],[0 1]) %// start indices of cons. numbers
ends = strfind([loc_consec_nums 0],[1 0]) %// end indices of cons. numbers
out = A(ceil(sum([starts ; ends],1)./2))%// median of each group of starts and ends
%// and finally index into A with them for the desired output

Filtering an adjacency matrix in matlab

I have an n-by-3 adjacency matrix that contains nodes in the first two columns and the corresponding weight in the third column. I want to filter the matrix by specific thresholds on the node indices. For example, I want to keep the part of the adjacency matrix for nodes smaller than 10,000, 20,000, etc. What is the most efficient way to do this in MATLAB? I tried the following to find the indices of the rows that correspond to those nodes:
counter = 1;
for i=1: size(graph4, 1)
if (graph4(i,1) >30000) | (graph4(i,2) >30000)
bucket(counter) = i;
counter=counter+1;
end
end
Suppose the adjacency matrix is A as given below:
A =
8 1 6
3 5 7
4 9 2
11 4 9
6 8 10
7 12 5
17 10 15
12 14 16
13 18 11
If you want both column 1 and column 2 to be less than a value, you can do:
value = 10;
T = A(A(:,1) < value & A(:,2) < value, :)
T =
8 1 6
3 5 7
4 9 2
6 8 10
The following line seems to give the same results as your sample code (but it doesn't seem to fit your description).
value = 10000;
bucket = find((A(:,1)>value) | A(:,2)>value)
I guess you made a mistake and want to increment the counter above the bucket-line and initialize it as counter = 0 before the loop? As it is now, it will be one more than the number of elements in the bucket-list.
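If you need the filtered matrix for several thresholds, one possible sketch is to compute the larger node ID of each row once and reuse it. The thresholds 10 and 15 and the names max_node and filtered are made up for this small example; for your data you would use 10000, 20000, and so on.

```matlab
A = [8 1 6; 3 5 7; 4 9 2; 11 4 9; 6 8 10; ...
     7 12 5; 17 10 15; 12 14 16; 13 18 11];
thresholds = [10 15];                    % hypothetical cut-offs
max_node = max(A(:,1:2), [], 2);         % larger of the two node IDs in each row
filtered = cell(size(thresholds));       % one filtered matrix per threshold
for k = 1:numel(thresholds)
    % keep the rows in which both node IDs are below the threshold
    filtered{k} = A(max_node < thresholds(k), :);
end
```

Here filtered{1} matches T from above, since a row has both nodes below the threshold exactly when its larger node is below it.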

Removing a row in a matrix, by removing an entry from a possibly different row for each column

I have a vector of values which represent an index of a row to be removed in some matrix M (an image). There's only one row value per column in this vector (i.e. if the image is 128 x 500, my vector contains 500 values).
I'm pretty new to MATLAB so I'm unsure if there's a more efficient way of removing a single pixel (row,col value) from a matrix so I've come here to ask that.
I was thinking of making a new matrix with one less row, looping through each column up until I find the row whose value I wish to remove, and "shift" the column up by one and then move onto the next column to do the same.
Is there a better way?
Thanks
Yes, there is a solution which avoids loops and is thus faster to write and to execute. It makes use of linear indexing, and exploits the fact that you can remove a matrix entry by assigning it an empty value ([]):
% Example data matrix:
M = [1 5 9 13 17
2 6 10 14 18
3 7 11 15 19
4 8 12 16 20];
% Example vector of rows to be removed for each column:
vector = [2 3 4 1 3];
[r c] = size(M);
ind = sub2ind([r c],vector,1:c);
M(ind) = [];
M = reshape(M,r-1,c);
This gives the result:
>> M =
1 5 9 14 17
3 6 10 15 18
4 8 11 16 20
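An alternative sketch that keeps M untouched uses a logical mask instead of deleting entries. The names keep and M_reduced are introduced here for illustration; the example data are the same as above.

```matlab
M = [1 5 9 13 17
     2 6 10 14 18
     3 7 11 15 19
     4 8 12 16 20];
vector = [2 3 4 1 3];                        % row to drop in each column
[r, c] = size(M);
keep = true(r, c);                           % start by keeping every entry
keep(sub2ind([r c], vector, 1:c)) = false;   % mark one entry per column for removal
M_reduced = reshape(M(keep), r-1, c);        % M(keep) is read column by column
```

Because logical indexing returns the kept entries in column-major order, the reshape puts each column's remaining r-1 values back together.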