I have this code:
A = [3,1,5,8]
B = [0, 0]
indexB = [1,2,2,1]
for i = 1:4
B(indexB(i)) = B(indexB(i)) + A(i)
end
So, in the end, I got
B = [11, 6]
I wonder if I can use a more efficient way to sum up instead of using the for-loop?
Classic use of accumarray. Only this time, you accumulate the entries in A then add this on top of B as B is the starting point of the summation:
B = B(:); % Force into columns
B = B + accumarray(indexB(:), A(:));
How accumarray works is quite simple. You can think of it as a miniature MapReduce paradigm. Simply put, for each data point we have, there is a key and an associated value. The goal of accumarray is to place (or bin) all of the values that belong to the same key and do some operation on all of these values. In our case, the "key" would be the values in indexB where each element is a location to index into B. The values themselves are those from A. We would then want to add up all of the values that belong to each location in indexB together. Thankfully, the default behaviour for accumarray is to add all of these values. Specifically, the output of accumarray would be an array where each position computes the sum of all values that mapped to a key. For example, the first position would be the summation of all values that mapped to the key of 1, the second position would be the summation of all values that mapped to the key of 2 and so on.
Because you are using B as a starting point, the end result would be to take the summation result from accumarray and add this on top of B thus completing the code.
Minor Note
I do have to point out that accumarray works by columns. Because you are using rows, I had to force the input so that they are columns, which is the purpose of the (:) syntax. The output will also be as a column so you can transpose that if you wish to have it in a row format.
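As a minimal sketch of the whole thing on the data from the question (the explicit size argument [numel(B) 1] is my own addition, so that the result still matches B when the largest index in indexB is smaller than numel(B)):

```matlab
A = [3, 1, 5, 8];
B = [0, 0];
indexB = [1, 2, 2, 1];

% Sum the values of A per key in indexB; the output is a column vector
sums = accumarray(indexB(:), A(:), [numel(B) 1]);

% Transpose back to a row and add on top of the starting values in B
B = B + sums.';
disp(B)
```

Here the key 1 collects 3 and 8, and the key 2 collects 1 and 5, which gives the same B = [11, 6] as the loop.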
Related
I have a vector of numbers (temperatures), and I am using the MATLAB function mink to extract the 5 smallest numbers from the vector to form a new variable. However, the numbers extracted using mink are automatically ordered from lowest to largest (of those 5 numbers). Ideally, I would like to retain the sequence of the numbers as they are arranged in the original vector. I hope my problem is easy to understand. I appreciate any advice.
The function mink that you use was introduced in MATLAB R2017b. It has (as Andras Deak mentioned) two output arguments:
[B,I] = mink(A,k);
The second output argument contains the indices, such that B == A(I).
To obtain the set B but sorted as they appear in A, simply sort the vector of indices I:
B = A(sort(I));
For example:
>> A = [5,7,3,1,9,4,6];
>> [~,I] = mink(A,3);
>> A(sort(I))
ans =
3 1 4
For older versions of MATLAB, it is possible to reproduce mink using sort:
function [B,I] = mink(A,k)
[B,I] = sort(A);
B = B(1:k);
I = I(1:k);
Note that you don't need the B output here, so ordered_mink can be written as follows:
function B = ordered_mink(A,k)
[~,I] = sort(A);
B = A(sort(I(1:k)));
Note: This solution assumes A is a vector. For matrix A, see Andras' answer, which he wrote up at the same time as this one.
First you'll need the corresponding indices for the extracted values from mink using its two-output form:
[vals, inds] = mink(array, k); % k is the number of smallest elements to extract
Then you only need to order the items in vals according to increasing indices in inds. There are multiple ways to do this, but they all revolve around sorting inds and using the corresponding order on vals. The simplest way is to put these vectors into a matrix and sort the rows (this assumes vals and inds are column vectors):
sorted_rows = sortrows([inds, vals]); % sort on indices
and then just extract the corresponding column
reordered_vals = sorted_rows(:,2); % items now ordered as they appear in "array"
Another possibility for doing the sorting after the above call to mink is to take the sorting order of inds and apply that same order to vals:
[~, order] = sort(inds); % order(j) is the position in inds of the j-th smallest index
reordered_vals = vals(order); % should be the same as previously
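As a concrete check of the sortrows route, here is a minimal sketch. The data and k = 3 are made up for illustration, and the input is assumed to be a column vector so that inds and vals can be placed side by side:

```matlab
array = [5; 7; 3; 1; 9; 4; 6];   % hypothetical temperatures, column vector
k = 3;

% Two-output form: the k smallest values and their positions in array
[vals, inds] = mink(array, k);

% Sort the (index, value) pairs by index, then keep the values
sorted_rows = sortrows([inds, vals]);
reordered_vals = sorted_rows(:, 2);
disp(reordered_vals.')
```

The three smallest values are 1, 3 and 4, found at positions 4, 3 and 6, so in original order they come out as 3, 1, 4.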
I need to create a function that has an input argument n, an integer with n>1, and an output argument v, which is a column vector of length n containing all the positive integers smaller than or equal to n, arranged in such a way that no element of the vector equals its own index.
I know how to define the function
This is what I tried so far but it doesn't work
function[v]=int_col(n)
[1,n] = size(n);
k=1:n;
v=n(1:n);
v=k'
end
Let's take a look at what you have:
[1,n] = size(n);
This line doesn't make a lot of sense: n is an integer, which means that size(n) will give you [1,1], you don't need that. (Also an expression like [1,n] can't be on the left hand side of an assignment.) Drop that line. It's useless.
k=1:n;
That line is pretty good, k is now a row vector of size n containing the integers from 1 to n.
v=n(1:n);
Doesn't make sense. n isn't a vector (or you can say it's a 1x1 vector). Either way, indexing into it (that's what the parentheses do) with 1:n doesn't make sense. Drop that line too.
v=k'
That's also a nice line. It makes a column vector v out of your row vector k. The only thing that this doesn't satisfy is the "arranged in such a way that no element of the vector equals its own index" part, since right now every element equals its own index. So now you need to find a way to either shift those elements or shuffle them around in some way that satisfies this condition and you'd be done.
Let's give a working solution. You should really look into it and see how this thing works. It's important to solve the problem in smaller steps and to know what the code is doing.
function [v] = int_col(n)
if n <= 1
error('argument must be >1')
end
v = 1:n; % generate a row-vector of 1 to n
v = v'; % make it a column vector
v = circshift(v,1); % shift all elements by 1
end
This is the result:
>> int_col(5)
ans =
5
1
2
3
4
Instead of using circshift you can do the following as well:
v = [v(end);v(1:end-1)];
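If you want to convince yourself that the shifted vector satisfies the condition, a small check along these lines should do (n = 5 is just an example value):

```matlab
n = 5;
v = circshift((1:n).', 1);    % same construction as in int_col

% Compare every element against its own index
idx = (1:n).';
noFixedPoints = all(v ~= idx);
disp(noFixedPoints)
```

This only works because n > 1: for n = 1 the shift does nothing and the single element equals its index.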
I have a table with Ids and Dates. I would like to retrieve the index of the max date for each Id.
My initial approach is so:
varfun(@max, table, 'GroupingVariables', 'Id', 'InputVariables', 'Date');
This obviously gives me the date rather than the index.
I noted that the max function will return both the maxvalue and maxindex when specified:
[max_val, max_idx] = max(values);
How can I define an anonymous function using max to retrieve max_idx? I would then use it in varfun to get my result.
I'd prefer not to declare a wrapper function (as opposed to an anonymous function) over max() because:
1. I'm working in a script and would rather not create another function file
2. I'm unwilling to change my current script to a function
Thanks a million guys,
I'm assuming your Ids are positive integers and your Dates are numbers.
If you wanted the maximum Date for each Id, it would be a perfect case for accumarray with the max function. In the following I'll use f to denote a generic function passed to accumarray.
The fact that you want the index of the maximum makes it a little trickier (and more interesting!). The problem is that the Dates corresponding to a given Id are passed to f without any reference to their original index. Therefore, an f based on max can't help. But you can make the indices "pass through" accumarray as imaginary parts of the Dates.
So: if you want just one maximizing index (even if there are several) for each Id:
result = accumarray(t.Id,... %// col vector of Id's
t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
[], ... %// default size for output
@(x) imag(x(find(real(x)==max(real(x)), 1)))); %// function f: first maximizer only
Note that the function f here maximizes the real part and then extracts the imaginary part, which contains the original index.
Or, if you want all maximizing indices for each Id:
result = accumarray(t.Id,... %// col vector of Id's
t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
[], ... %// default size for output
@(x) {imag(x(find(real(x)==max(real(x)))))}); %// function f
If your Ids are strings: transform them into numeric labels using the third output of unique, and then proceed as above:
[~, ~, NumId] = unique(t.Id);
and then either
result = accumarray(NumId,... %// col vector of Id's
t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
[], ... %// default size for output
@(x) imag(x(find(real(x)==max(real(x)), 1)))); %// function f: first maximizer only
or
result = accumarray(NumId,... %// col vector of Id's
t.Date+1j*(1:size(t,1)).', ... %'// col vector of Dates (real) and indices (imag)
[], ... %// default size for output
@(x) {imag(x(find(real(x)==max(real(x)))))}); %// function f
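To see the "indices as imaginary parts" trick on something concrete, here is a minimal sketch for the single-index variant. The table contents are made up, with numeric Ids and Dates and no ties:

```matlab
Id   = [1; 2; 1; 2; 1];       % hypothetical group labels
Date = [10; 7; 30; 9; 20];    % hypothetical numeric dates
t = table(Id, Date);

% Real part carries the date, imaginary part carries the row index
packed = t.Date + 1j*(1:size(t,1)).';

% For each Id: locate the largest real part, return its imaginary part
f = @(x) imag(x(find(real(x) == max(real(x)), 1)));
result = accumarray(t.Id, packed, [], f);
disp(result.')
```

For Id 1 the dates 10, 30, 20 sit in rows 1, 3, 5, so the answer is row 3. For Id 2 the dates 7, 9 sit in rows 2, 4, so the answer is row 4.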
I don't think varfun is the right approach here, as
varfun(func,A) applies the function func separately to each variable of
the table A.
This would only make sense if you wanted to apply it to multiple columns.
Simple approach:
Simply go with the loop approach: First find the different IDs using unique, then for each ID find the indices of the maximum dates. (This assumes your dates are in a numerical format which can be compared directly using max.)
I did rename your variable table to t, as otherwise we would be overwriting the built-in function table.
uniqueIds = unique(t.Id);
for i = 1:numel(uniqueIds)
equalsCurrentId = t.Id==uniqueIds(i);
globalIdxs = find(equalsCurrentId);
[~, localIdxsOfMax] = max(t.Date(equalsCurrentId));
maxIdxs{i} = globalIdxs(localIdxsOfMax);
end
As you mentioned your Ids are actually strings instead of numbers, you will have to change the line: equalsCurrentId = t.Id==uniqueIds(i); to
equalsCurrentId = strcmp(t.Id, uniqueIds{i});
Approach using accumarray:
If you prefer a more compact style, you could use this solution inspired by Luis Mendo's answer, which should work for both numerical and string Ids:
[uniqueIds, ~, global2Unique] = unique(t.Id);
maxDateIdxsOfIdxSubset = @(I) {I(nth_output(2, @max, t.Date(I)))};
maxIdxs = accumarray(global2Unique, 1:length(t.Id), [], maxDateIdxsOfIdxSubset);
This uses nth_output of gnovice's great answer.
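Note that nth_output is not a built-in function, so you need to define it yourself. A helper along the following lines should behave as needed here; treat it as a sketch of the idea rather than the exact code from that answer:

```matlab
% Usage: the second output of max is the position of the maximum
idx = nth_output(2, @max, [3 9 4]);
disp(idx)

function value = nth_output(N, fcn, varargin)
    % Call fcn with the given arguments, capture its first N outputs,
    % and return only the N-th one
    [outputs{1:N}] = fcn(varargin{:});
    value = outputs{N};
end
```

Since R2016b a local function like this can live at the end of a script file, so it does not force you to turn your script into a function.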
Usage:
Both above solutions will yield: A vector uniqueIds with a corresponding cell-array maxIdxs, in a way that maxIdxs{i} are the indices of the maximum dates of uniqueIds(i).
If you only want a single index, even though there are multiple entries where the maximum is attained, use the following to strip away the unwanted data:
maxIdxs = cellfun(@(X) X(1), maxIdxs);
I have an Nx2 matrix containing the edges of a graph. The entries of the matrix are ids of Twitter users, and each edge is a retweet relation (one user retweeted another). In total the graph has N retweet relations and M users. I want to transform the ids in the graph from the original Twitter ids to ids in 1:M, for example replacing the first id in the graph with 1 everywhere it occurs in either column, without changing again any id that has already been replaced. I tried a for-loop combined with find to transform the ids into indices, but how do I avoid changing items that have already been changed? I know that my code is wrong:
counter = 0;
for index = 1:length(grph)
index1 = find(grph(:,1) == grph(index,1));
index2 = find(grph(:,2) == grph(index,2));
counter = counter+1;
grph(index1,1) = counter;
counter = counter+1;
grph(index2,2) = counter;
end
A little example which illustrates what I want, the following:
35113 45010
5695 57711
22880 33193
22880 45010
43914 35113
Desired output :
1 2
3 4
5 6
5 2
7 1
Pretty simple. Use a combination of unique and reshape. Assuming your ID matrix in your example was stored in A:
[~,~,B] = unique(A.', 'stable');
C = reshape(B, [size(A,2) size(A,1)]).';
A would be the matrix of IDs while C is your desired output. How this works is that unique's third output gives you, for each value encountered in A, the ID of the unique value it corresponds to. The reason why we need to transpose the input first is that MATLAB operates along the columns, and your result needs to operate along the rows. Transposing effectively does this. Also, you need the 'stable' flag so that we assign IDs in the order we encounter them. Not using 'stable' will sort the values in A first, then assign the IDs.
B will inevitably become a column vector, and so we need to reshape this back into a matrix that is the same size as your input A. Note that reshape fills along the columns. Because we were operating along the rows, I need to reshape to the size of the transpose of A, and then transpose that result to get your desired output.
Example use:
A = [35113 45010
5695 57711
22880 33193
22880 45010
43914 35113]; %// Matrix defined by you
[~,~,B] = unique(A.', 'stable');
C = reshape(B, [size(A,2) size(A,1)]).';
C =
1 2
3 4
5 6
5 2
7 1
However, if assigning IDs in order of first appearance isn't required and you just want one ID per node, then you can use unique without the 'stable' flag; the IDs then follow the sorted order of the values in A.
Now, if you want to know which IDs from your graph got assigned to which IDs in the output matrix, just use the first output of unique:
[mapping, ~, B] = unique(A.', 'stable');
mapping will give you a list of all unique IDs that were encountered in your matrix. Their position identifies what ID was used to assign them into B. In other words, running this, we get:
mapping =
35113
45010
5695
57711
22880
33193
43914
This means that ID 35113 in A gets mapped to 1 in B, ID 45010 in A gets mapped to 2 in B and so on. As a more verbose illustration:
mappings = [(1:numel(mapping)).' mapping]
mappings =
1 35113
2 45010
3 5695
4 57711
5 22880
6 33193
7 43914
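The mapping also works in the other direction: indexing mapping with the relabelled matrix should give back the original IDs. A minimal sketch on the example data (A_back is just a name I picked):

```matlab
A = [35113 45010
     5695  57711
     22880 33193
     22880 45010
     43914 35113];

[mapping, ~, B] = unique(A.', 'stable');
C = reshape(B, [size(A,2) size(A,1)]).';

% Indexing a vector with a matrix returns a result shaped like the index matrix
A_back = mapping(C);
disp(isequal(A_back, A))
```

This is a handy sanity check that no two Twitter IDs were merged into the same new ID.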
I can't test right now, but this should do what you want:
[~, ~, kk] = unique(A.','stable');
result = reshape(kk, fliplr(size(A))).';
You need a recent enough MATLAB version, so that unique has the 'stable' option.
If you have the Communications Toolbox, the second line could be replaced by
result = vec2mat(kk, size(A,2));
There are two matrices; the first one is my input matrix
and the second one ("renaming matrix") is used to replace the values of the first one
That is, looking at the renaming matrix; 701 must be replaced by 1,...,717 must be replaced by 10,etc.. such that the input matrix becomes as such
The ? values are defined, but I didn't write them out. The second column of the input matrix is already sorted (ascending order from top down), but the values are not consecutive (no "710": see first pic).
The question is how to get the output matrix(last pic) from the first two.
Looks to me like it's screaming for a sparse matrix solution. In MATLAB you can create a sparse matrix with the following command:
SM = sparse( ri, ci, val );
where ri is the row index of the non-zero elements, ci is the corresponding column index, and val is the values.
Let's call your input matrix IM and your lookup matrix LUM, then we construct the sparse matrix:
nr = size(LUM, 1);
SM = sparse( ones(nr, 1), LUM(:, 1), LUM(:, 2) );
Now we can get your result in a single line:
newMatrix = full(reshape(SM(1, IM), size(IM))); % full() converts the sparse result to an ordinary matrix
almost magic.
I didn't have a chance to check this tonight - but if it doesn't work exactly as described, it should be really really close...
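For what it's worth, here is the idea on a tiny made-up example. The values in IM and LUM are my own, since the pictures from the question are not reproduced here, and the sketch assumes every entry of IM appears in the first column of LUM:

```matlab
LUM = [701 1; 702 2; 717 10];   % hypothetical lookup: old value -> new value
IM  = [701 717; 702 701];       % hypothetical input matrix

% One-row sparse matrix: column <old value> holds <new value>
nr = size(LUM, 1);
SM = sparse(ones(nr, 1), LUM(:, 1), LUM(:, 2));

% Look up every entry of IM and restore the original shape
newMatrix = full(reshape(SM(1, IM), size(IM)));
disp(newMatrix)
```

Values of IM that are missing from LUM but still within the column range of SM would silently come out as 0, so this is worth checking on real data.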
If the values in the first column all appear in the second column, and if all you want is replace the values in the second column by 1..n and change the values in the first column accordingly, you can do all of this with a simple call to ismember:
%# define "inputMatrix" here as the first array in your post
[~,newFirstColumn] = ismember(inputMatrix(:,1),inputMatrix(:,2));
To create your output, you'd then write
outputMatrix = [newFirstColumn,(1:length(newFirstColumn))'];
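If M is the original matrix and R is the renaming matrix, here's how you do it:
N = M;
for n = 1:size(R,1) % loop over the rows of the renaming matrix
N(M==R(n,1)) = R(n,2); % compare against the original M so renamed values are never renamed again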
end
Note that in this case you're creating a new matrix N with the renamed values. You don't have to do that if you like.
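If you would rather avoid the loop, the same renaming can probably be done in one go with ismember. This is a sketch with made-up values for M and R, and entries of M that do not appear in R are left unchanged:

```matlab
R = [701 1; 702 2; 717 10];   % hypothetical renaming matrix: old -> new
M = [701 717; 702 701];       % hypothetical original matrix

% tf marks entries of M found in R(:,1); loc gives the matching row of R
[tf, loc] = ismember(M, R(:, 1));

N = M;
N(tf) = R(loc(tf), 2);        % replace only the entries that have a new name
disp(N)
```

Because the lookup is done against the original M, a value that has already been renamed cannot be renamed a second time.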