I'm trying to convert a tibble into a pixset. The tibble has columns x and y holding column and row coordinates; basically each row is the coordinate of a point I want to encode into the pixset.
Example:
myx <- c(541, 541, 542, 542)
myy <- c(324, 325, 326, 340)
mytib <- tibble(x = as.integer(myx), y = as.integer(myy))
mytib
# A tibble: 4 x 2
      x     y
  <int> <int>
1   541   324
2   541   325
3   542   326
4   542   340
The actual situation is reading a huge CSV file with hundreds of thousands of points.
I want to turn this into a pixset. I can make a blank pixset, called px, with no points, and then set values to TRUE one coordinate at a time:
px <- px.none(im)
where im is a dummy cimg object of the appropriate dimensions (1280x1280x1x1):
for (i in seq_len(numrows)) {
  px[[mytib[[i, 1]], mytib[[i, 2]], 1, 1]] <- TRUE
}
where numrows is the number of rows in mytib.
It works, but my concern is that with a large tibble of hundreds of thousands of rows this loop will take a long time.
There must be a way to vectorize this.
Any suggestions would be appreciated.
Thanks
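One vectorized possibility, offered as a sketch rather than a definitive answer: build an n x 4 index matrix of (x, y, z, c) coordinates and set all points in a single assignment, relying on base R's matrix indexing of arrays, which pixsets inherit. If the pixset [<- method gets in the way, the same assignment can be done on a plain logical array that is then converted with as.pixset.
library(imager)
im <- imfill(1280, 1280)               # dummy cimg of the required dimensions
px <- px.none(im)                      # blank pixset, as before
idx <- cbind(mytib$x, mytib$y, 1L, 1L) # one row per point: (x, y, z, c)
px[idx] <- TRUE                        # single vectorized assignment
This replaces the per-row loop with one indexed assignment, which should scale far better for hundreds of thousands of points.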
I am creating a sparse matrix:
sp = sparse(I,J,Val,X,Y)
My Val is a vector of ones. Much to my surprise, the sp matrix does not contain only zeros and ones. I suppose this happens because in some cases there are duplicates in I and J; I mean that sp(1,1) is set to 1 two times, and this makes it 2.
Question 1: Is my assumption true? Instead of overwriting the value, does MATLAB really add it?
Question 2: How can we get around this, given that it would be very troublesome to manipulate I and J? Something I can think of is to use find (thus guaranteeing uniqueness) and then recreate the matrix using ones once more. Any better suggestion?
Question 1: Is my assumption true? Instead of overwriting the value, does MATLAB really add it?
Correct. If you have duplicate row and column locations, each with its own value, MATLAB aggregates them into the same row and column location by adding the values together.
This is clearly stated in the documentation, but as a reproducible example, suppose I have the following row and column locations and their associated values at these locations:
i = [6 6 6 5 10 10 9 9].';
j = [1 1 1 2 3 3 10 10].';
v = [100 202 173 305 410 550 323 121].';
Note that these are column vectors as this shape is the expected input. In a neater presentation:
>> [i j v]
ans =
6 1 100
6 1 202
6 1 173
5 2 305
10 3 410
10 3 550
9 10 323
9 10 121
We can see that there are three values that get mapped to location (6, 1), two values that get mapped to location (10, 3) and finally two that get mapped to location (9, 10).
By creating the sparse matrix and displaying it, we thus get:
>> S = sparse(i,j,v)
S =
(6,1) 475
(5,2) 305
(10,3) 960
(9,10) 444
As you can see, the three values mapped to (6, 1) are summed: 100 + 202 + 173 = 475. You can verify this with the other duplicate locations: 410 + 550 = 960 at (10, 3) and 323 + 121 = 444 at (9, 10).
Question 2: How can we get around this, given that it would be very troublesome to manipulate I and J? Something I can think of is to use find (thus guaranteeing uniqueness) and then recreate the matrix using ones once more. Any better suggestion?
There are two possible ways to mitigate this if it is truly your desire to have only a binary matrix.
The first way, which may be preferable to you since you mentioned that manipulating the row and column locations is troublesome, is to create the matrix as you have now, but then convert it to logical so that any non-zero value is set to 1:
>> S = S ~= 0
S =
10×10 sparse logical array
(6,1) 1
(5,2) 1
(10,3) 1
(9,10) 1
If you require that the precision of the matrix be back in its original double form, cast the result after you convert to logical:
>> S = double(S ~= 0)
S =
(6,1) 1
(5,2) 1
(10,3) 1
(9,10) 1
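As an aside (not part of the original answer): MATLAB also has spones, which replaces every nonzero entry of a sparse matrix with 1 while keeping the result double, so for values that do not cancel to zero when summed it gives the same result:
S = spones(sparse(i, j, v)); % same as double(S ~= 0) here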
The second way, if you wish, is to work on your row and column locations so that you filter out any index pairs that are not unique, then create a vector of ones for val that is as long as the number of unique locations. The unique function can help you do that: concatenate the row and column locations into a two-column matrix and specify the 'rows' option, so that each row is treated as a single entity rather than operating on individual elements. Once you have the unique row and column locations, use them as input for creating the sparse matrix:
>> unique_vals = unique([i j], 'rows')
unique_vals =
5 2
6 1
9 10
10 3
>> vals = ones(size(unique_vals, 1), 1); % column vector of ones; note ones(n) alone would create an n-by-n matrix
>> S = sparse(unique_vals(:, 1), unique_vals(:, 2), vals)
S =
(6,1) 1
(5,2) 1
(10,3) 1
(9,10) 1
I have two arrays in MATLAB, say A and B, containing values as below. Both A and B always contain pairs, i.e. an even number of elements (2, 4, 6, 8 or more); A always has fewer elements than B, and the elements in both arrays are pre-sorted.
A=[152 271];
B=[107 266 314 517 538 732];
I want to check the range of every pair in A (one pair, 152-271, in this example) against all pairs of B, and expand/modify the bounds of B's pairs where a pair of A exceeds them. In this example, the pair 152-271 of A is first compared with the first pair of B (i.e. 107-266). As 152 is greater than 107 and 271 is greater than 266, the 266 of B's first pair is modified to 271 so that the range of A's first pair is wholly included in B. An interval in A must at least partly overlap an interval in B for B's values to be modified. We stop when there are no elements left to check in A. The end result will be like this:
A=[152 271];
B=[107 271 314 517 538 732];
(In the original image, green, red and yellow represented A, B and the final B (only modified) values respectively.)
You can use find with the option 'last' to identify the relevant indices in B:
A=[152 271 280 320];
B=[107 266 314 517 538 732];
for interval = 1:2:numel(A)-1 %step through A pair by pair (the odd indices are the lower bounds)
    %get the index of the lower interval bound in B
    index = find(B <= A(interval), 1, 'last');
    %increase the upper interval bound if necessary
    B(index+1) = max(B(index+1), A(interval+1));
end
As you did not specify any corner cases (e.g. an interval in A extending beyond the bounds of B), I did not consider them. If they can happen, you need to extend the code.
This merges all the intervals in A and B into a minimal set of non-overlapping intervals, using Roger Stafford's endpoint-sorting trick:
A = [152 271];
B = [107 266 314 517 538 732];
mat = [A B];
A1 = reshape(mat, 2, []).' %one interval per row: [start end] (replaces vec2mat, which needs the Communications Toolbox)
n = size(A1, 1); %number of intervals
[t, p] = sort(A1(:)); %sort all endpoints; the first n entries of A1(:) are the starts
z = cumsum(accumarray((1:2*n).', 2*(p <= n) - 1)); %running count of open intervals: +1 at each start, -1 at each end
z1 = [0; z(1:end-1)];
A2 = [t(z1 == 0 & z > 0), t(z1 > 0 & z == 0)] %merged intervals: where the count leaves zero and where it returns to zero
% Reference: http://www.mathworks.com/matlabcentral/newsreader/view_thread/171594 (Roger Stafford)
I have a 333 x 333 adjacency matrix containing values that I would like to average according to the identity of each cell, which is defined in a separate 333 x 1 vector. There are 13 different groups defined in that vector, so ideally I'd be able to calculate a new 13 x 13 matrix in which each cell contains the average of the corresponding values from the larger matrix.
matrix_1: 333 x 333 --> contains values for each pairwise interaction
vector_2: 333 x 1 --> contains the identity (range: 1 - 13) for each of the elements in matrix_1 (elements are the same in both the rows and columns)
ideal output = matrix_2: 13 x 13 --> contains values in each cell which reflect the mean score for all examples of the specific identity comparison.
e.g. matrix_2(1,1) --> should contain mean score of all 1 to 1 values from matrix_1
e.g. matrix_2(1,2) --> should contain mean score of all 1 to 2 values (and 2 to 1 values) from matrix_1
Thanks in advance
Mac
I'm not 100% certain from your description, but I guess you want:
[I,J] = ndgrid(V);
out = accumarray([I(:), J(:)], M(:), [], @mean);
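To sanity-check the idea, here is a toy run with a made-up 4 x 4 matrix M and two groups in V (the names match the answer above; the data is invented):
M = [1 2 3 4; 2 1 4 3; 3 4 1 2; 4 3 2 1]; % small pairwise matrix
V = [1; 1; 2; 2];                         % group identity of each row/column
[I, J] = ndgrid(V);
out = accumarray([I(:), J(:)], M(:), [], @mean)
% out(1,1) = mean of M(1:2,1:2) = 1.5; out(1,2) = mean of M(1:2,3:4) = 3.5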
I have the following matrix which keeps track of the starting and ending points of data ranges (the first column represents "starts" and the second column represents the "ends"):
myMatrix = [
162 199; %// this represents the range 162:199
166 199; %// this represents the range 166:199
180 187; %// and so on...
314 326;
323 326;
397 399;
419 420;
433 436;
576 757;
579 630;
634 757;
663 757;
668 757;
676 714;
722 757;
746 757;
799 806;
951 953;
1271 1272
];
I need to eliminate all the ranges (i.e. rows) which are contained within a larger range present in the matrix. For example, the ranges [166:199] and [180:187] are contained within the range [162:199], and thus rows 2 and 3 would need to be removed.
The solution I thought of was to calculate a sort of "running" max on the second column to which subsequent values of the column are compared to determine whether or not they need to be removed. I implemented this with the use of a for loop as follows:
currentMax = myMatrix(1,2); %//set first value as the maximum
[sizeOfMatrix,~] = size(myMatrix); %//determine the number of rows
rowsToRemove = false(sizeOfMatrix,1); %//pre-allocate final vector of logicals
for m=2:sizeOfMatrix
if myMatrix(m,2) > currentMax %//if new max is reached, update currentMax...
currentMax = myMatrix(m,2);
else
rowsToRemove(m) = true; %//... else mark that row for removal
end
end
myMatrix(rowsToRemove,:) = [];
This correctly removes the "redundant" ranges in myMatrix and produces the following matrix:
myMatrix =
162 199
314 326
397 399
419 420
433 436
576 757
799 806
951 953
1271 1272
Onto the questions:
1) It would seem that there has to be a better way of calculating a "running" max than a for loop. I looked into accumarray and filter, but could not figure out a way to do it with those functions. Is there a potential alternative that skips the for loop (some kind of vectorized code that is more efficient)?
2) Is there a completely different (that is, more efficient) way to accomplish the final goal of removing all the ranges that are contained within larger ranges in myMatrix? I don't know if I'm over-thinking this whole thing...
Approach #1
bsxfun-based brute-force approach -
myMatrix(sum(bsxfun(@ge,myMatrix(:,1),myMatrix(:,1)') & ...
    bsxfun(@le,myMatrix(:,2),myMatrix(:,2)'),2)<=1,:)
A few explanations of the proposed solution:
Compare all start indices against each other for "contained-ness", and similarly for the end indices. Note that the "contained-ness" criterion can be either of these two:
Greater than or equal to for starts and less than or equal to for ends.
Less than or equal to for starts and greater than or equal to for ends.
I just so happened to go with the first option.
See which rows are contained in at least one other row and remove those to get the desired result; an implicit-expansion variant is sketched below.
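On MATLAB R2016b or newer, implicit expansion lets you write the same brute-force test without bsxfun (an equivalent rewrite, not part of the original answer):
contained = (myMatrix(:,1) >= myMatrix(:,1).') & (myMatrix(:,2) <= myMatrix(:,2).');
out = myMatrix(sum(contained, 2) <= 1, :); % keep rows contained only in themselves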
Approach #2
If you are okay with an output whose rows are sorted according to the first column, and if there is a small number of local maxima, you can try this alternative approach -
myMatrix_sorted = sortrows(myMatrix,1);
col2 = myMatrix_sorted(:,2);
max_idx = 1:numel(col2);
while 1
col2_selected = col2(max_idx);
N = numel(col2_selected);
labels = cumsum([true ; diff(col2_selected)>0]);
idx1 = accumarray(labels, (1:N).', [], @(x) findmax(x, col2_selected));
if numel(idx1)==N
break;
end
max_idx = max_idx(idx1);
end
out = myMatrix_sorted(max_idx,:); %// desired output
Associated function code -
function ix = findmax(indx, s)
[~,ix] = max(s(indx));
ix = indx(ix);
return;
I ended up using the following for the "running maximum" problem (but have no comment on its efficiency relative to other solutions):
function x = cummax(x)
% Cumulative maximum along dimension 1
% Adapted from http://www.mathworks.com/matlabcentral/newsreader/view_thread/126657
% Is recursive, but magically so, such that the number of recursions is proportional to log(n).
n = size(x, 1);
%fprintf('%d\n', n)
if n == 2
x(2, :) = max(x);
elseif n % had to add this condition relative to the web version, otherwise it would recurse infinitely with n=0
x(2:2:n, :) = cummax(max(x(1:2:n-1, :), x(2:2:n, :)));
x(3:2:n, :) = max(x(3:2:n, :), x(2:2:n-1, :));
end
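For reference (not part of the original answer): MATLAB R2014b and later ship a built-in cummax, so the redundant-range removal reduces to a few vectorized lines, assuming the same myMatrix layout:
col2 = myMatrix(:, 2);
runmax = cummax(col2);                        % built-in running maximum
keep = [true; col2(2:end) > runmax(1:end-1)]; % keep rows whose end exceeds all previous ends
myMatrix = myMatrix(keep, :);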
I got the following results after applying [h,bins] = hist(data), where data contains the LBP (Local Binary Pattern) values.
h =
221 20 6 4 1 1 2 0 0 1
bins =
8.2500 24.7500 41.2500 57.7500 74.2500 90.7500 107.2500 123.7500 140.2500 156.7500
I want to ask the following:
1) Does the first bin represent the values 0-8.25 and the second bin the values 8.26-24.75, and so forth?
2) For the h value 221, does it mean that 221 of the computed LBP values fall in the range 0-8.25?
1) No. The bin location is the center of the bin, so the first bin covers the values 0-16.5, the second bin 16.5-33, etc. Use histc if it is more natural to specify bin edges instead of centers.
2) h(1) = 221 means that, from your entire data set (which has 256 elements according to your question), 221 elements had values ranging between 0 and 16.5.
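For completeness (a sketch, not from the original answer): if you want to bin by the edges implied by those centers, histc takes explicit edges:
edges = 0:16.5:165;     % edges implied by the centers above
h = histc(data, edges); % h(k) counts values in [edges(k), edges(k+1)); the last bin counts values equal to 165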