Erase last element of a vector by reference

I would like to shrink a vector that is passed by reference.
To do so, I use erase on the last element(s):
require(Rcpp)
cppFunction("void f(IntegerVector &x){
x[1] = 3;
}")
cppFunction("void g(IntegerVector &x){
x.erase(1);
}")
a <- c(1L, 2L)
a
f(a)
a
g(a)
a
Output is:
[1] 1 2
[1] 1 3
[1] 1 3
Function f shows that passing by reference works: the change made in C++ is visible in R.
However, for some reason I cannot remove the last element with function g.
(I can check that the size of the vector does decrease in C++.)
So either it is currently not possible with Rcpp (for good reasons, I am sure, such as memory reallocation), or I am doing it wrong.
So is it possible to remove the last element(s) of a vector in C++, and get it back in R?
EDIT:
I want to pass by reference because I use large data sets (here, a tibble with hundreds of millions of rows).
Copying is non-optimal here.

Unexpected MATLAB behaviour when using vectorised assignment

I've come across some unexpected behaviour in MATLAB that I can't make sense of when performing vectorised assignment:
>> q=4;
>> q(q==[1,3,4,5,7,8])
The logical indices contain a true value outside of the array bounds.
>> q(q==[1,3,4,5,7,8])=1
q =
4 0 1
Why does the command q(q==[1,3,4,5,7,8]) result in an error, but the command q(q==[1,3,4,5,7,8])=1 work? And how does it arrive at 4 0 1 being the output?
The difference between q(i) and q(i)=a is that the former must produce the value of an array element; if i is out of bounds, MATLAB chooses to give an error rather than invent a value (good choice IMO). The latter must write a value to an array element; if i is out of bounds, MATLAB chooses to extend the array so that it is large enough to write to that location (this has also proven to be a good choice; it is useful and used extensively in code). Numeric arrays are extended by adding zeros.
In your specific case, q==[1,3,4,5,7,8] is the logical array [0,0,1,0,0,0]. This means that you are trying to index i=3. Since q has a single value, reading at index 3 is out of bounds, but we can write there. q is padded to size 3 by adding zeros, and then the value 1 is written to the third element.
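The read/write asymmetry described above can be modelled in a few lines. This is only a language-neutral sketch in Python, not MATLAB itself: `read_masked` and `write_masked` are hypothetical helper names, lists are 0-based, and only the 1-D numeric case from the question is covered.

```python
def read_masked(q, mask):
    """Model of q(mask): every True position must already exist in q."""
    picked = []
    for i, flag in enumerate(mask):
        if flag:
            if i >= len(q):
                # Reading cannot invent a value, so it fails.
                raise IndexError("logical index contains a true value out of bounds")
            picked.append(q[i])
    return picked


def write_masked(q, mask, value):
    """Model of q(mask) = value: pad with zeros, then assign."""
    out = list(q)
    true_positions = [i for i, flag in enumerate(mask) if flag]
    if true_positions:
        # Grow the array so the last True position exists (no-op if long enough).
        out.extend([0] * (true_positions[-1] + 1 - len(out)))
    for i in true_positions:
        out[i] = value
    return out


q = [4]
mask = [q[0] == v for v in [1, 3, 4, 5, 7, 8]]  # only the third entry is True
print(write_masked(q, mask, 1))
```

With the values from the question, the write pads `q` to three elements and sets the third one, while `read_masked(q, mask)` raises an error.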

How to merge two lists (or arrays) while keeping the same relative order?

For example,
A=[a,b,c,d]
B=[1,2,3,4]
My question is: how can I generate all possible ways to merge A and B such that, in the new list, a appears before b, b appears before c, etc., and 1 appears before 2, 2 appears before 3, etc.?
I can think of one implementation:
We choose 4 slots from 8, then for each possible selection, there are 2 possible ways: A first or B first.
I wonder whether there is a better way to do this?
EDIT:
I've just learned a more intuitive way--use recursion.
For each spot, there are two possible cases, either taken from A or taken from B; keep recursing until A or B is empty, and concatenate the remaining.
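The recursive idea from the edit can be sketched as follows. This is a minimal illustration in Python rather than a definitive implementation; the function name `interleavings` is made up for the example, and the elements of the two lists are assumed to be distinct.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a or not b:
        # One list is exhausted: the rest of the other is appended as-is.
        yield list(a) + list(b)
        return
    # Case 1: the next element is taken from a.
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    # Case 2: the next element is taken from b.
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


merges = list(interleavings(["a", "b", "c", "d"], [1, 2, 3, 4]))
print(len(merges))
```

For two lists of four elements each this produces C(8,4) = 70 merges, one for each way of choosing which 4 of the 8 slots hold the elements of A.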
If the required relative order differs from sorted order (I assume it does, because otherwise this would not be a problem), then you need to formalize the initial order. There are multiple ways to do that, the easiest being to remember the index of each element in its list. Example: valid position for a is 1 in the first array [...]
Then you can join the lists and generate all permutations of the combined elements. A valid permutation is one in which the elements of each original list still appear in the order of the indexes you stored.
An example of a valid permutation:
a12b3cd4
You can check that this permutation is valid because the position of element 'a' is smaller than the position of 'b', and so on for each list; the indexes stored in the first step tell you which positions must be smaller.
Similarly, an invalid permutation is:
ba314cd2
It is checked the same way: here 'b' comes before 'a', so it is rejected.
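The validity check described in this answer can be sketched as below. It is a hedged illustration in Python: `keeps_order`, `is_valid_merge` and `valid_merges` are hypothetical names, and elements are assumed to be distinct so that `index` finds each one unambiguously.

```python
from itertools import permutations


def keeps_order(candidate, original):
    """True if the elements of original appear in candidate in the same order."""
    positions = [candidate.index(x) for x in original]
    return all(p < q for p, q in zip(positions, positions[1:]))


def is_valid_merge(candidate, a, b):
    """A permutation is valid when it preserves the order of both lists."""
    return keeps_order(candidate, a) and keeps_order(candidate, b)


def valid_merges(a, b):
    """Brute force: test every permutation of the joined lists."""
    joined = list(a) + list(b)
    return [p for p in permutations(joined) if is_valid_merge(p, a, b)]


print(is_valid_merge("a12b3cd4", "abcd", "1234"))
print(is_valid_merge("ba314cd2", "abcd", "1234"))
```

Note that this brute-force filter inspects all 8! = 40320 permutations to keep the valid ones, so it becomes impractical quickly as the lists grow.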

MATLAB spending an incredible amount of time writing a relatively small matrix

I have a small MATLAB script (included below) for handling data read from a CSV file with two columns and hundreds of thousands of rows. Each entry is a natural number, with zeros only occurring in the second column. This code is taking a truly incredible amount of time (hours) to run what should be achievable in at most some seconds. The profiler identifies that approximately 100% of the run time is spent writing a matrix of zeros, whose size varies depending on input, but in all usage is smaller than 1000x1000.
The code is as follows
function [data] = DataHandler(D)
    n = size(D,1);
    s = max(D,1);
    data = zeros(s,s);
    for i = 1:n
        data(D(i,1),D(i,2)+1) = data(D(i,1),D(i,2)+1) + 1;
    end
It's the data = zeros(s,s); line that takes around 100% of the runtime. I can make the code run quickly by just changing out the s's in this line for 1000, which is a sufficient upper bound to ensure it won't run into errors for any of the data I'm looking at.
Obviously there are better ways to do this, but since I just bashed the code together to quickly format some data, I wasn't too concerned. As I said, I fixed it by just replacing s with 1000 for my purposes, but I'm perplexed as to why writing that matrix would bog MATLAB down for several hours. The new code runs instantaneously.
I'd be very interested if anyone has seen this kind of behaviour before, or knows why this would be happening. It's a little disconcerting, and it would be good to be confident that I can initialize matrices freely without killing MATLAB.
Your call to zeros is incorrect. Looking at your code, D looks like an N x 2 array. However, your call of s = max(D,1) actually generates another N x 2 array. According to the documentation for max, this is what happens when you call it the way you did:
C = max(A,B) returns an array the same size as A and B with the largest elements taken from A or B. Either the dimensions of A and B are the same, or one can be a scalar.
Therefore, because you used max(D,1), you are comparing every value in D with the value 1, so what you actually get is essentially a copy of D (with any zeros replaced by 1). Using this as input to zeros has rather undefined behaviour. What will actually happen is that for each row of s, it will allocate a temporary zeros matrix of that size and toss the temporary result. Only the dimensions from the last row of s are kept. Because D is very large, this is probably why the profiler reports close to 100% of the run time on this line. In short, each parameter to zeros must be a scalar, yet your call produces a matrix for s.
What I believe you intended should have been:
s = max(D(:));
This finds the overall maximum of the matrix D by unrolling D into a single vector and finding the overall maximum. If you do this, your code should run faster.
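To make the difference between the two calls concrete, here is a small model of them in Python. It is only an illustration of the semantics quoted from the documentation, not MATLAB code; `elementwise_max` and `overall_max` are hypothetical names, and the sample matrix is invented.

```python
def elementwise_max(matrix, scalar):
    """Model of max(D, 1): same shape as D, each entry compared with the scalar."""
    return [[max(value, scalar) for value in row] for row in matrix]


def overall_max(matrix):
    """Model of max(D(:)): a single number, the largest entry of D."""
    return max(value for row in matrix for value in row)


D = [[3, 0], [1, 2], [5, 4]]   # made-up 3 x 2 sample data
print(elementwise_max(D, 1))   # still a 3 x 2 array
print(overall_max(D))          # one scalar
```

Only the second form yields a scalar that is suitable as a dimension argument.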
As a side note, this post may interest you:
Faster way to initialize arrays via empty matrix multiplication? (Matlab)
It was shown in this post that doing zeros(n,n) is in fact slow and there are several neat tricks to initializing an array of zeros. One way is to accomplish this by empty matrix multiplication:
data = zeros(n,0)*zeros(0,n);
One of my personal favourites is that if you assume that data was not declared / initialized, you can do:
data(n,n) = 0;
If I may also comment, that for loop is quite inefficient. What you are doing is calculating a 2D histogram / accumulation of data. You can replace that for loop with a more efficient accumarray call. This also avoids explicitly allocating an array of zeros, because accumarray does that under the hood for you.
As such, your code would basically become this:
function [data] = DataHandler(D)
data = accumarray([D(:,1) D(:,2)+1], 1);
accumarray in this case takes all pairs of row and column coordinates, stored in D(i,1) and D(i,2) + 1 for i = 1, 2, ..., size(D,1), and groups the pairs that share the same row and column into the same 2D bin. The occurrences in each bin are then added up, so the output at a given row and column is the total count of input pairs that map to that location.
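The tallying that accumarray performs here can be sketched in plain Python. This is an assumption-laden model, not the MATLAB implementation: `tally_pairs` is a hypothetical name, the first value of each pair is a 1-based row as in the question, and the second value is shifted by one in MATLAB, which makes it a 0-based column in Python.

```python
def tally_pairs(pairs):
    """Count how often each (row, col) pair occurs, like a 2D histogram."""
    n_rows = max(row for row, _ in pairs)        # rows are 1-based
    n_cols = max(col for _, col in pairs) + 1    # columns start at 0
    counts = [[0] * n_cols for _ in range(n_rows)]
    for row, col in pairs:
        counts[row - 1][col] += 1
    return counts


sample = [(1, 0), (2, 1), (1, 0), (3, 2)]  # made-up input pairs
print(tally_pairs(sample))
```

As with accumarray, the output size is inferred from the largest coordinates seen, so no size has to be chosen in advance.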

Empty objects in MATLAB [duplicate]

This question already has answers here:
Difference between [] and [1x0] in MATLAB
In many cases I have seen MATLAB return empty objects whose size, if you look at it, is something like 1 x 0 or 0 x 1.
An example is the following piece of code :
img = zeros(256); % Create a square zero image of dimension 256 X 256
regions = detectMSERFeatures(img);
size(regions)
If you look at the size of regions, it will be 0 x 1. My questions are the following; some of them may overlap.
What is the meaning of such dimensions?
What can be said about the memory layout of such objects? The reason I am asking about memory layout is that MATLAB allows you to write the following statement: temp = zeros(1,0);
Why can't MATLAB simply return an empty constant like NULL in such cases instead of returning weirdoes of size 1 x 0?
Arrays in MATLAB can have any of their dimensions of size zero - I guess that may seem odd initially, but they're just arrays like any other.
You can create them directly:
>> a = double.empty(2,0,3,0,2)
a =
Empty array: 2-by-0-by-3-by-0-by-2
or using other array creation functions such as zeros, ones, rand and so on.
Note that, as is obvious from the above, empty arrays still have a class - you can create them with double.empty, uint8.empty, logical.empty and so on. The same is also true for user-defined classes.
It's very useful to have such arrays, rather than just a NULL element. Without them, you would need to spend a lot of programming effort to check for edge cases where you had a NULL rather than an array, and you wouldn't be able to distinguish between arrays that were NULL because they had no rows, and arrays that were NULL because they had no columns.
In addition, they're useful for initializing arrays. For example, let's say you have an array that needs to start empty but get filled later, and you know that it's always going to have three rows but a variable number of columns. You can then initialize it as double.empty(3,0), and you know that your initial value will always pass any checks on the number of rows your array has. That wouldn't work if you initialized it to [] (which is zero by zero), or to a NULL element.
Finally, you can also multiply them in the same way as non-empty arrays. It may be surprising to you that:
>> a = double.empty(2,0)
a =
Empty matrix: 2-by-0
>> b = double.empty(0,3)
b =
Empty matrix: 0-by-3
>> a*b
ans =
0 0 0
0 0 0
but if you think it through, it's just a logical and necessary application/extension of the regular rules for matrix multiplication.
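The reasoning can be spelled out with a hand-written matrix product. The sketch below is in Python and is only illustrative: `matmul` is a hypothetical helper, and shapes are passed explicitly because a nested list cannot record the column count of a matrix with zero rows.

```python
def matmul(a, a_shape, b, b_shape):
    """Textbook product: entry (i, j) is the sum over p of a[i][p] * b[p][j]."""
    n, k = a_shape
    k2, m = b_shape
    if k != k2:
        raise ValueError("inner dimensions must agree")
    # With k == 0 every sum runs over no terms, and an empty sum is 0.
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


a = [[], []]   # 2-by-0: two rows, no columns
b = []         # 0-by-3: no rows at all
print(matmul(a, (2, 0), b, (0, 3)))
```

The result has the outer dimensions 2-by-3, and each entry is an empty sum, which is why it is filled with zeros.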
As to how they're stored in memory - again, they're stored just like regular MATLAB arrays. I can't recall the exact details (look in the documentation for mxArray), but it's basically a header giving the dimensions (some of which may be zero), followed by a list of the elements in column-major order (which in this case is an empty list).

Copy matrix rows matlab

Let's say I have a 300x65 matrix A. The last (65th) column contains ordered values (1, 2, 3): the first 102 elements are '1', the next 50 elements are '2', and the remainder are '3'.
I have another matrix B, which is 3x65. I want to repeat the first row of B as many times as there are '1's in matrix A, the second row of B as many times as there are '2's in matrix A, and the 3rd row as many times as there are remaining values in matrix A. By doing this, matrix B should become a 300x65 matrix.
I've tried to use MATLAB's repmat function with no success. Does anyone know how to do this?
There are many inconsistencies in your problem.
First, if you copy one row of B for every element of A (which is what your description ends up saying), that will result in a 19500x65 matrix.
Secondly, "copy" by itself is a vague term: do you mean duplicate? Do you want to store the copied value in a new variable?
What I gathered from your problem is that you want to perform some operation between A and B to create a matrix and store it in B, which in itself will cause the process to warp as it goes if you do not have another variable to store the result in.
So I suggest using a third variable C to store the result, and then, if you need it to be in B, setting B = C.
Also, for whatever process you are describing, I recommend learning to use a 'for' loop effectively, because it seems like that is what you would need.
Syntax for a 'for' loop:
for i = start:increment:stop
    % loops once for each element of start:increment:stop
    % sets i to the nth element of start:increment:stop, where n is the number of times the loop has run
end
If I understand your question, this should do it
index = A(:,end); % will be a column of numbers with values of 1, 2, or 3
newB = B(index,:); % B has 3 rows, which are copied as required by "index"
This should result in newB having the same number of rows as A and the same number of columns as the original B.
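The indexing trick above amounts to "look up a row of B for each label in A's last column". A rough Python equivalent is sketched here, with the hypothetical helper `expand_rows`, 1-based labels as in the question, and a tiny made-up B in place of the 3x65 matrix.

```python
def expand_rows(b, labels):
    """Return one copy of b[label - 1] for every label, in order."""
    return [list(b[label - 1]) for label in labels]


B = [[10, 11], [20, 21], [30, 31]]   # made-up 3 x 2 stand-in for B
labels = [1, 1, 2, 3, 3]             # stand-in for A(:, end)
new_b = expand_rows(B, labels)
print(new_b)
```

The result has one row per label, so its row count matches the length of the label column and its column count matches B.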