I want to generate a matrix of known dimensions in Octave. The problem is that I do not want to initialize the matrix with zeros. The matrix will only contain 0 or 1, but the elements (cells) that are not assigned any value must remain blank. I plan to use such a matrix in a 'Collaborative Filtering' algorithm.
I am new to both Octave and 'Collaborative Filtering' algorithms. I have looked for a solution online, but to no avail: searching for "empty matrix" only turns up arrays with zero dimensions or char matrices with " " as values.
A numeric array cannot hold empty values. Typically in this case, people will use NaN as a placeholder value.
%// Initialize a 3D matrix of NaN values
data = nan(2, 3, 4);
size(data)
%// 2 3 4
It is then easy to differentiate a place holder value from real data. You can detect them using isnan.
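A minimal sketch of that workflow, assuming a hypothetical 3-by-3 matrix named ratings (the name and the values are made up for illustration):

```octave
% Hypothetical 3x3 matrix; NaN marks entries that have not been set yet
ratings = nan(3, 3);
ratings(1, 2) = 1;                 % a real value
ratings(3, 1) = 0;                 % zero is real data too, distinct from NaN

unset_mask = isnan(ratings);       % logical mask of placeholder entries
num_set    = nnz(~unset_mask);     % number of entries holding real data
```

Note that comparing with == does not work for this purpose, because NaN == NaN is false; isnan is the reliable test.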
The only way to create an array of empty values (and it is highly discouraged due to the performance hit) is to use cell arrays.
data = cell(2, 3, 4);
The problem is that I do not want to initialize the matrix with zeros. The matrix will only contain 0 or 1 but the elements (cells) which do not get allocated any value, must remain blank.
You are wrong. You may think the matrix only contains the values 0 or 1, but actually each element has a value of 0, 1, or unset. You can't have a blank value; you always need some value. Or at least not blank in the way you are thinking. Taking it to a very low level, all bits need to have a value (0 or 1); they can't be blank. Therefore, if you want a blank value, you need to interpret some value as blank.
Your data will then have 3 states: true, false, and blank. You will therefore need at least 2 bits per point (note that even logical/bool data types, which only need 1 bit, actually take up 8 bits (1 byte)).
Using NaN
This may look like the simplest solution, but it's actually pretty bad. It will be a huge waste of memory (and you will have very large matrices if you're doing collaborative filtering).
The reason is that if you use NaN, your data needs to be of type single or double. That's at least 32 or 64 bits respectively. Remember that you only actually need 2 bits. Of course, you could make your own data type that does have a NaN value.
octave> vals = NaN (3, 3) # 3x3 matrix of type double (default)
vals =
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
octave> vals = NaN (3, 3, "single") # 3x3 matrix of type single
vals =
NaN NaN NaN
NaN NaN NaN
NaN NaN NaN
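To make the memory cost described above concrete, here is a rough sketch that compares the storage of the same number of elements in three types. The size n and the variable names are arbitrary choices for illustration; whos is used because it reports a bytes field in both Octave and MATLAB.

```octave
n = 1000;                              % arbitrary size, for illustration only
as_double = NaN(n);                    % n-by-n, 8 bytes per element
as_single = NaN(n, 'single');          % n-by-n, 4 bytes per element
as_int8   = zeros(n, 'int8');          % n-by-n, 1 byte per element (no NaN available)

info_double = whos('as_double');
info_single = whos('as_single');
info_int8   = whos('as_int8');

fprintf('double: %d bytes\n', info_double.bytes);
fprintf('single: %d bytes\n', info_single.bytes);
fprintf('int8:   %d bytes\n', info_int8.bytes);
```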
Using a cell matrix
A cell array is a data type where each cell can be any Octave value. This includes another cell array, a matrix of any dimensions, or even an empty array. You could use an empty array as blank, but this will be terribly inefficient, both for memory and speed, and you won't be able to use most functions since they will work on numeric arrays, not cell arrays.
octave> vals = cell (3, 3); # create 3x3 cell matrix
octave> vals{2,3} = true; # set value
octave> vals{2,3} = false; # set value
octave> vals{2,3} = []; # unset value
octave> cumsum (vals)
error: cumsum: wrong type argument 'cell'
octave> nnz (vals)
error: nnz: wrong type argument 'cell array'
octave> find (vals)
error: find: wrong type argument 'cell'
Using 8 bit integer
This is what I see used most often. With a signed 8-bit integer, you can use 0 for blank, -1 for false, and 1 for true (or whatever mapping makes the most sense for you).
octave> vals = zeros (3, 3, "int8");
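A short sketch of how that encoding might be used in practice, assuming the mapping suggested above (0 = blank, -1 = false, 1 = true); the helper variable names are made up:

```octave
% Assumed encoding: 0 = blank, -1 = false, 1 = true
vals = zeros(3, 3, 'int8');    % every element starts out blank
vals(2, 3) = 1;                % mark as true
vals(1, 1) = -1;               % mark as false

is_blank  = (vals == 0);       % logical masks for each state
is_true   = (vals == 1);
is_false  = (vals == -1);
num_blank = nnz(is_blank);     % how many elements are still unset
```

Because vals is an ordinary numeric array, functions such as find, nnz, and cumsum keep working on it, unlike the cell array approach.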
Using a separate matrix to track blank values
If you really, really want to have a matrix of 0 and 1, then you need a separate matrix to keep track of which values have been set. In that case, both matrices can be of type logical, each taking up 8 bits per data point, which totals 16 bits per data point. It also has the problem that you need to keep the two matrices in sync.
octave> vals = false (3, 3);
octave> set_vals = false (size (vals));
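A sketch of what keeping the two matrices in sync could look like. The convention that every write touches both matrices is an assumption of this example, not something Octave enforces:

```octave
vals     = false(3, 3);            % the actual 0/1 data
set_vals = false(size(vals));      % true where vals holds real data

% Setting a value: always update both matrices together
vals(2, 3) = true;   set_vals(2, 3) = true;
vals(1, 1) = false;  set_vals(1, 1) = true;
vals(3, 3) = true;   set_vals(3, 3) = true;

% Unsetting a value: clear the flag (and reset the data for tidiness)
vals(3, 3) = false;  set_vals(3, 3) = false;

num_true  = nnz(vals & set_vals);  % set and true
num_false = nnz(~vals & set_vals); % set and false
num_blank = nnz(~set_vals);        % never set, or unset again
```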
Making your own class
Using either the new classdef (requires Octave 4.0.0 or later) or the old @class type, you can encapsulate any of the strategies above (I would personally use an 8-bit integer) in its own class. If you use a signed 8-bit integer, this moves the logic of knowing which value (-1 or 0) means blank into the class. Or, if you prefer a separate matrix for blank values, it moves the logic of keeping the two matrices in sync into a setter method.
Related
Say I have a row vector defined as follows:
x = [5 6.7 8.9]
The above code results in a row vector with all the elements displayed as floating-point numbers (including the 5 at the first index).
x =
5.0000 6.7000 8.9000
Is there any way I can prevent the typecasting of the 5 (present in the first place), i.e. is there any way I can get my vector as follows:
x =
5 6.7000 8.9000
without the four decimal points after the 5.
In Matlab and Octave, double is the default type for all numeric values, even if some of those values happen to be whole numbers (integers). And for numeric arrays, all elements must be of the same type.
In general, you should just leave all your numeric values as doubles, and use formatting controls (like those provided by printf() and its friends) to display them how you want.
In this case, you could do something like:
x = [5 6.7 8.9];
printf('%d %.04f %.04f\n', x);
Or to be more flexible:
printf('%g ', x); printf('\n');
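If you need the formatted text in a variable rather than on the screen, sprintf accepts the same format specifiers. A small sketch (the variable names fixed and compact are made up for illustration):

```octave
x = [5 6.7 8.9];                          % stored as doubles regardless of how they print

fixed   = sprintf('%d %.4f %.4f', x);     % one explicit format per element
compact = strtrim(sprintf('%g ', x));     % %g drops trailing zeros; strtrim removes the last space

disp(fixed);      % 5 6.7000 8.9000
disp(compact);    % 5 6.7 8.9
```

Only the displayed text changes here; x itself still holds three doubles.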
I have two column arrays with the same number of rows:
>> size(values)
ans =
12915 1
>> size(positions)
ans =
12915 1
values contains some NaN entries:
>> sum(isnan(values))
ans =
2500
while positions is filled with integer values:
>> sum(isnan(positions))
ans =
0
Some values in the two arrays:
values(randi(length(values), 10, 1))
ans =
0.0290
0.1000
0.0430
NaN
0.0310
0.9700
0.3170
0.1750
NaN
0.1410
positions(randi(length(positions), 10, 1))
ans =
5
8
12
11
10
6
10
3
9
4
If I try to create a table with those two columns, I get an error message that is incomprehensible to me:
>> table(values, positions)
Subscript indices must either be real positive integers or logicals.
I tried and removed the NaN values without success: I keep getting the same error message. However, I cannot understand the error message.
What's the problem?
You have very likely created a variable called table. If you type whos table you will probably get a result such as:
whos table
Name Size Bytes Class Attributes
table 1x1 8 double
You can solve this by simply clearing the table variable: clear table. This will leave the function but delete the variable.
Note that you created the table variable somewhere, so it's likely that you also use it somewhere (especially if you have a large project made up mostly of scripts rather than functions). Just deleting the variable may result in broken code. Therefore, I suggest you search for the variable name in your scripts and make sure you don't break anything.
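The shadowing effect can be reproduced with any function name. The sketch below uses max instead of table so that it also runs in Octave, which may not provide table; the variable named max is hypothetical and exists only to trigger the problem.

```octave
values = [0.029; NaN; 0.317];

max = 5;                       % hypothetical variable that shadows the max function
caught = false;
try
  max(values)                  % now treated as indexing into the variable, not a call
catch err
  caught = true;
  disp(err.message);           % an indexing error; the exact wording depends on the version
end

clear max                      % removes the variable; the function is visible again
m = max(values);               % a real function call; max skips NaN by default
```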
The table(a,b) notation indexes the matrix table with a and b. Since your values are non-integer, you get this error message.
I suppose, what you intend to do is to merge the two column vectors. For this you can use [ ], as
table = [values positions]
This will still contain NaN values, but I guess this will not bother you.
CORRECTION
If however you would like to add the values in table at their position, you may use
table(positions)=values
I have a 50x25 cell array in the variable raw_data. Each cell contains a 200x150 matrix. I have a few NaN values scattered between all those values and I want to set them to zeros to make sure they do not interfere at later stages.
I have tried the following:
raw_data(cellfun(@(x) any(isnan(x), raw_data, 'UniformOutput', false)) = 0
When running the script, I get "Function 'subindex' is not defined for values of class 'cell'". Can anyone help me, please?
Thanks in advance!
How about this:
cellfun(@(x) nansum(x,ndims(x)+1), raw_data, 'UniformOutput', false)
Note if you're certain you'll only have 2D matrices in raw_data you can replace the ndims(x)+1 with 3.
The idea is to use nansum to sum along the 3rd dimension, as this preserves the shape of the first 2 dimensions. Luckily, nansum seems to convert NaN to 0 when all the elements being summed are NaN.
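If nansum is not available (in Octave it comes from an add-on package), a plain loop with isnan does the same job. This is a different technique from the nansum one-liner above, sketched on a small made-up cell array standing in for the real 50x25 raw_data:

```octave
% Small stand-in for the real 50x25 cell array of 200x150 matrices
raw_data = {[1 NaN; 3 4], [NaN NaN]; [5 6], 7};

for k = 1:numel(raw_data)
  m = raw_data{k};          % pull one matrix out of the cell
  m(isnan(m)) = 0;          % logical indexing replaces every NaN with 0
  raw_data{k} = m;          % store the cleaned matrix back
end
```

The loop works for matrices of any size or number of dimensions, and it leaves the shape of raw_data unchanged.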
I'm writing a MATLAB function to read out data into an n-dimensional array (variable dimension size). I need to be able to access a specific point in the Matrix (to write to it or read it, for example), but I don't know ahead of time how many indexes to specify.
Currently I have a current_point vector which I iterate through to specify each index, and a max_points vector which specifies the size of the array. So, if for example I wanted a 3-dimensional array of size 1000-by-15-by-3, max_points = [1000 15 3], and current_point iterates from [1, 1, 1] to [1000, 15, 3] ([1, 1, 1] -> [1000, 1, 1] -> [1, 2, 1] -> [1000, 2, 1] ->...). What I'd like to be able to do is feed current_point as an index to the matrix like so:
output_matrix(current_point) = val
But apparently something like output_matrix([1 2 3]) = val will just set output_matrix(1:3) = val. I can't just use dummy variables because sometimes the matrix will need 3 indexes, other times 4, other times 2, etc., so a vector of variable length is really what I need here. Is there a simple way to use a vector as the points in an index?
Using the function sub2ind to create a linear index is the typical solution to this problem, as shown in this closely-related question. You could also compute a linear index yourself instead of calling sub2ind.
However, your case may be simpler than those in the other questions I linked to. If you're only ever indexing a single point with your current_point vector (i.e. it's just an n-element vector of subscripts into your n-dimensional matrix), then you can use a simple solution where you convert current_point to a cell array of subscripts using the function num2cell and use it to create a comma-separated list of indices. For example:
current_point = [1 2 3 ...]; % A 1-by-n array of subscripts
subCell = num2cell(current_point); % A 1-by-n cell array of subscripts
output_matrix(subCell{:}) = val; % Update the matrix point
The operation subCell{:} creates the equivalent of typing subCell{1}, subCell{2}, ..., which is the equivalent of typing current_point(1), current_point(2), ....
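A complete small run of this idea, with made-up sizes and values (a 4-by-3-by-2 array, the point [2 3 1], and val = 7), which also checks that sub2ind lands on the same element:

```octave
max_points    = [4 3 2];                 % made-up array size for illustration
output_matrix = zeros(max_points);
current_point = [2 3 1];                 % one subscript per dimension
val           = 7;

subCell = num2cell(current_point);       % {2, 3, 1}
output_matrix(subCell{:}) = val;         % same as output_matrix(2, 3, 1) = val

% The equivalent linear index, computed from the same comma-separated list
idx = sub2ind(max_points, subCell{:});
same_element = (output_matrix(idx) == val);
```

The same two lines work unchanged whether current_point has 2, 3, or more elements, as long as its length matches the number of dimensions.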
I know it is late, but for anybody who finds this topic: the easiest way that worked for me is to use diag(A (x(:),y(:)) );
Unfortunately, this works only if you need to get values from the matrix, not for changing values.
You can use the sub2ind function to get the linear index from the subscript.
Example:
A=magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
selectElement={2,3}; %# get the element at position 2,3 in A.
indx=sub2ind(size(A),selectElement{:});
A(indx)
ans =
10
In the above example, I've stored the subscripts (can be any number of dimensions) as a cell. If you have it stored as a vector, simply use num2cell() to convert it to a cell.
You can now easily assign a value to this as A(indx)=value;. I've used different variables than yours to keep the answer general, but the idea is the same and you just need to replace the variable names.
You also mentioned in your post that you're looping from (1,1,1) till some value, (1000,15,3) and assigning a value to each of these points. If you're looping along the columns, you can replace this entire operation with a vectorized solution.
Let finalElement={1000,15,3} be the final step of the loop. As before, find the linear index as
index=sub2ind(size(A),finalElement{:});
Now if you have the values you assign in the loop stored as a single vector, values, you can simply assign it in a single step as
A(1:index)=values;
I have created a sparse matrix using MEX and also created a sparse matrix using MATLAB. To fill in the values of the matrices I have used the same formula.
Now, to check whether both matrices are equal, I used result=(A==B). result returns 1 for all indices, which implies that all the matrix elements are equal.
But if I do find(A-B) it returns some indices, which indicates that at these indices the values are non-zero. How is this possible?
Note: When I compare the values at these indices, they show as equal!
I'm guessing you have values of infinity cropping up in your matrices at the same points. For example:
>> A = Inf;
>> B = Inf;
>> A == B
ans =
1 %# They are treated as equal...
>> A-B
ans =
NaN %# ...but their difference actually results in NaN...
>> find(A-B)
ans =
1 %# ...which is treated as a non-zero value.
The discrepancy here results from the fact that certain operations involving infinity result in NaN values. You can check to see if you have any infinities in A and B by using the function ISINF like so:
any(isinf(A(:)))
any(isinf(B(:)))
And if you get a value of 1 (i.e. true), then the presence of infinities is likely the source of your discrepancy.
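The same effect can be reproduced with sparse matrices. The sketch below uses a made-up 2-by-2 sparse matrix with a single Inf entry; it is only an illustration of the mechanism, not a reconstruction of the matrices in the question.

```octave
A = sparse([1 2], [1 2], [Inf 3]);      % made-up 2x2 sparse matrix with Inf at (1,1)
B = A;                                   % an identical copy

all_equal   = full(all(all(A == B)));    % Inf == Inf is true, so every element compares equal
D           = A - B;                     % Inf - Inf is NaN, while 3 - 3 is 0
nonzero_idx = find(D);                   % NaN counts as non-zero, so (1,1) is reported
has_inf     = full(any(isinf(A(:))));    % the check suggested above
```

The positions returned by find(D) are the ones where both matrices hold an infinity of the same sign.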