Matlab csvread: 450k+ columns

I am attempting to read a file with Matlab's csvread. It has 10 rows and ~900,000 columns. In the case of such a large number of columns, the function returns a vector, as opposed to a matrix of proper dimensions.
In order to test this, I truncated it to two sizes using cut, and when there are 457,000 columns we have the same behavior:
>> A = csvread( 'test.csv' );
>> size(A)
ans =
4570000 1
But when cut down to 45,700 columns, we have the desired behavior:
>> A = csvread( 'test.csv' );
>> size(A)
ans =
10 45700
Of course, Matlab is capable of handling matrices of size 10x457,000, and I suppose I could use fscanf in a loop (though I suspect that would be less efficient), but I was wondering if anyone had any insight.
EDIT: I suppose I could also just reshape the vector into a matrix of the proper dimensions, but I would still like to understand this seemingly strange behavior of csvread.
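A sketch of that reshape fix, assuming csvread returns the file's values concatenated row by row (worth verifying on a small test file first, since reshape fills column-wise):

```matlab
ncols = 457000;                   % known number of columns in the file
A = csvread('test.csv');          % comes back as a 4570000x1 vector
nrows = numel(A) / ncols;         % should be 10
A = reshape(A, ncols, nrows).';   % reshape fills columns first, so transpose
```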

Related

Matlab's sparse function explanation

this is my first time posting anything so please be nice!
I'm studying some code for a random walker algorithm and I got lost at the use of sparse to build the sparse Laplacian matrix of a point and edge set. I'm planning to write my own version of the sparse function, but I'm having trouble understanding how it works and what its output means, so any help would be perfect.
Thank you all !
A sparse matrix is a special type of "matrix" in matlab, which is conceptually equivalent to a normal matrix, but works differently 'under the hood'.
They are called "sparse", because they are usually used in situations where one would expect most elements of the matrix to contain zeros, and only a few non-zero elements.
The advantage of using this type of special object is that the memory it takes to create such an object depends primarily on the number of nonzero elements contained, rather than the size of the "actual" matrix.
By contrast, a normal (full) matrix needs memory proportional to its size. So, for instance, a 1000x1000 matrix of numbers (so-called 'doubles') will take roughly 8 MB to store (1 million elements at 8 bytes per double), even if all the elements are zero. Observe:
>> a = zeros(1000,1000);
>> b = sparse(1000,1000);
>> whos
Name Size Bytes Class Attributes
a 1000x1000 8000000 double
b 1000x1000 8024 double sparse
Now, assign a value to each of them at subscript (1,1) and see what happens:
>> a(1,1) = 1 % without a semicolon, this will flood your screen with zeros
>> b(1,1) = 1
b =
(1,1) 1
As you can see, the sparse matrix only keeps track of nonzero values, and the zeros are 'implied'.
Now let's add some more elements:
>> a(1:100,1:100) = 1;
>> b(1:100,1:100) = 1;
>> whos
Name Size Bytes Class Attributes
a 1000x1000 8000000 double
b 1000x1000 168008 double sparse
As you can see, the allocated memory for a hasn't changed, because the size of the overall array hasn't changed. Whereas for b, because it now contains more nonzero values, it takes up more space in memory.
In general, sparse matrices support the same operations as normal matrices. This works because most 'normal' functions are explicitly defined to also accept sparse inputs but treat them differently under the hood, i.e., they arrive at the same result using a different internal approach, one that is better suited to sparse storage. For example:
>> c = sum(a(:))
c =
10000
>> d = sum(b(:))
d =
(1,1) 10000
You can 'convert' a full matrix directly to a sparse one with the sparse command, and a sparse matrix back to a "full" matrix with the full command:
>> sparse(c)
ans =
(1,1) 10000
>> full(d)
ans =
10000
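Since the original question was about a random walker's Laplacian: sparse matrices are usually built directly in triplet form, sparse(i,j,v,m,n), which places value v(k) at subscript (i(k),j(k)). A small sketch for an unweighted 3-node cycle graph (the graph itself is just an illustration, not taken from the question's code):

```matlab
% Edges of a 3-node cycle: 1-2, 2-3, 3-1, in both directions
i = [1 2 2 3 3 1];
j = [2 1 3 2 1 3];
W = sparse(i, j, 1, 3, 3);     % sparse adjacency matrix
D = diag(sum(W, 2));           % degree matrix (stays sparse)
L = D - W;                     % graph Laplacian; each row sums to zero
```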

How to dump non-zero elements of a sparse matrix into a non-sparse matrix

I could have also asked how to dump a sparse matrix to CSV. Basically, I have a graph represented as a sparse matrix, and I want to export the graph to CSV to open it in Gephi. So my sparse matrix is something like:
(23,35) 1
(35,78) 1
(78,23) 1
etc
I would like to convert this into a matrix like:
[23,35,1;35,78,1;78,23,1]
I would love it if there were a simple one-liner to do it, but I can't seem to find it, so thanks a lot for the help.
Alternatively, if you know of something like sparse2csv('graph.csv',Adj) that would generate in a file:
23,35,1
35,78,1
78,23,1
Then that would work too.
The way that comes to mind isn't a one-liner (though it could easily be made one via a function) but simply uses two function calls: find and nonzeros:
A = sparse([23;35;78],[35;78;23],[1;1;1]);
[r,c] = find(A~=0);
v = nonzeros(A);
compact = [r,c,v];
disp(compact);
which returns
78 23 1
23 35 1
35 78 1
As Luis Mendo points out in the comments, a simpler solution exists, since find returns the nonzero values themselves in a third output argument (note that we now pass A directly rather than the logical A~=0):
[r,c,v] = find(A);
compact = [r,c,v];
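A hypothetical sparse2csv like the one the question mentions is then only a couple of lines; dlmwrite uses a comma as its default delimiter (in newer MATLAB releases, writematrix is the recommended replacement):

```matlab
function sparse2csv(fname, Adj)
% Write the nonzero entries of Adj as "row,col,value" lines
[r, c, v] = find(Adj);
dlmwrite(fname, [r, c, v]);   % comma-separated by default
```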

Storing arrays of different length in one matrix Matlab

I have a number of arrays of different sizes, e.g.
A=1:10; B=1:9 etc.
Now I want to save these arrays into one big matrix. In this example I would want it to be 2x10, with NaN for the remaining spot not filled by array B. I know how to preallocate this matrix with NaN(size), but my question here is how to get these arrays in with their different lengths. It must be a super simple command, but I just can't seem to think of it!
You need to specify the column indices:
>> BigMat = NaN(2,10);
>> BigMat(1, 1:numel(A) ) = A;
>> BigMat(2, 1:numel(B) ) = B;
Also take a look at cell arrays. They can contain a variety of different data types. For example:
BigMat{1}=A;
BigMat{2}=B;
BigMat{3}='Some text string'
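If there are many such vectors, a loop over a cell array generalizes the indexing trick above; a sketch, assuming all entries are row vectors:

```matlab
vecs = {1:10, 1:9, 1:4};                  % example ragged data
maxLen = max(cellfun(@numel, vecs));      % longest vector decides the width
BigMat = NaN(numel(vecs), maxLen);        % preallocate with NaN
for k = 1:numel(vecs)
    BigMat(k, 1:numel(vecs{k})) = vecs{k};
end
```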

Matlab fast neighborhood operation

I have a problem. I have a matrix with integer values between 0 and 5, for example:
x=randi(5,10,10)
Now I want to apply a 3x3 filter which gives me the most common value in each neighborhood.
I have tried 2 solutions:
fun = @(z) mode(z(:));
y1 = nlfilter(x,[3 3],fun);
which takes very long...
and
y2 = colfilt(x,[3 3],'sliding',@mode);
which also takes long.
I have some really big matrices and both solutions take a long time.
Is there any faster way?
+1 to @Floris for the excellent suggestion to use hist. It's very fast. You can do a bit better though: hist is based on histc, which can be used instead. histc is a compiled function, i.e., not written in MATLAB, which is why this solution is much faster.
Here's a small function that attempts to generalize what @Floris did (that solution also returns a vector rather than the desired matrix) and achieve what you're doing with nlfilter and colfilt. It doesn't require that the input have particular dimensions and uses im2col to efficiently rearrange the data. In fact, the first three lines and the call to im2col are virtually identical to what colfilt does in your case.
function a = intmodefilt(a,nhood)
% Sliding-window mode filter for integer-valued arrays
[ma,na] = size(a);
aa(ma+nhood(1)-1,na+nhood(2)-1) = 0;    % allocate zero-padded array
aa(floor((nhood(1)-1)/2)+(1:ma),floor((nhood(2)-1)/2)+(1:na)) = a;  % center a in it
[~,a(:)] = max(histc(im2col(aa,nhood,'sliding'),min(a(:))-1:max(a(:))));  % bin with the max count per window
a = a-1;                                % convert bin index back to a value
Usage:
x = randi(5,10,10);
y3 = intmodefilt(x,[3 3]);
For large arrays, this is over 75 times faster than colfilt on my machine. Replacing hist with histc is responsible for a factor of two speedup. There is of course no input checking so the function assumes that a is all integers, etc.
Lastly, note that randi(IMAX,N,N) returns values in the range 1:IMAX, not 0:IMAX as you seem to state.
One suggestion would be to reshape your array so each 3x3 block becomes a column vector. If your initial array dimensions are divisible by 3, this is simple. If they aren't, you need to work a little harder. And you need to repeat this nine times, starting at different offsets into the matrix - I will leave that as an exercise.
Here is some code that shows the basic idea (using only functions available in FreeMat - I don't have Matlab on my machine at home...):
N = 100;
A = randi(0,5*ones(3*N,3*N));
B = reshape(permute(reshape(A,[3 N 3 N]),[1 3 2 4]), [ 9 N*N]);
hh = hist(B, 0:5); % histogram of each 3x3 block: bin with largest value is the mode
[mm mi] = max(hh); % mi will contain bin with largest value
figure; hist(B(:),0:5); title 'histogram of B'; % flat, as expected
figure; hist(mi-1, 0:5); title 'histogram of mi' % not flat?...
(The two histogram plots are not reproduced here.)
The strange thing, when you run this code, is that the distribution of mi is not flat, but skewed towards smaller values. When you inspect the histograms, you will see that is because you will frequently have more than one bin with the "max" value in it. In that case, you get the first bin with the max number. This is obviously going to skew your results badly; something to think about. A much better filter might be a median filter - the one that has equal numbers of neighboring pixels above and below. That has a unique solution (while mode can have up to four values, for nine pixels - namely, four bins with two values each).
Something to think about.
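For the median alternative suggested above there is no need for custom code: the Image Processing Toolbox (assumed available, since the question already uses nlfilter and colfilt from it) provides medfilt2:

```matlab
x = randi(5, 10, 10);
y = medfilt2(x, [3 3]);   % 3x3 sliding median; borders are zero-padded by default
```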
Can't show you a mex example today (wrong computer); but there are ample good examples on the Mathworks website (and all over the web) that are quite easy to follow. See for example http://www.shawnlankton.com/2008/03/getting-started-with-mex-a-short-tutorial/

Using ranges in Matlab/Octave matrices

Let's say I want to create a 100x100 matrix in which every row contains the elements 1-100:
A = [1:100; 1:100; 1:100... n]
Obviously forming the matrix literally is a bad idea, because it would force me to write out 100 rows of the range 1:100.
I think I could do it by taking a 'ones' array and multiplying every
row by a vector... but I'm not sure how to do it
a = (ones(100,100))*([])
??
Any tips?
You can use the repeat matrix function (repmat()). Your code would then look like this:
A = repmat( 1:100, 100, 1 );
This means that you're repeating the first argument of repmat 100 times vertically and once horizontally (i.e. you leave it as is horizontally).
You could multiply a column vector of 100 1s with a row vector of 1:100.
ones(3,1)*(1:3)
ans =
1 2 3
1 2 3
1 2 3
Or you could use repmat ([edit] as Phonon wrote a few seconds before me [/edit]).
Yes, repmat is the easy solution, and even arguably the right solution. But knowing how to visualize your aim and how to create something that yields that aim will give long term benefits in MATLAB. So try other solutions. For example...
cumsum(ones(100),2)
bsxfun(@plus,zeros(100,1),1:100)
ones(100,1)*(1:100)
cell2mat(repmat({1:100},100,1))
and the boring
repmat(1:100,100,1)
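For completeness: in MATLAB R2016b and later (and in Octave), implicit expansion lets you write the bsxfun variant without bsxfun, since adding a column vector and a row vector broadcasts both:

```matlab
A = zeros(100,1) + (1:100);   % 100x100 matrix; every row is 1:100
```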