Big numbers and long loops in matlab? - matlab

How can I store a matrix with 2^100 rows in MatLab! it is my search space and I really need to do it .
In your opinion, is it possible ? if yes, please help me that how can i do it?

2100 is about 1030, which is much too large for you to fit in memory - so you won't be able to store this matrix.
A couple of alternatives that you might want to think about -
Are many of the entries in the matrix zero? If so, you could consider using a sparse matrix which is much more memory efficient.
Do you need to be able to access the rows in an arbitrary order, or sequentially? If sequentially, you can generate the rows on an as-needed basis (perhaps in blocks of ten thousand at a time)
Do you need to look at all the rows at all? If not, perhaps you can define a function which generates the entries on the fly, as they are requested.

Related

plans and subset of rows in pyfftw

I want to use pyfft to repeatedly compute the discrete Fourier transform of a subset of rows for a two-dimensional array. I do not know in advance which rows I need to transform, that depends on the output from the previous round. I do know that doing it for all rows is wasteful.
It is my understanding that a 'plan' in FFTW3 is associated with the type of transform (c2c, r2c, etc) and the input/output length, which is always a vector in the 1D case. In pyfftw it looks like a 'plan' is associated to the type of transform and the input/output shape, so my interpretation is that it uses the same FFTW3 plan for every row.
My question is: is it possible to use the same FFTW3 plan for some of the rows, without creating separate pyfftw.FFTW objects for all possible combinations of rows?
On a different note, I am wondering how pyfftw uses multiple cores: does it use multiple cores for each row (this appears natural in view of FFTW3 documentation) or does it farm out different rows to different cores (which was my initial assumption)?
If you can create a numpy array from a view, you can plan for it with pyFFTW - all valid numpy arrays should work just fine.
This means several things:
Your array needs to have regular strides, but those strides can be arbitrary.
ND arrays are planned as ND transforms, with the selected axes being used.
You can probably do something cunning with stride tricks and it will probably work (but might not do what you expect if you do something too nefarious like overlapping rows and then use threads).
One solution that I've used quite a bit is to copy the rows that you want to transform into an interim array, and transforming that. You might well find that's the fastest option (particularly when you can allow for getting the byte offset correct).
Obviously, this doesn't work if you always have a different number of rows. You might still find that if you plan for the largest number of rows that are transformed and then copy in a subset you still do faster than otherwise.
The problem you're going to come up against, even if you go down to the C level, is that the planning overhead might well dominate if you're changing your transform sizes often.
You could also try pyfftw.interfaces.numpy_fft which is normally faster that numpy and has the ability to cache repeated transform sizes.

Compute similarity between columns in a fast manner MATLAB

Given a (rather) big matrix A :
A [m*n] (m = 7, n = 15000)
where the columns are items and the rows are attributes, I would like to compute similarity between each two items and store it in an array. The similarity matrix could be sim where each row contains [item1_id,item2_id, similarity]. This process needs to be done very fast.
I am considering options like #bsxfun which uses multi-threading for this purpose. Others ideas are also similarly appreciated (I am totally open to other efficient approaches). It would be appreciated if you could suggest some approaches and report the timing it takes for you to perform the operation. This process may be scaled up in future. Thank you very much.

Sparse table in MATLAB, is it possible?

\ am dealing with a matrix in MATLAB which is sparse and has many rows and columns. In this case, the row and columns of the matrix are the ids for particular items. Let's assume them as id1 and id2.
It would be nice if the ids for rows and columns could be embedded so I can have access to them easily to them without the need for creating extra variables that keep the two ids.
The answer would be probably to use a table data type. Tables are very ideal answer for my need however I was wondering if I could create a table data type for a sparse matrix?
A [m*n] sparse matrix %% m & n are huge
id1 [1*m] , id2 [1*n] %% two vectors containing numeric ids for rows and column
Could we obtain?
T [m*n] sparse table matrix
Thanks for sharing your view with me.
I will address the question and the comments in order to clear some confusion.
The short answer
There is no sparse table class in Matlab. Cannot do. Use sparse() matrices.
The long answer
There is a reason why sparse tables make little sense:
Philosophically speaking, the advantage of having nice row and column labels, is completely lost if you are working with a big panel of data and/or if the data is sparse.
Scrolling through 246829 rows and 33336 columns? Can only be useful at very isolated times if you are debugging your code and a specific outlier is causing you results to go off. Also, all you might see is just a sea of zeros.
Technically a table can have more columns for the same variable, i.e. table(rand(10,2), rand(10,1)) is a valid table. How would you consider define sparsity on such table?
Fine, suppose you are working with a matrix-like table, i.e. one element per table cell and same numeric class. Still, none of the algebraic operators are defined on a table(). So you need to extract the content first, in order to be able to perform any operation that spans more than a single column of data. Just to be clear, once the data is extracted, then you have e.g. your double (full) matrix or in an ideal case a double sparse matrix.
Now, a few misconceptions to clear:
Less variables implies clearer/cleaner code. Not true. You are probably thinking about the extreme case (in bad practices) of how do I make a series of variables a1, a2, a3, etc..
There is a sweet spot between verbosity and number of variables, amount of comments, and code clarity/maintainability. Only with time and experience you find the right balance.
Control over data cannot go without visual inspection. This approach does NOT scale with big data and the sooner you abandon it, the faster your code will become more reliable. You need to verify your results systematically, rather than relying on visual inspection. Failure to (visually) spot a problem in the data, grows exponentially with its dimension, faster than with systematic tests.
Some background info on my work:
I work with high-frequency prices, that's terabytes of data. I also extended the table() class with additional methods and fixes to help me with my work (see https://github.com/okomarov/tableutils), but I do not see how sparsity is a useful feature to add to table().

Efficient Access of elements in Matrix in Matlab

I have an m x n matrix of integers and where n is a fairly big number m and n ~1000. I want to iterate through all of these and perform a some operations, like accessing a particular cell and assigning a value of a particular cells.
However, at least in my implementation, this is rather inefficient as I have two for loops with Matrix(a,b) = Matrix(a,b+1) or something along these lines. Is there any other way to do this seeing as my current implementation takes a long time to traverse through about 100,000 cells and perform some operations.
Thank you
In matlab, it's almost always possible to avoid loops.
If you want to do Matrix(a,b)=Matrix(a,b+1), you should just do Matrix2=Matrix(:,2:end);
If you are more precise about what you do inside the loop, I can help you more.
Matlab uses column major ordering of matrixes in memory (unlike C). Are you sure you are iterating the indexes in the correct order? If not, try switching them and see if performance improves..
If you can't get rid of the for loops, one possibility would be to rewrite the expensive operations in C and create a MEX file as described here.

Where do I find the memory requirements of a MATLAB function?

I have a 3D array of values (0 or 1), which is very large (approx 2300x2300x11). I want to fit a surface to these values using for example interp3, but when I try MATLAB runs out of memory. Thus, I've decided to reduce the size of my array enough for MATLAB to accomodate it in memory.
Now, the smaller I make the reduced array, the worse my results will be (the surface fitting is part of a measurement process with high precision requirements), so I want to reduce the array as little as possible.
Is there any way to determine on beforehand how much memory a certain array size will demand and how much memory is available, and then use this information to resize the array enough to avoid out of memory exceptions, but not more?
I don't know the answer to this, but I wonder if you can have your cake and eat it, too.
If your data set is too big, why not do a piecewise fit? Do it in chunks rather than omitting data points.
Or be smarter about how you omit data points. You want them in areas of high curvature - where your data is changing fastest. Leave out points in areas far away from the action, where nothing interesting is happening. You might have to do a fit, look at the surface, add and remove more points and try again.
It might an iterative process, but I'll bet you'll be able to get a nice fit with a little luck and effort.
You can look at the maximum array sizes that are supported on different platforms. In general, if you have a PxQxR sized 3D array of doubles, then the size of your array in bytes is P*Q*R*8. For your matrix, the size is ~ 444 MB. You can also try reducing it to a single, using single(A). single uses 4 bytes per element and you can reduce the size of your array by a factor 2.
I haven't really poked into the inner workings of interp3, but the exact memory requirements will depend on the interpolation option you choose. So, you can first try to convert it to single and see if it works. If not, try with 80% (90%) of the number of rows and columns. This way you have a good chunk of the original array, but the memory requirement is only 64% (81%) of the original.
If that doesn't help, duffymo's suggestion is what you should be looking into.