I want to solve the linear system Ax = b, where A is a 40000x400000 matrix, x is a 400000x1 vector, and b is a 40000x1 vector. When I use A\b in MATLAB, I get an "out of memory" error. Because the matrix is too large, I cannot use A\b. How can I solve my problem in MATLAB without using the A\b command?
Thank you for your help
The problem is partly MATLAB and partly your machine.
First, think carefully about whether you actually need to solve this problem with a 40000x400000 matrix. Maybe you can simplify it, maybe you can segment the problem, maybe the system is decoupled; check this very carefully.
For context, storing a 40000x400000 matrix of 8-byte floats takes 40000 × 400000 × 8 bytes ≈ 128 GB (about 120 GiB). That's likely too much; there's a good chance you don't even have that much free disk space.
If this matrix has many zeros, far more than non-zeros, then you could exploit MATLAB's sparse matrix functions. They avoid storing the whole matrix and, essentially, operate only on the non-zero entries.
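A minimal sketch of the sparse route, assuming you can produce the non-zero entries as triplet lists (rows, cols, vals are placeholders for your data):

% Build A in sparse form: only the non-zeros are stored.
A = sparse(rows, cols, vals, 40000, 400000);
% lsqr is an iterative solver that handles rectangular systems
% without ever forming a dense matrix.
x = lsqr(A, b, 1e-6, 1000);   % tolerance 1e-6, at most 1000 iterations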
If you are lazy but have a very good machine (say 1 TB of SSD), you might consider increasing the size of the swap file on Linux (or the paging file, its analogue on Windows). That basically means reserving space on disk that the computer is allowed to use as if it were RAM. You wouldn't run out of memory, but the operations you need to perform would take insanely long, so consider starting with a smaller matrix to gauge the execution time, which for dense solves grows roughly with the cube of the matrix dimension.
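A quick sketch of how to gauge that scaling, using small square dense systems for simplicity:

% Time dense solves of increasing size and extrapolate.
for n = [1000 2000 4000]
    A = rand(n);
    b = rand(n, 1);
    tic; x = A \ b; t = toc;
    fprintf('n = %d: %.2f s\n', n, t);
end
% Each doubling of n should cost roughly 8x the time (n^3 scaling).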
So, case in point: start by reviewing the problem you have at hand.
Can I use neural networks or SVMs etc. if my output is a vector of 27,680 values that are all zero except for a single one?
I mean, is it right to do this?
When I use SVM I get this error:
Error using seqminopt>seqminoptImpl (line 198)
No convergence achieved within maximum number of iterations.
SVMs are usually binary classifiers. Basically that means they separate your data points into two groups, signaling whether a data point does or doesn't belong to a class. Common strategies for solving multi-class problems with SVMs are one-vs-rest and one-vs-one. With one-vs-rest, you train one classifier per class, which would be 27,680 for you. With one-vs-one, you train K(K-1)/2 classifiers, so in your case around 383 million. As you can see, both numbers are rather high, so I would be pessimistic about your chances of successfully solving this problem with SVMs.
Nevertheless, you can try to increase the maximum number of iterations as described in another Stack Overflow thread. Maybe it still works.
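If I remember the legacy svmtrain interface correctly, the iteration cap is raised through an options structure; treat the exact name-value pair as an assumption and check doc svmtrain for your release:

% Assumed interface of the legacy svmtrain (verify against your toolbox):
opts = statset('MaxIter', 500000);   % the default limit is far lower
svmStruct = svmtrain(trainData, trainLabels, 'Options', opts);

Newer releases of the Statistics and Machine Learning Toolbox also wrap the multi-class strategies for you. A minimal sketch, assuming X holds one example per row and Y the class labels (both names are placeholders):

t = templateSVM('Standardize', true);   % linear SVM as the base learner
% 'onevsall' trains one binary SVM per class (one-vs-rest).
Mdl = fitcecoc(X, Y, 'Learners', t, 'Coding', 'onevsall');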
You can use neural nets for your task, and a 1-of-K output is nothing unusual. However, even with only one hidden layer of 500 neurons (and using the input and output vector sizes mentioned in your comment) you would have (27680*2*500) + (500*27680) = 41,520,000 weights in your network. So I would expect rather long training times (although a Google employee would probably laugh at these numbers). You will also most likely need a lot of training examples, unless your input is really simple.
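A minimal sketch of that setup with the Neural Network Toolbox, assuming X is an input matrix with one example per column and T the matching one-hot target matrix (both names are placeholders):

net = patternnet(500);             % one hidden layer with 500 neurons
net = train(net, X, T);            % expect long training at this scale
scores = net(X);                   % one column of class scores per example
[~, predictedClass] = max(scores, [], 1);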
As an alternative you might look into Decision Trees/Random Forests, Naive Bayes or kNN.
I have a MATLAB assignment which is very important for my final grade, but I'm not sure how to begin. It has 7 parts that build on each other, but I will ask about the first 3, because I really just want to know how to approach the assignment and then do the rest on my own.
Create a random vector that includes 5 random variables that are statistically dependent.
I looked this up and the only things I could think of are the pdf and cdf commands. Should I use one of them? And how can I make sure the variables are statistically dependent? Since I am not told which distribution to use, is it possible to have MATLAB decide that for me?
Show the distribution of each of the random variables on a single plot.
I assume I should just use plot for this one? For that, I just need to know how to do the first part, I guess.
Calculate the covariance matrix of the vector. Check what would have happened if the vector had included 5 variables that are statistically independent.
How do I calculate the covariance matrix? Is there a MATLAB command for that?
I really hope you can help me, as I have no idea how to even begin.
Thank you!
(1) There are many ways to create random variables that are dependent. Perhaps the simplest way to make a variable Y dependent on another variable X is to take a function of X and add noise. E.g. let X be a Gaussian variable with some mean and variance, then let Y = X^2 + e, where e is a Gaussian variable with mean 0 and any variance. So for your 5 variables: let the first be Gaussian, then let the other 4 be functions of the first plus noise (a short sketch follows after point (3)).
(2) I don't know for sure, but I think you can use plot for that.
(3) cov computes the covariance matrix, if I remember correctly.
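A minimal sketch of all three parts (the function choices in step 1 are just examples):

n = 1000;
X1 = randn(n, 1);                  % base Gaussian variable
X2 = X1.^2 + 0.1*randn(n, 1);      % functions of X1 plus noise -> dependent
X3 = 2*X1 + randn(n, 1);
X4 = sin(X1) + 0.5*randn(n, 1);
X5 = -X1 + 0.2*randn(n, 1);
V = [X1 X2 X3 X4 X5];              % n-by-5 sample matrix

plot(V);                           % all five variables in a single plot
C = cov(V);                        % 5-by-5 covariance matrix

Repeating the last line with V = randn(n, 5) (independent variables) shows the off-diagonal entries shrink toward zero.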
We know that MATLAB for loops are very expensive as far as time is concerned. Colleagues keep telling me to replace for loops with matrix operations; in other words, to trade time consumption for memory consumption.
In a previous post I asked how to use CUDA to compare two matrices element by element. @Eric then suggested I transform my two for loops into matrix operations and perform those operations on the GPU. It was very intuitive. Inspired by this answer, I started thinking more seriously about MATLAB code optimization.
The reason for this post is that I would like to ask whether anyone can give similar intuitive examples or explain techniques for writing efficient MATLAB code. References to any tutorial or book would be great.
Thank you!!
Vectorize loops
In MATLAB, you can gain speed by vectorizing loops. MATLAB is specifically designed to operate on vectors and matrices, so it is usually quicker to perform operations on whole vectors or matrices rather than using a loop. For example:
index = 0;
for time = 0:0.001:60
    index = index + 1;
    waveForm(index) = cos(time);   % grows the array on every iteration
end
would run considerably faster if replaced with:
time = 0:0.001:60;
waveForm = cos(time);
Functions that you might find useful when using vector operations in place of loops include:
any() – returns true if any element is non-zero.
size() – returns a vector containing the number of elements in an array along each dimension.
find() – returns the indices of any non-zero elements. To get the non-zero values themselves you can use something like a(find(a));.
cumsum() – returns a vector containing the cumulative sum of the elements in its argument vector, e.g. cumsum([0:5]) returns [0 1 3 6 10 15].
sum() – returns the sum of all the elements in a vector, e.g. sum([0:5]) returns 15.
Besides replacing for loops with matrix operations, you can maximize code performance by optimizing memory access:
1. Preallocate arrays before accessing them within loops
When creating or repeatedly modifying arrays within loops, always allocate the arrays beforehand. Of all three techniques, this familiar one can give the biggest performance improvement.
Code segment 2 executes in 99.8% less time (580 times faster) than segment 1 on machine A, and in 99.7% less time (475 times faster) than segment 1 on machine B.
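The article's exact code segments aren't reproduced here; a minimal sketch of the preallocation idea:

% Allocate once, then fill in place; without the zeros() call,
% x would be reallocated and copied on every iteration.
n = 1e6;
x = zeros(n, 1);
for k = 1:n
    x(k) = k^2;
end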
2. Store and access data in columns
When processing 2-D or N-D arrays, access your data in columns and store it so that it is easily accessible by columns.
Code segment 2 executes in 33% less time than segment 1 on machine A, and in 55% less time than segment 1 on machine B.
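A small sketch of the access pattern (MATLAB stores arrays column by column):

A = rand(2000);
s = 0;
for j = 1:size(A, 2)        % columns in the outer loop
    for i = 1:size(A, 1)    % rows in the inner loop: contiguous access
        s = s + A(i, j);
    end
end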
3. Avoid creating unnecessary variables
When creating new variables or variables that are functions of existing data sets, ensure that they are essential to your algorithm, particularly if your data set is large.
Code segment 2 executes in 40% less time than code segment 1 on machine A and in 96% less time on machine B.
Code segment 4 executes in 40% less time than code segment 3 on machine A and in 59% less time on machine B.
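As a small sketch of the idea, update a large array in place rather than naming intermediate copies:

x = rand(1e7, 1);
% Avoid y = x - mean(x); z = y / std(y); -- two extra large arrays.
x = x - mean(x);
x = x / std(x);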
References:
Profiling, Optimization, and Acceleration of MATLAB code
Programming Patterns: Maximizing Code Performance by Optimizing Memory Access
Writing Fast MATLAB Code
Introduction to Computers for Engineers
I have a dataset consisting of about 300 objects with 84 features per object. The objects are already separated into two classes. With PCA I'm able to reduce the dimensionality to about 24. I'm using 3 principal components covering about 96% of the variance of the original data. The problem I have is that PCA doesn't care about the ability to separate the classes from each other. Is there a way to combine PCA for reducing the feature space with LDA for finding a discriminant function for those two classes?
Or is there a way to use LDA to find the features that best separate the two classes in three-dimensional space?
I'm somewhat confused because I found this paper but don't really understand it: http://faculty.ist.psu.edu/jessieli/Publications/ecmlpkdd11_qgu.pdf
Thanks in advance.
You should have a look at principal component regression (PCR, what you want if the variable to be explained is scalar) and partial least squares regression (PLSR) in MATLAB's Statistics Toolbox. Essentially, in PCR you choose the principal components by how well they explain the dependent variable; they may not be the ones with the largest variance.
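A minimal sketch of both routes, assuming X is the 300-by-84 feature matrix and y a numeric class label vector (both names are placeholders):

% PCR: project onto principal components, then regress.
[coeff, score] = pca(X);               % component scores, sorted by variance
Xr = score(:, 1:3);                    % keep 3 components, as in the question
b = [ones(size(Xr, 1), 1) Xr] \ y;     % least-squares fit in the reduced space
% PLSR: components are chosen for how well they explain y.
[XL, YL, XS, YS, beta] = plsregress(X, y, 3);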
I am working with large-scale data, currently a 300000 x 300000 matrix of interest. It is really hard to process in MATLAB due to an "Out of memory" error, so I decided to use Eigen. Is there any restriction on matrix size in Eigen?
Dense matrices in Eigen are stored in a contiguous block of memory, which cannot exceed 2 GB in a 32-bit application. So if you're running a 32-bit application, allocation will start to fail for matrices around half that size, that is to say around 10,000x10,000 doubles (10,000 × 10,000 × 8 bytes = 800 MB). See for example this duplicate question.
The same issue will happen with other libraries, since you're bounded by your RAM, not by the library.
Yet, if your big matrix consists mostly of zeros, you can resort to Eigen's SparseMatrix.
If your matrix isn't sparse, then you may store it on disk, but you'll get terrible performance when manipulating it.