I am programming a machine learning algorithm in Scala. For that one I don't think I will need matrices, but I will need vectors. However, the vectors won't even need a dot product, but just element-wise operations. I see two options:
use a linear algebra library like Breeze
implement my vectors as Scala collections like List or Vector and work on them in a functional manner
The advantage of using a linear algebra library might mean less programming for me... will it though, considering learning? I already started learning and using it and it seems not that straight forward (documentation is so-so). (Another) disadvantage is having a extra dependency. I don't have much experience in writing my own projects (so far I programmed in the job, where libraries usage was dictated).
So, is a linear algebra library - e.g. Breeze - worth the learning and dependency compared to programming needed functionality myself, in my particular case?
Related
I try to apply One Class SVM but my dataset contains too many features and I believe feature selection would improve my metrics. Are there any methods for feature selection that do not need the label of the class?
If yes and you are aware of an existing implementation please let me know
You'd probably get better answers asking this on Cross Validated instead of Stack Exchange, although since you ask for implementations I will answer your question.
Unsupervised methods exist that allow you to eliminate features without looking at the target variable. This is called unsupervised data (dimensionality) reduction. They work by looking for features that convey similar information and then either eliminate some of those features or reduce them to fewer features whilst retaining as much information as possible.
Some examples of data reduction techniques include PCA, redundancy analysis, variable clustering, and random projections, amongst others.
You don't mention which program you're working in but I am going to presume it's Python. sklearn has implementations for PCA and SparseRandomProjection. I know there is a module designed for variable clustering in Python but I have not used it and don't know how convenient it is. I don't know if there's an unsupervised implementation of redundancy analysis in Python but you could consider making your own. Depending on what you decide to do it might not be too tricky (especially if you just do correlation based).
In case you're working in R, finding versions of data reduction using PCA will be no problem. For variable clustering and redundancy analysis, great packages like Hmisc and ClustOfVar exist.
You can also read about other unsupervised data reduction techniques; you might find other methods more suitable.
I'm considering to learn Scala for my algorithm development, but first need to know if the language has implemented (or is implementing) complex inverse and pseudo-inverse functions. I looked at the documentation (here, here), and although it states these functions are for real matrices, in the code, I don't see why it wouldn't accept complex matrices.
There's also the following comment left in the code:
pinv for anything that can be transposed, multiplied with that transposed, and then solved
Is this just my wishful thinking, or will it not accept complex matrices?
Breeze implementer here:
I haven't implemented inv etc. for complex numbers yet, because I haven't figured out a good way to store complex numbers unboxed in a way that is compatible with blas and lapack and doesn't break the current API. You can set the call up yourself using netlib java following a similar recipe to the code you linked.
There have been a few questions inquiring about general math/stats frameworks for Scala.
I am interested in only one specific problem, that of solving a large sparse linear system. Essentially I am looking for an equivalent of scipy.sparse.linalg.spsolve.
Currently I am looking into breeze-math of ScalaNLP Breeze, which looks like it would do the job, except that the focus of this library collection is natural language processing, so it feels a bit strange to use that.
Saddle also looks promising, but not very mature yet, and looking at its dependencies, EJML doesn't seem to have sparse functionality, while Apache commons math did, but it was flaky.
Has anyone got a reasonably simple and efficient solution that is currently available?
Although ScalaNLP Breeze says it's for NLP, it's linear algebra library is fairly general and not specialized to NLP. With that said, you could easily do something like this:
val A = new CSCMatrix[Int]()
val B = new CSCMatrix[Int]()
val x = A \ B
I'm quite surprised not to find one in standard library. Is there some reason it is missing or I just need to use a specific toolbox?
Implementing it myself would be very problematic because of the complexity of algorithm involved.
I'm doing some numerical estimation and correction with the Kalman filter, and would like to better estimate my parameters of Q and R, preferably dynamically.
http://en.wikipedia.org/wiki/Kalman_filter#Estimation_of_the_noise_covariances_Qk_and_Rk
That article mentions that GNU Octave is currently the best way of determining these parameters from data:
http://en.wikipedia.org/wiki/GNU_Octave#C.2B.2B_integration
Unfortunately it is written for Matlab, and there's supposedly a C++ implementation. I'm very weak in C++ and would not even know how to import a C++ library and link it properly in XCode. All of my C++ libraries to date have been wrapped in 3rd party Objective-C classes.
Has anyone used the C++ implementation for scientific computing or engineering applications on iPhone? I'd appreciate any pointers or tutorials on how to do this kind of analysis with Objective-C.
Additional keywords:
estimating covariance from data
Autocovariance Least-Squares (ALS) technique
noise covariance
Thank you!
I do not know of any such C++ library, if you fancy doing numerical analysis on iOS, the best way to go is the accelerate framework, specifically (from this description):
Linear Algebra: LAPACK and BLAS
The Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package
(LAPACK) libraries contain—as you would expect—functions to perform
linear algebra computations such as solving simultaneous linear
equations, least squares solutions of linear equations, and eigenvalue
problems. The BLAS library serves as a building block for the LAPACK
library. The BLAS and LAPACK libraries are widely distributed and
industry standard computational libraries. They are available on a
number of different platforms and architectures. So, if you are
already using these libraries you should feel right at home, as the
APIs are exactly the same on Mac OS X.
You'll need a fairly good grounding in C, pointers, arrays and such though, no way around it I feel. There is a detailed description of how to use these linear algebra primitives to implement kalman filtering (although this is using R, so probably not of mush use to you).
This is a SO post on Kalman Filtering which expressed my opinion quite well. I'm afraid I think the chances of finding a magic Objective-C wrapper for Kalman Filtering are fairly low, though I would be very happy to be proven wrong!