Recursive least squares filter for vector input - filtering

All the online tutorials, articles, and book chapters that I could find on the Recursive Least Squares (RLS) filter assume that the input signal is a scalar (one-dimensional). Vectors and matrices do appear in the formalism, but only because of the FIR filter's order, i.e. the input is transformed from a scalar to the (p+1)-vector of historical/delayed samples (where p is the filter order).
Any pointers on how to extend the formalism to vector inputs would be greatly appreciated.
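To fix notation, here is a minimal NumPy sketch of the scalar-input formalism I mean (variable names are my own: u is the (p+1)-vector of delayed samples, lam the forgetting factor):
import numpy as np

def rls_update(w, P, u, d, lam=0.99):
    # standard RLS step: k = P u / (lam + u' P u), e = d - w' u,
    # w <- w + k e, P <- (P - k u' P) / lam
    Pu = P @ u
    k = Pu / (lam + u @ Pu)          # gain vector
    e = d - w @ u                    # a priori error (scalar)
    w = w + k * e
    P = (P - np.outer(k, Pu)) / lam  # P is symmetric, so u' P = (P u)'
    return w, P

# toy run: identify a length-5 FIR filter, so p = 4
rng = np.random.default_rng(0)
h = np.array([0.5, -0.2, 0.1, 0.0, 0.3])
x = rng.normal(size=200)
d = np.convolve(x, h)[:200]
p = 4
w, P = np.zeros(p + 1), 1e3 * np.eye(p + 1)
for n in range(p, len(x)):
    u = x[n - p:n + 1][::-1]         # delayed samples, newest first
    w, P = rls_update(w, P, u, d[n])
print(w)                             # should approach h
Note that even here the desired output d is a scalar; my question is how to extend this when the input samples themselves are vectors.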

Related

Nonlinear curve fitting of a matrix function in python

I have the following problem. I have an N x N real matrix Z(x; t), where x and t may in general be vectors. I have N_s observations (x_k, Z_k), k = 1, ..., N_s, and I'd like to find the vector of parameters t that best approximates the data in the least-squares sense, which means I want the t that minimizes
S(t) = \sum_{k=1}^{N_s} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( Z_{k,ij} - Z_{ij}(x_k; t) \right)^2
This is in general a non-linear fit of a matrix-valued function. I'm only finding examples in which one has to fit scalar functions, which are not immediately generalizable to a matrix function (nor to a vector function). I tried the scipy.optimize.leastsq function and the packages symfit and lmfit, but I still don't manage to find a solution. I'm ending up writing my own code... any help is appreciated!
You can do curve fitting with multi-dimensional data. As far as I am aware, none of the low-level algorithms explicitly support multidimensional data, but they all minimize a one-dimensional array in the least-squares sense. The fitting methods do not really care about the "independent variable(s)" x except insofar as they help you calculate the array to be minimized, for example by evaluating a model function to match to y data.
That is to say: if you can write a function that takes the parameter values and calculates the matrix to be minimized, just flatten that 2-d (or n-d) array to one dimension. The fit will not mind.
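For instance, with scipy.optimize.least_squares (the newer interface to leastsq), the model and data below are placeholders just to show the shape of the approach:
import numpy as np
from scipy.optimize import least_squares

N = 3  # matrix dimension

def z_model(x, t):
    # placeholder N x N matrix-valued model; substitute your own Z(x; t)
    return t[0] * np.outer(x, x) + t[1] * np.eye(N)

def residuals(t, xs, zs):
    # stack the entrywise residuals of every observation into one flat vector;
    # that 1-d array is all the least-squares driver needs
    return np.concatenate([(zk - z_model(xk, t)).ravel() for xk, zk in zip(xs, zs)])

rng = np.random.default_rng(0)
true_t = np.array([1.5, -0.5])
xs = [rng.normal(size=N) for _ in range(10)]
zs = [z_model(xk, true_t) + 0.01 * rng.normal(size=(N, N)) for xk in xs]

fit = least_squares(residuals, x0=np.zeros(2), args=(xs, zs))
print(fit.x)  # close to true_t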

Kalman Filter prediction error estimation: why two constants and transposed matrices?

Hi everybody!
I have found a very informative and well-written tutorial for understanding the Kalman filter. Eventually I would like to understand the Extended Kalman Filter in the second half of the tutorial, but first I want to resolve every mystery along the way.
Kalman Filter tutorial Part 6.
I think we multiply the prediction error by a constant because the new value at a given time step k can be different from the previous one. But why do we multiply by the constant twice? The tutorial says:
we multiply twice by a because the prediction error pk is itself a squared error; hence, it is scaled by the square of the coefficient associated with the state value xk.
I can't see the meaning of this sentence.
And later, in the EKF (Part 12), he creates a matrix and also uses its transpose. Why the transposed one?
Thanks a lot.
The Kalman filter maintains error estimates as variances, which are squared standard deviations. When you multiply a Gaussian random variable N(x, p) by a constant a, you scale its standard deviation by a factor of a, which means its variance scales by a^2. He writes this as a*p*a to maintain a parallel structure for when he converts from a scalar state to a matrix state. If you have an error covariance matrix P representing the state x, then the error covariance of Ax is APA^T, as he shows in Part 12. The scalar form is a convenient shorthand for that calculation. You can expand the matrix multiplication by hand to see that the coefficients all land in the right places.
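A quick numeric sanity check of both claims, with made-up numbers:
import numpy as np

rng = np.random.default_rng(1)

# scalar case: multiplying N(0, p) by a scales the variance by a^2, i.e. a*p*a
a, p = 3.0, 4.0
x = rng.normal(0.0, np.sqrt(p), size=100_000)
print(np.var(a * x), a * p * a)   # both approximately 36

# matrix case: if x has covariance P, then A x has covariance A P A^T
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])
xs = rng.multivariate_normal(np.zeros(2), P, size=100_000)
print(np.cov((xs @ A.T).T))       # approximately equal to:
print(A @ P @ A.T)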
If any of this is fuzzy to you, I strongly recommend reading a tutorial on Gaussian random variables. Between x and P in a Kalman filter, your success depends a lot more on your understanding of P than of x, even though most people get started by being interested in improving x.

Kullback-Leibler Divergence of 2 Histograms in MATLAB

I would like a function to calculate the KL distance between two histograms in MATLAB. I tried this code:
http://www.mathworks.com/matlabcentral/fileexchange/13089-kldiv
It says that I should provide two distributions P and Q of sizes n x nbins, but I am having trouble understanding how the author of the package wants me to arrange the histograms. I thought that providing the discretized values of the random variable together with the number of bins would suffice (I would assume the algorithm uses an arbitrary support to evaluate the expectations).
Any help is appreciated.
Thanks.
The function you link to requires that the two histograms passed in be aligned and have the same size, NBIN x N (not N x NBIN); that is, if N > 1 then the number of rows in the inputs should equal the number of bins in the histograms. If you are just comparing two histograms (that is, if N = 1) it doesn't really matter: you can pass either row or column vectors, as long as you are consistent and the order of bins matches.
A generic call to the function looks like this:
dists = kldiv(bins,P,Q)
The implementation allows comparison of multiple histograms to each other (that is, N>1), in which case pairs of columns (with matching column index) in each array are compared and the result is a row vector with distances for each matching pair.
The array bins should be the same size as P and Q. It is used only to perform a very minimal check that the inputs are of the same size; it is not used in the computation. The routine expects bins to contain the numeric labels of your bins, so that it can warn you if repeated bin labels occur, but otherwise doesn't use the information.
You could do away with bins and compute the distance with
KL = sum(P .* (log2(P)-log2(Q)));
without using the MATLAB Central version. However, the version you link to performs the above-mentioned minimal checks and in addition allows computation of two alternative distances (consult the documentation).
The version linked to by eigenchris checks that no histogram bins are empty (empty bins would make the computation blow up numerically) and, if there are any, removes their contribution to the sum (I'm not sure this is entirely appropriate - consult an expert on the subject). You should also be aware of the exact form of the formula; specifically, note the use of log2 above versus the natural logarithm in the version linked to by eigenchris.
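For reference, the same computation sketched in NumPy, with normalization and the zero-bin handling discussed above (treat it as illustrative, not as a drop-in replacement for the MATLAB Central function):
import numpy as np

def kl_divergence(p, q):
    # D(P||Q) in bits for two histograms over the same, aligned bins
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()   # normalize counts to probabilities
    mask = p > 0                      # 0 * log 0 is taken as 0
    if np.any(q[mask] == 0):
        return np.inf                 # P has mass where Q has none
    return np.sum(p[mask] * (np.log2(p[mask]) - np.log2(q[mask])))

print(kl_divergence([10, 20, 30, 40], [25, 25, 25, 25]))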

What is the difference between matrix and array?

What is the more generalized term?
Why is MATLAB named matrix laboratory, then?
A matrix is a practical way to represent a linear transformation from a space of dimension n to a space of dimension m, in the form of an m x n array of scalar values.
It is also very practical for performing linear algebra operations in a systematic way that can be implemented on a computer. For instance, if matrix A represents the linear transformation f and matrix B the linear transformation g, then the composition f o g is written A*B, where * denotes matrix multiplication. MATLAB also has a lot of routines related to matrix operations (i.e. linear algebra operations), like det, pinv, svd, etc.
As you can still see in MATLAB today, operators like * and / are strongly tied to matrix operations, and thus to linear algebra, which I think was the original goal of MATLAB in its early development, hence its name (admittedly speculative, but probably not far from reality).
To perform element-wise operations on n-dimensional data sets, you have to write .* or ./, denoting that you are now performing array operations.
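The same distinction exists in NumPy, for comparison: @ plays the role of MATLAB's * and * plays the role of .* (a tiny illustrative example):
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A @ B)  # matrix product (composition of linear maps), like MATLAB's A*B
print(A * B)  # element-wise product, like MATLAB's A.*B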
I would not say array operations encompass matrix operations; they are different. The latter relate to linear algebra, while the former are just a practical way to operate on large sets of data. Those data are not limited to numbers: they are just n-dimensional data sets of whatever (strings, numbers, cells, etc.).
MATLAB also has a very concise syntax for performing array operations on sub-blocks (i.e. via linear/logical subscripts) that makes it easy to reorganize data sets in a single line of code before applying subsequent matrix or array operations.
If you're asking about MATLAB, the word "matrix" typically refers to a 2d array, whereas an "array" can be n-dimensional.
Early versions of MATLAB supported only 2d matrices, not n-dimensional arrays. I believe support for n-dimensional arrays was introduced in version 5 of MATLAB.
I would say that MATLAB's matrix is a more advanced kind of array compared to C-style arrays, e.g. double array[], or Java arrays, e.g. double array2[]. I would also say that the MATLAB matrix is better suited for mathematical purposes than the C++ vector or the Java ArrayList. However, if you mean the MATLAB array, the picture is more complicated; I would then recommend the documentation on MATLAB data, which describes the mxArray type used to store most data in MATLAB. The question is hard to answer completely without a better description of what you mean by array, but I would say that, regarding type, there is no difference between an array like a = [1,2,3,4] and a matrix like b = [1,2,3,4;5,6,7,8]. There can also be matrices of higher dimensions, such as c = ones(3,4,3). These are generally called matrices in MATLAB as well, or, to be more specific, N-dimensional matrices.

How to select top 100 features (a subset) which are most relevant after PCA?

I performed PCA on a 63 x 2308 matrix and obtained a score matrix and a coefficient matrix. The score matrix is 63 x 2308 and the coefficient matrix is 2308 x 2308.
How do I extract the column names for the top 100 features which are most important, so that I can perform regression on them?
PCA should give you both a set of eigenvectors (your coefficient matrix) and a vector of eigenvalues (1 x 2308, often referred to as lambda). You might need to use a different PCA function in MATLAB to get them.
The eigenvalues indicate how much of your data each eigenvector explains. A simple method for selecting features would be to take the 100 with the highest eigenvalues. This gives you a set of features which explain most of the variance in the data.
If you need to justify your approach for a write-up, you can actually calculate the amount of variance explained per eigenvector and cut off at, for example, 95% of variance explained.
Bear in mind that selecting based solely on eigenvalue might not give the set of features most important to your regression, so if you don't get the performance you expect, you might want to try a different feature selection method, such as recursive feature selection. I would suggest using Google Scholar to find a couple of papers doing something similar and seeing what methods they use.
A quick MATLAB example of taking the top 100 principal components using PCA:
[eigenvectors, projected_data, eigenvalues] = princomp(X);
[~, feature_idx] = sort(eigenvalues, 'descend');  % princomp already returns these sorted, but be explicit
selected_projected_data = projected_data(:, feature_idx(1:100));
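And for the 95% variance-explained cutoff mentioned above, a hedged NumPy sketch via an eigendecomposition of the covariance matrix (illustrative only; the random X and the variable names are mine):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(63, 2308))            # stand-in for your data matrix

Xc = X - X.mean(axis=0)                    # center the data
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigenvalues)[::-1]      # sort descending by variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.searchsorted(explained, 0.95)) + 1
print(k)                                   # components needed for 95% variance

scores = Xc @ eigenvectors[:, :k]          # projected data, 63 x k
Note that with only 63 observations, at most 62 eigenvalues will be nonzero, which connects to the warning below.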
Have you tried
B = sort(your_matrix,2,'descend');
C = B(:,1:100);
Be careful!
With just 63 observations and 2308 variables, your PCA result will be meaningless because the data is underspecified. As a rule of thumb, you should have at least 3 times as many observations as dimensions.
With 63 observations, you can at most define a 62-dimensional subspace!