I have several thousand samples, each with the same number of features (5000; they are time dependent), and I would like to predict vectors of variable length.
I'm a beginner with RNNs, and I'd like to know whether there are approaches other than zero padding for handling variable-length vectors.
I have a problem with classification (an LDA classifier).
I have 80 samples of training data (80x100) and 15 samples of testing data (15x100). The classify function returns the error: "The covariance matrix of each group in TRAINING must be positive definite."
Without knowing what your data looks like, all I can do is suggest a few solutions that may solve your problem. A non-positive-definite covariance matrix can be produced by several different factors:
linear dependence between two or more columns (remove as many columns as needed to eliminate the linear dependence)
non-stationary data (in this case, use differences instead of levels, because differencing grants stationarity)
columns with highly mismatched magnitudes, for example one column with very large values and another with very small values (rescale your columns so that all of them have approximately the same magnitude)
more features than samples per group (with 100 features and at most 80 training samples, the within-group covariance matrix is necessarily singular; reduce dimensionality first, or use a regularized or diagonal covariance estimate)
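A few quick MATLAB checks for the issues above (a sketch; `X` stands in for your 80x100 training matrix, and the cutoff of 20 components is an arbitrary choice to tune):

```
X = randn(80, 100);        % placeholder for your 80x100 training data

% 1) Linear dependence: if rank(X) < number of columns, the covariance
%    matrix is singular (with 80 rows the rank is at most 80, so it is here).
r = rank(X);

% 2) Mismatched column magnitudes: standardize each column.
Xs = zscore(X);

% 3) One way out of the features > samples problem: reduce dimensionality
%    with PCA and run the classifier on the first few components.
[coeff, score] = pca(X);
Xred = score(:, 1:20);     % keep the first 20 components
```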
I'm working with the German pre-trained word vectors from https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
I encountered the following problems:
To extract the vectors for my words, I started by simply searching the wiki.de.vec text file for the respective words. However, the vectors in the wiki.de.vec text file differ from those that the print-word-vectors function outputs (e.g. the vector for 'affe', meaning 'monkey', is different in the wiki.de.vec file than the output for 'affe' from print-word-vectors). What is the reason for this? I assume this occurs because, in the model by Bojanowski et al., the vector for a word is computed as the sum of its character n-gram vectors; but what does the vector for 'affe' in the wiki.de.vec text file reflect, then? Is it the vector for the n-gram 'affe' that also occurs in other words like 'karaffe'? So, should one always use the print-word-vectors function (i.e. sum the character n-gram vectors) when working with these vectors, and not simply extract vectors from the text file?
Some real German words (e.g. knatschen, resonieren) receive a null vector (even with the print-word-vectors function). How can this be, if the major advantage of this subword approach is being able to compute vectors for out-of-vocabulary words?
The nearest-neighbors function (./fasttext nn) outputs the nearest neighbors of a word together with the cosine distance. However, this value differs from the one I obtain by extracting the vectors of the individual words with print-word-vectors and computing their cosine distance manually in Matlab using pdist2(wordVector1, wordVector2, 'cosine'). Why is that? Is this the wrong way to obtain the cosine distance between two word vectors?
Thanks in advance for your help and suggestions!
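On the last point: pdist2 with the 'cosine' option returns a cosine *distance*, i.e. one minus the cosine similarity, whereas (as far as I know) fastText's nn command prints the cosine similarity itself, which would explain a systematic difference between the two numbers. A sketch with made-up vectors:

```
% Hypothetical 300-dimensional row vectors standing in for two word vectors
wordVector1 = rand(1, 300);
wordVector2 = rand(1, 300);

% pdist2 'cosine' is a distance: 1 - cosine similarity
d = pdist2(wordVector1, wordVector2, 'cosine');

% Cosine similarity computed directly
s = dot(wordVector1, wordVector2) / (norm(wordVector1) * norm(wordVector2));

% d + s equals 1 up to rounding, so compare s (not d) with the nn output
```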
I'll need to repeat this process multiple times, and the number of values will vary from ~10 to ~1000. I don't have access to all the vectors at once; they become accessible to me two vectors at a time.
In each instance the two vectors of the pair will always have the same number of values. However, the number of values will vary from instance to instance.
For column vectors a and b I might try,
a.'*b/(norm(a)*norm(b))
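Since the vectors arrive two at a time, the formula above can be wrapped in a small helper (a sketch; cosSim is a name introduced here, and the sample vectors are made up):

```
% Cosine similarity of two column vectors of equal length
cosSim = @(a, b) (a.' * b) / (norm(a) * norm(b));

a = [1; 2; 3];
b = [4; 5; 6];
s = cosSim(a, b);       % scalar in [-1, 1]; here about 0.9746
```

Using .' (plain transpose) rather than ' matters if the vectors are complex, since ' also conjugates.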
Ideally you would combine all or a subset of your vectors into arrays and do the operations at once, taking advantage of MATLAB's multithreading. Vectors of different lengths are a challenge, though...
Do you have access to all the vectors at once?
I have a set of samples, S, and I want to find its PDF. The problem is that when I use ksdensity I get values greater than one!
[f,xi] = ksdensity(S)
In the array f, most of the values are greater than one! Could you please tell me what the problem might be? Thanks for your help.
For example:
S = normrnd(0.3035, 0.0314, 1, 1000);
ksdensity(S)
ksdensity, as the name says, estimates a probability density function over a continuous variable. Probability densities can be larger than 1, they can actually have arbitrary values from zero upwards. The constraint on probabilities is that their sum over an exhaustive range of possibilities has to be 1. For probability densities, the constraint is that the integral over the whole range of values is 1.
A crude approximation of an integral of the pdf estimated by ksdensity can be obtained in Matlab like this:
sum(f) * min(diff(xi))
assuming that the values in xi are equally spaced. The value of this expression should be approximately 1.
If in your application you believe this approximation is not close enough to 1, you might want to specify the grid of estimation points (second parameter pts) such that the spacing is finer or the range is wider than the one automatically generated by ksdensity.
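Putting this together with the normrnd example from the question (a sketch; trapz gives a slightly more careful numerical integral than the sum above):

```
S = normrnd(0.3035, 0.0314, 1, 1000);   % narrow normal: sigma = 0.0314
[f, xi] = ksdensity(S);

% Density values can exceed 1; for this sigma the theoretical peak is
% 1/(sigma*sqrt(2*pi)), roughly 12.7, and max(f) will be of that order.
peak = max(f);

% The integral of the estimated density should be close to 1:
area = trapz(xi, f);
```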
I saw a MATLAB file that used max() on a matrix whose entries are complex numbers. I don't understand: how does MATLAB compare two complex numbers?
ls1 = max(tfsp');
Here, tfsp contains complex numbers.
The complex numbers are compared first by magnitude, then by phase angle (if there is a tie for the maximum magnitude).
From help max:
When X is complex, the maximum is computed using the magnitude
MAX(ABS(X)). In the case of equal magnitude elements, then the phase
angle MAX(ANGLE(X)) is used.
NaN's are ignored when computing the maximum. When all elements in X
are NaN's, then the first one is returned as the maximum.
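A small sketch illustrating the documented rule (the values are made up so that all magnitudes tie):

```
x = [3+4i, 5, 4+3i];    % all three entries have magnitude 5 -- a tie
abs(x)                  % [5 5 5]
angle(x)                % phase angles: ~0.9273, 0, ~0.6435
m = max(x);             % tie broken by the largest phase angle, so m is 3+4i
```

Note that in ls1 = max(tfsp') the ' operator also conjugates a complex matrix, which flips the sign of the phase angles; use tfsp.' if only a transpose is intended.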