What is the meaning of the EchoNest API's getTimbre vector? - echonest

The EchoNest
Analyzer Documentation states the following regarding timbre:
timbre is the quality of a musical note or sound that distinguishes
different types of musical instruments, or voices. It is a complex
notion also referred to as sound color, texture, or tone quality, and
is derived from the shape of a segment’s spectro-temporal surface,
independently of pitch and loudness. The Echo Nest Analyzer’s timbre
feature is a vector that includes 12 unbounded values roughly centered
around 0. Those values are high level abstractions of the spectral
surface, ordered by degree of importance. For completeness however,
the first dimension represents the average loudness of the segment;
second emphasizes brightness; third is more closely correlated to the
flatness of a sound; fourth to sounds with a stronger attack; etc. See
an image below representing the 12 basis functions (i.e. template
segments). The actual timbre of the segment is best described as a
linear combination of these 12 basis functions weighted by the
coefficient values: timbre = c1 x b1 + c2 x b2 + ... + c12 x b12,
where c1 to c12 represent the 12 coefficients and b1 to b12 the 12
basis functions as displayed below. Timbre vectors are best used in
comparison with each other.
My understanding is that the b vector ({b1...b12}) is what is being returned by your API's getTimbre method. But then where are the {c1...c12} coefficients coming from? I don't understand how to acquire a scalar timbre from a vector timbre (primarily because your analysis API is closed source). Can you help me out with this?

Note that answers on this website come from volunteers. To get official support for the library, you need to contact the publisher directly.
b1 … b12 is not the result of the audio analysis, it is merely descriptive of what the analysis does. They are fixed constants as shown in the diagram:
The vector of scalars c1 … c12 is what the analyzer produces. Of course, the sound cannot be perfectly described by only 12 numbers. Multiplying the scalars by the functions won't reproduce the original music because there's not enough data there; it's only an approximation. Possibly, though, you'll get a similar "mood" from each segment, so it could be interesting to try and listen.

Related

Should I perform data centering before apply SVD?

I have to use SVD in Matlab to obtain a reduced version of my data.
I've read that the function svds(X,k) performs the SVD and returns the first k eigenvalues and eigenvectors. There is not mention in the documentation if the data have to be normalized.
With normalization I mean both substraction of the mean value and division by the standard deviation.
When I implemented PCA, I used to normalize in such way. But I know that it is not needed when using the matlab function pca() because it computes the covariance matrix by using cov() which implicitly performs the normalization.
So, the question is. I need the projection matrix useful to reduce my n-dim data to k-dim ones by SVD. Should I perform data normalization of the train data (and therefore, the same normalization to further projected new data) or not?
Thanks
Essentially, the answer is yes, you should typically perform normalization. The reason is that features can have very different scalings, and we typically do not want to take scaling into account when considering the uniqueness of features.
Suppose we have two features x and y, both with variance 1, but where x has a mean of 1 and y has a mean of 1000. Then the matrix of samples will look like
n = 500; % samples
x = 1 + randn(n,1);
y = 1000 + randn(n,1);
svd([x,y])
But the problem with this is that the scale of y (without normalizing) essentially washes out the small variations in x. Specifically, if we just examine the singular values of [x,y], we might be inclined to say that x is a linear factor of y (since one of the singular values is much smaller than the other). But actually, we know that that is not the case since x was generated independently.
In fact, you will often find that you only see the "real" data in a signal once we remove the mean. At the extremely end, you could image that we have some feature
z = 1e6 + sin(t)
Now if somebody just gave you those numbers, you might look at the sequence
z = 1000001.54, 1000001.2, 1000001.4,...
and just think, "that signal is boring, it basically is just 1e6 plus some round off terms...". But once we remove the mean, we see the signal for what it actually is... a very interesting and specific one indeed. So long story short, you should always remove the means and scale.
It really depends on what you want to do with your data. Centering and scaling can be helpful to obtain principial components that are representative of the shape of the variations in the data, irrespective of the scaling. I would say it is mostly needed if you want to further use the principal components itself, particularly, if you want to visualize them. It can also help during classification since your scores will then be normalized which may help your classifier. However, it depends on the application since in some applications the energy also carries useful information that one should not discard - there is no general answer!
Now you write that all you need is "the projection matrix useful to reduce my n-dim data to k-dim ones by SVD". In this case, no need to center or scale anything:
[U,~] = svd(TrainingData);
RecudedData = U(:,k)'*TestData;
will do the job. The svds may be worth considering when your TrainingData is huge (in both dimensions) so that svd is too slow (if it is huge in one dimension, just apply svd to the gram matrix).
It depends!!!
A common use in signal processing where it makes no sense to normalize is noise reduction via dimensionality reduction in correlated signals where all the fearures are contiminated with a random gaussian noise with the same variance. In that case if the magnitude of a certain feature is twice as large it's snr is also approximately twice as large so normalizing the features makes no sense since it would just make the parts with the worse snr larger and the parts with the good snr smaller. You also don't need to subtract the mean in that case (like in PCA), the mean (or dc) isn't different then any other frequency.

Perlin noise how to get a vector from a permutation-array?

I am currently trying to implement perlin noise on HLSL in 2D. I watched Ken's improved Perlin Noise, but i don't understand how the conversion from the permuation-array into Vectors works.
I know that i can get the hash-code like this
int g00 = p[floorX + p[floorY]],
g10 = p[floorX + 1 + p[floorY]],
g01 = p[floorX + p[floorY + 1]],
g11 = p[floorX + 1 + p[floorY + 1]];
(floorX and floorY are the floored x,y coordinates broken down to 8 bits.)
from here, but i still don't understand it. I also don't understand how the "grad(...)"- method from Ken's implementation works.
Can anyone explain me how that works?
Ken Perlin's actual java implementation is really pseudo-code for his GPU implementation, which means some parts (especially the one you are stuck on) are downright impenetrable (using some optimizing theory).
For readability,
try Stefan Gustavson's reference java implementation from Simplex Noise Demystified' (2005) in pdf, it is much more legible as it was designed partly as a teaching tool.
The long set of bit-operations and conditionals (using the trinary short hand operator) in his grad() function are there to choose a pseud-random Unit Vector. In particular, it should choose from a set that are equally spread around the unit circle (in 2D), or sphere (approximated by midpoints of the sides of a cube, in 3D), etc., and do so with an equal probability of each one.
If the lowest 4 bits are sufficiently scrambled by the hashing routine, those 16 choices can be mapped to the 12 best 3D vectors (of length sqrt2) quickly - that is what it's doing (and, imho, why Perlin noise is rarely ever implemented up to the standard he set forth in the accompanying paper).
In particular, he sums two axis unit vectors (x, y, and z)-pseudo-randomly chosen, with pseudo-randomly determined signs each.
Since you want 2D noise, may I suggest using a lookup table for the gradients themselves, but please make them all the same length (it looks a little better than the shortcut Perlin left in the appendix that's been copied EveryWhere, and the vectors are usually floats already). And you can actually choose 8 good basis gradients (so the bitwise scramble works without deformation!).

How to select top 100 features(a subset) which are most relevant after pca?

I performed PCA on a 63*2308 matrix and obtained a score and a co-efficient matrix. The score matrix is 63*2308 and the co-efficient matrix is 2308*2308 in dimensions.
How do i extract the column names for the top 100 features which are most important so that i can perform regression on them?
PCA should give you both a set of eigenvectors (your co-efficient matrix) and a vector of eigenvalues (1*2308) often referred to as lambda). You might been to use a different PCA function in matlab to get them.
The eigenvalues indicate how much of your data each eigenvector explains. A simple method for selecting features would be to select the 100 features with the highest eigen values. This gives you a set of feature which explain most of the variance in the data.
If you need to justify your approach for a write up you can actually calculate the amount of variance explained per eigenvector and cut of at, for example, 95% variance explained.
Bear in mind that selecting based solely on eigenvalue, might not correspond to the set of features most important to your regression, so if you don't get the performance you expect you might want to try a different feature selection method such as recursive feature selection. I would suggest using google scholar to find a couple of papers doing something similar and see what methods they use.
A quick matlab example of taking the top 100 principle components using PCA.
[eigenvectors, projected_data, eigenvalues] = princomp(X);
[foo, feature_idx] = sort(eigenvalues, 'descend');
selected_projected_data = projected(:, feature_idx(1:100));
Have you tried with
B = sort(your_matrix,2,'descend');
C = B(:,1:100);
Be careful!
With just 63 observations and 2308 variables, your PCA result will be meaningless because the data is underspecified. You should have at least (rule of thumb) dimensions*3 observations.
With 63 observations, you can at most define a 62 dimensional hyperspace!

Matlab non-linear, multi-parameter curve fitting issue

I am trying to implement a routine for fitting electrophoretic data from my experiments.
The aim is to derive kinetic parameters for the interaction of biomoecules from the relative areas of peaks in the electropherogram, based on the areas of the peaks in the dataset.
Since all relevant differential equations are known and since the set of equations has an analytical solution, as described here:
Analytical solution manuscript
I set about entering the relevant equations (6, 8, 13, ... from the referenced manuscript) in matlab.
The thus created function works and I can use it to simulate electropherograms of interacting species.
Obviuously, I now would like to use the function to fit experimental data and retrieve the parameters (8 in total, Va, Vc, MUa, MUc, k, A0, C0, baseline noise).
Some of these will obviously be correlated. Example values might be (to give an idea of their magnitude):
params0 = [ ...
8.44E-02; ... % Va
1.25E-01; ... % Vc
5.32E-05; ... % MUa
8.87E-05; ... % MUc
4.48E-03; ... % k
6.06E-01; ... % A0
3.00E-00; ... % C0
4.64E-03 ... % noise
];
My problem is, if I supply experimental data and try something like lsqcurvefit:
[x,resnorm,residual] = lsqcurvefit(#(param,xdata) Electropherogram2(param,xdata,column), params0, time, ydata,lb, ub);
I often get very poor results because I either run out of iterations, I hit some (obviously poorly fitting) local minimum or whatever...
Only if I tinker a lot with the starting values and the allowed intervals (i.e. because I know likely values through other experiments) do I end up with more or less decent fits, but even then, fits are not as good as reported in the original manuscript (fig. 3).
The authors of that manuscript used Excel solver and were kind enough to provide the original data used in Fig. 3 but still I cannot seem to end up with fits as good as theirs without nearly literally supplying the nearly correct starting values.
I am not experienced enough to know what I could tweak to make this process less trial-and-error.
Would something like the global optimization toolbox help me?
Any tips are welcome...
In the mentioned paper ("Analytical solution manuscript") it is implied that the free optimization parameters are five (Va, Vc, MUa, MUc, k) and not eight because the (Aeq/Ceq) ratio can be computed from their representative equations, eq. 8 for Aeq and (obviously) eq. 6 for Ceq.
In my opinion, what's even more troubling is the appearance of the following products in the model, comprised of the free optimization parameters:
k and Va in eq. 12
MUc and Va in the equation for epsilon_A in eq. 12
MUa and Vc in the equation for epsilon_A in eq. 12
In general, non-linear optimization algorithms have a legitimate trouble in optimizing the free parameters when pairs of the latter appear as products in the non-linear model.

Accelerometer signal segmentation

I have a 1D accelerometer signal (one axis only). I would like to create a robust algorithm, which would be able to recognize some shapes in the signal.
At first I apply a moving average filter to the raw signal. On the attached picture the raw signal is coloured red and the averaged signal is black. As seen from the picture, some trends are visible from the averaged (black) signal - the signal contains 10 repetitions of a peak like pattern, where acceleration climbs to a maximum and then drops back down. I have marked the beginnings and endings of those patterns with a cross.
So my goal is to find the marked positions automatically. The problem making the pattern extraction difficult are:
the start of the pattern could have a different y value than the end of the pattern
the pattern could have more than one peak
I do not have any concrete time information (from start to the end of the pattern it takes A time units)
I have tried different approaches, which are pretty much home-brew, so I won't mention them - I don't want you to be biased by my way of thinking. Are there some standard or by the books approaches for doing that kind of pattern extraction? Or maybe does anyone know how to tackle the problem in a robust way?
Any idea will be appreciated.
Keep it simple!
It appears the moving average is a good enough damper device; keep it as-is, maybe only increasing or decreasing its sample count if you notice that it either leaves too much noise or removes too much signal respectively. You then work off the this averaged signal exclusively.
The pattern markers you seek seems relatively easy to detect. Expressed in English, these markers are:
Targets = the points of inflection in the averaged readings curve, when the slope goes markedly negative to positive.
You should therefore be able to detect this situation by comparison of the slope values, calculated along with the moving average, as each new reading value comes available (of course with a short delay, as of course the slope at a given point can only be calculated when the averaged reading for the next [few] point[s] is available)
To avoid false detection, however, you'd need to define a few parameters aimed at filtering the undesirable patterns. These paremeters will define more precisely the meaning of "markedly" in the above target definition.
Tentatively the formula for detecting a point of interest could be as simple as this
(-1 * S(t-1) + St ) > Min_delta_Slope
where
S is the slope (more on this) at time t-1 and t, respectively
Min_delta_Slope is a parameter defining how "sharp" a change in slope we want at a minimum.
Assuming normalized t and Y units, we can set the Min_delta_Slope parameter close to or even past 1. Intuitively a value of 1 (again in normalized units) would indicate that we target points where the curved "arrived" with a downward slope of say 50% and left the point with a upward slope of 50% (or 40% + 60% or .. 10% i.e almost flat and 90% i.e. almost vertical).
To avoid detecting points in the case when this is merely a small dip in the curve, we can take more points into consideration, with a fancier formula such as say
(Pm2 * S(t-2) + Pm1 * S(t-1) + P0 * St + Pp1 S(t+1) ) > Min_delta_Slope
where
Pm2, Pm1, P0 and Pp1 are coefficients giving relative importance to the slope at various point before and after the point of interest. (Pm2 and Pm1 typically negative values unless we use only positive parameter and use negative signs in the formula)
St +/- n is the slope a various times
and Min_delta_Slope is a parameter defining how "sharp" a change in slope we want at a minimum.
Intuitively, this 4 points formula would take into account the shape of the curve at a point two readings prior and two reading past the point of interest (in addition to considering the point right before and after it). Given the proper values for the parameters, the formula would require that the curve be steadily coming "down" for two time slices, then steadily going up for the next two time slices, hence avoiding to mark smaller dips in the curve.
An alternative way to achieve this, may be to compute the slope by using the difference in Y value between the [averaged] reading from two (or more) time slices ago and that of the current [averaged] reading. These two approaches are similar but would produce slightly different result; generally we'd have more say on the desired shape of the curve with the Pm2, Pm1, P0 and P1 parameters.
You might want to look at watershed segmentation, which does a related kind of thing (dividing landscapes into their separate catchment basins). Oddly enough, I'm actually writing a PhD thesis which uses watershed a lot at the moment (seriously :))