How to model scalar values with a neural network if besides direction the magnitude matters too - neural-network

Say you want to predict temperature changes based on some input data. Temperature changes are positive or negative scalars with a mean of zero. If only the direction matters one could just use tanh as an activation function in the output layer. But say for delta-temperatures predicting the magnitude of the change is also important, not just the sign.
How would you model this output. Tanh doesn't seem to be a good choice because it gives values between -1 and 1. And say temperature changes have a gaussian, or some other weird distribution, so hovering around the center quasi-linear domain of tanh(+-0) would be difficult to learn for a neural network. I'm worried that the sign would be good but the magnitude output would be useless.
How about having the network output one-hot vectors of length N, treat the argmax of this output vector as a temperature change on a pre-defined window. Say the window is -30 - +30 degrees, using N=60 long one-hot vector, if argmax( output )=45 that means the prediction is about 15 degrees.
I was actually not sure how to search for this topic.

Related

Are MATLAB's lsim outputs derivatives or the state vector?

I'm trying to do a simulation of a 2-body mass-spring-damper. I've set up a state-space model that I'm pretty confident in and set an input of a displacement and velocity at the base in just one degree of freedom. Upon getting my outputs, I expected that the output vector would just be the state vector at each time step. However, when plotting the output vector corresponding to displacement for each mass in the vertical direction (the input direction), it looked much more like a velocity (0 at the extrema of the input). The plots are shown below:
When I integrated the top 2 plots, I got the following:
Now, I obviously can just accept the outputs as they are and assume I am right in my understanding. But, I want to be sure. From the documentation page:
lsim(___) also returns the time vector t used for simulation and the
state trajectories x (for state-space models only)
I'm just hoping to find out whether or not I am correct in that the output matrix columns correspond to the history of the state derivatives before I go base an analysis on a bad assumption.
I figured it out. My B-matrix expected [derivative, state,...] but I had them in the opposite order.

How to take the difference between the resulting and the correct bucket of a one hot vector into account?

Hi I am using tensorflow at my university to try to classify steering angles of a simulation program using only the images the simulation produces.
The Steering angles are values from -1 to 1 and I separated them into 50 "buckets". So the first value of my prediction vector would mean that the predicted steering angle is between -1 and -0.96.
The following shows the classification and optimization functions I am using.
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(prediction, y))
optimizer = tf.train.AdamOptimizer(0.001).minimize(cost)
y is a vector that with 49 zeros and a single 1 for the correct bucket. My question now is.
How do I take into account if e.g. the correct bucket is at index 25, that the a prediction of 26 is much better than a prediction of 48.
I didn't post the actual network since it is just a couple of conv2d and maxpool layers with a fully connected layer at the end.
Since you are applying Cross entropy or negative log likelihood. you are penalizing the system given the predicted output and the ground truth.
So saying that your system predicted different numbers on your 50 classes output and the highest one was the class number 25 but your ground truth is class 26. So your system will take the value predicted on 26 and adapt the parameters to produce the highest number on this output the next time it sees this input.
You could do two basic things:
Change your y and prediction to be scalars in the range -1..1; make the loss function be (y-prediction)**2 or something. A very different model, but perhaps more reasonable that the one-hot.
Keep the one-hot target and loss, but have y = target*w, where w is a constant matrix, mostly zeros, 1s on the diagonal, and smaller values on the next diagonal, elements (e.g. y(i) = target(i) * 1. + target(i-1) * .5 + target(i+1) * .5 + ...); kind of gross, but it should converge to something reasonable.

Time Series from spectrum

I am having a samll problem while converting a spectrum to a time series. I have read many article sand I htink I am applying the right procedure but I do not get the right results. Could you help to find the error?
I have a time series like:
When I compute the spectrum I do:
%number of points
nPoints=length(timeSeries);
%time interval
dt=time(2)-time(1);
%Fast Fourier transform
p=abs(fft(timeSeries))./(nPoints/2);
%power of positive frequencies
spectrum=p(1:(nPoints/2)).^2;
%frequency
dfFFT=1/tDur;
frequency=(1:nPoints)*dfFFT;
frequency=frequency(1:(nPoints)/2);
%plot spectrum
semilogy(frequency,spectrum); grid on;
xlabel('Frequency [Hz]');
ylabel('Power Spectrum [N*m]^2/[Hz]');
title('SPD load signal');
And I obtain:
I think the spectrum is well computed. However now I need to go back and obtain a time series from this spectrum and I do:
df=frequency(2)-frequency(1);
ap = sqrt(2.*spectrum*df)';
%random number form -pi to pi
epsilon=-pi + 2*pi*rand(1,length(ap));
%transform to time series
randomSeries=length(time).*real(ifft(pad(ap.*exp(epsilon.*i.*2.*pi),length(time))));
%Add the mean value
randomSeries=randomSeries+mean(timeSeries);
However, the plot looks like:
Where it is one order of magnitude lower than the original serie.
Any recommendation?
There are (at least) two things going on here. The first is that you are throwing away information, and then substituting random numbers for that information.
The FFT of a real sequence is a sequence of complex numbers consisting of a real and imaginary part. Converting those numbers to polar form gives you magnitude and phase angle. You are capturing the magnitude part with p=aps(fft(...)), but you are not capturing the phase angle (which would involve atan2(...)). You are then making up random numbers (epsilon=...) and using those to replace the original numbers when you reconstruct your time-series. Also, as the FFT of a real sequence has a particular symmetry, substituting random numbers for the phase angle destroys that symmetry, which means that the IFFT will in general no longer be a real sequence, but a sequence of complex numbers - and again, you're only looking at the real portion of the IFFT, so you're throwing away information again. If this is an audio signal, the results may sound somewhat like the original (or they may be completely different), but the waveform definitely won't match...
The second issue is that in many implementations, ifft(fft(...)) will scale the result by the number of points in the signal. There are several different ways to avoid that, with differing results, but sometimes more attractive in different scenarios, depending on what you are trying to do. You can either scale the fft() result before you do the ifft(), or scale the ifft() result at the end, or in some cases, I've even seen both being scaled by a factor of sqrt(N) - doing it twice has the end result of scaling the final result by N, but it is a bit less efficient since you do the scaling twice...

How to measure power spectral density in matlab?

I am trying to measure the PSD of a stochastic process in matlab, but I am not sure how to do it. I have posted the exact same question here, but I thought I might have more luck here.
The stochastic process describes wind speed, and is represented by a vector of real numbers. Each entry corresponds to the wind speed in a point in space, measured in m/s. The points are 0.0005 m apart. How do I measure and plot the PSD? Let's call the vector V. My first idea was to use
[p, w] = pwelch(V);
loglog(w,p);
But is this correct? The thing is, that I'm given an analytical expression, which the PSD should (in theory) match. By plotting it with these two lines of code, it looks all wrong. Specifically it looks as though it could need a translation and a scaling. Other than that, the shapes of the two are similar.
UPDATE:
The image above actually doesn't depict the PSD obtained by using pwelch on a single vector, but rather the mean of the PSD of 200 vectors, since these vectors stems from numerical simulations. As suggested, I have tried scaling by 2*pi/0.0005. I saw that you can actually give this information to pwelch. So I tried using the code
[p, w] = pwelch(V,[],[],[],2*pi/0.0005);
loglog(w,p);
instead. As seen below, it looks much nicer. It is, however, still not perfect. Is that just something I should expect? Taking the squareroot is not the answer, by the way. But thanks for the suggestion. For one thing, it should follow Kolmogorov's -5/3 law, which it does now (it follows the green line, which has slope -5/3). The function I'm trying to match it with is the Shkarofsky spectral density function, which is the one-dimensional Fourier transform of the Shkarofsky correlation function. Is it not possible to mark up math, here on the site?
UPDATE 2:
I have tried using [p, w] = pwelch(V,[],[],[],1/0.0005); as I was suggested. But as you can seem it still doesn't quite match up. It's hard for me to explain exactly what I'm looking for. But what I would like (or, what I expected) is that the dip, of the computed and the analytical PSD happens at the same time, and falls off with the same speed. The data comes from simulations of turbulence. The analytical expression has been fitted to actual measurements of turbulence, wherein this dip is present as well. I'm no expert at all, but as far as I know the dip happens at the small length scales, since the energy is dissipated, due to viscosity of the air.
What about using the standard equation for obtaining a PSD? I'd would do this way:
Sxx(f) = (fft(x(t)).*conj(fft(x(t))))*(dt^2);
or
Sxx = fftshift(abs(fft(x(t))))*(dt^2);
Then, if you really need, you may think of applying a windowing criterium, such as
Hanning
Hamming
Welch
which will only somehow filter your PSD.
Presumably you need to rescale the frequency (wavenumber) to units of 1/m.
The frequency units from pwelch should be rescaled, since as the documentation explains:
W is the vector of normalized frequencies at which the PSD is
estimated. W has units of rad/sample.
Off the cuff my guess is that the scaling factor is
scale = 1/0.0005/(2*pi);
or 318.3 (m^-1).
As for the intensity, it looks like taking a square root might help. Perhaps your equation reports an intensity, not PSD?
Edit
As you point out, since the analytical and computed PSD have nearly identical slopes they appear to obey similar power laws up to 800 m^-1. I am not sure to what degree you require exponents or offsets to match to be satisfied with a specific model, and I am not familiar with this particular theory.
As for the apparent inconsistency at high wavenumbers, I would point out that you are entering the domain of very small numbers and therefore (1) floating point issues and (2) noise are probably lurking. The very nice looking dip in the computed PSD on the other hand appears very real but I have no explanation for it (maybe your noise is not white?).
You may want to look at this submission at matlab central as it may be useful.
Edit #2
After inspecting documentation of pwelch, it appears that you should pass 1/0.0005 (the sampling rate) and not 2*pi/0.0005. This should not affect the slope but will affect the intercept.
The dip in PSD in your simulation results looks similar to aliasing artifacts
that I have seen in my data when the original data were interpolated with a
low-order method. To make this clearer - say my original data was spaced at
0.002m, but in the course of cleaning up missing data, trying to save space, whatever,
I linearly interpolated those data onto a 0.005m spacing. The frequency response
of linear interpolation is not well-behaved, and will introduce peaks and valleys
at the high wavenumber end of your spectrum.
There are different conventions for spectral estimates - whether the wavenumber
units are 1/m, or radians/m. Single-sided spectra or double-sided spectra.
help pwelch
shows that the default settings return a one-sided spectrum, i.e. the bin for some
frequency ω will include the power density for both +ω and -ω.
You should double check that the idealized spectrum to which you are comparing
is also a one-sided spectrum. Otherwise, you'll need to half the values of your
one-sided spectrum to get values representative of the +ω side of a
two-sided spectrum.
I agree with Try Hard that it is the cyclic frequency (generally Hz, or in this case 1/m)
which should be specified to pwelch. That said, the returned frequency vector
from pwelch is also in those units. Analytical
spectral formulae are usually written in terms of angular frequency, so you'll
want to be sure that you evaluate it in terms of radians/m, but scale back to 1/m
for plotting.

Help me understand FFT function (Matlab)

1) Besides the negative frequencies, which is the minimum frequency provided by the FFT function? Is it zero?
2) If it is zero how do we plot zero on a logarithmic scale?
3) The result is always symmetrical? Or it just appears to be symmetrical?
4) If I use abs(fft(y)) to compare 2 signals, may I lose some accuracy?
1) Besides the negative frequencies, which is the minimum frequency provided by the FFT function? Is it zero?
fft(y) returns a vector with the 0-th to (N-1)-th samples of the DFT of y, where y(t) should be thought of as defined on 0 ... N-1 (hence, the 'periodic repetition' of y(t) can be thought of as a periodic signal defined over Z).
The first sample of fft(y) corresponds to the frequency 0.
The Fourier transform of real, discrete-time, periodic signals has also discrete domain, and it is periodic and Hermitian (see below). Hence, the transform for negative frequencies is the conjugate of the corresponding samples for positive frequencies.
For example, if you interpret (the periodic repetition of) y as a periodic real signal defined over Z (sampling period == 1), then the domain of fft(y) should be interpreted as N equispaced points 0, 2π/N ... 2π(N-1)/N. The samples of the transform at the negative frequencies -π ... -π/N are the conjugates of the samples at frequencies π ... π/N, and are equal to the samples at frequencies
π ... 2π(N-1)/N.
2) If it is zero how do we plot zero on a logarithmic scale?
If you want to draw some sort of Bode plot you may plot the transform only for positive frequencies, ignoring the samples corresponding to the lowest frequencies (in particular 0).
3) The result is always symmetrical? Or it just appears to be symmetrical?
It has Hermitian symmetry if y is real: Its real part is symmetric, its imaginary part is anti-symmetric. Stated another way, its amplitude is symmetric and its phase anti-symmetric.
4) If I use abs(fft(y)) to compare 2 signals, may I lose some accuracy?
If you mean abs(fft(x - y)), this is OK and you can use it to get an idea of the frequency distribution of the difference (or error, if x is an estimate of y). If you mean abs(fft(x)) - abs(fft(y)) (???) you lose at least phase information.
Well, if you want to understand the Fast Fourier Transform, you want to go back to the basics and understand the DFT itself. But, that's not what you asked, so I'll just suggest you do that in your own time :)
But, in answer to your questions:
Yes, (excepting negatives, as you said) it is zero. The range is 0 to (N-1) for an N-point input.
In MATLAB? I'm not sure I understand your question - plot zero values as you would any other value... Though, as rightly pointed out by duffymo, there is no natural log of zero.
It's essentially similar to a sinc (sine cardinal) function. It won't necessarily be symmetrical, though.
You won't lose any accuracy, you'll just have the magnitude response (but I guess you knew that already).
Consulting "Numerical Recipes in C", Chapter 12 on "Fast Fourier Transform" says:
The frequency ranges from negative fc to positive fc, where fc is the Nyquist critical frequency, which is equal to 1/(2*delta), where delta is the sampling interval. So frequencies can certainly be negative.
You can't plot something that doesn't exist. There is no natural log of zero. You'll either plot frequency as the x-axis or choose a range that doesn't include zero for your semi-log axis.
The presence or lack of symmetry in the frequency range depends on the nature of the function in the time domain. You can have a plot in the frequency domain that is not symmetric about the y-axis.
I don't think that taking the absolute value like that is a good idea. You'll want to read a great deal more about convolution, correction, and signal processing to compare two signals.
result of fft can be 0. already answered by other people.
to plot 0 frequency, the trick is to set it to a very small positive number (I use exp(-15) for that purpose).
already answered by other people.
if you are only interested in the magnitude, yes you can do that. this is applicable, say, in many image processing problems.
Half your question:
3) The results of the FFT operation depend on the nature of the signal; hence there's nothing requiring that it be symmetrical, although if it is you may get some more information about the properties of the signal
4) That will compare the magnitudes of a pair of signals, but those being equal do no guarantee that the FFTs are identical (don't forget about phase). It may, however, be enough for your purposes, but you should be sure of that.