Theory behind the ARMA and Running Average filters, and are there any alternative algorithms for calculating distance from the RSSI of a signal? - filtering

I can see that the Android Beacon Library features two algorithms for measuring distance: the running average filter and the ARMA filter.
How are these related to the library's implementation (apart from using the filter's formula)?
Where can I find some background information about these that explains the theory behind it?
Are there any known alternative algorithms that can be studied and tried out for measuring the distance?

There are two basic steps to giving a distance estimate on BLE beacons.
Collect RSSI samples
Convert RSSI samples to a distance estimate.
Both of these steps have different possible algorithms. The ARMA and Running Average filters are two different algorithms used for collecting the RSSI samples.
Understand that beacons send out packets at a periodic rate, typically 1-10 times per second. Each of these packets, when received by the phone, gets its own signal level measurement called RSSI. Because of radio noise and measurement error, there is a large amount of variance in each RSSI sample, which can lead to big swings in distance estimates. So you typically want to take a number of RSSI samples and average them together to reduce this error.
The running average algorithm simply takes 20 seconds' worth of samples (by default; the time period is configurable), throws out the top and bottom 10 percent of the RSSI readings, and takes the mean of the remainder. This is similar to how iOS averages samples, so it is the library's default algorithm for cross-platform compatibility reasons. But it has the disadvantage that the distance estimate lags behind, telling you where the phone was relative to the beacon an average of 10 seconds ago. This may be inappropriate for use cases where the phone is moving relative to the beacon.
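As a rough sketch of that idea (not the library's actual code; collecting the samples for each window is left out, and the trim fraction is hard-coded to the 10 percent described above):

```python
def running_average_rssi(samples, trim_fraction=0.1):
    """Trimmed mean of the RSSI samples collected over one sampling
    window (e.g. 20 seconds): drop the top and bottom 10 percent of
    readings, then average the rest.  Sketch only, not the library's
    actual implementation."""
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)
    trimmed = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(trimmed) / len(trimmed)

# one window of noisy RSSI readings (dBm)
print(running_average_rssi([-72, -68, -90, -70, -69, -55, -71, -70, -73, -69]))
```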
The ARMA (Autoregressive Moving Average) algorithm statistically weights the more recent samples more heavily than the older samples, leading to less lag in the distance estimates. But its behavior is less predictable, and its performance varies more across different radio conditions.
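The general idea can be sketched as an exponentially weighted update, where a smoothing coefficient controls how strongly each new sample pulls the running estimate; the 0.1 value below is illustrative only, not necessarily what the library uses:

```python
class ArmaLikeFilter:
    """Sketch of an ARMA-style RSSI filter: each new sample pulls the
    running estimate toward it by a fixed fraction, so recent samples
    carry more weight than older ones.  The coefficient is illustrative,
    not necessarily the library's value."""

    def __init__(self, speed=0.1):
        self.speed = speed
        self.estimate = None

    def add_sample(self, rssi):
        if self.estimate is None:
            self.estimate = float(rssi)
        else:
            # move the estimate a fraction of the way toward the new sample
            self.estimate -= self.speed * (self.estimate - rssi)
        return self.estimate

f = ArmaLikeFilter()
for rssi in (-70, -68, -75, -71, -69):
    print(f.add_sample(rssi))
```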
Which algorithm is right for you depends on your use case. Testing both to see which performs better for you is generally the best approach. While there are other possible algorithms for data collection, these are the only two built into the library. Since it is open source, you are welcome to create your own and submit them as a pull request.
For step 2, there are also a number of possible algorithms. The two most common are a curve-fitted formula and a path-loss formula. The curve-fitted formula is the library default, and the path-loss alternative is available only in a branch of the library under development. You are welcome to use the latter, but it requires building the library from source. Again, as an open source library, you are welcome and encouraged to develop your own alternative algorithms.
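To make the difference concrete, both approaches reduce to a one-line formula. The sketch below shows the general shape of each; the coefficients A, B, C and the path-loss exponent n are device-dependent placeholders you would have to calibrate yourself, not values taken from the library:

```python
def distance_curve_fit(rssi, tx_power, A=0.9, B=7.7, C=0.1):
    """Curve-fitted power-law model: d = A * (rssi/txPower)^B + C.
    A, B and C are placeholder coefficients; real values come from
    calibration measurements for a particular phone model.  txPower
    is the calibrated RSSI at 1 m from the beacon."""
    ratio = rssi / float(tx_power)
    if ratio < 1.0:
        return ratio ** 10      # stronger than the 1 m reference: very close
    return A * (ratio ** B) + C

def distance_path_loss(rssi, tx_power, n=2.0):
    """Log-distance path-loss model: d = 10^((txPower - rssi) / (10 n)),
    where n is the path-loss exponent (about 2 in free space, larger indoors)."""
    return 10 ** ((tx_power - rssi) / (10.0 * n))

print(distance_curve_fit(-70, -59), distance_path_loss(-70, -59))
```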

Related

How to remove periodicity in hourly wind speed data by using a Fourier transform in MATLAB

I have a dataset that contains hourly wind speed data for seven years. I am trying to fit a forecasting model to the data, and the review paper I am following states that removing the diurnal, weekly, monthly, and annual patterns in the data significantly enhances estimation accuracy. They then use a Fourier series to remove the periodic components, as seen in the image. Any ideas on how I can model this in MATLAB?
I am afraid this topic cannot be explained briefly. What you need is a filter for the respective frequencies and a certain number of their harmonics. You can implement such a filter with an FFT or directly with an IIR/FIR formula.
An FFT is faster than an IIR/FIR implementation, but requires some care with respect to the window function. Even if you do a "continuous" DFT, you will have a window function (such as exponential or Gaussian). The window function determines the bandwidth: the wider the window, the smaller the bandwidth. With an IIR/FIR filter, the bandwidth is encoded in the recursive parameters.
For suppressing single frequencies (like the 24-hour weather signal) you need a notch filter. This also requires you to specify a bandwidth, as you can see in the linked article. The smaller the bandwidth, the longer it takes (in time) for the filter to settle on the frequency it is meant to suppress. If you want the filter to track the amplitude of the 24-hour signal quickly, you need a wider bandwidth; but then you will also suppress frequencies slightly lower and slightly higher than 1/24 h. It's a tradeoff.
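As a concrete single-notch example (a sketch in Python with SciPy rather than MATLAB, which has equivalent design functions in its toolboxes): with hourly samples the sampling rate is 1 sample per hour, so the diurnal component sits at 1/24 cycles per hour. The Q value is an arbitrary starting point and sets the bandwidth tradeoff discussed above.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 1.0          # 1 sample per hour
f0 = 1.0 / 24.0   # diurnal frequency in cycles per hour
Q = 5.0           # higher Q = narrower notch, but slower settling

b, a = iirnotch(f0, Q, fs=fs)

# toy hourly series: offset + 24 h periodic component + noise
t = np.arange(24 * 60)                      # 60 days of hourly samples
x = 5 + 2 * np.sin(2 * np.pi * t / 24) + np.random.randn(t.size)

x_filtered = filtfilt(b, a, x)              # zero-phase filtering
```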
If you also want to suppress several harmonics (as described in the paper), you have to combine several notch filters in series. If you want to do it with an FFT, you have to model the desired transfer function in frequency space; since you can do this for all frequencies at once, it's more efficient.
An easy but approximate way to get something similar to a notch filter including harmonics is a comb filter. It is only an approximation, though: you have no control over the details of the transfer function. You could do it in MATLAB by adding to the original signal a copy that is shifted by 12 hours. This works because a sinusoid cancels with a copy of itself shifted by pi, and the same holds for its odd harmonics.
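A rough sketch of that trick (in Python here, assuming hourly samples so a 12-hour shift is 12 samples):

```python
import numpy as np

def comb_cancel(x, half_period=12):
    """Approximate comb filter: average the signal with a copy delayed by
    half the period (12 samples = 12 hours for hourly data).  A sinusoid
    with a 24 h period arrives shifted by pi and cancels, as do its odd
    harmonics.  The first `half_period` samples are left unchanged."""
    y = np.asarray(x, dtype=float).copy()
    y[half_period:] = 0.5 * (x[half_period:] + x[:-half_period])
    return y

t = np.arange(24 * 30)                          # 30 days of hourly samples
x = 3 + np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(t.size)
print(np.std(x), np.std(comb_cancel(x)))        # the periodic component is suppressed
```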
So you see, there's lots of possibilities for what you want.

Episodic Semi-gradient Sarsa with Neural Network

While trying to implement Episodic Semi-gradient Sarsa with a neural network as the approximator, I wondered how to choose the optimal action based on the currently learned weights of the network. If the action space is discrete, I can just calculate the estimated value of each action in the current state and choose the one that gives the maximum. But this does not seem to be the best way of solving the problem. Furthermore, it does not work if the action space can be continuous (like the acceleration of a self-driving car, for example).
So, basically I am wondering how to solve the 10th line, Choose A' as a function of q̂(S', ·, w), in this pseudo-code of Sutton:
How are these problems typically solved? Can anyone recommend a good example of this algorithm using Keras?
Edit: Do I need to modify the pseudo-code when using a network as the approximator, so that I simply minimize the MSE between the network's prediction and the reward R, for example?
I wondered how to choose the optimal action based on the currently learned weights of the network
You have three basic choices:
Run the network multiple times, once for each possible value of A' to go with the S' value you are considering. Take the maximum value as the predicted optimum action (with probability 1-ε; otherwise choose randomly, for the ε-greedy policy typically used in SARSA).
Design the network to estimate all action values at once, i.e. to have |A(s)| outputs (perhaps padded to cover "impossible" actions that you need to filter out). This alters the gradient calculations slightly: zero gradient should be applied to the inactive outputs of the last layer (i.e. anything not matching the A of (S,A)). Again, just take the maximum valid output as the estimated optimum action; a minimal sketch of this option appears after this list. This can be more efficient than running the network multiple times. It is also the approach used by the recent DQN Atari game-playing bot and by AlphaGo's policy networks.
Use a policy-gradient method, which works by using samples to estimate the gradient that would improve a policy estimator. See chapter 13 of Sutton and Barto's second edition of Reinforcement Learning: An Introduction for more details. Policy-gradient methods become attractive when there are large numbers of possible actions, and they can cope with continuous action spaces (by estimating the distribution function of the optimal policy - e.g. choosing the mean and standard deviation of a normal distribution, which you then sample from to take your action). You can also combine policy gradients with a state-value approach in actor-critic methods, which can be more efficient learners than pure policy-gradient approaches.
Note that if your action space is continuous, you don't have to use a policy-gradient method; you could just quantise the actions. Also, in some cases, even when actions are in theory continuous, you may find the optimal policy only involves extreme values (the classic mountain car example falls into this category: the only useful actions are maximum forward acceleration and maximum backward acceleration).
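As an illustration of the second option, here is a minimal Keras sketch (the layer sizes, state dimension and action count are hypothetical) of an action-value network with one output per discrete action and ε-greedy selection:

```python
import numpy as np
from tensorflow import keras

n_state_features, n_actions = 4, 3   # hypothetical problem sizes
epsilon = 0.1

# one output per discrete action: q_hat(S, a, w) for every a in one forward pass
q_net = keras.Sequential([
    keras.Input(shape=(n_state_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(n_actions, activation="linear"),
])

def choose_action(state):
    """Epsilon-greedy selection over the network's action-value estimates."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    q_values = q_net.predict(state[np.newaxis], verbose=0)[0]
    return int(np.argmax(q_values))

print(choose_action(np.zeros(n_state_features)))
```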
Do I need to modify the pseudo-code when using a network as the approximator, so that I simply minimize the MSE between the network's prediction and the reward R, for example?
No. There is no separate loss function in the pseudocode, such as the MSE you would see used in supervised learning. The error term (often called the TD error) is given by the part in square brackets, and achieves a similar effect. Literally, the term ∇q̂(S,A,w) means the gradient of the estimator itself - not the gradient of any loss function.
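For reference, the update the pseudocode performs (the episodic semi-gradient Sarsa update from Sutton and Barto) is

$$ w \leftarrow w + \alpha \bigl[ R + \gamma\, \hat{q}(S', A', w) - \hat{q}(S, A, w) \bigr] \, \nabla \hat{q}(S, A, w), $$

with the γ q̂(S', A', w) term dropped on the step that reaches the terminal state. The bracketed factor is the TD error referred to above.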

Accurate frequency estimation with short time series data - maximum entropy methods or Yule Walker AR method?

I am using the Lomb-Scargle code to estimate some frequencies in a short time series, which is shown in the first image. The results of the Lomb-Scargle analysis are shown in the second, and I have zoomed in on a prominent peak at about 2 cycles per day. However, this peak is smeared, and it is therefore proving difficult to resolve the real frequency of this component. Are there any other methods, or improvements to the method I am using, that would accurately resolve the important frequency components within this short time series?
There is some information on the use of methods for short time series here, but it is not clear whether they need to be regularly sampled. Ideally I am looking for a method that works with irregularly sampled data; from some research it appears that maximum entropy methods are the answer, but I am not sure whether these have been implemented in MATLAB. From this link, it appears that there is an equivalent method: "The Yule-Walker AR method produces the same results as a maximum entropy estimator." However, again it is not clear whether the data need to be uniformly sampled.
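For reference, the standard Yule-Walker formulation does assume uniformly sampled data. A minimal sketch of the idea in Python (the model order of 10 is an arbitrary choice, and no attempt is made here to handle irregular sampling):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_spectrum(x, order=10, n_freq=512, fs=1.0):
    """Fit an AR(order) model via the Yule-Walker equations (uniform
    sampling assumed) and return its (unnormalised) power spectrum."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # biased sample autocorrelation for lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    a = solve_toeplitz(r[:-1], r[1:])            # AR coefficients
    sigma2 = r[0] - np.dot(a, r[1:])             # innovation variance
    freqs = np.linspace(0.0, fs / 2.0, n_freq)
    z = np.exp(-2j * np.pi * freqs / fs)
    denom = 1 - sum(a[k] * z ** (k + 1) for k in range(order))
    return freqs, sigma2 / np.abs(denom) ** 2
```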

Cross-talk filter with known source

I currently work in an experimental rock mechanics lab, and when I conduct an experiment, I record the output signals such as effective torque, normal force and motor velocity. However, the latter quantity causes significant cross-talk over the recorded channels, and I want to filter this out. Let me give an example:
Here the upper plot is the strong signal (motor velocity), and the lower is an idle signal that is affected by the cross-talk (blue is raw signal, red is median filtered). The idle channel is only recording noise. We see three effects here. When the motor voltage changes:
the amplitude of the noise increases
the idle signal's median shifts
there is a spike that lasts approximately 0.1 seconds
If we zoom in on the first spike that occurs at around 115 seconds, we get the following plot. This does not seem to be your typical delta-function type of spike, but rather some kind of electronic "echo".
I have seen much work on blind source separation through independent component analysis (ICA), but that did not prove to be effective in my situation. However, since I know the shape of the signal that is causing the cross-talk, there may be better ways to include this information. My question is this: is there a filter or a combination of filters that can tackle the effects mentioned above?
As I am a geologist and not an electrician or mathematician, I don't have a proper background for this kind of material, so please bear with me. I write Python, MATLAB and C++ quite well, so suggested algorithms written in any of those languages are preferred (but not required).
The crosstalk you encounter results from a parasitic transmission line. Just think of your typical FM receiver, where the wires act as the antennae. These effects include capacitive and inductive coupling, and they form an oscillator (which is the reason why you cannot see the theoretically ideal delta spike).
I recognize two different approaches:
use a hardware filtering circuit
use a software-implemented filter
ad 1:
depending on the needed bandwidth (maximum frequency/rate of change) on the idle channel, you can determine the corner frequency, as well as the required filter order, for a given rate of suppression
ad 2:
you can implement several types of filters (IIR, FIR) which resemble these circuits.
Additionally, since you are measuring the aggressor signal anyway, you can use it together with the recording on the idle channel to determine the system parameters of a mathematical model of the crosstalk. With this model you would be able to exclude the interference by calculation.
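A sketch of that last suggestion (assuming both channels are sampled synchronously; the regressors and names below are illustrative, not tuned to this data): fit the idle channel as a linear combination of the motor-velocity signal, its time derivative (to capture the switching spikes) and a constant, then subtract the fitted part.

```python
import numpy as np

def remove_crosstalk(victim, aggressor, fs):
    """Least-squares crosstalk removal: model the victim channel as a
    linear combination of the aggressor signal, its time derivative
    (captures the switching spikes) and a constant offset, then
    subtract the fitted interference."""
    d_aggressor = np.gradient(aggressor, 1.0 / fs)
    X = np.column_stack([aggressor, d_aggressor, np.ones_like(aggressor, dtype=float)])
    coeffs, *_ = np.linalg.lstsq(X, victim, rcond=None)
    return victim - X @ coeffs

# hypothetical usage: cleaned = remove_crosstalk(idle_channel, motor_velocity, fs=1000.0)
```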

PIV Analysis, Interrogation Area of The Cross Correlation

I'm running a PIV analysis on two consecutive images taken during an experiment to get the vector field. But I would like to know: based on what criteria do I have to choose the percentage of overlap between the two images for the cross-correlation process? 50%, 75%...? The PIVlab_GUI tool designed for MATLAB chooses a 50% overlap by default, but it allows changing it.
I just want to know the criteria by which I can decide how much overlap is best. Do the vectors become less accurate, more dependent, etc., as we increase/decrease the overlap?
My book "Fluid Mechanics Measurements" does not explain how to choose the overlap amount in the cross-correlation process, and I could not find any helpful online reference.
Any help is appreciated.
I suggest you read up on spectral estimation - which is basically equivalent to cross correlation when you segment the data and average the correlation estimates calculated from each segment (the cross correlation is the inverse Fourier transform of the cross spectrum). There's a book chapter on this stuff here, but you may want to find a more complete resource if you are unclear on the basics.
A short answer: increasing the overlap will increase the frequency resolution of the spectral estimate and give you more segments to average over, so your estimate will have a lower variance. But there are diminishing statistical returns as you increase the overlap past 50%, while the computational cost continues to rise (more segments = more calculations). Hence most people just choose 50% and have done with it.
It's important to note that you don't get any more information by using overlapping frames; you are simply increasing the frequency resolution (or time-lag resolution, for correlation) - similar to the effect of zero-padding a signal before taking its Fourier transform - and this has statistical effects due to the way estimation of this type works.
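To see the statistical effect in isolation (outside of PIV), here is a small sketch using SciPy's Welch estimator on white noise: going from no overlap to 50 % clearly reduces the variance of the spectral estimate, while going from 50 % to 75 % buys comparatively little for the extra computation.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)
x = rng.standard_normal(16384)          # white noise: the true spectrum is flat

nperseg = 256
for overlap in (0.0, 0.5, 0.75):
    f, pxx = welch(x, fs=1.0, nperseg=nperseg, noverlap=int(overlap * nperseg))
    print(f"overlap {overlap:.0%}: variance of the PSD estimate = {np.var(pxx):.5f}")
```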