Attenuating positions and orientations in Kinect - unity3d

I am using Kinect to get the positions and orientations of each joint, and then I am sending them to Unity. I noticed that there are a lot of "jumps" or fluctuations in the values, for example, sometimes I don't move my hand and in Unity it rotates 180 degrees.
What I want is a good way to smooth these fluctuations. I heard about the Kalman filter, and I implemented the code written here:
http://www.dyadica.co.uk/very-simple-kalman-in-c/
It is not bad for the positions, but for the orientations it is not so good... If you know of better approaches, or a better way to implement the Kalman filter, that would be nice.

First of all, you need to check how well your sensor is able to pick up your variations and movements.
If the sensor is a good one, then a Kalman filter is a good way to start removing the jitter and other noise. Looking at your code, you have implemented a one-dimensional KF, which is fine. But your requirements seem to call for proper orientations and positions, for which you may have to design a multi-dimensional KF (the equations in matrix form, to remove noise in multiple dimensions). You will get a better understanding of the KF from these links:
http://www.codeproject.com/Articles/342099/EMGU-Kalman-Filter
http://www.codeproject.com/Articles/865935/Object-Tracking-Kalman-Filter-with-Ease
Try to implement a multi-dimensional KF and see how well your system responds to it. If you are not satisfied with the performance, you may have to extend the filter. In the recent past, other variants of the KF have come into existence, namely the Extended KF and the Unscented KF. The Kalman filter fails in some practical scenarios where:
the noise is not zero-mean Gaussian;
the input signal from the sensor is non-linear (which is the norm in practice).
In practical scenarios, noise is never zero-mean and the input is never linear. The extensions of the KF were introduced for exactly this purpose: the Extended Kalman Filter and the Unscented Kalman Filter overcome the above drawbacks. Both algorithms are improvements of the KF that work in practical cases, and they can be understood properly only once you have some grounding in the basic KF.
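To make the multi-dimensional case concrete, here is a minimal sketch of a constant-velocity Kalman filter for smoothing one 3-D joint position, written in Python/NumPy for brevity (the structure ports directly to C# in Unity). The state layout and the noise magnitudes q and r are illustrative assumptions to tune for your sensor. Orientations need extra care: quaternion components should not be filtered independently, so renormalize after filtering, or filter a small error rotation instead.

import numpy as np

# Constant-velocity KF sketch for one 3-D joint position.
# State x = [px, py, pz, vx, vy, vz]; measurement z = [px, py, pz].
def make_cv_model(dt, q=1e-3, r=1e-2):
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                    # position += velocity * dt
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # we only measure position
    Q = q * np.eye(6)                             # process noise: trust in the model
    R = r * np.eye(3)                             # measurement noise: sensor jitter
    return F, H, Q, R

def kf_step(x, P, z, F, H, Q, R):
    x = F @ x                                     # predict
    P = F @ P @ F.T + Q
    y = z - H @ x                                 # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ y                                 # update
    P = (np.eye(6) - K @ H) @ P
    return x, P

Raising r relative to q smooths more at the cost of extra lag; that trade-off is the main knob for taming the jumps you describe.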

Related

How to correct (removing bias) IMU data from accelerometer and gyroscope measurement?

I am currently working on fusing GNSS and IMU for a more accurate navigation system for autonomous vehicles. I am very familiar with using GNSS to get an accurate position; however, I'm a newbie with IMU sensors. I've read quite a bit of literature but am still confused about the best way to remove bias from the accelerometer and gyroscope measurements.
I have two kinds of raw measurement data from an MPU-9250: acceleration (m/s²) on the x, y, and z axes, and angular velocity (deg/s) on the same axes. I have tried feeding these data into my sensor fusion program. Unfortunately, the accuracy was unsatisfactory, so I think I should first correct the raw IMU data (remove the bias), and then feed the corrected data into my fusion program.
I couldn't find an answer that my brain could understand or that fits my situation. Can someone please share some information about this? Can I use a high-pass filter or a low-pass filter in this situation?
I would really appreciate it if someone could explain this in detail without complex math formulas/symbols; I'm not a mathematician, and that is one of my problems when looking for information.
Thank you in advance
Accelerometers and gyroscopes usually have substantial bias. You can break the bias down into factors like:
Constant bias
Bias induced by temperature variation.
Bias instability
The static part of the bias is easy to subtract out. If the unit starts from a level orientation and without any movement, you can take samples for ~1 s, average them, and subtract the average from your readings. Although this step removes a big chunk of the bias, it still cannot fully remove it (because the levelling is never perfect).
In case you observe that the temperature of the IMU die varies during operation (even 5-10 deg matters), note down the bias and the temperature (the MPU-9250 has a built-in temperature sensor). Fit a linear or quadratic curve that captures bias against temperature. Later on, use the temperature reading to estimate the bias and subtract it out.
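A rough sketch of both steps in Python/NumPy (it assumes the unit is stationary during calibration; the window length and polynomial order are illustrative):

import numpy as np

# Step 1: constant bias from ~1 s of stationary samples (gyro shown; for the
# accelerometer, subtract the expected gravity vector before averaging).
def static_bias(samples):
    return np.mean(samples, axis=0)              # samples: (N, 3) -> (3,)

# Step 2: fit bias against temperature from logged calibration pairs
# (one axis at a time), then evaluate the fit at the current reading.
def fit_temp_bias(temps, biases, order=2):
    return np.polyfit(temps, biases, order)

def corrected(raw, bias0, coeffs, temp):
    return raw - bias0 - np.polyval(coeffs, temp)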
Even after implementing 1 and 2, there will still be some stubborn bias left. If that is fed into a fusion algorithm like a Kalman filter that is not formulated to estimate bias, the resulting position and orientation estimates will be biased too.
The remaining bias can be estimated along with the important states (like position) using some external reference/sensor such as GNSS or a camera.
Complementary filter (low pass + high pass) or a Kalman filter can be formulated for this purpose.
Kalman filter approach:
A good amount of intuition, along with some mathematics, is needed to use this approach. Basically, the work involves formulating the prediction and measurement models and then providing rough noise variances for your measurements and your prediction. An important thing to understand is that the Kalman filter assumes the errors follow a normal distribution without any bias. So the formulation should deliberately include the bias terms as unknown states to be estimated too (do not assume the sensor is bias-free in the formulation); a single-axis sketch follows below.
You could check out my other answer to gain a detailed understanding of this approach.
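A minimal sketch of the bias-as-a-state idea (the state layout, Jacobian, and noise handling below are illustrative assumptions, not a full formulation):

import numpy as np

# State x = [position, velocity, accelerometer bias] for one axis.
# The measured acceleration is a control input; the current bias estimate
# is subtracted before integrating.
def predict(x, P, acc_meas, dt, Q):
    acc = acc_meas - x[2]
    x = np.array([x[0] + x[1]*dt + 0.5*acc*dt*dt,
                  x[1] + acc*dt,
                  x[2]])
    F = np.array([[1.0,  dt, -0.5*dt*dt],    # Jacobian of the model above
                  [0.0, 1.0, -dt],
                  [0.0, 0.0, 1.0]])
    P = F @ P @ F.T + Q
    return x, P

# A GNSS position fix then updates through H = [1, 0, 0]; the correlations
# built up in P are what allow the filter to infer the bias state.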
Complementary filter approach
Complementary filter is simpler for simpler problems :P
The idea is that we apply a low-pass filter to the noisy measurement and a high-pass filter to the biased measurement, then add them up and call it a day.
Make sure that the LPF and HPF are complements of each other (the transfer function of the HPF should be 1 - LPF). Typically, first-order filters with the same time constant are used. Additionally, the filter equations have to be converted from the continuous Laplace domain to discrete form (read about ZOH, Tustin's approximation, ...).
The final form is scattered around the internet too.
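For illustration, a discrete sketch of such a first-order pair (the time constant tau is an assumption to tune; the high-pass output is built as the input minus its own low-pass, which is exactly the 1 - LPF complement):

# First-order low-pass filter and its exact complement (HPF = 1 - LPF),
# discretized with time step dt and time constant tau.
def make_complementary(tau, dt):
    alpha = dt / (tau + dt)                        # LPF smoothing factor
    lp_noisy = lp_biased = 0.0
    def step(noisy, biased):
        nonlocal lp_noisy, lp_biased
        lp_noisy += alpha * (noisy - lp_noisy)     # LPF on the noisy signal
        lp_biased += alpha * (biased - lp_biased)  # LPF used to build the HPF
        hp_biased = biased - lp_biased             # HPF output: 1 - LPF
        return lp_noisy + hp_biased                # fused estimate
    return step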
Personally, I would use a Kalman filter for this purpose, but a complementary filter can be used with the same amount of effort. You could do this:
Assume that the body is not accelerating, on average, over the long term (1-10 s or so). Then you can say that, over the long term, the accelerometer measures the direction of gravity relative to the IMU, and arctan(accy, accz) can be used to obtain an estimate of pitch and roll. These pitch and roll readings will suffer from substantial noise, so implement a low-pass filter on them with a time constant of ~5 seconds or so. Additionally, add dt*transformationMatrix*gyroscope to the latest pitch/roll to get another pitch and roll estimate; these suffer from bias instead, so implement an HPF over the gyro-based pitch and roll. Add the two together to get pitch and roll. Let's call these IMU_PR.
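In code, the LPF + HPF pair collapses into the classic one-line complementary update; a sketch for pitch alone (the axis convention and the 0.98 weight are assumptions to tune):

import math

# One complementary-filter step for pitch, in radians.
# acc_y, acc_z: accelerometer axes; gyro_rate: angular rate about the pitch axis.
def pitch_step(pitch, acc_y, acc_z, gyro_rate, dt, alpha=0.98):
    pitch_acc = math.atan2(acc_y, acc_z)     # long-term gravity reference (noisy)
    pitch_gyro = pitch + gyro_rate * dt      # short-term integration (biased)
    return alpha * pitch_gyro + (1 - alpha) * pitch_acc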
Now forget our original no-acceleration assumption. The accelerometer gives specific force (which is net acceleration minus gravity). Since we have the pitch and roll angles (IMU_PR), we know gravity's direction, so add gravity to the accelerometer readings to get an estimate of acceleration. Apply the proper frame conversion to bring this acceleration into the same coordinate frame as the GPS (you will need an estimate of yaw to do so; fuse a magnetometer with the gyroscope for this purpose). Then do vel = vel + acc*dt, and integrate again to get an estimate of position from the IMU. This estimate will drift due to the bias in the accelerometer (and in pitch and roll), so implement a high-pass filter over this position and a low-pass filter over the GPS position, and add them to get the final estimate.

What is the structure of an indirect (error-state) Kalman filter and how are the error equations derived?

I have been trying to implement a navigation system for a robot that uses an Inertial Measurement Unit (IMU) and camera observations of known landmarks in order to localise itself in its environment. I have chosen the indirect-feedback Kalman Filter (a.k.a. Error-State Kalman Filter, ESKF) to do this. I have also had some success with an Extended KF.
I have read many texts and the two I am using to implement the ESKF are "Quaternion kinematics for the error-state KF" and "A Kalman Filter-based Algorithm for IMU-Camera Calibration" (pay-walled paper, google-able).
I am using the first text because it better describes the structure of the ESKF, and the second because it includes details about the vision measurement model. In my question I will be using the terminology from the first text: 'nominal state', 'error state' and 'true state', which refer to the IMU integrator, the Kalman filter, and the composition of the two (nominal minus errors), respectively.
The diagram below shows the structure of my ESKF implemented in Matlab/Simulink; in case you are not familiar with Simulink I will briefly explain the diagram. The green section is the Nominal State integrator, the blue section is the ESKF, and the red section is the sum of the nominal and error states. The 'RT' blocks are 'Rate Transitions' which can be ignored.
My first question: Is this structure correct?
My second question: How are the error-state equations for the measurement models derived?
In my case I have tried using the measurement model of the second text, but it did not work.
Kind Regards,
Your block diagram combines two indirect methods for bringing IMU data into a KF:
You have an external IMU integrator (in green, labelled "INS", sometimes called the mechanization, and described by you as the "nominal state"; I've also seen it called the "reference state"). This method freely integrates the IMU externally to the KF, and is usually chosen so you can run this integration at a different (much higher) rate than the KF predict/update step (the indirect form). Historically, I think this was popular because the KF is generally the computationally expensive part.
You have also fed your IMU into the KF block as u, which I assume is the "command" (control) input to the KF. This is an alternative to the external integrator. In a direct KF you would treat your IMU data as measurements, but to do that the filter state would have to model (position, velocity, and) acceleration as well as (orientation and) angular velocity; otherwise there is no possible H such that Hx can produce estimated IMU output terms. If you instead feed your IMU measurements in as a command, your predict step can simply act as an integrator, so you only have to model as far as velocity and orientation.
You should pick only one of those options. I think the second one is easier to understand, but it is closer to a direct Kalman filter, and it requires you to run a predict/update for every IMU sample rather than at the (I assume) slower camera framerate.
Regarding measurement equations for version (1): in any KF you can only predict things you can know from your state. The KF state in this case is a vector of error terms, and thus you can only predict things like "position error". As a result, you need to pre-condition your measurements in z to be position errors, so make your measurement the difference between your "estimated true state" and the position from your "noisy camera observations". This exact idea may be what the xHat input to the indirect KF represents; I don't know anything about the MATLAB/Simulink side of it.
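As a sketch of that pre-conditioning (and the feedback that usually follows it), with purely illustrative names:

# nominal_pos, camera_pos: 3-vectors (e.g. NumPy arrays).
# The camera never feeds the filter directly; the filter's z is the mismatch
# between the freely integrated nominal state and the camera fix.
def error_measurement(nominal_pos, camera_pos):
    return nominal_pos - camera_pos          # z = observed position error

# After the KF update, the estimated error is fed back into the nominal
# state and the corresponding error states are reset to zero.
def feedback(nominal_pos, est_pos_error):
    return nominal_pos - est_pos_error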
Regarding real-world considerations for the summing block (in red) I refer you to another answer about indirect Kalman filters.
Q1) Your Simulink model looks appropriate. Let me shed some light on quaternion-mechanization-based KFs, which I've worked on for navigation applications.
Since the Kalman filter is an elegant mathematical technique that borrows from the science of stochastics and measurement, it can help you reduce the noise in a system without the need to model that noise elaborately.
All KF systems start with some preliminary understanding of the model you want to free of noise. The measurements are fed back to evolve the states better (the measurement equation Y = CX). In your case, the states you are talking about are the errors in the quaternions, which would be the 4 values dq1, dq2, dq3, dq4.
A KF working well in your application would accurately determine the attitude/orientation of the device by controlling the error around the quaternion. A quaternion describes the spatial orientation of a body, understood via a scalar and a vector, or more specifically an angle and an axis.
The error equations you are talking about are the covariances, which contribute to the Kalman gain. The covariances denote the spread around the mean, and they are useful for understanding how the central/average behavior of the system changes with time. Low covariances denote less deviation from the mean behavior. As the KF cycles run, the covariances keep getting smaller.
The Kalman Gain is finally used to compensate for the error between the estimates of the measurements and the actual measurements that are coming in from the camera.
Again, this elegant technique first ensures that the errors in the quaternion values converge around zero.
Q2) The EKF is a great technique to use as long as you have a non-linear measurement construction. Be very careful using the EKF if there are too many transformations in your system, i.e. don't try to reconstruct measurements by applying transformations to your states; this seriously affects the sanctity of the model, and since the noise covariances would not undergo the same transformations, there is a chance of hitting a singularity as soon as the matrices become non-invertible.
You could look at constant-gain KF schemes, which would save you from covariance propagation and save substantial computational effort and time. These techniques are quite new and look very promising: they absorb P (error covariance), Q (model noise covariance), and R (measurement noise covariance) into a fixed gain, and they work well with EKF schemes.

Why isn't there a simple function to reduce background noise of an audio signal in Matlab?

Is this because it's a complex problem? I mean, too broad, so that no simple/generic solution exists?
I ask because almost every piece of signal-processing software (Avisoft, GoldWave, Audacity…) has a function to reduce the background noise of a signal, usually based on the FFT. But I can't find an already-implemented function in Matlab that does the same. Is the right way, then, to implement it manually?
Thanks.
The common audio noise-reduction approaches built into things like Audacity are based around spectral subtraction, which estimates the level of steady background noise in the Fourier-transform magnitude domain, then removes that much energy from every frame, leaving energy only where the signal "pokes above" this noise floor.
You can find many implementations of spectral subtraction for Matlab; this one is highly rated on Matlab File Exchange:
http://www.mathworks.com/matlabcentral/fileexchange/7675-boll-spectral-subtraction
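To give a feel for the technique itself, here is a rough spectral-subtraction sketch in Python/NumPy rather than Matlab (this is not the linked package; it assumes the first half second of the recording is noise-only, and the frame size and spectral floor are illustrative):

import numpy as np
from scipy.signal import stft, istft

# Estimate the noise floor from the first noise_secs of the signal, subtract
# it from every frame's magnitude, and resynthesize with the original phase.
def spectral_subtract(x, fs, noise_secs=0.5, floor=0.02, nperseg=1024):
    f, t, X = stft(x, fs, nperseg=nperseg)
    mag, phase = np.abs(X), np.angle(X)
    hop = nperseg // 2                                   # default overlap
    noise_mag = mag[:, :int(noise_secs * fs / hop)].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - noise_mag, floor * mag)     # keep a spectral floor
    _, y = istft(clean * np.exp(1j * phase), fs, nperseg=nperseg)
    return y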
The question is, what kind of noise reduction are you looking for? There is no one solution that fits all needs. Here are a few approaches:
Low-pass filtering the signal reduces noise but also removes the high-frequency components of the signal. For some applications this is perfectly acceptable. There are lots of low-pass filter functions and Matlab helps you apply plenty of them. Some knowledge of how digital filters work is required. I'm not going into it here; if you want more details consider asking a more focused question.
An approach suitable for many situations is using a noise gate: simply attenuate the signal whenever its RMS level goes below a certain threshold, for instance. In other words, this kills quiet parts of the audio dead. You'll retain the noise in the more active parts of the signal, though, and if you have a lot of dynamics in the actual signal you'll get rid of some signal, too. This tends to work well for, say, slightly noisy speech samples, but not so well for very noisy recordings of classical music. I don't know whether Matlab has a function for this.
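If you want to roll your own, a naive noise-gate sketch (the frame length, threshold, and attenuation are assumptions to tune per recording):

import numpy as np

# Attenuate frames whose RMS level falls below the threshold.
def noise_gate(x, fs, frame_ms=20, thresh=0.01, atten=0.1):
    n = max(1, int(fs * frame_ms / 1000))
    y = np.asarray(x, dtype=float).copy()
    for i in range(0, len(y), n):
        if np.sqrt(np.mean(y[i:i+n] ** 2)) < thresh:
            y[i:i+n] *= atten                 # "kill" the quiet frame
    return y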
Some approaches involve making a "fingerprint" of the noise and then removing that throughout the signal. It tends to make the result sound strange, though, and in any case this is probably sufficiently complex and domain-specific that it belongs in an audio-specific tool and not in a rather general math/DSP system.
Reducing noise requires making some assumptions about the type of noise, the type of signal, and how they differ. Audio processors typically assume (correctly or incorrectly) that the audio is speech or music, and that the noise is typical recording-session background hiss, A/C power hum, or vinyl record pops.
Matlab is for general use (microwave radio, data comm, subsonic earthquakes, heartbeats, etc.), and thus can make no such assumptions.
Matlab is not exactly an audio processor. You have to implement your own filter, and you will have to design it correctly, according to what you want.

Trying to filter (tons of) noise from accelerometers and gyroscopes

My project:
I'm developing a slot car with a 3-axis accelerometer and gyroscope, trying to estimate the car's pose (x, y, z, yaw, pitch), but I have a big problem with vibration noise (while the car is running, the gears induce vibration and the track makes it worse): the noise takes values between ±4 g (where g = 9.81 m/s²) on the accelerometers, for example.
I know (because I observe it) that the noise is correlated across all of my sensors.
In my first attempt, I tried to work it out with a Kalman filter, but it didn't work because the values of my state vector carried really big noise.
EDIT 2: In my second attempt I tried a low-pass filter before the Kalman filter, but it only slowed down my system and didn't filter out the low-frequency components of the noise. At this point I realized the noise might be composed of both low- and high-frequency components.
I was learning about adaptive filters (LMS and RLS), but I realized I don't have a noise reference signal, and if I use one accelerometer axis to filter another axis I don't get absolute values, so it doesn't work.
EDIT: I'm having trouble finding example code for adaptive filters. If anyone knows of something similar, I would be very thankful.
Here is my question:
Does anyone know about a filter or have any idea about how I could fix it and filter my signals correctly?
Thank you so much in advance,
XNor
PS: I apologize for any mistakes; English is not my mother tongue.
The first thing I would do is run a DFT on the sensor signal and see whether there actually are high- and low-frequency components in your accelerometer signals.
With a DFT you should be able to determine an optimum cutoff frequency for your lowpass/bandpass filter.
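A quick way to eyeball this (a sketch; acc and fs stand for one axis of samples and the sample rate):

import numpy as np

# Magnitude spectrum of one accelerometer axis, for choosing a filter cutoff.
def spectrum(acc, fs):
    acc = np.asarray(acc, dtype=float)
    X = np.fft.rfft((acc - acc.mean()) * np.hanning(len(acc)))  # remove DC, window
    freqs = np.fft.rfftfreq(len(acc), d=1.0/fs)
    return freqs, np.abs(X)

# Plotting np.abs(X) against freqs should show the gear/track vibration as
# peaks well above the band occupied by the car's actual motion.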
If you have a constant component on the Z axis, there is a chance you haven't filtered out gravity. Note that if there is significant pitch or roll, this constant can show up on your X and Y axes as well.
Generally, pose estimation with an accelerometer alone is not a good idea, as you need to integrate the acceleration signals twice to get a pose. If the signal is noisy, you will already be in trouble after a couple of seconds unless the noise is 100% evenly distributed between + and -.
Even if we assume no noise comes from your gears, the conversion accuracy of the accelerometer alone might start to mess up your pose after a couple of minutes.
I would definitely use a second sensor, e.g. a compass/encoder, in combination with your mathematical model, and combine all the sensor data in a Kalman filter (sensor fusion).
You might also be able to derive a black-box model of your noise by assuming it is correlated with your motors' RPM (Box-Jenkins/ARMA/ARIMA).
I had similar problems with noise at low and high frequencies, and I managed to remove it decently, without removing good signal too, by using a universal microphone shock mount. It does a good job with a gyroscope too, especially if you find one that fits it (or you can put the sensor in a small case and then mount the case).
It basically uses elastic strings to absorb shocks and vibration.
Have you tried a simple low-pass filter on the data? I'd guess that the vibration frequency is much higher than the frequencies in normal car acceleration data. At least in normal driving. Crashes might be another story...

C/C++/Obj-C Real-time algorithm to ascertain Note (not Pitch) from Vocal Input

I want to detect not the pitch, but the pitch class of a sung note.
So, whether it is C4 or C5 is not important: they must both be detected as C.
Imagine the 12 semitones arranged on a clock face, with the needle pointing to the pitch class. That's what I'm after! Ideally, I would like to be able to tell whether the sung note is spot-on or slightly off.
This is not a duplicate of previously asked questions, as it introduces the constraints that:
the sound source is a single human voice, hopefully with negligible background interference (although I may need to deal with this)
the octave is not important, only the pitch class
EDIT -- Links:
Real time pitch detection
Using the Apple FFT and Accelerate Framework
See my answer here for getting smooth FREQUENCY detection: https://stackoverflow.com/a/11042551/1457445
As far as snapping this frequency to the nearest note goes, here is a method I created for my tuner app:
// referenceA is undefined in the original snippet; 220.0 Hz (A3 = MIDI note 57)
// is an assumption that makes the +57 offset line up (only midiNote % 12 matters here).
static const float referenceA = 220.0f;
- (int) snapFreqToMIDI: (float) frequency {
    // Adding 0.5 before the implicit int conversion rounds to the nearest note.
    int midiNote = (12 * (log10(frequency / referenceA) / log10(2)) + 57) + 0.5;
    return midiNote;
}
This will return the MIDI note value (http://www.phys.unsw.edu.au/jw/notes.html)
In order to get a string from this MIDI note value:
- (NSString*) midiToString: (int) midiNote {
    // The modulo discards the octave, leaving only the pitch class.
    NSArray *noteStrings = [[NSArray alloc] initWithObjects:@"C", @"C#", @"D", @"D#", @"E", @"F", @"F#", @"G", @"G#", @"A", @"A#", @"B", nil];
    return [noteStrings objectAtIndex:midiNote % 12];
}
For an example implementation of the pitch detection with output smoothing, look at musicianskit.com/developer.php
Pitch is a human psycho-perceptual phenomenon. Peak frequency content is not the same as either pitch or pitch class. FFT and DFT methods will not directly provide pitch, only frequency; nor will zero-crossing measurements work well for human voice sources. Try AMDF, ASDF, autocorrelation, or cepstral methods. There are also plenty of academic papers on the subject of pitch estimation.
There is another long list of pitch estimation algorithms here.
Edited addition: Apple's SpeakHere and aurioTouch sample apps (available from their iOS dev center) contain example source code for getting PCM sample blocks from the iPhone's mic.
Most of the frequency detection algorithms cited in other answers don't work well for voice. To see why this is so intuitively, consider that all the vowels in a language can be sung at one particular note. Even though all those vowels have very different frequency content, they would all have to be detected as the same note. Any note detection algorithm for voices must take this into account somehow. Furthermore, human speech and song contains many fricatives, many of which have no implicit pitch in them.
In the generic (non-voice) case, the feature you are looking for is called the chroma feature, and there is a fairly large body of work on the subject. It is equivalently known as the harmonic pitch class profile. The original reference paper on the concept is Takuya Fujishima's "Real-Time Chord Recognition of Musical Sound: A System Using Common Lisp Music". The Wikipedia entry has an overview of a more modern variant of the algorithm. There are a bunch of free papers and MATLAB implementations of chroma feature detection.
However, since you are focusing on the human voice only, and since the human voice naturally contains tons of overtones, what you are practically looking for in this specific scenario is a fundamental frequency detection algorithm, or f0 detection algorithm. There are several such algorithms explicitly tuned for voice. Also, here is a widely cited algorithm that works on multiple voices at once. You'd then check the detected frequency against the equal-tempered scale and then find the closest match.
Since I suspect that you're trying to build a pitch detector and/or corrector a la Autotune, you may want to use M. Morise's excellent WORLD implementation, which permits fast and good quality detection and modification of f0 on voice streams.
Lastly, be aware that there are only a few vocal pitch detectors that work well within the vocal fry register. Almost all of them, including WORLD, fail on vocal fry as well as very low voices. A number of papers refer to vocal fry as "creaky voice" and have developed specific algorithms to help with that type of voice input specifically.
If you are looking for the pitch class, you should have a look at the chromagram (http://labrosa.ee.columbia.edu/matlab/chroma-ansyn/).
You can also simply detect the f0 (using something like the YIN algorithm) and return the appropriate semitone; most fundamental frequency estimation algorithms suffer from octave errors anyway.
Perform a Discrete Fourier Transform on samples from your input waveform, then sum the values that correspond to equivalent notes in different octaves. Take the largest sum as the dominant pitch class.
You can likely find some existing DFT code in Objective C that suits your needs.
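A sketch of that folding idea (a crude chroma, in Python for brevity; the window, the 440 Hz reference, and the names are illustrative, and the same loop would run over vDSP FFT output in Objective-C):

import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Fold FFT magnitudes onto the 12 pitch classes and pick the strongest.
def dominant_pitch_class(x, fs, fref=440.0):
    X = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0/fs)
    chroma = np.zeros(12)
    for f, m in zip(freqs[1:], X[1:]):                   # skip the DC bin
        midi = int(round(12 * np.log2(f / fref))) + 69   # 69 = A4 = 440 Hz
        chroma[midi % 12] += m
    return NOTE_NAMES[int(np.argmax(chroma))]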
Putting up information as I find it...
Pitch detection algorithm on Wikipedia is a good place to start. It lists a few methods that fail for determining octave, which is okay for my purpose.
A good explanation of autocorrelation can be found here (why can't Wikipedia put things simply like that??).
Finally I have closure on this one, thanks to this article from DSP Dimension.
The article contains source code.
Basically, he performs an FFT, then explains that frequencies that don't coincide exactly with the centre of the bin they fall into will smear over nearby bins in a sort of bell-shaped curve, and he shows how to extract the exact frequency from this data in a second pass (the FFT being the first pass).
The article then goes further into pitch shifting; I can simply delete that code.
Note that they supply a commercial library that does the same thing (and far more), only super-optimised. There is a free version of the library that would probably do everything I need, although since I have already worked through the iOS audio subsystem, I might as well implement it myself.
For the record, I found an alternative way to extract the exact frequency, by fitting a quadratic curve over the peak bin and its two neighbours, described here. I have no idea what the relative accuracy of these two approaches is.
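That quadratic-over-three-bins trick (parabolic interpolation) is short enough to sketch; it assumes k is a local maximum of the magnitude spectrum, away from the array edges:

import numpy as np

# Refine an FFT peak by fitting a parabola through the log-magnitudes of the
# peak bin k and its two neighbours; returns the interpolated frequency in Hz.
def refine_peak_hz(mag, k, fs, n_fft):
    a, b, c = np.log(mag[k-1]), np.log(mag[k]), np.log(mag[k+1])
    p = 0.5 * (a - c) / (a - 2*b + c)        # peak offset in bins, within ±0.5
    return (k + p) * fs / n_fft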
As others have mentioned, you should use a pitch detection algorithm. Since that ground is well covered, I will address a few particulars of your question. You said that you are looking for the pitch class of the note; however, the way to find this is to calculate the frequency of the note and then use a table to convert it to the pitch class, octave, and cents. I don't know of any way to obtain the pitch class without finding the fundamental frequency.
You will need a real-time pitch detection algorithm. In evaluating algorithms, pay attention to the latency each one implies, compared with the accuracy you desire. Although some algorithms are better than others, fundamentally you must trade one for the other and cannot know both with certainty, sort of like the Heisenberg uncertainty principle. (How can you know the note is C4 when only a fraction of a cycle has been heard?)
Your "smoothing" approach is equivalent to a digital filter, which will alter the frequency characteristics of the voice. In short, it may interfere with your attempts to estimate the pitch. If you have an interest in digital audio, digital filters are fundamental and useful tools in that field, and a fascinating subject besides. It helps to have a strong math background in understanding them, but you don't necessarily need that to get the basic idea.
Also, your zero-crossing method is a basic technique for estimating the period of a waveform, and thus the pitch. It can be done this way, but only with a lot of heuristics and fine-tuning. (Essentially: develop a number of "candidate" pitches and try to infer the dominant one. A lot of special cases will emerge that will confuse this; a quick example is the letter 's'.) You'll find it much easier to begin with a frequency-domain pitch detection algorithm.
If you're a beginner, this may be very helpful. It is available for both Java and iOS.
dywapitchtrack for ios
dywapitchtrack for java