Processing accelerometer data

I would like to know whether there are any libraries, algorithms, or techniques that help extract the user's context (walking/standing) from accelerometer data (collected from any smartphone)?
For example, I would collect accelerometer data every 5 seconds for a definite period of time and then identify the user context (ex. for the first 5 minutes, the user was walking, then the user was standing for a minute, and then he continued walking for another 3 minutes).
Thank you very much in advance :)

Check out the new activity recognition APIs:
http://developer.android.com/google/play-services/location.html

It's still a research topic; please look at this paper, which discusses the algorithm:
http://www.enggjournals.com/ijcse/doc/IJCSE12-04-05-266.pdf

I don't know of any such library.
Writing such a library is a very time-consuming task. Basically, you would build a database of the "user contexts" that you wish to recognize.
Then you collect data and compare it to the entries in that database. As for how to compare, see Store orientation to an array - and compare; the same approach holds for accelerometer data.
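To make the comparison step concrete, here is a minimal sketch of one possible approach (the window length, feature choice, and names are illustrative assumptions, not taken from the linked answer): summarize each labelled window of accelerometer magnitudes with a few statistics, then label new windows by their nearest reference.

    import numpy as np

    def features(window):
        # Summarize a window of accelerometer magnitudes with a few simple statistics.
        w = np.asarray(window, dtype=float)
        return np.array([w.mean(), w.std(), w.max() - w.min()])

    def classify(window, database):
        # database: list of (feature_vector, label) pairs built from labelled recordings.
        f = features(window)
        _, label = min((np.linalg.norm(f - ref), lab) for ref, lab in database)
        return label

Anything fancier (dynamic time warping, an SVM, a decision tree on the features) follows the same collect-then-compare structure.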

Walking/running data is analogous to heart-rate data in a lot of ways. In terms of filtering the noise and getting smooth peaks, look into noise-filtering and peak-detection algorithms. The following is used to obtain heart-rate information from heart patients and should be a good starting point: http://www.docstoc.com/docs/22491202/Pan-Tompkins-algorithm-algorithm-to-detect-QRS-complex-in-ECG
Think about how you want to filter out the noise and detect peaks; the filters will obviously depend on the raw data you gather, but it's good to have a general idea of what kind of filtering you'd want to do. Then think about what needs to be done once you have filtered data. In your case, think about how you would design an algorithm to find out when the data indicates activity (like walking or running) and when it shows the user being stationary. This is a fairly challenging problem to solve once you consider the dynamics of the device itself (how it's positioned while the user is walking/running) and the fact that there are very few (if any) benchmarked algorithms that do this with raw smartphone data.
Start with determining the appropriate algorithms, and then tackle the complexities (mentioned above) one by one.
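As a rough illustration of that pipeline (a sketch only, not a benchmarked algorithm): low-pass filter the acceleration magnitude and count peaks per 5-second window to decide whether the user was walking or standing. The sampling rate, cutoff frequency, peak height, and step-count threshold below are all assumptions you would have to tune for real data.

    import numpy as np
    from scipy.signal import butter, filtfilt, find_peaks

    def label_windows(ax, ay, az, fs=50.0):
        # ax, ay, az: numpy arrays of raw accelerometer samples at fs Hz.
        # Magnitude of acceleration with the constant (gravity) component removed.
        mag = np.sqrt(ax ** 2 + ay ** 2 + az ** 2)
        mag = mag - mag.mean()

        # Low-pass filter at ~3 Hz: suppresses jitter but keeps typical step frequencies.
        b, a = butter(2, 3.0 / (fs / 2.0), btype="low")
        smooth = filtfilt(b, a, mag)

        # Steps show up as peaks; count peaks per 5-second window to label the activity.
        peaks, _ = find_peaks(smooth, height=0.5, distance=int(0.3 * fs))
        labels = []
        win = int(5 * fs)
        for start in range(0, len(smooth), win):
            steps = np.count_nonzero((peaks >= start) & (peaks < start + win))
            labels.append("walking" if steps >= 3 else "standing")
        return labels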

Related

Topics and LL/token in Mallet change every time

Why do I get different keywords and LL/token every time I run topic models in Mallet? Is it normal?
Please help. Thank you.
Yes, this is normal and expected. Mallet implements a randomized algorithm. Finding the exact best topic model for a collection is computationally intractable, but it's much easier to find one of countless "pretty good" solutions.
As an intuition, imagine shaking a box of sand. The smaller particles will sift towards one side, and the larger particles towards the other. That's way easier than trying to sort them by hand. You won't get the exact order, but each time you'll get one of a large number of equally good approximate sortings.
If you want to have a stronger guarantee of local optimality, add --num-icm-iterations 100 to switch from sampling to choosing the single best allocation for each token, given all the others.

What to do with not enough training data?

My problem is that I don't have enough training data for my neural network. It is trying to predict the result of a soccer game given the previous games, which I would say is a regression task.
The training data consists of the results of soccer games from the last 15 seasons (about 4,500 games). Getting new data would be hard and would take a lot of time.
What should I do now?
Is it good to duplicate the data?
Should I input randomized data? (Maybe noise, but I'm not quite sure what that means.)
If there is no way of creating more data,
should I turn up the learning rate? (I have it sitting at 0.01 and the momentum at 0.9.)
I am using mini-batches of 32 training samples. Since I don't have a lot of training data, I don't have a lot of mini-batches. Should I stop using them?
To start from the beginning: this is a very theoretical question that is not directly related to programming, so in the future I would recommend posting it over at the Data Science Stack Exchange.
To go into your problem: 4,500 samples is not as bad as it sounds, depending on the exact task at hand. Are you trying to predict the match result (i.e., which team wins), or are you looking for more specific predictions (across a lot of different, specific teams)?
If you can make sure that you have a reasonable amount of data per class, you can work with fewer samples than you have. Simply duplicating the data will not help you much, since you are very likely to just overfit on the samples you are seeing, without much of an improvement; or rather, you will get the same result as training for a longer period (since essentially you see every sample twice per epoch, instead of once).
Again, what usually happens after long training periods is overfitting, so nothing gained here.
Your second suggestion is generally called data augmentation. Instead of simply copying samples, you alter them enough to make it look "different" to the network. But be careful! Data augmentation works well for some inputs, like images, since the change in input is significant enough to not represent the same sample, but still contains meaningful information about the class (a horizontally mirrored image of a cat still shows a "valid cat", unlike a vertically mirrored image, which is more unrealistic in the real world).
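For image inputs, that kind of augmentation can be as simple as a reversed array slice. A toy illustration with a dummy array (real code would operate on your actual image tensors):

    import numpy as np

    image = np.zeros((64, 64, 3))              # stand-in for a real (height, width, channels) image
    flipped_horizontally = image[:, ::-1, :]   # a mirrored cat is still a valid cat
    flipped_vertically = image[::-1, :, :]     # an upside-down cat is rarely a realistic sample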
Essentially, your input features determine where it makes sense to add noise. If you only change the results of the previous game, a minor change in input (adding or subtracting one goal at random) can significantly change the prediction you make.
If you slightly scramble ELO scores by a random number, on the other hand, the input value will not be too different, "but different enough" to use it as a novel example.
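A minimal sketch of that kind of noise-based augmentation, assuming your inputs are already collected into a feature matrix of, say, ELO-style ratings per team; the number of copies and the noise scale below are arbitrary illustrations you would tune to your feature scale.

    import numpy as np

    def augment_with_noise(X, y, copies=3, sigma=10.0):
        # X: (n_samples, n_features) feature matrix; y: targets.
        # sigma controls how strongly each extra copy is perturbed.
        rng = np.random.default_rng(seed=0)
        X_parts, y_parts = [X], [y]
        for _ in range(copies):
            X_parts.append(X + rng.normal(0.0, sigma, size=X.shape))
            y_parts.append(y)   # labels stay the same for the perturbed copies
        return np.concatenate(X_parts), np.concatenate(y_parts)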
Turning up the learning rate is not a good idea, since you are essentially just letting the network converge more towards the specific samples. On the contrary, I would argue that the current learning rate is still too high, and you should certainly not increase it.
Regarding mini batches, I think I have referenced this a million times now, but always consider smaller minibatches. From a theoretical point of view, you are more likely to converge to a local minimum.

Scala streaming peak detection with reactive events

I am trying to work out the best way to structure an application that in essence is a peak detection program. In my line of work I have been given charge of developing a system that essentially is looking at pulses in a stream of data and doing calculations on the peak data.
At the moment the software is implemented in LabVIEW. I'm sure many of you on here would understand why I'd love to see the end of that environment. I would like to redesign this in Scala (and possibly use Play if I was to make it use a web frontend) but I am not sure how best to approach the initial peak-detection component.
I've seen many tutorials for peak detection in various languages and I understand from a theoretical perspective many of the algorithms. What I am not sure is how would I approach this from the most Scala/Play idiomatic way?
Obviously I don't expect someone to write the code for me but I would really appreciate any pointers as to the direction I should take that makes the most sense. Since I cannot be too specific on the use case I'll try to give an overview of what I'm trying to do below:
Interfacing with data acquisition hardware to send out control voltages and read back "streams" of data.
I should be able to work the hardware side out, but is there a specific structure that would be best for the returned stream? I don't necessarily know ahead of time how much data I'll be reading so a stream that can be buffered and chunked would probably be appropriate.
Scan through the stream to find peaks and measure their height and trigger an event.
Peaks are usually about 20 samples wide or so but that depends on sample rate so I don't want to hard-code anything like that. I assume a sliding window would be necessary so peaks don't get "cut off" on the edge of a buffer. As a peak arrives I need to record and act on it. I think reactive streams and so on may be appropriate but I'm not sure. I will be making live graphs etc with the data so however it is done I need a way to send an event immediately on a successful detection.
The streams can be quite long and are at high sample-rates (minimum of 250ksamples per second) so I'd prefer not to have to buffer the entire stream to memory. The only information that needs to be permanent is the peak voltage data. I will need a way to visualise the raw stream for calibration purposes but I imagine that should be pretty simple.
The full application is much more complex and I'll need to do some initial filtering of noise and drift but I believe I should be able to work that out once I know what kind of implementation I should build on.
I've tried to look into Play's Iteratees and such but they are a little hard to follow. If they are an appropriate fit then I'm happy to work on learning them but since I'm not sure if that is the best way to approach the problem I'd love to know where I should look.
Reactive frameworks and the like certainly look interesting, and I can see how I could easily build the rest of the application around them, but I'm just not sure how best to implement a streaming peak-detection function on top of them beyond something simple like triggering when a value is over a threshold (as mentioned previously, a "peak" can be quite wide and the signal is noisy).
Any advice would be greatly appreciated!
This is not a solution to this question but I'm writing this as an answer because of space/formatting limitations in the comments section.
Since you are exploring options I would suggest the following:
Assuming you have a large enough buffer to keep a window of data in memory (W samples), you can calculate the peak for that buffer using your existing algorithm. Next, you collect the next few samples in a delta buffer (d), a much smaller window whose size is your increment. Assuming this is time-series data, you can create the new sliding window by removing the first d values from the buffer W and appending the d new values. This is how Spark Streaming implements the reduceByWindow function on a DStream. An Iteratee can also help here.
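A minimal sketch of that window-plus-delta scheme, written in Python only for brevity (the same structure maps onto an Akka Streams stage, an fs2 pipe, or an Iteratee); the window and delta sizes and the max-based "detector" are placeholders for your real algorithm and sample rate.

    from collections import deque

    def peak_events(samples, window=5000, delta=250, threshold=1.0):
        # Keep the last `window` samples in memory; every `delta` new samples,
        # rerun the peak detector on that window and emit an event if it fires.
        buf = deque(maxlen=window)
        new = 0
        for s in samples:              # `samples` can be any (possibly endless) iterator
            buf.append(s)
            new += 1
            if len(buf) == window and new >= delta:
                new = 0
                peak = max(buf)        # stand-in for the real peak-detection algorithm
                if peak > threshold:
                    yield peak         # downstream: update live graphs, store the peak, etc.

Because only W samples plus the delta counter live in memory, the full stream never needs to be buffered, which matches your 250 kS/s constraint.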
If your system is distributed then you can use stream processing systems (Storm, Spark-streaming) to get better latency and throughput at the cost of distributing the system.
If you are really resource-constrained and can live with approximate results with bounded error, I would suggest you look at implementing a combination of probabilistic data structures such as count-min sketch, HyperLogLog, and Bloom filters.

How can I Compare 2 Audio Files Programmatically?

I want to compare 2 audio files programmatically.
For example: I have a sound file in my iPhone app, and then I record another one. I want to check whether the existing sound matches the recorded sound or not (similar to voice recognition).
How can I accomplish this?
Have a server do the audio-fingerprinting computation, which is not really suitable for a mobile device anyway. Your mobile app then uploads the files to the server and gets the analysis result back for display. So I don't think the programming language it is implemented in matters much. The following are a few audio-fingerprinting implementations:
Java: http://www.redcode.nl/blog/2010/06/creating-shazam-in-java/
VC++: http://code.google.com/p/musicip-libofa/
C#: https://web.archive.org/web/20190128062416/https://www.codeproject.com/Articles/206507/Duplicates-detector-via-audio-fingerprinting
I know the question was asked a long time ago, but a clear answer could help someone else.
The libraries from Echoprint (website: echoprint.me/start) will help you solve the following problems:
De-duplicate a big collection
Identify (Track, Artist ...) a song on a hard drive or on a server
Run an Echoprint server with your data
Identify a song on an iOS device
PS: For more music-oriented features, you can check the list of APIs here.
If you want to implement Fingerprinting by yourself, you should read the docs listed as references here, and probably have a look at musicip-libofa on Google Code
Hope this will help ;)
Apply bandpass filter to reduce noise
Normalize for amplitude
Calculate the cross-correlation
It can be fairly CPU-intensive.
The DSP details are in the well known text:
Digital Signal Processing by
Alan V. Oppenheim and Ronald W. Schafer
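A rough sketch of those three steps with NumPy/SciPy, assuming both clips are mono arrays at the same sample rate; the band edges, filter order, and score normalization are illustrative choices, not a tuned recipe.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def similarity(a, b, fs=44100):
        # 1. Band-pass filter (roughly the voice band) to reduce noise.
        coef_b, coef_a = butter(4, [300.0 / (fs / 2), 3000.0 / (fs / 2)], btype="band")
        a = filtfilt(coef_b, coef_a, a)
        b = filtfilt(coef_b, coef_a, b)

        # 2. Normalize for amplitude so loudness differences don't dominate.
        a = (a - a.mean()) / (a.std() + 1e-12)
        b = (b - b.mean()) / (b.std() + 1e-12)

        # 3. Cross-correlate over all shifts and take the best alignment.
        corr = np.correlate(a, b, mode="full")
        return corr.max() / min(len(a), len(b))   # compare against an experimentally chosen threshold

The full cross-correlation is what makes this expensive; in practice you would correlate short samples, as the next answer describes.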
You may also try to select a few-second sample from both audio tracks, normalise them in amplitude, reduce noise with a band-pass filter, and then use a correlator.
For instance, you may take a 5-second sample of one of the two and slide it over the second one, computing a cross-correlation for each time shift. (Be careful: if you take too small a packet, you may get high correlation where it is not expected, and you will suffer side effects from the cropping of the signal in the cross-correlation.)
Afterwards you can collect an array with all the results of the cross-correlation and take the index of the maximum.
You should then experimentally set a threshold to decide when to assume the packets are the same; this will change depending on the quality of the audio tracks you are comparing.
I implemented a correlator to receive and distinguish preambles in wireless communication. My script is actually written in MATLAB; if you are interested, I can try to find the common part and send it to you.
The code would be too long to paste here in the forum. If you want it, just let me know and I will send it to you as soon as possible.
Cheers

What are the basic concepts for implementing anti-shock and anti-shake algorithms?

I have some animations happening upon fine acceleration detections. But when the user sits in a car or is walking it may get annoying.
Basically, all that stuff has to be disabled automatically as soon as there is too much vibration or shaking. Conceptually, I think it's very hard to filter those vibrations out, since the "vibration phase" changes constantly. I would define "unwanted vibration or shocks" as acceleration values that change very fast over a large range of values, or a continuously changing accumulated value that does not exceed a specified threshold range within a specified minimum period of time.
I am looking for "proven" concepts, before I start reinventing the wheel for a couple of days.
I don't have any concrete answers for you, but you might want to Google band-pass filters or anti-aliasing filters for some ideas on how to approach this. Basically, if you can identify the frequency range of accelerations that you want to consider real, you can filter out frequencies that fall outside this range.
Before you start doing too much pre-optimization, I think you should implement a low pass filter and see if that does the job. Most iPhone apps effectively use a variation of an LPF to get rid of unwanted accelerometer noise.
You could also go the other way and use a high pass filter. Once you get a certain power level passing through the HPF, stop processing data.
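As an illustration of the low-pass/high-pass idea (a sketch only; the smoothing factor and shake threshold are made-up values you would tune per axis or on the magnitude), a single-pole filter gate could look like this:

    class ShakeGate:
        # Single-pole low-pass filter; the residual (raw - low) is the high-frequency
        # shake component that should gate the animations.
        def __init__(self, alpha=0.1, shake_threshold=2.0):
            self.alpha = alpha              # smoothing factor, 0 < alpha < 1 (tune this)
            self.threshold = shake_threshold
            self.low = 0.0

        def update(self, raw):
            # The low-passed value tracks gravity and slow motion; fast changes
            # end up in `shake`.
            self.low = self.alpha * raw + (1.0 - self.alpha) * self.low
            shake = abs(raw - self.low)
            return shake < self.threshold   # False => too much vibration, pause animations

Feed it each new accelerometer reading; while update() keeps returning False, keep the fine-acceleration animations disabled.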