My current project is to gather data from a sports video using video processing techniques. Specifically, in sports like tennis or badminton, I want to identify the type of shot played.
So, I thought of two methods to do this:
1. Use motion detection techniques to keep only the players and the ball in the foreground, then use a filtering algorithm such as the Kalman filter to track each player, and track the motion of their hands separately. This method seems really hard and complicated; I cannot seem to track the players accurately at all.
2. Supply a collection of videos of a particular shot to a neural network, train it, and have it eventually identify those shots. But I do not know how to supply videos as input to a neural network, and I'm not clear which software is ideal for this application.
Any help would be great.
The second option looks better to me. Have a look at how images are fed to neural networks, for example in AlexNet. Feeding videos should not be too different, since a decoded video is just a sequence of image frames. You will have to decode the videos first, though.
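As a rough sketch of the "decode, then feed frames" idea: networks usually expect a fixed number of frames per clip, so one common first step is to pick evenly spaced frame indices from each decoded video. The function name `sample_frame_indices` and the clip length are my own assumptions for illustration, not part of any particular framework.

```python
def sample_frame_indices(total_frames, clip_len):
    """Pick clip_len frame indices spread evenly over a video of total_frames frames."""
    if total_frames <= 0 or clip_len <= 0:
        raise ValueError("total_frames and clip_len must be positive")
    if clip_len == 1:
        # A single-frame clip just takes the middle frame.
        return [total_frames // 2]
    # Evenly spaced positions from the first to the last frame.
    # Videos shorter than clip_len end up repeating some frames.
    step = (total_frames - 1) / (clip_len - 1)
    return [round(i * step) for i in range(clip_len)]


if __name__ == "__main__":
    # A hypothetical 10-frame video reduced to a 4-frame clip.
    print(sample_frame_indices(10, 4))
```

The frames at those indices would then be resized and stacked into one array per clip (for example clip_len x height x width x channels), which is the video analogue of the single image an image classifier takes.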
I want to save a video of the results of my AnyLogic simulation. Could you point me to a guide?
I want to send my results to people who have not installed AnyLogic. I also want to use a video of the results in my PowerPoint presentation.
THANKS
AnyLogic does not come with video recording. Either you upload the model to the AnyLogic Cloud and let your users play with it themselves,
or you install screen-recording software such as the free ScreenCast-o-Matic.
You can set up a script to speed up, slow down, jump to a time, zoom, pan, change views etc. during the run. Then you will need to use screen capture software such as Snagit to actually capture the video and save it to a file.
With the script you can create a more interesting video if you want to move around and focus on different areas without making a whole bunch of smaller videos that you have to piece together.
I was wondering if there was a way to record the sensor and video data from my iPhone, save it in some way, and then feed it into Unity to test an AR app.
I'd like to see how different algorithms behave on identical input, and that's hard to do when the only way to test is to pick up my phone and wave it around.
What you can do is capture the image buffer. I've done something similar using ARCore. I'm not sure whether ARKit has a similar implementation, but I found this when I did a brief search: https://forum.unity.com/threads/how-to-access-arframe-image-in-unity-arkit.496372/
In ARCore, you can take this image buffer and using ImageConversion.EncodeToPNG you can create PNG files with the timestamp. You can pull your sensor data in parallel. Depending on what you want, you can write it to a file using a similar approach: https://support.unity3d.com/hc/en-us/articles/115000341143-How-do-I-read-and-write-data-from-a-text-file-
After that, you can use FFmpeg to convert these PNGs into a video. If you want to try different algorithms, there's a good chance the PNGs alone will be enough. Otherwise you can use a command like the one described here: http://freesoftwaremagazine.com/articles/assembling_video_png_stream_ffmpeg/
You should be able to pass these images and the corresponding sensor data to your algorithm to check.
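A minimal sketch of the offline side, in Python rather than Unity C#. It assumes the frames were renamed to a sequential pattern such as `frame_%05d.png` (FFmpeg's image-sequence input expects numbered files, not raw timestamps) and that the sensor log is a sorted list of timestamps. The helper names are hypothetical.

```python
import bisect


def nearest_sample(sensor_times, t):
    """Index of the sensor timestamp closest to t (sensor_times sorted ascending)."""
    if not sensor_times:
        raise ValueError("sensor_times is empty")
    i = bisect.bisect_left(sensor_times, t)
    if i == 0:
        return 0
    if i == len(sensor_times):
        return len(sensor_times) - 1
    before, after = sensor_times[i - 1], sensor_times[i]
    # Pick whichever neighbour is closer in time; ties go to the earlier one.
    return i if (after - t) < (t - before) else i - 1


def build_ffmpeg_command(pattern, fps, output):
    """Argument list for turning a numbered PNG sequence into an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),  # frame rate of the input image sequence
        "-i", pattern,           # e.g. frame_%05d.png (assumed naming scheme)
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        output,
    ]


if __name__ == "__main__":
    sensor_times = [0.0, 0.1, 0.2, 0.3]
    print(nearest_sample(sensor_times, 0.12))
    print(" ".join(build_ffmpeg_command("frame_%05d.png", 30, "out.mp4")))
```

Pairing each frame with its nearest sensor reading gives you a fixed recording that every algorithm can be replayed against, which is what makes the comparison repeatable.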
I'm building an app in which I want to detect sound frequency. How can I detect the frequency of a particular sound, such as a dog's bark? Does anybody have a tutorial or some sample code?
Detecting a single frequency, or even computing a single FFT, is not a reliable method for differentiating a dog bark from other common sounds of around the same volume.
What might work is sound fingerprint analysis using MFCCs, followed by statistical pattern matching against a large enough "dog" sound database. Some pointers to the type of signal processing required might be found here: Music Recognition and Signal Processing
This is non-trivial stuff more suited for multiple college textbook chapters than any short tutorial.
To detect the frequency, you can use a pitch detection algorithm, for example one based on the FFT.
Learn more here: http://en.wikipedia.org/wiki/Pitch_detection_algorithm
You can look at this project for working source code for iOS that uses an FFT algorithm to detect frequencies:
https://github.com/hollance/SimonSings
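To illustrate what "detecting the frequency" means in practice, here is a small sketch that finds the strongest frequency bin of a signal. It uses a plain discrete Fourier transform (the same result an FFT computes, only slower) and Python's standard library, so it is not taken from the linked project. As the other answer points out, a single spectral peak like this will not by itself tell a dog bark from other sounds.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest DFT bin, ignoring the DC bin."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    best_bin, best_mag = 1, 0.0
    # Only bins up to n/2 are meaningful for a real-valued signal.
    for k in range(1, n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    # Each bin is sample_rate / n Hz wide.
    return best_bin * sample_rate / n


if __name__ == "__main__":
    rate = 8000
    # Synthetic 1000 Hz test tone, 256 samples long.
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
    print(dominant_frequency(tone, rate))
```

The resolution is limited to the bin width (sample rate divided by the number of samples), so longer windows give finer frequency estimates.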
I want to film a batter swinging at a baseball, but the bat is blurry. The video is 30 fps.
Through research I have found that deconvolution seems to be the way to minimize motion blur, but I have no idea if or how I can implement it in my iOS app post processing.
I was hoping someone could point me in the right direction like how to apply a deconvolution algorithm in iOS or what I might need to do...or if it is even possible. I imagine it takes some processing power.
Any suggestions at all are welcome...
Thanks, this is driving me crazy...
After a lot of research and talks with developers about deconvolution on iOS (thanks to Brad Larson for taking the time to give me detailed information), I am confident that it is not possible and/or not worth the time. If the hardware can handle the computations (no guarantee), it would be EXTREMELY slow and consume much of the device's battery. I have also been told it could take months to implement the algorithms...if it is possible at all.
Here is the response I received from Apple...
Deconvolution algorithms are generally difficult to implement and can be very computationally intensive. I suggest starting with a simple sharpening technique. Depending on the amount of motion blur in your video, it might just suffice.
The sharpen filters, including CISharpenLuminance and CIUnsharpMask, are now available in iOS 6, so it is moderately easy to test them out.
Core Image Filter Reference
https://developer.apple.com/library/mac/#documentation/graphicsimaging/reference/CoreImageFilterReference/Reference/reference.html
Core Image sample code from this year's WWDC session 511 "Core Image Techniques". It's called "Attempt3". This sample demonstrates best practices for applying CIFilters to live video taken by the iPhone/iPad camera. You may download the session video from the following page: https://developer.apple.com/videos/wwdc/2012/.
Just wanted to pass this information along.
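For anyone wondering what an unsharp mask actually does, here is the general idea in a few lines of Python. This is only an illustration of the technique on a small grayscale grid: it is not how CIUnsharpMask is implemented, and the 3x3 box blur and the `amount` parameter are my own simplifying assumptions.

```python
def box_blur(img):
    """3x3 box blur of a 2D list of grayscale values; edges use the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the difference between the image and its blurred copy."""
    blurred = box_blur(img)
    return [[min(255.0, max(0.0, p + amount * (p - b)))  # clamp to 0..255
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(img, blurred)]


if __name__ == "__main__":
    # A soft vertical edge: dark on the left, bright on the right.
    image = [[50, 50, 200, 200] for _ in range(3)]
    for row in unsharp_mask(image):
        print(row)
```

The contrast across the edge increases, which makes the edge look crisper, but no detail lost to the blur is recovered. That is why it is cheap compared with deconvolution and also why it only helps with mild blur.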
I hope this falls within the "programming question" category.
I'm all lightheaded from Googling (and reading every post in here) on the subject of "Computer Vision", but I'm getting more confused than enlightened.
I have 6 abstract shapes printed on a piece of paper, and I would like to have the camera on the iPhone identify these shapes (from different angles, lighting etc.).
I used OpenCV a while back (Java) and I have looked at other libraries out there. The caveat is that they seem either to rely on a jailbroken iPhone or to be so experimental and hard to use that I would probably end up spending days learning libraries only to figure out they didn't work.
I have thought of taking 1000+ images of my shapes and training a Haar filter. But again,
if there is anything out there that is a bit easier to work with, I would really appreciate the advice and suggestions of people with a bit of experience.
Thank you for any suggestions or pieces of advice you might have :)
Have a look at OpenCV's SURF feature extraction (they also have a demo which uses it to detect objects).
SURF features are salient image features that are invariant to rotation and scale. Many algorithms detect objects by extracting such features from an image and then applying a simple "bag of words" classification: comparing the set of extracted image features to the features of your "shapes". Even without taking their spatial alignment into account, you can get good detection rates if you only have 6 shapes.
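To make the comparison step concrete, here is a toy sketch in Python. It assumes the descriptors have already been extracted (real SURF descriptors are 64- or 128-dimensional vectors; 2D tuples are used here only to keep the example readable), and it uses simple nearest-neighbour voting rather than a full bag-of-words vocabulary. The function name and data are hypothetical.

```python
import math


def classify_by_votes(query_descriptors, shape_descriptors):
    """Each query descriptor votes for the shape that owns its nearest reference descriptor."""
    if not shape_descriptors:
        raise ValueError("no reference shapes given")
    votes = {name: 0 for name in shape_descriptors}
    for q in query_descriptors:
        best_name, best_dist = None, math.inf
        for name, refs in shape_descriptors.items():
            for r in refs:
                d = math.dist(q, r)  # Euclidean distance between descriptors
                if d < best_dist:
                    best_name, best_dist = name, d
        if best_name is not None:
            votes[best_name] += 1
    # The shape with the most votes wins; spatial layout is ignored entirely.
    return max(votes, key=votes.get), votes


if __name__ == "__main__":
    references = {
        "star": [(0.0, 0.0), (0.0, 1.0)],
        "ring": [(5.0, 5.0), (6.0, 5.0)],
    }
    query = [(0.1, 0.2), (5.2, 5.1), (0.0, 0.9)]
    print(classify_by_votes(query, references))
```

With only 6 shapes, the reference set stays small enough that this kind of brute-force comparison is usually fast; a distance threshold would be needed to reject frames that contain none of the shapes.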
While not a library, Chris Greening explains how iPhone Sudoku Grab does its image recognition of puzzles in his post here. He does seem to recommend OpenCV, and not just for jailbroken devices.
Also Glen Low talks a bit about how Instaviz does its shape recognition in an interview for the Mobile Orchard podcast.
I do shape recognition in my iPhone app Instaviz and the routines are actually packaged into a library I call "Recog". Only problem is that it is meant for finger or mouse gesture recognition rather than image recognition. You pass the routines a set of points representing the gesture and it tells you whether it's a square, circle etc.
I haven't yet decided on a licensing model, but I will probably use a minimal per-seat royalty.