I am creating a Flutter app to detect hand gestures and convert them into text for deaf and mute people - Flutter

Can anybody provide me with a roadmap for how to start the project, where I can get a dataset, and how to convert hand gestures to text? Or is there any SDK or API for this? It would be great if you could provide a Git link.

Research hand gesture recognition: Before starting any project, it's important to research the topic you are interested in. Learn about the types of hand gestures commonly used by deaf and mute people, and the techniques used for hand gesture recognition. You can start with some introductory material on machine-learning-based computer vision, image processing, and object recognition, and then dive into more specific literature on hand gesture recognition techniques.
Collect a dataset: To train your hand gesture recognition model, you'll need a dataset of labeled images of hands making different gestures. There are a few options. You can collect the images yourself, but that is time-consuming and takes a lot of effort. Alternatively, you can look for existing datasets, such as the American Sign Language Hand Gesture dataset, which has over 8,000 images of hand gestures, or the ChaLearn Gesture Recognition dataset, which has over 20,000 videos of hand gestures.
Preprocess the data: Once you have the dataset, you'll need to preprocess the images to prepare them for training. This can include resizing the images to a uniform size, converting them to grayscale, and normalizing the pixel values.
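As a minimal sketch of this preprocessing step (assuming a Python training environment with OpenCV and NumPy, and a per-gesture folder layout, which is purely an assumption about how your dataset is organized):

```python
import cv2
import numpy as np
from pathlib import Path

def preprocess_image(path, size=(64, 64)):
    """Load one image, resize it to a uniform size, convert to grayscale, normalize to [0, 1]."""
    img = cv2.imread(str(path))                    # BGR uint8
    img = cv2.resize(img, size)                    # uniform size
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # single channel
    return img.astype(np.float32) / 255.0          # normalized pixel values

def load_dataset(root="dataset"):
    """Assumes a layout like dataset/<gesture_label>/<image>.jpg (hypothetical)."""
    images, labels = [], []
    for class_dir in sorted(Path(root).iterdir()):
        for img_path in class_dir.glob("*.jpg"):
            images.append(preprocess_image(img_path))
            labels.append(class_dir.name)
    return np.array(images), np.array(labels)
```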
Train the model: With the preprocessed dataset, you can train a machine learning model to recognize hand gestures. There are several machine learning algorithms that can be used for this, including convolutional neural networks (CNNs) and support vector machines (SVMs).
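A minimal training sketch for the CNN option with TensorFlow/Keras, continuing from the preprocessing sketch above; the architecture, epoch count, and file names are arbitrary choices, not requirements:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# `load_dataset` is the helper from the preprocessing sketch above (an assumption).
images, labels = load_dataset("dataset")
class_names = sorted(set(labels))
x_train = images[..., np.newaxis]                       # shape (N, 64, 64, 1)
y_train = np.array([class_names.index(l) for l in labels])

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(len(class_names), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_split=0.1, epochs=10, batch_size=32)
model.save("gesture_model.keras")
```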
Convert hand gestures to text: Once the model has been trained, you can use it to recognize hand gestures in real-time and convert them to text. One approach is to use a sign language recognition API, such as Sign Language Interpreter API by Microsoft Azure or Sign Language Detection API by IBM Watson. Alternatively, you can write your own code to convert the output of your model into text using techniques such as sequence labeling.
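For the write-your-own route, the simplest case is one static gesture mapped to one letter (an assumption; continuous signing would need proper sequence models). A per-frame argmax with light smoothing is enough for a first prototype:

```python
import numpy as np

def frame_to_char(model, class_names, frame, threshold=0.8):
    """Predict the character for one preprocessed 64x64 frame, or None if the model is unsure."""
    probs = model.predict(frame[np.newaxis, ..., np.newaxis], verbose=0)[0]
    idx = int(np.argmax(probs))
    return class_names[idx] if probs[idx] >= threshold else None

def frames_to_text(model, class_names, frames):
    """Collapse repeated per-frame predictions into text (very naive smoothing)."""
    text, last = [], None
    for frame in frames:
        ch = frame_to_char(model, class_names, frame)
        if ch is not None and ch != last:
            text.append(ch)
        last = ch
    return "".join(text)
```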
Build a user interface: Finally, you can build a user interface for your app that captures the hand gestures in real-time using the device's camera and displays the converted text. You can use Flutter's camera plugin to access the device's camera and display the captured video stream in your app.
As for resources and links, here are some to get you started:
American Sign Language Hand Gesture dataset: https://www.kaggle.com/ahmedkhanak1995/asl-gesture-images-dataset
ChaLearn Gesture Recognition dataset: http://chalearnlap.cvc.uab.es/dataset/24/description/
TensorFlow Lite (a model-export sketch follows this list): https://www.tensorflow.org/lite
Sign Language Interpreter API by Microsoft Azure: https://azure.microsoft.com/en-us/services/cognitive-services/sign-language-interpreter/
Sign Language Detection API by IBM Watson: https://www.ibm.com/cloud/watson-studio/autoai/sign-language-detector
Flutter camera plugin: https://pub.dev/packages/camera
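To run the trained model on-device from the Flutter app, one common route is exporting it to TensorFlow Lite (linked above) and loading the .tflite file from Dart with one of the community TFLite plugins. A minimal export sketch, assuming the Keras model saved in the training step:

```python
import tensorflow as tf

# Load the Keras model saved after training and convert it to TensorFlow Lite.
model = tf.keras.models.load_model("gesture_model.keras")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open("gesture_model.tflite", "wb") as f:
    f.write(tflite_model)
```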

Related

Face Recognition by using IBM Watson Visual Recognition

I am currently evaluating the capabilities of the IBM Watson Visual Recognition service to recognize faces, so that the system can identify each person we have trained it on. Individuals may appear in different clothes, with other possible variations, but the system should identify each individual by looking at their face.
As per IBM, Visual Recognition does not support face recognition, only face detection:
Face Recognition: Visual Recognition is capable of face detection (detecting the presence of faces), not face recognition (identifying individuals).
Can we use custom classifiers by adding different types of images for each individual?
What significant pre/post-processing work is required from the developer to get at least 90% accuracy?
Matt Hill posted a great reply to this similar question on dW Answers. Here's what he had to say:
It is possible to train a custom classifier to try to identify people's faces. It might help to use the face detection service as a preprocessor to give you bounding boxes around faces, and use them to crop the images submitted for custom classification. However, the VR custom learning engine is not optimized for face identification, and I would not expect the results to be as accurate as a system that is designed specifically for face recognition.
The issue is that human faces are typically very similar to each other with respect to the wide set of features that were trained in learning the basis of the system, which needed very broad exposure to many types of scenes and objects.
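As an illustration of the cropping idea only, here is a sketch that uses OpenCV's bundled Haar cascade as a stand-in for the Watson face detection service (that substitution is purely my assumption for the example):

```python
import cv2
from pathlib import Path

# Haar cascade shipped with OpenCV; a stand-in for any face *detection* service.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_faces(image_path, out_dir="cropped"):
    """Detect faces, crop them, and save the crops for training a custom classifier."""
    img = cv2.imread(str(image_path))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    Path(out_dir).mkdir(exist_ok=True)
    for i, (x, y, w, h) in enumerate(faces):
        crop = img[y:y + h, x:x + w]
        cv2.imwrite(f"{out_dir}/{Path(image_path).stem}_{i}.jpg", crop)
```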

crowd tracking with opencv

I am working on a crowd-controlled sound system for a music festival. The music would be controlled by individuals and by the crowd as a whole, roughly 500 people.
While searching for crowd tracking techniques, I stumbled upon this one: http://www.mikelrodriguez.com/crowd-analysis/#density; Matlab code and a dataset are enclosed. Are you aware of similar techniques, maybe simpler ones, based e.g. on blob detection? Do you have an idea of how well this one would perform in a real-time scenario? Is there a known way to do this with e.g. OpenCV?
One of my former colleagues implemented something similar (controlling a few motors according to crowd movement) using optical flow. You can analyze the frames of video from a camera, calculate optical flow between frames, and use the values to estimate the crowd movement.
OpenCV has support for all of the above, and comes with good code samples. A desktop should be able to do this in real time (you might have to tweak the image resolution).
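A minimal sketch of that optical-flow idea, written in Python for brevity (the same calls exist in the C++ API); the camera index and the way the flow is reduced to a single "crowd movement" value are assumptions:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # assumption: first attached camera
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense optical flow between consecutive frames (Farneback's algorithm).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Crude crowd-motion estimates: overall activity and dominant direction.
    activity = float(np.mean(magnitude))
    direction = float(np.mean(angle))
    print(f"activity={activity:.2f} direction={direction:.2f}")

    prev_gray = gray
```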
I am not exactly sure how to interface between a C++ program and a sound system. Pure Data (PD) is an alternative, but it might not have much support for motion analysis.

OpenCV IOS real-time template matching

I'd like to create an app (on iPhone) which does this:
I have a template image (a logo or any object) and I'd like to find it in the camera view, overlay a layer on the place where it is found, and track it!
It is markerless AR with OpenCV!
I have read some docs, books, and Q&As here, but sadly haven't found what I need;
actually I'd like to create something like this or something like this.
If anyone can send me some source code or a really useful step-by-step tutorial, I'd be really happy!
Thank you!
Implementing this is not trivial - it involves Augmented Reality combined with template matching and 3D rendering.
A rough outline:
Use some sort of stable feature extraction to obtain features from the input video stream (e.g. see FAST in OpenCV).
Combine these features and back-project to estimate the camera parameters and pose. (See Camera Calibration for a discussion, but note that these usually require calibration pattern such as a checkerboard.)
Use template matching to scan the image for patches of your target image, then use the features and camera parameters to determine the pose of the object.
Apply the camera and object transforms forward and render the replacement image into the scene.
Implementing all this will require much research and hard work!
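To make the first and third steps of the outline concrete, here is a minimal sketch in Python with OpenCV (the iOS app would use the same calls from the C++/Objective-C API; the file names and the confidence threshold are placeholders):

```python
import cv2

frame = cv2.imread("camera_frame.jpg")          # placeholder for a video frame
template = cv2.imread("logo_template.jpg")      # the target/template image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)

# Step 1: stable feature extraction with FAST.
fast = cv2.FastFeatureDetector_create(threshold=25)
keypoints = fast.detect(gray, None)

# Step 3: template matching to locate the target patch in the frame.
result = cv2.matchTemplate(gray, template_gray, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
h, w = template_gray.shape[:2]
if max_val > 0.7:  # assumed confidence threshold
    cv2.rectangle(frame, max_loc, (max_loc[0] + w, max_loc[1] + h), (0, 255, 0), 2)
```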
There are a few articles on the web you might find useful:
Simple Augmented Reality for OpenCV
A minimal library for Augmented Reality
AR with NyARToolkit
You might like to investigate some of the AR libraries and frameworks available. Wikipedia has a good list:
AR Software
Notable is Qualcomm's toolkit, which is not FLOSS but appears highly capable.

Does a free API for an augmented reality service exist?

Currently I am trying to create an iPhone app which is capable of recognizing objects in an image, such as a car, bus, building, bridge, or human, and labeling them with the object name with the help of the Internet.
Is there any free service which provides a solution to my problem? Object recognition is itself a complex task, requiring digital image processing, neural networks, and so on.
Can this be done via an API?
If you want to recognise planar images the current generation of mobile AR SDKs from Metaio, Qualcomm and Layar will allow you to upload images to match against, and perform the matching.
If you want to match freely against a set of 3D objects, e.g. a Toyota Prius or the Empire State Building, the same techniques might be applied to match against sets of images taken at different rotations. However, you might have to choose to match just one object, due to limits on how large an image database the service allows, or contact those companies for a custom solution, and it may not work very reliably, given that the state of the art is matching against planar images.
If you want to recognize general classes (human, car, building), this is a very difficult problem, and I don't know of any solutions anywhere fast enough to operate online (which I assume is a requirement given you want an AR solution - is that a fair assumption?). It's been a few years since I studied CV, but at that time the most promising solution for visual classification was "bag of visual words" approaches - you might try reading up on those.
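For reference, a minimal bag-of-visual-words feature-extraction sketch with OpenCV in Python; the image paths and vocabulary size are placeholders, and a classifier such as an SVM would still have to be trained on the resulting histograms:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
bow_trainer = cv2.BOWKMeansTrainer(100)  # 100 visual words (arbitrary choice)

# Build the vocabulary from local descriptors of the training images.
for path in ["car_01.jpg", "bus_01.jpg", "building_01.jpg"]:  # placeholder paths
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, descriptors = sift.detectAndCompute(img, None)
    if descriptors is not None:
        bow_trainer.add(np.float32(descriptors))
vocabulary = bow_trainer.cluster()

# Represent any image as a histogram over the visual words.
extractor = cv2.BOWImgDescriptorExtractor(sift, cv2.BFMatcher(cv2.NORM_L2))
extractor.setVocabulary(vocabulary)
query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)
histogram = extractor.compute(query, sift.detect(query, None))
# `histogram` can now be fed to a per-class classifier (e.g. an SVM).
```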
Take a look at Cortexica. Very useful for this sort of thing.
http://www.cortexica.com/
I haven't done work with mobile AR in a while, but the last time I was working on this stuff I was using Layar and starting to investigate Junaio. Those are oriented toward 3D graphics, not simply text labels, so for your use case you may be better served with OpenCV.
Note that Layar (and I believe Junaio too) works like a web app, where you put the content on your own server and give Layar the URL to link to.

Is there a library that can do raster to vector conversion, for the iPhone?

I am trying to take an image and extract hand written text so that it can be read easily and zoomed in on. I would like to convert the text to vector paths.
I am not aware of any libraries that would make this as painless as possible. Any help is greatly appreciated. Examples are nice too :)
Simple iPhone Image Processing (on Google code) contains all the primitive tools you will need:
Canny edge detection
Histogram equalisation
Skeletonisation
Thresholding (adaptive and global)
Gaussian blur (used as a preprocessing step for Canny edge detection)
Brightness normalisation
Connected region extraction
Resizing (uses interpolation)
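Equivalent primitives exist in OpenCV as well. Purely as an illustration of the raster-to-vector idea (not the library above), a minimal Python sketch that thresholds the scan, extracts contours, and writes them out as SVG paths; the file names and threshold parameters are assumptions:

```python
import cv2

img = cv2.imread("handwriting.jpg", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Adaptive threshold separates ink from paper under uneven lighting.
binary = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY_INV, 11, 2)

# Each connected stroke becomes a contour; convert contours to SVG path data.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
paths = []
for contour in contours:
    pts = contour.reshape(-1, 2)
    paths.append("M " + " L ".join(f"{x},{y}" for x, y in pts) + " Z")

with open("handwriting.svg", "w") as f:
    f.write('<svg xmlns="http://www.w3.org/2000/svg">')
    f.write("".join(f'<path d="{p}" fill="none" stroke="black"/>' for p in paths))
    f.write("</svg>")
```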
The only program I know of for the iPhone that does handwriting recognition is HWPEN. Unfortunately, it's not a library but a full application and (to make matters worse) it requires a Jailbroken phone.
I fear you must either try to get the source for HWPEN or reverse engineer it to obtain the code you need.
Barring that, you may want to write your own. There are several studies on handwriting recognition that may help.