Recommendation in Video Analysis with Neural Network

I recently took a course on neural networks and decided to do some research work. My idea is to design a network that recognizes the movement of the lips, which is commonly known as lip-reading.
I know the theory behind neural networks and chose to design a convolutional neural network, but I am having trouble working out how to extract features from the video, or sequence of images, that will serve as input to the network.
Before committing to the full investigation, I would like some concepts or ideas on how to approach it, mainly the feature extraction part.
What I have in mind so far is the following:
A vowel or syllable lasts approximately 1 to 2 seconds in video. From that video I have to extract a sequence of images showing how the lips move. Assuming I select about 10 or 15 images, I suppose all of those images, after processing, should together be my input for extracting the features.
So far I have only analyzed single images, as in the classic "recognize a letter" example. Here, as I said, I will have a sequence of images to analyze, and that confuses me a bit.
I would like to know whether I am on the right track with this idea and, if not, I would appreciate some guidance. I hope I have been clear. Thank you very much.

This paper should help you decide how to handle the sequence of frames as input to a neural network. It looks like you can concatenate (combine) all of the frames for a particular sound into one image and feed that into your net for training and evaluation.
http://cs231n.stanford.edu/reports/2016/pdfs/217_Report.pdf
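One way to read the suggestion above is sketched here with NumPy. This is only an illustration under assumptions, not the method from the linked report: the frame size (32x48), the number of frames (10), and the helper names `stack_frames` and `tile_frames` are all hypothetical. It shows two common ways to turn a short clip into a single network input: stacking the frames as channels, or tiling them side by side into one wide image.

```python
import numpy as np

def stack_frames(frames):
    """Stack T grayscale frames of shape (H, W) into one (H, W, T) array."""
    return np.stack([np.asarray(f, dtype=np.float32) for f in frames], axis=-1)

def tile_frames(frames):
    """Concatenate T frames of shape (H, W) side by side into one (H, T*W) image."""
    return np.concatenate([np.asarray(f, dtype=np.float32) for f in frames], axis=1)

# Hypothetical clip: 10 cropped mouth images of 32x48 pixels (random stand-ins).
rng = np.random.default_rng(0)
clip = [rng.random((32, 48)) for _ in range(10)]

stacked = stack_frames(clip)  # frames become input channels
tiled = tile_frames(clip)     # frames become one wide image
print(stacked.shape, tiled.shape)
```

Stacking keeps each pixel position aligned across time, so a convolution filter sees the same lip region in every frame; tiling keeps a plain single-channel image, which may be easier to feed to an existing image classifier.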

Related

How would I find the cost of a neural network without knowing the target output like for a game?

For example,
I want to create an AI that plays tic-tac-toe, and this is how I would go about it.
I have 9 input nodes, one for each space on the board; 3 nodes in one hidden layer (which I'm guessing would somehow help the AI pick out a row or column of 3 spaces); and 9 output nodes to show where on the board the AI would place its mark.
I'm lost on how I would find the cost of this neural network, because I don't know how I would judge its prediction and adjust its weights and biases.
If I wanted the AI to play a guessing game, it would make sense, since I have the correct answer and can teach it to be more accurate based on how far off it was from the actual answer.
(NOTE: I am very new to neural networks, so there may be a simple answer that I've missed somewhere)

So, I did some digging around and found a good introduction to reinforcement learning. This is the method used to train neural networks to achieve a goal when there is no exact target, such as which move is good in a given scenario. Backpropagation against known target outputs is not the only way to train a network, but so many sources only show that method without mentioning any others, which confused me.
Going through this playlist right now: https://www.youtube.com/watch?v=2pWv7GOvuf0&index=1&list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT
Hope this will help someone getting started with neural networks!
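To make the idea concrete, here is a minimal sketch of where the training signal can come from in tic-tac-toe when no target move is known. It is not taken from the playlist above. The reward values (+1 win, -1 loss, 0 otherwise), the discount factor, and the function names are assumptions chosen for illustration, and this is only the reward part, not a full training loop.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """board is a list of 9 cells, each 'X', 'O' or ' '. Returns 'X', 'O' or None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def reward(board, player):
    """Assumed reward scheme: +1 if player has won, -1 if the opponent has, else 0."""
    w = winner(board)
    if w == player:
        return 1.0
    if w is not None:
        return -1.0
    return 0.0

def discounted_returns(n_moves, final_reward, gamma=0.9):
    """Spread the end-of-game reward back over the player's moves.

    The last move gets the full reward; earlier moves get it scaled by gamma
    once per step of distance from the end of the game.
    """
    return [final_reward * gamma ** (n_moves - 1 - i) for i in range(n_moves)]

final_board = list("XXXOO    ")  # X completed the top row
print(reward(final_board, 'X'), discounted_returns(3, 1.0, gamma=0.5))
```

The per-move values returned by `discounted_returns` can then stand in for the missing "correct answer": moves that led to a win are reinforced, and moves that led to a loss are discouraged.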

neural network for sudoku solver

I recently started learning neural networks, and I thought that creating a sudoku solver would be a nice application for NNs. I started with backpropagation neural networks, but later found out that there are dozens of kinds of neural network. At this point, I find it hard to learn all of them and then pick an appropriate one for my purpose. Hence, I am asking what would be a good choice for creating this solver. Can a backpropagation NN work here? If not, can you explain why and tell me which one can work?
Thanks!

Neural networks don't really seem to be the best way to solve sudoku, as others have already pointed out. I think a better (but still not really good or efficient) way would be to use a genetic algorithm. Genetic algorithms don't directly relate to NNs, but it's very useful to know how they work.
Better ideas (by better I mean more likely to be successful and probably better for learning something new) would include:
If you use a library:
Play around with the networks. Try to train them on different datasets, maybe random numbers, and see what you get and how you have to tune the parameters to get better results.
Try to write an image generator. I wrote a few of them and they are still my favourite projects. In one of them I used backprop to teach a NN which color belongs at each x/y coordinate of the image, and the other approach combines randomly generated images with one another (GAN/NEAT).
Try to create a movie (a series of images) of the network learning to create a picture. It will show you very well how backprop works, what parameter tuning does to the results, and how it changes the way the network arrives at the result.
If you are not using a library:
Try to solve easy problems, one after the other. Use backprop or a genetic algorithm for training (whatever you have implemented).
Try to improve your implementation: change some things that nobody else cares about and see how that changes the results.
List of 'tasks' for your network:
XOR (basically the hello world of NN)
Pole balancing problem
Simple games like Pong
More complex games like Flappy Bird, agar.io etc.
Choose more problems that you find interesting. Maybe you are into image recognition, maybe text or audio, who knows. Think of something you would like to be able to do and find a way to make your computer do it for you.
It's not advisable to use only your own NN implementation, since it will probably not work properly the first few times and you'll get frustrated. Experiment with both libraries and your own implementation.
A good way to find almost endless resources:
Use Google search and add 'filetype:pdf' at the end to show only PDF files. Search for neural network, genetic algorithm, evolutionary neural network.

Neither neural nets nor GAs are close to ideal solutions for Sudoku. I would advise looking into constraint programming (e.g. the Choco or Gecode solvers). See https://gist.github.com/marioosh/9188179 for an example. It should solve any 9x9 sudoku in a matter of milliseconds (the daily Sudokus of the newspaper "Le Monde" are created using this type of technology, BTW).
There is also Knuth's famous "Dancing Links" technique, used to implement his Algorithm X for exact cover problems such as Sudoku, which works very well: https://en.wikipedia.org/wiki/Dancing_Links
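For a feel of why search with constraint checking suits Sudoku, here is a minimal sketch in plain Python. To be clear, this is a simple backtracking search that always fills the most constrained cell first; it is not Choco, Gecode, or Dancing Links, and it makes no claim about matching their speed. The function names and the sample puzzle layout (strings with 0 for empty cells) are assumptions for illustration.

```python
def candidates(grid, r, c):
    """Digits that can legally be placed in cell (r, c) given row, column and box."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(grid):
    """Solve in place. grid is 9 lists of 9 ints, 0 = empty. Returns True if solved."""
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                opts = candidates(grid, r, c)
                # Remember the empty cell with the fewest legal digits.
                if best is None or len(opts) < len(best[2]):
                    best = (r, c, opts)
    if best is None:
        return True          # no empty cells left
    r, c, opts = best
    for d in opts:
        grid[r][c] = d
        if solve(grid):
            return True
    grid[r][c] = 0           # undo and backtrack
    return False

PUZZLE = ["530070000", "600195000", "098000060",
          "800060003", "400803001", "700020006",
          "060000280", "000419005", "000080079"]

grid = [[int(ch) for ch in row] for row in PUZZLE]
if solve(grid):
    for row in grid:
        print("".join(str(d) for d in row))
```

The rules of the puzzle are encoded directly in `candidates`, so nothing has to be learned from data, which is the main reason this family of methods tends to fit Sudoku better than a trained network.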

As was mentioned in the comments, you probably want to take a look at convolutional networks. You basically input the sudoku board as a two-dimensional 'image'. I think using a receptive field of 3x3 would be quite interesting, and I don't really think you need more than one filter.
The harder thing is normalization: the numbers 1-9 don't have an underlying relation in sudoku; you could easily replace them with A-I, for example. So they are categories, not numbers. However, one-hot encoding every cell would mean a lot of inputs (81 cells times 9 digits = 729), so I'd stick to numerical normalization (1 = 0.1, 2 = 0.2, etc.).
The output of your network should be a softmax of some kind: if you don't use softmax and instead just output an x and y coordinate, you can't ensure that the chosen square has not already been filled in.
A numerical value should be passed along with the output, to show what number the network wants to fill in.
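The encoding and output ideas above could look roughly like this in NumPy. This is a sketch under assumptions, not a tested network design: empty cells are assumed to be stored as 0, the softmax is assumed to run over all 81 squares, and masking out filled squares is one possible way to guarantee the chosen square is still empty. The function names are hypothetical, and `masked_softmax` assumes at least one cell is empty.

```python
import numpy as np

def encode_board(board):
    """Numerical normalization: digits 1-9 become 0.1-0.9, empty cells (0) stay 0.0."""
    return np.asarray(board, dtype=np.float32) / 10.0

def masked_softmax(logits, board):
    """Softmax over the 81 squares that assigns zero probability to filled squares."""
    logits = np.asarray(logits, dtype=np.float64).reshape(-1)
    empty = np.asarray(board).reshape(-1) == 0
    masked = np.where(empty, logits, -np.inf)   # filled squares can never win
    masked = masked - masked[empty].max()       # subtract max for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()

board = np.zeros((9, 9), dtype=int)
board[0, 0] = 5                      # one given digit, everything else empty
inputs = encode_board(board)         # what the network would see
logits = np.zeros(81)                # stand-in for raw network outputs
probs = masked_softmax(logits, board)
print(inputs[0, 0], probs[0], probs[1])
```

With this layout, the square to fill is simply the index with the highest probability, and the digit to place there would come from the separate numerical output mentioned above.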

As PLEXATIC mentioned, neural nets aren't really well suited to this kind of task. A genetic algorithm does sound good.
However, if you still want to stick with neural nets, you could have a look at https://github.com/Kyubyong/sudoku. As Thomas W answered, 3x3 looks nice.
If you don't want to deal with CNNs, you could find some answers here as well: https://www.kaggle.com/dithyrambe/neural-nets-as-sudoku-solvers

Convolutional Neural Network for time-dependent features

I need to do dimensionality reduction on a series of images. More specifically, each image is a snapshot of a moving ball, and the optimal features would be its position and velocity. As far as I know, CNNs are the state of the art for reducing features in image classification, but in that case only a single frame is provided. Is it also possible to extract time-dependent features given many images at different time steps? If not, what are the state-of-the-art techniques for doing so?
This is the first time I am using CNNs, and I would also appreciate any references or other suggestions.

If you want to be able to have the network somehow recognize a progression which is time dependent, you should probably look into recurrent neural nets (RNN). Since you would be operating on video, you should look into recurrent convolutional neural nets (RCNN) such as in: http://jmlr.org/proceedings/papers/v32/pinheiro14.pdf
Recurrence adds some memory of a previous state of the input data. See this good explanation by Karpathy: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
In your case you need to have the recurrence across multiple images instead of just within one image. It would seem like the first problem you need to solve is the image segmentation problem (being able to pick the ball out of the rest of the image) and the first paper linked above deals with segmentation. (then again, maybe you're trying to take advantage of the movement in order to identify the moving object?)
Here's another thought: perhaps you could only look at differences between sequential frames and use that as your input data to your convnet? The input "image" would then show where the moving object was in the previous frame and where it is in the current one. Larger differences would indicate larger amounts of movement. That would probably have a similar effect to using a recurrent network.
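The frame-difference idea can be sketched in a few lines of NumPy. The toy clip below (a single bright pixel standing in for the ball, moving one pixel to the right per frame) and the function name are assumptions for illustration only.

```python
import numpy as np

def frame_differences(frames):
    """frames has shape (T, H, W). Returns the (T-1, H, W) differences between consecutive frames."""
    frames = np.asarray(frames, dtype=np.float32)
    return np.diff(frames, axis=0)

# Toy clip: 3 frames of 5x5 pixels, "ball" at row 2 moving right by one pixel per frame.
clip = np.zeros((3, 5, 5), dtype=np.float32)
for t in range(3):
    clip[t, 2, t + 1] = 1.0

diffs = frame_differences(clip)
print(diffs[0, 2])  # -1 where the ball was, +1 where it is now
```

In each difference image the negative spot marks the old position and the positive spot marks the new one, so a single difference frame already carries both position and direction of motion, and the distance between the two spots reflects the speed.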

What types of problems can neural networks solve? (excluding optical character recognition)

Neural networks seem really cool but what types of problems can they solve?

They are good at recognizing patterns, not only characters. So they can tell you whether a picture shows a man or a woman, or whether it shows person XY. Note that they never give an exact answer, only probabilities. They also work on signal patterns such as audio signals: for example, does sound x sound like sound y, and so on.
They cannot tell you the answer to formula xy, because small changes in the formula can produce a completely different answer.

Neural networks can deal with a large number of different problems. For example, in our case, we have used them to successfully reproduce stresses, forces and eigenvalues in loaded parts (for example in finite elements analysis problems). There is a whole world out there in engineering and computer science where they can be applied successfully.

I'd say any problem where an AI needs to learn from previous experience, e.g. Pandora or Last.fm, where the application learns what music the listener wants to hear by having the listener vote suggestions up or down.

Training for pattern recognition (neural network)

How do you train a neural network for pattern recognition? For example, for face recognition in a picture, how would you define the output neurons (e.g. how to detect exactly where the face is, rather than just saying that there is a face in the camera image)? Also, how about detecting multiple faces and faces of different sizes?
If anyone could give me a pointer, that would be really great.
Cheers!

Generally speaking I would split the problem into multiple stages e.g.
1 - Is there a face in the picture?
2 - Where is the face in the picture?
3 - Is the face in the picture one that the NN (Neural network) recognises?
In each instance I would suggest you build a separate NN and train it to answer the questions posed.
As for the structure of the NN, that's a bit trickier to answer, as it depends on your input data and desired output. For example, if you had a 100x100 px image, then I suppose it's feasible to have 10,000 inputs. You might want to consider doing some preprocessing beforehand, say to detect ovals; that way you could check whether there are a number of ovals in a predictable arrangement (one for the face, two for the eyes, and possibly one for the mouth). If you are preprocessing the data, then you might have inputs for each oval.
Now for the output... for question one you could just have one output to say how sure the NN is that there is a face in the input data, i.e. a value from 0.0 (definitely no face) to 1.0 (definitely a face). This way you can move on to stages 2 and 3.
I should say at this point that this is a non-trivial problem, and you might be better off having a look at some of the frameworks available, e.g. OpenCV.
Now for the training part: you need a stockpile of images available to train the NN. There are a number of ways in which you could train it. One potential solution is a technique called backpropagation. In general terms, you run the NN on an image and compare its output to a predetermined target. If it's wrong, tweak the NN to move toward the desired output, and repeat.
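The compare-and-tweak loop can be illustrated with the smallest possible case: a single sigmoid output unit that reports a value between 0.0 and 1.0. This is a deliberately simplified sketch, not a face detector and not full backpropagation through hidden layers. The four 4-pixel "images", their labels, the learning rate, and the function names are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, X, y):
    """Mean cross-entropy between the unit's outputs and the target labels."""
    p = sigmoid(X @ w + b)
    eps = 1e-12  # avoids log(0)
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def train_step(w, b, X, y, lr=0.5):
    """Run the unit, compare with the targets, and nudge the weights to reduce the error."""
    err = sigmoid(X @ w + b) - y          # how wrong each output is
    w = w - lr * (X.T @ err) / len(y)     # gradient step on the weights
    b = b - lr * err.mean()               # gradient step on the bias
    return w, b

# Made-up data: bright 4-pixel "images" are labelled 1.0, dark ones 0.0.
X = np.array([[0.9, 0.8, 0.9, 0.7],
              [0.8, 0.9, 0.7, 0.9],
              [0.1, 0.2, 0.1, 0.0],
              [0.2, 0.1, 0.0, 0.1]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w, b = np.zeros(4), 0.0
before = loss(w, b, X, y)
for _ in range(200):
    w, b = train_step(w, b, X, y)
after = loss(w, b, X, y)
print(before, after)
```

A real network repeats the same idea layer by layer, passing the error backwards through the hidden layers, which is where the name backpropagation comes from.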
If you want a good book on AI, then I would highly recommend Artificial Intelligence: A Modern Approach by Russell and Norvig. I'm sure that there are more appropriate computer vision textbooks, but the Russell & Norvig book is an excellent starter.

Dear GantengX, you should prepare yourself for the fact that the answer is large, complex and hard to understand. There are many approaches to pattern and face recognition, and implementing a real-life face recognition system is a huge amount of work that one person can never handle. Prepare yourself for at least 10 years behind books on mathematics and artificial intelligence, and that is before hiring 5 highly paid developers at the end who will understand what you want them to do. Then maybe you will end up with your own face recognition system. There are also dozens of other issues that will come up along the way. So be ready for a life full of stress and problems.

I'm sorry for stating the obvious, but your question was not specific; a complete answer would touch many different scientific fields and would fill a book of over 1k pages.
Regarding your question (the short answer).
There are several principal parts that each face recognition app consists of:
1. Artificial intelligence algorithm
2. Optimization algorithm (for AI optimization)
3. Different filtration algorithms
4. Effective data set development
Items 1. and 2. are the central part of each system; they do the actual work. Any other preprocessing just makes the input data less complex, making it easier for your AI to reach a decision. Don't start 3. and 4. until you have your first results.
P.S.
Using existing solutions is more cost-effective, but if you are studying these things then don't lose time like I did, and start your dissertation right away.
