Neural network to figure out the meaning of user input - neural-network

I'm trying to make a neural network try to figure out the meaning of input(keyboard keys in this case) according to the user.
I have multiple possible output "commands" that the NN can interpret the inputs to mean, and at each state certain outputs can count as beneficial while others are a detriment
When the NN starts up for the first time, no input should have any particular meaning to it but as time goes on I want the NN to be able to figure out what the user most likely meant.
I've tried a Multilayer perceptron NN that has as many input nodes as there are physical inputs and as many output nodes as there are commands and a number of nodes equal to the sum of the other two layers as a single hidden layer, in this case it is then a 5,15,10.
The NN assumes that the user will only make moves that are in the NN's best interest.
So far it seems the NN is just figuring out what is the command it can take that will most likely result in a beneficial move, regardless of the input key rather than trying to figure out what key should result in what move according to the user.
Because of this I'm wondering (most likely wrong) if I should produce a separate NN for each input to try and figure out the current output according to the user.
Is there a different type of NN I should look into that will work better, and is there a recommended configuration for this problem?
I'll be happy with some recommendations of reading material that would help in this particular problem.
I'm at best an amateur in NN and would like to learn a lot more about the whole field, But I'm trying to focus my efforts on this problem for now.

Accordng to me you want the output to be according to behaviour of the player as number of inputs are more than in actual case. So according to me there should be some type of memory for the actions taken by the player in order to find the patterns.This can be done using Long Short term Memory.

Related

Implementating spell drawing/casting mechanism in Luau (Roblox)

I am coding a spell-casting system where you draw a symbol with your wand (mouse), and it can recognize said symbol.
There are two methods I believe might work; neural networking and an "invisible grid system"
The problem with the neural networking system is that It would be (likely) suboptimal in Roblox Luau, and not be able to match the performance nor speed I wish for. (Although, I may just be lacking in neural networking knowledge. Please let me know whether I should continue to try implementing it this way)
For the invisible grid system, I thought of converting the drawing into 1s and 0s (1 = drawn, 0 = blank), then seeing if it is similar to one of the symbols. I create the symbols by making a dictionary like:
local Symbol = { -- "Answer Key" shape, looks like a tilted square
00100,
01010,
10001,
01010,
00100,
}
The problem is that user error will likely cause it to be inaccurate, like this "spell"'s blue boxes, showing user error/inaccuracy. I'm also sure that if I have multiple Symbols, comparing every value in every symbol will surely not be quick.
Do you know an algorithm that could help me do this? Or just some alternative way of doing this I am missing? Thank you for reading my post.
I'm sorry if the format on this is incorrect, this is my first stack-overflow post. I will gladly delete this post if it doesn't abide to one of the rules. ( Let me know if there are any tags I should add )
One possible approach to solving this problem is to use a template matching algorithm. In this approach, you would create a "template" for each symbol that you want to recognize, which would be a grid of 1s and 0s similar to what you described in your question. Then, when the user draws a symbol, you would convert their drawing into a grid of 1s and 0s in the same way.
Next, you would compare the user's drawing to each of the templates using a similarity metric, such as the sum of absolute differences (SAD) or normalized cross-correlation (NCC). The template with the lowest SAD or highest NCC value would be considered the "best match" for the user's drawing, and therefore the recognized symbol.
There are a few advantages to using this approach:
It is relatively simple to implement, compared to a neural network.
It is fast, since you only need to compare the user's drawing to a small number of templates.
It can tolerate some user error, since the templates can be designed to be tolerant of slight variations in the user's drawing.
There are also some potential disadvantages to consider:
It may not be as accurate as a neural network, especially for complex or highly variable symbols.
The templates must be carefully designed to be representative of the expected variations in the user's drawings, which can be time-consuming.
Overall, whether this approach is suitable for your use case will depend on the specific requirements of your spell-casting system, including the number and complexity of the symbols you want to recognize, the accuracy and speed you need, and the resources (e.g. time, compute power) that are available to you.

How to build 2-layer fuzzy inference system

I am implementing a fuzzy logic based decision support system that uses nine variables but group each three together to form an output then take these three output to make the final output of the system.
I am using fuzzy logic toolbox in matlab, I made each one of these three outputs but I can't figure out how I can make these outputs as inputs again for the final output.
The system is shown in this picture:
system picture
Correct me if I'm wrong, but I think you would need two FIS separately , where the inputs from the second one will be the outputs from the first one.
Sorry If I didn't help much :/

Neural Network playing Tic Tac Toe doesn't learn

I have a neural network playing tic-tac-toe. (I know there are other better methods for this, but I want to learn about NN)
So the NN plays against a random AI. First, it should learn to make an allowed move, ie. not choosing a field that is already occupied.
It doesn't get very far with this, however.
When NN chooses an illegal move I optimize the weights such that the distance to another, randomly chosen (legal) field is minimized. (There is one output which should have values between 1 and 9).
My problem is: in changing the weights, a formerly optimized outcome is now also changed. So I have this kind of overfitting: Everytime I backpropagade to optimize the weights for one particular situation, the decision for every other situation becomes worse!
I know I should probably have 9 output neurons instead of 1 and should probably not use a random field as the target, as I assume this can mess things up. I am starting to change this.
Still, the issue seems to remain. Obviously. How can I improve the decision in one situation without forgetting every other situation?
One solution I came up with is to "remember" every game played and optimizing simultaneously over all games played.
However, after a while this becomes very demanding on the computation. Also, it seems to go into the direction of a complete enumartion of all possible board situations. This might be possible for Tic Tac Toe but if I move to another game, say Go, this becomes infeasible.
Where is my mistake? How do I generally tackle this problem? Or where could I read about it? Thanks a lot!
To tackle this problem efficiently, you sould consider Reinforcement Learning methods, instead of what you are currently doing. What your are trying to do is to learn the behaviour of an agent playing Tic Tac Toe. The agent gets a high reward when he wins a game, a high penalty when he loses and an even higher penalty when he performs an illegal move. My guess is that using methods such as Q-learning with neural networks will work perfectly, even with very simple neural nets. One useful paper on the topic could be: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf, or earlier papers on TD-Gammon (I think you can easily find tutorials on the topic using the keywords TD-Gammon, Q-learning, ...).
By the way, a more down-to-earth answer to why your model might not work is that you are seemingly using one single unit to represent categorical outputs: if you want to represent an integer between 1 and N, you should represent it using N output neurons with values between 0 and 1, and pick the neuron with the highest value as your answer. Using a single neuron with value between 1 and 9 creates an unatural assymetry between your outputs, and, for example, when the expected value is 3, your network gets a higher error for outputing a 9 than a 2. This should obviously not be the case: all wrong answers are equally wrong.
Hope this helps,
Best

Newbie to Neural Networks

Just starting to play around with Neural Networks for fun after playing with some basic linear regression. I am an English teacher so don't have a math background and trying to read a book on this stuff is way over my head. I thought this would be a better avenue to get some basic questions answered (even though I suspect there is no easy answer). Just looking for some general guidance put in layman's terms. I am using a trial version of an Excel Add-In called NEURO XL. I apologize if these questions are too "elementary."
My first project is related to predicting a student's Verbal score on the SAT based on a number of test scores, GPA, practice exam scores, etc. as well as some qualitative data (gender: M=1, F=0; took SAT prep class: Y=1, N=0; plays varsity sports: Y=1, N=0).
In total, I have 21 variables that I would like to feed into the network, with the output being the actual score (200-800).
I have 9000 records of data spanning many years/students. Here are my questions:
How many records of the 9000 should I use to train the network?
1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
If I split the data into an even number, say 9x1000 (or however many) and created a network for each one, then tested the results of each of these 9 on the other 8 sets to see which had the lowest MSE across the samples, would this be a valid way to "choose" the best network if I wanted to predict the scores for my incoming students (not included in this data at all)?
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
E.g.
750-800 = 10
700-740 = 9
etc.
Is there any benefit to doing this or should I just go ahead and try to predict the exact score?
What if ALL I cared about was whether or not the score was above or below 600. Would I then just make the output 0(below 600) or 1(above 600)?
5a. I read somewhere that it's not good to use 0 and 1, but instead 0.1 and 0.9 - why is that?
5b. What about -1(below 600), 0(exactly 600), 1(above 600), would this work?
5c. Would the network always output -1, 0, 1 - or would it output fractions that I would then have to roundup or rounddown to finalize the prediction?
Once I have found the "best" network from Question #3, would I then play around with the different parameters (number of epochs, number of neurons in hidden layer, momentum, learning rate, etc.) to optimize this further?
6a. What about the Activation Function? Will Log-sigmoid do the trick or should I try the other options my software has as well (threshold, hyperbolic tangent, zero-based log-sigmoid).
6b. What is the difference between log-sigmoid and zero-based log-sigmoid?
Thanks!
First a little bit of meta content about the question itself (and not about the answers to your questions).
I have to laugh a little that you say 'I apologize if these questions are too "elementary."' and then proceed to ask the single most thorough and well thought out question I've seen as someone's first post on SO.
I wouldn't be too worried that you'll have people looking down their noses at you for asking this stuff.
This is a pretty big question in terms of the depth and range of knowledge required, especially the statistical knowledge needed and familiarity with Neural Networks.
You may want to try breaking this up into several questions distributed across the different StackExchange sites.
Off the top of my head, some of it definitely belongs on the statistics StackExchange, Cross Validated: https://stats.stackexchange.com/
You might also want to try out https://datascience.stackexchange.com/ , a beta site specifically targeting machine learning and related areas.
That said, there is some of this that I think I can help to answer.
Anything I haven't answered is something I don't feel qualified to help you with.
Question 1
How many records of the 9000 should I use to train the network? 1a. Should I completely randomize the selection of this training data or be more involved and make sure I include a variety of output scores and a wide range of each of the input variables?
Randomizing the selection of training data is probably not a good idea.
Keep in mind that truly random data includes clusters.
A random selection of students could happen to consist solely of those who scored above a 30 on the ACT exams, which could potentially result in a bias in your result.
Likewise, if you only select students whose SAT scores were below 700, the classifier you build won't have any capacity to distinguish between a student expected to score 720 and a student expected to score 780 -- they'll look the same to the classifier because it was trained without the relevant information.
You want to ensure a representative sample of your different inputs and your different outputs.
Because you're dealing with input variables that may be correlated, you shouldn't try to do anything too complex in selecting this data, or you could mistakenly introduce another bias in your inputs.
Namely, you don't want to select a training data set that consists largely of outliers.
I would recommend trying to ensure that your inputs cover all possible values for all of the variables you are observing, and all possible results for the output (the SAT scores), without constraining how these requirements are satisfied.
I'm sure there are algorithms out there designed to do exactly this, but I don't know them myself -- possibly a good question in and of itself for Cross Validated.
Question 3
Since the scores on the tests that I am using as inputs vary in scale (some are on 1-100, and others 1-20 for example), should I normalize all of the inputs to their respective z-scores? When is this recommended vs not recommended?
My understanding is that this is not recommended as the input to a Nerual Network, but I may be wrong.
The convergence of the network should handle this for you.
Every node in the network will assign a weight to its inputs, multiply them by their weights, and sum those products as a core part of its computation.
That means that every node in the network is searching for some coefficients for each of their inputs.
To do this, all inputs will be converted to numeric values -- so conditions like gender will be translated into "0=MALE,1=FEMALE" or something similar.
For example, a node's metric might look like this at a given point in time:
2*ACT_SCORE + 0*GENDER + (-5)*VARISTY_SPORTS ...
The coefficients for each values are exactly what the network is searching for as it converges.
If you change the scale of a value, like ACT_SCORE, you just change the scale of the coefficient that will be found by the reciporical of that scaling factor.
The result should still be the same.
There are other concerns in terms of accuracy (computers have limited capacity to represent small fractions) and speed that may enter this, but not being familiar with NEURO XL, I can't say whether or not they apply for this technology.
Question 4
I am predicting the actual score, but in reality, I'm NOT that concerned about the exact score but more of a range. Would my network be more accurate if I grouped the output scores into buckets and then tried to predict this number instead of the actual score?
This will reduce accuracy, although you should converge to a solution much faster with fewer possible outputs (scores).
Neural Networks actually describe very high-dimensional functions in their input variables.
If you reduce the granularity of that function's output space, you essentially state that you don't care about local minima and maxima in that function, especially around the borders between your output scores.
As a result, you are sacrificing information that may be an essential component of the "true" function that you are searching for.
I hope this has been helpful, but you really should break this question down into its many components and ask them separately on different sites -- potentially some of them do belong here on StackOverflow as well.

Merge sensor data for clustering/neural net usage

I have several datasets i.e. matrices that have a 2 columns, one with a matlab date number and a second one with a double value. Here an example set of one of them
>> S20_EavesN0x2DEAir(1:20,:)
ans =
1.0e+05 *
7.345016409722222 0.000189375000000
7.345016618055555 0.000181875000000
7.345016833333333 0.000177500000000
7.345017041666667 0.000172500000000
7.345017256944445 0.000168750000000
7.345017465277778 0.000166875000000
7.345017680555555 0.000164375000000
7.345017888888889 0.000162500000000
7.345018104166667 0.000161250000000
7.345018312500001 0.000160625000000
7.345018527777778 0.000158750000000
7.345018736111110 0.000160000000000
7.345018951388888 0.000159375000000
7.345019159722222 0.000159375000000
7.345019375000000 0.000160625000000
7.345019583333333 0.000161875000000
7.345019798611111 0.000162500000000
7.345020006944444 0.000161875000000
7.345020222222222 0.000160625000000
7.345020430555556 0.000160000000000
Now that I have those different sensor values, I need to get them together into a matrix, so that I could perform clustering, neural net and so on, the only problem is, that the sensor data was taken with slightly different timings or timestamps and there is nothing I can do about that from a data collection point of view.
My first thought was interpolation to make one sensor data set fit another one, but that seems like a messy approach and I was thinking maybe I am missing something, a toolbox or function that would enable me to do this quicker without me fiddling around. To even complicate things more, the number of sensors grew over time, therefore I am looking at different start dates as well.
Someone a good idea on how to go about this? Thanks
I think your first thought about interpolation was the correct one, at least if you plan to use NNs. Another option would be to use approaches which are designed to deal with missing data, like http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory for example.
It's hard to give an answer for the clustering part, because I have no idea what you're looking for in the data.
For the neural network, beside interpolating there are at least two other methods that come to mind:
training separate networks for each matrix
feeding them all together to the same network, with a flag specifying which matrix the data is coming from, i.e. something like: input (timestamp, flag_m1, flag_m2, ..., flag_mN) => target (value) where the flag_m* columns are mutually exclusive boolean values - i.e. flag_mK is 1 iff the line comes from matrix K, 0 otherwise.
These are the only things I can safely say with the amount of information you provided.