Encog Neural Network error not changing - neural-network

I am creating a neural network that works on colored images. but when i train it, the error will never change after some point. even after a thousand iterations. what causes it? or what should i do? here is the structure:
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null,true,16875));
network.addLayer(new BasicLayer(new ActivationSigmoid(),true,(50)));
network.addLayer(new BasicLayer(new ActivationSigmoid(),true,setUniqueNumbers.size()));
network.getStructure().finalizeStructure();
network.reset();
the input layer is actually 75 * 75 (75x75 pixels) *3 (red, green, blue), so i came up with 16875.

When the error stops changing, you have hit a minimum, probably a local minimum.
This means that it has found what it thinks is the best solution so far, and to move away from that point will cause more error (even if it has to go up hill a bit before it can go down hill to an even lower error rate). This happens early when there isn't a strong pattern/correlation in the data. It can also happen when the structure is out of wack.
That looks like it may definitely be one of your problems. ~17,000 input neurons is a ton. And then 50 hidden neurons doesn't seem to match up well with so many inputs. Instead of feeding in so much data, find a way to extract features to reduce the input size and make it more meaningful to the network.
Examples that may help it run better:
Down sample the image from 75*75 to, say, 10*10. This depends on what the picture is.
Convert from RGB to gray scale
Learn about feature extraction - if you can extract features from the image, such as lines, edges, etc., the net will have a lot more to learn from. Seriously, feature extraction is your golden ticket to making ANNs work.
Good luck!

Related

Implementating spell drawing/casting mechanism in Luau (Roblox)

I am coding a spell-casting system where you draw a symbol with your wand (mouse), and it can recognize said symbol.
There are two methods I believe might work; neural networking and an "invisible grid system"
The problem with the neural networking system is that It would be (likely) suboptimal in Roblox Luau, and not be able to match the performance nor speed I wish for. (Although, I may just be lacking in neural networking knowledge. Please let me know whether I should continue to try implementing it this way)
For the invisible grid system, I thought of converting the drawing into 1s and 0s (1 = drawn, 0 = blank), then seeing if it is similar to one of the symbols. I create the symbols by making a dictionary like:
local Symbol = { -- "Answer Key" shape, looks like a tilted square
00100,
01010,
10001,
01010,
00100,
}
The problem is that user error will likely cause it to be inaccurate, like this "spell"'s blue boxes, showing user error/inaccuracy. I'm also sure that if I have multiple Symbols, comparing every value in every symbol will surely not be quick.
Do you know an algorithm that could help me do this? Or just some alternative way of doing this I am missing? Thank you for reading my post.
I'm sorry if the format on this is incorrect, this is my first stack-overflow post. I will gladly delete this post if it doesn't abide to one of the rules. ( Let me know if there are any tags I should add )
One possible approach to solving this problem is to use a template matching algorithm. In this approach, you would create a "template" for each symbol that you want to recognize, which would be a grid of 1s and 0s similar to what you described in your question. Then, when the user draws a symbol, you would convert their drawing into a grid of 1s and 0s in the same way.
Next, you would compare the user's drawing to each of the templates using a similarity metric, such as the sum of absolute differences (SAD) or normalized cross-correlation (NCC). The template with the lowest SAD or highest NCC value would be considered the "best match" for the user's drawing, and therefore the recognized symbol.
There are a few advantages to using this approach:
It is relatively simple to implement, compared to a neural network.
It is fast, since you only need to compare the user's drawing to a small number of templates.
It can tolerate some user error, since the templates can be designed to be tolerant of slight variations in the user's drawing.
There are also some potential disadvantages to consider:
It may not be as accurate as a neural network, especially for complex or highly variable symbols.
The templates must be carefully designed to be representative of the expected variations in the user's drawings, which can be time-consuming.
Overall, whether this approach is suitable for your use case will depend on the specific requirements of your spell-casting system, including the number and complexity of the symbols you want to recognize, the accuracy and speed you need, and the resources (e.g. time, compute power) that are available to you.

What to do with not enough training data?

I have a problem that I don't have enough training data for my NN. It is trying to predict the result of a soccer game given the last games which I woulf say is a regression task.
The training data are results of soccer games of the last 15 seasons (which are about 4500 games). Getting to new data would be hard and would take a lot of time.
What should I do now?
Is it good to duplicate the data?
Should I input randomized data? (Maybe noise but I'm not quite sure what that is)
If there is no way of creating more data,
I should probably turn up the learning rate right? (I have it sitting at 0.01 and the momentum at 0.9)
I am using mini batches consisting of 32 training datas in training. Since I don't have a lot of training I don't have a lot of mini batches. Should I stop using them?
To start from the beginning: This is a very theoretical question and is not directly related to programming, which I recommend (in future) to post over at the Data Science Stackexchange.
To go into your problem: 4500 samples is not as bad as it sounds, depending on the exact task at hand. Are you trying to predict the match results (i.e. which team is the winner?), are you looking for more specific predictions (across a lot of different, specific teams)?
If you can make sure that you have a reasonable amount of data per class, one can work with a number of samples lower than what you have. Simply duplicating the data will not help you much, since you are very likely to just overfit on the samples you are seeing, without much of an improvement; Or rather, you will get the same results as training over a longer period (since essentially you see every sample twice per epoch, instead of one).
Again, what usually happens after long training periods is overfitting, so nothing gained here.
Your second suggestion is generally called data augmentation. Instead of simply copying samples, you alter them enough to make it look "different" to the network. But be careful! Data augmentation works well for some inputs, like images, since the change in input is significant enough to not represent the same sample, but still contains meaningful information about the class (a horizontally mirrored image of a cat still shows a "valid cat", unlike a vertically mirrored image, which is more unrealistic in the real world).
Essentially, it depends on your input features to determine where it makes sense to add noise. If you are only changing the results of the previous game, a minor change in input (adding/subtracting one goal at random) can significantly change the prediction you make.
If you slightly scramble ELO scores by a random number, on the other hand, the input value will not be too different, "but different enough" to use it as a novel example.
Turning up the learning rate is not a good idea, since you are essentially just letting the network converge more towards the specific samples. On the contrary, I would argue that the current learning rate is still too high, and you should certainly not increase it.
Regarding mini batches, I think I have referenced this a million times now, but always consider smaller minibatches. From a theoretical point of view, you are more likely to converge to a local minimum.

Neural Network Underfitting with Dogs and Cats

Without necessarily getting into the code of it, but focusing more on the principles, I have a question about what I assume would be underfitting.
If I am training a network that recognizes true or false as to whether an image is of a dog, and I have maybe 40,000 images, where all dog images are labeled as 1, and all other images are labeled as 0 - what can I do to assure accuracy so that, if only maybe 5,000 of those images are dogs, the network does not act “lazily” from its training, and also label dogs as closer to 0 than 1?
For example, the main purpose of this question is to be able to recognize with high accuracy if an image really is of a dog, without really caring too much about the other images, other than the fact that they are not of dogs. Also, I would like to be able to retain the probability that the guess is correct, because this is highly important for my purposes.
The only two things I was able to come up with were to:
Have more nodes in the network, or
Have half of the images be of dogs (so use 10,000 images where 5,000 of them are dogs).
But I think this 2nd option might give dogs a disproportionately large chance of being the output of the testing data, which would destroy the accuracy and the whole purpose of this network.
I am sure this has been addressed before, so even a point in the right direction would be highly appreciated!
So you have a binary classification task where both classes appear with different frequency in your dataset. About 1/8 is "dog" and 7/8 is "no dog".
In order to avoid biased learning towards one or the other class, it is important that you stratify your training, validation and test data so that these fractions are kept across every subset.
You say that you want to "retain the probability" that the guess is correct - I assume you mean you want to evaluate the "dogness"-probability as output variable. That's a simple softmax output layer with two outputs: 1st is "dog", 2nd "not dog". It's the typical way to address classification problems, regardless of the number of classes you need to distinguish.

Image based steganography that survives resizing?

I am using a startech capture card for capturing video from the source machine..I have encoded that video using matlab so every frame of that video will contain that marker...I run that video on the source computer(HDMI out) connected via HDMI to my computer(HDMI IN) once i capture the frame as bitmap(1920*1080) i re-size it to 1280*720 i send it for processing , the processing code checks every pixel for that marker.
The issue is my capture card is able to capture only at 1920*1080 where as the video is of 1280*720. Hence in order to retain the marker I am down scaling the frame captured to 1280*720 which in turn alters the entire pixel array I believe and hence I am not able to retain marker I fed in to the video.
In that capturing process the image is going through up-scaling which in turn changes the pixel values.
I am going through few research papers on Steganography but it hasn't helped so far. Is there any technique that could survive image resizing and I could retain pixel values.
Any suggestions or pointers will be really appreciated.
My advice is to start with searching for an alternative software that doesn't rescale, compress or otherwise modify any extracted frames before handing them to your control. It may save you many headaches and days worth of time. If you insist on implementing, or are forced to implement a steganography algorithm that survives resizing, keep on reading.
I can't provide a specific solution because there are many ways this can be (possibly) achieved and they are complex. However, I'll describe the ingredients a solution will most likely involve and your limitations with such an approach.
Resizing a cover image is considered an attack as an attempt to destroy the secret. Other such examples include lossy compression, noise, cropping, rotation and smoothing. Robust steganography is the medicine for that, but it isn't all powerful; it may be able to provide resistance to only specific types attacks and/or only small scale attacks at that. You need to find or design an algorithm that suits your needs.
For example, let's take a simple pixel lsb substitution algorithm. It modifies the lsb of a pixel to be the same as the bit you want to embed. Now consider an attack where someone randomly applies a pixel change of -1 25% of the time, 0 50% of the time and +1 25% of the time. Effectively, half of the time it will flip your embedded bit, but you don't know which ones are affected. This makes extraction impossible. However, you can alter your embedding algorithm to be resistant against this type of attack. You know the absolute value of the maximum change is 1. If you embed your secret bit, s, in the 3rd lsb, along with setting the last 2 lsbs to 01, you guarantee to survive the attack. More specifically, you get xxxxxs01 in binary for 8 bits.
Let's examine what we have sacrificed in order to survive such an attack. Assuming our embedding bit and the lsbs that can be modified all have uniform probabilities, the probability of changing the original pixel value with the simple algorithm is
change | probability
-------+------------
0 | 1/2
1 | 1/2
and with the more robust algorithm
change | probability
-------+------------
0 | 1/8
1 | 1/4
2 | 3/16
3 | 1/8
4 | 1/8
5 | 1/8
6 | 1/16
That's going to affect our PSNR quite a bit if we embed a lot of information. But we can do a bit better than that if we employ the optimal pixel adjustment method. This algorithm minimises the Euclidean distance between the original value and the modified one. In simpler terms, it minimises the absolute difference. For example, assume you have a pixel with binary value xxxx0111 and you want to embed a 0. This means you have to make the last 3 lsbs 001. With a naive substitution, you get xxxx0001, which has a distance of 6 from the original value. But xxx1001 has only 2.
Now, let's assume that the attack can induce a change of 0 33.3% of the time, 1 33.3% of the time and 2 33.3%. Of that last 33.3%, half the time it will be -2 and the other half it will be +2. The algorithm we described above can actually survive a +2 modification, but not a -2. So 16.6% of the time our embedded bit will be flipped. But now we introduce error correcting codes. If we apply such a code that has the potential to correct on average 1 error every 6 bits, we are capable of successfully extracting our secret despite the attack altering it.
Error correction generally works by adding some sort of redundancy. So even if part of our bit stream is destroyed, we can refer to that redundancy to retrieve the original information. Naturally, the more redundancy you add, the better the error correction rate, but you may have to double the redundancy just to improve the correction rate by a few percent (just arbitrary numbers here).
Let's appreciate here how much information you can hide in a 1280x720 (grayscale) image. 1 bit per pixel, for 8 bits per letter, for ~5 letters per word and you can hide 20k words. That's a respectable portion of an average novel. It's enough to hide your stellar Masters dissertation, which you even published, in your graduation photo. But with a 4 bit redundancy per 1 bit of actual information, you're only looking at hiding that boring essay you wrote once, which didn't even get the best mark in the class.
There are other ways you can embed your information. For example, specific methods in the frequency domain can be more resistant to pixel modifications. The downside of such methods are an increased complexity in coding the algorithm and reduced hiding capacity. That's because some frequency coefficients are resistant to changes but make embedding modifications easily detectable, then there are those that are fragile to changes but they are hard to detect and some lie in the middle of all of this. So you compromise and use only a fraction of the available coefficients. Popular frequency transforms used in steganography are the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT).
In summary, if you want a robust algorithm, the consistent themes that emerge are sacrificing capacity and applying stronger distortions to your cover medium. There have been quite a few studies done on robust steganography for watermarks. That's because you want your watermark to survive any attacks so you can prove ownership of the content and watermarks tend to be very small, e.g. a 64x64 binary image icon (that's only 4096 bits). Even then, some algorithms are robust enough to recover the watermark almost intact, say 70-90%, so that it's still comparable to the original watermark. In some case, this is considered good enough. You'd require an even more robust algorithm (bigger sacrifices) if you want a lossless retrieval of your secret data 100% of the time.
If you want such an algorithm, you want to comb the literature for one and test any possible candidates to see if they meet your needs. But don't expect anything that takes only 15 lines to code and 10 minutes of reading to understand. Here is a paper that looks like a good start: Mali et al. (2012). Robust and secured image-adaptive data hiding. Digital Signal Processing, 22(2), 314-323. Unfortunately, the paper is not open domain and you will either need a subscription, or academic access in order to read it. But then again, that's true for most of the papers out there. You said you've read some papers already and in previous questions you've stated you're working on a college project, so access for you may be likely.
For this specific paper, table 4 shows the results of resisting a resizing attack and section 4.4 discusses the results. They don't explicitly state 100% recovery, but only a faithful reproduction. Also notice that the attacks have been of the scale 5-20% resizing and that only allows for a few thousand embedding bits. Finally, the resizing method (nearest neighbour, cubic, etc) matters a lot in surviving the attack.
I have designed and implemented ChromaShift: https://www.facebook.com/ChromaShift/
If done right, steganography can resiliently (i.e. robustly) encode identifying information (e.g. downloader user id) in the image medium while keeping it essentially perceptually unmodified. Compared to watermarks, steganography is a subtler yet more powerful way of encoding information in images.
The information is dynamically multiplexed into the Cb Cr fabric of the JPEG by chroma-shifting pixels to a configurable small bump value. As the human eye is more sensitive to luminance changes than to chrominance changes, chroma-shifting is virtually imperceptible while providing a way to encode arbitrary information in the image. The ChromaShift engine does both watermarking and pure steganography. Both DRM subsystems are configurable via a rich set of of options.
The solution is developed in C, for the Linux platform, and uses SWIG to compile into a PHP loadable module. It can therefore be accessed by PHP scripts while providing the speed of a natively compiled program.

trainning neural network

I have a picture.1200*1175 pixel.I want to train a net(mlp or hopfield) to learn a specific part of it(201*111pixel) to save its weight to use in a new net(with the same previous feature)only without train it to find that specific part.now there are this questions :what kind of nets is useful;mlp or hopfield,if mlp;the number of hidden layers;the trainlm function is unuseful because "out of memory" error.I convert the picture to a binary image,is it useful?
What exactly do you need the solution to do? Find an object with an image (like "Where's Waldo"?). Will the target object always be the same size and orientation? Might it look different because of lighting changes?
If you just need to find a fixed pattern of pixels within a larger image, I suggest using a straightforward correlation measure, such as crosscorrelation to find it efficiently.
If you need to contend with any of the issues mentioned above, then there are two basic solutions: 1. Build a model using examples of the object in different poses, scalings, etc. so that the model will recognize any of them, or 2. Develop a way to normalize the patch of pixels being examined, to minimize the effect of those distortions (like Hu's invariant moments). If nothing else, yuo'll want to perform some sort of data reduction to get the number of inputs down. Technically, you could also try a model which is invariant to rotations, etc., but I don't know how well those work. I suspect that they are more tempermental than traditional approaches.
I found AdaBoost to be helpful in picking out only important bits of an image. That, and resizing the image to something very tiny (like 40x30) using a Gaussian filter will speed it up and put weight on more of an area of the photo rather than on a tiny insignificant pixel.