Training neural network - neural-network

I have a picture, 1200*1175 pixels. I want to train a network (MLP or Hopfield) to learn a specific part of it (201*111 pixels), save its weights, and then use them in a new network with the same architecture so that it can find that specific part without being retrained. My questions: which kind of network is useful, MLP or Hopfield? If MLP, how many hidden layers? The trainlm function is unusable because of an "out of memory" error. I have also converted the picture to a binary image; is that useful?

What exactly do you need the solution to do? Find an object within an image (like "Where's Waldo?")? Will the target object always be the same size and orientation? Might it look different because of lighting changes?
If you just need to find a fixed pattern of pixels within a larger image, I suggest using a straightforward correlation measure, such as cross-correlation, to find it efficiently.
If you need to contend with any of the issues mentioned above, then there are two basic solutions: 1. Build a model using examples of the object in different poses, scalings, etc. so that the model will recognize any of them, or 2. Develop a way to normalize the patch of pixels being examined to minimize the effect of those distortions (like Hu's invariant moments). If nothing else, you'll want to perform some sort of data reduction to get the number of inputs down. Technically, you could also try a model which is invariant to rotations, etc., but I don't know how well those work. I suspect they are more temperamental than traditional approaches.
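If it helps, here is a minimal sketch of that cross-correlation idea using OpenCV in Python; the file names and the 0.9 threshold are placeholders, not something from the question:

import cv2

image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)      # the 1200x1175 picture
template = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)   # the 201x111 target patch

# Normalized cross-correlation score at every possible template position
scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, max_score, _, max_loc = cv2.minMaxLoc(scores)

if max_score > 0.9:  # threshold picked arbitrarily; tune on real data
    print("Best match at", max_loc, "with score", round(max_score, 3))
else:
    print("No sufficiently strong match found")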

I found AdaBoost to be helpful in picking out only the important bits of an image. That, and resizing the image to something very small (like 40x30) using a Gaussian filter, will speed things up and put weight on a broader area of the photo rather than on a tiny, insignificant pixel.
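One way to read that resizing suggestion, sketched with OpenCV in Python (the blur kernel size and sigma are guesses, not part of the answer):

import cv2

img = cv2.imread("photo.jpg")
# Blur first so each output pixel summarizes a neighbourhood, then shrink hard
blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=2.0)
tiny = cv2.resize(blurred, (40, 30), interpolation=cv2.INTER_AREA)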

Related

Implementing spell drawing/casting mechanism in Luau (Roblox)

I am coding a spell-casting system where you draw a symbol with your wand (mouse), and it can recognize said symbol.
There are two methods I believe might work: a neural network and an "invisible grid system".
The problem with the neural network approach is that it would likely be suboptimal in Roblox Luau and would not match the performance or speed I'm after. (Although I may just be lacking in neural network knowledge, so please let me know whether I should keep trying to implement it this way.)
For the invisible grid system, I thought of converting the drawing into 1s and 0s (1 = drawn, 0 = blank), then seeing if it is similar to one of the symbols. I create the symbols by making a dictionary like:
local Symbol = { -- "Answer Key" shape, looks like a tilted square
"00100", -- rows as strings so the leading zeros aren't dropped
"01010",
"10001",
"01010",
"00100",
}
The problem is that user error will likely make the drawing inaccurate (as the blue boxes in my example "spell" show). I'm also concerned that if I have multiple symbols, comparing every value in every symbol will not be quick.
Do you know an algorithm that could help me do this? Or just some alternative way of doing this I am missing? Thank you for reading my post.
I'm sorry if the format of this is incorrect; this is my first Stack Overflow post. I will gladly delete it if it doesn't abide by the rules. (Let me know if there are any tags I should add.)
One possible approach to solving this problem is to use a template matching algorithm. In this approach, you would create a "template" for each symbol that you want to recognize, which would be a grid of 1s and 0s similar to what you described in your question. Then, when the user draws a symbol, you would convert their drawing into a grid of 1s and 0s in the same way.
Next, you would compare the user's drawing to each of the templates using a similarity metric, such as the sum of absolute differences (SAD) or normalized cross-correlation (NCC). The template with the lowest SAD or highest NCC value would be considered the "best match" for the user's drawing, and therefore the recognized symbol.
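As a rough illustration of the SAD comparison (in Python rather than Luau, with an invented 5x5 symbol; you would port the same few lines to your game):

def sad(grid_a, grid_b):
    # Sum of absolute differences between two equally sized binary grids
    return sum(abs(a - b) for row_a, row_b in zip(grid_a, grid_b)
                          for a, b in zip(row_a, row_b))

templates = {
    "tilted_square": [
        [0, 0, 1, 0, 0],
        [0, 1, 0, 1, 0],
        [1, 0, 0, 0, 1],
        [0, 1, 0, 1, 0],
        [0, 0, 1, 0, 0],
    ],
}

def recognize(drawing, max_mismatch=3):
    # Pick the template with the smallest SAD; reject if even the best is too far off
    name, score = min(((n, sad(drawing, g)) for n, g in templates.items()),
                      key=lambda pair: pair[1])
    return name if score <= max_mismatch else None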
There are a few advantages to using this approach:
It is relatively simple to implement, compared to a neural network.
It is fast, since you only need to compare the user's drawing to a small number of templates.
It can tolerate some user error, since the templates can be designed to be tolerant of slight variations in the user's drawing.
There are also some potential disadvantages to consider:
It may not be as accurate as a neural network, especially for complex or highly variable symbols.
The templates must be carefully designed to be representative of the expected variations in the user's drawings, which can be time-consuming.
Overall, whether this approach is suitable for your use case will depend on the specific requirements of your spell-casting system, including the number and complexity of the symbols you want to recognize, the accuracy and speed you need, and the resources (e.g. time, compute power) that are available to you.

Using ImageMagick to get a fuzzy hash of an image

I have a situation where I have many images, and I compare them using a specific fuzz factor (say 10%), looking for images that match. Works fine.
However, I sometimes have a situation where I want to compare all images to all other images (e.g. 1000 images). Doing 5000+ ImageMagick compares is way too slow.
Hashing all the files and comparing the hashes 5000 times is lightning fast, but of course only works when the images are identical (no fuzz factor).
I'm wondering if there is some way to produce an ID or fingerprint - or maybe a range of IDs - where I could very quickly determine what images are close enough to each other, and then pay the ImageMagick compare cost only for those likely matches. Ideas or names of existing algorithms/approaches are very welcome.
There are quite a few image hashing algorithms out there. pHash is the one that springs to the top of my mind: http://www.phash.org/. That one copes with the basic transformations one might apply to an image. If you want to be more sophisticated and roll your own, you can use a pre-trained image classifier such as an ImageNet model (https://www.learnopencv.com/keras-tutorial-using-pre-trained-imagenet-models/), lop off the final layer, and use the penultimate layer as a vector. For a small number of images you can easily do a nearest-neighbor search. If you have more, you can use Annoy (https://github.com/spotify/annoy) to make the nearest-neighbor search more efficient.
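For what it's worth, a tiny sketch of the perceptual-hash route using the Python imagehash package (my choice of library and threshold, not the answerer's):

from PIL import Image
import imagehash

hash_a = imagehash.phash(Image.open("a.jpg"))
hash_b = imagehash.phash(Image.open("b.jpg"))

# Hamming distance between the 64-bit hashes; a small distance means "probably the
# same picture", so only those pairs need the expensive full compare.
if hash_a - hash_b <= 8:  # threshold is only a starting point, tune it on your images
    print("likely match, run the full ImageMagick compare")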

Template matching not necessarily invariant to rotation and scale, but should detect artifacts (of 3X3 pixels or more)

I want to compare a specific pattern (as on the template image) and output "yes"/"no" according to the match. I don't require the method to be scale invariant. It just has to be translation invariant and rotation invariant (only up to +/-2 degrees maximum).
Also, even if there's a slight mismatch between the template image and runtime image, the output should be "no".
So far, here are a couple of codes I have tried:
Template Matching by Alaa Eleyan. This detects the pattern even if it is noisy, which I don't want.
Simple template match in MATLAB. This outputs a match score, but the scores for images with and without noise didn't seem to vary much.
Fast/Robust Template Matching by Dirk-Jan Kroon. This too detects the pattern even if it's noisy.
Template Matching using Correlation Coefficients by Yue Wu. It is similar to #3 but takes more time.
Many of them are not invariant to rotation. So at present, I match SURF features and calculate how much the runtime image has been rotated with respect to the template pattern. I then rotate it back in the opposite direction, so I don't need to apply an algorithm that is itself invariant to rotation (which is why I said I don't need rotation invariance).
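(For concreteness, here is a rough sketch of that estimate-rotation-then-derotate pipeline in Python/OpenCV. ORB stands in for SURF, which lives in opencv-contrib, and the match count and score threshold are invented.)

import math
import cv2
import numpy as np

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("runtime.png", cv2.IMREAD_GRAYSCALE)

# Detect and match keypoints between the template and the runtime image
orb = cv2.ORB_create()
kp_t, des_t = orb.detectAndCompute(template, None)
kp_s, des_s = orb.detectAndCompute(scene, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_t, des_s), key=lambda m: m.distance)[:30]

pts_t = np.float32([kp_t[m.queryIdx].pt for m in matches])
pts_s = np.float32([kp_s[m.trainIdx].pt for m in matches])

# Similarity transform from template to scene; its angle is the relative rotation
M, _ = cv2.estimateAffinePartial2D(pts_t, pts_s)
angle = math.degrees(math.atan2(M[1, 0], M[0, 0]))

# Rotate the scene back by that angle, then do a plain correlation-based match
h, w = scene.shape
R = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
derotated = cv2.warpAffine(scene, R, (w, h))
score = cv2.matchTemplate(derotated, template, cv2.TM_CCOEFF_NORMED).max()
print("yes" if score > 0.8 else "no")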
In many cases even if there's no pattern present, it's still falsely detected. Here is a screenshot of output using #3:
Here is another such wrong output:
I have also previously worked on a couple of algorithms based on SIFT/ASIFT features. They are very robust and will match the pattern even when it's noisy, which is exactly why I am not using them in the present application.
I have attached an example Template image, and Yes and No images for your reference.
Please let me know an algorithm for this purpose. I thought it was a simple template match, but in many cases, it was falsely detected.
At present I think I can use the #3 algorithm to at least detect the position of the pattern in the runtime image (as shown in the 2nd screenshot above) and then match just that region. Is this possible? I can't use image subtraction because it's not exactly a pixel-to-pixel match; there may be very slight variations.
Any inputs will be appreciated.
Regards,
Meghana.
Edit: I tried using Zernike Moments by Amir Tahmasbi. Although I get two outputs for each image, A and phi, I am not sure I can count on them. The value of A seems to be very close for good ('yes') and bad ('no') images. I have posted the screenshots here, along with the two outputs for each image. For my problem, is there another way to interpret these two outputs so I can categorise images as good/bad?
This is for good image:
This is for three bad images:

Encog Neural Network error not changing

I am creating a neural network that works on colored images, but when I train it, the error stops changing after some point, even after a thousand iterations. What causes this, and what should I do? Here is the structure:
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null,true,16875));
network.addLayer(new BasicLayer(new ActivationSigmoid(),true,(50)));
network.addLayer(new BasicLayer(new ActivationSigmoid(),true,setUniqueNumbers.size()));
network.getStructure().finalizeStructure();
network.reset();
The input layer is actually 75 * 75 (75x75 pixels) * 3 (red, green, blue), which is how I came up with 16875.
When the error stops changing, you have hit a minimum, probably a local minimum.
This means it has found what it thinks is the best solution so far, and moving away from that point would increase the error (even if it would have to go uphill a bit before it could go downhill to an even lower error rate). This happens early when there isn't a strong pattern/correlation in the data. It can also happen when the structure is out of whack.
That looks like it may well be one of your problems. ~17,000 input neurons is a ton, and 50 hidden neurons doesn't match up well with so many inputs. Instead of feeding in so much data, find a way to extract features to reduce the input size and make it more meaningful to the network.
Examples that may help it run better:
Downsample the image from 75x75 to, say, 10x10; how far you can go depends on what the picture is (see the sketch after this list).
Convert from RGB to grayscale
Learn about feature extraction - if you can extract features from the image, such as lines, edges, etc., the net will have a lot more to learn from. Seriously, feature extraction is your golden ticket to making ANNs work.
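A minimal sketch of the first two suggestions, assuming OpenCV in Python (the 10x10 size is just the example from the list above):

import cv2

img = cv2.imread("sample.png")                    # a 75x75 RGB (BGR in OpenCV) image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # 3 channels -> 1
small = cv2.resize(gray, (10, 10), interpolation=cv2.INTER_AREA)
features = (small.flatten() / 255.0).tolist()     # 100 inputs instead of 16875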
Good luck!

Efficient way to fingerprint an image (jpg, png, etc)?

Is there an efficient way to get a fingerprint of an image for duplicate detection?
That is, given an image file, say a JPG or PNG, I'd like to be able to quickly calculate a value that identifies the image content and is fairly resilient to other aspects of the image changing (e.g. the image metadata). If it deals with resizing, that's even better.
[Update] Regarding the metadata in JPG files, does anyone know if it's stored in a specific part of the file? I'm looking for an easy way to ignore it; e.g. can I skip the first x bytes of the file, or take x bytes from the end of the file, to ensure I'm not getting metadata?
A stab in the dark, if you are looking to sidestep metadata and size-related differences:
Edge Detection and scale-independent comparison
Sampling and statistical analysis of grayscale/RGB values (average luminance, averaged color map)
FFT and other transforms (a good article: Classification of Fingerprints using FFT)
And numerous others.
Basically:
Convert JPG/PNG/GIF whatever into an RGB byte array which is independent of encoding
Use a fuzzy pattern classification method to generate a 'hash of the pattern' in the image ... not a hash of the RGB array as some suggest
Then you want a distributed method of fast hash comparison based on matching threshold on the encapsulated hash or encoding of the pattern. Erlang would be good for this :)
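To make the first two steps concrete, here is a loose illustration in Python; it uses a plain average hash, one of the simplest "hashes of the pattern", and the library choice (Pillow) and 8x8 size are my assumptions:

from PIL import Image

def average_hash(path, size=8):
    # Decode to pixels (format-independent), shrink, grayscale, then threshold
    # each pixel against the mean to get a 64-bit pattern fingerprint
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def hamming(h1, h2):
    # Small distance => visually similar images, regardless of encoding
    return bin(h1 ^ h2).count("1")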
Advantages are:
If you use any AI/training, it will spot duplicates regardless of encoding, size, aspect ratio, hue and luminance modification, dynamic range/subsampling differences, and in some cases perspective
Disadvantages:
Can be hard to code .. something like OpenCV might help
Probabilistic ... false positives are likely but can be reduced with neural networks and other AI
Slow unless you can encapsulate pattern qualities and distribute the search (MapReduce style)
Checkout image analysis books such as:
Pattern Classification 2ed
Image Processing Fundamentals
Image Processing - Principles and Applications
And others
If you are scaling the image, then things are simpler. If not, then you have to contend with the fact that scaling is lossy in more ways than sample reduction.
Using the byte size of the image for comparison would be suitable for many applications. Another way would be to:
Strip out the metadata.
Calculate the MD5 (or other suitable hashing algorithm) of the image.
Compare that to the MD5 (or whatever) of the potential dupe image (provided you've stripped out the metadata for that one too).
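A small sketch of that recipe, assuming Pillow in Python; hashing the decoded pixel data sidesteps the metadata entirely, so there is nothing to strip by hand:

import hashlib
from PIL import Image

def pixel_md5(path):
    img = Image.open(path).convert("RGB")        # decode; EXIF/comments are ignored
    return hashlib.md5(img.tobytes()).hexdigest()

is_dupe = pixel_md5("a.jpg") == pixel_md5("b.jpg")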
You could use an algorithm like SIFT (Scale Invariant Feature Transform) to determine key points in the pictures and match these.
See http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
It is used e.g. when stitching images in a panorama to detect matching points in different images.
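A quick sketch of SIFT keypoint matching with OpenCV in Python (SIFT_create has been in mainline opencv-python since 4.4; the 0.75 ratio is the usual Lowe heuristic):

import cv2

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE), None)
kp2, des2 = sift.detectAndCompute(cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE), None)

# Ratio test: keep matches whose best candidate is clearly better than the runner-up
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "matching keypoints")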
You want to perform an image hash. Since you didn't specify a particular language, I'm guessing you don't have a preference. At the very least there's a MATLAB toolbox (beta) that can do it: http://users.ece.utexas.edu/~bevans/projects/hashing/toolbox/index.html. Most of the Google results on this are research papers rather than actual libraries or tools.
The problem with MD5ing it is that MD5 is very sensitive to small changes in the input, and it sounds like you want to do something a bit "smarter."
Pretty interesting question. The fastest and easiest approach would be to calculate a CRC32 of the content byte array, but that would only work on 100% identical images. For a more intelligent comparison you would probably need some kind of fuzzy logic analysis...
I've implemented at least a trivial version of this. I transform and resize all images to a very small (fixed-size) black-and-white thumbnail, then compare those. It detects exact duplicates, resized duplicates, and duplicates that were converted to black and white, and it catches a lot of duplicates without a lot of cost.
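One way that thumbnail comparison might look in Python (the 16x16 size and the tolerance are made-up values):

import numpy as np
from PIL import Image

def thumb(path, size=(16, 16)):
    # Fixed-size grayscale thumbnail as a float array
    return np.asarray(Image.open(path).convert("L").resize(size), dtype=np.float32)

def looks_like_duplicate(a, b, tol=10.0):
    # Mean absolute pixel difference between the two thumbnails
    return float(np.mean(np.abs(thumb(a) - thumb(b)))) < tol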
The easiest thing to do is compute a hash (like MD5) of the image data, ignoring all other metadata. You can find many open source libraries that can decode common image formats, so it's quite easy to strip metadata.
But that doesn't work when the image itself has been manipulated in any way, including scaling or rotating.
To do exactly what you want, you have to use Image Watermarking but it's patented and can be expensive.
This is just an idea: possibly the low-frequency components present in the DCT of the JPEG could be used as a size-invariant identifier.
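A sketch of that idea in Python, keeping only the low-frequency corner of the 2-D DCT as the signature (this is essentially what pHash does); SciPy and the 32-to-8 sizes are my assumptions:

import numpy as np
from PIL import Image
from scipy.fft import dctn

img = np.asarray(Image.open("photo.jpg").convert("L").resize((32, 32)), dtype=np.float32)
coeffs = dctn(img, norm="ortho")[:8, :8]            # low-frequency block of the DCT
signature = (coeffs > np.median(coeffs)).flatten()  # 64-bit binary fingerprint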