Strange behaviour in Neural Network training? [closed] - neural-network

I have created a neural network for detecting spam.
It involves the following steps:
1. Formation of a tf-idf matrix of terms and mails.
2. Reduction of the matrix using PCA.
3. Feeding the 20 most important terms (ranked by eigenvalue) to the neural network as features.
I'm training it with 1 = spam and 0 = not spam.
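For illustration, here is a minimal sketch of that pipeline in Python/scikit-learn (the toy mails and labels are placeholders, not my real data):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Placeholder mails and labels (1 = spam, 0 = not spam)
mails = ["cheap pills, buy now", "meeting moved to 3 pm",
         "win a free prize", "lunch tomorrow?"]
labels = [1, 0, 1, 0]

# Step 1: tf-idf matrix of terms vs. mails
X = TfidfVectorizer(stop_words="english").fit_transform(mails).toarray()

# Step 2: reduce the matrix with PCA (at most 20 components)
pca = PCA(n_components=min(20, *X.shape))
X_reduced = pca.fit_transform(X)

# Step 3: feed the reduced features to a small back-propagation network
net = MLPClassifier(hidden_layer_sizes=(6,), max_iter=2000, random_state=0)
net.fit(X_reduced, labels)
print(net.predict(X_reduced))  # 1 = spam, 0 = not spam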
EDITS:
I decided to train it with a batch size of 7 mails because forming the matrix was prone to out-of-memory errors. I used the standard Enron dataset of ham and spam.
I trained the neural network via back-propagation with 1 input, 1 hidden and 1 output layer, with 20 neurons in the input layer and 6 neurons in the hidden layer.
I first started training with the spam mails from my own Gmail account, which gave very bad results, before switching to the Enron dataset. Satisfactory outputs were obtained after quite a lot of training.
6 out of 14 mails were detected as spam when I tested.
I used alternating training - batch 1 of spam mails, batch 2 of ham mails, and so on - so that the network is trained to output 1 for spam and 0 for ham.
But now, after too much training (almost 400-500 mails, I guess), it is giving bad results again. I reduced the learning rate, but that didn't help.
What's going wrong?

To summarize my comments into an answer: if your net is producing the results you would expect and then, after additional training, the output becomes less accurate, there is a good chance it is overtrained.
This is especially prone to happen if your data set is small or doesn't vary enough. Finding the optimal number of epochs is mostly trial and error.
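One common guard is to hold out a validation set and stop training once the validation score stops improving. A rough sketch of the idea (scikit-learn's MLPClassifier is used here purely for illustration; your own framework will have an equivalent, and the data is a stand-in):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data; replace with your own 20-feature matrix and 0/1 labels
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# early_stopping carves a validation set out of the training data and stops
# once the validation score has not improved for n_iter_no_change epochs
net = MLPClassifier(hidden_layer_sizes=(6,),
                    early_stopping=True,
                    validation_fraction=0.2,
                    n_iter_no_change=10,
                    max_iter=2000,
                    random_state=0)
net.fit(X_train, y_train)
print("epochs actually run:", net.n_iter_)
print("held-out accuracy:", net.score(X_test, y_test))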

Related

neural network check plastic parts [closed]

Neural networks are used to generalize and classify...
I have a little experience with classifying digits...
Using neural nets to recognize handwritten digits
I want to use a network to check plastic parts.
I have a video stream of the production of these plastic parts.
Should I train the network with many videos of correct plastic parts to get a positive output and random videos to get a negative output?
If you have any books or links, I would be happy to see them.
EDIT
It looks like I asked this a bit badly...
During production, faulty plastic parts can be created, and these should be recognized by the network. A lot of different mistakes can happen during production, so I think it only makes sense to train the network with correct plastic parts.
A convolutional neural network would be my recommendation.
You should show individual parts against a similar background and lighting.
The training has to be done on both good and bad parts - a sufficiently random sampling of both. You should also set aside a test set so you can evaluate the CNN once it is trained.
You'll want to generate a confusion matrix from the test data so you know the rates of false positives, false negatives, and correct and incorrect classifications.
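A hedged sketch of that workflow in Python/Keras (the framework choice and the 64x64 grayscale frame size are my assumptions, and the arrays are random placeholders for frames grabbed from the video stream):

import numpy as np
from tensorflow.keras import layers, models
from sklearn.metrics import confusion_matrix

# Placeholder data: frames extracted from the video stream, resized to
# 64x64 grayscale, labelled 1 = good part, 0 = defective part
X_train = np.random.rand(200, 64, 64, 1).astype("float32")
y_train = np.random.randint(0, 2, size=200)
X_test = np.random.rand(50, 64, 64, 1).astype("float32")
y_test = np.random.randint(0, 2, size=50)

# A small convolutional network for binary good/bad classification
model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)

# Confusion matrix on the held-out test set: rows = true class, columns = predicted class
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print(confusion_matrix(y_test, y_pred))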

Can I use neural network in this case? [closed]

Can I use neural networks or SVMs etc. if my output is a vector of 27,680 values that are all zero except for a single one?
I mean, is it right to do this?
When I use an SVM I get this error:
Error using seqminopt>seqminoptImpl (line 198)
No convergence achieved within maximum number of iterations.
SVMs are usually binary classifiers. Basically, that means they separate your data points into two groups, signalling whether a data point does or doesn't belong to a class. Common strategies for solving multi-class problems with SVMs are one-vs-rest and one-vs-one. With one-vs-rest you would train one classifier per class, which would be 27,680 for you. With one-vs-one you would train C(K, 2) = K(K-1)/2 classifiers, so in your case roughly 383 million. As you can see, both numbers are rather high, so I would be pessimistic about your chances of successfully solving the problem with SVMs.
Nevertheless, you can try to increase the maximum number of iterations as described in another Stack Overflow thread. Maybe it still works.
You can use Neural Nets for your task and a 1-of-K output is nothing unusual. However, even with only one hidden layer of 500 neurons (and using the input and output vector sizes mentioned in your comment) you will have (27680*2*500) + (500*27680) = 41,520,000 weights in your network. So I would expect rather long training times (although a Google employee would probably laugh about these numbers). You will also most likely need a lot of training examples, unless your input is really simple.
As an alternative you might look into Decision Trees/Random Forests, Naive Bayes or kNN.
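To make the 1-of-K point concrete, here is a small sketch (Python/scikit-learn, with random placeholder data rather than anything from the question): most libraries want one class index per sample instead of the full 27,680-long vector, and the rough weight count shows why training will be slow.

import numpy as np
from sklearn.neural_network import MLPClassifier

n_classes = 27680                   # length of the 1-of-K output vector
n_samples, n_features = 1000, 100   # made-up sizes for illustration

X = np.random.rand(n_samples, n_features)
y = np.random.randint(0, n_classes, size=n_samples)   # one class index per sample

# The 1-of-K vector for a single sample is all zeros with a single one;
# libraries normally only need the index y, not the full vector.
onehot = np.zeros(n_classes)
onehot[y[0]] = 1

# Rough weight count with one hidden layer of 500 units:
# the output layer (500 * 27,680 weights) dominates
print(n_features * 500 + 500 * n_classes)

clf = MLPClassifier(hidden_layer_sizes=(500,), max_iter=5)
clf.fit(X, y)   # expect a long run and a convergence warning with max_iter=5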

What does the value of the gradient during NN training signify? [closed]

The default value for the gradient descent approach is 1e-5.
Is this a very small value for generalizing to a test set? What range should I keep it in?
Does the gradient signify the error between the targets and the predicted class during TRAINING (i.e. using the training data)?
If you're not using regularization, you should check several values for the learning rate and several values for the number of iterations. You should do this on a hold-out set (also called a validation set). If you're using regularization, you should not do this; instead, try several values for the weight of the regularization term (usually C or lambda).
As for values, people typically try learning rates from 2^-10 to 2^-1. It is also generally useful if your feature values are in a reasonable numerical range, e.g. from -1 to 1 or from 0 to 1.
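A rough sketch of that search (Python/scikit-learn just for illustration; the 2^-10 .. 2^-1 values come from the range above, everything else is a placeholder):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data; replace with your own features and labels
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

best = None
# Learning rates in the 2^-10 .. 2^-1 range, plus a few iteration budgets
for lr in [2 ** -k for k in range(1, 11)]:
    for iters in (50, 200, 800):
        net = MLPClassifier(hidden_layer_sizes=(10,), learning_rate_init=lr,
                            max_iter=iters, random_state=0)
        net.fit(X_train, y_train)
        score = net.score(X_val, y_val)   # evaluate on the hold-out set only
        if best is None or score > best[0]:
            best = (score, lr, iters)

print("best validation accuracy %.3f with lr=%g, iters=%d" % best)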

Clustering using K-means algorithm for documents [closed]

How do I calculate the distance between two documents? In k-means for numbers you have to calculate the distance between two points. I know that I can use the cosine function.
I want to perform clustering on RSS documents. I have done stemming and removed the stop words from the documents, and I have counted the frequency of each word in each document. Now I want to implement the k-means algorithm.
I'm assuming that your difficulty is in creating the feature vector? Create a feature vector for each document by
collecting all words together to form one giant vocabulary vector, and
setting the elements of that vector to the term counts for that document.
For example, if you have
Document 1 = the quick brown fox jumped over the brown dog
Document 2 = the brown cows eat hippo meat
then the total set of words is [the, quick, brown, fox, jumped, over, dog, cows, eat, hippo, meat] and the document vectors are
Document 1 = [2,1,2,1,1,1,1,0,0,0,0]
Document 2 = [1,0,1,0,0,0,0,1,1,1,1]
And now you have two giant feature vectors representing the documents, and you can use k-means clustering on them. As others have said, Euclidean distance can be used to calculate the distance between documents.
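A short sketch of the same idea in Python (scikit-learn and SciPy are my choice here, not something from the question):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
from scipy.spatial.distance import euclidean, cosine

docs = [
    "the quick brown fox jumped over the brown dog",
    "the brown cows eat hippo meat",
]

# Build the giant term vector and one count vector per document
vec = CountVectorizer()
X = vec.fit_transform(docs).toarray()
print(vec.get_feature_names_out())  # the combined vocabulary
print(X)                            # per-document term counts

# Distances between the two document vectors
print("euclidean:", euclidean(X[0], X[1]))
print("cosine:", cosine(X[0], X[1]))

# k-means on the document vectors (k = 2 here just for illustration)
km = KMeans(n_clusters=2, n_init=10, random_state=0)
print(km.fit_predict(X))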
There are various distance functions. One is the Euclidean distance.
You can use the Euclidean distance formula for an n-dimensional space:
sqrt((x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2 + ...)

Do you know a good and efficient FFT? [closed]

I am trying to find a very fast and efficient Fourier transform (FFT). Does anyone know of any good ones? I need to run it on the iPhone, so it must not be too intensive. Alternatively, maybe you know of one that is wavelet-like; I need frequency resolution, but only over a narrow band (the vocal audio range up to 10 kHz max... even 10 kHz might be too high). I'm also thinking of truncating the FFT to keep the frequency resolution while eliminating the unwanted frequency band. This is for an iPhone.
...I have taken a look at the FFT in aurioTouch, but it seems to be an integer FFT while my app uses floats. Would it give a big performance increase to try to adapt the program to an int FFT, or not (which I really don't feel like doing... plus aurioTouch uses a radix-2 FFT, which is not that great)?
The iPhone OS4 SDK will include the Accelerate framework, which will (finally) give us Apple-written FFT functions:
Accelerate provides hundreds of mathematical functions optimized for iPhone and iPod touch, including signal-processing routines, fast Fourier transforms, basic vector and matrix operations, and industry-standard functions for factoring matrices and solving systems of linear equations.
I've wrapped Ooura's FFT library in Objective-C. Ooura's code is of comparable performance to FFTW, but totally and utterly free.
This code uses double precision and has several built-in window types (rectangular, Blackman, triangle, Hamming). I use Ooura's FFT code to implement Welch's method, which generates a much smoother spectrum when viewed over time.
Check it out at:
http://github.com/alexbw/iPhoneFFT
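The wrapper itself is Objective-C, but just to illustrate what Welch's method buys you, here is a rough Python/SciPy equivalent (the 44.1 kHz sample rate and the test tone are made up for the example):

import numpy as np
from scipy.signal import welch

fs = 44100                                   # assumed sample rate
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(t.size)

# Welch's method: average the spectra of overlapping windowed segments,
# which smooths out the noise compared to a single long FFT
f, Pxx = welch(x, fs=fs, window="hamming", nperseg=1024)
print(f[np.argmax(Pxx)])                     # peak should land near 440 Hz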
Give the Fastest Fourier Transform in the West (FFTW) a go. The performance is good compared to others, but it is not completely free; see the details on commercial use here. Being a C library, you should have no problem linking it as a static library into your iPhone app.
The performance of FFTW sets the standard for arbitrary-length FFTs - especially for non-power-of-2 lengths in 2 and more dimensions. The commercial license for FFTW is $5000, which may or may not fit your budget.
However, it sounds like you have a 1D signal-processing problem, in which case you have a few more options - and if you can pad or resample your data to power-of-2 lengths, many libraries will offer reasonable performance. Check out this list of FFT algorithms that FFTW used for comparison - many are free and some may be adequate. I'd probably start with good old Numerical Recipes, which offers an easy power-of-2 1D FFT implementation for free (plus some typing) and would be very memory efficient.
BTW - for voice you probably only need to go up to 3-4 kHz... 10 kHz is way, way up there for the human voice.
Here is a primary source link to Ooura's numerical software:
http://www.kurims.kyoto-u.ac.jp/~ooura/
I have been using many of Ooura's FFTs over the years (I should send him a "domo" at the very least), and I use his real radix-4 in several iPad and iPhone applications under development. I translated the code to operate on 32-bit single precision for performance on ARM. Looking at the assembly produced with Xcode 3.2.2, it vectorizes with NEON SIMD instructions very nicely. I was half disappointed, actually, as I was willing to vectorize the code a bit myself for even more performance. These optimizations cannot be had without first translating the FFT to single precision, obviously.
Although I have used Objective-C for many years, actively develop with it, and have even taught an object-oriented programming course using it, I did not prepare such a wrapper (though I had done the same back in 1992 with a different FFT) for performance reasons.
I haven't tested FFTW against Ooura's FFT in at least 10 years, but when I did, Ooura's library was faster for 1024-point real FFTs. However, it is quite possible that FFTW does much better now - but licensing it and cross-compiling it for ARM is inconvenient, and I have always found FFTW far too bulky and obtrusive for my DSP needs. Apple's vecLib is very nice, but unfortunately they have not ported it to iPhone OS. I opened a feature request in Bug Reporter and you can too: https://bugreport.apple.com/
As answered before, the Accelerate Framework now provides some APIs that might help you.
Check:
Accelerate Framework Reference
vDSP Reference
Using Fourier Transforms