Such a problem: I've trained some ann using MSE stop function up to "desired error" 10^-5 (5MB of training data, 15000 input items,long training period -- about a day). I've got 0 bit fail during training. I've saved the ann to a file.
Then I loaded the net from the file, and check it on the same training data. And sometimes I'm getting bit fail up to 5 (not so seldom, BTW!).
What is this? Does anybody meet such a phenomenon?
I suspect, this is the rounding artefact: many thousands of weights saved to the file in text format and loaded back...
Solved.
MSE after fann_reset_MSE() and fann_test_data() has no relation to the error returned by fann_train(). If the ANN is trained up to very low MSE, then fann_get_MSE() and fann_get_bit_fail() are more or less in agreement with values returned by these functions ater fann_reset_MSE() and fann_test_data(). If not (ANN is not trained well), then these values might differ in orders of magnitude.
Related
In Python I am working on a binary classification problem of Fraud detection on travel insurance. Here is the characteristic about my dataset:
Contains 40,000 samples with 20 features. After one hot encoding, the number of features is 50(4 numeric, 46 categorical).
Majority unlabeled: out of 40,000 samples, 33,000 samples are unlabeled.
Highly imbalanced: out of 7,000 labeled samples, only 800 samples(11%) are positive(Fraud).
Metrics is precision, recall and F2 score. We focus more on avoiding false positive, therefore high recall is appreciated. As preprocessing I oversampled positive cases using SMOTE-NC, which takes into account categorical variables as well.
After trying several approaches including Semi-Supervised Learning with Self Training and Label Propagation/Label Spreading etc, I achieved high recall score(80% on training, 65-70% on test). However, my precision score shows some trace of overfitting(60-70% on training, 10% on testing). I understand that precision is good on training because it's resampled, and low on test data because it directly reflects the imbalance of the classes in test data. But this precision score is unacceptably low so I want to solve it.
So to simplify the model I am thinking about applying dimensionality reduction. I found a package called prince which comes with FAMD(Factor Analysis for Mixture Data).
Question 1: How I should do normalization, FAMD, k-fold Cross Validation and resampling? Is my approach below correct?
Question 2: The package prince does not have methods such as fit or transform like in Sklearn, so I cannot do the 3rd step described below. Any other good packages to do fitand transform for FAMD? And is there any other good way to reduce dimensionality on this kind of dataset?
My approach:
Make k folds and isolate one of them for validation, use the rest for training
Normalize training data and transform validation data
Fit FAMD on training data, and transform training and test data
Resample only training data using SMOTE-NC
Train whatever model it is, evaluate on validation data
Repeat 2-5 k times and take the average of precision, recall F2 score
*I would also appreciate for any kinds of advices on my overall approach to this problem
Thanks!
My data set has 150 independent variables and 10 predictors or response. The problem is to find a mapping between input and output variables. There are 1000 data points out of which 70% I have used for training and 30% for testing. I am using a feedforward neural network with 10 hidden neurons as explained in this Matlab document . I am evaluating the performance using the command
perf_Train = perform(net,TrainedData',lblTrain')
YPred = net(XTest);
perf_Test = perform(net,YPred,lblTest')
which basically gives the mean square error between the actual and the predicted (estimated) response for training and testing. My testing data is not able to fit properly to the trained model, however the training data fits quite well.
Problem1: My training performance is always lesser than test performance measure i.e., perf_Train = 0.0867 and perf_Test = 0.567
Is this overfitting or underfitting?
Problem2: How do I make the test data fit accurately? Theory say that to overcome overfitting and underfitting, we need to do regularization. Is there any parameter that needs to be input into the function such as regularization to overcome this?
It is overfitting since training error is lower than test error.
I would recommend to set less epochs(iteration) for your training or use less training data.
I would also recommend to check that the training data and test data are picked up randomly.
For regulation, it can be set like this:
net.performParam.regularization = 0.5;
The performance ratio depends on the model, 0.5 is just an example.
For more details, you can refer to the documentation below.
https://www.mathworks.com/help/deeplearning/ug/improve-neural-network-generalization-and-avoid-overfitting.html#bss4gz0-38
I trained a Neural Network with a GA and with Backpropagation. The GA finds suitable weights for the training data but performs poorly on the test data. If I train the NN with BackPropagation, it performs much better on the test data even though the training error isn't much smaller than for the GA trained version. Even when I use the weights obtained by the GA as initial weights for Backpropagation, the NN performs worse on the test data than using only Backpropagation for training. Can anyone tell me where I could have made a mistake?
I suggest you read something about overfitting. In short you will be excelent at training set but poor at testing set(because NN follows anomaly and uncertainity and datas). Task of NN is generalize, but GA only perfect minimize error in training set(to be fair, this is GA task).
There are some methods how to deal with overfitting. I suggest you use validation set. First step is division your data into the three sets. Training testing and validation. Method is simple, you will train your NN with GA to minimalize error on training set, but you also run your NN on validation set, only run, not train. Error of network decrease on training set, but error should also decrease at validation set. So if error decrease at training set, but start increase at validation set, you must stop with learning(please don't stop at first iterations).
Hope it will be helpful.
I have encountered a similar problem, and the choice of the initial values of the neural network does not seem to affect the final classification accuracy. I used the feedforwardnet() function in matlab to compare the two cases. One is direct training, and the program gives random initial weights and bias values. One is to find the appropriate initial weights values and bias values through the GA algorithm, and then assign them to the neural network, and then start training. However, the latter approach does not improve the accuracy of neural network classification.
I'm relatively new to Matlab ANN Toolbox. I am training the NN with pattern recognition and target matrix of 3x8670 containing 1s and 0s, using one hidden layer, 40 neurons and the rest with default settings. When I get the simulated output for new set of inputs, then the values are around 0 and 1. I then arrange them in descending order and choose a fixed number(which is known to me) out of 8670 observations to be 1 and rest to be zero.
Every time I run the program, the first row of the simulated output always has close to 100% accuracy and the following rows dont exhibit the same kind of accuracy.
Is there a logical explanation in general? I understand that answering this query conclusively might require the understanding of program and problem, but its made of of several functions to clearly explain. Can I make some changes in the training to get consistence output?
If you have any suggestions please share it with me.
Thanks,
Nishant
Your problem statement is not clear for me. For example, what you mean by: "I then arrange them in descending order and choose a fixed number ..."
As I understand, you did not get appropriate output from your NN as compared to the real target. I mean, your output from NN is difference than target. If so, there are different possibilities which should be considered:
How do you divide training/test/validation sets for training phase? The most division should be assigned to training (around 75%) and rest for test/validation.
How is your training data set? Can it support most scenarios as you expected? If your trained data set is not somewhat similar to your test data sets (e.g., you have some new records/samples in the test data set which had not (near) appear in the training phase, it explains as 'outlier' and NN cannot work efficiently with these types of samples, so you need clustering approach not NN classification approach), your results from NN is out-of-range and NN cannot provide ideal accuracy as you need. NN is good for those data set training, where there is no very difference between training and test data sets. Otherwise, NN is not appropriate.
Sometimes you have an appropriate training data set, but the problem is training itself. In this condition, you need other types of NN, because feed-forward NNs such as MLP cannot work with compacted and not well-separated regions of data very well. You need strong function approximation such as RBF and SVM.
I have created a neural network and the performance is good. By using nprtool, we are allow to test the network with an input data and target data. Here is my question, what is the purpose of testing a neural network with target data provided? Isn't it testing should not hav e target data so that we can know how well can the trained neural network perform without target data is given? Hope someone will respond to this, thanks =)
I'm not familiar with nprtool, but I suspect it would give the input data to your neural network, and then compare your NN's output data with the target data (and compute some kind of success rate based on that).
So your NN will never see the target data, it's just used to measure the performance.
It's like the "teacher's edition" of the exercise books in school. The student (i.e. the NN) doesn't have the solutions, but her/his answers will be compared against them by the teacher (i.e. nprtool). (Okay, the teacher probably/hopefully knows the subject, but you get the idea.)
The "target" data t is the desired y of y=net(x) used as example to train the network.
What nprtool do is to divide the training set into three groups: the training set, the validation set and the test set.
The first one is used to actually update the network.
The second one is used to determine the performances of the net (note: this set is NOT used in any way to update the network): as the NN "learns" the error (as difference between the t and net(x)) over the validation set decreases. The trend will eventually stop or even reverse: this phenomena is called "overfitting", which means the NN is now chasing the training set, "memorizing" it at the cost of the ability to generalize (meaning: to perform well with unseen data). So the purpose of this validation set is to determine when to stop the training before the NN starts overfitting. This should answer your question.
Finally third set is for external testing, to leave you a set of data untouched by the training procedure.
Even though the total data set [training, validation and testing] are inputs to the training algorithm, the testing data is in no way used to design (i.e., train and validate) the net
total = design + test
design = train + validate
The training data is used to estimate weights and biases
The validation data is used to monitor the design performance on nontraining data. REGARDLESS OF THE PERFORMANCE ON TRAINING DATA, if validation performance degrades continuously for 6 (default) epochs, training is terminated (VALIDATION STOPPING).
This mitigates the dreaded phenomenon of OVERTRAINING AN OVERFIT NET where performance on nontraining data degrades even if the training set performance is improving.
An overfit net has more unknown weights and biases than training equations, thereby allowing an infinite number of solutions. A simple example of overfitting with two unknowns but only one equation:
KNOWN: a, b, c
FIND: unique x1 and x2
USING: a * x1 + b * x2 = c
Hope this helps.
Greg