Neural network preprocessing - neural-network

I'm working on school project about data prediction in NN I have my data normalized and I have three input and one output
My questions is
what is the different between the taring data and test data (is the training data supposed to be the input data and the test the output data)
what is testing rate is it any random number or is there rule to find it
what is training error
and my final question is after training my data I remember something about error I'm not quite sure but do I need to find the error of my prediction and how to find it
I know my questions might not be clear but I'm just confused and tried to explain it as much as I can

Answering in a school spirit: Let's suppose you are given 10 solved exercises to study. You do study them, and then the teacher tests you on these exact exercises. You do well on the test. However, there is an important question. Why did you do well?? Did you really understand the exercises, or did you just memorize them?? And how can the teacher know ??
There is only one way: The teacher must test you on a set of similar but different exercises. If you also do well on them, you have gotten a feel for the subject, and you are able to generalize the knowledge you acquired. If not, you probably memorized them, without understanding a thing. This kind of knowledge is useless.
The same happens with neural networks. You use some patterns to (training set) to train them. But, to check if they are able to generalize, you have to test them on a different set of patterns (test set) without the network knowing the correct answers. Ideally, you should have small differences in performance between the two sets, that is good generalization ability.
So, both train and tests sets are inputs, not outputs. The only difference is when you use them, the training set during the training, and the test set after it. The training/test set rate is the percentage you got correct of the training/test sets respectively. The training/test error is the complementary, that is, the percentage you got wrong.

I know this reply might come late but I will just complement the previous answer by saying that in supervised learning both the training set and the test set are input-output pairs. By structure alone they are exactly the same, a set of input and their corresponding output(or label) pairs. There is no difference in structure between both.
As blue_note said, they are just used in different occasions: one during training and one after that

Related

Neural Network checkpoints?

I am new to Neural Network and I dont know what exactly to search on google for solution,here is my problem ,if you kindly please let me know what I am looking for,
So I am working on a project where,it will have many contributors over time,and each contributor will write a new line on excel file and then run the code to train dataset,
if want to ask is that ,is there a way to save a checkpoint so each time the code don't have to train the whole dataset and just continue to train the new entries instead of starting from zero.
Please let me know what exactly I should google.
Kind regards
This is, as you guessed, extremely common and usually referred to as "fine-tuning". In your case, since the dataset barely changes between training runs, you can expect the model to be very similar, so you could initialize your weights to the weights of the previous best model and retrain for only a few epochs, likely with a small learning rate.
People usually do fine-tuning starting from a network trained on an entirely different dataset, so it's likely that you will find that use-case rather than yours, but it will work even better if you keep a very similar dataset.
"Continual learning without forgetting"

How to interpret the discriminator's loss and the generator's loss in Generative Adversarial Nets?

I am reading people's implementation of DCGAN, especially this one in tensorflow.
In that implementation, the author draws the losses of the discriminator and of the generator, which is shown below (images come from https://github.com/carpedm20/DCGAN-tensorflow):
Both the losses of the discriminator and of the generator don't seem to follow any pattern. Unlike general neural networks, whose loss decreases along with the increase of training iteration. How to interpret the loss when training GANs?
Unfortunately, like you've said for GANs the losses are very non-intuitive. Mostly it happens down to the fact that generator and discriminator are competing against each other, hence improvement on the one means the higher loss on the other, until this other learns better on the received loss, which screws up its competitor, etc.
Now one thing that should happen often enough (depending on your data and initialisation) is that both discriminator and generator losses are converging to some permanent numbers, like this:
(it's ok for loss to bounce around a bit - it's just the evidence of the model trying to improve itself)
This loss convergence would normally signify that the GAN model found some optimum, where it can't improve more, which also should mean that it has learned well enough. (Also note, that the numbers themselves usually aren't very informative.)
Here are a few side notes, that I hope would be of help:
if loss haven't converged very well, it doesn't necessarily mean that the model hasn't learned anything - check the generated examples, sometimes they come out good enough. Alternatively, can try changing learning rate and other parameters.
if the model converged well, still check the generated examples - sometimes the generator finds one/few examples that discriminator can't distinguish from the genuine data. The trouble is it always gives out these few, not creating anything new, this is called mode collapse. Usually introducing some diversity to your data helps.
as vanilla GANs are rather unstable, I'd suggest to use some version
of the DCGAN models, as they contain some features like convolutional
layers and batch normalisation, that are supposed to help with the
stability of the convergence. (the picture above is a result of the DCGAN rather than vanilla GAN)
This is some common sense but still: like with most neural net structures tweaking the model, i.e. changing its parameters or/and architecture to fit your certain needs/data can improve the model or screw it.

Neural Network Retraining

I am coding a simple Neural Network, but I have thought of one issue that is bothering me.
This NN is for finding categories in the input. To better understand this, say the categories are "the numbers" (0,1,2...9).
To implement this the output layer is 10 nodes. Say I train this NN with several input -output pairs and save the learned weights somewhere. As the learning process takes quite a lot of time, after that I go and take a break. Come fresh the next day and re-start learning with new input -output pairs. So fair so goo
But what happen if on that time, I decide that I want to recognize hexadecimals (0,1,...9,A,B,,,E,F)... ergo the categories are increasing.
I suspect that would imply changing the structure of the NN and therefore I should retrain the NN from scratch.
Is this so?
Any comment, advice or your share of experience will be greatly appreciated
EDIT: This question has been marked as duplicate. I read the other question and although similar, my question is more concrete. While the other question speaks in generalities and the answer also is quite general- mine is very concrete as I use an example:
If I train a NN to recognize decimal numbers and later on decide to add data to make it recognize hexadecimals, can this be possible? How? Do I have to retrain the whole NN? In other words, does the structure of the NN needs to stay stationary with 10 OR 16 outputs since the beginning?
I would very much appreciate for a concrete answer to this. Thanks
A few considerations
Your training set and testing set should have the same distribution
Unless you have some way of specifying sample weights like some algorithms can you should at all costs avoid training on biased data. This is true for machine learning in general, not only neural networks.
Resuming training from a previous session is equivalent of using good initial values
Technically, you're just using the previous network as initial value instead of a random value. You should keep training in the whole dataset as always, to avoid a biased network.
Short Answer
Yes, you should always retrain your network if by retrain, you mean doing a training routine with the full dataset.
If you just mean retrain as doing a really long training iteration, it isn't your choice anyway. You must always train the network until the training error and testing error (or cross validated error) converge. If you reuse the previously trained network, that will probably happen faster.
You see, this is true no matter what kind of model change. If you change the network architecture, or the dataset, or both (your example), or some other parameter.
Of course, if you change the network architecture, you're going to have a bit of trouble on reusing the previous network. You could reuse the learned parameters from nodes that were kept and randomly initialize the parameters for the new nodes.

neural network and a intrusion detection system

How do I approach the problem with a neural network and a intrusion detection system where by lets say we have an attack via FTP.
Lets say some one attempts to continuously try different logins via brute force attack on an ftp account.
How would I set the structure of the NN? What things do I have to consider? How would it recognise "similar approaches in the future"?
Any diagrams and input would be much appreciated.
Your question is extremely general and a good answer is a project in itself. I recommend contracting someone with experience in neural network design to help come up with an appropriate model or even tell you whether your problem is amenable to using a neural network. A few ideas, though:
Inputs need to be quantized, so start by making a list of possible numeric inputs that you could measure.
Outputs also need to be quantized and you probably can't generate a simple "Yes/no" response. Most likely you'll want to generate one or more numbers that represent a rough probability of it being an attack, perhaps broken down by category.
You'll need to accumulate a large set of training data that has been analyzed and quantized into the inputs and outputs you've designed. Figuring out the process of doing this quantization is a huge part of the overall problem.
You'll also need a large set of validation data, which should be quantized in the same way as the training data, but that should not take any part in the training, as otherwise you will simply force a correlation network that may well be completely meaningless.
Once you've completed the above, you can think about how you want to structure your network and the specific algorithms you want to use to train it. There is a wide range of literature on this topic, but, honestly, this is the simpler part of the problem. Representing the problem in a way that can be processed coherently is much more difficult.

Choose the right classification algorithm. Linear or non-linear? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I find this question a little tricky. Maybe someone knows an approach to answer this question. Imagine that you have a dataset(training data) which you don't know what it is about. Which features of training data would you look at in order to infer classification algorithm to classify this data? Can we say anything whether we should use a non-linear or linear classification algorithm?
By the way, I am using WEKA to analyze the data.
Any suggestions?
Thank you.
This is in fact two questions in one ;-)
Feature selection
Linear or not
add "algorithm selection", and you probably have three most fundamental questions of classifier design.
As an aside note, it's a good thing that you do not have any domain expertise which would have allowed you to guide the selection of features and/or to assert the linearity of the feature space. That's the fun of data mining : to infer such info without a priori expertise. (BTW, and while domain expertise is good to double-check the outcome of the classifier, too much a priori insight may make you miss good mining opportunities). Without any such a priori knowledge you are forced to establish sound methodologies and apply careful scrutiny to the results.
It's hard to provide specific guidance, in part because many details are left out in the question, and also because I'm somewhat BS-ing my way through this ;-). Never the less I hope the following generic advice will be helpful
For each algorithm you try (or more precisely for each set of parameters for a given algorithm), you will need to run many tests. Theory can be very helpful, but there will remain a lot of "trial and error". You'll find Cross-Validation a valuable technique.
In a nutshell, [and depending on the size of the available training data], you randomly split the training data in several parts and train the classifier on one [or several] of these parts, and then evaluate the classifier on its performance on another [or several] parts. For each such run you measure various indicators of performance such as Mis-Classification Error (MCE) and aside from telling you how the classifier performs, these metrics, or rather their variability will provide hints as to the relevance of the features selected and/or their lack of scale or linearity.
Independently of the linearity assumption, it is useful to normalize the values of numeric features. This helps with features which have an odd range etc.
Within each dimension, establish the range within, say, 2.5 standard deviations on either side of the median, and convert the feature values to a percentage on the basis of this range.
Convert nominal attributes to binary ones, creating as many dimensions are there are distinct values of the nominal attribute. (I think many algorithm optimizers will do this for you)
Once you have identified one or a few classifiers with a relatively decent performance (say 33% MCE), perform the same test series, with such a classifier by modifying only one parameter at a time. For example remove some features, and see if the resulting, lower dimensionality classifier improves or degrades.
The loss factor is a very sensitive parameter. Try and stick with one "reasonnable" but possibly suboptimal value for the bulk of the tests, fine tune the loss at the end.
Learn to exploit the "dump" info provided by the SVM optimizers. These results provide very valuable info as to what the optimizer "thinks"
Remember that what worked very well wih a given dataset in a given domain may perform very poorly with data from another domain...
coffee's good, not too much. When all fails, make it Irish ;-)
Wow, so you have some training data and you don't know whether you are looking at features representing words in a document, or genese in a cell and need to tune a classifier. Well, since you don't have any semantic information, you are going to have to do this soley by looking at statistical properties of the data sets.
First, to formulate the problem, this is more than just linear vs non-linear. If you are really looking to classify this data, what you really need to do is to select a kernel function for the classifier which may be linear, or non-linear (gaussian, polynomial, hyperbolic, etc. In addition each kernel function may take one or more parameters that would need to be set. Determining an optimal kernel function and parameter set for a given classification problem is not really a solved problem, there are only useful heuristics and if you google 'selecting a kernel function' or 'choose kernel function', you will be treated to many research papers proposing and testing various approaches. While there are many approaches, one of the most basic and well travelled is to do a gradient descent on the parameters-- basically you try a kernel method and a parameter set , train on half your data points and see how you do. Then you try a different set of parameters and see how you do. You move the parameters in the direction of best improvement in accuracy until you get satisfactory results.
If you don't need to go through all this complexity to find a good kernel function, and simply want an answer to linear or non-linear. then the question mainly comes down to two things: Non linear classifiers will have a higher risk of overfitting (undergeneralizing) since they have more dimensions of freedom. They can suffer from the classifier merely memorizing sets of good data points, rather than coming up with a good generalization. On the other hand a linear classifier has less freedom to fit, and in the case of data that is not linearly seperable, will fail to find a good decision function and suffer from high error rates.
Unfortunately, I don't know a better mathematical solution to answer the question "is this data linearly seperable" other than to just try the classifier itself and see how it performs. For that you are going to need a smarter answer than mine.
Edit: This research paper describes an algorithm which looks like it should be able to determine how close a given data set comes to being linearly seperable.
http://www2.ift.ulaval.ca/~mmarchand/publications/wcnn93aa.pdf