input data of deep learning based auto encoder - unsupervised-learning

Lots unsupervised deep learning methods for multivariate time series data have AE or GAN based. My question is that even though these algorithms are for unsupervised learning, the training only use normal data. Is there any solution to train datasets that has no label(cannot identify which dataset is normal or abnormal) to these networks?

Related

Can a neural network be trained with just a single class of training data?

I just want to know if a neural network can be trained with a single class of data set. I have a set of data that I want to train a neural network with. After training it, I want to give new data(for testing) to the trained neural network to check if it can recognize it as been similar to the training sample or not.
Is this possible with neural network? If yes, will that be a supervised learning or unsupervised.
I know neural networks can be used for classification if there are multiple classes but I have not seen with a single class before. A good explanation and link to any example will be much appreciated. Thanks
Of course it can be. But in this case it will only recognize this one class that you have trained it with. And depending on the expected output you can measure the similarity to the training data.
An NN, after training, is just a function. For classification problems you can imagine it as a function that takes data as input and returns an integer indicating to which class it belongs to. That being said, if you have only one class that can be represented by an integer value 1, and if training data is not similar to that class, you will get something like 1.555; It will not tel you that it belongs to another class, because you have introduced only one, but it will definitely give you a hint about its similarity.
NNs are considered to be supervised learning, because before training you have to provide both input and target, i. e. the expected output.
If you train a network with only a single class of data then It is popularly known as One-class Classification. There are various algorithms developed in the past like One-class SVM, Support Vector Data Description, OCKELM etc. Tax and Duin developed a MATLAB toolbox for this and it supports various one-class classifiers.
DD Toolbox
One-class SVM
Kernel Ridge Regression based or Kernelized ELM based or LSSVM(where bias=0) based One-class Classification
There is a paper Anomaly Detection Using One-Class Neural Networks
which combines One-Class SVM and Neural Networks.
Here is source code. However, I've had difficulty connecting the source code and the paper.

Applying Neural Network to forecast prices

I have read this line about neural networks :
"Although the perceptron rule finds a successful weight vector when
the training examples are linearly separable, it can fail to converge
if the examples are not linearly separable.
My data distribution is like this :The features are production of rubber ,consumption of rubber , production of synthetic rubber and exchange rate all values are scaled
My question is that the data is not linearly separable so should i apply ANN on it or not? is this a rule that it should be applied on linerly separable data only ? as i am getting good results using it (0.09% MAPE error) . I have also applied SVM regression (fitrsvm function in MATLAB)so I have to ask can SVM be used in forecasting /prediction or it is used only for classification I haven't read anywhere about using SVM to forecast , and the results for SVM are also not good what can be the possible reason?
Neural networks are not perceptrons. Perceptron is on of the oldest ideas, which is at most a single building block of neural networks. Perceptron is designed for binary, linear classification and your problem is neither the binary classification nor linearly separable. You are looking at regression here, where neural networks are a good fit.
can SVM be used in forecasting /prediction or it is used only for classification I haven't read anywhere about using SVM to forecast , and the results for SVM are also not good what can be the possible reason?
SVM has regression "clone" called SVR which can be used for any task NN (as a regressor) can be used. There are of course some typical characteristics of both (like SVR being non parametric estimator etc.). For the task at hand - both approaches (as well as any another regressor, there are dozens of them!) is fine.

Applying neural network to MFCCs for variable-length speech segments

I'm currently trying to create and train a neural network to perform simple speech classification using MFCCs.
At the moment, I'm using 26 coefficients for each sample, and a total of 5 different classes - these are five different words with varying numbers of syllables.
While each sample is 2 seconds long, I am unsure how to handle cases where the user can pronounce words either very slowly or very quickly. E.g., the word 'television' spoken within 1 second yields different coefficients than the word spoken within two seconds.
Any advice on how I can solve this problem would be much appreciated!
I'm currently trying to create and train a neural network to perform simple speech classification using MFCCs.
Simple neural networks do not have input lenght invariance and do not allow to analyze time series.
For classification of time series like a series of MFCC frames you can use a classifier with time invariance. For example you can use neural networks combined with hidden Markov models (ANN-HMM), gaussian mixture model with hidden markov models (GMM-HMM) or recurrent neural networks (RNN). Matlab implementation for RNN is here. Theano implementation is also available. You can find a detailed description of those structures in Google.
Speech recognition is not a simple thing to implement, it is better to use existing software like CMUSphinx

How to train on and make a serialized feature vector for a Neural Network?

By serialized i mean that the values for an input come in discrete intervals of time and that size of the vector is also not known before hand.
Conventionally the neural networks employ fixed size parallel input neurons and fixed size parallel output neurons.
A serialized implementation could be used in speech recognition where i can feed the network with a time series of the waveform and on the output end get the phonemes.
It would be great if someone can point out some existing implementation.
Simple neural network as a structure doesn't have invariance across time scale deformation that's why it is impractical to apply it to recognize time series. To recognize time series usually a generic communication model is used (HMM). NN could be used together with HMM to classify individual frames of speech. In such HMM-ANN configuration audio is split on frames, frame slices are passed into ANN in order to calculate phoneme probabilities and then the whole probability sequence is analyzed for a best match using dynamic search with HMM.
HMM-ANN system usually requires initialization from more robust HMM-GMM system thus there are no standalone HMM-ANN implementation, usually they are part of a whole speech recognition toolkit. Among popular toolkits Kaldi has implementation for HMM-ANN and even for HMM-DNN (deep neural networks).
There are also neural networks which are designed to classify time series - recurrent neural networks, they can be successfully used to classify speech. The example can be created with any toolkit supporting RNN, for example Keras. If you want to start with recurrent neural networks, try long-short term memory networks (LSTM), their architecture enables more stable training. Keras setup for speech recognition is discussed in Building Speech Dataset for LSTM binary classification
There are several types of neural networks that are intended to model sequence data; I would say most of these models fit into an equivalence class known as a recurrent neural network, which is generally any neural network model whose connection graph contains a cycle. The cycle in the connection graph can typically be exploited to model some aspect of the past "state" of the network, and different strategies -- for example, Elman/Jordan nets, Echo State Networks, etc. -- have been developed to take advantage of this state information in different ways.
Historically, recurrent nets have been extremely difficult to train effectively. Thanks to lots of recent work in second-order optimization tools for neural networks, along with research from the deep neural networks community, several recent examples of recurrent networks have been developed that show promise in modeling real-world tasks. In my opinion, one of the neatest current examples of such a network is Ilya Sutskever's "Generating text with recurrent neural networks" (ICML 2011), in which a recurrent net is used as a very compact, long-range n-gram character model. (Try the RNN demo on the linked homepage, it's fun.)
As far as I know, recurrent nets have not yet been applied successfully to speech -> phoneme modeling directly, but Alex Graves specifically mentions this task in several of his recent papers. (Actually, it looks like he has a 2013 ICASSP paper on this topic.)

Which predictive modelling technique will be most helpful?

I have a training dataset which gives me the ranking of various cricket players(2008) on the basis of their performance in the past years(2005-2007).
I've to develop a model using this data and then apply it on another dataset to predict the ranking of players(2012) using the data already given to me(2009-2011).
Which predictive modelling will be best for this? What are the pros and cons of using the different forms of regression or neural networks?
The type of model to use depends on different factors:
Amount of data: if you have very little data, you better opt for a simple prediction model like linear regression. If you use a prediction model which is too powerful you run into the risk of over-fitting your model with the effect that it generalizes bad on new data. Now you might ask, what is little data? That depends on the number of input dimensions and on the underlying distributions of your data.
Your experience with the model. Neural networks can be quite tricky to handle if you have little experience with them. There are quite a few parameters to be optimized, like the network layer structure, the number of iterations, the learning rate, the momentum term, just to mention a few. Linear prediction is a lot easier to handle with respect to this "meta-optimization"
A pragmatic approach for you, if you still cannot opt for one of the methods, would be to evaluate a couple of different prediction methods. You take some of your data where you already have target values (the 2008 data), split it into training and test data (take some 10% as test data, e.g.), train and test using cross-validation and compute the error rate by comparing the predicted values with the target values you already have.
One great book, which is also on the web, is Pattern recognition and machine learning by C. Bishop. It has a great introductory section on prediction models.
Which predictive modelling will be best for this? 2. What are the pros
and cons of using the different forms of regression or neural
networks?
"What is best" depends on the resources you have. Full Bayesian Networks (or k-Dependency Bayesian Networks) with information theoretically learned graphs, are the ultimate 'assumptionless' models, and often perform extremely well. Sophisticated Neural Networks can perform impressively well too. The problem with such models is that they can be very computationally expensive, so models that employ methods of approximation may be more appropriate. There are mathematical similarities connecting regression, neural networks and bayesian networks.
Regression is actually a simple form of Neural Networks with some additional assumptions about the data. Neural Networks can be constructed to make less assumptions about the data, but as Thomas789 points out at the cost of being considerably more difficult to understand (sometimes monumentally difficult to debug).
As a rule of thumb - the more assumptions and approximations in a model the easier it is to A: understand and B: find the computational power necessary, but potentially at the cost of performance or "overfitting" (this is when a model suits the training data well, but doesn't extrapolate to the general case).
Free online books:
http://www.inference.phy.cam.ac.uk/mackay/itila/
http://ciml.info/dl/v0_8/ciml-v0_8-all.pdf