Deeplearning4j Autoencoder

I couldn't find any full example of an autoencoder in the DL4J documentation. I see a good general description of autoencoders here, with a small piece of code for just the MultiLayerConfiguration, but the code is not complete. Is there any full example where a dataset is loaded, pre-processed, fed into the network, and a prediction is generated? For example, an example working with the MovieLens dataset, or any other. Thank you.

There is an example of a deep autoencoder using the MNIST dataset here:
https://deeplearning4j.konduit.ai/deeplearning4j/reference/auto-encoders
With code here:
https://github.com/eclipse/deeplearning4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/quickstart/modeling/feedforward/unsupervised/MNISTAutoencoder.java
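If you just want something end-to-end to start from, below is a condensed sketch in the spirit of that linked MNISTAutoencoder example: it loads MNIST, builds a small dense encoder/decoder stack, trains it to reconstruct its own input, and then "predicts" reconstructions for unseen test digits. The layer sizes, updater, batch size, and epoch count are illustrative, not tuned, and this is a sketch rather than the official example.

    import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
    import org.nd4j.linalg.learning.config.Adam;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    public class AutoEncoderSketch {
        public static void main(String[] args) throws Exception {
            // Load MNIST; the iterator already scales pixel values to [0, 1]
            DataSetIterator train = new MnistDataSetIterator(100, true, 12345);
            DataSetIterator test  = new MnistDataSetIterator(100, false, 12345);

            // 784 -> 250 -> 10 -> 250 -> 784 encoder/decoder stack
            MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                    .seed(12345)
                    .updater(new Adam(1e-3))
                    .list()
                    .layer(0, new DenseLayer.Builder().nIn(784).nOut(250).activation(Activation.RELU).build())
                    .layer(1, new DenseLayer.Builder().nIn(250).nOut(10).activation(Activation.RELU).build())
                    .layer(2, new DenseLayer.Builder().nIn(10).nOut(250).activation(Activation.RELU).build())
                    .layer(3, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                            .nIn(250).nOut(784).activation(Activation.SIGMOID).build())
                    .build();

            MultiLayerNetwork net = new MultiLayerNetwork(conf);
            net.init();
            net.setListeners(new ScoreIterationListener(100));

            // Train the autoencoder to reconstruct its own input (features used as labels)
            for (int epoch = 0; epoch < 3; epoch++) {
                while (train.hasNext()) {
                    DataSet ds = train.next();
                    net.fit(ds.getFeatures(), ds.getFeatures());
                }
                train.reset();
            }

            // "Prediction" here means reconstructing unseen test digits
            DataSet testBatch = test.next();
            INDArray reconstructed = net.output(testBatch.getFeatures());
            System.out.println("Reconstruction shape: " + java.util.Arrays.toString(reconstructed.shape()));
        }
    }

The same loading/training/output pattern carries over to other datasets such as MovieLens; only the DataSetIterator and the input/output sizes change.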

Related

Can we use autoencoders for text data?

I am doing a project on health care. I am going to train my autoencoders on the symptoms and the diseases, i.e. my input is in textual form. Will that work? (I am using RStudio.) Can anyone please help me with this?
You have to convert the text to vectors/numbers. Traditional approaches like bag-of-words and TF-IDF will work, but the newer neural word embeddings such as Word2Vec or an RNN language model are the best techniques for obtaining a numeric representation of text.
Use any neural word embedding technique to convert the text into numbers/vectors (word level: word2vec; document level: doc2vec).
These vectors come out with some fixed dimensionality, and to compress that representation into an even smaller dimension you can use an autoencoder (see the sketch after this answer).
Feel free to ask for any other information you need.
Consider using Python for these tasks, as it has the most up-to-date packages.
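To stay in the DL4J/Java ecosystem discussed elsewhere in this thread, here is a minimal sketch of the embedding step using DL4J's Word2Vec; gensim in Python plays the same role if you switch languages. The file name, the queried word, and all hyperparameters are purely illustrative. The resulting vectors are what you would then feed into an autoencoder to compress.

    import org.deeplearning4j.models.word2vec.Word2Vec;
    import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
    import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
    import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
    import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
    import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

    public class SymptomEmbeddingSketch {
        public static void main(String[] args) throws Exception {
            // One symptom/disease description per line in a plain-text file (hypothetical path)
            SentenceIterator sentences = new BasicLineIterator("symptoms.txt");
            TokenizerFactory tokenizer = new DefaultTokenizerFactory();
            tokenizer.setTokenPreProcessor(new CommonPreprocessor());

            // Learn 100-dimensional word vectors from the corpus
            Word2Vec vec = new Word2Vec.Builder()
                    .minWordFrequency(2)
                    .layerSize(100)
                    .windowSize(5)
                    .seed(42)
                    .iterate(sentences)
                    .tokenizerFactory(tokenizer)
                    .build();
            vec.fit();

            // Numeric representation of a word; these vectors (or averaged document
            // vectors) are the input you would compress further with an autoencoder
            double[] wordVector = vec.getWordVector("fever");
            System.out.println("Vector length: " + (wordVector == null ? 0 : wordVector.length));
        }
    }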
You can use an autoencoder on textual data, as explained here.
Autoencoders have usually worked better on image data, but recent approaches have adapted the autoencoder so that it also does well on text data.
Have a look at this; the code is also available on GitHub.

Clustering and classification

I need to perform clustering and classification on data that is present in a CSV file. The data is in the form of simple text containing vendor names.
Is there some free library available for this task?
Thanks,
Ashish
I don't fully understand what you mean, since clustering and classification are two different tasks, but you can do both with these libraries:
Python: scikit-learn
Java: Weka
First convert your dataset from CSV to ARFF using the following link.
http://www.cs.ccsu.edu/~markov/MDLclustering/MDLmanual.pdf
After doing this, please let me know what you expect from the data, as every algorithm in Weka shows somewhat different results.
Once you have converted the data, you can simply apply k-means or any other algorithm (a small Weka sketch follows below).
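As a rough illustration of that workflow with Weka's Java API, here is a sketch that loads a CSV of vendor names, saves an ARFF copy, turns the names into bag-of-words vectors, and runs k-means. The file names, the single-column layout, and the number of clusters are assumptions made for the example.

    import java.io.File;

    import weka.clusterers.SimpleKMeans;
    import weka.core.Instances;
    import weka.core.converters.ArffSaver;
    import weka.core.converters.CSVLoader;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    public class VendorClusteringSketch {
        public static void main(String[] args) throws Exception {
            // Load the CSV of vendor names (hypothetical file name);
            // force the name column to be treated as a free-text string attribute
            CSVLoader loader = new CSVLoader();
            loader.setSource(new File("vendors.csv"));
            loader.setStringAttributes("first");
            Instances data = loader.getDataSet();

            // Also write it out as ARFF, the format most Weka tools expect
            ArffSaver saver = new ArffSaver();
            saver.setInstances(data);
            saver.setFile(new File("vendors.arff"));
            saver.writeBatch();

            // Turn the free-text vendor names into bag-of-words feature vectors
            StringToWordVector toWords = new StringToWordVector();
            toWords.setInputFormat(data);
            Instances vectorized = Filter.useFilter(data, toWords);

            // Cluster into 5 groups (pick k to suit your data)
            SimpleKMeans kMeans = new SimpleKMeans();
            kMeans.setNumClusters(5);
            kMeans.buildClusterer(vectorized);
            System.out.println(kMeans);
        }
    }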

NuPIC on MNIST Dataset

I am a newbie. I think the idea of NuPIC is really cool and therefore wanted to apply a KNN classifier to NuPIC's output. I saw there is already a KNNClassifier object in Python. I am confused about the input pattern that I should use. In the case of the MNIST dataset I will have images, where each image is a 2D array of numbers and will be sparse. I understand that the output can be encoded using the categorical encoder in NuPIC, but there is no example of encoding an input that comes in the form of arrays.
Any help will be highly appreciated.
This might help: http://numenta.org/search.html?q=mnist. There are some good discussions on our mailing lists about MNIST.

Training a perceptron in MATLAB on a UCI data set

I'm required to train a perceptron in MATLAB to learn a classification data set (any, really). The only restriction is that the data set must come from the UCI Machine Learning Repository. The problem is that I really have no idea where to begin, as my teacher is extremely bad at what he does and never explained it well. I've tried asking other classmates for help, but none of them seem to have the answers. I hope I can get help from this community, as it's my last chance. Thank you guys.
Well, we're really not your last chance. There are plenty of tutorials, examples, and resources findable easily from Google that would help (for example, search on "MATLAB perceptron iris" - the iris dataset is a famous example dataset, included in the UCI repository).
But here's a start. I'm assuming that if you've been set the task of training a perceptron in MATLAB, then you have access to Neural Network Toolbox (if you're asking how to implement a perceptron algorithm from scratch in MATLAB, look in a textbook).
Type doc nnet. That will bring up the documentation for Neural Network Toolbox. Then click through to the section labelled "Examples". Scrolling down to the bottom, there are several demos in a section called "Perceptrons". Try looking at the demos "Classification with a 2-Input Perceptron" or "Linearly Non-separable Vectors". Those demos use toy datasets, but should give you an idea of how to train a perceptron.
Then scroll up to the section "Pattern Recognition and Classification", and take a look at the demo "Wine Classification". The Wine dataset this demo uses is part of the UCI repository. Adapt and combine the demos you've now learnt from, to create an example your prof will like.
Neural Network Toolbox also comes with the Iris dataset that is part of the UCI repository. You may also find a demo somewhere that uses this as an example.
Hope that helps!
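The toolbox demos above are the intended route, but if it helps to see what "training a perceptron" actually boils down to, here is the bare learning rule in plain Java rather than MATLAB, on a tiny made-up two-feature data set. It is only meant to convey the update rule (adjust weights on misclassified examples), not to replace the toolbox workflow.

    public class PerceptronSketch {
        public static void main(String[] args) {
            // Tiny linearly separable toy set: two features, labels -1 / +1 (illustrative values)
            double[][] x = { {2.0, 1.0}, {1.5, 2.0}, {-1.0, -1.5}, {-2.0, -0.5} };
            int[] y = { 1, 1, -1, -1 };

            double[] w = new double[2];
            double bias = 0.0;
            double learningRate = 0.1;

            // Classic perceptron rule: update weights only when an example is misclassified
            for (int epoch = 0; epoch < 20; epoch++) {
                for (int i = 0; i < x.length; i++) {
                    double activation = w[0] * x[i][0] + w[1] * x[i][1] + bias;
                    int predicted = activation >= 0 ? 1 : -1;
                    if (predicted != y[i]) {
                        w[0] += learningRate * y[i] * x[i][0];
                        w[1] += learningRate * y[i] * x[i][1];
                        bias += learningRate * y[i];
                    }
                }
            }
            System.out.printf("w = [%.2f, %.2f], bias = %.2f%n", w[0], w[1], bias);
        }
    }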

neural network data for training and testing

I have a question regarding training and testing data for my ANN.
Should the testing data go through a feature extraction process before it can be classified?
I am new to this field. Is what I am doing right?
I split the dataset into 80% training and 20% test data, and I extract features from both sets. The training data goes into the training network, but the test data does not; it goes straight to classification. Is this correct? My supervisor said the test data should not go through the feature extraction process, but I wonder how the ANN can recognize the input if no specific features are extracted. Apologies for my bad English.
If anyone has a link or journal I can refer to, please provide it.
Thanks a lot.
Both the training data and the test data need to be in the same format, so your training and test sets should go through the same pre-processing steps; otherwise your network will not perform correctly.
You are doing it right (as far as I understand your question).
Example: if you showed me 10 images of faces (training data) on paper and then presented 2 people (test data) by their name only (a different feature representation), I wouldn't be able to classify what I didn't learn. You can't train the network with images and then test it with audio or any representation other than the one you used for training. I can't link any papers for that, as it's just common sense.
You can modify the training set, e.g. by adding noise, but whatever you do, the representation format has to be the same (see the sketch below).
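As a concrete illustration of "fit the preprocessing on the training data, then apply the same transform to both sets", here is a small sketch using ND4J's NormalizerStandardize as a stand-in for whatever feature-extraction step you use; the random data and shapes are placeholders.

    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;
    import org.nd4j.linalg.factory.Nd4j;

    public class SamePreprocessingSketch {
        public static void main(String[] args) {
            // Stand-ins for your 80% / 20% split (random data, purely illustrative)
            DataSet trainData = new DataSet(Nd4j.rand(80, 10), Nd4j.rand(80, 2));
            DataSet testData  = new DataSet(Nd4j.rand(20, 10), Nd4j.rand(20, 2));

            // Fit the preprocessing statistics on the TRAINING data only...
            NormalizerStandardize normalizer = new NormalizerStandardize();
            normalizer.fit(trainData);

            // ...then apply the SAME transform to both training and test data,
            // so the network sees features in an identical representation.
            normalizer.transform(trainData);
            normalizer.transform(testData);

            System.out.println("Train features mean after scaling: " + trainData.getFeatures().meanNumber());
        }
    }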