Autoencoders: Papers and Books regarding Training Algorithms

What are some of the well-known research papers and/or books that concern autoencoders and the various training algorithms used for them?
I'm talking about research papers and/or books that lay the foundation for the different training algorithms used to train autoencoders.

I first saw autoencoders in Fahlman's 1988 article, where he introduces quickpropagation for training them. The full citation is below.
Fahlman, S. E. (1988) "Faster-Learning Variations on Back-Propagation: An Empirical Study" in Proceedings, 1988 Connectionist Models Summer School, Morgan-Kaufmann, Los Altos CA. (This paper introduced the Quickprop learning algorithm.)
I also wrote the following example around it, including quickprop:
https://github.com/encog/encog-java-examples/blob/master/src/main/java/org/encog/examples/neural/benchmark/FahlmanEncoder.java
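For a feel of what quickprop actually does, here is a minimal sketch of the per-weight update rule from the paper in Python/NumPy. The surrounding network and gradient computation are assumed to exist elsewhere; 1.75 is the maximum growth factor Fahlman suggests as a default:

```python
import numpy as np

def quickprop_step(grad, prev_grad, prev_step, lr=0.1, mu=1.75):
    """Return the weight change for one quickprop iteration (Fahlman 1988).

    grad, prev_grad -- dE/dw at the current and previous step
    prev_step       -- the previous weight change
    mu              -- maximum growth factor (Fahlman suggests 1.75)
    """
    # Quickprop fits a parabola per weight and jumps to its minimum:
    # delta_w(t) = S(t) / (S(t-1) - S(t)) * delta_w(t-1)
    denom = prev_grad - grad
    safe = np.where(np.abs(denom) > 1e-12, denom, 1e-12)
    step = grad / safe * prev_step
    # Never let a step grow by more than a factor of mu over the last one.
    step = np.clip(step, -mu * np.abs(prev_step), mu * np.abs(prev_step))
    # On the first iteration (or after a zero step), fall back to a plain
    # gradient-descent step.
    return np.where(np.abs(prev_step) > 1e-12, step, -lr * grad)
```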

Related

Encoding a graph-like physical structure for genetic algorithms

I am looking for research/literature on encoding physical graph-like structures, such as bridges or buildings, as chromosomes for a genetic algorithm.
By graph-like I mean that the structure consists of nodes connected by edges, for example steel beams connected by welded joints. Research on mutation operators that work on shapes to modify them is also helpful. For these structures, the positions of the joints in space are important, not just the connections themselves.
I am familiar with bitstrings and real-valued encodings. Perhaps there are some similarities to genetic algorithm encodings for neural networks.
Graph structures for neural networks can be adapted to general structures. I just remembered that Combining Genetic Algorithms and Neural Networks: The Encoding Problem by Koehn discusses this in 2.2.2 Node-based Encoding and 2.3 Indirect Encoding. Also, the NEAT algorithm fits the problem description nicely.
Perhaps you will find this interesting:
Evolutionary Design of Analog Electrical Circuits Using Genetic Programming
(John R. Koza, Forrest H. Bennett III, David Andre, Martin A. Keane)
It describes a genetic programming approach for automatically designing an analog electrical circuit from a high-level statement of the circuit's desired behavior.
The technique is quite general: in the paper it's used to automate the design of both the topology and sizing of the circuit, but it could be adapted to other graph-like problems.
You would replace the three original component-creation functions for resistors, capacitors, and inductors with ones for steel beams...
The nine ways of changing internal connections can probably be used with minor modifications.
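To make the direct-encoding idea concrete, below is a minimal sketch of what such a chromosome and two mutation operators might look like in Python. Everything here is illustrative (the names, the truss example, the mutation rates); a fitness function, e.g. a structural simulation, is assumed to exist elsewhere:

```python
import random
from dataclasses import dataclass

@dataclass
class GraphGenome:
    """Direct encoding: joint positions plus beam connections."""
    nodes: list  # [(x, y), ...] joint positions in space
    edges: list  # [(i, j), ...] index pairs into nodes (steel beams)

def mutate_positions(g, sigma=0.1, rate=0.2):
    """Shape mutation: Gaussian perturbation of joint positions."""
    nodes = [(x + random.gauss(0, sigma), y + random.gauss(0, sigma))
             if random.random() < rate else (x, y)
             for x, y in g.nodes]
    return GraphGenome(nodes, list(g.edges))

def mutate_topology(g):
    """Connection mutation: add or remove a random beam."""
    edges = set(g.edges)
    i, j = random.sample(range(len(g.nodes)), 2)
    edge = (min(i, j), max(i, j))
    if edge in edges:
        edges.discard(edge)  # remove an existing beam
    else:
        edges.add(edge)      # weld a new beam between two joints
    return GraphGenome(list(g.nodes), sorted(edges))

# Example: a small two-bay truss skeleton
truss = GraphGenome(
    nodes=[(0, 0), (1, 0), (2, 0), (0.5, 1), (1.5, 1)],
    edges=[(0, 1), (1, 2), (0, 3), (1, 3), (1, 4), (2, 4), (3, 4)])
child = mutate_topology(mutate_positions(truss))
```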

Evaluation of user-based collaborative filtering K-Nearest Neighbor Algorithm

I was trying to find evaluation mechanisms for the user-based collaborative filtering K-nearest-neighbor algorithm, but I am confused about how to evaluate it. How can I be sure that the recommendations produced by this algorithm are correct or good? I have also developed an algorithm of my own that I want to compare against it, but I am not sure how to compare and evaluate the two. The dataset I am using is MovieLens.
Your help on evaluating this recommender system will be highly appreciated.
Evaluating recommender systems is a major concern of the field's research and industry communities. Look at "Evaluating Collaborative Filtering Recommender Systems" by Herlocker et al. The people who publish the MovieLens data (the GroupLens research lab at the University of Minnesota) also publish many papers on recommender-system topics, and the PDFs are often freely available at http://grouplens.org/publications/.
Check out https://scholar.google.com/scholar?hl=en&q=evaluating+recommender+systems.
In short, you should use a method that hides some data: train your model on a portion of the data (the "training data") and test it on the remainder, which the model has never seen before. There is a formal way to do this called cross-validation, but the general concept of visible training data versus hidden test data is the most important part.
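As a self-contained illustration of that idea, here is a toy sketch that hides 20% of the rating triples, predicts them with a user-based KNN (cosine similarity, similarity-weighted average), and reports RMSE. It uses no library, and every name in it is made up for the example:

```python
import math
import random
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two users' rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(r * r for r in u.values()))
           * math.sqrt(sum(r * r for r in v.values())))
    return num / den if den else 0.0

def predict(train, user, item, k=20):
    """User-based KNN: similarity-weighted average of neighbours' ratings."""
    neighbours = [(cosine(train[user], train[v]), train[v][item])
                  for v in train if v != user and item in train[v]]
    top = sorted(neighbours, reverse=True)[:k]
    den = sum(abs(s) for s, _ in top)
    return sum(s * r for s, r in top) / den if den else 3.0  # scale midpoint

def evaluate(ratings, test_frac=0.2, seed=0):
    """Hide test_frac of the (user, item, rating) triples and report RMSE."""
    random.seed(seed)
    random.shuffle(ratings)
    cut = int(len(ratings) * (1 - test_frac))
    train = defaultdict(dict)
    for u, i, r in ratings[:cut]:
        train[u][i] = r
    errs = [(predict(train, u, i) - r) ** 2
            for u, i, r in ratings[cut:] if u in train]
    return math.sqrt(sum(errs) / len(errs)) if errs else float("nan")
```

To compare your own algorithm against the KNN baseline, run both through the same `evaluate` harness on the same MovieLens triples and compare the resulting RMSE values.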
I also recommend https://www.coursera.org/learn/recommender-systems, a Coursera course on recommender systems taught by GroupLens folks. In that course you'll learn to use LensKit, a recommender systems framework in Java that includes a large evaluation suite. Even if you don't take the course, LensKit may be just what you want.

Classification in real time without prior knowledge of the number of classes

Is there an implemented algorithm (in Python, R, or Java by preference) that can classify incoming data from an unknown generator with absolutely no prior knowledge or assumptions?
For example:
Let G be a generator of 2-D vectors that emits one vector per second.
What we know, and nothing else, is that these vectors are separable into clusters in space (by Euclidean distance).
Question: how can I classify my data in real time so that at each iteration the algorithm proposes clusters?
I'm also in the process of searching for something related to data stream clustering, and I found some papers and code:
The survey by Charu C. Aggarwal mentioned in the other answer, from the book cited there;
Density-Based Clustering over an Evolving Data Stream with Noise by Feng Cao et al. proposes DenStream; here is a git repo for it (Matlab);
Density-Based Clustering for Real-Time Stream Data by Yixin Chen and Li Tu proposes the D-Stream framework (the 2008 version is called Stream data clustering based on grid density and attraction). There is also a DD-Stream, described in A grid and density-based clustering algorithm for processing data stream by Jia, for which I can't find a PDF;
A Fast and Stable Incremental Clustering Algorithm by Steven Young et al. focuses on clustering as an unsupervised learning process;
Self-Adaptive Anytime Stream Clustering by Philipp Kranen et al. introduces ClusTree, and this git repo implements DClusTree;
Pre-clustering algorithm for anomaly detection and clustering that uses variable size buckets by Manish Sharma et al. is more recent and may be relevant (git repo of the author);
This paper about MOA (Massive Online Analysis) states that it implements some of the above (StreamKM++, CluStream, ClusTree, DenStream, D-Stream, and CobWeb). I believe the D-Stream support is work in progress/wishful thinking (it is not part of the pre-release available from their website). MOA is written in Java; here is the streamMOA package.
The code in this repository seems to be a Python implementation of D-Stream but, according to the author, it is slow.
Also, stream is a framework for data stream clustering research in R.
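To make "at each iteration the algorithm proposes clusters" concrete, here is a deliberately naive leader-style sketch (not one of the algorithms above): an incoming vector joins the nearest existing cluster if it lies within a distance threshold, otherwise it founds a new one. Note that the threshold is exactly the kind of hidden assumption the other answer warns about:

```python
import math

class LeaderClusterer:
    """Naive one-pass streaming clustering, no preset number of clusters.

    A point within `radius` of an existing centroid updates that cluster;
    anything farther away starts a new cluster. `radius` is the (unavoidable)
    assumption about what "separable" means.
    """
    def __init__(self, radius):
        self.radius = radius
        self.centroids = []  # running mean of each cluster
        self.counts = []

    def add(self, x, y):
        best, best_d = -1, float("inf")
        for k, (cx, cy) in enumerate(self.centroids):
            d = math.hypot(x - cx, y - cy)
            if d < best_d:
                best, best_d = k, d
        if best_d <= self.radius:
            # Incrementally update the running mean of the nearest cluster.
            n = self.counts[best] + 1
            cx, cy = self.centroids[best]
            self.centroids[best] = (cx + (x - cx) / n, cy + (y - cy) / n)
            self.counts[best] = n
            return best
        self.centroids.append((x, y))  # found a new cluster
        self.counts.append(1)
        return len(self.centroids) - 1

# One label per incoming vector; clusters are proposed as the data arrives:
stream = [(0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (4.9, 5.0), (0.0, 0.2)]
clusterer = LeaderClusterer(radius=1.0)
labels = [clusterer.add(x, y) for x, y in stream]  # -> [0, 0, 1, 1, 0]
```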
I think you are asking about "Stream mining" here.
Read this chapter:
Charu C. Aggarwal (IBM T. J. Watson Research Center, Yorktown Heights, NY), "A Survey of Stream Clustering Algorithms", Chapter 10 in Data Clustering: Algorithms and Applications, edited by Charu C. Aggarwal and Chandan K. Reddy, 2014.
That chapter describes the "CluStream" framework. The project is from 2002 and is based on the BIRCH algorithm from 1997, a "micro-clustering" approach in which the algorithm creates an index structure on the fly.
Considering that there are few BIRCH implementations, there is probably no open-source CluStream algorithm/framework available. Here is a GitHub repo with a BIRCH implementation in Java, although I haven't tried that code, and the repo is not aimed at "stream mining".
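The "micro-clustering" idea behind BIRCH (and, by extension, CluStream) is easy to sketch: each micro-cluster is summarized by an additive clustering feature CF = (N, LS, SS), from which a centroid and radius can be recovered at any time. A toy Python version, illustrating only the summaries (not the tree or the stream logic):

```python
import math
import numpy as np

class ClusteringFeature:
    """BIRCH-style clustering feature: CF = (N, LS, SS).

    N  -- number of points absorbed so far
    LS -- element-wise linear sum of those points
    SS -- sum of squared norms of those points
    CFs are additive, which is what makes on-the-fly merging cheap.
    """
    def __init__(self, dim):
        self.n = 0
        self.ls = np.zeros(dim)
        self.ss = 0.0

    def absorb(self, x):
        """Fold one point into the summary in O(dim) time."""
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.ls += x
        self.ss += float(x @ x)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        """RMS distance of the absorbed points from the centroid."""
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - float(c @ c), 0.0))

    def merge(self, other):
        """Two summaries combine by plain addition."""
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss
```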
All this only appeared on my radar because I recently participated in the Coursera MOOC on cluster analysis.
There are no assumption-free methods.
You are asking for magic to happen.
Never blindly use a clustering result, and never run clustering unattended on a stream: analyze and correct any clustering result before deployment.
Watch out for hidden assumptions, for example that clusters are convex, that they are distance-based (why would Euclidean distance be the correct choice?), that they have the same size or extent, that they are separated (by what?), or that they have a particular shape. Whenever you design a method, you make assumptions about what is interesting.
Without assumptions, anything is a "clustering"!

Theoretically, can everyday computing tasks be broken down into ones solvable by a neural network?

MIT Technology Review recently published an article about a chip from IBM that is more or less an artificial neural network: Why IBM's New Brainlike Chip May Be "Historic" | MIT Technology Review.
The article suggests that the chip may have borrowed a page from the future: it might be the beginning of an era of new and evolved computing power. It also talks about programming for the chip:
One downside is that IBM's chip requires an entirely new approach to programming. Although the company announced a suite of tools geared toward writing code for its forthcoming chip last year (see "IBM Scientists Show Blueprints for Brainlike Computing"), even the best programmers find learning to work with the chip bruising, says Modha: "It's almost always a frustrating experience." His team is working to create a library of ready-made blocks of code to make the process easier.
Which brings me to the question: can everyday computing tasks be broken down into ones solvable by a neural network (theoretically and/or practically)?
It depends on the task.
There are plenty of tasks for which von Neumann computers are good enough: calculating precise values of functions over some range, applying a filter to an image, storing text in a database and reading it back, keeping prices of products, and so on. This is not an area where neural networks are needed. Of course, it is theoretically possible to train a neural network to choose where and how to save data, or to do accounting. But for cases where a large amount of data needs to be analyzed and current methods don't fit, a neural network can be an option to consider: speech recognition, for example, or picking pictures that will become masterpieces in the future.

Integrating content information with factorization-based collaborative filtering

I'm reading some papers on CF and noticed that most state-of-the-art methods are based on different factorization methods applied to the rating matrix only. I'd like to know whether there are representative works on combining content information (e.g., user features and item features) with factorization. Any ideas?
I am a researcher in the field of recommender systems, and did some work on exactly that. Here are some papers on that topic:
1. Aditya Krishna Menon, Charles Elkan: A Log-Linear Model with Latent Features for Dyadic Prediction, ICDM 2010
2. David Stern, Ralf Herbrich, Thore Graepel: Matchbox: Large Scale Bayesian Recommendations, WWW 2009
3. Chong Wang, David Blei: Collaborative Topic Modeling for Recommending Scientific Articles, KDD 2011
4. Zeno Gantner, Lucas Drumond, Christoph Freudenthaler, Steffen Rendle, Lars Schmidt-Thieme: Learning Attribute-to-Feature Mappings for Cold-Start Recommendations, ICDM 2010
5. D. Agarwal, B.-C. Chen: Regression-Based Latent Factor Models, KDD 2009
6. D. Agarwal, B.-C. Chen: fLDA: Matrix Factorization through Latent Dirichlet Allocation, WSDM 2010
Please note that (4) is a paper by me, so this is also some kind of advertisement ;-)
Also, the KDD Cup 2011 involved an item taxonomy, and there has been some interesting work on combining such taxonomy information with latent factor models at the workshop: http://kddcup.yahoo.com/workshop.php
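To give a flavour of what "combining content with factorization" looks like in models of this family, here is a deliberately simplified SGD sketch in which each item factor is a free latent vector plus a linear map of the item's content features. It is illustrative only, not the exact model of any paper above:

```python
import numpy as np

def train_content_mf(ratings, item_feats, n_users, n_items, k=10,
                     lr=0.01, reg=0.05, epochs=20, seed=0):
    """SGD matrix factorization with item content folded into the item factor.

    ratings    -- list of (user, item, rating) triples
    item_feats -- (n_items, d) item content feature matrix
    Predicts r_ui = p_u . (q_i + A^T x_i): the map A carries rating signal
    onto the content features, which helps with cold-start items (q_i unknown).
    """
    rng = np.random.default_rng(seed)
    d = item_feats.shape[1]
    P = rng.normal(0, 0.1, (n_users, k))  # user latent factors
    Q = rng.normal(0, 0.1, (n_items, k))  # free item latent factors
    A = rng.normal(0, 0.1, (d, k))        # attribute-to-feature map
    for _ in range(epochs):
        for u, i, r in ratings:
            x = item_feats[i]
            q = Q[i] + x @ A              # content-augmented item factor
            err = r - P[u] @ q            # prediction error for this rating
            P[u] += lr * (err * q - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
            A += lr * (err * np.outer(x, P[u]) - reg * A)
    return P, Q, A
```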
See for example "5. Hybrid Collaborative Filtering Techniques" in
X. Su, T. M. Khoshgoftaar, A Survey of Collaborative Filtering Techniques,
Advances in Artificial Intelligence (2009). PDF