GPy and GPflow mathematical background - references - gpflow

Do GPy and GPflow share a common mathematical background? I'm asking because I'm using GPy but cannot find the references it is based on, whereas GPflow provides references in its examples.
Is it OK to keep using GPy, or would you suggest switching to GPflow immediately for Gaussian process work?

GPy and GPflow definitely share a common mathematical background: Gaussian processes as described by Rasmussen and Williams. Many of the concepts are very similar in both frameworks: kernels, likelihoods, mean functions, inducing points, etc. For me, the biggest difference between GPy and GPflow is the computational backend: AFAIK GPy uses plain Python and NumPy to perform all its computations, whereas GPflow relies on TensorFlow. This gives GPflow several nice features for free: GPU acceleration, automatic gradients, compatibility with the TF ecosystem, etc. Depending on your use case, these features can be crucial or simply nice to have.
Here is more information on the technical differences between the two frameworks:
https://gpflow.readthedocs.io/en/master/intro.html#what-s-the-difference-between-gpy-and-gpflow
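To make the shared background concrete, here is a minimal NumPy sketch of exact GP regression using the Cholesky-based recipe described by Rasmussen and Williams. It is not code from GPy or GPflow: the names `rbf_kernel` and `gp_posterior`, the zero mean function, and the hyperparameter defaults are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by rounding before exponentiating.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def gp_posterior(x_train, y_train, x_test, noise_variance=1e-2):
    """Posterior mean and marginal variance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train)
    k_ts = rbf_kernel(x_train, x_test)
    k_ss_diag = np.diag(rbf_kernel(x_test, x_test))

    # Cholesky factor of the noisy training covariance: K + sigma^2 I = L L^T.
    chol = np.linalg.cholesky(k_tt + noise_variance * np.eye(len(x_train)))
    # alpha = (K + sigma^2 I)^{-1} y via two solves against L and L^T.
    # np.linalg.solve is a general solver; scipy.linalg.solve_triangular
    # would exploit the triangular structure.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))

    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    variance = k_ss_diag - np.sum(v**2, axis=0)
    return mean, variance


x_train = np.arange(8.0)[:, None]   # toy inputs 0, 1, ..., 7
y_train = np.sin(x_train).ravel()   # toy targets
x_test = np.array([[2.5]])
mean, variance = gp_posterior(x_train, y_train, x_test)
```

Both libraries wrap this same computation (plus hyperparameter optimisation) behind their model classes, so a basic GP regression should give comparable results in either.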

That would depend on what you are actually doing.
The very basic GPs should be similar; the main differences are that GPflow relies on TensorFlow for the gradients (if used), plus probably some technical implementation details.
For the more advanced models, both libraries provide references to the respective papers in the docs. In my opinion, GPflow's design is mainly centered around the SVGP framework from [1] and [2] (and many extensions). I can really recommend [2] if you are interested in the theory.
It still provides some other implementations, though.
I use GPflow since it works on the GPU and offers a lot of state-of-the-art implementations. However, the disadvantage is that it is changing a lot.
If you want to use classic GPs and are not too concerned with performance or very up-to-date methods, I'd say GPy should be sufficient and is the more stable option.
[1] Hensman, James, Alexander Matthews, and Zoubin Ghahramani. "Scalable variational Gaussian process classification." (2015).
[2] Matthews, Alexander Graeme de Garis. Scalable Gaussian process inference using variational methods. Diss. University of Cambridge, 2017.

Related

Feature selection for one class classification

I am trying to apply a One-Class SVM, but my dataset contains too many features and I believe feature selection would improve my metrics. Are there any feature selection methods that do not need class labels?
If so, and you are aware of an existing implementation, please let me know.
You'd probably get better answers asking this on Cross Validated instead of Stack Exchange, although since you ask for implementations I will answer your question.
Unsupervised methods exist that allow you to eliminate features without looking at the target variable. This is called unsupervised data (dimensionality) reduction. They work by looking for features that convey similar information and then either eliminate some of those features or reduce them to fewer features whilst retaining as much information as possible.
Some examples of data reduction techniques include PCA, redundancy analysis, variable clustering, and random projections, amongst others.
You don't mention which program you're working in but I am going to presume it's Python. sklearn has implementations for PCA and SparseRandomProjection. I know there is a module designed for variable clustering in Python but I have not used it and don't know how convenient it is. I don't know if there's an unsupervised implementation of redundancy analysis in Python but you could consider making your own. Depending on what you decide to do it might not be too tricky (especially if you just do correlation based).
In case you're working in R, finding versions of data reduction using PCA will be no problem. For variable clustering and redundancy analysis, great packages like Hmisc and ClustOfVar exist.
You can also read about other unsupervised data reduction techniques; you might find other methods more suitable.
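As a rough illustration of the "make your own, correlation-based" option mentioned above, here is a minimal sketch of an unsupervised redundancy filter. The function name `drop_correlated_features` and the 0.95 threshold are assumptions for this example, not part of any library, and the sketch assumes no column is constant (a constant column has an undefined correlation).

```python
import numpy as np


def drop_correlated_features(X, threshold=0.95):
    """Return the indices of the columns to keep, scanning left to right.

    A column is dropped when its absolute Pearson correlation with any
    already-kept column exceeds `threshold`. No class labels are used.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# The third column is a rescaled copy of the first, so it adds no information.
X = np.column_stack([a, b, 2.0 * a + 1.0])
kept = drop_correlated_features(X)
```

The reduced matrix `X[:, kept]` can then be passed to the One-Class SVM. Note that this only removes linearly redundant features; it says nothing about which features are useful for the task.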

Which architecture of Neural Network gives better accuracy for text classification?

From what I have tried and read, RNNs seem to give better results. Which should I use: LSTM, GRU, a traditional RNN, or a CNN?
The architectures you mention are really loose families of architecture. Performance depends on the details and (of course) the task. Moreover, the two styles are often combined in various ways, so it isn't really an "either-or" choice.
Nevertheless, at the time of writing, the Transformer-based BERT (feed-forward with attention, rather than recurrent) and the RNN-like ELMo architectures are popular. Pre-trained models and code are available for both, and both perform well across a variety of tasks, including classification. Why not try them both?
These architectures can be considered "vanilla", because many advanced architectures build on them. A newer one called ULMFiT is giving state-of-the-art results in classification and is simple to understand and implement using the fast.ai library. BERT is also a good one, but more complicated to understand in my opinion.

what is high performance version of LAPACK and BLAS?

This page of IMSL says:
"To obtain improved performance we recommend linking with High Performance versions of LAPACK and BLAS, if available."
What are high-performance versions of LAPACK and BLAS?
There are plenty of good implementations to pick from:
Intel MKL is likely the best on Intel machines. It's not free though, so that may be a problem.
According to their benchmark, OpenBLAS compares quite well with Intel MKL and is free.
Eigen is also an option and has a largish (albeit old) benchmark showing good performance on small matrices (though it's not technically a drop-in BLAS library).
ATLAS, OSKI and POSKI are examples of auto-tuned kernels which claim to work on many architectures.
Generally, it is quite hard to pick one of these without benchmarking because:
some implementations work better on different types of matrices. For example, Eigen works better on small matrices (dimensions in the 100s)
some are optimised for specific architectures (e.g. Intel's)
in some cases the multithreading of the BLAS library may conflict with a multithreaded application (e.g. OpenBLAS)
developer's benchmarks may tend to emphasise cases which work better on their implementation.
I would suggest picking one or two of these libraries that apply to your use case and benchmarking them for your particular application on your particular (or a similar) machine. This is quite easy to do even after compiling your code.
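As a sketch of what such a benchmark can look like, the snippet below times dense matrix products through NumPy, which hands them to whatever BLAS it was built against (`np.show_config()` reports which one). The helper name `time_matmul` and the matrix sizes are assumptions for illustration; for a real comparison you would time the operations and sizes your own application actually uses, once per BLAS build.

```python
import time

import numpy as np


def time_matmul(n, repeats=3, seed=0):
    """Best wall-clock time in seconds of an n x n float64 matrix product."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    a @ b  # warm-up so one-off initialisation is not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


for n in (128, 256, 512):
    seconds = time_matmul(n)
    # A dense n x n product costs roughly 2 * n^3 floating-point operations.
    gflops = 2.0 * n**3 / max(seconds, 1e-12) / 1e9
    print(f"n={n:4d}  {seconds * 1e3:8.3f} ms  {gflops:7.2f} GFLOP/s")
```

Running the same script against two environments that differ only in the linked BLAS gives a like-for-like comparison on your own hardware.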
LAPACK and BLAS are performance libraries that provide linear algebra operations, for example for solving systems of linear equations. Such libraries are useful in computer vision (object detection and classification), classical algorithms, modelling, and so on.
TASKING provides a full C implementation of the LAPACK and BLAS performance libraries. Both libraries are ISO C99 compliant, with full documentation and examples. You can check it here:
http://www.tasking.com/products/tasking-lapack-performance-libraries

Are there any implementations available online for filter based feature selection methods?

The selection methods I am looking for are the ones based on subset evaluation (i.e. do not simply rank individual features). I prefer implementations in Matlab or based on WEKA, but implementations in any other language will still be useful.
I am aware of the existence of CfsSubsetEval and ConsistencySubsetEval in WEKA, but they did not lead to good classification performance, probably because they suffer from the following limitations:
CfsSubsetEval is biased toward small feature subsets, which may prevent locally predictive features from being included in the selected subset, as noted in [1].
ConsistencySubsetEval uses a min-features bias [2] which, similarly to CfsSubsetEval, results in the selection of too few features.
I know it is "too few" because I have built classification models with larger subsets and their classification performance was considerably better.
[1] M. A. Hall, Correlation-based Feature Subset Selection for Machine Learning, 1999.
[2] Liu, Huan, and Lei Yu, Toward integrating feature selection algorithms for classification and clustering, 2005.
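For readers unfamiliar with subset evaluation, here is a minimal sketch of the merit heuristic behind correlation-based feature selection (CFS) from Hall [1]: merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The function name `cfs_merit` is hypothetical, and using absolute Pearson correlation as the correlation measure is an assumption of this sketch, not necessarily what WEKA computes.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS-style merit of the feature indices in `subset` (higher is better)."""
    k = len(subset)
    # Mean absolute feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    # Mean absolute feature-feature correlation over all distinct pairs.
    pairs = [(i, j) for n, i in enumerate(subset) for j in subset[n + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) for i, j in pairs])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = x0 + 0.05 * rng.normal(size=500)   # nearly a copy of x0
X = np.column_stack([x0, x1, x2])
y = x0 + x1

merit_complementary = cfs_merit(X, y, [0, 1])   # two independent, useful features
merit_redundant = cfs_merit(X, y, [0, 2])       # two near-duplicate features
```

The subset of complementary features scores higher than the redundant pair, which is the behaviour the heuristic is designed for; the denominator is also what penalises adding features that correlate with ones already selected.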
Check out scikit-learn for Python: simple and efficient tools for data mining and data analysis. It implements various methods for feature selection, classification and evaluation, and has a lot of documentation and tutorials.
My search has led me to the following implementations:
FEAST toolbox: an interesting toolbox, developed by the University of Manchester, that provides implementations of Shannon's information theory functions. The implementations can be downloaded from THIS webpage, and they can be used to evaluate individual features as well as subsets of features.
I have also found THIS MATLAB code, which is an implementation of a selection algorithm based on Interaction Information.
PY_FS: A Python Package for Feature Selection
I came across this package [1] which was just released (2021) and contains many methods with reference to their original papers.

Has anyone tried to compile code into neural network and evolve it?

Do you know if anyone has tried to compile high-level programming languages (Java, C#, etc.) into a recurrent neural network and then evolve them?
I mean that the whole process, including memory usage, is stored in the graph of a neural net, and I'm talking about complex programs (thinking about natural language processing problems).
When I say neural net, I mean a directed weighted graph that spreads activation, where the nodes are functions of their inputs (linear, sigmoid and multiplicative, to keep it simple).
Furthermore, is that what people mean in genetic programming or is there a difference?
Neural networks are not particularly well suited for evolving programs; their strength tends to be in classification. If anyone has tried, I haven't heard about it (which considering I barely touch neural networks is not a surprise, but I am active in the general AI field at the moment).
The main reason why neural networks aren't useful for generating programs is that they basically represent a mathematical equation (numeric, rather than functional). Given some numeric input, you get a numeric output. It is difficult to interpret these in the context of a program any more complicated than simple arithmetic.
Genetic programming traditionally uses Lisp, a functional language, and programs are often shown as tree diagrams (which occasionally look similar to some neural network diagrams - is this the source of your confusion?). The programs are evolved by exchanging entire branches of a tree (a function and all its parameters) between programs, or by regenerating an entire branch randomly.
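The two operators just described (exchanging branches and regenerating a branch) can be sketched in a few lines of Python. This is an illustrative toy, not a genetic programming library: the nested-tuple program representation and every function name here are assumptions made for the example.

```python
import random

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def evaluate(tree, x):
    """A leaf is 'x' or a number; an inner node is (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every subtree."""
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new_subtree):
    """Return a copy of tree with the branch at path swapped for new_subtree."""
    if not path:
        return new_subtree
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new_subtree)
    return tuple(node)


def crossover(parent_a, parent_b, rng):
    """Graft a random branch of parent_b onto a random point of parent_a."""
    target = rng.choice(list(paths(parent_a)))
    donor = get_subtree(parent_b, rng.choice(list(paths(parent_b))))
    return replace_subtree(parent_a, target, donor)


def random_tree(rng, depth=2):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", 1.0, 2.0])
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def mutate(tree, rng):
    """Regenerate one randomly chosen branch from scratch."""
    return replace_subtree(tree, rng.choice(list(paths(tree))), random_tree(rng))


rng = random.Random(0)
a = ("add", "x", ("mul", "x", 2.0))   # x + x * 2
b = ("mul", ("add", "x", 1.0), "x")   # (x + 1) * x
child = crossover(a, b, rng)
mutant = mutate(child, rng)
```

A full genetic programming run would wrap these operators in a loop that scores each program with a fitness function and keeps the better ones.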
There are certainly a lot of good (and a lot of bad) references on both of these topics out there - I refrain from listing them because it isn't clear what you are actually interested in. Wikipedia covers each of these techniques, and is a good starting point.
Genetic programming is very different from Neural networks. What you are suggesting is more along the lines of genetic programming - making small random changes to a program, possibly "breeding" successful programs. It is not easy, and I have my doubts that it can be done successfully across a large program.
You may have more luck extracting a small but critical part of your program, one which has a few particular "aspects" (such as parameter values) that you can try to evolve.
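As a sketch of evolving just a few parameter values, here is a simple mutate-and-select loop. It is a bare-bones evolution strategy rather than a full genetic algorithm (there is no crossover or population of parents), and the `fitness` function is a hypothetical stand-in for "run the critical part of the program with these parameters and score the result".

```python
import random


def evolve(fitness, initial, generations=200, offspring=20, sigma=0.3, seed=0):
    """Return the best parameter vector found and its fitness score."""
    rng = random.Random(seed)
    best = list(initial)
    best_score = fitness(best)
    for _ in range(generations):
        for _ in range(offspring):
            # Mutation: a small Gaussian change to every parameter.
            candidate = [p + rng.gauss(0.0, sigma) for p in best]
            score = fitness(candidate)
            # Selection: keep the candidate only if it is an improvement.
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


def fitness(params):
    """Hypothetical score: highest when threshold = 1.5 and gain = -0.5."""
    threshold, gain = params
    return -((threshold - 1.5) ** 2 + (gain + 0.5) ** 2)


best, score = evolve(fitness, initial=[0.0, 0.0])
```

This works because the search space is tiny and every candidate is a valid input; evolving the structure of a large program offers neither guarantee.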
Google is your friend.
Some sophisticated anti-virus programs as well as sophisticated malware use formal grammar and genetic operators to evolve against each other using neural networks.
Here is an example paper on the topic: http://nexginrc.org/nexginrcAdmin/PublicationsFiles/raid09-sadia.pdf
Sources: A class on Artificial Intelligence I took a couple years ago.
With regard to your main question, no one has tried that on programming languages to the best of my knowledge, but there is some research in the field of evolutionary computation that could be compared to something like that (though it's obviously a far-fetched comparison). As a matter of possible interest, I asked a similar question about self-improving compilers a while ago.
For a difference between genetic algorithms and genetic programming, have a look at this question.
Neural networks have nothing to do with genetic algorithms or genetic programming, but you can obviously use either to evolve neural nets (as with anything else, for that matter).
You could have a look at genetic-programming.org, where they claim to have found some near human-competitive results produced by genetic programming.
I have not heard of self-evolving and self-improving programs before. They may exist as special research tools, like those at genetic-programming.org, but nothing solid for generic use. And even if they exist, they are limited to special-purpose operations like malware detection, as Alain mentioned.