GPflow multi-output support for SGPR - gpflow

GPflow seems to only support multi-output for SVGP. Is it possible to use this multi-output support for other models (e.g. SGPR)? For example:
kernel = mk.SharedIndependentMok(gpf.kernels.RBF(D), P)
feature = features.InducingPoints(X[:M,...].copy())
m = gpf.models.SGPR(X, Y, kernel, feature)

This is a known issue (https://github.com/GPflow/GPflow/issues/1209) that currently has comparatively low priority for the GPflow core developers, but we'd be very happy for you to join us and contribute features! There is now a public GPflow Slack for ease of discussion.
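In the meantime, the multi-output support that does exist is built around SVGP, as the question notes. Purely as a rough illustration (assuming the GPflow 1.x API used above; D, P, M, X, Y are the same placeholders, and details may differ between versions), a shared-kernel multi-output SVGP would look something like:

import gpflow as gpf
import gpflow.multioutput.kernels as mk
import gpflow.multioutput.features as mf

# Shared RBF kernel across the P outputs, with M shared inducing points
kernel = mk.SharedIndependentMok(gpf.kernels.RBF(D), P)
feature = mf.SharedIndependentMof(gpf.features.InducingPoints(X[:M, ...].copy()))
m = gpf.models.SVGP(X, Y, kernel, gpf.likelihoods.Gaussian(), feat=feature)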

Related

What algorithm does core ml use?

I have searched everywhere but couldn't find a solid answer. My question is: what machine learning algorithm does Core ML use? How do I answer that question?
From the documentation:
Core ML supports a variety of machine learning models, including neural networks, tree ensembles, support vector machines, and generalized linear models.

GPy and GPflow mathematical background - references

Do GPy and GPflow share a common mathematical background? I'm asking because I'm using GPy but cannot find its references, whereas GPflow provides references in its examples.
Is it OK to keep using GPy, or would you suggest switching to GPflow for Gaussian process work?
GPy and GPflow definitely share a common mathematical background: Gaussian processes (Rasmussen and Williams), and many of the concepts are very similar in both frameworks: kernels, likelihoods, mean functions, inducing points, etc. For me, the biggest difference between GPy and GPflow is the computational backend: AFAIK GPy uses plain Python and numpy to perform all its computations, whereas GPflow relies on TensorFlow. This gives GPflow several nice features for free: GPU acceleration, automatic gradients, compatibility with the TF ecosystem, etc. Depending on your use case, these features can be crucial or simply nice-to-have.
Here is more information on the technical details between the two frameworks:
https://gpflow.readthedocs.io/en/master/intro.html#what-s-the-difference-between-gpy-and-gpflow
That would depend on what you are actually doing.
The very basic GPs should be similar; the main difference is that GPflow relies on TensorFlow for the gradients (if used), plus probably some technical implementation differences.
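To make that concrete, here is a minimal side-by-side sketch of a plain GP regression in both libraries (assuming a recent GPy and the GPflow 1.x API used elsewhere on this page; the data and kernel choices are just placeholders):

import numpy as np
import GPy
import gpflow

# Toy 1-D regression data (placeholder)
X = np.random.rand(100, 1)
Y = np.sin(6 * X) + 0.1 * np.random.randn(100, 1)

# GPy: numpy-based exact GP regression
gpy_model = GPy.models.GPRegression(X, Y, GPy.kern.RBF(input_dim=1))
gpy_model.optimize()

# GPflow (1.x API): the same model, built on TensorFlow
gpf_model = gpflow.models.GPR(X, Y, kern=gpflow.kernels.RBF(1))
gpflow.train.ScipyOptimizer().minimize(gpf_model)

Apart from the backend, the two models expose essentially the same quantities (kernel hyperparameters, likelihood variance, predictive mean and variance).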
For the other, more advanced models, both libraries provide references to the respective papers in the docs. In my opinion, GPflow's design is mainly centered around the SVGP framework from [1] and [2] (and many other extensions; I can really recommend [2] if you are interested in the theory).
But GPflow does still provide some other implementations as well.
I use GPflow since it works on the GPU and offers a lot of state-of-the-art implementations. However, the disadvantage is that it is still changing a lot.
If you want to use classic GPs and are not too concerned with performance or very up-to-date methods, I'd say GPy should be sufficient and is the more stable option.
[1] Hensman, James, Alexander Matthews, and Zoubin Ghahramani. "Scalable variational Gaussian process classification." (2015).
[2] Matthews, Alexander Graeme de Garis. Scalable Gaussian process inference using variational methods. PhD thesis, University of Cambridge, 2017.

spark ml : how to find feature importance

I am new to ML and I am building a prediction system using Spark ml. I read that a major part of feature engineering is working out how important each feature is to the prediction. In my problem, I have three categorical features and two string features. I use one-hot encoding to transform the categorical features and a simple HashingTF to transform the string features. These then become stages of a Pipeline, together with a VectorAssembler (to assemble all the features into a single column) and ml NaiveBayes; the Pipeline is fit on the training data set and used to transform the test data set.
Everything works, except: how do I determine the importance of each feature? I know I have only a handful of features now, but I will be adding more soon. The closest thing I came across was the ChiSqSelector available in the spark ml module, but it seems to work only for categorical features.
Thanks, any leads appreciated!
You can look at these examples:
The method mentioned in the question's comments
Information Gain based feature selection in Spark's MLlib
This package contains several feature selection methods (including InfoGain):
Information Theoretic Feature Selection Framework
Using ChiSqSelector is okay; you can simply discretize your continuous features (the HashingTF values). One example is provided in http://spark.apache.org/docs/latest/mllib-feature-extraction.html; I copy the relevant part here:
// Discretize data in 16 equal bins since ChiSqSelector requires categorical features
// Even though features are doubles, the ChiSqSelector treats each unique value as a category
val discretizedData = data.map { lp =>
  LabeledPoint(lp.label, Vectors.dense(lp.features.toArray.map { x => (x / 16).floor }))
}
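If you are on the DataFrame-based spark.ml API (as your Pipeline suggests), the selector looks roughly like the sketch below; the column names and numTopFeatures are placeholders, and df is assumed to already contain the assembled feature vector and the label:

from pyspark.ml.feature import ChiSqSelector

# df is assumed to have a "features" vector column (e.g. from VectorAssembler)
# and a numeric "label" column; numTopFeatures is an arbitrary placeholder.
selector = ChiSqSelector(numTopFeatures=20,
                         featuresCol="features",
                         outputCol="selectedFeatures",
                         labelCol="label")
selected = selector.fit(df).transform(df)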
L1 regularization is also an option.
You can use L1 to get feature importances from the coefficients, and decide which features to use for the Bayes training accordingly.
Example of getting coefficients
Update:
Some conditions under which the coefficients do not work very well
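For illustration, a minimal sketch of the L1 idea in pyspark (a logistic regression used purely to rank features; the column names are placeholders and df is assumed to be the assembled training DataFrame):

from pyspark.ml.classification import LogisticRegression

# elasticNetParam=1.0 gives pure L1 (lasso) regularization, which drives the
# coefficients of uninformative features towards zero.
lr = LogisticRegression(featuresCol="features", labelCol="label",
                        regParam=0.1, elasticNetParam=1.0)
lr_model = lr.fit(df)
print(lr_model.coefficients)  # near-zero entries suggest less useful features
# (for multiclass labels, inspect lr_model.coefficientMatrix instead)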

Are there any implementations available online for filter based feature selection methods?

The selection methods I am looking for are the ones based on subset evaluation (i.e. do not simply rank individual features). I prefer implementations in Matlab or based on WEKA, but implementations in any other language will still be useful.
I am aware of the existence of CfsSubsetEval and ConsistencySubsetEval in WEKA, but they did not lead to good classification performance, probably because they suffer from the following limitations:
CfsSubsetEval is biased toward small feature subsets, which may prevent locally predictive features from being included in the selected subset, as noted in [1].
ConsistencySubsetEval uses the min-features bias [2] which, similarly to CfsSubsetEval, results in the selection of too few features.
I know it is "too few" because I have built classification models with larger subsets and their classification performance was relatively much better.
[1] M. A. Hall, Correlation-based Feature Subset Selection for Machine Learning, 1999.
[2] Liu, Huan, and Lei Yu, Toward integrating feature selection algorithms for classification and clustering, 2005.
Check out Python's scikit-learn: simple and efficient tools for data mining and data analysis. It has various implemented methods for feature selection, classification and evaluation, plus a lot of documentation and tutorials.
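As one concrete pointer, scikit-learn's SequentialFeatureSelector scores whole feature subsets rather than ranking individual features (strictly speaking it is a wrapper method, since subsets are evaluated by cross-validating an estimator, not a filter method). A minimal sketch, with the estimator and subset size as arbitrary placeholders:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Greedy forward selection: grows a subset one feature at a time, scoring each
# candidate subset with 5-fold cross-validation of the chosen estimator.
selector = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors=3),
                                     n_features_to_select=10,
                                     direction="forward",
                                     cv=5)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the selected subset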
My search has led me to the following implementations:
FEAST toolbox: it is an interesting toolbox, developed at the University of Manchester, that provides implementations of Shannon information-theoretic functions. The implementations can be downloaded from THIS webpage, and they can be used to evaluate individual features as well as subsets of features.
I have also found THIS MATLAB code, which is an implementation of a selection algorithm based on interaction information.
PY_FS: A Python Package for Feature Selection
I came across this package, which was just released (2021) and contains many methods with references to their original papers.

Sensitivity analysis in LP solvers from MATLAB

As far as I understand, CPLEX, LP_solve and GLPK, among other LP solvers, offer sensitivity analysis.
I have the above three solvers installed on my machine, along with these two MATLAB wrappers:
CPLEX for MATLAB API (for CPLEX)
YALMIP (a general MATLAB wrapper for several solvers)
I looked in the documentation of these two wrappers but could not find a way of running sensitivity analysis from them. Do they support it? If not, are there any LP solvers that offer MATLAB support for their sensitivity analysis?
What do I mean by sensitivity analysis?
I mean sensitivity analysis with respect to the cost function and constraints. Conceptually speaking, sensitivity analysis tries to address the following question:
How would the solution change if some aspect of the problem is changed?
For example:
What is the range of values the coefficient for the variable j can take without affecting the optimality of the solution?
More specifically, here is a list of the Java, C++ and C APIs that CPLEX provides for sensitivity analysis.
Here is information about the sensitivity analysis provided by LP_solve. You can find the help text for the previous link within LP_solve's main reference guide by searching for "sensitivity" here.
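To make the cost-ranging question above concrete (independent of any particular solver or MATLAB wrapper), here is a small numerical sketch using SciPy rather than the tools discussed; the LP is a standard textbook example, not taken from the question:

import numpy as np
from scipy.optimize import linprog

# maximize c1*x + 5*y  subject to  x <= 4, 2*y <= 12, 3*x + 2*y <= 18, x, y >= 0
A_ub = [[1, 0], [0, 2], [3, 2]]
b_ub = [4, 12, 18]

def solve(c1):
    # linprog minimizes, so negate the objective to maximize it
    return linprog(c=[-c1, -5.0], A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)]).x

# The vertex (2, 6) stays optimal for 0 <= c1 <= 7.5; outside that range the
# optimum jumps to another vertex. That interval is exactly the kind of answer
# a solver's sensitivity (ranging) report gives for a cost coefficient.
for c1 in [1.0, 3.0, 7.0, 8.0]:
    print(c1, np.round(solve(c1), 3))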