Predicting a vector of variables with Random Forest: Multidimensional Classification

I have a feature vector containing both numerical and categorical data, and a set of observations, each with a corresponding feature vector.
The outcome variable to be predicted is a multidimensional vector whose entries are all numerical and of the same type. How can I do that? With a Random Forest?
My application is to predict travel speeds in a city given the time of day, weekday, weather, temperature, etc.
More precisely, I intend to classify my observations into clusters such that the overall speeds within each cluster are somewhat similar.

Related

How to compute whether representational similarity matrix values are significant

I am new to RSA analysis of fMRI images. I used SPM 12 for preprocessing and first-level analysis of my fMRI images, and used RSA-toolbox to compute RDMs (representational dissimilarity matrices) for my conditions in a specific region of the brain. Now I have the RDM for every single subject, as well as the overall RDM across all subjects. However, RSA-toolbox doesn't report any p-value or significance test for the values in the RDM. How can I compute or determine which values in the RDM are significant and which are not? I used Pearson's r to compute the RDMs. In particular, I would like an explanation of the mathematics that can be used to test the significance of these values.

Proper way of generating initial random vectors for generator model in GAN?

Initial random vectors for the generator model in a Generative Adversarial Network (GAN) are frequently drawn from a Gaussian or uniform prior with zero mean and unit variance, often together with linear interpolation, where the size of the vector can be chosen arbitrarily, e.g. 100.
Let's say we have 1000 images for training and the batch size is 64. Then, in each epoch, we need to generate a number of random vectors from the prior distribution, one for each image in the mini-batch. But the problem I see is that since there is no mapping between a random vector and its corresponding image, the same image can be generated from multiple initial random vectors. In this paper, it is suggested that this problem can be overcome to some extent by using a different, spherical interpolation.
So what will happen if I initially generate one random vector per training image and then reuse those same vectors throughout training?
In GANs the random seed used as input does not actually correspond to any real input image. What GANs actually do is learn a transformation function from a known noise distribution (e.g. Gaussian) to a complex unknown distribution, which is represented by i.i.d. samples (e.g. your training set). What the discriminator in a GAN does is calculate a divergence (e.g. Wasserstein divergence, KL-divergence, etc.) between the generated data (e.g. the transformed Gaussian) and the real data (your training data). This is done in a stochastic fashion, and therefore no link is necessary between the real and the fake data. If you want to learn more about this with a hands-on example, I can recommend that you try to train a Wasserstein GAN to transform one 1D Gaussian distribution into another. There you can visualize the discriminator and the gradient of the discriminator and really see the dynamics of such a system.
Anyway, what your paper is describing applies after you have trained your GAN, when you want to see how it has mapped the generated data from the known noise space to the unknown image space. For this purpose interpolation schemes have been invented, like the spherical one you are quoting. They also show that the GAN has learned to map some parts of the latent space to key characteristics in images, like smiles. But this has nothing to do with the training of GANs.
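To make the spherical interpolation mentioned above concrete, here is a minimal sketch in plain Python. The function name `slerp` and the toy two-dimensional latent vectors are assumptions for illustration only, not taken from the paper. It walks along the arc between two latent vectors instead of along the straight line between them.

```python
import math

def slerp(z0, z1, t):
    """Spherical linear interpolation between latent vectors z0 and z1, t in [0, 1]."""
    norm0 = math.sqrt(sum(x * x for x in z0))
    norm1 = math.sqrt(sum(x * x for x in z1))
    # Cosine of the angle between the two vectors, clamped for numerical safety.
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # (Nearly) parallel or antiparallel vectors: fall back to linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]

# Hypothetical latent vectors; a real GAN would use e.g. 100 dimensions.
z_start, z_end = [1.0, 0.0], [0.0, 1.0]
path = [slerp(z_start, z_end, i / 4) for i in range(5)]
```

Each point in `path` would then be fed to the already trained generator to see how the output changes along the way; nothing here takes part in training.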

Discriminant analysis method to classify data

My aim is to classify the data into two sections, upper and lower, by finding the midline of the peaks.
I would like to apply machine learning methods, e.g. discriminant analysis.
Could you let me know how to do that in MATLAB?
It seems that what you are looking for is a GMM (Gaussian mixture model). With K=2 (the number of mixture components) and dimension equal to 1, this is a simple, fast method that gives you a direct solution. Given the components, it is easy to find a local minimum analytically (it is just a weighted average of the means, with weights proportional to the standard deviations).
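As a rough illustration of the K=2, one-dimensional mixture idea, sketched here in Python rather than MATLAB, the following fits two Gaussians with a basic EM loop and assigns each value to the more likely component. The function names, the initialisation at the data extremes and the toy data are all assumptions made for this sketch. In MATLAB, `fitgmdist` from the Statistics and Machine Learning Toolbox should cover the fitting step.

```python
import math

def fit_gmm_1d(xs, n_iter=50):
    """Fit a two-component 1D Gaussian mixture with EM; returns (weights, means, stds)."""
    # Initialise the two components at the extremes of the data (a simple heuristic).
    means = [min(xs), max(xs)]
    spread = (max(xs) - min(xs)) / 4.0 or 1.0
    stds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                 for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            stds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, stds

def assign(x, weights, means, stds):
    """Return 0 (lower component) or 1 (upper component) by larger log-posterior."""
    log_post = [math.log(w) - math.log(s) - 0.5 * ((x - m) / s) ** 2
                for w, m, s in zip(weights, means, stds)]
    return 0 if log_post[0] >= log_post[1] else 1

# Toy data: a lower group around 1 and an upper group around 5.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, stds = fit_gmm_1d(data)
labels = [assign(x, weights, means, stds) for x in data]
```

The sketch splits the data by posterior probability rather than by an explicit midline; the boundary between the two sections is wherever `assign` switches from 0 to 1.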

Hierarchical Cluster Analysis in Cluster 3.0

I'm new to this site as well as new to cluster analysis, so I apologize if I violate conventions.
I've been using Cluster 3.0 to perform Hierarchical Cluster Analysis with Euclidean Distance and Average linkage. Cluster 3.0 outputs a .gtr file with a node joining a gene and their similarity score. I've noticed that the first line in the .gtr file always links a gene with another gene followed by the similarity score. But, how do I reproduce this similarity score?
In my data set, I have 8 genes and create a distance matrix where d_{ij} contains the Euclidean distance between gene i and gene j. Then I normalize the matrix by dividing each element by the max value in the matrix. To get the similarity matrix, I subtract all the elements from 1. However, my result does not use the linkage type and differs from the output similarity score.
I am mainly confused about how linkages affect the similarity of the first node (the joining of the two closest genes) and how to compute the similarity score.
Thank you!
The algorithm compares clusters using some linkage method, not data points. However, in the first iteration of the algorithm each data point forms its own cluster; this means that your linkage method actually reduces to the metric you use to measure the distance between data points (in your case, Euclidean distance). For subsequent iterations, the distance between clusters is measured according to your linkage method, which in your case is average linkage. For two clusters A and B, this is calculated as follows:
d(A,B) = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} d(a,b)
where d(a,b) is the Euclidean distance between the two data points a and b, and |A| and |B| are the numbers of points in each cluster. Convince yourself that when A and B contain just one data point each (as in the first iteration) this equation reduces to d(a,b). I hope this makes things a bit clearer. If not, please provide more details of what exactly you want to do.
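The average-link distance described above, i.e. the mean of d(a,b) over all pairs with one point from each cluster, can be sketched in a few lines of Python. The function names and the toy points are assumptions for illustration, and the sketch does not attempt to reproduce how Cluster 3.0 turns distances into the similarity score in the .gtr file.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b):
    """Mean of all pairwise Euclidean distances between two clusters."""
    total = sum(euclidean(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# First iteration: singleton clusters, so the linkage is just d(a, b).
first = average_linkage([(0.0, 0.0)], [(3.0, 4.0)])

# Later iteration: a two-point cluster against a singleton.
later = average_linkage([(0.0, 0.0), (0.0, 2.0)], [(3.0, 0.0)])
```

For the singleton case the result equals the plain Euclidean distance between the two points, which is the reduction the answer asks you to convince yourself of.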

Rapidminer - neural net operator - output confidence

I have a feed-forward neural network with six inputs, one hidden layer and two output nodes (1; 0). The network is trained on 0/1 target values.
When the model is applied, the variables confidence(0) and confidence(1) are created, and these two numbers sum to 1 for each row.
My question is: what exactly do these two numbers (confidence(0) and confidence(1)) mean? Are they probabilities?
Thanks for your answers.
In general
The confidence values (or scores, as they are called in other programs) represent a measure of how, well, confident the model is that the presented example belongs to a certain class. They are highly dependent on the general strategy and the properties of the algorithm.
Examples
The easiest example to illustrate this is the majority classifier, which just assigns the same score to all observations, based on the class proportions in the original training set.
Another example is the k-nearest-neighbor classifier, where the score for a class i is calculated by averaging the distance to those examples which both belong to the k nearest neighbors and have class i. The score is then sum-normalized across all classes.
In the specific case of the neural network, I do not know how the scores are calculated without checking the code. My guess is that each one is just the value of the corresponding output node, sum-normalized across both classes.
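Sum-normalization, as used in the examples above, simply divides each raw score by the total, so that the scores add up to 1. A minimal sketch in Python follows; the raw output values are hypothetical, and whether RapidMiner computes its confidences this way is only the guess stated above.

```python
def sum_normalize(scores):
    """Divide each score by the total; assumes non-negative scores with a positive sum."""
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical raw values of the two output nodes for one row.
raw_outputs = [0.2, 0.6]
confidences = sum_normalize(raw_outputs)
```

By construction the resulting values sum to 1 for each row, which matches the behavior described in the question but does not by itself make them probabilities.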
Do the confidences represent probabilities?
In general no. To illustrate what probabilities in this context mean: If an example has probability 0.3 for class "1", then 30% of all examples with similar feature/variable values should belong to class "1" and 70% should not.
As far as I know, the task of turning scores into such probabilities is called "calibration". Some general methods exist for this purpose (e.g. binning the scores and mapping them to the class fraction of the corresponding bin), as well as some classifier-dependent ones (e.g. Platt scaling, which was invented for SVMs). A good place to start is:
Bianca Zadrozny, Charles Elkan: Transforming Classifier Scores into Accurate Multiclass Probability Estimates
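The binning approach mentioned above can be sketched as follows in Python. Equal-width bins, the function names and the toy scores and labels are assumptions chosen for simplicity; this is not necessarily the exact procedure of the cited paper.

```python
def fit_binning_calibrator(scores, labels, n_bins=5):
    """Map each score bin to the fraction of class-1 examples observed in it."""
    counts = [0] * n_bins
    positives = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # scores are assumed to lie in [0, 1]
        counts[b] += 1
        positives[b] += y
    # Empty bins get None: there is no data to estimate a probability from.
    return [positives[b] / counts[b] if counts[b] else None for b in range(n_bins)]

def calibrate(score, bin_fractions):
    """Look up the class-1 fraction of the bin that the score falls into."""
    n_bins = len(bin_fractions)
    return bin_fractions[min(int(score * n_bins), n_bins - 1)]

# Hypothetical held-out scores for class "1" and the true 0/1 labels.
scores = [0.10, 0.15, 0.30, 0.35, 0.90, 0.95, 0.85, 0.82]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
bin_fractions = fit_binning_calibrator(scores, labels)
calibrated = calibrate(0.9, bin_fractions)
```

In this toy data a raw score of 0.9 maps to 0.75, because three of the four examples in the top bin belong to class "1". In practice the bins would be fitted on a held-out set with many more examples per bin.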
The confidence measures correspond to the proportion of outputs 0 and 1 that are activated in the initial training dataset.
E.g. if 30% of your training set has outputs (1;0) and the remaining 70% has outputs (0; 1), then confidence(0) = 30% and confidence(1) = 70%