Creating pairwise cluster plot with histograms down diagonal (similar to FisherIris Demo) - matlab

I have 8 columns of data and want to create a pairwise scatter plot like the one in the Kevin Murphy book.
I was told there is code online to create this and found it here, but I don't see how it clusters the data, and I don't know how to adapt it to my data.
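One possible reading of the demo: the colors in such a figure come from a grouping variable passed to the plotting function (for Fisher's iris data, the species labels), not from a clustering step inside the plotting code. Below is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available. The data is a random placeholder for your 8 columns, the names `X`, `k`, `idx`, and `names` are hypothetical, and `kmeans` is only one way to obtain group labels if you have none.

```matlab
rng(0);                                   % fixed seed so the placeholder data is reproducible
X = [randn(100, 8); randn(100, 8) + 3];   % hypothetical 200-by-8 data; replace with your own matrix
k = 2;                                    % assumed number of groups
idx = kmeans(X, k);                       % group labels; skip this if you already have class labels

% Axis labels x1..x8 (placeholders for your real column names).
names = arrayfun(@(j) sprintf('x%d', j), 1:size(X, 2), 'UniformOutput', false);

figure;
% Pairwise scatter plots colored by group, with histograms on the diagonal.
gplotmatrix(X, [], idx, [], [], [], 'on', 'hist', names);
```

If you already have labels (as the iris data does), pass them as the third argument instead of `idx`.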

Related

Plot multiple probability distribution function lines on one plot in MatLab

I'm new to MATLAB and am trying to work with three-dimensional data. It has latitude, longitude, and time as the three dimensions, and I want to create a PDF for each of these matrices and then put all of them on the same plot rather than have three separate PDF plots. I don't know how to create reproducible data in MATLAB for this question, so if there are more questions I can provide more guidance (I would also be happy to hear how to create reproducible three-dimensional data).
Essentially, I need help creating a probability distribution function for three-dimensional data, and then I want to plot multiple PDF lines on the same figure.
I've tried using the histogram function and normplot() but neither has worked.
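A minimal sketch of one way to approach this, assuming the Statistics and Machine Learning Toolbox (for `ksdensity`) and treating every element of each array as one sample. The arrays `A`, `B`, and `C` are made-up placeholders, and a fixed `rng` seed is what makes them reproducible.

```matlab
rng(1);                                   % fixed seed makes the example data reproducible
% Hypothetical 3-D arrays indexed as (latitude, longitude, time).
A = randn(10, 12, 5);
B = 2 * randn(10, 12, 5) + 1;
C = 0.5 * randn(10, 12, 5) - 2;
data = {A, B, C};
labels = {'A', 'B', 'C'};

figure; hold on;                          % hold on keeps all curves on the same axes
areas = zeros(1, numel(data));
for i = 1:numel(data)
    [f, xi] = ksdensity(data{i}(:));      % kernel density estimate over all elements
    plot(xi, f, 'LineWidth', 1.5);
    areas(i) = trapz(xi, f);              % numerical check: each curve should integrate to about 1
end
hold off;
legend(labels); xlabel('value'); ylabel('estimated density');
```

Without the toolbox, `histogram(x, 'Normalization', 'pdf')` with `hold on` is a base-MATLAB alternative that gives a binned estimate rather than a smooth line.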

Visualizing clusters using TSNE

I have a dataset which I need to cluster and display so that elements in the same cluster appear closer together. The dataset comes from a research study and has around 16 rows (entries) and about 50 features. I do agree that it's not an ideal dataset to begin with, but unfortunately that is the situation at hand.
Following is the approach I took:
I first applied KMeans on the dataset after normalizing it.
In parallel I also tried to use TSNE to map the data into 2 dimensions and plotted them on a scatterplot. From my understanding of TSNE, that technique should already be placing items in same clusters closer to each other. When I look at the scatterplot, however, the clusters are really all over the place.
The result of the scatterplot can be found here: https://imgur.com/ZPhPjHB
Is this because TSNE and KMeans intrinsically work differently? Should I just do TSNE and try to label the clusters (and if so, how?) or should I be using TSNE output to feed into KMeans somehow?
I am really new in this space and advice would be greatly appreciated!
Thanks in advance once again
Edit: The same overlap happens if I first use TSNE to reduce dimensions to 2 and then use those reduced dimensions to cluster using KMeans
There is a difference between TSNE and KMeans. TSNE is used mostly for visualization: it projects points from a higher-dimensional space into 2D/3D while trying to preserve their relative distances (if two points were far apart in the original space, TSNE will try to show that).
So TSNE is not a real clustering method, and that is why you got that strange scatter plot.
For TSNE you sometimes need to apply PCA first, but only if your number of features is large, and only to speed up the calculations.
As already advised, try to use hierarchical clustering or simply generate more rows.
Applying tSNE and then fitting k-means is one of the basic things you can start with.
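A minimal sketch of that "embed, then cluster" starting point, assuming the Statistics and Machine Learning Toolbox (`tsne`, `kmeans`, `gscatter`). The data is random and only mimics the 16-by-50 shape from the question. The perplexity of 5 and `k = 3` are arbitrary assumptions, not recommendations.

```matlab
rng(3);                                   % t-SNE and k-means are both randomized; fix the seed
X = randn(16, 50);                        % hypothetical stand-in for 16 rows and 50 features

% With only 16 rows the default perplexity (30) is assumed to be too large,
% so a small value is set explicitly.
% For much wider data, the 'NumPCAComponents' option applies PCA before the embedding.
Y = tsne(X, 'Standardize', true, 'Perplexity', 5);

k = 3;                                    % assumed number of clusters
idx = kmeans(Y, k, 'Replicates', 5);      % cluster in the 2-D embedding

figure;
gscatter(Y(:,1), Y(:,2), idx);            % color each embedded point by its cluster
xlabel('t-SNE 1'); ylabel('t-SNE 2');
```

Clustering the embedding makes the colors agree with the picture by construction, which is not the same as finding structure that exists in the original 50 features.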
I would suggest considering a different f-divergence.
Stochastic Neighbor Embedding under f-divergences https://arxiv.org/pdf/1811.01247.pdf
This paper tries five different f-divergence functions: KL, RKL, JS, CH (Chi-Square), HL (Hellinger).
The paper goes over which divergence emphasizes what in terms of precision and recall.

K-means boundaries

Is there any way to find the boundaries (coordinates) of clusters for x-y data in k-means clustering? I produced 8 clusters from the x-y data, which look like the figure below (each color represents one cluster). I need to get the boundary values for each cluster.
The ELKI tool that I usually use for clustering will generate the boundaries for you in the visualization. I don't know if it will also output the coordinates to a file though.
It's called a Voronoi diagram, and you need its dual, the Delaunay triangulation, to build it. You can easily find algorithms for that.
Beware that some edges will go to infinity (just imagine two clusters: what does their boundary look like? What are the coordinates of the boundary?)
Note that on your data set, this clustering does not appear to be very good. The boundaries between clusters look quite arbitrary to me.
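As a rough illustration of the Voronoi idea, the sketch below builds the diagram of the k-means centroids. It assumes Euclidean k-means and the Statistics and Machine Learning Toolbox for `kmeans` and `gscatter`, while `voronoi` and `voronoin` are base MATLAB. `XY` is random placeholder data.

```matlab
rng(4);
XY = rand(400, 2);                        % hypothetical x-y data
k = 8;
[idx, C] = kmeans(XY, k);                 % C holds the k centroids, one per row

% Boundary segments: each column of vx/vy holds the two endpoints of one edge.
[vx, vy] = voronoi(C(:,1), C(:,2));

% Alternative representation: vertex list V and, per centroid, the indices of its cell's vertices.
% The first row of V is [Inf Inf]; any cell that references it is unbounded.
[V, R] = voronoin(C);
unbounded = cellfun(@(r) any(r == 1), R);

figure;
gscatter(XY(:,1), XY(:,2), idx); hold on;
plot(vx, vy, 'k-');                       % draw the cluster boundaries
hold off; axis([0 1 0 1]);
```

The cells flagged in `unbounded` are the ones whose edges go to infinity, so their "boundary coordinates" only become finite once you clip them to a bounding box of your choice.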

naïve Bayes classifier

I am working on a naïve Bayes classifier and would like to classify some data using MATLAB. In the example of Fisher's Iris Data as given in MATLAB (see here for details), they consider only the first 2 variables (Sepal Length and Width). I would like to proceed with classification using more features, such as Petal Length and Petal Width.
In the documentation of this Fisher Iris example it is mentioned that "You can use the two columns containing sepal measurements." I want to use 3 or 4 columns, that is, up to 4 properties with 2 classes. I want to plot the classes on the x-axis and y-axis. How can I do this?
You can plot things in 3D and use color as your fourth dimension. However, this will not be readable at all, especially with large datasets.
I recommend plotting 2D combinations of the features instead, because you will normally need the color encoding for the class label.
The MATLAB machine learning app can be very helpful to you.
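A minimal sketch of the "2D combinations" suggestion, assuming the Statistics and Machine Learning Toolbox, which ships the `fisheriris` data set along with `fitcnb` and `gplotmatrix`. The variable names `Mdl`, `pred`, `acc`, and `varNames` are illustrative.

```matlab
load fisheriris                           % provides meas (150-by-4) and species (150-by-1 cell)

Mdl = fitcnb(meas, species);              % naive Bayes on all four features
pred = predict(Mdl, meas);
acc = mean(strcmp(pred, species));        % resubstitution accuracy, optimistic by construction

varNames = {'Sepal length', 'Sepal width', 'Petal length', 'Petal width'};
figure;
% Every pair of features as a 2-D scatter plot, colored by class.
gplotmatrix(meas, [], species, [], [], [], 'on', 'hist', varNames);
```

The classifier itself is not limited to two columns; only the plot is, which is why each pair of features gets its own panel.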

k-means algorithm for energy data against time and date

I am using MATLAB 2015a.
I have electricity consumption data that I want to cluster. Initially I am trying to cluster it against hours and dates. I have created three different variables: one for time, one for dates, and a third for the data. I don't understand how I should combine these into a matrix so that the loads are distributed according to time. I have also looked into how to plot a line graph for k-means, but I can only find scatter plot commands and no line plots.
Also, how can I plot it as a 3-D plot?
At a later stage I want to include a temperature variable as well. When a 4th variable is involved, what will the plot be? Will it still be 3-D?
Any suggestions or links?
In MATLAB you can create N-dimensional matrices, so you can arrange your 3D data in an N*M*3 matrix (you might want to look at the cat() function to help you out).
There are several functions that allow you to plot in 3D. One of these is scatter3(), which is well suited to K-Means clustering. I don't really understand which lines you want to plot: K-Means is about clusters and centroids (i.e. points).
If a 4th variable is involved, you can create a 4D matrix as well, although I reckon plotting a 4D graph isn't going to be easy. A first approach might be to color your scatter points differently for different temperatures (or temperature ranges). In this case the 5th input argument of scatter3() will be helpful.
Help for scatter3() here.
Help for cat() here.
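The suggestions above could be sketched roughly as follows, assuming the Statistics and Machine Learning Toolbox for `kmeans` and `zscore`. All data is made up, and `kwh`, `temp`, `D`, and `X` are hypothetical names. Note that `kmeans` expects a 2-D matrix with one row per observation, so the stacked array is reshaped before clustering.

```matlab
rng(2);
[days, hours] = meshgrid(1:7, 0:23);      % 24-by-7 grids: rows are hours, columns are days
kwh  = 1 + 0.5 * sin(pi * hours / 12) + 0.1 * randn(24, 7);   % made-up consumption
temp = 15 + 5 * cos(pi * hours / 12) + randn(24, 7);          % made-up temperature

D = cat(3, hours, days, kwh);             % 24-by-7-by-3 array, one page per variable
X = reshape(D, [], 3);                    % one row per observation: [hour, day, kwh]

k = 3;                                    % assumed number of clusters
idx = kmeans(zscore(X), k);               % standardize so no variable dominates the distance

figure;
subplot(1, 2, 1);
scatter3(X(:,1), X(:,2), X(:,3), 36, idx, 'filled');      % 5th argument: color by cluster
xlabel('hour'); ylabel('day'); zlabel('consumption'); title('clusters');

subplot(1, 2, 2);
scatter3(X(:,1), X(:,2), X(:,3), 36, temp(:), 'filled');  % 5th argument: color by temperature
xlabel('hour'); ylabel('day'); zlabel('consumption'); title('temperature');
c = colorbar; c.Label.String = 'temperature';
```

The second panel is still a 3-D plot; the 4th variable is carried only by the color.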