k-means algorithm for energy data against time and date - matlab

I am using Matlab 2015a.
I have got electricity consumption data to cluster it. Initially i am trying to cluster it against hours and dates. I have created three different variables, one for time, one for dates and third for data. I am unable to understand how should i combine these in a matrix form so that the loads are distributed according to time? Then i have tried to look how can i plot a line graph for k-means but i can only find scatter command graphs but no line plots.
Further how can i plot it as a 3-d plot?
Further at a later stage i want to include temperature variable aswel. But when the 4th variable is involved, what will the plot be? will it still be 3-d?
Any suggestions, links?

In Matlab you can create N-dimensional matrices, so you can arrange your 3D data in a N*M*3 matrix (you might want to look for the cat() function to help you out).
There are several functions that allow you to plot in 3D, one of these is scatter3() which is perfect for K-Means clustering. I don't really understand which lines you do want to plot: K-Means is about clusters and centroids (i.e. points).
If a 4th variable is involved, you can as well create a 4D matrix. Although I reckon plotting a 4D graph isn't going to be easy. A first approach might be using several colours for your scatter points with different colours for different temperatures (or temperatures range). In this case the 5th input argument for scatter3() will be helpful.
Help for scatter3() here.
Help for cat() here.

Related

Plot multiple probability distribution function lines on one plot in MatLab

I'm new to matlab and am trying to work with three dimensional data. It has latitude, longitude and time as the three dimensions, and I want to create a PDF for each of these matrices and then put all of them on the same plot rather than have three separate PDF plots. I don't know how to create reproducible data for matlab for this question so if there are more questions I can provide more guidance (also would be happy to hear guidance about how to create reproducible 3 dimensional data).
Essentially I need help creating a probability distribution function for three dimensional data, and then I want to plot multiple PDF lines on the same figure.
I've tried using the histogram function and normplot() but neither has worked.

Visualizing clusters using TSNE

I have a dataset which I need to cluster and display in a way wherein elements in the same cluster should appear closer together. The dataset is based out of a research study, and has around 16 rows(entries) and about 50 features. I do agree that its not an ideal dataset to begin with, but unfortunately thats is the situation on hand.
Following is the approach I took:
I first applied KMeans on the dataset after normalizing it.
In parallel I also tried to use TSNE to map the data into 2 dimensions and plotted them on a scatterplot. From my understanding of TSNE, that technique should already be placing items in same clusters closer to each other. When I look at the scatterplot, however, the clusters are really all over the place.
The result of the scatterplot can be found here: https://imgur.com/ZPhPjHB
Is this because TSNE and KMeans intrinsically work differently? Should I just do TSNE and try to label the clusters (and if so, how?) or should I be using TSNE output to feed into KMeans somehow?
I am really new in this space and advice would be greatly appreciated!
Thanks in advance once again
Edit: The same overlap happens if I first use TSNE to reduce dimensions to 2 and then use those reduced dimensions to cluster using KMeans
There is a difference between TSNE and KMeans. TSNE is used for visualization mostly and it tries to project points on the 2D/3D space (from bigger spaces) in order to keep distances (if in the bigger space 2 points were far away TSNE will try to show it).
So TSNE is not a real clustering. And that's why results you got that strange scatter plot.
For TSNE sometimes you need to apply PCA before but that is needed if your number of features is big. Just to speed-up calculations.
As already advised, try to use hierarchical clustering or simply generate more rows.
Apply tSNE and fit k-means is one of the basic things you can start from.
I would say consider using different f-divergence.
Stochastic Neighbor Embedding under f-divergences https://arxiv.org/pdf/1811.01247.pdf
This paper tries five different f- divergence functions : KL, RKL, JS, CH (Chi-Square), HL (Hellinger).
The paper goes over which divergence emphasize what in terms of precision and recall.

How to display clusters of optics algorithm in matlab

I used optics.m function from http://chemometria.us.edu.pl/download/OPTICS.M to calculate optics algorithm in MATLAB. This function outputs RD and CD and Order vector of all points.
I used bar(RD(order)); code to display Reachability plot of them. But I want to index clusters of points and scatter them in MATLAB. How can i do that?
Don't use that code. It is slow, and I read somewhere here that it is even giving incorrect results?
But most of all, it lacks cluster extraction functionality; it is only half of OPTICS.

Data interpolation over a non-homogeneous surface

In my project i deal with big data surfaces.
In a certain point, i have a line across the data, and I need the values of the points of the line.
The grid is non,homogeneous, it doesnt go from n:m with fixed steps nor nothing.
Lets ilustrate!
In the figure the 2D proyection of my data can be seen. Each of the points has also other 3 data information. I defined a arbitrary red line with the form y=ax+b. a and b are known.
How can I define i.e. 50 points in the line that has not only the x and y coords (wich is straigforward) but also the interpolation of the 3 data information of each of the points around it.
I know is not an easy question but I can't seem to step forward even a bit.
PD: realize I DONT want code written for me, but the idea of how to achieve my objective.
You could use a tool like triScatteredInterp, which will triangulate the 2-d domain, then interpolate a list of points along your line. Griddata is also an option.
I have a toolbox for problems like this (of course.) It allows me to build a triangulation of the non-convex domain in the (x,y) plane. Then it can form a completely general slice through that surface, interpolating in z also as it does so. The result will be a 1-manifold, in this case a piecewise linear function along that path in (x,y,z). While those tools are not posted on the file exchange, they are available for the person willing to invest the time to learn to use them.
If the surface you describe is a completely general one in 3-d, that might be fairly complex, then you might need a CRUST based tool to define that surface triangulation. These can be found online too. Once a triangulation is available, my tools can then be used to slice them. (Sorry, I never did finish that piece.)
What I did was to define several points in the crack line and then cheack for each one of them in wich quadrilateral it is with inpoligon matlab function (no tthe fastest way but less than 2 secs).
Then I created a triangular plane in the used quadrilaterals using x,y and Z or the othre data , achieving a linear interpolation between the data.
finally i take out all the points that are 0 o Nan.

Extract only one spike from matlab plot

I have a matlab plot between two matrices (129 by 1), I just want to extract the only first spike from the multiple spikes as seen in plot. I am newbie to Matlab.
-John
Either code your own or use this peak finder algorithm. That one works very well.