Convex-hull in Pyspark

Convex-hull in Pyspark - pyspark

Given a dataframe of points in R^n (n dimensions), is there a way to calculate the area of their convex hull or to find the points that create the convex hull?
(I know that in Python we have scipy, but I need it for distributed data, as the dataframe is large and with high dimensionality)
Thanks in advance

Related

Evaluation of K-means clustering ( accuracy)

I created a 2-dimensional random datasets (composed from a dataset of points and a column of labels) for centroid based k-means clustering in MATLAB where each point is represented by a vector of X and Y (the point coordinates) and each label represents the data point cluster,see example in figure below.
I applied the K-means clustering algorithm on these point datasets. I need help with the following:
What function can I use to evaluate the accuracy of the K-means algorithm? In more detail: My aim is to score the Kmeans algorithm based on how many assigned labels it correctly identifies by comparing with assigned numbers by matlab. For example, I verify if the point (7.200592168, 11.73878455) is assigned with the point (6.951107307, 11.27498898) to the same cluster... etc.

If I correctly understand your question, you are looking for the adjusted rand index. This will score the similarity between your matlab labels and your k-means labels.
Alternatively you can create a confusion matrix to visualise the mapping between your two labelsets.

I would use squared error
You are trying to minimize the total squared distance between each point and the mean coordinate of it's cluster.

Finding all complex eigenvalues near the unit circle of a large matrix (n around 10k-100k) with MATLAB

I am studying wave propagation in periodic material and need to computes slowness surfaces which are obtained by computing polynomial eigenvalues of some matrix.
Given that I am only interested in propagative waves, only eigenvalues near the unit circle should be researched.
Is there an efficient way to computes those given we do not know the number of values we are searching ?
Thanks for the help !

K-means Clustering, major understanding issue

Suppose that we have a 64dim matrix to cluster, let's say that the matrix dataset is dt=64x150.
Using from vl_feat's library its kmeans function, I will cluster my dataset to 20 centrers:
[centers, assignments] = vl_kmeans(dt, 20);
centers is a 64x20 matrix.
assignments is a 1x150 matrix with values inside it.
According to manual: The vector assignments contains the (hard) assignments of the input data to the clusters.
I still can not understand what those numbers in the matrix assignments mean. I dont get it at all. Anyone mind helping me a bit here? An example or something would be great. What do these values represent anyway?

In k-means the problem you are trying to solve is the problem of clustering your 150 points into 20 clusters. Each point is a 64-dimension point and thus represented by a vector of size 64. So in your case dt is the set of points, each column is a 64-dim vector.
After running the algorithm you get centers and assignments. centers are the 20 positions of the cluster's center in a 64-dim space, in case you want to visualize it, measure distances between points and clusters, etc. 'assignments' on the other hand contains the actual assignments of each 64-dim point in dt. So if assignments[7] is 15 it indicates that the 7th vector in dt belongs to the 15th cluster.
For example here you can see clustering of lots of 2d points, let's say 1000 into 3 clusters. In this case dt would be 2x1000, centers would be 2x3 and assignments would be 1x1000 and will hold numbers ranging from 1 to 3 (or 0 to 2, in case you're using openCV)
EDIT:
The code to produce this image is located here: http://pypr.sourceforge.net/kmeans.html#k-means-example along with a tutorial on kmeans for pyPR.

In openCV it is the number of the cluster that each of the input points belong to

Matlab Finding All Possible Triangles

I have several datasets and smallest have around 1000 points and largest have around 1,000,000 points. These points are consist of Longitude and Latitude information.
I would like to create triangles for all possible combinations of these points. I am planning to use Matlab. I will appreciate any answer about how to create triplets of points from these datasets by using Matlab.
One other problem is as you can see there are huge number of points in my dataset so how can I find a fast way to do this. Thanks for any help.

You can call combnk( points, k);
http://www.mathworks.in/help/stats/combnk.html

Controlled random number/dataset generation in MATLAB

Say, I have a cube of dimensions 1x1x1 spanning between coordinates (0,0,0) and (1,1,1). I want to generate a random set of points (assume 10 points) within this cube which are somewhat uniformly distributed (i.e. within certain minimum and maximum distance from each other and also not too close to the boundaries). How do I go about this without using loops? If this is not possible using vector/matrix operations then the solution with loops will also do.
Let me provide some more background details about my problem (This will help in terms of what I exactly need and why). I want to integrate a function, F(x,y,z), inside a polyhedron. I want to do it numerically as follows:
$F(x,y,z) = \sum_{i} F(x_i,y_i,z_i) \times V_i(x_i,y_i,z_i)$
Here, $F(x_i,y_i,z_i)$ is the value of function at point $(x_i,y_i,z_i)$ and $V_i$ is the weight. So to calculate the integral accurately, I need to identify set of random points which are not too close to each other or not too far from each other (Sorry but I myself don't know what this range is. I will be able to figure this out using parametric study only after I have a working code). Also, I need to do this for a 3D mesh which has multiple polyhedrons, hence I want to avoid loops to speed things out.

Check out this nice random vectors generator with fixed sum FEX file.
The code "generates m random n-element column vectors of values, [x1;x2;...;xn], each with a fixed sum, s, and subject to a restriction a<=xi<=b. The vectors are randomly and uniformly distributed in the n-1 dimensional space of solutions. This is accomplished by decomposing that space into a number of different types of simplexes (the many-dimensional generalizations of line segments, triangles, and tetrahedra.) The 'rand' function is used to distribute vectors within each simplex uniformly, and further calls on 'rand' serve to select different types of simplexes with probabilities proportional to their respective n-1 dimensional volumes. This algorithm does not perform any rejection of solutions - all are generated so as to already fit within the prescribed hypercube."

Use i=rand(3,10) where each column corresponds to one point, and each row corresponds to the coordinate in one axis (x,y,z)

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Convex-hull in Pyspark - pyspark

Related

Evaluation of K-means clustering ( accuracy)

Finding all complex eigenvalues near the unit circle of a large matrix (n around 10k-100k) with MATLAB

K-means Clustering, major understanding issue

Matlab Finding All Possible Triangles

Controlled random number/dataset generation in MATLAB

Categories

Resources