How to make a point cloud from xyz values in MATLAB

I am trying to convert data from radar sensors into a point cloud so that I can use pcsegdist; however, I am getting a "too many inputs" error when constructing the point cloud.
points = [xvals(:), yvals(:), zvals(:)];
ptCloud = pointCloud(points);
xvals, yvals, and zvals are arrays of the radar readings.
Am I formatting the values into the points array incorrectly?
Is there a way to group points without converting everything into a pointcloud?
I tried using test values instead of the actual radar readings, but I still got the same "too many inputs" error.
testVals = [1, 2, 3, 4, 5, 6, 7];
points = [testVals(:), testVals(:), testVals(:)];
ptCloud = pointCloud(points);
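For reference, an M-by-3 matrix of [x y z] values is the documented input to pointCloud in the Computer Vision Toolbox, so the construction above looks right. One possibility worth ruling out (a sketch of a check, not a confirmed diagnosis) is another pointCloud function or class earlier on the path intercepting the call:
% List every pointCloud implementation on the MATLAB path; more than
% one entry suggests a shadowing function/class is intercepting the call.
which -all pointCloud
% Documented construction from an M-by-3 matrix of [x y z] locations
points = [xvals(:), yvals(:), zvals(:)];
ptCloud = pointCloud(points);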

Related

How to deal with KML3D results in R?

I have run kml3d in R on longitudinal medical data with two outcome measures (EQ-5D score and Oxford score), each measured at three time points.
After pre-processing the data, I used the following code to build the model:
trajectory_knee <- clusterLongData3d(data, timeInData = list(oxford = c(17, 19, 21), eq5d = c(18, 20, 22)), varNames = c("Oxford score", "Eq5d score"))
kml3d(trajectory_knee, nbClusters = 2:5, nbRedrawing = 4, toPlot = "both")
The goal of my kml3d analysis is to identify distinct clusters of observations. However, running kml3d only gave me the Calinski-Harabasz plot, while my goals are to:
obtain the cluster labels generated by the model
plot the clusters
use the BIC, plus plotted trajectories, to find the optimal number of clusters
I do not know how to reach these goals, though. Can anyone help me or point me in the right direction?
Thanks!
I tried using
NbClust
plotAllCriterion
choice(trajectory)
but none of these give me information about the goals formulated above...
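For what it's worth, here is a sketch of how these results are usually read off a kml3d fit, assuming the standard kml/kml3d API (verify against the documentation of your installed version):
# Cluster labels for a given partition, e.g. the 3-cluster solution
labels <- getClusters(trajectory_knee, 3)
# Plot the trajectories for that partition, coloured by cluster
plot(trajectory_knee, 3)
# Compare the quality criteria (Calinski-Harabasz, BIC, ...) across
# the fitted numbers of clusters
plotAllCriterion(trajectory_knee)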

How to randomly sample data with seeding?

I would like to randomly choose elements from a finite set that contains both numbers and NaNs while seeding the random number generation procedure.
So far I can make it work without seeding:
data = [0, 1, 2, 3, 4, 5, nan];
sample = datasample(data, 50);
but if I want to seed the number generation:
seed = rng(100);
sample = datasample(seed, data, 50);
I get the following error:
Error using datasample (line 89)
Sample size K must be a non-negative integer.
even though the documented syntax for datasample is (*):
[y,...] = datasample(s,data,k,...)
I have tried using randsample, too, but I get similar results.
(*) https://it.mathworks.com/help/stats/datasample.html
The documentation isn't very explicit about the first input. You need to pass a RandStream object as the first input argument rather than the struct that rng returns. (As a side note, the output of rng is the previous settings, not the new settings.)
Here is the equivalent of what it seems you were trying to do:
stream = RandStream('mt19937ar', 'Seed', 100);  % explicitly seeded stream
output = datasample(stream, data, 50);          % draw 50 samples, as in the question
If you want to use rng to specify the seed instead, you can call rng, use RandStream.getGlobalStream to get the current global random number stream, and then pass that to datasample. This is slightly redundant, though, since datasample uses the global random number stream if one isn't provided.
rng(100)                                % seed the global stream
stream = RandStream.getGlobalStream();  % fetch the (now seeded) global stream
output = datasample(stream, data, 50);  % draw 50 samples from it
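Since datasample falls back to the global stream anyway, the seeded stream can also be used implicitly (a minimal equivalent form):
rng(100)                        % seed the global stream
output = datasample(data, 50);  % datasample uses the global stream by default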

"Subscript indices must either be real positive integers or logicals" while Registering Multimodal 3-D Medical Images

I got this error while trying to follow the Registering Multimodal 3-D Medical Images example. The sample files are at this URL.
Here are what commands I used:
fixedHeader = helperReadHeaderRIRE('header.ascii');   % in the training_001/mr_T1 folder
movingHeader = helperReadHeaderRIRE('header.ascii');  % in the training_001/ct folder
fixedVolume = multibandread('image.bin',...
    [fixedHeader.Rows, fixedHeader.Columns, fixedHeader.Slices],...
    'int16=>single', 0, 'bsq', 'ieee-be');   % in the training_001/mr_T1 folder
movingVolume = multibandread('image.bin',...
    [movingHeader.Rows, movingHeader.Columns, movingHeader.Slices],...
    'int16=>single', 0, 'bsq', 'ieee-be');   % in the training_001/ct folder
helperVolumeRegistration(fixedVolume,movingVolume);
centerFixed = size(fixedVolume)/2;
centerMoving = size(movingVolume)/2;
figure, title('Unregistered Axial slice');
imshowpair(movingVolume(:,:,centerMoving(3)), fixedVolume(:,:,centerFixed(3)));
and I get the error.
I am using MATLAB R2014a.
The cause is probably that the third dimension of fixedVolume and/or movingVolume is an odd number, so that dividing by 2 produces a non-integer (something ending in .5). Such a fractional number cannot be used as an index into an array, as you try to do on the last line. A possible fix is to round the result of the division:
centerFixed = round(size(fixedVolume)/2);
centerMoving = round(size(movingVolume)/2);
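A toy illustration of the indexing rule, if it helps make the fix concrete:
v = 10:10:50;   % any vector
% v(5/2)        % would reproduce the error: 2.5 is not a valid subscript
v(round(5/2))   % round(2.5) = 3, so this returns v(3), i.e. 30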

Clustering and MATLAB

I'm trying to cluster some data I have from the KDD Cup 1999 dataset. The output from the file looks like this:
0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.
with 48 thousand different records in that format. I have cleaned the data up and removed the text, keeping only the numbers. The output now looks like this:
I created a comma-delimited file in Excel, saved it as a CSV file, and then created a data source from the CSV file in MATLAB. I've tried running it through the fcm toolbox in MATLAB (findcluster outputs 38 data types, which is expected with 38 columns).
The clusters, however, don't look like clusters, or it's not accepting the data and working the way I need it to.
Could anyone help with finding the clusters? I'm new to MATLAB, so I don't have any experience, and I'm also new to clustering.
The method (see the sketch after this list):
Choose the number of clusters (K)
Initialize the centroids (K patterns chosen randomly from the data set)
Assign each pattern to the cluster with the closest centroid
Calculate the mean of each cluster to be its new centroid
Repeat from step 3 until a stopping criterion is met (no patterns move to another cluster)
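Those steps are standard k-means. As a reference point alongside fcm, MATLAB's built-in implementation can be run in one line (a minimal sketch, assuming the Statistics Toolbox):
% Hard k-means with K = 2; idx holds one cluster label per row of kddcup1
idx = kmeans(kddcup1, 2);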
This is what I'm trying to achieve:
This is what I'm getting:
load kddcup1.dat
plot(kddcup1(:,1),kddcup1(:,2),'o')
[center,U,objFcn] = fcm(kddcup1,2);
Iteration count = 1, obj. fcn = 253224062681230720.000000
Iteration count = 2, obj. fcn = 241493132059137410.000000
Iteration count = 3, obj. fcn = 241484544542298110.000000
Iteration count = 4, obj. fcn = 241439204971005280.000000
Iteration count = 5, obj. fcn = 241090628742523840.000000
Iteration count = 6, obj. fcn = 239363408546874750.000000
Iteration count = 7, obj. fcn = 238580863900727680.000000
Iteration count = 8, obj. fcn = 238346826370420990.000000
Iteration count = 9, obj. fcn = 237617756429912510.000000
Iteration count = 10, obj. fcn = 226364785036628320.000000
Iteration count = 11, obj. fcn = 94590774984961184.000000
Iteration count = 12, obj. fcn = 2220521449216102.500000
Iteration count = 13, obj. fcn = 2220521273191876.200000
Iteration count = 14, obj. fcn = 2220521273191876.700000
Iteration count = 15, obj. fcn = 2220521273191876.700000
figure
plot(objFcn)
title('Objective Function Values')
xlabel('Iteration Count')
ylabel('Objective Function Value')
maxU = max(U);
index1 = find(U(1, :) == maxU);
index2 = find(U(2, :) == maxU);
figure
line(kddcup1(index1, 1), kddcup1(index1, 2), 'linestyle',...
'none','marker', 'o','color','g');
line(kddcup1(index2,1),kddcup1(index2,2),'linestyle',...
'none','marker', 'x','color','r');
hold on
plot(center(1,1),center(1,2),'ko','markersize',15,'LineWidth',2)
plot(center(2,1),center(2,2),'kx','markersize',15,'LineWidth',2)
Since you are new to machine learning/data mining, you shouldn't tackle such advanced problems. After all, the data you are working with was used in a competition (KDD Cup '99), so don't expect it to be easy!
Besides, the data was intended for a classification task (supervised learning), where the goal is to predict the correct class (bad/good connection). You seem to be interested in clustering (unsupervised learning), which is generally more difficult.
This sort of dataset requires a lot of preprocessing and clever feature extraction. People usually employ domain knowledge (network intrusion detection) to obtain better features from the raw data. Directly applying simple algorithms like k-means will generally yield poor results.
For starters, you need to normalize the attributes so they are all on the same scale: when computing the Euclidean distance in step 3 of your method, features with values such as 239 and 486 will dominate features with small values such as 0.05, distorting the result.
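For example, each column can be standardized to zero mean and unit variance before clustering (a sketch; zscore needs the Statistics Toolbox, and constant columns should be dropped first to avoid dividing by zero):
% Rescale every attribute to zero mean and unit variance
kdd_norm = zscore(kddcup1);
% Toolbox-free equivalent:
% kdd_norm = bsxfun(@rdivide, bsxfun(@minus, kddcup1, mean(kddcup1)), std(kddcup1));
[center, U, objFcn] = fcm(kdd_norm, 2);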
Another point to remember is that too many attributes can be a bad thing (the curse of dimensionality). You should therefore look into feature selection and dimensionality reduction techniques.
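As one common reduction, the standardized data can be projected onto its leading principal components (a sketch; pca is the current function name, while older releases use princomp):
% Keep the first two principal components as a low-dimensional view of the data
[coeff, score] = pca(kdd_norm);
kdd2 = score(:, 1:2);
plot(kdd2(:,1), kdd2(:,2), 'o')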
Finally, I suggest you familiarize yourself with a simpler dataset...
