Plotting a boxplot without using 'toPandas' and in one graph - pyspark

I want to plot a boxplot of two different columns of a DataFrame. I use the following code in PySpark (Python 3):
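import matplotlib.pyplot as plt
from pyspark.sql import functions as F
# Filter on c1 > 0, then collect each column to the driver as a pandas DataFrame
c1 = df.where(F.col('c1') > 0.0).select(F.col('c1')).toPandas()
c2 = df.where(F.col('c1') > 0.0).select(F.col('c2')).toPandas()
# Two stacked subplots, one boxplot each
fig, (ax1, ax2) = plt.subplots(nrows=2)
c1.boxplot(ax=ax1)
c2.boxplot(ax=ax2)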
Is there any way to plot them:
- without using '.toPandas'?
- in one picture, like the following figure?
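A sketch of one way to avoid `toPandas()`, assuming Spark 2.2+ (where `DataFrame.approxQuantile` accepts a list of columns) and matplotlib's `Axes.bxp`, which draws boxes from precomputed statistics instead of raw data. The five-number summary is computed on the cluster, only a handful of numbers reach the driver, and both boxes go on one axes. The helper names (`box_stats`, `spark_box_stats`, `plot_boxes`) are hypothetical. Assumptions: quantiles are approximate (controlled by `rel_err`), whiskers sit at the 1.5*IQR fences (clipped to min/max) rather than at the most extreme data point inside them, and outliers are not drawn.

```python
from typing import Dict, List, Sequence

PROBS = [0.0, 0.25, 0.5, 0.75, 1.0]  # min, Q1, median, Q3, max


def box_stats(label: str, q: Sequence[float]) -> Dict[str, object]:
    """Build the dict matplotlib's Axes.bxp expects from [min, Q1, median, Q3, max].

    Whiskers are placed at the 1.5*IQR fences, clipped to min/max, which only
    approximates a Tukey boxplot; outliers (fliers) are not collected.
    """
    lo, q1, med, q3, hi = q
    iqr = q3 - q1
    return {
        "label": label,
        "q1": q1,
        "med": med,
        "q3": q3,
        "whislo": max(lo, q1 - 1.5 * iqr),
        "whishi": min(hi, q3 + 1.5 * iqr),
        "fliers": [],
    }


def spark_box_stats(df, cols: List[str], rel_err: float = 0.01) -> List[Dict[str, object]]:
    """Compute box statistics on the cluster; only 5 numbers per column reach the driver."""
    # With a list of columns, approxQuantile returns one list of quantiles per column.
    quantiles = df.approxQuantile(cols, PROBS, rel_err)
    return [box_stats(c, q) for c, q in zip(cols, quantiles)]


def plot_boxes(stats: List[Dict[str, object]], ax=None):
    """Draw all boxes side by side on a single Axes from precomputed statistics."""
    import matplotlib.pyplot as plt  # imported here so the helpers above work without matplotlib

    if ax is None:
        _, ax = plt.subplots()
    ax.bxp(stats, showfliers=False)
    return ax


# Usage (assumes `df` and `F = pyspark.sql.functions` as in the question):
# filtered = df.where(F.col('c1') > 0.0)
# plot_boxes(spark_box_stats(filtered, ['c1', 'c2']))
# plt.show()
```

Keeping the statistics step separate from the drawing step means the same `plot_boxes` call works for any number of columns, all on one figure.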

Related

Matlab Plotting with Labels

I'm trying to plot a k-NN result from my data, which has 3 columns: x, y, label. There are 3 classes, and I would like to use a different symbol for each of them. Here's the way I'm plotting now:
t1 = data(:,3) == 1;
t2 = data(:,3) == 2;
t3 = data(:,3) == 3;
train1 = data(t1,:);
train2 = data(t2,:);
train3 = data(t3,:);
figure(1);
plot(train1(:,1),train1(:,2),'#',train2(:,1),train2(:,2),'*',train3(:,1),train3(:,2),'o');
I want to know if there's a more concise way of doing this. Thanks.
Here's the most concise (and working) way to plot your data:
figure(1);
hold all
plot(train1(:,1),train1(:,2),'o')
plot(train2(:,1),train2(:,2),'x')
plot(train3(:,1),train3(:,2),'s')
Here's an example that does what you want in a robust and modular manner. You can easily add classes or modify the figure output.
data = [0.53,0.17,2;0.78,0.60,3;0.93,0.26,1;0.13,0.65,2;0.57,0.69,1;...
0.47,0.75,3;0.010,0.45,1;0.34,0.080,3;0.16,0.23,3;0.79,0.91,3;...
0.31,0.15,1;0.53,0.83,2];
categories = [1,2,3];
symbols = {'s','<','o','d','v','+','x','*'};
figure;
hold all
for loopj = 1:length(categories)
t = data(:,3) == categories(loopj);
train = data(t,:);
label = strcat('Class ',num2str(categories(loopj)));
plot(train(:,1),train(:,2),symbols{loopj},'DisplayName',label,'LineWidth',1.3)
end
lg = legend('show');
lg.Location = 'best';
Use hold all to draw on the figure without erasing the previous axes, and let MATLAB pick the line colors.
In any case, you need to define the different symbols manually; each plot command in the loop draws one line and marker style.
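For comparison, a rough Python/matplotlib analogue of the same loop-over-classes idea (an assumption, not part of the MATLAB answer): rows are grouped by label, each class gets one `plot` call with its own marker and legend entry, and matplotlib cycles the colors automatically, much like `hold all`. `split_by_label`, `plot_classes` and `MARKERS` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

# Marker cycle, mirroring the `symbols` cell array in the MATLAB answer
MARKERS = ['s', '<', 'o', 'd', 'v', '+', 'x', '*']


def split_by_label(rows: Sequence[Tuple[float, float, int]]) -> Dict[int, Tuple[List[float], List[float]]]:
    """Group (x, y, label) rows into per-label x and y lists."""
    groups: Dict[int, Tuple[List[float], List[float]]] = {}
    for x, y, label in rows:
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(x)
        ys.append(y)
    return groups


def plot_classes(rows, ax=None):
    """One plot call per class, each with its own marker and legend entry."""
    import matplotlib.pyplot as plt  # imported here so split_by_label works without matplotlib

    if ax is None:
        _, ax = plt.subplots()
    for i, (label, (xs, ys)) in enumerate(sorted(split_by_label(rows).items())):
        ax.plot(xs, ys, linestyle='none', marker=MARKERS[i % len(MARKERS)], label=f'Class {label}')
    ax.legend(loc='best')
    return ax
```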

Plot means over grouped boxplot in MATLAB?

I am using multiple_boxplot function to generate grouped boxplots:
http://au.mathworks.com/matlabcentral/fileexchange/47233-multiple-boxplot-m
However, instead of medians I want to plot means. First I tried the general method:
plot([mean(x)],'dg');
But it did not work. I tried to extract the means and then plot them, but that did not work either.
m=[];
for i=1:max(group)
idx=find(group==i);
m=[m nanmean(x(idx))];
end
boxplot(x,group, 'positions', positions);hold on
plot([m],'dg')
What am I doing wrong? And how can I plot the means with each boxplot?
Thanks.
You can do the following:
In the function multiple_boxplot change line 48 to:
B = boxplot(x,group, 'positions', positions);
and change the header of the function to:
B = multiple_boxplot(data...
and save the function file.
This won't change anything in how the function works, but it will let you obtain a handle to the boxplot (B).
Then in your code, create the boxplot as before, but with the output argument B:
B = multiple_boxplot(data...);
And add the following lines:
% compute the mean by group:
M = cellfun(@mean,data);
% convert it to pairs of Y values:
M = mat2cell(repmat(M(:),1,2),ones(size(M,1),1),2);
% change the medians to means:
set(B(6,:),{'YData'},M)
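One likely reason the question's `plot([m],'dg')` misses the boxes is that `plot` with a single vector uses x = 1, 2, ..., numel(m), while the boxes sit at `positions`; the means need the same x-coordinates (in MATLAB, `plot(positions, m, 'dg')`). A Python/matplotlib sketch of that idea follows; `group_nanmeans` and `overlay_means` are hypothetical helpers, and matplotlib's `Axes.boxplot` can also draw means directly with `showmeans=True`.

```python
import math
from typing import Dict, Hashable, List, Sequence


def group_nanmeans(values: Sequence[float], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Mean of `values` per group label, skipping NaNs (like MATLAB's nanmean).

    Groups appear in first-seen order; reorder them to match the box positions.
    A group whose values are all NaN is omitted.
    """
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for v, g in zip(values, groups):
        if math.isnan(v):
            continue
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def overlay_means(ax, means: List[float], positions: Sequence[float]):
    """Plot means as green diamonds at the same x positions as the boxes."""
    # Passing only `means` would place them at x = 0..n-1, not at `positions`.
    ax.plot(positions, means, 'dg')
    return ax
```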

Finding peaks in matlab in flat regions

I have data in an array in MATLAB. I want to find its peaks, but I ran into the problem shown in the picture below.
To find the peaks and plot them, I used the following code:
gyryMF = medfilt1(gyry, 3);
[pks, gyryPeaks] = findpeaks(gyryMF);
%%
plot(gyryMF);
text(gyryPeaks+.02,pks,num2str((1:numel(pks))'));
As you can see from the picture, some peaks are not found because they lie in a flat region. Is there a way to find and include them as well?
How about writing your own peaks function with your own criteria?
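peak_no = [];      % values of the detected peaks
ind_peak_no = [];  % indices of the detected peaks
for x = 1:numel(Data)-2
    % strict local maximum above a threshold Peak_min that you choose for your data
    if Data(x) < Data(x+1) && Data(x+1) > Data(x+2) && Data(x+1) > Peak_min
        peak_no = [peak_no; Data(x+1)];
        ind_peak_no = [ind_peak_no; x+1];
    end
end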
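The strict test above (`Data(x) < Data(x+1) > Data(x+2)`) still skips a flat top, because the neighboring sample is equal rather than lower. A plateau-aware variant walks across a run of equal samples and accepts it if the samples on both sides are strictly lower, reporting the middle index. A Python sketch under those assumptions (`find_peaks_with_plateaus` is a hypothetical name; `min_height` plays the role of `Peak_min`). In SciPy, `scipy.signal.find_peaks` also reports flat peaks by their middle sample.

```python
from typing import List, Sequence, Tuple


def find_peaks_with_plateaus(data: Sequence[float], min_height: float = float("-inf")) -> List[Tuple[int, float]]:
    """Return (index, value) for every local maximum, including flat tops.

    A flat top is a run of equal samples whose left and right neighbors are
    both strictly lower; its middle index (rounded down) is reported.
    Runs touching either end of the signal are not treated as peaks.
    """
    peaks = []
    n = len(data)
    i = 1
    while i < n - 1:
        if data[i - 1] < data[i]:
            # Rising edge: walk to the end of a possible plateau.
            j = i
            while j + 1 < n and data[j + 1] == data[i]:
                j += 1
            # Peak only if the signal falls after the plateau.
            if j + 1 < n and data[j + 1] < data[i] and data[i] > min_height:
                mid = (i + j) // 2
                peaks.append((mid, data[mid]))
            i = j + 1
        else:
            i += 1
    return peaks
```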

Matlab: Plotting bar groups

I want to make a bar plot in which the bars are grouped like this:
I have tried the following code, but I am not getting this type of plot. How can I generate a plot like the one above?
load Newy.txt;
load good.txt;
one = Newy(:,1);
orig = good(:,1);
hold on
bar(one,'m');
bar(orig,'g');
hold off
set(gca,'XTickLabel',{'0-19','20-39','40-79','80-159','160-319','320-639','640-1279','1280-1500'})
Each text file contains a list of 8 numbers.
You can use histc to count how many values fall between given bin edges.
To group bars, collect the series in a single matrix with one series per column; bar then draws each row as a group of bars.
edges = [0 20 40 80 160 320 640 1280 1501];
edLeg = cell(numel(edges)-1,1);
for i=1:length(edLeg)
edLeg{i} = sprintf('%d-%d',edges(i),edges(i+1)-1);
end
n = histc([one,orig],edges);
bar(n(1:end-1,:));
set(gca,'XTickLabel',edLeg)
legend('One','Orig')
I used these as test data:
one = ceil(1500.*rand(200,1));
orig = ceil(1500.*rand(200,1));
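The same binning can be sketched in Python with `bisect`: each value lands in bin k when `edges[k] <= v < edges[k+1]`, which matches `histc` apart from its extra final bin (values exactly equal to the last edge), the bin the answer drops with `n(1:end-1,:)`. `bin_counts` and `bin_labels` are hypothetical helpers; `numpy.histogram` is a library alternative, though it closes its last bin on the right.

```python
from bisect import bisect_right
from typing import List, Sequence


def bin_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values with edges[k] <= v < edges[k+1]; values outside all bins are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        k = bisect_right(edges, v) - 1
        if 0 <= k < len(counts):
            counts[k] += 1
    return counts


def bin_labels(edges: Sequence[int]) -> List[str]:
    """Labels such as '0-19', like the sprintf loop in the answer."""
    return [f"{a}-{b - 1}" for a, b in zip(edges[:-1], edges[1:])]


# One column per series, as with histc([one,orig],edges):
# counts = [bin_counts(s, edges) for s in (one, orig)]
```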
I found a way to get grouped bars:
I had to plot the data as 8 groups of bars, each group consisting of 3 bars.
For that, I wrote the data in each file like this:
Y = [30.9858 1.36816 38.6943
0.655176 6.44236 13.1563
1.42942 3.0947 0.621403
22.6364 2.80378 17.1299
0.621871 5.37145 1.87824
0.876739 5.97647 3.80334
40.6585 68.6757 23.0408
2.13606 6.26739 1.67559
];
bar(Y)
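`bar(Y)` draws each row of Y as a group and each column as a series. matplotlib's `Axes.bar` does not group a matrix by itself, so a common approach is to shift each series by a fraction of the group width. A minimal Python sketch, assuming 8 x 3 data like Y above; `group_offsets` and `grouped_bar` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def group_offsets(n_groups: int, n_series: int, total_width: float = 0.8) -> Tuple[float, List[List[float]]]:
    """x positions so that the series sit side by side within each group.

    Group g is centered on x = g; series s is shifted by (s - (n_series - 1) / 2) bar widths.
    """
    width = total_width / n_series
    xs = [[g + (s - (n_series - 1) / 2) * width for g in range(n_groups)] for s in range(n_series)]
    return width, xs


def grouped_bar(Y: Sequence[Sequence[float]], ax=None):
    """Rows of Y are groups, columns are series (the layout MATLAB's bar(Y) uses)."""
    import matplotlib.pyplot as plt  # imported here so group_offsets works without matplotlib

    if ax is None:
        _, ax = plt.subplots()
    n_groups, n_series = len(Y), len(Y[0])
    width, xs = group_offsets(n_groups, n_series)
    for s in range(n_series):
        ax.bar(xs[s], [row[s] for row in Y], width=width, label=f'Series {s + 1}')
    ax.set_xticks(range(n_groups))
    ax.legend()
    return ax
```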

K-means plotting in MATLAB

I am working on a thumb recognition system project in MATLAB. I implemented the k-means algorithm and got results. Now I want to plot the results the way they are plotted here, but I haven't been able to do so. I am using the following code:
load training.mat; % loaded just to get trainingData variable
labelData = zeros(200,1);
labelData(1:100,:) = 0;
labelData(101:200,:) = 1;
k=2;
[trainCtr, traina] = kmeans(trainingData,k);
trainingResult1=[];
for i=1:k
trainingResult1 = [trainingResult1 sum(trainCtr(1:100)==i)];
end
trainingResult2=[];
for i=1:k
trainingResult2 = [trainingResult2 sum(trainCtr(101:200)==i)];
end
load testing.mat; % loaded just to get testingData variable
c1 = zeros(k,1054);
c1 = traina;
cluster = zeros(200,1);
for j=1:200
testTemp = repmat(testingData(j,1:1054),k,1);
difference = sum((c1 - testTemp).^2, 2);
[value index] = min(difference);
cluster(j,1) = index;
end
testingResult1 = [];
for i=1:k
testingResult1 = [testingResult1 sum(cluster(1:100)==i)];
end
testingResult2 = [];
for i=1:k
testingResult2 = [testingResult2 sum(cluster(101:200)==i)];
end
In the code above, trainingData is a 200 x 1054 matrix: 200 thumb images with 1054 columns each. Each image is 25 x 42; I reshaped it into a 1 x 1050 row and appended 4 other feature columns, giving 1054 columns per image. I built testingData in the same way as trainingData, so it is also 200 x 1054. Now my only problem is plotting the results as they did here.
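The testing loop above assigns each test image to the centroid with the smallest squared Euclidean distance, then counts assignments per cluster for each half of the data. The same logic as a short Python sketch (hypothetical helper names; labels are 1-based to match the MATLAB indices, and ties go to the first centroid, as with `min`).

```python
from typing import List, Sequence


def nearest_centroid(samples: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]) -> List[int]:
    """Assign each sample to the centroid with the smallest squared Euclidean distance.

    Returns 1-based cluster indices; on a tie the first centroid wins.
    """
    labels = []
    for x in samples:
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
        labels.append(dists.index(min(dists)) + 1)
    return labels


def cluster_counts(labels: Sequence[int], k: int) -> List[int]:
    """How many labels fall in each cluster 1..k (like the testingResult loops)."""
    return [sum(1 for lab in labels if lab == i) for i in range(1, k + 1)]


# e.g. testingResult1 ~ cluster_counts(labels[:100], k)
#      testingResult2 ~ cluster_counts(labels[100:200], k)
```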
After selecting 2 features, you can just follow the example: start a figure, use hold on, and use plot or scatter to plot the centroids and the data points. For example:
selectedFeatures = [42,43];
% points assigned to cluster 1, on the two selected feature columns
plot(trainingData(trainCtr==1,selectedFeatures(1)), ...
     trainingData(trainCtr==1,selectedFeatures(2)), ...
     'r.','MarkerSize',12)
This plots the selected feature values of the data points in cluster 1.
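Extending that single `plot` call to every cluster plus the centroids, a Python/matplotlib sketch (hypothetical helper names; note Python's 0-based columns, so MATLAB's `[42,43]` becomes `(41, 42)`). `centroids` is assumed to be the k x 1054 centroid matrix (`traina` in the question) and `labels` the per-row cluster indices (`trainCtr`).

```python
from typing import List, Sequence, Tuple


def cluster_xy(data, labels, cluster, features) -> Tuple[List[float], List[float]]:
    """x/y values of the two selected feature columns for the rows in `cluster`.

    `features` are 0-based column indices.
    """
    fx, fy = features
    rows = [row for row, lab in zip(data, labels) if lab == cluster]
    return [r[fx] for r in rows], [r[fy] for r in rows]


def plot_clusters(data, labels, centroids, features, ax=None):
    """One color per cluster, plus the centroids projected onto the same two features."""
    import matplotlib.pyplot as plt  # imported here so cluster_xy works without matplotlib

    if ax is None:
        _, ax = plt.subplots()
    for k in sorted(set(labels)):
        xs, ys = cluster_xy(data, labels, k, features)
        ax.plot(xs, ys, '.', markersize=12, label=f'Cluster {k}')
    fx, fy = features
    ax.plot([c[fx] for c in centroids], [c[fy] for c in centroids],
            'kx', markersize=15, markeredgewidth=3, label='Centroids')
    ax.legend()
    return ax
```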