boxplots from SPSS and Matlab show different outliers and percentiles - matlab

I ran into the following problem while checking my ANOVA data for outliers with MATLAB. By chance I ran the same check in SPSS and noticed something odd:
I have a vector with 5 data points (for simplicity).
1.1648 0.4254 0.6796 0.5627 0.6904
I then use the Matlab boxplot command, i.e. boxplot(test) and get this:
It looks reasonable.
I then copy the same vector into SPSS (I swear it's the same). I choose 'explore' from the analyze> descriptive stats menu. While I don't think it makes a difference in this case, I choose (from the plots option): boxplots: dependents together. Then I hit OK and get this:
There are two things that seem weird to me here:
1. The 3rd quartile appears to be larger in Matlab (above 0.8) whereas it is <0.8 in SPSS.
2. Matlab doesn't seem to think there is an outlier. As far as I understand the documentation on "boxplot" in Matlab, outliers are, by default, defined as more than 1.5 times the interquartile range away from the 1st or 3rd quartile. This definition is the same in SPSS as far as I can gather.
I'm probably doing something wrong somewhere but can't figure out where.
Thanks for the help!
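A possible explanation, offered as a sketch rather than a confirmed answer: the two programs may place the box edges differently. MATLAB's documented percentile convention treats the i-th sorted value as the (i - 0.5)/n quantile and interpolates linearly, while SPSS boxplots are, as far as I know, drawn at Tukey's hinges (the medians of the lower and upper halves, each including the overall median when n is odd). The code below assumes both conventions and applies the same 1.5-times-the-spread rule to each. The variable names are made up for illustration.

```matlab
test = [1.1648 0.4254 0.6796 0.5627 0.6904];
s = sort(test);
n = numel(s);

% Convention 1 (assumed MATLAB default): the i-th sorted value is the
% (i - 0.5)/n quantile, with linear interpolation in between.
p = ((1:n) - 0.5) / n;
q = interp1(p, s, [0.25 0.75]);          % approx. [0.5284 0.8090]
fenceQ = q(2) + 1.5 * (q(2) - q(1));     % approx. 1.230

% Convention 2 (assumed SPSS boxplot): Tukey's hinges, i.e. the medians of
% the lower and upper halves, sharing the middle value when n is odd.
h = ceil(n / 2);
hinges = [median(s(1:h)), median(s(end-h+1:end))];    % [0.5627 0.6904]
fenceH = hinges(2) + 1.5 * (hinges(2) - hinges(1));   % approx. 0.882

outlierQ = max(s) > fenceQ;   % false: 1.1648 lies inside the fence
outlierH = max(s) > fenceH;   % true:  1.1648 lies outside the fence
```

Under these assumptions the upper box edge is about 0.809 in the first case and 0.6904 in the second, and 1.1648 falls inside the first fence but outside the second, which would match both observations in the question.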

Related

plotting histogram in matlab with highly unequal distribution

I am plotting two years' worth of data to see how the distribution of values changes, but in one of the histograms the first bin dominates because that year's data set contains many zeros. Would you recommend creating a bar strictly for the zeros? How would I create this index in Matlab? Or how can I better manipulate the histogram so that it reflects the actual data set and makes it clear that the zeros account for the sharp initial rise?
Thanks.
This is more of a statistical question. If you have a good reason to ignore the zeros, for example because one of your data acquisition systems produced them through a malfunction, you can simply get rid of them with
hist(data(data~=0))
But you would not necessarily need to look at the histograms anyway: you can use the variance or the standard deviation to see how much your data shifted.
Furthermore, to compare data populations, boxplots are much better and easier to handle:
doc boxplot
If, on the other hand, the zeros are genuine data, you have to keep them! Here too the boxplot function might help, because the zeros might show up as outliers (little red crosses), or the box might simply start at the zero line.
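If the zeros are kept, one way to give them their own bar is to split the data with a logical index and draw the two parts separately. This is only a minimal sketch: the sample vector, the number of bins and the bar styling are assumptions, and histcounts needs a reasonably recent MATLAB release.

```matlab
data = [0 0 0 0 1.2 3.4 0 2.2 5.1 0 4.8 2.9];   % hypothetical sample

isZero  = (data == 0);          % logical index of the zeros
nZero   = nnz(isZero);          % how many zeros there are
nonZero = data(~isZero);        % everything else

% Bin only the non-zero values (5 bins is an arbitrary choice).
[counts, edges] = histcounts(nonZero, 5);
centers = edges(1:end-1) + diff(edges) / 2;

bar(centers, counts, 1);        % histogram of the non-zero values
hold on
bar(0, nZero, 0.5, 'r');        % separate, differently coloured bar for zeros
hold off
legend('non-zero values', 'zeros');
```

Drawing the zeros in a different colour makes it obvious that the first bar is a count of exact zeros rather than a bin of small values.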

Matlab: Eliminating freak values in dataset

I am searching for a method to eliminate freak values out of given dataset. For example:
All these peaks should be eliminated. I've tried different filters like medfilt, but the peaks are still there. I've also tried a lowpass filter, but it didn't work either. I am a beginner in filtering signals, so I probably did it wrong.
You can download data sets for the x array here and y array here.
I could also imagine a loop to compare the values next to each other, but I am sure there has to be a built-in function?
Here is the result using medfilt1(input,15):
The peaks vanish, but then I get these ugly steps, which I don't want.
Just use a median filter! medfilt1(data,3) will do if this is a 1-pixel "cosmic" spike. If the peaks remain, increase the window size to 5 or more.
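A minimal sketch of the median-filter idea on made-up data with isolated single-sample spikes. Replacing only the flagged samples, rather than using the filtered signal everywhere, is a variation added here and not part of the suggestion above. The threshold is an arbitrary assumption, and medfilt1 requires the Signal Processing Toolbox.

```matlab
% Synthetic signal with three isolated single-sample spikes (made-up data).
t = linspace(0, 2*pi, 50);
y = sin(t);
spikeIdx = [10 25 40];
y(spikeIdx) = y(spikeIdx) + 5;

yMed = medfilt1(y, 3);            % 3-point median filter

% Variation: only replace samples that differ strongly from the filtered
% signal, so the rest of the data is left untouched.
thresh  = 1;                      % assumed threshold, tune for real data
isSpike = abs(y - yMed) > thresh;
yClean  = y;
yClean(isSpike) = yMed(isSpike);

plot(t, y, ':', t, yClean, '-');
legend('raw', 'cleaned');
```

With a short window the filter only removes spikes that are narrower than about half the window, so wider bursts of bad samples would need a larger window or a different approach.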
EDIT:
So this is what the OP's data looks like:
So we see that the data is not exactly uniform or ordered, and there are a lot of data points in the spikes, unlike what one first understands from the question (guys, please plot your data correctly!). The question now is: is the data in the spikes or in the baseline?

Sympy/Matlab Plot y=mx, without any numerical value of m

I need to plot y = m*x where x ranges from, say, 0 to 10. But m is a symbolic constant here; I don't want to supply a specific value.
Here's what my desired graph looks like (similar to how a class teacher would draw this):
[Consider m=a]
Sympy:
Tried doing this:
sympy.plot(m*x,(x,0,10))
but this shows the following error:
ValueError: The same variable should be used in all univariate expressions being plotted.
I can't really understand the error message, but I am guessing it can't plot m as a (symbolic) constant in this case. Is that so? And in general, how can I do this?
Matlab:
Next, I wanted to know whether this is a limitation of sympy only, and thought maybe popular tools like Matlab can do it. But after a bit of searching the docs and SO, I couldn't find anything. Neither plot nor fplot seems to cover this; both expect numerical values.
Others:
I am not acquainted with other plotting or CAS software, but it would be interesting to know whether any of it supports this out of the box.
So, to repeat the main question: how can I draw similar graphs, preferably without managing the plotting code myself?
The solution must be generic enough, like plot, to be applied to other equations.
[ The question was heavily edited from a sympy-specific question ]
In Maple you can plot this way only for some functions under specific conditions. In Python (using matplotlib, sympy or any other package) or Matlab you need to write code to manage it yourself (assume values, then replace the numeric ticks with literal ones).
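The "assume a value, then replace the ticks" approach could look roughly like this in Matlab. It is only a sketch: the placeholder slope, the tick positions and the label strings are all assumptions chosen for illustration, and the labels have to be written by hand for each new expression.

```matlab
x = linspace(0, 10, 100);
m = 1;                          % placeholder slope, used only for drawing

figure;
plot(x, m * x);
xlabel('x'); ylabel('y');
title('y = m x');

% Replace the numeric y ticks with symbolic labels in terms of m.
yTicks = m * [0 5 10];
set(gca, 'XTick', [0 5 10], ...
         'YTick', yTicks, ...
         'YTickLabel', {'0', '5m', '10m'});
```

The picture is only qualitatively right: it shows the shape for a positive m, and a negative or zero slope would need its own drawing.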

Is there any way of annotating multiple plots using arrows with code in MATLAB?

When doing multiple plots on the same figure in MATLAB, is there any way of annotating them such that the legend entries have arrows pointing to the plots that they're named for?
Here's an example of what I have in mind. I'd like to do this using code.
Note that the MATLAB website mentions a way to do this using the annotation function. The problem with that function is that it takes x and y values (normalized for the plot) and puts the text there. Given that I am not certain where the datapoints will lie, this is unhelpful for what I want to do.
I don't mind if the text shows up at a random place, actually. What's important is to have an arrow or some way of pointing to the plot that it is referencing.
Take a look at the following MathWorks File Exchange post:
http://www.mathworks.co.uk/matlabcentral/fileexchange/10656-data-space-to-figure-units-conversion
The code given here allows you to convert between data-space units and normalised figure units. The example given in the post seems to be doing almost exactly what you are asking.
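For a plain linear 2-D axes the conversion can also be written by hand, which shows what such a helper has to do. The sketch below is not the File Exchange code; it assumes normalized axes units, linear scales and no fixed aspect ratio, and the curves, the labelled point and the arrow offset are made up.

```matlab
x = 0:0.1:10;
figure;
ax = axes;
plot(ax, x, sin(x), x, 0.5 * cos(x));
xlim(ax, [0 10]); ylim(ax, [-1.5 1.5]);   % fix the limits before converting

% Data point to label (on the sine curve).
xd = pi / 2;
yd = sin(xd);

% Data units -> normalized figure units (linear 2-D axes only).
pos = get(ax, 'Position');      % [left bottom width height], normalized
xl = xlim(ax);
yl = ylim(ax);
xn = pos(1) + (xd - xl(1)) / diff(xl) * pos(3);
yn = pos(2) + (yd - yl(1)) / diff(yl) * pos(4);

% The text sits at the tail; the arrow head points at the data point.
annotation('textarrow', [xn + 0.10, xn], [yn + 0.08, yn], 'String', 'sin(x)');
```

Because annotations live in figure coordinates, the arrow does not follow the curve if the axes limits change or the plot is zoomed afterwards, so the conversion has to be redone in that case.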

Efficiently visualise large quantities of points with matlab.

I have a set of 3D points numbering around 1 million. I am looking to visualise these with matlab.
I have tried the following functions:
plot3
scatter3
But they are both very sluggish. Is there a more efficient way to visualise this level of points in matlab? Maybe a way to mesh the points?
If not can anyone suggest a plug-in or even a different program for visualising 3D points?
You're going to run into efficiency issues no matter what plugin/program you use if you want all million+ points to show up in a plot. My suggestion would be to downsample. Use the plot3 or scatter3 function on every other point, or every nth point, until you get a figure that is not sluggish. As long as the variance in your data isn't astronomical, downsampling a little bit shouldn't affect the overall distribution of points (given that you have a million+ points). And any software that is able to display that much data without being sluggish is most likely downsampling/binning or using some interpolation technique to do so (so you might as well have control over it).
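A minimal sketch of the downsampling suggestion, assuming the points are stored in an n-by-3 matrix. The random data, the factor of 20 and the marker settings are placeholders.

```matlab
pts = rand(1e6, 3);             % hypothetical n-by-3 point cloud

% Option 1: keep every 20th point (the factor is an assumption).
step = 20;
sub = pts(1:step:end, :);

% Option 2: a random subset, useful if the rows are stored in some
% structured order so that every n-th point would be biased.
idx = randperm(size(pts, 1), 50000);
subRand = pts(idx, :);

plot3(sub(:,1), sub(:,2), sub(:,3), '.', 'MarkerSize', 1);
axis equal
grid on
```

Increase or decrease the step until rotating the figure feels responsive. Plotting dots with plot3 is usually lighter than scatter3 because all points share one marker style.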
fscatter3 from the File Exchange does what you want.
Is there a specific reason to actually have it display that many points?
I Googled around a bit and found some people who have had similar issues (someone suggested Avizo as an alternate program but I've never used it):
http://www.mathworks.com/matlabcentral/newsreader/view_thread/308948
mathworks.com/matlabcentral/newsreader/view_thread/134022 (not clickable because I don't have enough rep to post more than two links)
An alternate solution would be to generate a histogram if you're more interested in the density of the data:
http://blogs.mathworks.com/videos/2010/01/22/advanced-making-a-2d-or-3d-histogram-to-visualize-data-density/
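As a rough illustration of the density idea (not the method from the linked video), the points can be binned in two dimensions and shown as an image. The sketch assumes an n-by-3 matrix, projects onto the x-y plane, and uses an arbitrary 200-by-200 grid. histcounts2 needs a reasonably recent Matlab release.

```matlab
pts = randn(1e6, 3);                      % hypothetical n-by-3 point cloud

% Count the points per cell of a 200-by-200 grid in the x-y plane.
[N, xEdges, yEdges] = histcounts2(pts(:,1), pts(:,2), [200 200]);

xc = xEdges(1:end-1) + diff(xEdges) / 2;  % bin centers along x
yc = yEdges(1:end-1) + diff(yEdges) / 2;  % bin centers along y

imagesc(xc, yc, N.');                     % transpose: image rows run along y
axis xy
colorbar
xlabel('x'); ylabel('y');
```

The image has a fixed size no matter how many points go in, so it stays responsive where a scatter plot of the raw points would not.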
If you know beforehand roughly the coordinates of the feature you are looking for, try passing the cloud through a simple pass-through filter, which essentially crops your point cloud. I.e. if you know that the feature is at an x-coordinate > 5, remove all points with x-coordinate < 5.
For the first coordinate, this filter could be realized as
data = data(data(:,1) > 5, :);
provided that your 3D data is stored in an n-by-3 matrix.
This, together with downsampling, could help you out. If you still find the performance lagging, consider using something like the PCD viewer in PointCloudLibrary, check half way down the page at
http://pointclouds.org/documentation/overview/visualization.php
It is a stand-alone app you could launch from matlab. I find its performance far better than the sluggish matlab plotting tools.
For anyone who is interested I ended up finding a Point cloud visualiser called Cloud Compare. It is extremely fast and allows selection and segmentation as well as filtering on point clouds.