How to insert values on a boxplot in SPSS? - boxplot

I'm starting to use SPSS for some statistical analysis, so I don't know the tool well yet and still have many questions.
I would like to know whether it is possible to show the values on a boxplot, for example Q1, Q2 (median), Q3, the maximum and minimum, and the outliers (SPSS already computes these values by definition).
[Image: example of how I would like the boxplot to look]
Is this possible in SPSS?
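Whether SPSS itself can print these numbers on the chart is the open question here, but it helps to pin down which numbers would be printed. The Python sketch below computes them under stated assumptions: the box edges are Tukey's hinges, the whiskers reach the most extreme values within 1.5 box lengths of the box, "outliers" lie between 1.5 and 3 box lengths away, and "extremes" lie beyond 3 box lengths (the convention SPSS boxplots are generally described as using, with circles and asterisks). `boxplot_labels` is a hypothetical helper, not an SPSS function.

```python
from statistics import median


def tukey_hinges(xs):
    """Tukey's hinges of sorted data: medians of the lower and upper halves,
    each half including the overall median when n is odd (assumed SPSS convention)."""
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])


def boxplot_labels(data):
    """Values one might annotate on a boxplot (hypothetical helper)."""
    xs = sorted(data)
    q1, q3 = tukey_hinges(xs)
    box = q3 - q1  # box length (interquartile range from the hinges)
    inner_lo, inner_hi = q1 - 1.5 * box, q3 + 1.5 * box
    outer_lo, outer_hi = q1 - 3 * box, q3 + 3 * box
    inside = [x for x in xs if inner_lo <= x <= inner_hi]
    return {
        "whisker_low": inside[0],   # smallest value that is not an outlier
        "Q1": q1,
        "median": median(xs),
        "Q3": q3,
        "whisker_high": inside[-1],  # largest value that is not an outlier
        # more than 1.5 but at most 3 box lengths from the box
        "outliers": [x for x in xs
                     if (x < inner_lo or x > inner_hi) and outer_lo <= x <= outer_hi],
        # more than 3 box lengths from the box
        "extremes": [x for x in xs if x < outer_lo or x > outer_hi],
    }


if __name__ == "__main__":
    for name, value in boxplot_labels([2, 3, 3, 4, 5, 6, 7, 14, 30]).items():
        print(f"{name}: {value}")
```

Note that on such a plot the "minimum" and "maximum" at the whisker ends are the smallest and largest values that are not outliers, which is why they are computed from `inside` rather than from the whole data set.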

Related

Extracting different values of a function vs time using MATLAB curve fitting

I feel like this should be something simple to solve, but I'm struggling to find the answer anywhere.
I have a set of 'R' values and a set of time values. I want to use curve fitting (I haven't used this part of the software before) to calculate the 'R' values at a different set of time values. In other words, I want to access what is displayed in a figure created with curve fitting, but at other time values. I can point the cursor at the values I want on the figure and write them down, but that is not at all efficient for the number of time values I have. The context is orbital motion: radius vs. time.
Thanks in advance :)
You can use Matlab's fit function to do this very easily. Assuming you have your data in arrays r and t, you can do something like this:
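f = fit(t(:), r(:), 'smoothingspline')  % fit expects column vectors
disp(f(5))                              % fitted R at t = 5
rNew = f(tNew(:));                      % tNew: your own vector of query times (hypothetical name)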
The documentation lists the various fit types available (see https://www.mathworks.com/help/curvefit/fit.html).
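The key point of the answer is that the fitted object is itself a function, so it can be evaluated at any list of new time values instead of reading points off the figure. As a language-neutral illustration, the Python sketch below uses a swapped-in technique: an interpolating natural cubic spline, which passes exactly through the data, unlike MATLAB's 'smoothingspline'. `natural_cubic_spline` is a hypothetical helper and the sample data is illustrative.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs[i], ys[i]).
    xs must be strictly increasing and contain at least two points."""
    n = len(xs) - 1  # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the quadratic coefficients c, with natural
    # boundary conditions (zero second derivative at both ends).
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        denom = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / denom
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / denom
    # Back substitution for c, then the linear (b) and cubic (d) coefficients.
    c = [0.0] * (n + 1)
    b = [0.0] * n
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Interval containing x; the end pieces are reused outside [xs[0], xs[-1]].
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[i]
        return ys[i] + b[i] * dx + c[i] * dx ** 2 + d[i] * dx ** 3

    return evaluate


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative times
    r = [1.0, 1.8, 2.2, 2.1, 1.6]   # illustrative R values
    f = natural_cubic_spline(t, r)
    new_times = [0.5, 1.5, 2.5, 3.5]
    print([round(f(tq), 4) for tq in new_times])
```

Evaluating outside the original time range extrapolates with the end polynomials, which is rarely trustworthy; the same caution applies to any fitted curve.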

Linear regression returning bad fit with large x values

I'm looking to do a linear regression to estimate the date of depletion for a particular resource. I have a dataset containing a column of dates and several columns of data, which are always decreasing. A linear regression using scikit-learn's LinearRegression() estimator yields a bad fit.
I converted the date column to ordinal values, which are around 700,000. Relative to the y values, which lie between 0 and 200, this is rather large. I imagine that the regression function starts at low values and works its way up, eventually giving up before it finds a good enough fit. If I could assign starting values to the parameters (a large intercept and a small slope), perhaps that would fix the problem. I don't know how to do this, and I am also very curious about other solutions.
Here is a link to some data:
https://pastebin.com/BKpeZGmN
And here is my current code:
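from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# dates: 2-D array of shape (n_samples, 1) holding the ordinal dates; y: the values
model = LinearRegression().fit(dates, y)
print(model.score(dates, y))  # R^2 of the fit
y_pred = model.predict(dates)
plt.scatter(dates, y)
plt.plot(dates, y_pred, color='red')
plt.show()
print(model.intercept_)
print(model.coef_)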
This code plots the linear model over the data, and the fit is strikingly inaccurate. I would share the plot in this post, but I am not sure how to post an image from my desktop.
My original data is dates, and I convert them to ordinals in code I have not shared here. If there is an easier way to do this that would be more accurate, I would appreciate a suggestion.
Thanks,
Will
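For what it's worth, scikit-learn's LinearRegression solves the least-squares problem with a direct linear-algebra routine rather than an optimizer that needs starting values, so large ordinal dates by themselves should not make it "give up"; a poor fit may instead mean the data is not linear over the whole date range. The pure-Python sketch below fits the same straight line in closed form, working with centered x values, and solves for the date at which the fitted line reaches zero. `fit_line` and `depletion_date` are hypothetical helper names, and the example data is synthetic, not the pastebin data.

```python
from datetime import date, timedelta


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Summing over centered x avoids the cancellation that the form
    # sum(x**2) - n * mean**2 would suffer with x values around 700,000.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def depletion_date(dates, values):
    """Date at which the fitted straight line reaches zero."""
    xs = [d.toordinal() for d in dates]
    slope, intercept = fit_line(xs, values)
    if slope >= 0:
        raise ValueError("fitted line is not decreasing, so it never reaches zero")
    return date.fromordinal(round(-intercept / slope))


if __name__ == "__main__":
    start = date(2020, 1, 1)
    dates = [start + timedelta(days=7 * k) for k in range(10)]  # synthetic weekly dates
    values = [200 - 14 * k for k in range(10)]                  # synthetic: drops 2 per day
    print(depletion_date(dates, values))
```

With this synthetic series the line falls by 2 per day from 200, so the estimate lands 100 days after the start date. If the real series curves (for example, slows down as it approaches zero), a straight line will misplace that date no matter how the fit is computed.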

boxplots from SPSS and Matlab show different outliers and percentiles

I encountered the problem below while trying to check my ANOVA data for outliers with MATLAB. By chance I also did the check in SPSS and noticed something weird:
I have a vector with 5 data points (for simplicity).
1.1648 0.4254 0.6796 0.5627 0.6904
I then use the MATLAB boxplot command, i.e. boxplot(test), and get a plot that looks reasonable.
I then copy the same vector into SPSS (I swear it's the same). I choose Explore from the Analyze > Descriptive Statistics menu. While I don't think it makes a difference in this case, under Plots I choose Boxplots: Dependents together. Then I hit OK and get a second boxplot.
There are two things that seem weird to me here:
The 3rd quartile appears to be larger in MATLAB (above 0.8), whereas it is below 0.8 in SPSS.
MATLAB doesn't seem to think there is an outlier. As far as I understand the MATLAB documentation for boxplot, outliers are by default defined as points more than 1.5 times the interquartile range away from the 1st or 3rd quartile. As far as I can gather, SPSS uses the same definition.
I'm probably doing something wrong somewhere but can't figure out where.
Thanks for the help!
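The difference is consistent with the two programs using different quartile definitions. The Python sketch below reproduces both on the vector from the question, under two assumptions: MATLAB's boxplot uses the percentile definition documented for prctile (sorted points placed at (i - 0.5)/n, linear interpolation between them), and SPSS's boxplot uses Tukey's hinges. Under those assumptions the upper quartile is about 0.809 with the MATLAB-style method and 0.6904 with Tukey's hinges, and only the narrower Tukey box puts 1.1648 beyond the 1.5 × IQR fence. The function names are hypothetical.

```python
from statistics import median


def matlab_style_quantile(data, p):
    """Quantile with sorted points at (i - 0.5) / n and linear interpolation
    between them (assumed to match MATLAB's prctile / boxplot)."""
    xs = sorted(data)
    n = len(xs)
    pos = p * n - 0.5  # fractional 0-based index
    if pos <= 0:
        return xs[0]
    if pos >= n - 1:
        return xs[-1]
    lo = int(pos)
    frac = pos - lo
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])


def tukey_hinges(data):
    """Lower/upper hinge: medians of the lower/upper half, each including the
    overall median when n is odd (assumed to match SPSS's boxplot)."""
    xs = sorted(data)
    half = (len(xs) + 1) // 2
    return median(xs[:half]), median(xs[len(xs) - half:])


def beyond_fences(data, q1, q3, k=1.5):
    """Points more than k * IQR below q1 or above q3."""
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


if __name__ == "__main__":
    test = [1.1648, 0.4254, 0.6796, 0.5627, 0.6904]
    m_q1, m_q3 = matlab_style_quantile(test, 0.25), matlab_style_quantile(test, 0.75)
    t_q1, t_q3 = tukey_hinges(test)
    print(f"MATLAB-style: Q1={m_q1:.4f}, Q3={m_q3:.4f}, outliers={beyond_fences(test, m_q1, m_q3)}")
    print(f"Tukey hinges: Q1={t_q1:.4f}, Q3={t_q3:.4f}, outliers={beyond_fences(test, t_q1, t_q3)}")
```

With Tukey's hinges, 1.1648 is also past 3 × IQR above the upper hinge (0.6904 + 3 × 0.1277 ≈ 1.074), so if SPSS applies its usual 3-box-length rule it would be drawn as an extreme value rather than an ordinary outlier. With only five points, such definitional differences dominate, so neither plot is "wrong".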

plotting histogram in matlab with highly unequal distribution

So I am plotting two years' worth of data to see the change in the distribution of values, but in one of the histograms, one of the years is dominated by the first bar because there are many zeros in the data set. Would you recommend creating a bar strictly for zeros? How would I create this index in MATLAB? Or how can I better adjust the histogram to reflect the actual data set and make it clear that zeros account for the sharp initial rise?
Thanks.
This is more of a statistical question. If you have good reason to ignore the zeros, for example because your data acquisition system produced them during a malfunction, you can simply get rid of them with
hist(data(data~=0))
But you may not need histograms at all: summary statistics such as the mean or median (for the shift) and the variance or standard deviation (for the spread) show how much your data changed.
Furthermore, boxplots are usually better and easier to handle for comparing data populations:
doc boxplot
If, on the other hand, the zeros are genuine, you have to keep them! Here too the boxplot function might help, because the zeros might show up as outliers (drawn as small red crosses) or the box might simply start at the zero line.
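If the zeros are genuine, one way to act on the question's own idea of a separate bar for zeros is to count them separately and histogram only the nonzero values, so the spike is labeled explicitly instead of being folded into the first bin. A minimal Python sketch follows; `split_zero_histogram` is a hypothetical helper and the two "years" are made-up data. In MATLAB the same split would be `nnz(data == 0)` for the zero bar plus a histogram of `data(data ~= 0)`.

```python
def split_zero_histogram(data, n_bins, lo, hi):
    """Count exact zeros separately and bin only the nonzero values.

    Bins are equal-width over [lo, hi]; the last bin includes hi.
    Nonzero values outside [lo, hi] are ignored in this sketch."""
    zero_count = sum(1 for x in data if x == 0)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if x != 0 and lo <= x <= hi:
            i = min(int((x - lo) / width), n_bins - 1)  # x == hi goes in the last bin
            counts[i] += 1
    return zero_count, counts


if __name__ == "__main__":
    year1 = [0, 0, 0, 0, 0, 0, 1.5, 2.0, 3.2, 4.8, 7.1]  # made-up data
    year2 = [0, 1.1, 2.5, 2.9, 3.3, 5.0, 6.2, 8.8]       # made-up data
    for label, data in (("year 1", year1), ("year 2", year2)):
        zeros, counts = split_zero_histogram(data, n_bins=3, lo=1, hi=10)
        print(f"{label}: zeros={zeros}, nonzero bins={counts}")
```

Using the same `lo`, `hi`, and `n_bins` for both years keeps the bars directly comparable.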

Curve fitting, but I want to guarantee only one inflection point

I often find myself fitting a scatter plot while knowing that the 'true fit' should have only one inflection point. Any ideas for forcing a fit to obey this?
I am using MATLAB and Microsoft Excel.
Many thanks
Option 1:
I like to use spline smoothing with the Akaike information criterion (AIC). Although it is a hyper-parametric fit with a large number of analytic candidate inflection points, the smoothed data at the sample points tends to reveal only what is actually in the data.
If your data doesn't actually have an inflection point, the fit shows that; if it does, the fit usually captures it. In statistical jargon, an important cousin of this idea is called a "non-informative prior".
Try slides 30-31 here: link.
Option 2:
If you have an older version of MATLAB, you can easily specify the exact model in cftool (not the same as sftool) and then generate an M-file that shows how to put it into your own script. Pick a model appropriate to your data.
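Following Option 2's advice to pick a model, one choice that guarantees the constraint is a model with exactly one inflection point by construction. A cubic y = a·x³ + b·x² + c·x + d with a ≠ 0 has y'' = 6a·x + 2b, which changes sign only once, at x = -b / (3a). The Python sketch below (a swapped-in technique, not either option above) fits a cubic by least squares, solving the normal equations with Gaussian elimination, and reports that point. It is only appropriate if a cubic shape suits the data; for S-shaped data a logistic curve also has exactly one inflection point but needs a nonlinear fit. `fit_cubic` and `inflection_point` are hypothetical names.

```python
def fit_cubic(xs, ys):
    """Least-squares cubic fit; returns [d, c, b, a] for 1, x, x^2, x^3.
    Normal equations can be ill-conditioned for large x; center x first if needed."""
    n = 4
    # Augmented normal equations (V^T V | V^T y) for columns 1, x, x^2, x^3.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][k] * beta[k] for k in range(r + 1, n))) / m[r][r]
    return beta


def inflection_point(beta):
    """The single inflection point x = -b / (3a) of a cubic with a != 0."""
    _, _, b, a = beta
    if a == 0:
        raise ValueError("not a cubic: no inflection point")
    return -b / (3 * a)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5, 6]
    ys = [2.1, -2.9, -13.8, -24.7, -29.9, -22.8, 2.2]  # illustrative noisy data
    beta = fit_cubic(xs, ys)
    print("coefficients [d, c, b, a]:", [round(v, 4) for v in beta])
    print("inflection at x =", round(inflection_point(beta), 4))
```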