histfit seems to be modifying graph data - matlab

I want to plot my data in MATLAB as a histogram with a normal distribution curve overlaid. I used the histfit function, but I do not understand why the two graphs do not match. For example, with histfit the bars no longer go above 300.
Could someone explain why the values seem to be changing when using this function?
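One possible explanation, assuming the first plot was made with hist(data): hist uses 10 bins by default, while histfit uses the square root of the number of data points (rounded up). More bins means fewer samples per bin, so the bars get shorter even though the data is unchanged. The sketch below uses made-up data, since the original data set is not shown.

```matlab
% Hypothetical data; the data set from the question is not available.
rng(0);
data = 50 + 10*randn(1000, 1);

nbinsHist    = 10;                         % default bin count of hist
nbinsHistfit = ceil(sqrt(numel(data)));    % default bin count of histfit

% Same data, two bin counts: only the bar heights change.
countsHist    = hist(data, nbinsHist);
countsHistfit = hist(data, nbinsHistfit);
fprintf('Tallest bar with %d bins: %d\n', nbinsHist, max(countsHist));
fprintf('Tallest bar with %d bins: %d\n', nbinsHistfit, max(countsHistfit));

% Passing the same bin count to both makes the plots comparable.
figure; hist(data, nbinsHist);
figure; histfit(data, nbinsHist);          % needs the Statistics Toolbox
```

If the bar heights then agree, the difference was only the binning.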

Related

Fitting a gaussian to data with Matlab

I want to produce a figure like the following one (found in a paper)
I think it is done using histfit
However, histfit doesn't really work with my data: the bars exceed the curve. My data is not really normally distributed, but I want all the bins to be inside the curve except for some outliers. Is there any way to fit a Gaussian and plot it like in the figure above?
Edit
This is what histfit(data) gave:
I want to fit a Gaussian to it and treat some values as outliers. I can only use a normal distribution, because the result is going to be used in a Kalman filter that assumes normally distributed data. The fact that the data is not really normally distributed will certainly affect the performance of the filter, but I first have to feed it the parameters of a normal distribution, i.e. mean and std.
I'm not sure you understand how a fit works. If your data is roughly Gaussian, the function plots the curve fitted to the values: some bars will be above it and some below, depending on how the parameters are estimated from the entire data set. You can't force the fit to look different; this is the result of the fitting process. If your data is not normally distributed, the goodness of fit will be poor. Without more information or data, this is the best answer I can give :)
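If the goal is just a mean and std for the Kalman filter with outliers set aside, one option is to estimate robustly and then refit on the inliers. This is a sketch of a different technique (median and scaled median absolute deviation), not what histfit does, and both the data and the 3-sigma cutoff are assumptions.

```matlab
% Hypothetical data: a Gaussian core plus a small cluster of outliers.
rng(1);
data = [randn(500, 1); 8 + 0.5*randn(10, 1)];

% Plain estimates: every sample, outliers included, contributes.
muAll    = mean(data);
sigmaAll = std(data);

% Robust location and scale; 1.4826 makes the MAD consistent with the
% standard deviation of a normal distribution.
muRobust    = median(data);
sigmaRobust = 1.4826 * median(abs(data - muRobust));

% Assumed rule: samples more than 3 robust sigmas away are outliers.
isInlier = abs(data - muRobust) <= 3 * sigmaRobust;
muIn     = mean(data(isInlier));
sigmaIn  = std(data(isInlier));

fprintf('All samples:  mean = %.3f, std = %.3f\n', muAll, sigmaAll);
fprintf('Inliers only: mean = %.3f, std = %.3f\n', muIn, sigmaIn);
```

muIn and sigmaIn are the values you would hand to the filter. Whether discarding those samples is justified depends on the data.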

k-means algorithm for energy data against time and date

I am using Matlab 2015a.
I have electricity consumption data that I want to cluster. Initially I am trying to cluster it against hours and dates. I have created three different variables: one for time, one for dates and a third for the data. I do not understand how I should combine these in matrix form so that the loads are distributed according to time. I have also looked at how to plot a line graph for k-means, but I can only find scatter plots, no line plots.
Further, how can I plot it as a 3-D plot?
At a later stage I also want to include a temperature variable. But when a 4th variable is involved, what will the plot be? Will it still be 3-D?
Any suggestions, links?
In MATLAB you can create N-dimensional matrices, so you can arrange your 3D data in an N*M*3 matrix (the cat() function can help you here).
There are several functions that allow you to plot in 3D. One of these is scatter3(), which is perfect for K-Means clustering. I don't really understand which lines you want to plot: K-Means is about clusters and centroids (i.e. points).
If a 4th variable is involved, you can just as well create a 4D matrix, although I reckon plotting a 4D graph isn't going to be easy. A first approach might be to colour your scatter points differently for different temperatures (or temperature ranges). In this case the 5th input argument of scatter3() will be helpful.
See the MATLAB documentation for scatter3() and cat().
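A minimal sketch of the above, assuming the Statistics Toolbox is available. Note that kmeans() itself expects a 2-D matrix with one row per observation and one column per variable, so the three variables are placed side by side as columns here. All data and variable names below are made up for illustration.

```matlab
% Hypothetical data: hour of day, day index, load and temperature.
rng(2);
n           = 300;
hourOfDay   = randi([0 23], n, 1);
dayIndex    = randi([1 30], n, 1);
loadKW      = 2 + sin(2*pi*hourOfDay/24) + 0.2*randn(n, 1);
temperature = 15 + 5*randn(n, 1);

% One row per observation, one column per variable.
X = [hourOfDay, dayIndex, loadKW];

% Standardise the columns so no variable dominates the distances.
Xz = zscore(X);

k   = 3;                                  % assumed number of clusters
idx = kmeans(Xz, k, 'Replicates', 5);

% 3-D scatter, coloured by cluster index (5th argument of scatter3).
figure;
scatter3(X(:,1), X(:,2), X(:,3), 20, idx, 'filled');
xlabel('Hour'); ylabel('Day'); zlabel('Load');

% A 4th variable can be shown as colour instead.
figure;
scatter3(X(:,1), X(:,2), X(:,3), 20, temperature, 'filled');
xlabel('Hour'); ylabel('Day'); zlabel('Load'); colorbar;
```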

Sympy/Matlab Plot y=mx, without any numerical value of m

I need to plot y = m*x where x ranges from, say, 0 to 10. But m is a symbolic constant here; I don't want to supply a specific value.
Here's what my desired graph looks like (similar to how a class teacher would draw this):
[Consider m=a]
Sympy:
Tried doing this:
sympy.plot(m*x,(x,0,10))
but this shows the following error:
ValueError: The same variable should be used in all univariate expressions being plotted.
I can't really understand the error message, but I am guessing it can't plot m as a (symbolic) constant in this case. Is that so? And in general, how can I do this?
Matlab:
Next, I wanted to know whether this is a limitation of sympy only, and thought maybe popular tools like MATLAB can do it. But after a bit of searching in the docs and on SO, I couldn't find anything. Neither plot nor fplot seems to cover this; they expect numerical values.
Others:
I am not acquainted with other plotting or CAS software, but it would be interesting to know if any of them support this out of the box.
So, to repeat the main question: how can I draw similar graphs, preferably without managing the plotting code myself?
The solution must be generic enough like plot to be applied to other equations.
[ The question was heavily edited from a sympy-specific question ]
Maple can plot this way only for some functions under specific conditions. In Python (using matplotlib, sympy or any other package) or MATLAB you need to write code to manage that: assume values for the constants, then replace the numeric ticks with literal (symbolic) tick labels.
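The "assume a value, then relabel the ticks" idea could look roughly like this in MATLAB. The placeholder slope and the tick positions are arbitrary choices for illustration.

```matlab
% Draw y = m*x with a placeholder slope, then hide the number behind
% symbolic tick labels.
x        = linspace(0, 10, 101);
mAssumed = 1;                 % placeholder, used only to draw the line
y        = mAssumed * x;

figure;
plot(x, y, 'b-');
xlabel('x'); ylabel('y');
title('y = m x');

% Ticks sit at multiples of the assumed slope and are labelled in terms of m.
multiples = 0:2:10;
labels    = arrayfun(@(k) sprintf('%dm', k), multiples, 'UniformOutput', false);
labels{1} = '0';
set(gca, 'YTick', multiples * mAssumed, 'YTickLabel', labels);
```

This only works because every y value scales linearly with m. For other equations the tick labels have to be worked out case by case.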

How to reduce size of printed eps when plotting large amounts of data

I am plotting and printing a large dataset to eps:
plot(Voltage,Torque,'b.')
print -depsc figure.eps
I will fit a curve through these million data points. However, since the Voltage and Torque vectors are enormous, my eps file is 64.5 MB.
Most of the plotted points however lie on top of other points or very close. How can I reduce the size of the .eps while still having limited effects on the way the data is shown in the graph? Can I make matlab detect and remove data points close enough to other already plotted points?
Although it is a scatter plot, I am not using scatterplot since all points should have the same size and color. Is it possible to use scatterplot to remove visually redundant data points?
Beyond stackoverflow, the File Exchange is always a good place to start the search for a solution.
In this case I found the following submissions:
Plot (Big):
This simple tool intercepts data going into a plot and reduces it to the smallest possible set that looks identical given the number of pixels available on the screen.
DSPLOT:
This version of "plot" will allow you to visualize data that has very large number of elements. Plotting large data set makes your graphics sluggish, but most times you don't need all of the information displayed in the plot.
If you end up using the plot in a LaTeX-file, you should consider using
matlab2tikz:
This is matlab2tikz, a MATLAB(R) script for converting MATLAB figures into native TikZ/Pgfplots figures.
For use in LaTeX you don't have to take the detour via PostScript, and it makes for beautiful plots.
It also provides a function called CLEANFIGURE('minimumPointsDistance', DOUBLE,...) that will help you reduce the number of data points. (Possibly you could also combine this with the above solutions.)
If your vector Voltage is already sorted and more or less regularly spaced, you can simply plot a fraction of the data:
plot(Voltage(1:step:end),Torque(1:step:end),'b.')
with step set to find the right tradeoff between accuracy and size of your eps file.
If needed, first sort your vectors with:
[Voltage,I] = sort(Voltage);
Torque = Torque(I);
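As an alternative to a fixed step, here is a rough sketch that snaps every point to a coarse grid and keeps only one point per occupied cell, so points that would print on top of each other are dropped. The data and the grid resolution nGrid are assumptions; tune nGrid until the printed figure looks the same.

```matlab
% Hypothetical data standing in for the real Voltage and Torque vectors.
rng(3);
Voltage = 10 * rand(1e6, 1);
Torque  = 2 * Voltage + 0.1 * randn(1e6, 1);

nGrid = 500;   % assumed grid resolution along each axis

% Map each coordinate to an integer cell index between 0 and nGrid-1.
vIdx = round((Voltage - min(Voltage)) / (max(Voltage) - min(Voltage)) * (nGrid - 1));
tIdx = round((Torque  - min(Torque))  / (max(Torque)  - min(Torque))  * (nGrid - 1));

% Keep one representative point per occupied cell.
[~, keep] = unique([vIdx, tIdx], 'rows');

fprintf('Kept %d of %d points\n', numel(keep), numel(Voltage));

figure;
plot(Voltage(keep), Torque(keep), 'b.');
print('-depsc', 'figure_reduced.eps');
```

Unlike the step approach, this does not need sorted or evenly spaced data, and isolated points are never thrown away.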

Is there any way of annotating multiple plots using arrows with code in MATLAB?

When doing multiple plots on the same figure in MATLAB, is there any way of annotating them such that the legend entries have arrows pointing to the plots that they're named for?
Here's an example of what I have in mind. I'd like to do this using code.
Note that the MATLAB website mentions a way to do this using the annotation function. The problem with that function is that it takes x and y values (normalized for the plot) and puts the text there. Given that I am not certain where the datapoints will lie, this is unhelpful for what I want to do.
I don't mind if the text shows up at a random place, actually. What's important is to have an arrow or some way of pointing to the plot that it is referencing.
Take a look at the following MathWorks File Exchange post:
http://www.mathworks.co.uk/matlabcentral/fileexchange/10656-data-space-to-figure-units-conversion
The code given here allows you to convert between data-space units and normalised figure units. The example given in the post seems to be doing almost exactly what you are asking.
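For reference, the conversion itself is short enough to sketch by hand. This assumes the axes Units are 'normalized' (the default), linear axis scales, and axis limits that do not change after the arrows are placed. The curves and arrow offsets are made up for illustration.

```matlab
% Two example curves on one axes.
x = linspace(0, 2*pi, 200);
figure;
plot(x, sin(x), x, cos(x));
ax = gca;
set(ax, 'XLim', [0 2*pi], 'YLim', [-1.5 1.5]);   % fix limits before converting

% Axes position in normalized figure units: [left bottom width height].
pos = get(ax, 'Position');
xl  = get(ax, 'XLim');
yl  = get(ax, 'YLim');

% Data coordinates -> normalized figure coordinates.
toFigX = @(xd) pos(1) + (xd - xl(1)) / diff(xl) * pos(3);
toFigY = @(yd) pos(2) + (yd - yl(1)) / diff(yl) * pos(4);

% Arrow heads sit on the curves; the tails (with the text) are offset.
annotation('textarrow', toFigX([pi/2 + 1, pi/2]), toFigY([1.3, sin(pi/2)]), ...
           'String', 'sin(x)');
annotation('textarrow', toFigX([pi + 1, pi]), toFigY([-1.3, cos(pi)]), ...
           'String', 'cos(x)');
```

Because the arrow head is computed from a data point on each curve, you don't need to know in advance where the data will lie.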