Datetick take into account NaN in plot - matlab

I have a series y that contains values, some of which are NaN some numeric (double).
The series has an associated vector d which contains the datenum dates.
Example:
y=[NaN(5,1); rand(10,1)]
d=now-14:now
When I run:
plot(d,y)
I get the graph I want; the NaN observations are taken out.
However, when I run:
plot(d,y); datetick
then my graph starts from the beginning and takes into account all the observations (even when y is a NaN).
How can I prevent this from happening?

From the documentation we can see that there is an easy way (shown below) to preserve the current axes limits.
plot(d,y);
datetick('keeplimits');
The 'keeplimits' argument does exactly what it suggests, maintaining the x-axis limits whilst converting the tick values to dates. You may also want to pass 'keepticks' to preserve tick mark locations.
The behaviour you describe seems contrary to the docs:
datetick selects a label format based on the minimum and maximum limits of the specified axis.
From this statement I would expect the values to remain the same, but there is obviously something about the way the limits are handled internally which means the NaN points are included. At least we are given a simple work around!

Related

How do I plot values in an array vs the number of times those values appear in Matlab?

I have a set of ages (over 10000 of them) and I want to plot a graph with the age from 20 to 100 on the x axis and then the number of times each of those ages appears in the data on the y axis. I have tried several ways to do this and I can't figure it out. I also have some other data which requires me to plot values vs how many times they occur so any advice on how to do this would be much appreciated.
I'm quite new to Matlab so it would be great if you could explain how things in your answer work rather than just typing out some code.
Thanks.
EDIT:
So I typed histogram(Age, 80) because as I understand that will plot the values in Age on a histogram split up into 80 bars (1 for each age). Instead I get this:
The bars aren't aligned and it's clearly not 1 per age nor has it plotted the number of times each age occurs on the y axis.
You have to use histogram(), and that's correct.
Let's see with an example.
I extract 100 ages between 20 and 100:
ages=randsample([20:100],100,true);
Now I call histogram() in this manner:
h=histogram(ages,[20:100]);
where h is an histogram object and this will also show the following plot:
However, this might look easy due to the fact that my ages vector is in range 20:100, so it will not contain any other values. If your vector, as instead, contains also ages not in range 20:100, you can specify the additional option 'BinLimits' as third input in histogram() like this:
h=histogram(ages,length([20:100]),'BinLimits',[20:100]);
and this option plots a histogram using the values in ages that fall between 20 and 100 inclusive.
Note: by inspecting h you can actually see and/or edit some proprieties of your histogram. An attribute (field) of such object you might be interested to is Values. This is a vector of length 80 (in our case, since we work with 80 bins) in which the i-th element is the number of items is the i-th bin. This will help you count the occurrences (just in case you need them to go on with your analysis).
Like Luis said in comments, hist is the way to go. You should specify bin edges, rather than the number of bins:
ages = randi([20 100], [1 10000]);
hist(ages, [20:100])
Is this what you were looking for?

get vector which's mean is zero from arbitrary vector

as i know to get zero mean vector from given vector,we should substract mean of given vector from each memeber of this vector.for example let us see following example
r=rand(1,6)
we get
0.8687 0.0844 0.3998 0.2599 0.8001 0.4314
let us create another vector s by following operation
s=r-mean(r(:));
after this we get
0.3947 -0.3896 -0.0743 -0.2142 0.3260 -0.0426
if we calculate mean of s by following formula
mean(s)
we get
ans =
-5.5511e-017
actually as i have checked this number is very small
-5.5511*exp(-017)
ans =
-2.2981e-007
so we should think that our vector has mean zero?so it means that that small deviation from 0 is because of round off error?for exmaple when we are creating white noise or such kind off random uncorrelated sequence of data,actually it is already supposed that even for such small data far from 0,it has zero mean and it is supposed in this case that for example for this case
-5.5511e-017 =0 ?
approximately of course
e-017 means 10 to the power of -17 (10^-17) but still the number is very small and hypothetically it is 0. And if you type
format long;
you will see the real precision used by Matlab
Actually you can refer to the eps command. Although matlab uses double that can encode numbers down to 2.2251e-308 the precission is determined size of the number.
Use it in the format eps(number) - it tell you the how large is the influence of the least significant bit.
on my machine eg. eps(0.3) returns 5.5511e-17 - exactly the number you report.

Ignoring vectors containing NaN entries in Matlab calculations

This code prices bonds according to the fitSvensson function. How do I get Matlab to ignore NaN values in the CleanPrice vector when a date is selected for which some bonds have a NaN entry for a missing price. How can I get it to ignore that bond altogether when deriving the zero curve? It seems many solutions to NaNs resort to interpolation or setting to zero, but this will lead to an erroneous curve.
Maturity=gcm3.data.CouponandMaturity(1:36,2);
[r,c]=find(gcm3.data.CleanPrice==datenum('11-May-2012'));
row=r
SettleDate=gcm3.data.CouponandMaturity(row,3);
Settle = repmat(SettleDate,[length(Maturity) 1]);
CleanPrices =transpose(gcm3.data.CleanPrice(row,2:end));
CouponRate = gcm3.data.CouponandMaturity(1:36,1);
Instruments = [Settle Maturity CleanPrices CouponRate];
PlottingPoints = gcm3.data.CouponandMaturity(1,2):gcm3.data.CouponandMaturity(36,2);
Yield = bndyield(CleanPrices,CouponRate,Settle,Maturity);
SvenssonModel = IRFunctionCurve.fitSvensson('Zero',SettleDate,Instruments)
ParYield=SvenssonModel.getParYields(Maturity);
The Data looks like this where each column is a bond, column 1 is the dates and the elements the clean price. As you can see, the first part of the data contains lots of NaNs for bonds yet to have prices. After a point all bonds have prices but unfortunately there are instances where one or two day's prices are missing.
Ideally, if a NaN is present I would like it to ignore that bond on that date if possible as the more curves generated (irrespective of number of bonds used) the better. If this is not possible then ignoring that date is an option but will result in many curves not generating.
This is a general solution to your problem. I don't have that toolbox on my work computer so I can't test whether it works with the IRFunctionCurve.fitSvensson command
[row,~]=find(gcm3.data.CleanPrice(:,1)==datenum('11-May-2012'));
col_set=find(~isnan(gcm3.data.CleanPrice(row,2:end)));
CleanPrices=transpose(gcm3.data.CleanPrice(row,col));

Core Plot skipped values Graph

i'm trying to draw a Graph with a user-friendly timeline having every day/week (to be decided by time range) as a label at x-axis. However, the datasource values are given on another basis - there might be 10 entries one day and the eleventh comes in a month.
See the photoshop image:
With the latest Core Plot drop I cannot find a way to do it, need your help.
Regards,
user792677.
The scatter plot asks the datasource for x-y pairs of data. There is no requirement that either value follow some sort of sequence or regular pattern. Other plot types work similarly, although the names and number of data fields can vary.
In your example, return 4 from the -numberOfRecordsForPlot: method. Return the following values from the -numberForPlot:field:recordIndex: method. The table assumes your y-values are calculated by a function f(x), but they can of course come from any source. Use the field parameter to determine whether the plot is asking for the x- or y- value.
Index X-Value Y-Value
0 1 f(1)
1 3 f(3)
2 9 f(9)
3 10 f(10)

Arbitrary distribution -> Uniform distribution (Probability Integral Transform?)

I have 500,000 values for a variable derived from financial markets. Specifically, this variable represents distance from the mean (in standard deviations). This variable has a arbitrary distribution. I need a formula that will allow me to select a range around any value of this variable such that an equal (or close to it) amount of data points fall within that range.
This will allow me to then analyze all of the data points within a specific range and to treat them as "similar situations to the input."
From what I understand, this means that I need to convert it from arbitrary distribution to uniform distribution. I have read (but barely understood) that what I am looking for is called "probability integral transform."
Can anyone assist me with some code (Matlab preferred, but it doesn't really matter) to help me accomplish this?
Here's something I put together quickly. It's not polished and not perfect, but it does what you want to do.
clear
randList=[randn(1e4,1);2*randn(1e4,1)+5];
[xCdf,xList]=ksdensity(randList,'npoints',5e3,'function','cdf');
xRange=getInterval(5,xList,xCdf,0.1);
and the function getInterval is
function out=getInterval(yPoint,xList,xCdf,areaFraction)
yCdf=interp1(xList,xCdf,yPoint);
yCdfRange=[-areaFraction/2, areaFraction/2]+yCdf;
out=interp1(xCdf,xList,yCdfRange);
Explanation:
The CDF of the random distribution is shown below by the line in blue. You provide a point (here 5 in the input to getInterval) about which you want a range that gives you 10% of the area (input 0.1 to getInterval). The chosen point is marked by the red cross and the
interval is marked by the lines in green. You can get the corresponding points from the original list that lie within this interval as
newList=randList(randList>=xRange(1) & randList<=xRange(2));
You'll find that on an average, the number of points in this example is ~2000, which is 10% of numel(randList)
numel(newList)
ans =
2045
NOTE:
Please note that this was done quickly and I haven't made any checks to see if the chosen point is outside the range or if yCdfRange falls outside [0 1], in which case interp1 will return a NaN. This is fairly straightforward to implement, and I'll leave that to you.
Also, ksdensity is very CPU intensive. I wouldn't recommend increasing npoints to more than 1e4. I assume you're only working with a fixed list (i.e., you have a list of 5e5 points that you've obtained somehow and now you're just running tests/analyzing it). In that case, you can run ksdensity once and save the result.
I do not speak Matlab, but you need to find quantiles in your data. This is Mathematica code which would do this:
In[88]:= data = RandomVariate[SkewNormalDistribution[0, 1, 2], 10^4];
Compute quantile points:
In[91]:= q10 = Quantile[data, Range[0, 10]/10];
Now form pairs of consecutive quantiles:
In[92]:= intervals = Partition[q10, 2, 1];
In[93]:= intervals
Out[93]= {{-1.397, -0.136989}, {-0.136989, 0.123689}, {0.123689,
0.312232}, {0.312232, 0.478551}, {0.478551, 0.652482}, {0.652482,
0.829642}, {0.829642, 1.02801}, {1.02801, 1.27609}, {1.27609,
1.6237}, {1.6237, 4.04219}}
Verify that the splitting points separate data nearly evenly:
In[94]:= Table[Count[data, x_ /; i[[1]] <= x < i[[2]]], {i, intervals}]
Out[94]= {999, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000}