In Matlab, I have data in the following form:
"z", a 17'256x1 double, containing the residuals of a regression, e.g. -0.0596
"dates", a 17'256x1 cell, containing date- and timestamps of each observation in the regression (and therefore, of the residuals), e.g. '10/3/2011 9:30:00 PM'
What I would like to do:
Plot the residuals versus the datestamp as label. The observations are not from a continuous sequence of days (i.e. there might be some gaps with no observations between days), and some days have more observations than others. I cannot have one label per observation because that would be too many labels. So I need to group them somehow, either by day or by month. That is, only show the month and day (e.g. 10/3) under all the observations from that day, or only show the month (e.g. 3) under all the observations from that month. How can I do that, using the data I have?
You should be able to plot this without 'grouping' things. If you convert your dates to timestamps:
timestamps = cellfun(@(d) datenum(d), dates);
then you can do a normal plot:
plot(timestamps, z);
and Matlab will handle the x-axis labels itself (i.e. it will spread them evenly over the time range of the dates), but they will be the raw timestamp numbers. To get formatted dates on the x-axis, use:
datetick('x');
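If one label per day is still wanted, the idea is to put a tick at the first observation of each calendar day. Below is a minimal sketch of that grouping logic in Python rather than Matlab. The function name `day_ticks` and the sample timestamps are made up for illustration, and the timestamp format is assumed to match the example in the question.

```python
from datetime import datetime

def day_ticks(stamps, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Return (index, label) pairs marking the first observation of each day.

    Assumes `stamps` is already in chronological order.
    """
    ticks = []
    last_day = None
    for i, s in enumerate(stamps):
        day = datetime.strptime(s, fmt).date()
        if day != last_day:
            # New calendar day: label it as month/day, e.g. "10/3"
            ticks.append((i, f"{day.month}/{day.day}"))
            last_day = day
    return ticks

# Hypothetical sample data in the same format as the question's "dates"
stamps = [
    "10/3/2011 9:30:00 PM",
    "10/3/2011 9:45:00 PM",
    "10/5/2011 8:00:00 AM",
    "10/5/2011 8:15:00 AM",
    "10/5/2011 8:30:00 AM",
]
print(day_ticks(stamps))
```

The returned indices could serve as tick positions when plotting the residuals against observation number, which also hides the gaps between days.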
Related
I have a scatter plot of calls / time. My x variable is the date (Day/Month) and my Y variable is a number of calls on each date. I would like to plot two regression lines using PROC SGPLOT REG, one for 2019 and one for 2020. However, when I try to do this, all I get is a regular scatter plot with no regression lines. Here is my code:
proc sgplot data=intern.bothphase1;
reg x=date y=count / group=Year;
label count="Calls Per Day" year="Year";
Title "Comparison of EMS Calls per Day 1/1 - 3/31 in 2019 vs. 2020";
run;
The scatter plot comes up without issue (2019 and 2020 values in different colors) but I want to see how the trends differed between the two time periods, so I really want to get the regression lines on there. Can anyone help?
I imagine this has to do with the fact that I concatenated my day and month with a / so it is a character variable and so SAS cannot calculate the regression. I did this so I could use year as a class variable. I still have the original date variable in my table, is there a way I could get SAS to give me the month/day from that as a numeric variable?
Thanks!
EDIT: I used a date value in SAS and changed the format to mm/dd, but this doesn't help because the regression lines sit at either end of the graph rather than overlapping (picture attached). This is because SAS dates correspond to numbers counted from 1/1/1960. What I want is for the mm/dd to correspond to numbers 1-365, so that I get two overlapping regression lines for the same time period in 2019 vs. 2020, showing how the trends changed from one year to the next. Does anyone know how I can do this?
So two steps here: first, you need to generate a "day" value that's 1-365... so let's just subtract out 01JAN from the day value.
data have;
do date = '01JAN2019'd to '31DEC2020'd;
count = 25+2*rand('uniform');
year = year(date);
if month(date) le 3 then output;
end;
format date date9.;
run;
data adjusted;
set have;
date_fixed = date - intnx('year',date,0,'b') + 1; *current date minus jan 1 plus 1 (otherwise off by 1);
format date_fixed date5.; *this does not actually affect the graph axis, oddly;
run;
proc sgplot data=adjusted;
reg x=date_fixed y=count / group=Year;
xaxis valuesformat=date5.; *this seems to be needed for some reason;
label count="Calls Per Day" year="Year";
Title "Comparison of EMS Calls per Day 1/1 - 3/31 in 2019 vs. 2020";
run;
Then we add the xaxis line because for some reason it won't obey the DATE5. format on its own (you could also use MMDDYY5., as Reeza noted in the comments), so we force it there.
Here is what I get. You can use other axis options to further limit things, so that, for example, 01APR doesn't show up.
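The "day within the year" arithmetic from the first step can be checked outside SAS. This is a small Python sketch of the same subtraction (date minus 1 January of that year, plus 1). The function name `day_of_year` is just illustrative.

```python
from datetime import date

def day_of_year(d):
    """Days since 1 January of the same year, counting 1 January as day 1."""
    return (d - date(d.year, 1, 1)).days + 1

print(day_of_year(date(2019, 1, 1)))   # first day of the year
print(day_of_year(date(2019, 3, 31)))  # end of the 2019 comparison window
print(day_of_year(date(2020, 3, 31)))  # same calendar date in a leap year
```

Note that 2020 is a leap year, so dates after 29 February land one day later than the same calendar date in 2019.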
I am working with a data set of 10,000s of variables which have been repeatedly measured since the 1980s. The first measurements for each variable are not on the same date and the variables are irregularly measured: sometimes measurements are only a month apart, and in a small number of cases they are decades apart.
I want to get the change in each variable per month.
So far I have a cell array of measurement dates and a cell array of interpolated rates of change between measurements (each cell represents a single variable in either array, and I've only posted the first 5 cells of each):
DateNumss= {[736614;736641;736669] [736636;736666] 736672 [736631;736659;736685] 736686}
LinearInterpss={[17.7777777777778;20.7142857142857;0] [0.200000000000000;0] 0 [2.57142857142857;2.80769230769231;0]}
How do I get monthly sums of the interpolated change in variable?
i.e.
If the first measurement for a variable is made on January 1st, and the linearly interpolated change between that and the next measurement is 1 per day; and the next measurement is on February 5th and the corresponding linearly interpolated change is 2; then January has a total change of 1*31 (31 days at 1) and February has a total change of 1*5+2*23 (5 days at 1, 23 days at 2).
You would need the points in the serial dates that correspond with the change of a month.
mat(:,1)=sort(repmat(1980:1989,[1,12]));
mat(:,2)=repmat(1:12,[1,size(mat,1)/12]);
mat(:,3)=1;
monthseps=datenum(mat);
This gives you a list of all 120 changes of months in the eighties.
Now you want, for each month, the change per day, and sum it. If you take the original data it is easier, since you can just interpolate each day's value using matlab. If you only have the "LinearInterpss" you need to map it on the days using interp1 with the method 'previous'.
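k = 1; % index of the variable (cell) to process; interp1 needs numeric vectors, not cell arrays
monthly_change = zeros(length(monthseps)-1, 1);
for ct = 2:length(monthseps)
    days = monthseps(ct-1):(monthseps(ct)-1); % days in the month
    % Now each day needs a change value assigned. interp1 with method 'previous' looks up the last value in LinearInterpss{k}; days outside the measured range give NaN. It needs at least two measurements, so single-measurement variables must be handled separately.
    vals = interp1(DateNumss{k}, LinearInterpss{k}, days, 'previous');
    monthly_change(ct-1) = sum(vals, 'omitnan'); % the sum over the change in each day is the total change in that month
end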
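To make the per-month bookkeeping concrete, here is a small Python sketch of the same idea: each day takes the rate of the most recent measurement, and the daily values are summed per calendar month. The function name `monthly_totals`, the `end` argument and the year 2021 are assumptions chosen for illustration. The data follows the January/February example from the question.

```python
from datetime import date, timedelta

def monthly_totals(measure_dates, rates, end):
    """Sum a piecewise-constant daily rate per calendar month.

    rates[i] applies from measure_dates[i] (inclusive) up to the next
    measurement date (exclusive); the last rate applies until `end` (exclusive).
    """
    totals = {}
    bounds = list(measure_dates) + [end]
    for start, stop, rate in zip(bounds[:-1], bounds[1:], rates):
        day = start
        while day < stop:
            key = (day.year, day.month)
            totals[key] = totals.get(key, 0) + rate
            day += timedelta(days=1)
    return totals

# Hypothetical data: 1 per day from 1 January, 2 per day from 5 February
totals = monthly_totals(
    [date(2021, 1, 1), date(2021, 2, 5)],
    [1, 2],
    end=date(2021, 3, 1),
)
print(totals)
```

With this convention the measurement day itself already uses the new rate, so February comes out as 4 days at 1 plus 24 days at 2, which differs by one day from the hand count in the question. Which day the rate switches on is a choice to make explicitly.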
I have daily time series data and I want to calculate 5-day averages of that data while also retrieving the corresponding start date for each of the 5-day averages. For example:
x = [732099 732100 732101 732102 732103 732104 732105 732106 732107 732108];
y= [1 5 3 4 6 2 3 5 6 8];
Where x and y are actually size 92x1.
Firstly, how do I compute the 5-day mean when this time series data is not divisible by 5? Ultimately, I want to compute the 'jumping mean', where the average is not computed continuously (e.g., June 1-5, June 6-10, and so on).
I've tried doing the following:
Pentad_avg = mean(reshape(y(1:90),5,[]))'; %manually adjusted to be divisible by 5
Pentad_dt = x(1:5:90); %select every 5th day for time
However, Pentad_dt gives me dates 01-Jun-2004 and 06-Jun-2004 as output. And, that brings me to my second point.
I am looking to find 5-day averages for x and y that correspond to 5-day averages of another time series. This second time series has 5-day averaged data starting from 15-Jun-2004 until 29-Aug-2004 (instead of starting at 01-Jun-2004). Ultimately, how do I align the dates and 5-day averages between these two time series?
Synchronization between two time series can be accomplished using the timeseries object. Placing your data into an object allows Matlab to process it intelligently. The most useful thing it adds for your use case is the synchronize method.
You'll want to make sure to properly set the time vector on each of the timeseries objects.
An example of what this might look like is as follows:
ts1 = timeseries(y,datestr(x));
ts2 = timeseries(OtherData,OtherTimes);
[ts1 ts2] = synchronize(ts1,ts2,'Uniform','Interval',5);
This should return both timeseries aligned to the same time vector. You could also align a timeseries to a specific time vector using the resample method.
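For the first part of the question (a "jumping" 5-day mean when the length is not divisible by 5, with a chosen start date), the underlying logic can be sketched in a few lines. The Python below is only an illustration, not the timeseries approach above. The function name `jumping_mean` and its `start` parameter are made up, and it assumes daily data without gaps.

```python
def jumping_mean(x, y, width=5, start=None):
    """Non-overlapping block means of y, with the first x of each block.

    Blocks begin at the first sample whose x is >= start (or at the first
    sample if start is None). A trailing block shorter than `width` is
    averaged over the samples it has.
    """
    first = 0
    if start is not None:
        first = next((i for i, xi in enumerate(x) if xi >= start), len(x))
    starts, means = [], []
    for i in range(first, len(x), width):
        block = y[i:i + width]
        starts.append(x[i])
        means.append(sum(block) / len(block))
    return starts, means

# Sample data from the question
x = [732099, 732100, 732101, 732102, 732103, 732104, 732105, 732106, 732107, 732108]
y = [1, 5, 3, 4, 6, 2, 3, 5, 6, 8]

print(jumping_mean(x, y))                # blocks start at the first sample
print(jumping_mean(x, y, start=732101))  # blocks start at a chosen date
```

Passing the first date of the second series as `start` makes both sets of 5-day blocks begin on the same day, so their averages can be compared block by block.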
I have time set up as serial dates. Each number corresponds to a day, in order, from 20100101 to 20130611. How do I convert the serial date to a date in the format month-year? I need this because I want to plot data and need the x axis to show the date.
Thanks!
The first step is to convert your date format into one of the standard Matlab date formats. The best format to use for plots is the "serial date format". The numbers themselves are a bit awkward, since they represent the "amount of time after 0/0/0000, in days", which is a huge number. Also, this date never actually existed, which makes it really weird when you want to work with dates that are BC.
However, the conversion is easy, since your format also counts the days, but you count after 31st of December, 2009. You can convert this using
numeric_date_vec = datenum(2009, 12, 31) + x;
You then plot your data using
plot(numeric_date_vec, y)
and you let Matlab add the date-ticks automatically by calling
datetick('mmm yyyy')
The problem is, the ticks do not update after zooming in. You can either call
datetick('mmm yyyy','keeplimits')
again, after each zooming or panning, or you download datetickzoom from the Matlab file exchange. It takes the same arguments as datetick, but it hooks into the zoom function and updates the ticks automatically.
Edit:
Sometimes the dateticks are not spaced in any sensible way. In that case you can either try zooming in and out a little until it snaps to something good, or set the ticks manually:
% Set ticks to first day of the months in 2012
tick_locations = datenum(2012,[1:12],1);
% Set ticks on x-axis
set(gca, 'XTick', tick_locations)
% Call datetick again to get the right date labels, use option "keepticks"
datetick('mmm yyyy','keeplimits', 'keepticks')
You might have to modify the tick_locations = datenum(2012,[1:12],1) a bit to get the ticks that you want. For instance, you can use
tick_locations = datenum(2012,[1:2:25],1)
to get every second month between Jan 2012 and Jan 2014.
For day number n use
datestr(datenum(2009, 12, 31) + n, 'yyyy-mm')
for example
>> datestr(datenum(2009, 12, 31)+365, 'yyyy-mm')
ans =
2010-12
>> datestr(datenum(2009, 12, 31)+366, 'yyyy-mm')
ans =
2011-01
I have two arrays arr1 and arr2 containing datetime in dd/MM/yyyy HH:mm:ss and HH:mm:ss respectively.
For example, arr1[0] contains 23/12/2011 09:15:30 while arr2[0] contains 09:15:30.
Note that arr1 will always contain today's date, which is exactly what I want for my iPad app which I'm preparing using Xcode 4.2.
I want to plot specific time (either from arr1 or from arr2 whichever is best suitable) on the X axis and corresponding float value contained in an array arr3 on Y-axis.
Now I'm stuck at plotting time on X axis as I can't pass time in dd/MM/yyyy HH:mm:ss or HH:mm:ss.
I googled about it and got the suggestion to use epoch but I'm unable to implement it for my app.
Other than using epoch, how can I pass a time value to Core Plot for the X axis?
Core Plot requires numeric plot data. You'll have to find a way to convert the dates to a number. One way is to convert each one to an epoch value as you mentioned. If the dates are not evenly spaced, that may be your only option.
If you know that the dates are spaced at regular intervals (e.g., hourly, daily, monthly, etc.) you could just figure an index that corresponds to each date. Make custom labels to display the dates and set the plot space range to cover the correct range of indices.
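Since every value shares today's date, one simple numeric encoding is seconds since midnight, which avoids time-zone questions. The sketch below shows the conversion in Python rather than Objective-C, purely to illustrate the arithmetic. The function name `seconds_since_midnight` is hypothetical, and the HH:mm:ss format is assumed from the question.

```python
from datetime import datetime

def seconds_since_midnight(hms):
    """Convert an 'HH:mm:ss' string to a number of seconds after midnight."""
    t = datetime.strptime(hms, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second

print(seconds_since_midnight("09:15:30"))
```

The resulting numbers would be the X values of the plot, with custom axis labels converting them back to a readable time.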