I am a beginner with R. I spent the better part of a day searching without finding a solution.
I need a 2D-scatterplot with dates on the x-axis. Here is what I have:
################## CSV file looks like this
#;Datum;Summe;dink;
#;1.1.2010;10;2;
#;1.2.2010;20;3;
#;10.3.2010;40;4;
#;1.4.2010;70;3;
#;15.5.2010;80;5;
#;1.6.2010;100;6;
#;24.7.2010;200;5;
#;1.8.2010;150;7;
#;1.9.2010;130;8;
csvu<-read.csv("2D-plot-Tabelle.csv", sep = ";", encoding="UTF-8") # from Numbers
# as.Date converts into class Date
csvu$Datum <- as.Date(csvu$Datum, "%d.%m.%Y") # this is the read format. capital Y = year with century
plot(csvu$Datum, csvu$Summe,
main="This is my plot",
xlab="Datum",
ylab="Summe",
type = "b", # both lines and dots
lwd = 2, # line width
# col = "darkgreen", # col = symbol surround color
col = rainbow(25),
pch=16, # pch = plotting symbol (16 = filled circle), see http://applied-r.com/r-graphics-plot-parameters/#axes
cex = 2) # cex = size of plot symbols
plot() apparently applies a default x-axis numbering which gives month values for 5 months and no day or year.
I need x-axis ticks and labels with the date in the format "%d.%m.%Y", either at monthly intervals or at the dates in the data. I would prefer the dates to be written vertically. Any suggestions?
########
Found the solution myself. Maybe interesting to others:
# add xaxt="n" as a plot parameter (disables x-axis numbering and ticks), then
axis.Date(1,at=csvu$Datum,labels=format(csvu$Datum,"%d %b %Y"),las=2) # las=2 vertical, 1 horiz.
I am comparing the microbiome generated from MiSeq and PacBio at different taxonomic levels using VennDiagram.
I already extracted taxa at each taxonomic level from each technology as columns (MiSeq, PacBio) and created excel sheets for each level to draw venndiagrams of intersections.
For example, at the phylum level, I used this code:
ven <- venn.diagram(
x = list(miseq_vector, pacbio_vector),
category.names = c("MiSeq" , "PacBio " ),
col = c("#0000FF", '#00FF00'),
fill = c(alpha("#0000FF",0.5), alpha('#00FF00',0.5)),
main = "Phylum level", main.fontface = "bold", main.fontfamily = "serif",
main.cex = 3, main.just = c(0.5, 1),
fontfamily = "serif", fontface = "bold",
lwd = c(1,1), cex = c(1,1,1),
cat.cex = c(2,2), cat.col = c("black", "black"), cat.pos = c(0, 30),
compression = "lzw",
filename = '#pollen_phylum.tiff',
output = TRUE)
ven
The figure is generated as below,
[1]: https://i.stack.imgur.com/YWmZH.png
However, number of phyla for miseq is 20 and for pacbio is 8. The count is correct for miseq and the intersection, but I have an additional value for pacbio. I tried using zeros and the word "NotAvailable" for empty cells in pacbio columns since I am comparing 2 vectors of the same length, and in both cases, these values were counted as a field (as shown in the figure 2 taxa for pacbio only, however, it is only one). I do not know how to compare 2 vectors of different lengths without using zeros or NA since each is counted as a value. I need to match both columns and draw venndiagram.
Thanks!
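To illustrate why the padding values show up in the counts, here is a minimal sketch in Python rather than R. The taxon names and the list of placeholder values are hypothetical and only stand in for the real columns. Sets do not need to have the same length, so the padding can be dropped before comparing instead of being counted as an extra taxon.

```python
# Values used as padding in the spreadsheet columns (assumed for illustration).
PLACEHOLDERS = {"", "0", "NotAvailable", None}

def clean(column):
    """Turn a padded column into a set of real taxon names."""
    return {taxon for taxon in column if taxon not in PLACEHOLDERS}

# Hypothetical columns of unequal real length, padded to the same length.
miseq = clean(["Firmicutes", "Proteobacteria", "Actinobacteria"])
pacbio = clean(["Firmicutes", "Cyanobacteria", "NotAvailable"])

shared = miseq & pacbio        # taxa found by both technologies
only_miseq = miseq - pacbio    # taxa found by MiSeq only
only_pacbio = pacbio - miseq   # taxa found by PacBio only

print(len(only_miseq), len(shared), len(only_pacbio))
```

The three counts are the numbers a two-set Venn diagram displays, and the placeholder no longer contributes to the PacBio-only region.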
I have a netcdf file that has monthly global data from 1991 to 2000 (10 years).
Using CDO, how can I modify the netcdf from monthly to daily timesteps by repeating the monthly values each day of each month?
For example,
convert from
Month 1, value = 0.25
to
Day 1, value = 0.25
Day 2, value = 0.25
Day 3, value = 0.25
....
Day 31, value = 0.25
convert from
Month 2, value = 0.87
to
Day 1, value = 0.87
Day 2, value = 0.87
Day 3, value = 0.87
....
Day 28, value = 0.87
Thanks
##############
Update
My monthly netcdf file does not store the monthly values on the first day of each month but on varying days, e.g. on the 15th, 7th, 9th, etc. There is, however, exactly one value for each month.
The question is perhaps ambiguously worded. Adrian Tompkins' answer is correct for interpolation. However, you are actually asking to set the value for each day of the month to that for the first day of the month. You could do this by adding a second CDO call as follows:
cdo -inttime,1991-01-01,00:00:00,1day in.nc temp.nc
cdo -monadd -gtc,100000000000000000 temp.nc in.nc out.nc
Just set the value after gtc to something much higher than anything in your data.
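To make the intended result concrete, here is a small sketch in plain Python, independent of CDO and of any netcdf library. The function name and the dictionary layout are assumptions made for illustration, and the two values are the ones from the question. It repeats each monthly value on every calendar day of its month, whatever day the value was originally stamped on.

```python
import calendar
from datetime import date

def monthly_to_daily(monthly):
    """Repeat each monthly value on every day of its month.

    `monthly` maps (year, month) to a single value.
    """
    daily = {}
    for (year, month), value in sorted(monthly.items()):
        # monthrange returns (weekday of first day, number of days in month).
        days_in_month = calendar.monthrange(year, month)[1]
        for day in range(1, days_in_month + 1):
            daily[date(year, month, day)] = value
    return daily

# Example values taken from the question.
monthly = {(1991, 1): 0.25, (1991, 2): 0.87}
daily = monthly_to_daily(monthly)

print(len(daily))                # number of daily time steps produced
print(daily[date(1991, 1, 31)])  # last day of January
print(daily[date(1991, 2, 1)])   # first day of February
```

Using the real month lengths gives 31 steps for January and 28 for February 1991, rather than a fixed 30 per month.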
You can use inttime which interpolates in time at the interval required, but this is not exactly what you asked for as it doesn't repeat the monthly values and your series will be smoothed by the interpolation.
If we assume your dataset starts on the 1st January at time 00:00 (you don't state in the question) then the command would be
cdo inttime,1991-01-01,00:00:00,1day in.nc out.nc
This performs a simple linear interpolation between steps.
Note: This is fine for fields like temperature and seems to be what you asked for, but readers should note that one has to be more careful with flux fields such as rainfall, where one might want to scale the values and/or change the units appropriately.
I could not find a solution with CDO but I solved the issue with R, as follows:
library(dplyr)
library(ncdf4)
library(reshape2)
## Read ncfile
ncpath="~/my/path/"
ncname="my_monthly_ncfile"
ncfname=paste(ncpath, ncname, ".nc", sep="")
ncin=nc_open(ncfname)
var=ncvar_get(ncin, "nc_var")
## melt ncfile
var=melt(var)
var=var[complete.cases(var), ] ## remove any NA
## split ncfile by gridpoint (lat and lon) into a list
var=split(var, list(var$lat, var$lon))
var=var[lapply(var,nrow)>0] ## remove any empty list element
## create new list and replicate, for each gridpoint, each monthly value n=30 times
var_rep=list()
for (i in 1:length(var)) {
var_rep[[i]]=data.frame(value=rep(var[[i]]$value, each=30))
}
I have a date vector in form
20140117
20130325
20130530
etc.
There are 5,000,000 lines in the double vector.
How can I convert it to a date vector recognized by MATLAB?
I don't like changing it to string and putting the parts in separately. It takes too long!
Please Help!
A combination of fix and mod lets you extract the digits you want:
%Matrix Columns YY,MM,DD,hh,mm,ss
[mod(fix(x/10000),10000),mod(fix(x/100),100),mod(x,100),zeros(size(x,1),3)]
%datenum
datenum(mod(fix(x/10000),10000),mod(fix(x/100),100),mod(x,100))
If you really want to avoid casting to string then back to number then you can use this method:
D = [20140117; 20130325; 20130530];
YY = fix( D./10000 ) ;
MM = fix( (D-YY.*10000) /100 ) ;
DD = fix( (D-YY.*10000-MM.*100 ) );
DateInMatlabformat = datenum( YY , MM , DD ) ;
You can package that into a one-liner if you want, but basically what it does is:
Divide by 10000 and truncate to get the year in the variable YY.
Remove that part from the original date (D-YY.*10000), then divide by 100 and truncate to get the month.
Remove both of those parts, and what remains is the day.
The last line merges all of that into MATLAB's serial date number format. Read the documentation on datenum and datestr for more information.
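As a cross-check of the digit arithmetic, here is the same split sketched in Python (not MATLAB). The function names are made up for this example. The offset of 366 between Python's proleptic ordinal and MATLAB's datenum is stated as an assumption in the code; it follows from datenum counting 1 January of year 0000 as day 1.

```python
from datetime import date

def split_yyyymmdd(d):
    """Split an integer such as 20140117 into (year, month, day)."""
    year, rest = divmod(d, 10000)   # 20140117 -> 2014, 117
    month, day = divmod(rest, 100)  # 117 -> 1, 17
    return year, month, day

def to_datenum(d):
    """Serial day number on the datenum scale (assumed offset: ordinal + 366)."""
    return date(*split_yyyymmdd(d)).toordinal() + 366

dates = [20140117, 20130325, 20130530]
print([split_yyyymmdd(d) for d in dates])
print([to_datenum(d) for d in dates])
```

No string conversion is involved, which is the point of the approach: only integer division and remainders are applied to each number.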
I simply want to generate a series of dates 1 year apart from today.
I tried this
CurveLength=30;
t=zeros(CurveLength);
t(1)=datestr(today);
x=2:CurveLength-1;
t=addtodate(t(1),x,'year');
I am getting two errors so far?
??? In an assignment A(I) = B, the number of elements in B and
Which I am guessing is related to the fact that the date is a string, but when I modified the string to be the same length as the date dd-mmm-yyyy i.e. 11 letters I still get the same error.
Lastly, I get the error
??? Error using ==> addtodate at 45
Quantity must be a numeric scalar.
This seems to suggest that the function can't be vectorised. If this is true, is there any way to tell in advance which functions can be vectorised and which cannot?
To add n years to a date x, you do this:
y = addtodate(x, n, 'year');
However, addtodate requires the following:
x must be a scalar number, not a string.
n must be a scalar number, not a vector.
Hence the errors you get.
I suggest you use a loop to do this:
CurveLength = 30;
t = zeros(CurveLength, 1);
t(1) = today; % # Whatever today equals to...
for ii = 2:CurveLength
t(ii) = addtodate(t(1), ii - 1, 'year');
end
Now that you have all your date values, you can convert them to strings with:
datestr(t);
And here's a neat one-liner using arrayfun:
datestr(arrayfun(@(n)addtodate(today, n, 'year'), 0:CurveLength-1))
If your sequence has a constant, known start, you can use datenum in the following way:
t = datenum( startYear:endYear, 1, 1)
This works fine also with months, days, hours etc. as long as the sequence doesn't run into negative numbers (like 1:-1:-10). Then months and days behave in a non-standard way.
Here is a solution without a loop (possibly faster):
CurveLength=30;
t=datevec(repmat(now(),CurveLength,1));
x=[0:CurveLength-1]';
t(:,1)=t(:,1)+x;
t=datestr(t)
datevec splits the date into six columns [year, month, day, hour, min, sec]. So if you want to change e.g. the year you can just add or subtract from it.
If you want to change the month just add to t(:,2). You can even add numbers > 12 to the month and it will increase the year and month correctly if you transfer it back to a datenum or datestr.
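The carry from months into years described above can be sketched as plain arithmetic. This is Python, not MATLAB, and it only illustrates the idea for month numbers greater than 12; the function name is hypothetical and it does not claim to reproduce datenum's behaviour for zero or negative months.

```python
def normalise_year_month(year, month):
    """Fold a month number above 12 back into 1..12, carrying into the year."""
    carry, month0 = divmod(month - 1, 12)  # work with months 0..11
    return year + carry, month0 + 1

print(normalise_year_month(2013, 14))  # month 14 of 2013
print(normalise_year_month(2013, 12))  # already in range
print(normalise_year_month(2013, 25))  # two full years ahead
```

Subtracting 1 before the division keeps December in the current year instead of rolling it over.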
I have a dataset for which I have extracted the date at which an event occurred. The date is in the format of MMDDYY although MatLab does not show leading zeros so often it's MDDYY.
Is there a method to find the mean or median (I could use either) date? median works fine when there is an odd number of days but for even numbers I believe it is averaging the two middle ones which doesn't produce sensible values. I've been trying to convert the dates to a MatLab format with regexp and put it back together but I haven't gotten it to work. Thanks
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
You can use datenum to convert dates to a serial date number (1 at 01/01/0000, 2 at 02/01/0000, 367 at 01/01/0001, etc.):
strDate='27112011';
numDate = datenum(strDate,'ddmmyyyy')
Any arithmetic operation can then be performed on these date numbers, like taking a mean or median:
mean(numDates)
median(numDates)
The only problem here, is that you don't have your dates in a string type, but as numbers. Luckily datenum also accepts numeric input, but you'll have to give the day, month and year separated in a vector:
numDate = datenum([year month day])
or as rows in a matrix if you have multiple timestamps.
So for your specified example data:
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
years = mod(dates,100);
dates = (dates-years)./100;
days = mod(dates,100);
months = (dates-days)./100;
years = years + 1900; % set the years to the 20th century
numDates = datenum([years(:) months(:) days(:)]);
fprintf('The mean date is %s\n', datestr(mean(numDates)));
fprintf('The median date is %s\n', datestr(median(numDates)));
In this example I converted the resulting mean and median back to a readable date format using datestr, which takes the serial date number as input.
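For readers who want to check the numbers outside MATLAB, here is the same idea sketched in Python: split each MMDDYY integer, convert it to a day count, take the statistic, and convert back. The function name and the default century are assumptions mirroring the "+ 1900" step above.

```python
from datetime import date
from statistics import mean, median

def parse_mmddyy(n, century=1900):
    """Turn an integer like 32381 (MMDDYY, leading zero lost) into a date."""
    n, yy = divmod(n, 100)   # 32381 -> 323, 81
    mm, dd = divmod(n, 100)  # 323 -> 3, 23
    return date(century + yy, mm, dd)

dates = [32381, 41081, 40581, 32381, 32981, 41081, 40981, 40581]
ordinals = [parse_mmddyy(n).toordinal() for n in dates]

# Averaging day counts is meaningful; averaging the raw MMDDYY integers is not.
median_date = date.fromordinal(round(median(ordinals)))
mean_date = date.fromordinal(round(mean(ordinals)))
print(median_date, mean_date)
```

With an even number of dates the median is still the average of the two middle day counts, but on this scale that average is a valid point in time and can be rounded to a calendar day.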
Try this:
dates=[32381 41081 40581 32381 32981 41081 40981 40581];
d=zeros(1,length(dates));
for i=1:length(dates)
d(i)=datenum(sprintf('%06d',dates(i)),'mmddyy'); % pad to six digits; the data are MMDDYY
end
m=mean(d);
m_str=datestr(m,'dd.mm.yy')
I hope this is useful. Regards
Store the dates as YYMMDD, rather than as MMDDYY. This has the useful side effect that the numeric order of the dates is also the chronological order.
Here is the pseudo-code for a function that you could write.
foreach date:
year = date % 100
date = (date - year) / 100
day = date % 100
date = (date - day) / 100
month = date
newdate = year * 100 * 100 + month * 100 + day
end for
Once you have the dates in YYMMDD format, then find the median (numerically), and this is also the median chronologically.
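A runnable version of the pseudo-code might look like the following. It is sketched in Python, and the function name is made up for this example.

```python
def mmddyy_to_yymmdd(n):
    """Reorder an MMDDYY integer (leading zero lost) into YYMMDD."""
    n, year = divmod(n, 100)     # 32381 -> 323, 81
    month, day = divmod(n, 100)  # 323 -> 3, 23
    return year * 10000 + month * 100 + day

dates = [32381, 41081, 40581, 32381, 32981, 41081, 40981, 40581]
converted = [mmddyy_to_yymmdd(n) for n in dates]
print(sorted(converted))  # numeric order is now chronological order
```

Two caveats apply. With two-digit years the ordering only holds within a single century. And for an even number of dates, a median that averages the two middle values can still produce a number that is not a valid date.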
You see above how to present dates as numbers.
I will add a note on your issue of finding the median of the list. The default MATLAB median function averages the two middle values when there is an even number of values.
But you can do it yourself! Try this:
dates; % is your array of dates in numeric form
sdates = sort(dates);
mediandate = sdates(round((length(sdates)+1)/2));
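The index rule above always picks an element that really occurs in the data; for an even count it picks the upper of the two middle values. Here is a small Python sketch of the same rule, using hypothetical dates in YYMMDD form and comparing against the standard library's median_high:

```python
from statistics import median_high

def middle_element(values):
    """Return the sorted element at 1-based position round((n + 1) / 2)."""
    ordered = sorted(values)
    n = len(ordered)
    # (n + 2) // 2 matches round((n + 1) / 2) when halves are rounded up.
    return ordered[(n + 2) // 2 - 1]

# Example dates in YYMMDD form (assumed input for illustration).
dates = [810323, 810410, 810405, 810323, 810329, 810410, 810409, 810405]
print(middle_element(dates), median_high(dates))
```

Because no averaging takes place, the result is always one of the input dates.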