I have a fairly simple question: I have a cell vector that looks like this:
temp_y_date{1} = '2012Q2'
temp_y_date{2} = '2012Q1'
temp_y_date{3} = '2011Q4'
I would like to transform this cell vector into a date vector using the function datenum. I initially transform the vector to the format 'QQ-YYYY' as follows:
for i = 1:length(temp_y_date)
temp = temp_y_date(i);
year = cellfun(@(c) {c(1:4)}, temp);
quarter = cellfun(@(c) {c(5:6)}, temp);
temp_y_date(i) = strcat(quarter,'-',year);
end
The values of temp_y_date are now
temp_y_date (1) = 'Q2-2012'
temp_y_date (2) = 'Q1-2012'
temp_y_date (3) = 'Q4-2011'
I thought I could now apply the datenum function:
temp_y_date = datenum(temp_y_date,'QQ-YYYY');
However, I get the error:
??? Error using ==> datenum at 178
DATENUM failed.
Caused by:
Error using ==> dtstr2dtnummx
Failed on converting date string to date number.
I tested on both R2009b and R2012a. Your code works in the latter, but I get the same error in R2009b.
When I dropped the Q in temp_y_date, the error went away in the older version. So apparently older versions don't accept a quarter definition.
Working code:
strdate = {'2012Q2', '2012Q1', '2011Q4'};
strdate2 = cellfun(@(c) strcat(c(6),'-',c(1:4)),strdate,'uni',false);
result = datenum(strdate2,'QQ-YYYY');
I changed your variable names so the code is easier to debug. I also replaced the loop containing cellfun, which you were using quite inefficiently. I suggest you learn what a cell is and how to index it.
Note: this will break if you upgrade to another version, or if someone else runs it in a later version. If you don't want that, I suggest you use verLessThan to distinguish between different versions of MATLAB (see the code after the edit below).
EDIT:
From help datenum:
Formats with 'Q' are not accepted by DATENUM.
So I guess you'll have to either upgrade or implement this one yourself, doesn't look that hard though:
strdate = {'2012Q2', '2012Q1', '2011Q4'};
if verLessThan('matlab','7.12') % or maybe 7.11, it depends on which version the datenum functionality changed...
strdate2 = cellfun(@(c) sprintf('%s/%02d/01',c(1:4),3*str2double(c(6))-2),strdate,'uni',false);
result = datenum(strdate2,'YYYY/mm/dd');
else
strdate2 = cellfun(@(c) strcat(c(5:6),'-',c(1:4)),strdate,'uni',false);
result = datenum(strdate2,'QQ-YYYY');
end
So as you can see, older versions of datenum need full date spec: 'YYYY/mm/dd'.
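For readers who want to check the quarter arithmetic outside MATLAB, here is a minimal Python sketch of the mapping used in the fallback branch above: quarter q starts in month 3*q - 2. The function name quarter_start is hypothetical, and the sketch assumes labels of the exact form 'YYYYQn'.

```python
from datetime import date

def quarter_start(label):
    """Return the first day of the quarter for a label such as '2012Q2'."""
    year = int(label[:4])
    quarter = int(label[5])
    if label[4] != "Q" or not 1 <= quarter <= 4:
        raise ValueError(f"not a quarter label: {label!r}")
    # Q1 -> January, Q2 -> April, Q3 -> July, Q4 -> October
    return date(year, 3 * quarter - 2, 1)

labels = ["2012Q2", "2012Q1", "2011Q4"]
starts = [quarter_start(s) for s in labels]
```

This mirrors the sprintf('%s/%02d/01', ...) step: the quarter is reduced to its first month, and the date is pinned to the first day of that month.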
I have a long assignment, which I will summarize:
Perform a regression over the normalized active cases in China using the model.....(a long assignment that I'm not worried about, so I'll save you the time). Tip: To convert from datetime to a numeric variable for the regression, use x=day(date-min(date(:)))+1; where “date” is the datetime vector returned by the getdata function.
This is what I have:
function RP_ejercicio1
data = readtable('COVID-19.csv');
[active_res, confirmed_res, death_res, recovered_res, date] = getdata(data, 'China', 93/147);
x=day(date-min(date(:)))+1;
y = active_res;
yp = log(y./x);
a = [x ones(size(x))];
sol = inv(a'*a)*(a'*yp);
b = sol(1);
c = sol(2);
a = exp(c);
end
I get this error: Check for missing argument or incorrect argument data type in call to function 'day'. It occurs in this line: x=day(date-min(date(:)))+1;. The line that was supposed to help as a tip is giving me a headache. I can confirm that date is a 1x50 datetime array after executing the getdata function.
Am I doing something wrong? Is the tip wrong? And if it's the latter, is there another way to do the same thing?
I have attached an image for clarity: [image: Date array]
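As somebody said here, you should be using the function days. Subtracting two datetime arrays gives a duration array, and day only accepts datetime input, hence the error. days converts a duration to a numeric number of days, so the line becomes x=days(date-min(date(:)))+1;.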
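To make the intended result of the tip concrete, here is a minimal Python sketch of the same idea: subtract the earliest date from every date and count the elapsed whole days, starting at 1. The function name day_offsets and the sample dates are illustrative assumptions, not values from the CSV file.

```python
from datetime import datetime

def day_offsets(dates):
    """Days elapsed since the earliest date, with the earliest date mapped to 1."""
    first = min(dates)
    # Subtracting two datetimes gives a timedelta; .days is its whole-day count.
    return [(d - first).days + 1 for d in dates]

# Illustrative dates only (not taken from COVID-19.csv)
dates = [datetime(2020, 1, 22), datetime(2020, 1, 23), datetime(2020, 1, 25)]
x = day_offsets(dates)
```

The difference of two dates is an elapsed-time value, not a calendar date, which is why it needs a "number of days" conversion and not a "day of month" lookup.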
I have an excel file and I need to read it based on string values in the 4th column. I have written the following but it does not work properly:
[num,txt,raw] = xlsread('Coordinates','Centerville');
zn={};
ctr=0;
for i = 3:size(raw,1)
tf = strcmp(char(raw{i,4}),char(raw{i-1,4}));
if tf == 0
ctr = ctr+1;
end
zn{ctr}=raw{i,4};
end
data=zeros(1,10); % 10 corresponds to the number of columns I want to read (herein, columns 'J' to 'S')
ctr=0;
for j = 1:length(zn)
for i=3:size(raw,1)
tf=strcmp(char(raw{i,4}),char(zn{j}));
if tf==1
ctr=ctr+1;
data(ctr,:,j)=num(i-2,10:19);
end
end
end
It gives me a "15129x10x22 double" thing and when I try to open it I get the message "Cannot display summaries of variables with more than 524288 elements". It might be obvious but what I am trying to get as the output is 'N = length(zn)' number of matrices which represent the data for different strings in the 4th column (so I probably need a struct; I just don't know how to make it work). Any ideas on how I could fix this? Thanks!
Did not test it, but this should help you get going:
EDIT: corrected wrong indexing into raw vector. Also, depending on the format you might want to restrict also the rows of the raw matrix. From your question, I assume something like selector = raw(3:end,4); and data = raw(3:end,10:19); should be correct.
[~,~,raw] = xlsread('Coordinates','Centerville');
selector = raw(:,4);
data = raw(:,10:19);
[selector,~,grpidx] = unique(selector);
nGrp = numel(selector);
out = cell(nGrp,1);
for i=1:nGrp
idx = grpidx==i;
out{i} = cell2mat(data(idx,:));
end
out is the output variable. The key here is the variable grpidx, which is an output of the unique function and lets you trace the unique values back to their positions in the original vector. Note that unique, as I used it, may change the order of the string values. If that is an issue for you, use the setOrder parameter of the unique function and set it to 'stable'.
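The grouping idea above can also be sketched in Python, where a dictionary plays the role of the grpidx lookup. This is only an illustration under stated assumptions: group_rows, the zone names, and the numbers are hypothetical. Keys stay in order of first appearance, which corresponds to the 'stable' option.

```python
def group_rows(keys, rows):
    """Group rows by key, keeping keys in order of first appearance."""
    groups = {}
    for key, row in zip(keys, rows):
        # setdefault creates the list the first time a key is seen
        groups.setdefault(key, []).append(row)
    return groups

# Hypothetical selector column and data rows
keys = ["zoneB", "zoneA", "zoneB"]
rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = group_rows(keys, rows)
```

Each entry of out holds one matrix-like block per distinct string, so no zero-padded 3-D array is needed.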
I've got a .xls file and I want to import it into MATLAB with the xlsread function. I get NaNs for numbers written with a magnitude suffix, for example 15.252 B or 1.25 M.
Any suggestions?
Update: I can use [num,txt,raw] = xlsread('...') and the raw output is exactly what I want, but how can I replace the Ms with (*10^6)?
First you could extract everything from excel in a cell array using
[~,~,raw] = xlsread('MyExcelFilename.xlsx')
Then you could write a simple function that returns a number from the string based on 'B', 'M' and so on. Here is such an example:
function mynumber = myfunc( mystring )
% get the numeric part
my_cell = regexp(mystring,'[0-9.]+','match');
mynumber = str2double(my_cell{1});
% get ending characters (use [A-Za-z]; the range [A-z] also matches [ \ ] ^ _ `)
my_cell = regexp(mystring,'[A-Za-z]+','match');
mychars = my_cell{1};
% multiply the number based on char
switch mychars
case 'B'
mynumber = mynumber*1e9;
case 'M'
mynumber = mynumber*1e6;
otherwise
end
end
Of course there are other methods to split the numeric string from the rest, use what you want. For more info see the regexp documentation. Finally use cellfun to convert cell array to numeric array:
my_array = cellfun(@myfunc,raw);
EDIT:
Matlab does not offer any built-in formatting of strings in engineering format.
Source: http://se.mathworks.com/matlabcentral/answers/892-engineering-notation-printed-into-files
In the source you will also find a function that may be helpful to you.
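As a cross-check of the suffix logic, here is a minimal Python sketch of the same conversion. It is an assumption-laden illustration: parse_suffixed and MULTIPLIERS are hypothetical names, only the 'M' and 'B' suffixes from the question are handled, and anything unrecognized becomes NaN.

```python
import re

# Only the suffixes mentioned in the question; extend as needed.
MULTIPLIERS = {"": 1.0, "M": 1e6, "B": 1e9}

def parse_suffixed(value):
    """Convert '1.25 M' or '15.252 B' to a float; return NaN if unrecognized."""
    if isinstance(value, (int, float)):
        return float(value)  # already numeric, pass through
    match = re.fullmatch(r"\s*([-+]?[0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", value)
    if match is None or match.group(2) not in MULTIPLIERS:
        return float("nan")
    return float(match.group(1)) * MULTIPLIERS[match.group(2)]

values = [parse_suffixed(v) for v in ["1.25 M", "15.252 B", 42, "n/a"]]
```

Passing numeric cells through unchanged matters here because the raw output mixes numbers and strings.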
I've had a look around and can't quite get a grasp of what is going on with this. I'm using R in Eclipse. The file I'm trying to import is 700 MB, with around 15 million rows and 6 columns. As I was having problems loading it, I have started using the ff package.
library(ff)
FDF = read.csv.ffdf(file='C:\\Users\\William\\Desktop\\R Data\\GBPUSD.1986.2014.txt', header = FALSE, colClasses=c('factor','factor','numeric','numeric','numeric','numeric'), sep=',')
names(FDF)= c('Date','Time','Open','High','Low','Close')
#names the columns in the ffdf file
dim(FDF)
# produces dimensions of the file
I then want to create a POSIXct sequence which will later be joined against the imported file. I tried:
tm1 = seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins")
tm1 = data.frame (DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
However, R kept crashing. I then tested this in RStudio and saw that there were constraints on the vector. It did, however, produce the correct
dim(tm1)
names(tm1)
So I went back into Eclipse, thinking this had something to do with memory allocation. I attempted the following:
library(ff)
tm1 = as.ffdf(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ffdf(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
names(tm1) = c('DateTime')
dim(tm1)
names(tm1)
This gives an error of
no applicable method for 'as.ffdf' applied to an object of class "c('POSIXct', 'POSIXt')"
I can't seem to work around this. I then tried ...
library(ff)
tm1 = as.ff(seq(as.POSIXct("1986/12/1 00:00"), as.POSIXct("2014/09/04 23:59"),"mins"))
tm1 = as.ff(DateTime=strftime(tm1,format='%Y.%m.%d %H:%M'))
This produces the output dates, but not in the correct format. In addition to this, when ...
dim(tm1)
names(tm1)
were executed, they both returned NULL.
Question
How can I produce a POSIXct seq in the format I require above?
Well, we got there in the end.
I believe the problem was the available RAM during the creation of the full vector. As this was the case I broke the vector into 3, converted them into ffdf format to free up RAM and then used rbind to bind them together.
The problem with formatting the vector once created, I believe, was due to accessing RAM. Every time I tried this R crashed.
Even with the workaround below, my machine (4 GB of RAM) is slowing down. I've ordered some more RAM and hope this will smooth future operations.
Below is the working code:
library(ff)
library(ffbase)
tm1 = seq(from = as.POSIXct('1986-12-01 00:00'), to = as.POSIXct('2000-12-01 23:59'), by = 'min')
tm1 = data.frame(DateTime=strftime(tm1, format='%Y.%m.%d %H:%M'))
# create data frame within memory constraints
tm1 = as.ffdf(tm1)
# converts to ffdf format
memory.size()
tm2 = seq(from = as.POSIXct('2000-12-02 00:00'), to = as.POSIXct('2010-12-01 23:59'), by = 'min')
tm2 = data.frame(DateTime=strftime(tm2, format='%Y.%m.%d %H:%M'))
# create data frame within memory constraints
tm2 = as.ffdf(tm2)
memory.size()
tm3 = seq(from = as.POSIXct('2010-12-2 00:00'), to = as.POSIXct('2014-09-04 23:59'), by = 'min')
tm3 = data.frame(DateTime=strftime(tm3, format='%Y.%m.%d %H:%M'))
memory.size()
tm3 = as.ffdf(tm3)
# converts to ffdf format
memory.size()
tm4 = rbind(tm1, tm2, tm3)
# binds ffdf objects into one
dim(tm4)
# checks the row numbers
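The chunking idea behind this workaround can be sketched in Python as a generator that never holds the full sequence in memory. This is only an illustration of the technique, not a replacement for the ff code: minute_chunks is a hypothetical name, and the sketch uses naive datetimes, so it does no time zone or daylight-saving handling.

```python
from datetime import datetime, timedelta

def minute_chunks(start, end, chunk_size):
    """Yield lists of formatted minute stamps from start to end inclusive."""
    step = timedelta(minutes=1)
    current = start
    chunk = []
    while current <= end:
        chunk.append(current.strftime("%Y.%m.%d %H:%M"))
        if len(chunk) == chunk_size:
            yield chunk  # hand over a full chunk, then start a new one
            chunk = []
        current += step
    if chunk:
        yield chunk  # final, possibly shorter chunk

# Tiny range for demonstration; a real run would write each chunk to disk.
chunks = list(minute_chunks(datetime(1986, 12, 1, 0, 0),
                            datetime(1986, 12, 1, 0, 4), 2))
```

Each chunk can be written out or appended to on-disk storage before the next one is built, which is the same reason the three pieces above are converted to ffdf before being bound together.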
I have many large dataset arrays in my workspace (loaded from a .mat file).
A minimal working example is like this
>> disp(old_ds)
Date firm1 firm2 firm3 firm4
734692 880,0 102,1 32,7 204,2
734695 880,0 102,0 30,9 196,4
734696 880,0 100,0 30,9 200,2
734697 880,0 101,4 30,9 200,2
734698 880,0 100,8 30,9 202,2
where the first row (with the strings) already are headers in the dataset, that is they are already displayed if I run old_ds.Properties.VarNames.
I'm wondering whether there is an easy and/or fast way to make the first column as ObsNames.
As a first approach, I've thought of "exporting" the data matrix (columns 2 to 5, in the example), the vector of dates and then creating a new dataset where the rows have names.
Namely:
>> mat = double(old_ds(:,2:5)); % taking the data, making it a matrix array
>> head = old_ds.Properties.VarNames % saving headers
>> head(1,1) = []; % getting rid of 'Date' from head
>> dates = dataset2cell(old_ds(:,1)); % taking dates as column cell array
>> dates(1) = []; % getting rid of 'Date' from dates
>> new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates);
Apart from the fact that the last line returns the following error, ...
Error using setobsnames (line 25)
NEWNAMES must be a nonempty string or a cell array of nonempty strings.
Error in dataset (line 377)
a = setobsnames(a,obsnamesArg);
Error in mat2dataset (line 75)
d = dataset(vars{:},args{:});
...I would have found a solution, then created a function (such to generalize the process for all 22 dataset arrays that I have) and then run the function 22 times (once for each dataset array).
To put things into perspective, each dataset has 7660 rows and a number of columns that ranges from 2 to 1320.
I have no idea about how I could (and if I could) make the dataset directly "eat" the first column as ObsNames.
Can anyone give me a hint?
EDIT: attached a sample file.
Actually, it should be quite easy (but the fact that I'm reading your question means that I had the same problem and googled it before looking up the documentation... ;)
When loading the dataset, use the following command (adjusted to your case of course):
cell_dat{1} = dataset('File', 'YourDataFile.csv', 'Delimiter', ';',...
'ReadObsNames', true);
The 'ReadObsNames' default is false. When it is true, the first column of the file or range is read as observation names; if 'ReadVarNames' is also true, the header of that first column is saved as the name of the first dimension in A.Properties.DimNames.
(see the Documentation, Section: "Name/value pairs available when using text files or Excel spreadsheets as inputs")
I can't download your sample file, but if you haven't yet solved the problem otherwise, just try the suggested solution and tell if it works. Glad if I could help.
You are almost there. The error message you got is basically saying that ObsNames have to be strings. In your case, the 'dates' variable is a cell array containing doubles, so you just need to convert them to strings.
mat = double(piHU(:,2:end)); % taking the data, making it a matrix array
head = piHU.Properties.VarNames % saving headers
head(1) = []; % getting rid of 'Date' from head
dates = dataset2cell(piHU(:,1)); % taking dates as column cell array, here dates are of type double. try typing on the command window class(dates{2}), you can see the output is double.
dates(1) = []; % getting rid of 'Date' from dates
dates_str=cellfun(@(s) num2str(s),dates,'UniformOutput',false); % convert dates to string, now try typing class(dates_str{2}), the output should be char
new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates_str); % construct new dataset.