How can I read a file with readtable starting at row 6?
I tried the following but this only reads the first two columns (I have columns A:L):
opts = detectImportOptions(fileName);
opts.VariableNamesRange = 'A6';
opts.DataRange = 'A7';
raw = readtable(fileName,opts,'ReadVariableNames',true)
When I do
opts.VariableNamesRange = 'A6:L6';
opts.DataRange = 'A7:L7';
I get the error message:
Invalid 'VariableNamesRange'. The column size must match the number of
variables.
Before setting the VariableNamesRange and DataRange fields of opts, try setting the VariableNames field to something like opts.VariableNames = cellstr(['A':'L']').
A couple of notes on this:
The number of columns in the VariableNamesRange and DataRange fields needs to match the length of the VariableNames field. Check the result of detectImportOptions to see how many columns it detected.
After doing this, check the VariableTypes field to make sure that every variable has the correct type (double or char).
I'm trying to convert a DataFrame String column to Date format in Julia, but if the column contains missing values an error is produced:
ERROR: MethodError: no method matching Int64(::Missing)
The code I've tried to run (which works for columns with no missing data) is:
df_pp[:tod] = Date.(df_pp[:tod], DateFormat("d/m/y"));
Other lines of code I have tried are:
df_pp[:tod] = Date.(passmissing(df_pp[:tod]), DateFormat("d/m/y"));
df_pp[.!ismissing.(df_pp[:tod]), :tod] = Date.(df_pp[:tod], DateFormat("d/m/y"));
The code relates to a column named tod in a data frame named df_pp. Both the DataFrames & Dates packages have been loaded prior to attempting this.
The passmissing way is
df_pp.tod = passmissing(x->Date(x, DateFormat("d/m/y"))).(df_pp.tod)
What happens here is this: passmissing takes a function and returns a new function that handles missing values (by returning missing). Inside the parentheses, x->Date(x, DateFormat("d/m/y")) defines an anonymous function that calls Date with the appropriate DateFormat.
Finally, the function returned by passmissing is applied immediately to df_pp.tod, with a . to broadcast it along the column.
It's easier to see the syntax if I split it up:
myDate(x) = Date(x, DateFormat("d/m/y"))
Date_accepting_missing = passmissing(myDate)
df_pp.tod = Date_accepting_missing.(df_pp.tod)
I have an excel file and I need to read it based on string values in the 4th column. I have written the following but it does not work properly:
[num,txt,raw] = xlsread('Coordinates','Centerville');
zn={};
ctr=0;
for i = 3:size(raw,1)
tf = strcmp(char(raw{i,4}),char(raw{i-1,4}));
if tf == 0
ctr = ctr+1;
end
zn{ctr}=raw{i,4};
end
data=zeros(1,10); % 10 corresponds to the number of columns I want to read (herein, columns 'J' to 'S')
ctr=0;
for j = 1:length(zn)
for i=3:size(raw,1)
tf=strcmp(char(raw{i,4}),char(zn{j}));
if tf==1
ctr=ctr+1;
data(ctr,:,j)=num(i-2,10:19);
end
end
end
It gives me a "15129x10x22 double" thing and when I try to open it I get the message "Cannot display summaries of variables with more than 524288 elements". It might be obvious but what I am trying to get as the output is 'N = length(zn)' number of matrices which represent the data for different strings in the 4th column (so I probably need a struct; I just don't know how to make it work). Any ideas on how I could fix this? Thanks!
Did not test it, but this should help you get going:
EDIT: corrected wrong indexing into raw vector. Also, depending on the format you might want to restrict also the rows of the raw matrix. From your question, I assume something like selector = raw(3:end,4); and data = raw(3:end,10:19); should be correct.
[~,~,raw] = xlsread('Coordinates','Centerville');
selector = raw(:,4);
data = raw(:,10:19);
[selector,~,grpidx] = unique(selector);
nGrp = numel(selector);
out = cell(nGrp,1);
for i=1:nGrp
idx = grpidx==i;
out{i} = cell2mat(data(idx,:));
end
out is the output variable. The key here is the variable grpidx, which is an output of the unique function and lets you trace the unique values back to their positions in the original vector. Note that unique as I used it sorts the string values, so their order may change. If that is an issue for you, use the setOrder parameter of the unique function and set it to 'stable'.
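As a minimal sketch of the grouping idea above: the small cell array below is made up and only stands in for what xlsread would return (the labels, the numbers and the column positions are assumptions for illustration, not the asker's data).

```matlab
% Hypothetical stand-in for the raw cell array from xlsread:
% column 1 holds the group label, columns 2:3 hold numeric data.
raw = {'east',  1,  2;
       'west',  3,  4;
       'east',  5,  6;
       'north', 7,  8;
       'west',  9, 10};

labels = raw(:, 1);
values = cell2mat(raw(:, 2:3));

% 'stable' keeps the labels in order of first appearance instead of sorting.
% grpidx(k) is the index into groups of the label found in row k.
[groups, ~, grpidx] = unique(labels, 'stable');

% One matrix per distinct label
out = cell(numel(groups), 1);
for k = 1:numel(groups)
    out{k} = values(grpidx == k, :);
end
```

Each out{k} then holds only the rows whose label equals groups{k}, so the matrices can have different numbers of rows, which a 3-D array cannot represent without zero padding.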
I have some data in a .txt file, separated by commas.
for example:
1.4,2,3,4,5
2,3,4.2,5,6
24,5,2,33.4,62
What if I want the average of a column, like the first column (1.4, 2 and 24) or the second column (2, 3 and 5)?
I think putting the column in an array and using the built-in mean function would work, but so far I am only able to extract rows, not columns.
Instead of making another thread, I thought I'd edit this one. I am working on getting the average of each column of the well-known iris data set.
I cut a small portion of the data:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
delimiterln= ',';
data = importdata('iris.txt', delimiterln);
meanCol1 = mean(data(:,1))
meanCol2 = mean(data(:,2))
meanCol3 = mean(data(:,3))
meanCol4 = mean(data(:,4))
This gives the following error:
Undefined function 'sum' for input arguments of type 'cell'.
Error in mean (line 115)
y = sum(x, dim, flag)/size(x,dim);
Error in irisData(line 6)
meanCol1 = mean(data(:,1))
It looks like there is an error in handling the data type. Any thoughts on this? I tried getting rid of the last column, which contains strings, and it seems to work without error, so I think the strings are the cause.
Use the comma-separated file reading function csvread:
M = csvread(filename);
Now you have the matrix M:
col1Mean=mean(M(:,1));
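csvread only reads numeric data, so it will not accept the iris excerpt with its text column as is. One possible way around that, sketched below, is textscan with an explicit format. The temporary file and the variable names are assumptions for illustration; in practice you would open your own iris.txt.

```matlab
% Write three sample rows to a temporary file (stand-in for iris.txt)
rows = {'5.1,3.5,1.4,0.2,Iris-setosa', ...
        '4.9,3.0,1.4,0.2,Iris-setosa', ...
        '4.7,3.2,1.3,0.2,Iris-setosa'};
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Read four numeric columns and one text column per line
fid = fopen(fname, 'r');
C = textscan(fid, '%f %f %f %f %s', 'Delimiter', ',');
fclose(fid);
delete(fname);

M = [C{1:4}];           % numeric part as an N-by-4 matrix
species = C{5};         % text column as a cell array of strings
colMeans = mean(M, 1);  % one mean per column
```

Calling mean(M, 1) returns all four column means at once, so there is no need to index each column separately.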
I have a dataset that I would like to categorise and store in a structure based on the value in one column of the dataset. For example, the data can be categorised into element 'label_100', 'label_200' or 'label_300' as I attempt below:
%The labels I would like are based on the dataset
example_data = [repmat(100,1,100),repmat(200,1,100),repmat(300,1,100)];
data_names = unique(example_data);
%create a cell array of strings for the structure fieldnames
for i = 1:length(data_names)
cell_data_names{i}=sprintf('label_%d', data_names(i));
end
%create a cell array of data (just 0's for now)
others = num2cell(zeros(size(cell_data_names)));
%try and create the structure
data = struct(cell_data_names{:},others{:})
This fails and I get the following error message:
"Error using struct
Field names must be strings."
(Also, is there a more direct method to achieve what I am trying to do above?)
According to the documentation of struct,
S = struct('field1',VALUES1,'field2',VALUES2,...) creates a
structure array with the specified fields and values.
So you need to have each value right after its field name. The way you are calling struct now is
S = struct('field1','field2',VALUES1,VALUES2,...)
instead of the correct
S = struct('field1',VALUES1,'field2',VALUES2,...).
You can solve that by concatenating cell_data_names and others vertically and then using {:} to produce a comma-separated list. This will give the cells' contents in column-major order, so each field name will be immediately followed by the corresponding value:
cell_data_names_others = [cell_data_names; others]
data = struct(cell_data_names_others{:})
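Here is a self-contained sketch of that interleaving, assuming the three labels from the question. The arrayfun call and the cell2struct line are my own additions, not part of the original code; cell2struct is one possible answer to the question about a more direct method.

```matlab
data_names = [100 200 300];

% Build the field names without an explicit loop
cell_data_names = arrayfun(@(v) sprintf('label_%d', v), data_names, ...
                           'UniformOutput', false);
others = num2cell(zeros(size(cell_data_names)));

% Stack names over values: a 2-by-N cell array. {:} walks it column by
% column, giving name1, value1, name2, value2, ...
args = [cell_data_names; others];
data = struct(args{:});

% Alternative: let cell2struct pair names and values directly
data2 = cell2struct(others, cell_data_names, 2);
```

Both calls produce a scalar struct with fields label_100, label_200 and label_300, each set to 0.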
I have many large dataset arrays in my workspace (loaded from a .mat file).
A minimal working example is like this
>> disp(old_ds)
Date firm1 firm2 firm3 firm4
734692 880,0 102,1 32,7 204,2
734695 880,0 102,0 30,9 196,4
734696 880,0 100,0 30,9 200,2
734697 880,0 101,4 30,9 200,2
734698 880,0 100,8 30,9 202,2
where the first row (with the strings) already are headers in the dataset, that is they are already displayed if I run old_ds.Properties.VarNames.
I'm wondering whether there is an easy and/or fast way to make the first column as ObsNames.
As a first approach, I've thought of "exporting" the data matrix (columns 2 to 5, in the example), the vector of dates and then creating a new dataset where the rows have names.
Namely:
>> mat = double(old_ds(:,2:5)); % taking the data, making it a matrix array
>> head = old_ds.Properties.VarNames % saving headers
>> head(1,1) = []; % getting rid of 'Date' from head
>> dates = dataset2cell(old_ds(:,1)); % taking dates as column cell array
>> dates(1) = []; % getting rid of 'Date' from dates
>> new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates);
Apart from the fact that the last line returns the following error, ...
Error using setobsnames (line 25)
NEWNAMES must be a nonempty string or a cell array of nonempty strings.
Error in dataset (line 377)
a = setobsnames(a,obsnamesArg);
Error in mat2dataset (line 75)
d = dataset(vars{:},args{:});
...I would otherwise have a solution, which I could wrap in a function (to generalize the process to all 22 dataset arrays that I have) and then run 22 times (once for each dataset array).
To put things into perspective, each dataset has 7660 rows and a number of columns that ranges from 2 to 1320.
I have no idea about how I could (and if I could) make the dataset directly "eat" the first column as ObsNames.
Can anyone give me a hint?
EDIT: attached a sample file.
Actually it should be quite easy (but the fact that I'm reading your question means that having the same problem, I first googled it before looking up the documentation... ;)
When loading the dataset, use the following command (adjusted to your case of course):
cell_dat{1} = dataset('File', 'YourDataFile.csv', 'Delimiter', ';',...
'ReadObsNames', true);
The 'ReadObsNames' default is false. When it is true, dataset reads the observation names from the first column of the file or range; if 'ReadVarNames' is also true, the header of that first column is saved as the name of the first dimension in A.Properties.DimNames.
(see the Documentation, Section: "Name/value pairs available when using text files or Excel spreadsheets as inputs")
I can't download your sample file, but if you haven't yet solved the problem otherwise, just try the suggested solution and tell if it works. Glad if I could help.
You are almost there. The error message you got is basically saying that ObsNames have to be strings. In your case the 'dates' variable is a cell array containing doubles, so you just need to convert them to strings.
mat = double(piHU(:,2:end)); % taking the data, making it a matrix array
head = piHU.Properties.VarNames % saving headers
head(1) = []; % getting rid of 'Date' from head
dates = dataset2cell(piHU(:,1)); % taking dates as column cell array, here dates are of type double. try typing on the command window class(dates{2}), you can see the output is double.
dates(1) = []; % getting rid of 'Date' from dates
dates_str=cellfun(@(s) num2str(s),dates,'UniformOutput',false); % convert dates to string, now try typing class(dates_str{2}), the output should be char
new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates_str); % construct new dataset.