How to extract columns of data from .txt files MATLAB - matlab

I have some data in a .txt file. that are separated by commas.
for example:
1.4,2,3,4,5
2,3,4.2,5,6
24,5,2,33.4,62
what if you want the average of columns, like first column (1.4,2 and 24)? or second column(2,3 and 5)?
I think putting the column in an array and using the built in mean function would work, but so far, I am only able to extract rows, not columns
instead of making another thread, I thought i'd edit this one. I am working on getting the average of each column of the well known iris data set.
I cut a small portion of the data:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
delimiterln= ',';
data = importdata('iris.txt', delimiterln);
meanCol1 = mean(data(:,1))
meanCol2 = mean(data(:,2))
meanCol3 = mean(data(:,3))
meanCol4 = mean(data(:,4))
Undefined function 'sum' for input arguments of type 'cell'.
Error in mean (line 115)
y = sum(x, dim, flag)/size(x,dim);
Error in irisData(line 6)
meanCol1 = mean(data(:,1))
it looks like there is an error with handling data type...any thoughts on this? I tried getting rid of the last column, which are strings. and it seems to work without error. So i am thinking that it's because of the strings.

Use comma separated file reading function:
M = csvread(filename);
Now you have the matrix M:
col1Mean=mean(M(:,1));

Related

Read specific portions of an excel file based on string values in MATLAB

I have an excel file and I need to read it based on string values in the 4th column. I have written the following but it does not work properly:
[num,txt,raw] = xlsread('Coordinates','Centerville');
zn={};
ctr=0;
for i = 3:size(raw,1)
tf = strcmp(char(raw{i,4}),char(raw{i-1,4}));
if tf == 0
ctr = ctr+1;
end
zn{ctr}=raw{i,4};
end
data=zeros(1,10); % 10 corresponds to the number of columns I want to read (herein, columns 'J' to 'S')
ctr=0;
for j = 1:length(zn)
for i=3:size(raw,1)
tf=strcmp(char(raw{i,4}),char(zn{j}));
if tf==1
ctr=ctr+1;
data(ctr,:,j)=num(i-2,10:19);
end
end
end
It gives me a "15129x10x22 double" thing and when I try to open it I get the message "Cannot display summaries of variables with more than 524288 elements". It might be obvious but what I am trying to get as the output is 'N = length(zn)' number of matrices which represent the data for different strings in the 4th column (so I probably need a struct; I just don't know how to make it work). Any ideas on how I could fix this? Thanks!
Did not test it, but this should help you get going:
EDIT: corrected wrong indexing into raw vector. Also, depending on the format you might want to restrict also the rows of the raw matrix. From your question, I assume something like selector = raw(3:end,4); and data = raw(3:end,10:19); should be correct.
[~,~,raw] = xlsread('Coordinates','Centerville');
selector = raw(:,4);
data = raw(:,10:19);
[selector,~,grpidx] = unique(selector);
nGrp = numel(selector);
out = cell(nGrp,1);
for i=1:nGrp
idx = grpidx==i;
out{i} = cell2mat(data(idx,:));
end
out is the output variable. The key here is the variable grpidx that is an output of the unique function and allows you to trace back the unique values to their position in the original vector. Note that unique as I used it may change the order of the string values. If that is an issue for you, use the setOrderparameter of the unique function and set it to 'stable'

Convert dataset column to obsnames

I have many large dataset arrays in my workspace (loaded from a .mat file).
A minimal working example is like this
>> disp(old_ds)
Date firm1 firm2 firm3 firm4
734692 880,0 102,1 32,7 204,2
734695 880,0 102,0 30,9 196,4
734696 880,0 100,0 30,9 200,2
734697 880,0 101,4 30,9 200,2
734698 880,0 100,8 30,9 202,2
where the first row (with the strings) already are headers in the dataset, that is they are already displayed if I run old_ds.Properties.VarNames.
I'm wondering whether there is an easy and/or fast way to make the first column as ObsNames.
As a first approach, I've thought of "exporting" the data matrix (columns 2 to 5, in the example), the vector of dates and then creating a new dataset where the rows have names.
Namely:
>> mat = double(old_ds(:,2:5)); % taking the data, making it a matrix array
>> head = old_ds.Properties.VarNames % saving headers
>> head(1,1) = []; % getting rid of 'Date' from head
>> dates = dataset2cell(old_ds(:,1)); % taking dates as column cell array
>> dates(1) = []; % getting rid of 'Date' from dates
>> new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates);
Apart from the fact that the last line returns the following error, ...
Error using setobsnames (line 25)
NEWNAMES must be a nonempty string or a cell array of nonempty strings.
Error in dataset (line 377)
a = setobsnames(a,obsnamesArg);
Error in mat2dataset (line 75)
d = dataset(vars{:},args{:});
...I would have found a solution, then created a function (such to generalize the process for all 22 dataset arrays that I have) and then run the function 22 times (once for each dataset array).
To put things into perspective, each dataset has 7660 rows and a number of columns that ranges from 2 to 1320.
I have no idea about how I could (and if I could) make the dataset directly "eat" the first column as ObsNames.
Can anyone give me a hint?
EDIT: attached a sample file.
Actually it should be quite easy (but the fact that I'm reading your question means that having the same problem, I first googled it before looking up the documentation... ;)
When loading the dataset, use the following command (adjusted to your case of course):
cell_dat{1} = dataset('File', 'YourDataFile.csv', 'Delimiter', ';',...
'ReadObsNames', true);
The 'ReadObsNames' default is false. It takes the header of the first column and saves it in the file or range as the name of the first dimension in A.Properties.DimNames.
(see the Documentation, Section: "Name/value pairs available when using text files or Excel spreadsheets as inputs")
I can't download your sample file, but if you haven't yet solved the problem otherwise, just try the suggested solution and tell if it works. Glad if I could help.
You are almost there, the error message you got is basically saying that Obsname have to be strings. In your case the 'dates' variable is cell array containing doubles. So you just need to convert them to string.
mat = double(piHU(:,2:end)); % taking the data, making it a matrix array
head = piHU.Properties.VarNames % saving headers
head(1) = []; % getting rid of 'Date' from head
dates = dataset2cell(piHU(:,1)); % taking dates as column cell array, here dates are of type double. try typing on the command window class(dates{2}), you can see the output is double.
dates(1) = []; % getting rid of 'Date' from dates
dates_str=cellfun(#(s) num2str(s),dates,'UniformOutput',false); % convert dates to string, now try typing class(dates_str{2}), the output should be char
new_ds = mat2dataset(mat,'VarNames',head,'ObsNames',dates_str); % construct new dataset.

matlab How to use a textstring as input parameter in functions

I would like to use a dataset filename "AUDUSD" in several functions. It would be easier for me, just to change the filename "AUDUSD" to a more general name like "FX" and then using the abbreviation "FX" in other_matlab functions, e.g. double(). But matlab does not know the name "FX" (that should be assigned to the dataset "AUDUSD") in the code below... Any suggestions?
CODE:
FX = 'AUDUSD';
load(FX); %OKAY !!! FX works as input to open file AUDUSD!
Svars = {'S_bid','S_offer'};
Fvars = {'F_bid','F_offer'};
vS = double(FX,Svars); % FX does NOT work as input for the file AUDUSD
There is no double() function that accepts multiple cell arrays as arguments (this is what happens when you call double(FX,Svars)).
If you call double(FX), then each character in FX is interpreted for its ASCII value and then cast to double. So you get [ 65.0 85.0 68.0 85.0 83.0 68.0 ]. This is the behavior for the double() function if you provide a vector: each individual value in the vector is cast to double.
You'd have to provide more details on what you're trying to accomplish to give any more suggestions.
I have a different example, maybe you will better understand my point. The key work I would like to process is as follows:
I have got a folder with "dataset" files. I would like to loop through this folder, entering in any datasetfile, extracting the 2nd and 3rd column of each dataset file, and constructing only ONE new datasetfile with all 2nd and 3rd columns of the datasetfiles.
One problem is that the size of the datasetfiles are not the same, so I tried to translate a datasetfile into a double-matrix and then consolidate all double matrices into ONE double matrx.
Here my code:
folder_string = 'Diss_Data/Raw';
FolderContent = dir(folder_string);
No_ds = numel(FolderContent);
for i = 1:No_ds
if isdir(FolderContent(i).name)==0
file_string = FolderContent(i).name;
file_path = strcat(folder_string,'/',file_string)
dataset_filename = file_string(1:6);
load(file_path); %loads the suggested datasetfile; OKAY
M = double(dataset_filename);% returns an ASCII code number; WRONG; should transfer the datasetfile into a matrix M
vS = M(:,2:3);
%... to be continued
end
end

How to select rows from character array that match char string and save them in a new array?

I have a char array A which basically contains a list of files names (each row one file)
(char, 526x26)
val =
0815_5275_UBA_A_1971.txt
0815_5275_UBA_A_1972.txt
0823_6275_UBA_A_1971.txt
0823_6275_UBA_A_1972.txt
0823_6275_UBA_A_1973.txt
...
I also have a variable
B = '0815_5275'
I'd like to select all rows (filenames) that start with B and save them in a new array C.
This should be simple, but somehow I can't make it work.
I've got this:
C = A(A(:,1:9) == B);
but I get the error message:
Error using ==
Matrix dimensions must agree.
I do not know in advance how many rows will match, so I can not pre-define an empty array.
thanks, any help is appreciated!
Try ismember(A(:, 1:numel(B)), B, 'rows') rather to get a logical vector that indexes only the rows you want
and now
A(C,:) to extract the rows
The reason you're getting a dimension mismatch error is because your A(:,1:9) has many rows but B only has one and Matlab does not automatically broadcast like Octave or Python. You could do it using either repmat or bsxfun but in this case ismember is the correct function to choose.

How to use "csvread" when the contents in the file have different formats?

I have a .csv file and the format is shown below:
mapping.csv
5188.40811,TMobileML
5131.40903,TMobileGregsapt
5119.40791,TMobileJonsapartment
5123.40762,TMobileRedhat
i want to store it in an 4 by 2 array, when i have a value such as 5131.40903(this is a 'string' not 'int'), i want to find the mapping relation which is TMobileGregsapt. But i meet two problem, the first is i can't use csvread('mapping.csv'), it will have some error:
(I think the problem might be 5131.40903 will be int when i use csvread, but TMobileGregsapt is a string...)
??? Error using ==> dlmread at 145
Mismatch between file and format string.
Trouble reading number from file (row 1, field 2) ==> TMobi
Error in ==> csvread at 52
m=dlmread(filename, ',', r, c);
even though i use dlmread('cell4.csv', ','), it still have some error:
??? Error using ==> dlmread at 145
Mismatch between file and format string.
Trouble reading number from file (row 1, field 2) ==> TMobi
The second problem is how can i finding the mapping relation in easy way, the naive method is using a forloop to find the position of array.
Thanks for your help:)
Both csvread and dlmread only work for numeric data. Something like this should work for you
out=textread('tmp.csv', '%s', 'whitespace',',');
nums = out(1:2:end);
strs = out(2:2:end);
% find index of 'TMobileGregsapt'
ind = find(strcmp('TMobileGregsapt',strs));
nums(ind)
Another answer that will work if you have mixed text/numeric csv but you either don't know what the format is, or it's heterogeneous, use the 'csv2cell' function from: http://www.mathworks.com/matlabcentral/fileexchange/20836-csv2cell
c = csv2cell( 'yourFile.csv', 'fromfile');
c(1, :) will give your headers and c(2:end, k) will give you each of the columns (sans the header), for k = 1:size(c, 2)