I have many, many sets of data in .csv format that I've organized by a file name standard so I can use regular expressions for the second time ever. I have, however, run into a slight problem. My data files are titled things like, "2012001_C335_2000MHZ_P_1111.CSV". There are four years of interest, two frequencies, and four different C335-style labels to describe locations. I have a significant amount of data processing to do on each of these files, so I'd like to read them all into one giant structure and then do my processing on different parts of it. I'm writing:
for ix_id = 1:length(ids)
for ix_years = 1:2:length(ids_years{ix_id})
for ix_frq = 1:length(frqs)
st = [ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{ix_id}{ix_frq}'_P_1111.CSV'];
data.(ids_frqs{ix_id}{ix_frq}).(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}]) =...
dlmread(st);
end
end
end
All ids variables are 1x4 cell arrays where each cell contains strings.
This produces the errors:
"Error: a cs-list cannot be further indexed"
and
"Error: invalid assignment to cs-list outside multiple assignment"
I did an internet search for these errors and found a few posts with dates ranging from 2010 to 2012, such as this one and this one, where the author suggests it's a problem with Octave itself. I can do a workaround that involves defining two separate structures by removing the innermost for loop over ix_frq and replacing the lines beginning with "st" and "data" with
data.1500.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}]) = ...
dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids{ix_id} '_1500MHZ_P_1111.CSV']);
data.2000.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}]) = ...
dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids{ix_id} '_2000MHZ_P_1111.CSV']);
so it seems that the trouble arises when I try to make a more nested structure. I'm wondering if this is unique to Octave or the same in Matlab, and also if there's a slicker workaround than defining two separate structures since I'd like this to be as portable as possible. If you have any insight as to the meaning of the error message, I'm interested in that too. Thanks!
EDIT: Here is the full script - now generates a few dummy .csv files. Runs on Octave v. 3.8
clear all
%this program tests the creation of various structures. The end goal is to have a structure of the format frequency.beamname.year(1) = matrix of the appropriate file
A = rand(3,2);
csvwrite('2009103_C115_1500MHZ.CSV',A)
csvwrite('2009103_C115_2000MHZ.CSV',A)
csvwrite('2010087_C115_1500MHZ.CSV',A)
csvwrite('2010087_C115_2000MHZ.CSV',A)
csvwrite('2009103_C335_1500MHZ.CSV',A)
csvwrite('2009103_C335_2000MHZ.CSV',A)
csvwrite('2010087_C335_1500MHZ.CSV',A)
csvwrite('2010087_C335_2000MHZ.CSV',A)
data = dir('*.CSV'); %imports all of the files of a directory
files = {data.name}; %cell array of filenames
nfiles = numel(files);
%find all the years
years = unique(cellfun(#(x)x{1},regexp(files,'\d{7}','match'),'UniformOutput',false));
%find all the beam names
ids = unique(cellfun(#(x)x{1},regexp(files,'([C-I]\d{3})|([C-I]\d{1}[C-I]\d{2})','match'),'UniformOutput',false));
%find all the frequencies
frqs = unique(cellfun(#(x)x{1},regexp(files,'\d{4}MHZ','match'),'UniformOutput',false));
%now, vectorize to cover all the beams
for id_ix = 1:length(ids)
expression_yrs = ['(\d{7})(?=_' ids{id_ix} ')'];
listl_yrs = regexp(files,expression_yrs,'match');
ids_years{id_ix} = unique(cellfun(#(x)x{1},listl_yrs(cellfun(#(x)~isempty(x),listl_yrs)),'UniformOutput',false)); %returns the years for data collected with both the 1500 and 2000 MHZ antennas along each of thebeams
expression_frqs = ['(?<=' ids{id_ix} '_)(\d{4}MHZ)'];
listfrq = regexp(files,expression_frqs,'match'); %finds every frequency that was collected for C115, C335
ids_frqs{id_ix} = unique(cellfun(#(x)x{1},listfrq(cellfun(#(x)~isempty(x),listfrq)),'UniformOutput',false));
end
%% finally, dynamically generate a structure data.Beam.Year.Frequency
%this works
for ix_id = 1:length(ids)
for ix_year = 1:length(ids_years{ix_id})
data1500.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{1}{1} '.CSV']);
data2000.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{1}{2} '.CSV']);
end
end
%this doesn't work
for ix_id=1:length(ids)
for ix_year=1:length(ids_years{ix_id})
for ix_frq = 1:numel(frqs)
data.(['F' ids_frqs{ix_id}{ix_frq}]).(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{ix_id}{ix_frq} '.CSV']);
end
end
end
Hopefully, that helps clarify the question - I am not sure of the etiquette here with posting edits and code.
The problem is that when you get to the for loop that is causing a problem, data already exists and is a struct array.
octave> data
data =
8x1 struct array containing the fields:
name
date
bytes
isdir
datenum
statinfo
When you select a field from a struct array you will get a cs-list (comma-separate list) unless you also index which of the structs in the struct array. See:
octave> data.name
ans = 2009103_C115_1500MHZ.CSV
ans = 2009103_C115_2000MHZ.CSV
ans = 2009103_C335_1500MHZ.CSV
ans = 2009103_C335_2000MHZ.CSV
ans = 2010087_C115_1500MHZ.CSV
ans = 2010087_C115_2000MHZ.CSV
ans = 2010087_C335_1500MHZ.CSV
ans = 2010087_C335_2000MHZ.CSV
octave> data(1).name
ans = 2009103_C115_1500MHZ.CSV
So when you do:
data.(...) = dlmread (...);
you don't get what you were expecting on the left hand side, you will get a cs-list. But I'm guessing this is accidental, since data at the moment only has filenames, so simply create a new empty struct:
data = struct (); # this will clear your previous data
for ix_id=1:length(ids)
for ix_year=1:length(ids_years{ix_id})
for ix_frq = 1:numel(frqs)
data.(['F' ids_frqs{ix_id}{ix_frq}]).(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}])=dlmread([ids_years{ix_id}{ix_year} '_' ids{ix_id} '_' ids_frqs{ix_id}{ix_frq} '.CSV']);
end
end
end
I would also recommend to think better about your current solution. This code looks overcomplicated to me.
Related
I have an excel file and I need to read it based on string values in the 4th column. I have written the following but it does not work properly:
[num,txt,raw] = xlsread('Coordinates','Centerville');
zn={};
ctr=0;
for i = 3:size(raw,1)
tf = strcmp(char(raw{i,4}),char(raw{i-1,4}));
if tf == 0
ctr = ctr+1;
end
zn{ctr}=raw{i,4};
end
data=zeros(1,10); % 10 corresponds to the number of columns I want to read (herein, columns 'J' to 'S')
ctr=0;
for j = 1:length(zn)
for i=3:size(raw,1)
tf=strcmp(char(raw{i,4}),char(zn{j}));
if tf==1
ctr=ctr+1;
data(ctr,:,j)=num(i-2,10:19);
end
end
end
It gives me a "15129x10x22 double" thing and when I try to open it I get the message "Cannot display summaries of variables with more than 524288 elements". It might be obvious but what I am trying to get as the output is 'N = length(zn)' number of matrices which represent the data for different strings in the 4th column (so I probably need a struct; I just don't know how to make it work). Any ideas on how I could fix this? Thanks!
Did not test it, but this should help you get going:
EDIT: corrected wrong indexing into raw vector. Also, depending on the format you might want to restrict also the rows of the raw matrix. From your question, I assume something like selector = raw(3:end,4); and data = raw(3:end,10:19); should be correct.
[~,~,raw] = xlsread('Coordinates','Centerville');
selector = raw(:,4);
data = raw(:,10:19);
[selector,~,grpidx] = unique(selector);
nGrp = numel(selector);
out = cell(nGrp,1);
for i=1:nGrp
idx = grpidx==i;
out{i} = cell2mat(data(idx,:));
end
out is the output variable. The key here is the variable grpidx that is an output of the unique function and allows you to trace back the unique values to their position in the original vector. Note that unique as I used it may change the order of the string values. If that is an issue for you, use the setOrderparameter of the unique function and set it to 'stable'
I have a long list of variables in my workspace.
First, I'm finding the potential variables I could be interested in using the who function. Next, I'd like to loop through this list to find the size of each variable, however who outputs only the name of the variables as a string.
How could I use this list to refer to the values of the variables, rather than just the name?
Thank you,
list = who('*time*')
list =
'time'
'time_1'
'time_2'
for i = 1:size(list,1);
len(i,1) = length(list(i))
end
len =
1
1
1
If you want details about the variables, you can use whos instead which will return a struct that contains (among other things) the dimensions (size) and storage size (bytes).
As far as getting the value, you could use eval but this is not recommended and you should instead consider using cell arrays or structs with dynamic field names rather than dynamic variable names.
S = whos('*time*');
for k = 1:numel(S)
disp(S(k).name)
disp(S(k).bytes)
disp(S(k).size)
% The number of elements
len(k) = prod(S(k).size);
% You CAN get the value this way (not recommended)
value = eval(S(k).name);
end
#Suever nicely explained the straightforward way to get this information. As I noted in a comment, I suggest that you take a step back, and don't generate those dynamically named variables to begin with.
You can access structs dynamically, without having to resort to the slow and unsafe eval:
timestruc.field = time;
timestruc.('field1') = time_1;
fname = 'field2';
timestruc.(fname) = time_2;
The above three assignments are all valid for a struct, and so you can address the fields of a single data struct by generating the field strings dynamically. The only constraint is that field names have to be valid variable names, so the first character of the field has to be a letter.
But here's a quick way out of the trap you got yourself into: save your workspace (well, the relevant part) in a .mat file, and read it back in. You can do this in a way that will give you a struct with fields that are exactly your variable names:
time = 1;
time_1 = 2;
time_2 = rand(4);
save('tmp.mat','time*'); % or just save('tmp.mat')
S = load('tmp.mat');
afterwards S will be a struct, each field will correspond to a variable you saved into 'tmp.mat':
>> S
S =
time: 1
time_1: 2
time_2: [4x4 double]
An example writing variables from workspace to csv files:
clear;
% Writing variables of myfile.mat to csv files
load('myfile.mat');
allvars = who;
for i=1:length(allvars)
varname = strjoin(allvars(i));
evalstr = strcat('csvwrite(', char(39), varname, '.csv', char(39), ', ', varname, ')');
eval(evalstr);
end
I have a large amount of .csv files from my experiments (200+) and previously I have been reading them in seperately and also for later steps in my data handling this is tedious work.
co_15 = csvread('CO_15K.csv',5,0);
co_25 = csvread('CO_25K.csv',5,0);
co2_15 = csvread('CO2_15K.csv',5,0);
co2_80 = csvread('CO2_80K.csv',5,0);
h2o_15 = csvread('H2O_15K.csv',1,0);
etc.....
So I want to make a cell at the beginning of my code looking like this and then a for loop that just reads them automatically.
input = {'co_15' 5;'co_25' 5;...
'co2_15' 5; 'co2_80' 5;...
'h2o_15' 1; 'h2o_140' 1;...
'methanol_15' 5;'methanol_120' 5;'methanol_140' 5;...
'ethanol_15' 5;'ethanol_80' 1;'ethanol_140' 5;...
'co2_ethanol_15' 5 ;'co2_ethanol_80' 5;...
'h2o_ethanol_15' 1 ;'h2o_ethanol_140' 1;...
'methanol_ethanol_15' 5;'methanol_ethanol_120' 5;'methanol_ethanol_140' 5};
for n = 1:size(input,1)
input{n,1} = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
The cell in this code is 19 rows and 2 columns, the rows are all the files and the columns will contain the parameters to handle the data. Now the problem I can't find a solution for is that my first column is a string name and I want that string name to be the name of the variable where csvread writes its data to but the way I set it up now it just overwrites the string in the first column of the cell with the csv data. To be extra clear I want my matlab workspace to have variables with string names in the first column containing the data of my csv files. How do I solve this?
You don't actually want to do this. Even the Mathworks will tell you not to do this. If you are trying to use variable names to keep track of related data like this, there is always a better data structure to hold your data.
One way would be to have a cell array
data = cell(size(input(:,1)));
for n = 1:size(input,1)
data{n} = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
Another good option is to use a struct. You could have a single struct with dynamic field names that correspond to your data.
data = struct();
for n = 1:size(input,1)
data.(input{n,1}) = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
Or actually create an array of structs and hold both the name and the data within the struct.
for n = 1:size(input, 1)
data(n).name = input{n,1};
data(n).data = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
end
If you absolutly insist on doing this (again, it's is very much not recommended), then you could do it using eval:
for n = 1:size(input, 1)
data = csvread(strcat(input{n,1},'k.csv'),input{n,2},0);
eval([input{n, 1}, '= data;']);
end
This is a problem I am working on in Matlab.
I am looping through a series of *.ALL files and stripping the field name by a delimiter '.'. The first field is the network, second station name, third location code and fourth component. I pre-allocate my structure based on the number of files (3) I run through which for this example is a 3x3x3 structure that I would like to define as struct(station,loc code,component). You can see these definitions in my code example below.
I would like to loop through the station, loc code, and component and fill their values in the structure. The only problem is for some reason the way I've defined the loop it's actually looping through the files more than once. I only want to loop through each file once and extract the station, comp, and loc code from it and put it inside the structure. Because it's looping through the files more than once it's taken like 10 minutes to fill the structure. This is not very efficient at all. Can someone point out the culprit line for me or tell me what I'm doing incorrectly?
Here's my code below:
close all;
clear;
[files]=dir('*.ALL');
for i = 1:length(files)
fields=textscan(files(i).name, '%s', 'Delimiter', '.');
net{i,1}=fields{1}{:};
sta{i,1}=fields{1}{2};
loc{i,1}=fields{1}{3};
comp{i,1}=fields{1}{4};
data = [];
halfhour(1:2) = struct('data',data);
hour(1:24) = struct('halfhour',halfhour);
day(1:366) = struct('hour',hour);
PSD_YE_DBT(1:length(files),1:length(files),1:length(files)) =
struct('sta','','loc','','comp','','allData',[],'day',day);
end
for s=1:1:length(sta)
for l=1:1:length(loc)
for c=1:1:length(comp)
tempFileName = strcat(net{s},'.',sta{s},'.',loc{l},'.',comp{c},'.','ALL');
fid = fopen(tempFileName);
PSD_YE_DBT(s,l,c).sta = sta{s};
PSD_YE_DBT(s,l,c).loc = loc{l};
PSD_YE_DBT(s,l,c).comp = comp{c};
end
end
end
Example file names for the three files I'm working with are:
XO.GRUT.--.HHE.ALL
XO.GRUT.--.HHN.ALL
XO.GRUT.--.HHZ.ALL
Thanks in advance!
I am trying to take the averages of a pretty large set of data, so i have created a function to do exactly that.
The data is stored in some struct1.struct2.data(:,column)
there are 4 struct1 and each of these have between 20 and 30 sub-struct2
the data that I want to average is always stored in column 7 and I want to output the average of each struct2.data(:,column) into a 2xN array/double (column 1 of this output is a reference to each sub-struct2 column 2 is the average)
The omly problem is, I can't find a way (lots and lots of reading) to point at each structure properly. I am using a string to refer to the structures, but I get error Attempt to reference field of non-structure array. So clearly it doesn't like this. Here is what I used. (excuse the inelegence)
function [avrg] = Takemean(prefix,numslits)
% place holder arrays
avs = [];
slits = [];
% iterate over the sub-struct (struct2)
for currslit=1:numslits
dataname = sprintf('%s_slit_%02d',prefix,currslit);
% slap the average and slit ID on the end
avs(end+1) = mean(prefix.dataname.data(:,7));
slits(end+1) = currslit;
end
% transpose the arrays
avs = avs';
slits = slits';
avrg = cat(2,slits,avs); % slap them together
It falls over at this line avs(end+1) = mean(prefix.dataname.data,7); because as you can see, prefix and dataname are strings. So, after hunting around I tried making these strings variables with genvarname() still no luck!
I have spent hours on what should have been 5min of coding. :'(
Edit: Oh prefix is a string e.g. 'Hs' and the structure of the structures (lol) is e.g. Hs.Hs_slit_XX.data() where XX is e.g. 01,02,...27
Edit: If I just run mean(Hs.Hs_slit_01.data(:,7)) it works fine... but then I cant iterate over all of the _slit_XX
If you simply want to iterate over the fields with the name pattern <something>_slit_<something>, you need neither the prefix string nor numslits for this. Pass the actual structure to your function, extract the desired fields and then itereate them:
function avrg = Takemean(s)
%// Extract only the "_slit_" fields
names = fieldnames(s);
names = names(~cellfun('isempty', strfind(names, '_slit_')));
%// Iterate over fields and calculate means
avrg = zeros(numel(names), 2);
for k = 1:numel(names)
avrg(k, :) = [k, mean(s.(names{k}).data(:, 7))];
end
This method uses dynamic field referencing to access fields in structs using strings.
First of all, think twice before you use string construction to access variables.
If you really really need it, here is how it can be used:
a.b=123;
s1 = 'a';
s2 = 'b';
eval([s1 '.' s2])
In your case probably something like:
Hs.Hs_slit_01.data= rand(3,7);
avs = [];
dataname = 'Hs_slit_01';
prefix = 'Hs';
eval(['avs(end+1) = mean(' prefix '.' dataname '.data(:,7))'])