Name convention error with write to file in Matlab - matlab

I'm running into a problem with the following code and writing an excel file name. This code is driven by user defined inputs for location and chemical compound desired. The desired output is a file with the chemical compound and location appended for the name. The problem is that any compound with a . in it errors out. For example, if I want PM2.5 for site Kenny, the file name should be PM2.5 Kenny. The code however is recognizing ".5" as a file extension when this is to be part of the name. Any help how to get around this would be appreciated.
The error this gives is:
Unrecognized file extension '.5 Kenny'. Use the 'FileType' parameter to specify the file type.
j = 1
i = 1
while j <= width(c_locations_of_interest)
while i <= width(c_data_types_of_interest)
Value = c_data_types_of_interest{1,i}
Location = c_locations_of_interest{1,j}
output_excel_file = append(Value,' ',Location)
STATEMENTS
writetable(T, output_excel_file)
i = i + 1
end
i = 1
j = j + 1
end

I was able to reproduce your problem with a crude example. Indeed Matlab complains about file extensions. This is because I believe that you are missing the file extension, in your case '.xlsx'
Using sprintf instead of Append:
value = "PM2.5";
location = " Kenny"
extension = "xlsx";
output_excel_file= sprintf("%s%s.%s", value, location, extension);
writetable(T, output_excel_file);
Hope it is clear
EDIT: Corrected my own variables naming ¬¬

Related

Extract information from path name

I want to make a script in MATLAB that saves my output data with a certain name. All information for this name is in the path from the input data, like it is shown here:
path = 'C:\projektions100\algorithm1\method_A\data1';
projection =
algorithm =
method =
data =
The script then should extract the text in the path with the keyword (f.e. method) from the adjacent backslashes so the script is more flexible in case I made a spelling mistake with some folder names.
This is what I found to extract a text between a start and a end point but I cannot simply use the backslashes since there are a few of them in the path.
How should I proceed?
You can simply use a regexp with named tokens:
>> path = 'C:\projektions100\algorithm1\method_A\data1';
>> all=regexp(path,'[^\\]+\\proje[ck]tion(?<projection>[^\\]+)\\algorithm(?<algorithm>[^\\]+)\\method(?<method>[^\\]+)\\data(?<data>.+$)','names')
all =
struct with fields:
projection: 's100'
algorithm: '1'
method: '_A'
data: '1'
The problem is on how to find the end of your keywords. Here is a bit code, which loops through the keywords and looks for them in the path (stored in p2fldr, because the variable path returns the working path in MATLAB and you overshadow it if you define it).
p2fldr = 'C:\projektions100\algorithm1\method_A\data1';
% keywords
kyWrd = {'projection','algorithm','method','data'};
Tag = cell(size(kyWrd));
for i = 1:length(kyWrd)
% get keyword
ky = kyWrd{i};
% look for it in the path
idx = strfind(p2fldr,ky);
if ~isempty(idx)
% remaining path
idx_offset = idx+strlength(ky);
prm = p2fldr(idx_offset:end);
% look for file separator '\'
idx_tmp = strfind(prm,filesep);
% if you don't find one, it is pabably the last entry, so take the
% length
if isempty(idx_tmp)
idx_tmp = length(prm)+1;
end
% this is the index where it ends
idx2 = idx_tmp(1)-1;
% assign to tag-cell
Tag{i} = prm(1:idx2);
end
end
You can build a shortcut if you know that they are always in the last 4 entries of your path, so you can use strsplit right away and index the last returned cells
str_splt = strsplit(p2fldr,filesep);
Tag = cell(size(kyWrd));
for i = 1:length(kyWrd)
% index cells
str = str_splt{end-length(kyWrd)+i};
% get keyword
ky = kyWrd{i};
Tag{i} = str(length(ky)+1:end);
end
Note that this does not care if it matches your keywords (e.g. your path says 'projektions' but I defined the keyword to be 'projection')

Reading in xlsm file into MATLAB | Error file name must be string

I am trying to read in a whole set of environmental input/output data from the WIOD (World Input output database) through a nested loop over the countries and years. Now I have done something similar for the base/data accounts before. Now I try to load in the environmental data. My previously working code looked as follows:
for yr = 95:99
V(:,:,yr-94) = xlsread(['wiot' num2str(yr) '_row_apr12.xlsx'],['WIOT_19'
num2str(yr)],'E1443:BCI1448');
end
Now my code which is not working with the error message "File name must be a character vector." looks as follows:
country =
{'AUS','AUT','BEL','BGR','BRA','CAN','CHN','CYP','CZE','DEU','DNK','ESP',...
'EST','FIN','FRA','GBR','GRC','HUN','IDN','IND','IRL','ITA','JPN','KOR',...
'LTU','LUX','LVA','MEX','MLT','NLD','POL','PRT','ROU','RUS','SVK','SVN',...
'SWE','TUR','TWN','USA','ROW'}
for c = 1:41
for year = 1995:1995
F_NRG(:,(c*35)-34:(c*35),year-1994) = transpose(xlsread([country(c)
'_EU_May12.xlsm'],[num2str(year)],'AD2:AD36'));
end
end
I do not get it because the filename should be a string if I select the country c via country(c)? The xlsread is nested in a transpose command and the cells where I want to save the read in data are computed slightly more complicated but principally it should be the same? The following code also does render a string for each c.
for c = 1:41
country(c)
end
Can you help me find my coding mistakes? Why doesn't Matlab recognize the file name as a string?
Thank you for your help.
Try this,
for c = 1:41
for year = 1995:1995
F_NRG(:,(c*35)-34:(c*35),year-1994) = transpose(xlsread([country{c}
'_EU_May12.xlsm'],[num2str(year)],'AD2:AD36'));
end
end

Read specific portions of an excel file based on string values in MATLAB

I have an excel file and I need to read it based on string values in the 4th column. I have written the following but it does not work properly:
[num,txt,raw] = xlsread('Coordinates','Centerville');
zn={};
ctr=0;
for i = 3:size(raw,1)
tf = strcmp(char(raw{i,4}),char(raw{i-1,4}));
if tf == 0
ctr = ctr+1;
end
zn{ctr}=raw{i,4};
end
data=zeros(1,10); % 10 corresponds to the number of columns I want to read (herein, columns 'J' to 'S')
ctr=0;
for j = 1:length(zn)
for i=3:size(raw,1)
tf=strcmp(char(raw{i,4}),char(zn{j}));
if tf==1
ctr=ctr+1;
data(ctr,:,j)=num(i-2,10:19);
end
end
end
It gives me a "15129x10x22 double" thing and when I try to open it I get the message "Cannot display summaries of variables with more than 524288 elements". It might be obvious but what I am trying to get as the output is 'N = length(zn)' number of matrices which represent the data for different strings in the 4th column (so I probably need a struct; I just don't know how to make it work). Any ideas on how I could fix this? Thanks!
Did not test it, but this should help you get going:
EDIT: corrected wrong indexing into raw vector. Also, depending on the format you might want to restrict also the rows of the raw matrix. From your question, I assume something like selector = raw(3:end,4); and data = raw(3:end,10:19); should be correct.
[~,~,raw] = xlsread('Coordinates','Centerville');
selector = raw(:,4);
data = raw(:,10:19);
[selector,~,grpidx] = unique(selector);
nGrp = numel(selector);
out = cell(nGrp,1);
for i=1:nGrp
idx = grpidx==i;
out{i} = cell2mat(data(idx,:));
end
out is the output variable. The key here is the variable grpidx that is an output of the unique function and allows you to trace back the unique values to their position in the original vector. Note that unique as I used it may change the order of the string values. If that is an issue for you, use the setOrderparameter of the unique function and set it to 'stable'

MATLAB dir without '.' and '..'

the function dir returns an array like
.
..
Folder1
Folder2
and every time I have to get rid of the first 2 items, with methods like :
for i=1:numel(folders)
foldername = folders(i).name;
if foldername(1) == '.' % do nothing
continue;
end
do_something(foldername)
end
and with nested loops it can result in a lot of repeated code.
So can I avoid these "folders" by an easier way?
Thanks for any help!
Edit:
Lately I have been dealing with this issue more simply, like this :
for i=3:numel(folders)
do_something(folders(i).name)
end
simply disregarding the first two items.
BUT, pay attention to #Jubobs' answer. Be careful for folder names that start with a nasty character that have a smaller ASCII value than .. Then the second method will fail. Also, if it starts with a ., then the first method will fail :)
So either make sure you have nice folder names and use one of my simple solutions, or use #Jubobs' solution to make sure.
A loop-less solution:
d=dir;
d=d(~ismember({d.name},{'.','..'}));
TL; DR
Scroll to the bottom of my answer for a function that lists directory contents except . and ...
Detailed answer
The . and .. entries correspond to the current folder and the parent folder, respectively. In *nix shells, you can use commands like ls -lA to list everything but . and ... Sadly, MATLAB's dir doesn't offer this functionality.
However, all is not lost. The elements of the output struct array returned by the dir function are actually ordered in lexicographical order based on the name field. This means that, if your current MATLAB folder contains files/folders that start by any character of ASCII code point smaller than that of the full stop (46, in decimal), then . and .. willl not correspond to the first two elements of that struct array.
Here is an illustrative example: if your current MATLAB folder has the following structure (!hello and 'world being either files or folders),
.
├── !hello
└── 'world
then you get this
>> f = dir;
>> for k = 1 : length(f), disp(f(k).name), end
!hello
'world
.
..
Why are . and .. not the first two entries, here? Because both the exclamation point and the single quote have smaller code points (33 and 39, in decimal, resp.) than that of the full stop (46, in decimal).
I refer you to this ASCII table for an exhaustive list of the visible characters that have an ASCII code point smaller than that of the full stop; note that not all of them are necessarily legal filename characters, though.
A custom dir function that does not list . and ..
Right after invoking dir, you can always get rid of the two offending entries from the struct array before manipulating it. Moreover, for convenience, if you want to save yourself some mental overhead, you can always write a custom dir function that does what you want:
function listing = dir2(varargin)
if nargin == 0
name = '.';
elseif nargin == 1
name = varargin{1};
else
error('Too many input arguments.')
end
listing = dir(name);
inds = [];
n = 0;
k = 1;
while n < 2 && k <= length(listing)
if any(strcmp(listing(k).name, {'.', '..'}))
inds(end + 1) = k;
n = n + 1;
end
k = k + 1;
end
listing(inds) = [];
Test
Assuming the same directory structure as before, you get the following:
>> f = dir2;
>> for k = 1 : length(f), disp(f(k).name), end
!hello
'world
a similar solution from the one suggested by Tal is:
listing = dir(directoryname);
listing(1:2)=[]; % here you erase these . and .. items from listing
It has the advantage to use a very common trick in Matlab, but assumes that you know that the first two items of listing are . and .. (which you do in this case). Whereas the solution provided by Tal (which I did not try though) seems to find the . and .. items even if they are not placed at the first two positions within listing.
Hope that helps ;)
If you're just using dir to get a list of files and and directories, you can use Matlab's ls function instead. On UNIX systems, this just returns the output of the shell's ls command, which may be faster than calling dir. The . and .. directories won't be displayed (unless your shell is set up to do so). Also, note that the behavior of this function is different between UNIX and Windows systems.
If you still want to use dir, and you test each file name explicitly, as in your example, it's a good idea to use strcmp (or one of its relations) instead of == to compare strings. The following would skip all hidden files and folder on UNIX systems:
listing = dir;
for i = 1:length(listing)
if ~strcmp(listing(i).name(1),'.')
% Do something
...
end
end
You may also wanna exclude any other files besides removing dots
d = dir('/path/to/parent/folder')
d(1:2)=[]; % removing dots
d = d([d.isdir]) % [d.isdir] returns a logical array of 1s representing folders and 0s for other entries
I used: a = dir(folderPath);
Then used two short code that return struct:
my_isdir = a([a.isdir]) Get a struct which only has folder info
my_notdir = a(~[a.isdir]) Get a struct which only has non-folder info
Combining #jubobs and #Tal solutions:
function d = dir2(folderPath)
% DIR2 lists the files in folderPath ignoring the '.' and '..' paths.
if nargin<1; folderPath = '.'; elseif nargin == 1
d = dir(folderPath);
d = d(~ismember({d.name},{'.','..'}));
end
None of the above puts together the elements as I see the question having being asked - obtain a list only of directories, while excluding the parents.
Just combining the elements, I would go with:
function d = dirsonly(folderPath)
% dirsonly lists the unhidden directories in folderPath ignoring '.' and '..'
% creating a simple cell array without the rest of the dir struct information
if nargin<1; folderPath = '.'; elseif nargin == 1
d = dir(folderPath);
d = {d([d.isdir] & [~ismember({d.name},{'.','..'})]).name}.';
end
if hidden folders in general aren't wanted the ismember line could be replaced with:
d = {d([d.isdir] & [~strncmp({d.name},'.',1)]).name}.';
if there were very very large numbers of interfering non-directory files it might be more efficient to separate the steps:
d = d([d.isdir]);
d = {d([~strncmp({d.name},'.',1)]).name}.';
We can use the function startsWith
folders = dir("folderPath");
folders = string({folders.name});
folders = folders(~startsWith(folders,"."))
Potential Solution - just remove the fields
Files = dir;
FilesNew = Files(3:end);
You can just remove them as they are the first two "files" in the structure
Or if you are actually looking for specific file types:
Files = dir('*.mat');

dir(myFiles{m}) does't work MATLAB

myFiles = 1x7 cell
when I try
for m =1:numel(myFiles )
fil{m} = dir(myFiles {m});
fil{m}.bytes ;
end
This is not working
I got the error :
function is not defined for 'cell' inputs.
First of all you should mention the error message you get.
Now, besides that are some obvious problems:
myFiles {ii}
This is not valid syntax to index into a cell array. Perhaps removing the space helps.
Furthermore you loop over m and then use ii as an index.
Lastly you assign to fil everytime. In practice this means only the last result is stored. Perhaps assigning to fil(m) would suit your needs better.
The command dir will show you the content of a folder. As your variable is named "myFiles" I assume it contains filenames and not foldernames. So I think you're rather looking for a loop like this:
for ii = 1:numel(myFiles)
fil{ii} = which( myFiles{ii} )
end
which gives you an array with the full paths to your files. Or are you looking for the folders containing the files in "myFiles"? Then you can use:
for ii = 1:numel(myFiles)
fil{ii} = fileparts( which( myFiles{ii} ) )
end
returning you the corresponding folders.
regarding your comments:
the existence of the files/folders in "myFiles" is the only purpose?
Then you could do that:
for ii = 1:numel(myFiles)
fil(ii) = exist( which(myFiles{ii}), 'file' );
end
existMyFiles = logical(fil);
returning a logical array specifying the existence of your files.