Looping through documents in matlab - matlab

I am attempting to loop through the variable 'docs', which is a cell array that holds strings. I need to write a for loop that collects the terms in a cell array and then uses the commands 'lower' and 'unique' to create a dictionary.
Here is the code I've tried so far, and I just get errors:
docsLength = length(docs);
for C = 1:docsLength
list = tokenize(docs, ' .,-');
Mylist = [list;C];
end
I get these errors
Error using textscan
First input must be of type double or string.
Error in tokenize (line 3)
C = textscan(str,'%s','MultipleDelimsAsOne',1,'delimiter',delimiters);
Error in tk (line 4)
list = tokenize(docs, ' .,-');

Generally, if you get a "must be of type" error, you are passing the wrong sort of input to a function. In this case, look at the point in your code where the error occurs (here, in tokenize, when textscan is called) and double-check that the input going in is what you expect it to be.
As tokenize is not a MATLAB builtin function, unless you show us that code we can't say what those inputs should be. However, as akfaz mentioned in comments, it is likely that you want to pass docs{C} (a string) to tokenize instead of docs (a cell array). Otherwise, there's no point in having a loop as it just repeatedly passes the same input, docs, into the function.
There are additional problems with the loop:
Mylist = [list; C]; will be overwritten each loop to consist of the latest version of list plus C, which is just a number (the index of the loop). Depending on what the output of tokenize looks like, Mylist = [Mylist; list] may work but you should initialise Mylist first.
Mylist = [];
for C = 1:length(docs)
list = tokenize(docs{C}, ' .,-');
Mylist = [Mylist; list];
end
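Putting the pieces together, a minimal sketch of the whole task might look like the following. Since tokenize is not a built-in function, this assumes strsplit as a stand-in tokenizer, and the sample docs are made up for illustration.

```matlab
% Hypothetical sample input: a cell array of strings.
docs = {'The cat sat.', 'The dog, the cat - and the fish'};

terms = {};                                  % initialise before the loop
for k = 1:numel(docs)
    % Split one document at a time; consecutive delimiters are collapsed by default.
    tokens = strsplit(docs{k}, {' ', '.', ',', '-'});
    tokens = tokens(~cellfun(@isempty, tokens));   % drop empty tokens at the ends
    terms = [terms, tokens];                 % append this document's terms
end

% Lower-case everything, then keep each term once (unique also sorts).
dictionary = unique(lower(terms));
```

If your tokenize returns a column cell array instead of a row, concatenate with a semicolon as in the loop above.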

Related

How to shuffle such that two same elements are not together?

I have a string array containing several elements, some identical and some unique. I want my code to check every two consecutive elements in the array and, if they are equal, call a function ShuffleString, whose input variable (randomize) is the array itself, to re-shuffle it into a new order. Then the script should check every two consecutive elements again, until no two identical elements appear next to each other.
I have done the following:
My function file ShuffleString works fine. The input variable randomize, as stated earlier, contains the same elements as MyString but in a different order, as this was needed on an unrelated matter earlier in the script.
function [MyString] = ShuffleString(randomize)
MyString = [];
while length(randomize) > 0
S = randi(length(randomize), 1);
MyString = [MyString, randomize(S)];
randomize(S) = [];
end
The script doesn't work as intended. Right now it looks like this:
MyString = ["Cat" "Dog" "Mouse" "Mouse" "Dog" "Hamster" "Zebra" "Obama"...
"Dog" "Fish" "Salmon" "Turkey"];
randomize = MyString;
while(1)
for Z = 1:length(MyString)
if Z < length(MyString)
Q = Z+1;
end
if isequal(MyString{Z},MyString{Q})
[MyString]=ShuffleString(randomize)
continue;
end
end
end
It just seems to reshuffle the string an infinite amount of times. What's wrong with this and how can I make it work?
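You are using an infinite while loop that has no way to break, and hence it keeps iterating: continue only skips to the next iteration of the for loop. In addition, when Z reaches length(MyString), Q is not updated and is also equal to length(MyString), so the last element is compared with itself and the array is reshuffled on every pass.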
Here is a simpler way:
Use the third output argument of the unique function to get the elements in numeric form, which is easier to process. Apply diff to it to check whether consecutive elements are the same. If any two consecutive elements are equal, the output of diff contains at least one zero, so ~all(diff(c)) returns true and the loop continues; otherwise it returns false and the loop ends. Finally, use the shuffled indices obtained after the loop to index the first output argument of unique (which was calculated earlier). So the script will be:
MyString = ["Cat" "Dog" "Mouse" "Mouse" "Dog" "Hamster" "Zebra" "Obama"...
"Dog" "Fish" "Salmon" "Turkey"]; %Given string array
[a,~,c] = unique(MyString);%finding unique elements and their indices
while ~all(diff(c)) %looping until there are no same strings together
c = ShuffleString(c); %shuffling the unique indices
end
MyString = a(c); %using the shuffled indices to get the required string array
For the function ShuffleString, a better way would be to use randperm. Your version of the function works, but it changes the size of the arrays MyString and randomize on every iteration, which hurts performance and memory usage. Here is a simpler way:
function MyString = ShuffleString(MyString)
MyString = MyString(randperm(numel(MyString)));
end
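As a quick check of the approach above, the following sketch runs the same loop with randperm inlined and then verifies the result. The variable names Shuffled and hasAdjacentDuplicates are introduced here for illustration only. Note that the loop would never terminate if one element occurred so often that no valid ordering exists; that is not the case for this input.

```matlab
MyString = ["Cat" "Dog" "Mouse" "Mouse" "Dog" "Hamster" "Zebra" "Obama" ...
    "Dog" "Fish" "Salmon" "Turkey"];

[a, ~, c] = unique(MyString);      % c maps each element to its index in a
while any(diff(c) == 0)            % a zero difference means two equal neighbours
    c = c(randperm(numel(c)));     % reshuffle the numeric representation
end
Shuffled = a(c);                   % translate the indices back to strings

% Verify: no element equals its right-hand neighbour.
hasAdjacentDuplicates = any(Shuffled(1:end-1) == Shuffled(2:end));
```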

MATLAB: Pass part of structure field name to function

I need to pass a part of a structure's name into a function.
Examples of available structs:
systems.system1.stats.equityCurve.relative.exFee
systems.system1.stats.equityCurve.relative.inFee
systems.system2.stats.equityCurve.relative.exFee
systems.system2.stats.equityCurve.relative.inFee
systems.system1.returns.aggregated.exFee
systems.system1.returns.aggregated.inFee
systems.system2.returns.aggregated.exFee
systems.system2.returns.aggregated.inFee
... This goes on...
Within a function, I loop through the structure as follows:
function mat = test(fNames)
feeString = {'exFee', 'inFee'};
sysNames = {'system1', 'system2'};
for n = 1 : 2
mat{n} = systems.(sysNames{n}).stats.equityCurve.relative.(feeString{n});
end
end
What I would like to handle in a flexible way within the loop is the middle part, i.e. the part after systems.(sysNames{n}) and before .(feeString{n}) (compare the examples).
I am now looking for a way to pass the middle part as an input argument fNames into the function. The loop should then contain something like
mat{n} = systems.(sysNames{n}).(fName).(feeString{n});
How about using a helper function such as
function rec_stru = recSA(stru, field_names)
if numel(field_names) == 1
rec_stru = stru.(field_names{1});
else
rec_stru = recSA(stru.(field_names{1}), field_names(2:end));
end
This function takes the intermediate field names as a cell array.
This would turn this statement:
mat{n} = systems.(sysNames{n}).stats.equityCurve.relative.(feeString{n});
into
mat{n} = recSA(systems.(sysNames{n}), {'stats', 'equityCurve', 'relative', feeString{n}});
The first part of the cell array could then be passed as an argument to the function.
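As an alternative to a hand-written recursive helper, the built-in getfield accepts several field names in sequence to reach into nested structs. The sketch below is not the answer's method; it assumes small numeric placeholder values in the structs and reuses the names sysNames, feeString and fNames from the question.

```matlab
% Placeholder data, invented for illustration.
systems.system1.stats.equityCurve.relative.exFee = 1;
systems.system2.stats.equityCurve.relative.inFee = 2;
systems.system1.returns.aggregated.exFee = 3;
systems.system2.returns.aggregated.inFee = 4;

sysNames  = {'system1', 'system2'};
feeString = {'exFee', 'inFee'};

% The "middle part" is passed as a cell array of field names.
fNames = {'stats', 'equityCurve', 'relative'};
mat = cell(1, 2);
for n = 1:2
    % fNames{:} expands into one argument per nesting level.
    mat{n} = getfield(systems.(sysNames{n}), fNames{:}, feeString{n});
end

% The same loop works for a different middle part.
fNames2 = {'returns', 'aggregated'};
mat2 = cell(1, 2);
for n = 1:2
    mat2{n} = getfield(systems.(sysNames{n}), fNames2{:}, feeString{n});
end
```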
This is one of those cases where the MATLAB documentation is a bit unhelpful. You can use the fieldnames function to get the list of all the fields and iterate over that list using dynamic field names.
systems.system1.stats.equityCurve.relative.exFee='T'
systems.system1.stats.equityCurve.relative.inFee='E'
systems.system2.stats.equityCurve.relative.exFee='S'
systems.system2.stats.equityCurve.relative.inFee='T'
systems.system1.returns.aggregated.exFee='D'
systems.system1.returns.aggregated.inFee='A'
systems.system2.returns.aggregated.exFee='T'
systems.system2.returns.aggregated.inFee='A'
dynamicvariable=fieldnames(systems.system1)
This will return a cell array of the field names, which you can use to iterate over.
systems.system1.(dynamicvariable{1})
ans =
equityCurve: [1x1 struct]
Ideally you would have your data structure fixed in such a way that you know how many levels of depth are in your data structure.
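A minimal sketch of that iteration, assuming a cut-down version of the struct above (only two leaf values are set, and the names topFields and counts are introduced for illustration):

```matlab
systems.system1.stats.equityCurve.relative.exFee = 'T';
systems.system1.returns.aggregated.exFee = 'D';

% Field names come back as a column cell array, in creation order.
topFields = fieldnames(systems.system1);

counts = zeros(numel(topFields), 1);
for k = 1:numel(topFields)
    sub = systems.system1.(topFields{k});        % dynamic field access
    counts(k) = numel(fieldnames(sub));          % number of fields one level down
    fprintf('%s has %d subfield(s)\n', topFields{k}, counts(k));
end
```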

Splice list of structs into arguments for function call - Matlab

I want to splice a list of arguments to pass to a function. For a vector I know that I can use num2cell and call the cell with curly braces (see this question), but in my case the list I want to splice originally has structs and I need to access one of their attributes. For example:
austen = struct('ids', ids, 'matrix', matrix);
% ... more structs defined here
authors = [austen, dickens, melville, twain];
% the function call I want to do is something like
tmp = num2cell(authors);
% myFunction defined using varargin
[a,b] = myFunction(tmp{:}.ids);
The example above does not work because Matlab expected ONE output from the curly braces and it's receiving 4, one for each author. I also tried defining my list of arguments as a cell array in the first place
indexes = {austen.ids, dickens.ids, melville.ids, twain.ids};
[a,b] = myFunction(indexes{:});
but the problem with this is that myFunction is taking the union and intersection of the vectors ids and I get the following error:
Error using vertcat
The following error occurred converting from double to struct:
Conversion to struct from double is not possible.
Error in union>unionR2012a (line 192)
c = unique([a;b],order);
Error in union (line 89)
[varargout{1:nlhs}] = unionR2012a(varargin{:});
What is the correct way of doing this? The problem is that I will have tens of authors and I don't want to pass all of them to myFunction by hand.
As @kedarps rightly pointed out, I need to use struct2cell instead of num2cell. The following code does the trick:
tmp = struct2cell(authors);
[a, b] = myFunction(tmp{1,:,:}); %ids is the first entry of the structs
I had never heard about struct2cell before! It doesn't even show up in the See also of help num2cell! It would be amazing to have an apropos function like Julia's....
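To see why the indexing in tmp{1,:,:} works, it may help to look at the shape struct2cell produces. The sketch below uses two made-up authors with numeric ids and a hypothetical helper countArgs in place of myFunction. It also shows that dot-indexing a struct array directly yields the same comma-separated list, which may be simpler when only one field is needed.

```matlab
% Made-up data for illustration; ids must not be a cell array here,
% because struct() would then build a struct array from it.
austen  = struct('ids', [1 2 3], 'matrix', eye(2));
dickens = struct('ids', [2 3 4], 'matrix', eye(2));
authors = [austen, dickens];

tmp = struct2cell(authors);      % size is (number of fields) x 1 x (number of authors)
sz  = size(tmp);

% Hypothetical stand-in for myFunction: just counts its inputs.
countArgs = @(varargin) numel(varargin);

n1 = countArgs(tmp{1, :, :});    % first field (ids) of every author
n2 = countArgs(authors.ids);     % same list via dot-indexing the struct array
```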

Create multiple substructs in loop and indexing with cellarray

Let's say I have got a struct called data and I want to create three substructs called area, inhabitants and industry. These names are stored in a cellarray.
My method looks like this:
names={'area','inhabitants','industrie'};
for i=1:length(names)
data.(names(i)) = struct;
end
I get this error: "Argument to dynamic structure reference must evaluate to a valid field name."
However doing it like this works:
somestr = 'area';
data.(somestr) = struct;
That's why I tried:
names={'area','inhabitants','industrie'};
for i=1:length(names)
somestr = names(i);
data.(somestr) = struct;
end
But I get the same error as before.
I want to do it that way because I have to import a lot of data and want to store it in Matlab. If someone later wants to change the code it might be much easier to just change the cellarray.
Until the specific element of the cell is accessed via curly braces, the element will be a one-by-one cell and not a char. So you just need curly braces:
names={'area','inhabitants','industrie'};
for i=1:length(names)
data.(names{i}) = struct;
end
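The difference between the two kinds of indexing can be checked directly. This short sketch reuses the names from the question; the variables withParens and withBraces are introduced for illustration.

```matlab
names = {'area', 'inhabitants', 'industrie'};

withParens = class(names(1));    % parentheses return a 1-by-1 cell
withBraces = class(names{1});    % curly braces return the contents (a char)

data = struct;
for i = 1:numel(names)
    data.(names{i}) = struct;    % a char is a valid dynamic field name
end
```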

Workaround equivalent of "inputname" to return structure name?

I know that, inside a MATLAB function, inputname(k) will return the name of the k-th argument if and only if the argument is a variable name. Is there any way to write some parsing code that can retrieve the full input argument when that argument is a structure element, e.g. foo.bar? The reason I want to be able to do this is that I'm writing some tools for generic use where the input could be either a named variable or a named structure element.
My primary intent is to be able to store and return the input argument(s) as part of a structure or other variable that the function returns. This is a 'chain of custody' feature which makes it easier for me or others to verify the source data sets used to generate the output data sets.
I don't want the user to have to self-parse externally, or to have to deal with some kludge like
function doit(name,fieldname)
if(exist('fieldname','var'))
name = name.(fieldname);
myinput = [inputname(1),inputname(2)];
else
myinput = inputname(1);
end
% do the function stuff
(I call this a kludge because it both requires the user to enter strange arguments and because it fouls up the argument sequence for functions with multiple inputs)
The language offers no support for getting the input names when passing struct fields. The reason is probably that x.a is internally a call to subsref, which returns a new variable, so all context is lost. The only possibility you have is to use the debug tools and parse the code. There is no other option.
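function x=f(varargin)
% Get the call stack, skipping this function's own frame.
[ST, I] = dbstack('-completenames', 1);
if numel(ST)>0
% Called from a file: read the calling line from the caller's source.
fid=fopen(ST(1).file,'r');
for ix=2:ST(1).line;fgetl(fid);end % skip the lines before the call
codeline=fgetl(fid);
fclose(fid);
fprintf('function was called with line %s\n',codeline);
else
fprintf('function was called from base workspace\n');
end
end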
From there you may try to parse the code line to get the individual argument names.
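A naive sketch of that parsing step, assuming the function is named f as above and that the code line contains a single, simple call to it. The sample codeline is made up; arguments that themselves contain commas or parentheses would need a real parser.

```matlab
% Hypothetical code line, as it would be read from the caller's file.
codeline = 'y = f(foo.bar, baz);';

% Capture everything between "f(" and the last closing parenthesis.
tok = regexp(codeline, 'f\((.*)\)', 'tokens', 'once');

% Split the argument list at commas and trim surrounding whitespace.
args = strtrim(strsplit(tok{1}, ','));
```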
Far uglier than Daniel's approach, and probably will crash on the wrong OS, but here's a hack that works to retrieve the first argument; easily adjusted to retrieve all arguments.
[~,myname] = system('whoami');
myname = strtrim(myname(4:end)); % removes domain tag in my Windows envir
% sorry about " \' " fouling up SO's color parsing
myloc = ['C:\Users\' , myname , '\AppData\Roaming\MathWorks\MATLAB\R2015a\History.xml'] ;
f = fopen(myloc,'r');
foo = fscanf(f,'%s');
fclose(f);
pfoo = findpat(foo,'myFunctionName');
% just look for the last instance
namstart = find(foo(pfoo(end):(pfoo(end)+30)) =='(',1) +pfoo(end);
% catch either ')' or ','
namend(1) = find(foo((namstart):end)== ')',1) -2 +namstart;
if numel(find(foo((namstart):end)== ',',1)),
namend(2) = find(foo((namstart):end)== ',',1) -2 +namstart;
end
thearg = foo(namstart:(min(namend)) );