How to shuffle such that two same elements are not together? - matlab

I have a string containing several elements, some identical and some unique. I want my code to check every 2 following elements in my string and if they're equal, it should call a function ShuffleString, where the input variable (randomize) is the string itself, that will re-shuffle the string in a new position. Then, the script should re-check every 2 following elements in the string again until no two identical elements appear next to each other.
I have done the following:
My function file ShuffleString works fine. The input variable randomize, as stated earlier, contains the same elements as MyString but in a different order, as this was needed on an unrelated matter earlier in the script.
function [MyString] = ShuffleString(randomize)
MyString = [];
while length(randomize) > 0
S = randi(length(randomize), 1);
MyString = [MyString, randomize(S)];
randomize(S) = [];
end
The script doesn't work as intended. Right now it looks like this:
MyString = ["Cat" "Dog" "Mouse" "Mouse" "Dog" "Hamster" "Zebra" "Obama"...
"Dog" "Fish" "Salmon" "Turkey"];
randomize = MyString;
while(1)
for Z = 1:length(MyString)
if Z < length(MyString)
Q = Z+1;
end
if isequal(MyString{Z},MyString{Q})
[MyString]=ShuffleString(randomize)
continue;
end
end
end
It just seems to reshuffle the string an infinite amount of times. What's wrong with this and how can I make it work?

You are using an infinite while loop that has no way to break and hence it keeps iterating.
Here is a simpler way:
Use the third output argument of the unique function to get a numeric representation of the elements, which is easier to process. Apply diff to it to check whether consecutive elements are the same: any pair of equal neighbours produces a zero in the output of diff, so ~all(diff(c)) is true and the loop continues; once there are no zeros, the loop stops. After the loop, use the shuffled numeric representation to index the first output argument of unique (which was calculated earlier). So the script will be:
MyString = ["Cat" "Dog" "Mouse" "Mouse" "Dog" "Hamster" "Zebra" "Obama"...
"Dog" "Fish" "Salmon" "Turkey"]; %Given string array
[a,~,c] = unique(MyString);%finding unique elements and their indices
while ~all(diff(c)) %looping until there are no same strings together
c = ShuffleString(c); %shuffling the unique indices
end
MyString = a(c); %using the shuffled indices to get the required string array
For the function ShuffleString, a better way would be to use randperm. Your version of the function works, but it resizes the arrays MyString and randomize on every iteration, which adversely affects performance and memory usage. Here is a simpler way:
function MyString = ShuffleString(MyString)
MyString = MyString(randperm(numel(MyString)));
end
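One caveat with the shuffle-until-valid loop: it can only terminate if a valid order exists. A minimal sketch of the same approach with a feasibility check added up front is below. The guard and the variable names counts and shuffled are assumptions added for illustration, not part of the answer above.

```matlab
% Sketch: reshuffle until no two neighbours match, with a feasibility guard.
MyString = ["Cat" "Dog" "Mouse" "Mouse" "Dog" "Hamster" "Zebra" "Obama" ...
            "Dog" "Fish" "Salmon" "Turkey"];
[a, ~, c] = unique(MyString);      % c holds, per element, its index into a
counts = accumarray(c(:), 1);      % number of occurrences of each distinct string

% Assumed guard: if one value fills more than half of the slots (rounded up),
% no arrangement without equal neighbours exists and the loop would never end.
assert(max(counts) <= ceil(numel(c) / 2), 'No valid arrangement exists.');

while ~all(diff(c))                % a zero in diff(c) marks equal neighbours
    c = c(randperm(numel(c)));     % reshuffle the index vector
end
shuffled = a(c);                   % map the indices back to strings
```

For inputs with many duplicates the number of reshuffles can grow quickly, so this stays a sketch for small arrays like the one in the question.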

Related

finding the number of occurrence of a pattern within a cell in matlab?

i have a cell like this:
x = {'3D'
'B4'
'EF'
'D8'
'E7'
'6C'
'33'
'37'}
Let's assume that the cell is 1000x1. I want to count the occurrences of pattern = [30;30;64;63] within this cell, in the order shown. In other words, it should first check x{1,1},x{2,1},x{3,1},x{4,1},
then check x{2,1},x{3,1},x{4,1},x{5,1}, and so on until the end of the cell, and return the number of occurrences.
Here is my code but it didn't work!
while (size (pattern)< size(x))
count = 0;
for i=1:size(x)-length(pattern)+1
if size(abs(x(i:i+size(pattern)-1)-x))==0
count = count+1;
end
end
end
Your example code has a couple of issues - foremost I don't believe you are doing any comparison operations, which would be necessary to identify the occurrence of the pattern within the search data (x). Also, there is a variable type mismatch between x and pattern - one is a cell array of strings, and the other is a decimal array.
One way to approach this problem would be to restructure x and pattern as strings, and then use strfind to find occurrences of pattern. This method will only work if there is no missing data in either of the variables.
x = {'3D';'B4';'EF';'D8';'E7';'6C';'33';'37';'xE';'FD';'8y'};
pattern = {'EF','D8'};
collated_x=[x{:}];
collated_pattern = [pattern{:}];
found_locations = strfind(collated_x, collated_pattern);
% Remove 'offset' matches that start at even locations
found_locations = found_locations(mod(found_locations,2)==1);
count = length(found_locations)
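The window-by-window check described in the question can also be written directly, which avoids the offset filtering. This is a minimal sketch, assuming the pattern is stored as a cell array of strings with the same orientation as x; the sample data is made up for illustration.

```matlab
% Sketch: slide a window over x and compare it to the pattern cell by cell.
x = {'3D'; 'B4'; 'EF'; 'D8'; 'E7'; '6C'; 'EF'; 'D8'};   % hypothetical data
pattern = {'EF'; 'D8'};                                 % column cell, like x
n = numel(pattern);
count = 0;
for i = 1:numel(x) - n + 1
    % isequal compares both the size and the contents of the two cell arrays
    if isequal(x(i:i+n-1), pattern)
        count = count + 1;
    end
end
disp(count)
```

Because whole cells are compared, a match can never start in the middle of an element.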
Use the string find function, strfind.
This is a fast and simple solution:
clear
str_pattern=['B4','EF']; %same as str_pattern=['B4EF'];
x = {'3D'
'B4'
'EF'
'D8'
'EB'
'4E'
'F3'
'B4'
'EF'
'37'} ;
str_x=horzcat(x{:});
inds0=strfind(str_x,str_pattern); %including in-middle
inds1=inds0(bitand(inds0,1)==1); %exclude all in-middle results
disp(str_x);
disp(str_pattern);
disp(inds0);
disp(inds1);

Counting occurrences of a character in a string within a cell

I'm having trouble figuring out how to count the occurrences of a character in a string within a cell. For example, I have a file that contains information like so:
type
m
mmNs
SmNm
and I'm trying to determine how many m's are in each line. To do this, I've tried this code:
sampleddata = dataset('file','sample.txt','Delimiter','\t');
muts = sampleddata.type;
fileID = fopen('number_occur.txt','w');
for j = 1:3
mutations = muts(j)
M = length(find(mutations == 'm'));
fprintf(fileID, '%1f\n',M)
end
fclose(fileID)
However, I get an error that informs me: "Undefined operator '==' for input arguments of type 'cell'." Does anyone know how to overcome this problem?
Gonna post a result here in case you did not find a way to do it. There are loads of ways to do it, I am just going to put one of them.
Basically, you want a regex to do string matches:
a = {'type';
'm';
'mmNs';
'SmNm';
'mmmmM'} %//Load in Data,
pattern = 'm'; %//The pattern you are looking for is 'm'; it could be anything really: a number, a specific word or a specific pattern
lines = regexp(a, pattern, 'tokens'); %// look for this pattern in each line
result = cellfun('length',lines); %//count the size of matched patterns, so each time it matches, the size should increase by 1.
This gives the result in a matrix form:
result =
0
1
2
2
4
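The error in the question comes from muts(j), which returns a 1x1 cell rather than the string inside it; muts{j} returns the string itself. As an alternative to the regex, here is a short sketch that compares characters directly inside cellfun, assuming every cell holds a character vector:

```matlab
% Sketch: count the character 'm' in each cell without regexp.
a = {'type'; 'm'; 'mmNs'; 'SmNm'; 'mmmmM'};
% s == 'm' yields a logical vector per string; sum counts the true entries.
result = cellfun(@(s) sum(s == 'm'), a);
disp(result)
```

The comparison is case-sensitive, so the trailing 'M' in the last entry is not counted.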

Looping through documents in matlab

I am attempting to loop through the variable 'docs', which is a cell array that holds strings. I need to make a for loop that collects the terms in a cell array and then uses the commands 'lower' and 'unique' to create a dictionary.
Here is the code I've tried so far, and I just get errors:
docsLength = length(docs);
for C = 1:docsLength
list = tokenize(docs, ' .,-');
Mylist = [list;C];
end
I get these errors
Error using textscan
First input must be of type double or string.
Error in tokenize (line 3)
C = textscan(str,'%s','MultipleDelimsAsOne',1,'delimiter',delimiters);
Error in tk (line 4)
list = tokenize(docs, ' .,-');
Generally, if you get a "must be of type" error, that means you are passing the wrong sort of input to a function. In this case you should look at the point in your code where this is taking place (here, in tokenize when textscan is called), and double-check that the input going in is what you expect it to be.
As tokenize is not a MATLAB builtin function, unless you show us that code we can't say what those inputs should be. However, as akfaz mentioned in comments, it is likely that you want to pass docs{C} (a string) to tokenize instead of docs (a cell array). Otherwise, there's no point in having a loop as it just repeatedly passes the same input, docs, into the function.
There are additional problems with the loop:
Mylist = [list; C]; will be overwritten each loop to consist of the latest version of list plus C, which is just a number (the index of the loop). Depending on what the output of tokenize looks like, Mylist = [Mylist; list] may work but you should initialise Mylist first.
Mylist = [];
for C = 1:length(docs)
list = tokenize(docs{C}, ' .,-');
Mylist = [Mylist; list];
end
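The remaining step from the question, building the dictionary with lower and unique, might look like the sketch below. Since tokenize is not shown, the built-in strsplit is used as a stand-in for it, and the contents of docs are made-up sample data; both are assumptions.

```matlab
% Sketch: collect tokens from every document, then build a dictionary.
docs = {'The cat sat.', 'A cat, a dog - the end.'};    % hypothetical sample data
Mylist = {};
for C = 1:numel(docs)
    % strsplit stands in for the user's tokenize; adjacent delimiters collapse
    list = strsplit(docs{C}, {' ', '.', ',', '-'});
    Mylist = [Mylist, list];                           % append this document's tokens
end
Mylist = Mylist(~cellfun('isempty', Mylist));          % drop empty tokens
dictionary = unique(lower(Mylist));                    % lower-case, sorted, no duplicates
disp(dictionary)
```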

Using a string to refer to a structure array - matlab

I am trying to take the averages of a pretty large set of data, so I have created a function to do exactly that.
The data is stored in some struct1.struct2.data(:,column).
There are 4 struct1, and each of these has between 20 and 30 sub-struct2.
The data that I want to average is always stored in column 7, and I want to output the average of each struct2.data(:,column) into an Nx2 array/double (column 1 of this output is a reference to each sub-struct2, column 2 is the average).
The only problem is, I can't find a way (lots and lots of reading) to point at each structure properly. I am using a string to refer to the structures, but I get the error "Attempt to reference field of non-structure array." So clearly it doesn't like this. Here is what I used (excuse the inelegance):
function [avrg] = Takemean(prefix,numslits)
% place holder arrays
avs = [];
slits = [];
% iterate over the sub-struct (struct2)
for currslit=1:numslits
dataname = sprintf('%s_slit_%02d',prefix,currslit);
% slap the average and slit ID on the end
avs(end+1) = mean(prefix.dataname.data(:,7));
slits(end+1) = currslit;
end
% transpose the arrays
avs = avs';
slits = slits';
avrg = cat(2,slits,avs); % slap them together
It falls over at this line, avs(end+1) = mean(prefix.dataname.data(:,7));, because, as you can see, prefix and dataname are strings. So, after hunting around, I tried making these strings variables with genvarname(), still no luck!
I have spent hours on what should have been 5min of coding. :'(
Edit: Oh prefix is a string e.g. 'Hs' and the structure of the structures (lol) is e.g. Hs.Hs_slit_XX.data() where XX is e.g. 01,02,...27
Edit: If I just run mean(Hs.Hs_slit_01.data(:,7)) it works fine... but then I cant iterate over all of the _slit_XX
If you simply want to iterate over the fields with the name pattern <something>_slit_<something>, you need neither the prefix string nor numslits for this. Pass the actual structure to your function, extract the desired fields and then iterate over them:
function avrg = Takemean(s)
%// Extract only the "_slit_" fields
names = fieldnames(s);
names = names(~cellfun('isempty', strfind(names, '_slit_')));
%// Iterate over fields and calculate means
avrg = zeros(numel(names), 2);
for k = 1:numel(names)
avrg(k, :) = [k, mean(s.(names{k}).data(:, 7))];
end
This method uses dynamic field referencing to access fields in structs using strings.
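If the field names do need to be built from a prefix and a counter, as in the question, dynamic field referencing handles that too. A minimal sketch, with made-up data standing in for the real Hs structure:

```matlab
% Sketch: build each field name as a string and use it via s.(name).
Hs.Hs_slit_01.data = magic(7);      % hypothetical data; column 7 averages 25
Hs.Hs_slit_02.data = ones(3, 7);    % hypothetical data; column 7 averages 1
prefix = 'Hs';
numslits = 2;
avrg = zeros(numslits, 2);          % preallocate: [slit ID, mean of column 7]
for currslit = 1:numslits
    dataname = sprintf('%s_slit_%02d', prefix, currslit);
    % Hs.(dataname) looks up the field whose name is stored in dataname
    avrg(currslit, :) = [currslit, mean(Hs.(dataname).data(:, 7))];
end
disp(avrg)
```

Note that the structure itself (Hs) is still referred to as a variable; only the field name is taken from a string.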
First of all, think twice before you use string construction to access variables.
If you really really need it, here is how it can be used:
a.b=123;
s1 = 'a';
s2 = 'b';
eval([s1 '.' s2])
In your case probably something like:
Hs.Hs_slit_01.data= rand(3,7);
avs = [];
dataname = 'Hs_slit_01';
prefix = 'Hs';
eval(['avs(end+1) = mean(' prefix '.' dataname '.data(:,7))'])

assigning values to a field of an structure array in MATLAB

I want to replace the value of the fields in a structure array. For example, I want to replace all 1's with 3's in the following construction.
a(1).b = 1;
a(2).b = 2;
a(3).b = 1;
a([a.b] == 1).b = 3; % This doesn't work and spits out:
% "Insufficient outputs from right hand side to satisfy comma separated
% list expansion on left hand side. Missing [] are the most likely cause."
Is there an easy syntax for this? I want to avoid ugly for loops for such simple operation.
Credits go to @Slayton, but you actually can do the same thing for assigning values too, using deal:
[a([a.b]==1).b]=deal(3)
So breakdown:
[a.b]
retrieves all b fields of the array a and puts this comma-separated-list in an array.
a([a.b]==1)
uses logical indexing to index only the elements of a that satisfy the constraint. Subsequently the full command above assigns the value 3 to all elements of the resulting comma-separated-list according to this.
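Put together as a runnable sketch, using the data from the question (the variable name mask is introduced here only for readability):

```matlab
% Sketch: replace every b == 1 with 3 in a struct array, without a loop.
a = struct('b', {1, 2, 1});     % same as a(1).b = 1; a(2).b = 2; a(3).b = 1;
mask = [a.b] == 1;              % logical index of the elements to change
[a(mask).b] = deal(3);          % deal copies 3 into every selected field
disp([a.b])
```

The square brackets on the left-hand side are what collect the comma-separated list so that deal can fill each element.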
You can retrieve the value of a field for each struct in an array using cell notation.
bVals = {a.b};
bVals = cell2mat( bVals );
AFAIK, you can't do the same thing for inserting values into an array of structs. You'll have to use a loop.