Ismember, multiple times - matlab

I have an array of letters
mystring = 'abcdefghijklmnopqrstuvwxyz';
and I have the word Elephant. I want to know how many times the letters appear in Elephant. I have tried ismember and it gives me if they appear but not how many times. How can I get the number of times a letter occurs in a word?

You could use histcounts:
mystring = 'bcdfgijkmoqrsuvwxyzelphant';
myword = 'elephant';
[sortstring, idx] = sort(mystring); % Bin edges for histcounts need to be increasing
N = histcounts(double(myword), [double(sortstring) 257]); % Add 257 to the array so we capture the last character in a bin
N(idx) = N; % Undo the sort
Which returns:
N =
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 1 1 1 1
Note that due to the conversion to ASCII this method is case sensitive. You can adjust for this using lower or upper, if necessary.

mystring = char(['A':'Z','a':'z']);
Alphabet = zeros(numel(mystring),1);
for ii = 1:numel(mystring)
Alphabet(ii,1) = sum(ismember('Elephant',mystring(ii)));
end
ismember checks whether the current letter of the alphabet as dictated by the loop exists in the word. If it does, it sums all occurrences to obtain the total occurrence times of each letter, stored in Alphabet, where each entry corresponds to the letter at that position in the alphabet.
I used the method of creating the alphabet as per #Daniel's comment; capitals do now work.
Example, test for William Shakespeare:
Alphabet.'
ans =
Columns 1 through 15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Columns 16 through 30
0 0 0 1 0 0 0 1 0 0 0 3 0 0 0
Columns 31 through 45
3 0 0 1 2 0 1 2 1 0 0 1 0 1 1
Columns 46 through 52
0 0 0 0 0 0 0

Related

permutation/combination with specific condition

Let us we have binary number to fill out 9 spots with specific condition: 0 always comes before 1. the possible conditions is 10:
1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1
0 0 0 0 1 1 1 1 1
0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0
Now lest us extent it to 0, 1, 2 with same rule. 0 should be always before 1 and/or 2. 1 should be before 1. Again, 9 spots are available to fill out.
I know that this yields to 55 combinations.
Question:
(1) what is the mathematical formulation to generalize this?
(2) How can I store all those 55 combinations? [any matlab code?]
Thanks
As the commenter said, the answer comes down to stars and bars. You can also think of this as counting the number of non-decreasing sequences i_1 <= i_2 <= ... <= i_k, where k is the number of symbols available and each i_j is a number between 0 and 9.
That said, here's a matlab script that generates all possibilities. Each row of the output matrix is one possible string of digits.
function M = bin_combs(L,k)
% L: length
% k: number of symbols
if k == 1
M = zeros(1,L);
else
M = zeros(0,L);
N = bin_combs(L,k-1);
for i = 1:size(N,1)
row = N(i,:);
for j=find(row==k-2)
new_row = row;
new_row(j:end) = new_row(j:end) + 1;
M = [M;new_row];
end
M = [M;row];
end
end
Some sample output:
>> size(bin_combs(9,3))
ans =
55 9
>> size(bin_combs(9,4))
ans =
220 9

Split array of strings into a matrix of numeric digits

I want to convert n integers to base b, and write every digit as a single number in a matrix.
I get the base b representation with:
stringBaseB=dec2base(0:1:1000,b,10)
but don't know how to split every string into a single char
[[0,0,0,0];[0,0,0,1];[0,0,0,2];...]
I can use array2table to split the individual characters:
tableBaseB=array2table(dec2base(stringBaseB,b,10))
but that's not a numeric matrix. Also, in base b>10 I get alphanumeric characters, which I need to convert to numeric by an equivalence like
alphanumeric=["1","A","c","3"]
numericEquivalence=[1,1+i,-3,0]
There is a vectorized way to do it?
For some base b < 11, where the character array from dec2base is always going to be single-digit numeric characters, you can simply do
b = arrayfun( #str2double, stringBaseB );
For some generic base b, you can make a map between the characters and the values (your numericEquivalence)
charMap = containers.Map( {'1','2','3'}, {1,2,3} );
Then you can use arrayfun (not strictly "vectorized" as you requested but it's unclear why that's a requirement)
stringBaseB=dec2base(0:1:1000,b,10);
b = arrayfun( #(x)charMap(x), stringBaseB );
This gives you a numeric output array, for example stringBaseB=dec2base(0:5,3,10) gives
stringBaseB =
6×10 char array
'0000000000'
'0000000001'
'0000000002'
'0000000010'
'0000000011'
'0000000012'
b =
6x10 double array
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 2
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 1 2

Performing an averaging operation over every n elements in a vector

I have a logical vector in which I would like to iterate over every n-elements. If in any given window at least 50% are 1's, then I change every element to 1, else I keep as is and move to the next window. For example.
n = 4;
input = [0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 1];
output = func(input,4);
output = [0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1];
This function is trivial to implement but is it possible to apply a vectorized implementation using logical indexing?. I am trying to build up the intuition of applying this technique.
here's a one liner (that works for your input):
func = #(input,n) input | kron(sum(reshape(input ,n,[]))>=n/2,ones(1,n));
of course, there are cases to solve that this doesnt answer, what if the size of the input is not commensurate in n? etc...
i'm not sure if that's what you meant by vectorization, and I didnt benchmark it vs a for loop...
Here is one way of doing it. Once understood you can compact it in less lines but I'll details the intermediate steps for the sake of clarity.
%% The inputs
n = 4;
input = [0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 0 1];
1) Split your input into blocks of size n (note that your final function will have to check that the number of elements in input is a integer multiple of n)
c = reshape(input,n,[]) ;
Gives you a matrix with your blocks organized in columns:
c =
0 0 0 0 0
0 1 0 1 0
0 1 0 0 0
1 0 1 1 1
2) Perform your test condition on each of the block. For this we'll take advantage that Matlab is working column wise for the sum function:
>> cr = sum(c) >= (n/2)
cr =
0 1 0 1 0
Now you have a logical vector cr containing as many elements as initial blocks. Each value is the result of the test condition over the block. The 0 blocks will be left unchanged, the 1 blocks will be forced to value 1.
3) Force 1 columns/block to value 1:
>> c(:,cr) = 1
c =
0 1 0 1 0
0 1 0 1 0
0 1 0 1 0
1 1 1 1 1
4) Now all is left is to unfold your matrix. You can do it several ways:
res = c(:) ; %% will give you a column vector
OR
>> res = reshape(c,1,[]) %% will give you a line vector
res =
0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 1

Measure how spread out the data in an array is

I have an array of zeros and ones and I need to know if the data is spread out across the columns or concentrated in clumps.
For example:
If I have array x and it has these values:
Column 1 values: 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
Column 2 values: 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1
if we counted the number of ones we can know that it is the same number but the ones are more well spread out and distributed in column 2 compared with column 1.
I am trying to make a score that gives me a high value if the spreading is good and low value if the spreading is bad... any ideas??
Sample of Data:
1 0 0 0 5 0 -2 -3 0 0 1
1 0 0 0 0 0 0 0 0 0 1
2 0 0 0 0 0 0 3 -3 1 0
1 2 3 0 5 0 2 13 4 5 1
1 0 0 0 0 0 -4 34 0 0 1
I think what you're trying to measure is the variance of the distribution of the number of 0s between the 1s, i.e:
f = #(x)std(diff(find(x)))
So for you data:
a = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
b = [1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1]
f(a)
= 8.0498
f(b)
= 2.0736
But I still think you're essentially trying to measure the disorder of the system which is what I imagine entropy measures but I don't know how
Note that this gives a low value if the "spreading" is good and a high value if it is bad (i.e. the opposite of your request).
Also if you want it per column then it becomes a little more complicated:
f = #(x)arrayfun(#(y)std(diff(find(x(:,y)))), 1:size(x,2))
data = [a', b'];
f(data)
WARNING: This method pretty much does not consider trailing and leading 0s. I don't know if that's a problem or not. but basically f([0; 0; 0; 1; 1; 1; 0; 0; 0]) returns 0 where as f([1; 0; 0; 1; 0; 1; 0; 0; 0]) returns a positive indicating (incorrectly) that first case is more distributed. One possible fix might be to prepend and append a row of ones to the matrix...
I think you would need an interval to find the "spreadness" locally, otherwise the sample 1 (which is named as Column 1 in the question) would appear as spread too between the 2nd and 3rd ones.
So, following that theory and assuming input_array to be the input array, you can try this approach -
intv = 10; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc(diff_loc<=intv)) %// desired output/score
For sample 1, spread_factor gives 4 and for sample 2 it is 23.
Another theory that you can employ would be if you assume an interval such that distance between consecutive ones must be greater than or equal to that interval. This theory would lead us to a code like this -
intv = 3; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc>=intv)
With this new approach - For sample 1, spread_factor is 1 and for sample 2 it is 5.

Get the indexes of the boundary cells of a subset of a matrix. Matlab

Given a matrix where 1 is the current subset
test =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Is there a function, or quick method to get change the subset to the boundary of the current subset?
Eg. Get this subset from 'test' above
test =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 0
0 1 1 1 1 0
0 0 0 0 0 0
In the end I just want to get the minimum of the cells surrounding a subset of a matrix. Sure I could loop through and get the minimum of the boundary (cell by cell), but there must be a way to do it with the method i've shown above.
Note the subset WILL be connected, but may not be rectangular. This may be the big catch.
This is a possible subset.... (Would pad this with a NaN border)
test =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 1 1 1 1
0 0 1 1 1 1
Ideas?
The basic steps I'd use are:
Perform a dilation on the shape to get a new area which is the shape plus its boundary
Subtract the original shape from the dilated shape to leave just the boundary
Use the boundary to index your data matrix, then take the minimum.
Dilation
What I want to do here is pass a 3x3 window over each cell and take the maximum value in that window:
[m, n] = size(A); % assuming A is your original shape matrix
APadded = zeros(m + 2, n + 2);
APadded(2:end-1, 2:end-1) = A; % pad A with zeroes on each side
ADilated = zeros(m + 2, n + 2); % this will hold the dilated shape.
for i = 1:m
for j = 1:n
mask = zeros(size(APadded));
mask(i:i+2, j:j+2) = 1; % this places a 3x3 square of 1's around (i, j)
ADilated(i + 1, j + 1) = max(APadded(mask));
end
end
Shape subtraction
This is basically a logical AND and a logical NOT to remove the intersection:
ABoundary = ADilated & (~APadded);
At this stage you may want to remove the border we added to do the dilation, since we don't need it any more.
ABoundary = ABoundary(2:end-1, 2:end-1);
Find the minimum data point along the boundary
We can use our logical boundary to index the original data into a vector, then just take the minimum of that vector.
dataMinimum = min(data(ABoundary));
You should look at this as morphology problem, not set theory. This can be solved pretty easily with imdilate() (requires the image package). You basically only need to subtract the image to its dilation with a 3x3 matrix of 1.
octave> test = logical ([0 0 0 0 0 0
0 0 0 0 0 0
0 0 1 1 0 0
0 0 1 1 0 0
0 0 1 1 1 1
0 0 1 1 1 1]);
octave> imdilate (test, true (3)) - test
ans =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 1
0 1 0 0 0 0
0 1 0 0 0 0
It does not, however, pads with NaN. If you really want that, you could pad your original matrix with false, do the operation, and then check if there's any true values in the border.
Note that you don't have to use logical() in which case you'll have to use ones() instead of true(). But that takes more memory and has worse performance.
EDIT: since you are trying to do it without using any matlab toolbox, take a look at the source of imdilate() in Octave. For the case of logical matrices (which is your case) it's a simple usage of filter2() which belongs to matlab core. That said, the following one line should work fine and be much faster
octave> (filter2 (true (3), test) > 0) - test
ans =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 1
0 1 0 0 0 0
0 1 0 0 0 0
One possible solution is to take the subset and add it to the original matrix, but ensure that each time you add it, you offset its position by +1 row, -1 row and +1 column, -1 column. The result will then be expanded by one row and column all around the original subset. You then use the original matrix to mask the original subet to zero.
Like this:
test_new = test + ...
[[test(2:end,2:end);zeros(1,size(test,1)-1)],zeros(size(test,1),1)] + ... %move subset up-left
[[zeros(1,size(test,1)-1);test(1:end-1,2:end)],zeros(size(test,1),1)] + ... %move down-left
[zeros(size(test,1),1),[test(2:end,1:end-1);zeros(1,size(test,1)-1)]] + ... %move subset up-right
[zeros(size(test,1),1),[zeros(1,size(test,1)-1);test(1:end-1,1:end-1)]]; %move subset down-right
test_masked = test_new.*~test; %mask with original matrix
result = test_masked;
result(result>1)=1; % ensure that there is only 1's, not 2, 3, etc.
The result for this on your test matrix is:
result =
0 0 0 0 0 0
0 1 1 1 1 0
0 1 0 0 1 0
0 1 0 0 1 1
0 1 0 0 0 0
0 1 0 0 0 0
Edited - it now grabs the corners as well, by moving the subset up and to the left, up and to the right, down then left and down then right.
I expect this would be a very quick way to achieve this - it doesn't have any loops, nor functions - just matrix operations.