Split array of strings into a matrix of numeric digits - matlab

I want to convert n integers to base b, and write every digit as a single number in a matrix.
I get the base b representation with:
stringBaseB=dec2base(0:1:1000,b,10)
but don't know how to split every string into a single char
[[0,0,0,0];[0,0,0,1];[0,0,0,2];...]
I can use array2table to split the individual characters:
tableBaseB=array2table(dec2base(stringBaseB,b,10))
but that's not a numeric matrix. Also, in base b>10 I get alphanumeric characters, which I need to convert to numeric by an equivalence like
alphanumeric=["1","A","c","3"]
numericEquivalence=[1,1+i,-3,0]
Is there a vectorized way to do it?

For some base b < 11, where the character array from dec2base is always going to be single-digit numeric characters, you can simply do
b = arrayfun( @str2double, stringBaseB );
For some generic base b, you can make a map between the characters and the values (your numericEquivalence)
charMap = containers.Map( {'0','1','2','3'}, {0,1,2,3} );
Then you can use arrayfun (not strictly "vectorized" as you requested but it's unclear why that's a requirement)
stringBaseB=dec2base(0:1:1000,b,10);
b = arrayfun( @(x)charMap(x), stringBaseB );
This gives you a numeric output array, for example stringBaseB=dec2base(0:5,3,10) gives
stringBaseB =
6×10 char array
'0000000000'
'0000000001'
'0000000002'
'0000000010'
'0000000011'
'0000000012'
b =
6x10 double array
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 2
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 1 2
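If a loop-free version is preferred, the same character-to-value mapping can be done with indexing alone. This is a minimal sketch, assuming the digit symbols are the ones dec2base emits ('0' to '9', then 'A' to 'Z'). The names symbols, values and digits are illustrative and not from the original answer; values stands in for whatever numericEquivalence is needed.

```matlab
base = 16;                          % example base (assumption)
strs = dec2base(0:20, base, 4);     % 21x4 char matrix

% Lookup: symbols(k) maps to values(k); replace values with your own equivalence
symbols = ['0':'9', 'A':'F'];
values  = 0:15;

[found, idx] = ismember(strs, symbols);   % idx has the same size as strs
assert(all(found(:)), 'Unexpected character in dec2base output');
digits = values(idx);                     % numeric matrix, same size as strs

% For base <= 10 the lookup reduces to a character-code offset
digits3 = dec2base(0:5, 3, 10) - '0';
```

Because values is indexed with a matrix, the result keeps the shape of the character array, so no reshape is needed.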

Related

How to convert list of binary values to int32 type?

I have a list of binary numbers in little-endian format in MATLAB workspace, and I want to convert them to int32. a is a double vector of zeroes and ones, like so:
a = [0 0 0 1 1 0 0 1 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 1 0 0 0];
int32(a) gives me a row vector of the 32 binary values without converting it to 32 bit integer.
The solutions from this related question (which is specific to unsigned integers) can be modified to handle a signed integer. The easiest approach would be to convert the output from there to uint32, then convert to int32 using typecast. Building on this solution:
>> a = [0 0 0 1 1 0 0 1 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 1 0 0 0];
>> b = typecast(uint32(sum(pow2(find(a)-1))), 'int32')
b =
int32
521688984
If you know the specific representation used for the bit pattern (my guess would be two's complement), you could avoid using typecast by accounting for the sign bit and complement directly in the calculations.
If a is an N-by-32 matrix, you can simply replace the sum(...) with the vectorized calculations from the linked solution:
b = typecast(uint32(a*(2.^(0:size(a, 2)-1)).'), 'int32');
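The remark about avoiding typecast can be sketched as follows, assuming the bit pattern is two's complement and little-endian as in the question. The variable names weights and allOnes are illustrative only.

```matlab
a = [0 0 0 1 1 0 0 1 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 1 0 0 0];

weights      = 2.^(0:31);   % little-endian: a(1) is the least significant bit
weights(end) = -2^31;       % in two's complement the top bit has negative weight
b = int32(a * weights.');   % works for a 1-by-32 vector or an N-by-32 matrix

allOnes = int32(ones(1, 32) * weights.');   % bit pattern of all ones
```

Here b is again 521688984, and the all-ones pattern comes out as -1, which is what two's complement predicts.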

How to sort the columns of a matrix in order of some other vector in MATLAB?

Say I have a vector A of item IDs:
A=[50936
332680
107430
167940
185820
99732
198490
201250
27626
69375];
And I have a matrix B whose rows contain the values of 8 parameters for each of the items in vector A:
B=[0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 1 1
1 0 1 0 0 1 0 1 1 1
0 0 1 0 0 0 0 1 0 1
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1];
So, column 1 in matrix B represents data of item in row 1 of vector A, column 2 in matrix B represents data of item in row 2 of vector A, and so on. However, I want matrix B to contain the information in a different order of items stored in vector A2:
A2=[185820
198490
69375
167940
99732
332680
27626
107430
50936
201250];
How do I sort them, so that column 1 of matrix B contains data for item in row 1 of vector A2, column 2 of matrix B contains data for item in row 2 of vector A2, and so on?
My extremely crude solution to do this is the following:
A=A'; A2=A2';
for i=1:size(A,2)
A(2:size(B,1)+1,i)=B(:,i);
end
A2(2:size(B,1)+1,:)=zeros(size(B,1),size(B,2));
for i=1:size(A2,2)
for j=1:size(A,2)
if A2(1,i)==A(1,j)
A2(2:end,i)=A(2:end,j);
end
end
end
B2 = A2(2:end,:);
But I would like to know a cleaner, more elegant and less time consuming method to do this.
A possible solution
You can use second output of ismember function.
[~ ,idx] = ismember(A2,A);
B2 = B(:,idx);
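A small worked example may make the indexing clearer. The IDs are a shortened subset of those in the question, the values in B are made up, and the check on the first output of ismember is an added safeguard rather than part of the original answer.

```matlab
A  = [50936; 332680; 107430];     % IDs, one per column of B
A2 = [107430; 50936; 332680];     % desired order
B  = [1 2 3;
      4 5 6];                     % made-up data

[found, idx] = ismember(A2, A);   % idx(k) is the position of A2(k) in A
assert(all(found), 'Some IDs in A2 are missing from A');
B2 = B(:, idx);                   % columns of B reordered to follow A2
```

With this data idx is [3; 1; 2] and B2 is [3 1 2; 6 4 5].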
Update: I tested both my solution and the one proposed by hbaderts:
disp('-----ISMEMBER:-------')
tic
[~,idx]=ismember(A2,A);
toc
disp('-----SORT:-----------')
tic
[~,idx1] = sort(A);
[~,idx2] = sort(A2);
map = zeros(size(idx2));
map(idx2) = idx1;
toc
Here is the result in Octave:
-----ISMEMBER:-------
Elapsed time is 0.00157714 seconds.
-----SORT:-----------
Elapsed time is 4.41074e-05 seconds.
Conclusion: in this test the sort method is faster!
As both A and A2 contain the exact same elements, just sorted differently, we can create a mapping from the A-sorting to the A2-sorting. For that, we run the sort function on both and save indexes (which are the second output).
[~,idx1] = sort(A);
[~,idx2] = sort(A2);
Now, the first element in idx1 corresponds to the first element in idx2, so A(idx1(1)) is the same as A2(idx2(1)) (which is 27626). To create a mapping idx1 -> idx2, we use matrix indexing as follows
map = zeros(size(idx2));
map(idx2) = idx1;
To sort B accordingly, all we need to do is
B2 = B(:, map);
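To see the mapping at work, here is a minimal sketch on made-up data. It assumes, as stated above, that A and A2 hold the same elements and that the elements are unique.

```matlab
A  = [30; 10; 20];
A2 = [20; 30; 10];
B  = [1 2 3;
      4 5 6];              % column k belongs to A(k)

[~, idx1] = sort(A);       % idx1 = [2; 3; 1]
[~, idx2] = sort(A2);      % idx2 = [3; 1; 2]

map = zeros(size(idx2));
map(idx2) = idx1;          % map(k) is the position of A2(k) in A
B2 = B(:, map);            % columns reordered to follow A2
```

Here map is [3; 1; 2], so A(map) reproduces A2 and B2 is [3 1 2; 6 4 5].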
[A2, sort_order] = sort(A);
B2 = B(:, sort_order)
MATLAB's sort function returns the order in which the items in A are sorted. You can use this to order the columns in B.
Transpose B so you can concatenate it with A:
C = [A B']
Now you have
C = [ 50936 0 0 1 1 0 0 0 0;
332680 0 0 0 0 0 0 0 0;
107430 0 0 1 1 1 0 0 0;
167940 0 0 0 0 0 0 0 0;
185820 0 0 0 0 0 0 0 0;
99732 0 0 1 1 0 0 0 0;
198490 0 0 0 0 0 0 0 0;
201250 0 0 1 1 1 1 0 0;
27626 0 0 1 1 0 0 0 0;
69375 0 0 1 1 1 0 0 1];
You can now sort the rows of the matrix however you want. For example, to sort by ID in ascending order, use sortrows:
C = sortrows(C)
To just swap rows around, use a permutation of 1:length(A):
C = C(perm, :)
where perm could be something like [4 5 6 3 2 1 8 7 9 10].
This way, your information is all contained in one structure and the data is always correctly matched to the proper ID.

Ismember, multiple times

I have an array of letters
mystring = 'abcdefghijklmnopqrstuvwxyz';
and I have the word Elephant. I want to know how many times the letters appear in Elephant. I have tried ismember and it gives me if they appear but not how many times. How can I get the number of times a letter occurs in a word?
You could use histcounts:
mystring = 'bcdfgijkmoqrsuvwxyzelphant';
myword = 'elephant';
[sortstring, idx] = sort(mystring); % Bin edges for histcounts need to be increasing
N = histcounts(double(myword), [double(sortstring) 257]); % Add 257 to the array so we capture the last character in a bin
N(idx) = N; % Undo the sort
Which returns:
N =
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 1 1 1 1
Note that due to the conversion to ASCII this method is case sensitive. You can adjust for this using lower or upper, if necessary.
mystring = char(['A':'Z','a':'z']);
Alphabet = zeros(numel(mystring),1);
for ii = 1:numel(mystring)
Alphabet(ii,1) = sum(ismember('Elephant',mystring(ii)));
end
ismember returns a logical vector marking which characters of the word match the current letter of the alphabet, as dictated by the loop. Summing that vector gives the number of occurrences of the letter, which is stored in Alphabet, where each entry corresponds to the letter at that position in mystring.
I used the method of creating the alphabet as per @Daniel's comment; capitals now work.
Example, test for William Shakespeare:
Alphabet.'
ans =
Columns 1 through 15
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Columns 16 through 30
0 0 0 1 0 0 0 1 0 0 0 3 0 0 0
Columns 31 through 45
3 0 0 1 2 0 1 2 1 0 0 1 0 1 1
Columns 46 through 52
0 0 0 0 0 0 0
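The loop above can also be written as a single comparison. This is a sketch assuming a case-insensitive count is wanted (hence lower); matches and counts are illustrative names, and bsxfun is used so that it also runs on releases without implicit expansion.

```matlab
mystring = 'abcdefghijklmnopqrstuvwxyz';
myword   = lower('Elephant');

% Rows follow the letters of the word, columns follow the alphabet
matches = bsxfun(@eq, myword(:), mystring);   % 8x26 logical
counts  = sum(matches, 1);                    % occurrences of each letter
```

counts(5) is 2 for 'e', and the entries for 'a', 'h', 'l', 'n', 'p' and 't' are 1.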

Measure how spread out the data in an array is

I have an array of zeros and ones and I need to know if the data is spread out across the columns or concentrated in clumps.
For example:
If I have array x and it has these values:
Column 1 values: 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
Column 2 values: 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1
if we counted the number of ones we can know that it is the same number but the ones are more well spread out and distributed in column 2 compared with column 1.
I am trying to make a score that gives me a high value if the spreading is good and low value if the spreading is bad... any ideas??
Sample of Data:
1 0 0 0 5 0 -2 -3 0 0 1
1 0 0 0 0 0 0 0 0 0 1
2 0 0 0 0 0 0 3 -3 1 0
1 2 3 0 5 0 2 13 4 5 1
1 0 0 0 0 0 -4 34 0 0 1
I think what you're trying to measure is how much the gaps between consecutive 1s vary, here via their standard deviation, i.e.:
f = @(x)std(diff(find(x)))
So for your data:
a = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1]
b = [1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1]
f(a)
= 8.0498
f(b)
= 2.0736
That said, I think you're essentially trying to measure the disorder of the system, which is what I imagine entropy measures, but I don't know how to compute it here.
Note that this gives a low value if the "spreading" is good and a high value if it is bad (i.e. the opposite of your request).
Also if you want it per column then it becomes a little more complicated:
f = @(x)arrayfun(@(y)std(diff(find(x(:,y)))), 1:size(x,2))
data = [a', b'];
f(data)
WARNING: This method does not take leading and trailing 0s into account. I don't know if that's a problem or not, but f([0; 0; 0; 1; 1; 1; 0; 0; 0]) returns 0 whereas f([1; 0; 0; 1; 0; 1; 0; 0; 0]) returns a positive value, indicating (incorrectly) that the first case is more evenly distributed. One possible fix might be to prepend and append a row of ones to the matrix...
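The padding idea can be tried out as below. This is only a sketch of that suggestion for a single vector; fPadded is a hypothetical name, and padding slightly distorts the score when the data really does start or end with a 1.

```matlab
% Hypothetical padded variant: treat both ends of the vector as if they held a 1
fPadded = @(x) std(diff(find([1; x(:); 1])));

clumped = [0 0 0 1 1 1 0 0 0];
spread  = [1 0 0 1 0 1 0 0 0];

sClumped = fPadded(clumped);   % gaps are 4 1 1 4
sSpread  = fPadded(spread);    % gaps are 1 3 2 4
```

With the padding, the clumped vector scores about 1.73 and the spread one about 1.29, so the ordering now matches intuition (lower means more evenly spread).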
I think you would need an interval to find the "spreadness" locally, otherwise the sample 1 (which is named as Column 1 in the question) would appear as spread too between the 2nd and 3rd ones.
So, following that theory and assuming input_array to be the input array, you can try this approach -
intv = 10; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc(diff_loc<=intv)) %// desired output/score
For sample 1, spread_factor gives 4 and for sample 2 it is 23.
Another option is to count only the gaps that are large enough, i.e. to require that the distance between consecutive ones be greater than or equal to the interval. That leads to code like this -
intv = 3; %// Interval
diff_loc = diff(find(input_array))
spread_factor = sum(diff_loc>=intv)
With this new approach - For sample 1, spread_factor is 1 and for sample 2 it is 5.
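Both interval-based scores can be checked against the two samples from the question. The helper names gaps, scoreWithin and scoreAtLeast are illustrative and not part of the original answer.

```matlab
sample1 = [1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1];
sample2 = [1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 1];

gaps = @(x) diff(find(x));   % distances between consecutive ones

% First approach: sum of the gaps that are no larger than the interval
scoreWithin  = @(x, intv) sum(gaps(x) .* (gaps(x) <= intv));
% Second approach: number of gaps that are at least the interval
scoreAtLeast = @(x, intv) sum(gaps(x) >= intv);

w1 = scoreWithin(sample1, 10);   w2 = scoreWithin(sample2, 10);
n1 = scoreAtLeast(sample1, 3);   n2 = scoreAtLeast(sample2, 3);
```

This reproduces the numbers quoted above: 4 and 23 for the first approach, 1 and 5 for the second.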

Using find function on columns and rows in matlab

I am having some problems with the find function in MATLAB. I have a matrix consisting of zeros and ones (representing the geometry of a structural element), where material is present when the matrix element = 1, and where no material is present when the matrix element = 0. The matrix may have the general form shown below (it will update as the geometry is changed, but that isn't too important).
Geometry = [0 0 0 0 0 0 0 0 0 0;
0 0 1 0 1 0 1 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 1 0 0;
0 0 0 0 0 0 0 0 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 1 1 1 0 1 0 0;
0 0 0 0 0 0 0 0 0 0;]
I'm trying to find the rows and columns that are not continuously connected (i.e. where the row and columns are not all equal to 1 between the outer extents of the row or column) and then update them so they are all connected. I.e. the matrix above becomes:
Geometry = [0 0 0 0 0 0 0 0 0 0;
0 0 1 1 1 1 1 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 0 0 0 0 1 0 0;
0 0 1 1 1 1 1 1 0 0;
0 0 0 0 0 0 0 0 0 0;]
The problem I am having is I want to be able to find the indices of the first and last element that is equal to 1 in each row (and column), which will then be used to update the geometry matrix.
Ideally, I want to represent these in vectors, so going across the columns, find the row number of the first element equal to 1 and store this in a vector called rowfirst.
I.e.:
rowfirst = zeros(1,numcols)
for i = 1:numcols % Going across the columns
rowfirst(i) = find(Geometry(i,1) == 1, 1,'first')
% Store values in vector called rowfirst
end
and then repeat this for the columns and to find the last elements in each row.
For some reason, I can't get the values to store properly in the vector, does anyone have an idea of where I'm going wrong?
Thanks in advance. Please let me know if that isn't clear, as I may not have explained the problem very well.
0) bwmorph(Geometry,'close') does it all in one line. If the holes may be bigger, try bwmorph(Geometry,'close',Inf).
Regarding your attempt:
1) It should be Geometry(i,:) instead of Geometry(i,1).
2) Your real problem here is empty matrices. Actually, what do you want rowfirst(i) to be if there are no 1s in the i'th row?
Ok, I can spot two mistakes:
You should use an array as the first argument of find. So, if you want to find the row number of the first element of each column, then you should use find(Geometry(:, i), 1, 'first').
find returns an empty array if the column contains only zeros. You should handle this case and decide what number you want to put into rowfirst (e.g. you can put -1 to indicate that the corresponding column contains no non-zero elements).
Following the above, you can try this:
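rowfirst = -ones(1, numcols); % -1 marks columns without any non-zero element
for i = 1:numcols
tmp = find(Geometry(:, i), 1, 'first'); % row of the first non-zero in column i
if ~isempty(tmp)
rowfirst(i) = tmp;
end
end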
I'm pretty sure there's a more efficient way of doing this, but if you replace your call to find with this, it should work ok:
find(Geometry(i,:), 1,'first')
(otherwise you're just looking at the first cell of the ith row. And the == 1 is useless, since find already returns only non-zero elements, and your matrix is binary)
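One loop-free possibility, offered as a sketch rather than as the method from any of the answers, relies on max returning the index of the first maximum in each column. The matrix G and the names hasOne, rowfirst and rowlast are illustrative, and -1 is an assumed marker for columns without any 1.

```matlab
G = [0 0 0;
     1 0 0;
     0 0 1;
     1 0 1];                 % small made-up geometry
G = double(G ~= 0);

% The first maximum of a 0/1 column is its first 1 (or row 1 if all zero)
[hasOne, rowfirst] = max(G, [], 1);
rowfirst(hasOne == 0) = -1;

% Flip the rows to find the last 1, then convert back to the original index
[~, fromBottom] = max(flipud(G), [], 1);
rowlast = size(G, 1) + 1 - fromBottom;
rowlast(hasOne == 0) = -1;
```

For this matrix rowfirst is [2 -1 3] and rowlast is [4 -1 4]. Applying the same code to G.' gives the first and last column index for each row.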
Use the accumarray function to find the min and max col (row) number.
Imagine finding the last (first) row in each column that contains a NaN.
a = [1 nan nan nan ;
2 2 3 4;
3 nan 3 3;
4 nan 4 4]
This code gets the row indices for the last NaN in each column.
[row,col] = find(isnan(a))
accumarray(col,row,[],@max)
This code gets the row indices for the first NaN in each column.
[row,col] = find(isnan(a))
accumarray(col,row,[],@min)
Swap the row and col variables to scan row-wise instead of column-wise.
This answer inspired by Finding value and index of min value in a matrix, grouped by column values