How to convert char to number in Matlab - matlab

I am having trouble converting a character variable to a number in Matlab.
Each cell in the char variable contains one of two possible words. I need to convert word_one (for example) to represent '1', and word_two to represent '2'.
Is there a command that will let me do this?
So far I've tried:
%First I converted 'Word' from cell to char
Word = char(Word);
Word(Word == 'Word_one') = '1';
Word(Word == 'Word_two') = '2';
However, I get the:
Error using ==
Matrix dimensions must agree.
When I try to include the first letter only (ie. 'W'), it only changes the first letter in the full word (ie. 1ord_one).
Is there an easy way to do this?
Thanks for your help - any advice is much appreciated!

Use ismember:
possibleWords = {'Word_one', 'Word_two'}; %// template: words corresponding to 1, 2, ...
words = {'Word_two', 'Word_one', 'Word_two'}; %// data: words you need to convert
[~, result] = ismember(words, possibleWords);
In this example,
result =
2 1 2
If you need more flexibility, you can specify the value corresponding to each word:
possibleWords = {'Word_one', 'Word_two'}; %// template: words corresponding to 1, 2, ...
correspondingValues = [1.1, 2.2]; %// template: value corresponding to each word
words = {'Word_two', 'Word_one', 'Word_two'}; %// data: words you need to convert
[~, ind] = ismember(words, possibleWords);
result = correspondingValues(ind);
which gives
result =
2.2000 1.1000 2.2000
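One thing to watch for with the approach above: ismember returns an index of 0 for any word that is not in the template, and indexing correspondingValues with 0 raises an error. A minimal sketch of one way to guard against that, assuming unmatched words should map to NaN ('Word_three' is a made-up entry for illustration):

```matlab
possibleWords = {'Word_one', 'Word_two'};        % template, as in the answer above
correspondingValues = [1.1, 2.2];                % value for each template word
words = {'Word_two', 'Word_three', 'Word_one'};  % 'Word_three' is a hypothetical unmatched word

[found, ind] = ismember(words, possibleWords);   % found is false and ind is 0 where there is no match
result = NaN(size(words));                       % assumption: unmatched words become NaN
result(found) = correspondingValues(ind(found)); % only index with the valid positions
```

Whether NaN is the right placeholder depends on what you do with the result afterwards.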

Looks like there are a couple of potential issues here.
Use strcmp() (string compare) in place of your current equivalence statement. Comparing strings using == compares element by element and returns a logical vector (where here you want a single logical value). String comparison, strcmp(), will compare the entire strings instead and return a single value.
It's also probably not necessary for you to convert your cell array. You can maintain the cell array structure and address each cell individually.
Try something along the lines of the following snippet.
for i = 1:length(Word)
if strcmp(Word{i},'Word_one')
Word{i} = '1';
elseif strcmp(Word{i},'Word_two')
Word{i} = '2';
end
end
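To see the difference between == and strcmp described above, here is a small sketch. The variable names a, b, elementwise, whole and isOne are only for illustration:

```matlab
a = 'Word_one';
b = 'Word_two';

elementwise = (a == b);   % 1x8 logical: one comparison per character
whole = strcmp(a, b);     % a single logical value for the whole string

% strcmp also accepts a cell array and compares every cell against the string
Word = {'Word_two', 'Word_one', 'Word_two'};
isOne = strcmp(Word, 'Word_one');   % logical mask, true where the cell matches
```

Note that == only works here because both strings happen to have the same length; with different lengths it raises the dimension error from the question.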

There are a number of ways to solve this problem. Here's my approach.
% define your words
words = {'word_one','word_two','word_two','word_one','word_one'};
% define a function to get the indexes of the words of interest
getindex = @(c, y) cellfun(@(x) strcmp(x,y), c);
% replace 'word_one' with '1'
words(getindex(words, 'word_one'))={'1'};
% replace 'word_two' with '2'
words(getindex(words, 'word_two'))={'2'};
words =
'1' '2' '2' '1' '1'
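The result above is still a cell array of characters. If actual numbers are needed, as the question title suggests, one possible extra step (a sketch, not part of the approach above) is str2double, which accepts a cell array of strings:

```matlab
words = {'1', '2', '2', '1', '1'};   % the cell array produced above
nums = str2double(words);            % numeric row vector, same size as words
```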

You can use short n simple unique -
input_cellarr = {'Word_two','Word_one','Word_two','Word_two','Word_one','Word_one'}
[~,~,out] = unique(input_cellarr)
Sample run -
input_cellarr =
'Word_two' 'Word_one' 'Word_two' 'Word_two' 'Word_one' 'Word_one'
out =
2
1
2
2
1
1
Explanation: unique returns its results sorted in ascending order, and for a cell array of strings that means alphabetical order. Thus unique(input_cellarr) always returns {'Word_one', 'Word_two'}, because 'Word_one' sorts before 'Word_two'. The third output, out, holds the position of each input element in that sorted list, so 'Word_one' always gets ID 1 and 'Word_two' always gets ID 2.
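A short sketch of that mapping, keeping the first output of unique as well so the sorted labels are visible (labels and recovered are names chosen here for illustration):

```matlab
input_cellarr = {'Word_two', 'Word_one', 'Word_two'};

[labels, ~, out] = unique(input_cellarr);  % labels is the sorted list of distinct words
recovered = labels(out);                   % indexing labels with out rebuilds the input
```

Keep in mind that the IDs follow alphabetical order, so this only gives the numbering you want when that order matches your intended 1, 2, ... assignment.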

Related

How to perform XOR in a recursive scenario

I have a 1x5 char matrix. I need to perform a bitwise XOR operation on all the elements in the matrix. If T is the char matrix, I need a matrix T' such that
T'(i) = T(i) XOR T'(i-1) for i > 1
T'(1) = T(1)
Let the char matrix be T
T=['0000000000110111' '0000000001000001' '0000000001001010' '0000000010111000' '0000000000101111']
T'=['0000000000110111' '0000000001110110' '0000000000111100' '0000000010000100' '0000000010101011']
That is, leaving the first element as it is, I need to XOR each of the other elements with the previous element of the newly formed matrix. I tried the following code, but I'm unable to get the correct result.
Yxor1d = [T(1) cellfun(@(a,b) char((a ~= b) + '0'), T(2:end), T'(1:end-1), 'UniformOutput', false)]
I need to perform the XOR operation such that , for obtaining the elements of T'
T' (2)= T(2) XOR T' (1)
T' (3)= T(3) XOR T' (2)
It would be really helpful to know where I went wrong. Thanks.
cellfun expects a cell array as input, but you are passing a character array. Writing those 5 strings inside square brackets concatenates them into one long character array.
You probably don't want that. To fix this, all you have to do is make T a cell array by using braces ({}) instead of square brackets ([]) to declare your strings:
T={'0000000000110111' '0000000001000001' '0000000001001010' '0000000010111000' '0000000000101111'};
Because you have edited your post after I provided my answer, my previous answer using cellfun is now incorrect. Because you are using a recurrence relation where you are referring to the previous output rather than input, you can no longer use cellfun. You'll need to use a for loop. There are probably more elegant ways to do it, but this is the easiest if you want to get something working.
As such, initialize an output cell array that is the same size as the input cell array like above, then you'll need to initialize the first cell to be the first cell of the input, then iterate through each pair of input and output elements yourself.
So do something like this:
Yxor1d = cell(1,numel(T));
Yxor1d{1} = T{1};
for idx = 2 : numel(T)
Yxor1d{idx} = char(double(T{idx} ~= Yxor1d{idx-1}) + '0');
end
For each index i, we XOR the current input T{i} with the previous output T'{i-1}.
Using the above with your input cell array T, we get:
Yxor1d =
Columns 1 through 3
'0000000000110111' '0000000001110110' '0000000000111100'
Columns 4 through 5
'0000000010000100' '0000000010101011'
This matches with your specifications in your modified post.
Edit: There is a solution without a loop:
T=['0000000000110111';'0000000001000001';'0000000001001010';'0000000010111000' ;'0000000000101111'];
Yxor = dec2bin(bi2de(mod(cumsum(de2bi(bin2dec(T))),2)),16)
Yxor =
0000000000110111
0000000001110110
0000000000111100
0000000010000100
0000000010101011
This uses the fact that you effectively want a cumulative xor operation on the elements of your array.
A chained XOR over N booleans is true exactly when an odd number of them are true. So if you take the cumulative sum of each of your bit columns, the running XOR is 1 wherever that sum is odd.
The one liner above can be decomposed like that:
Y = bin2dec(T) ; %// convert char array T into decimal numbers
Y = de2bi( Y ) ; %// convert decimal array Y into an array of bits
Y = cumsum(Y) ; %// do the cumulative sum on each bit column
Y = mod(Y,2) ; %// convert all "even" numbers to '0', and 'odd' numbers to '1'
Y = bi2de(Y) ; %// re-assemble the bits into decimal numbers
Yxor = dec2bin(Y,16) ; %// get their string representation
Note that if you are happy to handle arrays of bits (boolean) instead of character arrays, you can shave off a few lines from above ;-)
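As a rough sketch of that shortcut: the same parity idea can be applied to the character matrix directly, skipping the decimal conversion. This is an alternative formulation, not the code from the answer above, and the names bits and parity are chosen here for illustration:

```matlab
T = ['0000000000110111'; '0000000001000001'; '0000000001001010'; ...
     '0000000010111000'; '0000000000101111'];

bits = T - '0';                     % 5x16 numeric matrix of 0s and 1s
parity = mod(cumsum(bits, 1), 2);   % running count of ones down each column, reduced mod 2
Yxor = char(parity + '0');          % back to a 5x16 char matrix
```

For this input it gives the same five rows as shown above.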
Initial answer (simpler to grasp, but with a loop):
You can use the bitxor function, but you have to convert your char array in numeric value first:
T=['0000000000110111';'0000000001000001';'0000000001001010' ;'0000000010111000' ;'0000000000101111'];
Tbin = bin2dec(T) ; %// convert to numeric values
Ybin = Tbin ; %// pre-assign result, then loop ...
for idx = 2 : numel(Tbin)
Ybin(idx) = bitxor( Ybin(idx) , Ybin(idx-1) ) ;
end
Ychar = dec2bin(Ybin,16) %// convert back to 16bit char array representation if necessary
Ychar =
0000000000110111
0000000001110110
0000000000111100
0000000010000100
0000000010101011
edited answer after you redefined your problem

Huffman encoding using cell array

I have some problems in final part of Huffman encoding.
Currently I have my coding table in cell array
code =
{
[1,1] = 000
[1,2] = 001
[1,3] = 010
[1,4] = 011
[1,5] = 100
...
}
Where second index represent ascii character in my other cell array
huffman_tree =
{
[1,1] = A
[1,2] = B
[1,3] = C
[1,4] = D
[1,5] = E
...
}
I'm using following code for encoding input to output:
output= [];
for i=1:length(input)
x = findInArray(huffman_tree, input(i));
output= [output code(x)];
end
function [index] = findInArray(array, searched)
index = -1;
for i=1:length(array)
if array{i} == searched
index = i;
end
end
end
At this point my code is O(n^2) or even worse. I'm having problem with large input where
length(input) = 1000000
There must be some faster way to transform input with my coding table to output.
Because you're using cell arrays, that's going to be inherently slow so you have no choice but to iterate over each cell. However, I can provide some suggestions to help speed things up. What you can do is use strcmp to compare strings. I'm assuming that each character in your cell array is represented as a one character string. strcmp has the ability to take an individual string and compare itself to a cell array of strings. The output will be an array that is the same size as the cell array of strings and give you a logical true if the input string matches a position in the cell array and false otherwise.
Because your Huffman dictionary will contain a unique set of characters, you will only get one possible match per character. Therefore, we can use this logical array output to index the codebook directly to retrieve the corresponding code you want. Logical indexing works by supplying a logical vector that is the same length as the vector of interest and it retrieves those values whose corresponding positions are true. Therefore, if we only had one true value in the logical vector with the rest of the elements being false, this means that we would get just the corresponding element we desire and nothing else.
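A small sketch of that logical-indexing step, using the first five entries from the question's tables (mask and selected are names chosen here for illustration):

```matlab
huffman_tree = {'A', 'B', 'C', 'D', 'E'};       % symbols, as in the question
code = {'000', '001', '010', '011', '100'};     % matching codes

mask = strcmp('C', huffman_tree);   % logical vector, true only at the position of 'C'
selected = code(mask);              % 1x1 cell holding the code for 'C'
```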
Therefore, we can change your code to do this. Note that I've changed the loop counter i to idx because it has actually been shown that using i as a loop counter slows down your code by a slight amount. See this post by Shai Bagon for more details: Using i and j as variables in Matlab. Also, I've changed the length call to numel... mainly because I don't like using length.... just a personal choice though.
output = [];
for idx = 1:numel(input)
output = [output code(strcmp(input(idx), huffman_tree))];
end
Give the above a whirl and see if it performs any faster. For one thing, this avoids the additional for loop that searches for a match, since strcmp is very efficiently implemented. Each lookup still scans the whole dictionary, so the total work remains proportional to the input length times the dictionary size, but with a much smaller constant.
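If the loop is still too slow, one possible direction is to look up all characters at once. The following is only a sketch under two assumptions that the question does not confirm: input is a char vector, and every entry of huffman_tree is a single character. The short input string is made up for illustration:

```matlab
huffman_tree = {'A', 'B', 'C', 'D', 'E'};
code = {'000', '001', '010', '011', '100'};
input = 'ABEDC';                        % hypothetical input (assumed to be a char vector)

symbols = [huffman_tree{:}];            % 'ABCDE': one char per dictionary entry
[~, idx] = ismember(input, symbols);    % dictionary position of every input character
output = code(idx);                     % cell array of codes, one per input character
bitstring = [output{:}];                % concatenated code string
```

This also avoids growing output inside a loop. It will error if the input contains a character that is not in the dictionary, because idx is 0 there.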

Search for a specific digit in an integer

I'm looking for a really quick method in MATLAB of searching for a specific digit within an integer, ideally in a given position. For example:
Simple case...
I want to look through an array of integers and return all those which contain the number 1 eg 1234, 4321, 6515, 847251737 etc
More complex case...
I want to loop through an array of integers and return all those which contain the number 1 in the third digit eg 6218473, 541846, 3115473 BUT 175846 would not be returned.
Any thoughts?
There's a few answers here already, I'll throw my try into the pot.
Conversion to string can be expensive, so if it can be avoided, it should be.
n = 1:100000; % sample numbers
m = 3; % digit to check
x = 1; % number to find
% Length of the numbers in digits
num_length = floor(log10(abs(n)))+1;
% digit (from the left) to check
num_place = num_length-m;
% get the digit
digit_in_place = mod(floor(abs(n)./(10.^num_place)),10);
found_number = n(digit_in_place==x);
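Applied to the numbers from the question, the same arithmetic can be sketched as below. The has_place mask is an addition of mine, not part of the answer above; it excludes numbers with fewer than m digits (the 42 is a made-up entry to show that case):

```matlab
n = [6218473, 541846, 3115473, 175846, 42];   % question's examples plus a short number
m = 3;                                        % digit position to check, counted from the left
x = 1;                                        % digit to look for

num_length = floor(log10(abs(n))) + 1;        % number of digits in each entry
has_place = num_length >= m;                  % assumption: skip numbers too short to have an m-th digit
digit_in_place = mod(floor(abs(n) ./ 10.^(num_length - m)), 10);
found_number = n(has_place & digit_in_place == x);
```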
By casting to strings, the trick to vectorising is just to make sure x is a column vector. x(:) guarantees this. Also you need to left-align the strings which is done with the format specifier '%-d' where - is for left-alignment and d is for integers:
s = num2str(x(:), '%-d');
ind = s(:,3)=='1'
and this also allows you to easily solve your first case:
ind = any(s=='1',2)
in either case to recover your original number just go:
x(ind)
One way of getting there is to cast your numbers as strings and then check if the 3rd position of that string is '1'. It works perfectly fine in a loop, but I am confident that there is also a vectorized solution:
numbers = [6218473, 541846, 3115473, 175846]'
returned_numbers = [];
for i = 1:length(numbers)
number = numbers(i);
y = sprintf('%d', number) %// cast to string
%// add number to list, if its third character is 1
if strcmp(y(3), '1')
returned_numbers = [returned_numbers, number];
end
end
% // it returns:
returned_numbers =
6218473 541846 3115473
Code
%// Input array
array1 = [-94341 1234 4321 6515 847251737 6218473 541846 3115473 175846]
N = numel(array1); %// number of elements in input array
digits_sep = num2str(array1(:))-'0'; %// Separate the digits into a matrix
%// Simple case
output1 = array1(any(digits_sep==1,2))
%// More complex case output
col_num = 3;
%// Get column numbers for each row of the digits matrix and thus
%// the actual linear index corresponding to 3rd digit for each input element
ind1 =sub2ind(size(digits_sep),1:N,...
size(digits_sep,2)-floor(log10(abs(array1))-col_num+1));
%// Select the third digits, check which ones have `1` and use them to logically
%// index into input array to get the output
output2 = array1(digits_sep(ind1)==1)
Code run -
array1 =
-94341 1234 4321 6515 847251737 6218473 541846 3115473 175846
output1 =
-94341 1234 4321 6515 847251737 6218473 541846 3115473 175846
output2 =
6515 6218473 541846 3115473

matlab parse file into cell array

I have a file in the following format in matlab:
user_id_a: (item_1,rating),(item_2,rating),...(item_n,rating)
user_id_b: (item_25,rating),(item_50,rating),...(item_x,rating)
....
....
so each line has values separated by a colon where the value to the left of the colon is a number representing user_id and the values to the right are tuples of item_ids (also numbers) and rating (numbers not floats).
I would like to read this data into a matlab cell array or better yet ultimately convert it into a sparse matrix wherein the user_id represents the row index, and the item_id represents the column index and store the corresponding rating in that array index. (This would work as I know a-priori the number of users and items in my universe so ids cannot be greater than that ).
Any help would be appreciated.
I have thus far tried the textscan function as follows:
c = textscan(f,'%d %s','delimiter',':') %this creates two cells one with all the user_ids
%and another with all the remaining string values.
Now if I try to do something like str2mat(c{2}), it works but it stores the '(' and ')' characters also in the matrix. I would like to store a sparse matrix in the fashion that I described above.
I am fairly new to matlab and would appreciate any help regarding this matter.
f = fopen('data.txt','rt'); %// data file. Open as text ('t')
str = textscan(f,'%s'); %// gives a cell which contains a cell array of strings
str = str{1}; %// cell array of strings
r = str(1:2:end);
r = cellfun(@(s) str2num(s(1:end-1)), r); %// rows; numeric vector
pairs = str(2:2:end);
pairs = regexprep(pairs,'[(,)]',' ');
pairs = cellfun(@(s) str2num(s(1:end-1)), pairs, 'uni', 0);
%// pairs; cell array of numeric vectors
cols = cellfun(@(x) x(1:2:end), pairs, 'uni', 0);
%// columns; cell array of numeric vectors
vals = cellfun(@(x) x(2:2:end), pairs, 'uni', 0);
%// values; cell array of numeric vectors
rows = arrayfun(@(n) repmat(r(n),1,numel(cols{n})), 1:numel(r), 'uni', 0);
%// rows repeated to match cols; cell array of numeric vectors
matrix = sparse([rows{:}], [cols{:}], [vals{:}]);
%// concat rows, cols and vals into vectors and use as inputs to sparse
For the example file
1: (1,3),(2,4),(3,5)
10: (1,1),(2,2)
this gives the following sparse matrix:
matrix =
(1,1) 3
(10,1) 1
(1,2) 4
(10,2) 2
(1,3) 5
I think newer versions of Matlab have a strsplit function that makes this approach overkill, but the following works, if not quickly. It splits the file into user IDs and "other stuff" as you show, initializes a large empty matrix, and then iterates through the other stuff, breaking it apart and placing each rating in the correct place in the matrix.
(I didn't see the previous answer when I opened this for some reason. It is more sophisticated than this one, though this may be a little easier to follow at the expense of speed.) I throw the \s* into the regex in case the spacing is inconsistent, but otherwise don't do much in the way of data sanity checking. The output is the full array, which you can then turn into a sparse array if desired.
% matlab_test.txt:
% 101: (1,42),(2,65),(5,0)
% 102: (25,78),(50,12),(6,143),(2,123)
% 103: (23,6),(56,3)
clear all;
fclose('all');
% your path will vary, of course
file = '<path>/matlab_test.txt';
f = fopen(file);
c = textscan(f,'%d %s','delimiter',':');
celldisp(c)
uids = c{1}
tuples = c{2}
% These are stated as known
num_users = 3;
num_items = 40;
desired_array = zeros(num_users, num_items);
expression = '\((\d+)\s*,\s*(\d+)\)'
% Assuming length(tuples) == num_users for simplicity
for k = 1:num_users
uid = uids(k)
tokens = regexp(tuples{k}, expression, 'tokens');
for l = 1:length(tokens)
item_id = str2num(tokens{l}{1})
rating = str2num(tokens{l}{2})
desired_array(uid, item_id) = rating;
end
end
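If the sparse matrix is the real goal, the same regexp idea can collect (row, column, value) triplets and hand them to sparse in one call. This is a sketch only: textLines is a made-up stand-in for lines read from the file, using the example data from the previous answer:

```matlab
textLines = {'1: (1,3),(2,4),(3,5)', '10: (1,1),(2,2)'};   % hypothetical stand-in for the file contents

rows = []; cols = []; vals = [];
for k = 1:numel(textLines)
    % split "uid: rest" into the user id and the tuple list
    parts = regexp(textLines{k}, '^\s*(\d+)\s*:(.*)$', 'tokens', 'once');
    uid = str2double(parts{1});
    % each match yields a {item, rating} pair of strings
    tok = regexp(parts{2}, '\((\d+)\s*,\s*(\d+)\)', 'tokens');
    nums = str2double([tok{:}]);                 % [item rating item rating ...]
    cols = [cols, nums(1:2:end)];
    vals = [vals, nums(2:2:end)];
    rows = [rows, repmat(uid, 1, numel(tok))];   % repeat the user id once per tuple
end
matrix = sparse(rows, cols, vals);
```

Since the number of users and items is known in advance, sparse(rows, cols, vals, num_users, num_items) would fix the matrix size explicitly.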

Regarding storage issue for mixed-type value matrix

There is a loop in my program, and during each iteration an ID is generated. I want to store these IDs in a two-dimensional array, A. The first column of A stores the iteration number, i.e., A(1,1) = 1 and A(2,1) = 2. The second column of A stores the ID generated during each iteration, i.e., A(1,2) stores the ID generated during the first iteration. The tricky part is that these IDs can be either a numerical value or a string. For instance, A(1,2) = 12345; A(2,2) = abcde
Which kind of data structure should I use to store this mixed-value matrix?
You have two good options, a cell array or an array of structures.
To use a cell array you need to use braces:
A{1,1} = 1;
A{2,1} = 2;
A{1,2} = 12345;
A{2,2} = 'abcd';
You cannot use most vectorized code with cell arrays, although you can convert numeric subsets to numeric arrays, for example:
col1 = cell2mat(A(:,1));
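For the mixed second column, cell2mat will not work directly. One possible way to split it, sketched here with names of my own choosing (isNum, numericIDs, textIDs), is to test each cell's type with cellfun first:

```matlab
A = cell(2, 2);
A{1,1} = 1;      A{1,2} = 12345;
A{2,1} = 2;      A{2,2} = 'abcd';

isNum = cellfun(@isnumeric, A(:,2));    % logical column: true where the ID is numeric
numericIDs = cell2mat(A(isNum, 2));     % numeric IDs as a regular array
textIDs = A(~isNum, 2);                 % string IDs stay in a cell array
```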
To use an array of structures, you need to define fields. This has the advantage that you can name your columns of data.
A(1).iteration = 1;
A(2).iteration = 2;
A(1).result = 12345;
A(2).result = 'abcd';
To access a single row of data, use A(1), like this
>> A(1)
ans =
iteration: 1
result: 12345
To access a column of data, use brackets or braces
>> [A.iteration] %This returns a numeric array, or an error if not possible
ans =
1 2
>> {A.result} %This returns a cell array, as discussed above.
ans =
[12345] 'abcd'
Which option you use depends on the nature of your task and what method is more suitable to your style. I usually start with a cell array, and eventually convert to an array of structs to take advantage of the named fields.
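Since the IDs are produced in a loop, here is a rough sketch of filling a struct array that way. The ID rule inside the loop is entirely made up for illustration (odd iterations produce a number, even ones a string); substitute your own generator:

```matlab
numIter = 3;
% preallocate a 1x3 struct array with empty fields
A = struct('iteration', cell(1, numIter), 'result', []);

for k = 1:numIter
    A(k).iteration = k;
    if mod(k, 2) == 1
        A(k).result = 1000 + k;            % hypothetical numeric ID
    else
        A(k).result = sprintf('id_%d', k); % hypothetical string ID
    end
end

iterations = [A.iteration];                % numeric row vector of iteration numbers
```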