Number to letter swapping in MATLAB

I have a vector, for example, V = [ 1, 2, 3, 4 ]. Is there a way to change this to the letters, [ a,b,c,d ]?

Using 'a' directly instead of ASCII codes might be slightly more readable:
charString = char(V-1+'a');
Uppercase is then obtained with:
charString = char(V-1+'A');
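A minimal sketch of a sanity check, assuming V should only ever hold whole numbers from 1 to 26. char does not validate its input, so out-of-range values silently become other characters (27 turns into '{', ASCII 123). An explicit assert before converting catches that:

```matlab
V = [1 2 3 4];
% char() does not check the range: 27 would silently become '{' (ASCII 123),
% so reject anything that is not a whole number from 1 to 26 first.
assert(all(V >= 1 & V <= 26 & V == fix(V)), 'V must contain integers from 1 to 26');
charString = char(V - 1 + 'a');    % 'abcd'
upperString = char(V - 1 + 'A');   % 'ABCD'
```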

There are two simple ways to do this. One is simple indexing:
C = 'abcdefghijklmnopqrstuvwxyz';
V = [8 5 12 12 15 23 15 18 12 4];
C(V)
ans =
helloworld
Of course, char will do it too. The char answer is better because it does not require you to store a list of letters to index into.
char('a' + V - 1)
ans =
helloworld
This works well because when you add 'a' to a number, MATLAB converts 'a' to its ASCII code on the fly: +'a' yields 97, the ASCII code of 'a'.
A nice thing is that it also works for 'A', so if you want capitals, just add 'A' instead:
char('A' + V - 1)
ans =
HELLOWORLD
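A small check that the indexing and char approaches agree for every letter, and that the colon operator on characters ('a':'z') builds the same letter list without typing it out. The variable names are just for illustration:

```matlab
V = 1:26;
C = 'abcdefghijklmnopqrstuvwxyz';
viaIndex = C(V);               % indexing into a stored list of letters
viaChar = char('a' + V - 1);   % arithmetic on ASCII codes
assert(isequal(viaIndex, viaChar));
% The colon operator works on characters too, so the list need not be typed out
assert(isequal(viaChar, 'a':'z'));
assert(isequal(char('A' + V - 1), 'A':'Z'));
```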
You can find more information about working with strings in MATLAB from these commands:
help strings
doc strings

Something like
C = char(V + (97-1))
should work (97 is the ASCII code for 'a', and it looks like you want 1 to map to 'a'). No ones(size(V)) term is needed, since adding a scalar to a vector already applies it to every element.

Using the CHAR function, which turns a number (i.e. an ASCII code) into a character:
charString = char(V+96);
EDIT: To go backwards (mapping 'a' to 1, 'b' to 2, etc.), use the DOUBLE function to recast the character back to its ASCII code number:
V = double(charString)-96;
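A round-trip sketch using the helloworld vector from the answer above. The double call is optional: arithmetic on a char array already returns a double array, so charString - 96 gives the same numbers.

```matlab
V = [8 5 12 12 15 23 15 18 12 4];
charString = char(V + 96);       % 'helloworld'
V2 = double(charString) - 96;    % explicit cast, as in the answer
V3 = charString - 96;            % arithmetic on a char array also returns double
assert(isequal(V2, V) && isequal(V3, V));
```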


How to count the appearances of each letter (A-Z, a-z, as well as '.', ',' and ' ') in a text file in MATLAB?

How can I go about doing this? So far I've opened the file like this:
fileID = fopen('hamlet.txt','r');
[A,count] = fscanf(fileID, '%s');
fclose(fileID);
Getting spaces from the file
First, if you want to capture spaces, you'll need to change your format specifier. %s reads only non-whitespace characters.
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%s');
>> fclose(fileID);
>> A
A = Thistexthasspacesinit.
Instead, we can use %c:
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%c');
>> fclose(fileID);
>> A
A = This text has spaces in it.
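The same difference can be reproduced without a file, since sscanf applies format specifiers to a string the way fscanf does to a file (the sample text is made up). Reading the whole file with fileread, which returns the contents as one char array including spaces and newlines, is another option.

```matlab
txt = 'This text has spaces in it.';
noSpaces = sscanf(txt, '%s');     % %s skips whitespace: 'Thistexthasspacesinit.'
withSpaces = sscanf(txt, '%c');   % %c reads every character, spaces included
```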
Mapping between characters and values (array indices)
We could create a character array that contains all of the target characters to look for:
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];
That would work, but to map the character to a position in the array you'd have to do something like:
>> char_pos = find(search_chars == 'q')
char_pos = 43
You could also use containers.Map, but that seems like overkill.
Instead, let's use the ASCII value of each character. For convenience, we'll use only values 1:126 (0 is NUL and 127 is DEL; we should never encounter either of those). Converting from characters to their ASCII codes is easy:
>> c = 's'
c = s
>> a = uint8(c) % MATLAB actually does this using double(). Seems wasteful to me.
a = 115
>> c2 = char(a)
c2 = s
Note that by doing this, you're also counting characters that are not in your desired list, like ! and *. If that's a problem, use search_chars and decide how you want to map from characters to indices.
Looping solution
The most intuitive way to count each character is a loop. For each character in A, find its ASCII code and increment the counter array at that index.
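char_count = zeros(1, 126);          % one counter per ASCII code 1..126
for current_char = A                 % iterates over the characters of the row vector A
    c = uint8(current_char);         % ASCII code of the current character
    char_count(c) = char_count(c) + 1;
end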
Now you've got an array of counts for each character with ASCII codes from 1 to 126. To find out how many instances of 's' there are, we can just use its ASCII code as an index:
>> char_count(115)
ans = 4
We can even use the character itself as an index:
>> char_count('s')
ans = 4
Vectorized solution
As you can see with that last example, MATLAB's weak typing makes characters and their ASCII codes pretty much equivalent. In fact:
>> 's' == 115
ans = 1
That means that we can use implicit expansion (available since MATLAB R2016b) and == to create a logical 2D array where L(k,a) == 1 if the k-th character of our string A has ASCII code a. Then we can get the count for each ASCII code by summing down each column.
L = (A.' == (1:126));
char_count = sum(L, 1);
A one-liner
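Just for fun, I'll show one more way to do this: histcounts. It is meant to put values into bins, but as we said before, characters can be treated like values. Note that N bins need N+1 edges, so one bin per ASCII code 1 to 126 (bin k covering [k, k+1)) takes the edges 1:127:
char_count = histcounts(uint8(A), 1:127);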
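A minimal consistency check, assuming a short sample string in place of the file contents: the loop, the implicit-expansion version, histcounts with the edges 1:127 (126 bins need 127 edges), and accumarray (a fourth option not used above) should all produce the same 1x126 count vector. It uses double rather than uint8 for the codes, which makes no difference to the counts here.

```matlab
A = 'This text has spaces in it.';   % stand-in for the file contents

% 1) Loop, as in the answer
loopCount = zeros(1, 126);
for current_char = A
    c = double(current_char);
    loopCount(c) = loopCount(c) + 1;
end

% 2) Implicit expansion (R2016b or later): N-by-126 logical, summed down each column
vecCount = sum(A.' == (1:126), 1);

% 3) histcounts: 127 edges define 126 bins, bin k covering [k, k+1)
histCount = histcounts(double(A), 1:127);

% 4) accumarray: add 1 at the index given by each character code
accCount = accumarray(double(A).', 1, [126 1]).';

assert(isequal(loopCount, vecCount, histCount, accCount));
loopCount('s')   % 4
```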
There are dozens of other possibilities, for instance you could use the search_chars array and ismember(), but this should be a good starting point.
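A sketch of the search_chars/ismember() variant, which counts only the characters in the desired list (the sample text is an assumption). ismember returns, for each character of A, its position in search_chars (0 if absent), and accumarray tallies those positions:

```matlab
A = 'Hello, world. Hi!';                          % sample text
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];
% pos(k) is the index of A(k) in search_chars, or 0 if A(k) is not in the list
[isWanted, pos] = ismember(A, search_chars);
% Tally the positions of the wanted characters; '!' is ignored
counts = accumarray(pos(isWanted).', 1, [numel(search_chars) 1]).';
counts(search_chars == 'l')   % 3
```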
With [A,count] = fscanf(fileID, '%s'); you only get an overall count, regardless of which letter it is. You can use regexp here to search for each letter you specify; it returns a cell array in which each cell contains the indices where that letter occurs. In the end you only need to count the indices in each cell to get the count for each letter:
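fileID = fopen('hamlet.txt','r');
A = fscanf(fileID, '%c');   % %c keeps spaces; with %s, ' ' could never match
fclose(fileID);
indexCellArray = regexp(A, {'A','B','C','D', ... % I'm too lazy to add the other letters now ^^
    'a','b','c','d', ...
    ',', '\.', ' '});       % '.' must be escaped, otherwise it matches any character
letterCount = cellfun(@numel, indexCellArray);   % number of matches per pattern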
You might put the counts in a struct with one field name per letter; otherwise you might lose track of which count belongs to which letter.
There may well be an easier solution, since putting all the letters into the regexp call is tedious, but this one works.
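A sketch of the struct idea, assuming a sample string instead of the file and using sum(A == letter) rather than regexp for brevity. Struct field names are case-sensitive, so 'A' and 'a' get separate fields; ',', '.' and ' ' are not valid field names, so those would need something else (a containers.Map, for example).

```matlab
A = 'Hamlet: To be, or not to be.';   % stand-in for the file contents
letters = ['A':'Z', 'a':'z'];
counts = struct();
for k = 1:numel(letters)
    % dynamic field name: counts.('b') is the same as counts.b
    counts.(letters(k)) = sum(A == letters(k));
end
counts.b   % 2
```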

Matlab: Function that returns a string with the first n characters of the alphabet

I'd like to have a function generate(n) that returns the first n lowercase letters of the alphabet concatenated into a string (therefore 1 <= n <= 26).
For example:
generate(3) --> 'abc'
generate(5) --> 'abcde'
generate(9) --> 'abcdefghi'
I'm new to MATLAB and I'd be happy if someone could show me how to approach writing the function. Surely this will involve arithmetic with the ASCII codes of the characters, but I have no idea how to do this or which types MATLAB provides for it.
I would rely on ASCII codes for this. You can convert an integer to a character using char.
So for example if we want an "e", we could look up the ASCII code for "e" (101) and write:
char(101)
'e'
This also works for arrays:
char([101, 102])
'ef'
The nice thing in your case is that in ASCII, the lowercase letters are all the numbers between 97 ("a") and 122 ("z"). Thus the following code takes ASCII "a" (97), creates an array of length n starting at 97, and converts those numbers to a string with char. As an added bonus, the version below ensures that the array can only go up to 122 (ASCII for "z").
function output = generate(n)
output = char(97:min(96 + n, 122));
end
Note: For the upper limit we use 96 + n because if n were 1, then we want 97:97 rather than 97:98 as the second would return "ab". This could be written as 97:(97 + n - 1) but the way I've written it, I've simply pulled the "-1" into the constant.
You could also make this a simple anonymous function.
generate = @(n)char(97:min(96 + n, 122));
generate(3)
'abc'
To write the most portable and robust code, I would probably not want those hard-coded ASCII codes, so I would use something like the following:
output = 'a':char(min('a' + n - 1, 'z'));
...or, you can just generate the entire alphabet and take the part you want:
function str = generate(n)
alphabet = 'a':'z';
str = alphabet(1:n);
end
Note that this will fail with an index out of bounds error for n > 26, so you might want to check for that.
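A quick check that the three versions agree for every valid n, written as anonymous functions (gen1, gen2 and gen3 are illustrative names). Only the indexing version errors for n > 26; the other two clamp at 'z'.

```matlab
gen1 = @(n) char(97:min(96 + n, 122));        % hard-coded ASCII codes
gen2 = @(n) 'a':char(min('a' + n - 1, 'z'));  % no hard-coded codes
alphabet = 'a':'z';
gen3 = @(n) alphabet(1:n);                    % captures alphabet; errors for n > 26
for n = 1:26
    assert(isequal(gen1(n), gen2(n), gen3(n)));
end
gen1(30)   % clamps at 'z', returns the whole alphabet
```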
You can use the char built-in function, which converts an integer value (or array) into a character array.
EDIT
Bug fixed (ref. Suever's comment)
function [str]=generate(n)
a=97;
% str=char(a:a+n)
str=char(a:a+n-1)
Hope this helps.
Qapla'

Replacing letters with numbers in a MATLAB array

I am trying to write a function to mark the results of a test. The answers given by participants are stored in an nx1 cell array. However, these are stored as letters. I am looking for a way to convert these (a-d) into numbers (1-4), i.e. a=1, b=2, so that they can be compared with the answers using logical operations.
What I have so far is:
[num,txt,raw]=xlsread('FolkPhysicsMERGE.xlsx', 'X3:X142');
FolkPhysParAns=txt;
I seem to be able to find how to convert from numbers into letters but not the other way around. I feel like there should be a relatively easy way to do this, any ideas?
If you have a cell array of letters:
>> data = {'a','b','c','A'};
you only need to:
Convert to lower-case with lower, to treat both cases equally;
Convert to a character array with cell2mat;
Subtract (the ASCII code of) 'a' and add 1.
Code:
>> result = cell2mat(lower(data))-'a'+1
result =
1 2 3 1
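Applied to the marking problem, a minimal sketch: FolkPhysParAns and answerKey here are made-up sample data, and it assumes every cell holds exactly one letter (blank or multi-letter answers would need separate handling).

```matlab
FolkPhysParAns = {'a'; 'C'; 'b'; 'd'};   % made-up n-by-1 answers
answerKey = [1; 3; 4; 4];                % made-up correct answers, as numbers
numericAns = cell2mat(lower(FolkPhysParAns)) - 'a' + 1;   % [1; 3; 2; 4]
isCorrect = numericAns == answerKey;     % logical n-by-1
score = sum(isCorrect)                   % 3
```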
More generally, if the possible answers are not consecutive letters, or even not single letters, use ismember:
>> possibleValues = {'s', 'm', 'l', 'xl', 'xxl'};
>> data = {'s', 'm', 'xl', 'l', 'm', 'l', 'aaa'};
>> [~, result] = ismember(data, possibleValues)
result =
1 2 4 3 2 3 0
Thought I might as well write an answer...
You can use strrep to replace 'a' with '1' (note that the replacement is a string, not a number), do the same for every letter you need, and then convert the resulting strings '1', '2', ... to the numbers 1, 2, ... with str2num.
Let's say:
t = {'a','b','c'}        % cell array of strings
t = strrep(t,'a','1')    % replace all 'a' with '1'
t = strrep(t,'b','2')    % replace all 'b' with '2'
t = strrep(t,'c','3')    % replace all 'c' with '3'
% Or in one line. strrep pairs cell-array arguments element by element,
% so this only works because the data happen to be in the order a, b, c:
t = strrep({'a','b','c'},{'a','b','c'},{'1','2','3'})
t =
'1' '2' '3'
output = cellfun(@str2num,t,'UniformOutput',false)   % keeps the cell structure
output =
[1] [2] [3]
alternatively:
output = str2num(cell2mat(t'))   % uses a char matrix instead; NOTE the transpose ', it is crucial here (and every string must have the same length)
output =
1
2
3
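If the strings only need converting back to numbers, str2double also accepts a cell array directly and returns a numeric array of the same size, so neither cellfun nor the transpose trick is required (t below stands for the result of the strrep calls above):

```matlab
t = {'1','2','3'};       % the result of the strrep calls above
output = str2double(t)   % numeric 1-by-3 array: 1 2 3
```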

issue with sscanf

I'm having an issue getting what I want from sscanf,
e.g. getting varname, year, month and day from a filename:
filename = 'stn2014021412598cjgafe.cnv'
format = '%3s%4d%2d%2d%5d%*10s';
test = sscanf(filename,format);
and I get the result:
test =
115
116
110
2014
2
14
12598
but what I want is
varname = 'stn'
year = 2014
month = 2
day = 14
and then, optionally, the 5 digits
num = 12598
and skip everything else.
However, I don't understand why I get those 3 numbers 115, 116, 110.
Those first three values are the character codes for 's', 't' and 'n'. The sscanf documentation explains why it comes out this way for your format specifier.
Mixing character and numeric conversion specifications causes the
resulting matrix to be numeric and any characters read to show up
as their numeric values, one character per MATLAB matrix element.
In other words:
>> char(test(1:3))'
ans =
stn
An easier solution is probably textscan since it stores the components in a cell array, allowing different types:
>> C = textscan(filename,format)
C =
{1x1 cell} [2014] [2] [14] [12598]
>> C{1}
ans =
'stn'
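Another option, sketched here rather than taken from the answers: regexp with named tokens, assuming the filename always starts with 3 letters followed by 4+2+2+5 digits. str2double turns the digit strings into numbers ('02' becomes 2).

```matlab
filename = 'stn2014021412598cjgafe.cnv';
pattern = '^(?<varname>[a-zA-Z]{3})(?<year>\d{4})(?<month>\d{2})(?<day>\d{2})(?<num>\d{5})';
tok = regexp(filename, pattern, 'names');   % struct with one char field per token
varname = tok.varname;                      % 'stn'
year = str2double(tok.year);                % 2014 (these names shadow the
month = str2double(tok.month);              % 2     built-in year/month/day
day = str2double(tok.day);                  % 14    functions in this workspace)
num = str2double(tok.num);                  % 12598
```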

How does sprintf('%d',A) - '0' work?

I was looking for a way to separate the digits of a number in MATLAB, i.e.
if A = 1024 then I would like it to be A = [1, 0, 2, 4].
I searched on the net and found this code (also in the title):
sprintf('%d',A) - '0'
which converted [1024] -> [1, 0, 2, 4].
It did solve my problem, but I did not understand it, especially the - '0' part.
Can someone please explain how this works?
Also, if I write sprintf('%d',A) + '0' (for A = [1024]) in the MATLAB command window, it shows the following:
97 96 98 100
This puzzled me even more. Can anyone explain this?
It takes advantage of the automatic casting from a char array to a double array when the - operator is used. Remember that each character has an ASCII value: if you type
double('0') in the command window, you'll get 48. Likewise, double('1024') gives you
ans =
49 48 50 52
sprintf('%d', A) just converts the integer to a string (i.e. a char array). The minus casts both sides to double, so you end up with
double('1024') - double('0')
which is
[49, 48, 50, 52] - [48]
which ends up as [1,0,2,4]
From here it should be clear why adding '0' resulted in [97, 96, 98, 100]
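The same idea in a few lines, plus two related tricks that go beyond the question: an arithmetic-only way to get the digits of a positive integer, and polyval to rebuild the number from its digits.

```matlab
A = 1024;
d = sprintf('%d', A) - '0';                   % [1 0 2 4]
plusZero = sprintf('%d', A) + '0';            % [97 96 98 100], i.e. [49 48 50 52] + 48

% Arithmetic-only alternative (assumes A is a positive integer)
n = floor(log10(A)) + 1;                      % number of digits: 4
d2 = mod(floor(A ./ 10.^(n-1:-1:0)), 10);     % [1 0 2 4]

% Reverse direction: the digits are polynomial coefficients evaluated at 10
B = polyval(d, 10);                           % 1024
```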
The command sprintf('%d',A) converts integer A=1024 into a string representation of the number, '1024'.
In addition, a string in MATLAB is really a character array, so if A = '1024' then A(1) = '1'.
The rest of the explanation follows from the answer Dan posted. When numeric operations (+ - * / mod ^ ...) are applied to character arrays, the characters are converted to their numeric ASCII codes, and the result keeps the array shape as type double.