I am trying to write a function to mark the results of a test. The answers given by participants are stored in an nx1 cell array. However, these are stored as letters. I am looking for a way to convert these letters (a-d) into numbers (1-4), i.e. a=1, b=2, so that the answers can be compared using logical operations.
What I have so far is:
[num,txt,raw]=xlsread('FolkPhysicsMERGE.xlsx', 'X3:X142');
FolkPhysParAns=txt;
I seem to be able to find how to convert from numbers into letters but not the other way around. I feel like there should be a relatively easy way to do this, any ideas?
If you have a cell array of letters:
>> data = {'a','b','c','A'};
you only need to:
Convert to lower-case with lower, to treat both cases equally;
Convert to a character array with cell2mat;
Subtract (the ASCII code of) 'a' and add 1.
Code:
>> result = cell2mat(lower(data))-'a'+1
result =
1 2 3 1
More generally, if the possible answers are not consecutive letters, or even not single letters, use ismember:
>> possibleValues = {'s', 'm', 'l', 'xl', 'xxl'};
>> data = {'s', 'm', 'xl', 'l', 'm', 'l', 'aaa'};
>> [~, result] = ismember(data, possibleValues)
result =
1 2 4 3 2 3 0
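Applied to the marking problem in the question, a minimal sketch could look like this. The sample answers and the answer key are made up for illustration; in practice answers would be the txt output of xlsread.

```matlab
% Hypothetical data: participants' answers (n-by-1 cell) and the answer key
answers = {'a'; 'C'; 'b'; 'd'};
key     = [1; 3; 4; 4];                 % 1 = a, 2 = b, 3 = c, 4 = d

% Letters to numbers, ignoring case
numericAnswers = cell2mat(lower(answers)) - 'a' + 1;

% Logical comparison against the key, then the total mark
isCorrect = (numericAnswers == key);
score = sum(isCorrect);
```

Because answers is a column cell array, numericAnswers comes out as a column too, so it lines up with a column answer key.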
Thought I might as well write an answer...
You can use strrep to replace 'a' with '1' (note that it is in string format), do the same for all 26 letters, and then use str2num to convert the strings '1' to '26' into the numbers 1 to 26.
Let's say:
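t = {'a','b','c'}          % cell array of strings
t = strrep(t,'a','1')      % replace all 'a' with '1'
t = strrep(t,'b','2')      % replace all 'b' with '2'
t = strrep(t,'c','3')      % replace all 'c' with '3'
% Or in one line. With cell array arguments strrep pairs the elements up
% (old{k} is replaced by new{k} in t{k}), so this relies on the letters being in order:
t = strrep(t,{'a','b','c'},{'1','2','3'})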
>> t =
'1' '2' '3'
output = cellfun(@str2num,t,'un',0) % keeps the cell structure
>> output =
[1] [2] [3]
Alternatively:
output = str2num(cell2mat(t')) % uses the matrix structure instead. NOTE the transpose ', it is crucial here.
>> output =
1
2
3
How can I connect these two parts?
In Excel if you say 'state'&2 you will get a combined phrase state2.
I want to join 'state' and 'i' where i is a number between e.g. 1,2,3...
Then I can end up with state1 or state5 for example depending on what i is equal to.
How can I do this?
You can
Use num2str to convert 2 to '2', and then concatenation to build your char array
Use sprintf to create a char array with a specified placeholder format
Use strings.
Importantly, I've made a distinction here between strings ("double quotes") and character arrays ('single quotes'); the MATLAB documentation describes their differences in more detail.
Corresponding code would look like
% 1. Use num2str and concatenation
str = ['state', num2str(2)]; % -> 'state2' (char)
% 2. Use sprintf
str = sprintf( 'state%d', 2 ); % -> 'state2' (char)
% 3. Use strings
str = "state" + 2 % -> "state2" (string)
I would opt for number 2, since I think it's cleaner than 1 and more flexible, and I have used MATLAB since before strings existed so I'm predisposed to dislike them!
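If i runs over a range, the sprintf approach fits naturally inside a loop. A small sketch, where the range 1:3 and the variable name names are only for illustration:

```matlab
% Build 'state1', 'state2', 'state3' with sprintf
names = cell(1, 3);
for i = 1:3
    names{i} = sprintf('state%d', i);
end
```

Each element of names is a char array, so it can be used anywhere a single-quoted name is expected.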
How can I go about doing this? So far I've opened the file like this
fileID = fopen('hamlet.txt','r');
[A,count] = fscanf(fileID, '%s');
fclose(fileID);
Getting spaces from the file
First, if you want to capture spaces, you'll need to change your format specifier. %s reads only non-whitespace characters.
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%s');
>> fclose(fileID);
>> A
A = Thistexthasspacesinit.
Instead, we can use %c:
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%c');
>> fclose(fileID);
>> A
A = This text has spaces in it.
Mapping between characters and values (array indices)
We could create a character array that contains all of the target characters to look for:
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];
That would work, but to map the character to a position in the array you'd have to do something like:
>> char_pos = find(search_chars == 'q')
char_pos = 43
You could also use containers.Map, but that seems like overkill.
Instead, let's use the ASCII value of each character. For convenience, we'll use only values 1:126 (0 is NUL and 127 is DEL; we should never encounter either of those). Converting from characters to their ASCII code is easy:
>> c = 's'
c = s
>> a = uint8(c) % MATLAB actually does this using double(). Seems wasteful to me.
a = 115
>> c2 = char(a)
c2 = s
Note that by doing this, you're counting characters that are not in your desired list like ! and *. If that's a problem, then use search_chars and figure out how you want to map from characters to indices.
Looping solution
The most intuitive way to count each character is a loop. For each character in A, find its ASCII code and increment the counter array at that index.
char_count = zeros(1, 126);
for current_char = A
c = uint8(current_char);
char_count(c) = char_count(c) + 1;
end
Now you've got an array of counts for each character with ASCII codes from 1 to 126. To find out how many instances of 's' there are, we can just use its ASCII code as an index:
>> char_count(115)
ans = 4
We can even use the character itself as an index:
>> char_count('s')
ans = 4
Vectorized solution
As you can see with that last example, MATLAB's weak typing makes characters and their ASCII codes pretty much equivalent. In fact:
>> 's' == 115
ans = 1
That means that we can use implicit expansion and == to create a logical 2D array where L(c,a) == 1 if character number c in our string A has an ASCII code of a. Then we can get the count for each ASCII code by summing along the columns.
L = (A.' == [1:126]);
char_count = sum(L, 1);
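Comparing a column against a row like this relies on implicit expansion, which was introduced in R2016b. On older releases, bsxfun should give the same result. A sketch, using a made-up sample string in place of the file contents:

```matlab
% Sample text standing in for the contents of the file (assumption)
A = 'This text has spaces in it.';

% Same comparison as above, written with bsxfun for releases before R2016b
L = bsxfun(@eq, double(A).', 1:126);
char_count = sum(L, 1);

% Look up the count for a character by indexing with the character itself
n_s = char_count('s');
```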
A one-liner
Just for fun, I'll show one more way to do this: histcounts. This is meant to put values into bins, but as we said before, characters can be treated like values.
char_count = histcounts(uint8(A), 1:127); % 127 edges give 126 bins, one per ASCII code 1 to 126
There are dozens of other possibilities, for instance you could use the search_chars array and ismember(), but this should be a good starting point.
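For reference, the search_chars and ismember() variant might look roughly like this. It is only a sketch: the sample string stands in for the file contents, and accumarray is my choice for the tallying step rather than anything the approaches above require.

```matlab
% Sample text standing in for the contents of the file (assumption)
A = 'This text has spaces in it.';
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];

% idx(k) is the position of A(k) in search_chars, or 0 if it is not listed
[tf, idx] = ismember(A, search_chars);

% Tally the positions; characters outside search_chars are dropped
char_count = accumarray(idx(tf).', 1, [numel(search_chars), 1]).';

% Count for a particular character
n_s = char_count(search_chars == 's');
```

Here char_count has one entry per element of search_chars, so characters like ! and * are never counted.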
With [A,count] = fscanf(fileID, '%s'); you get an overall count at best, not one per letter. You can use regexp here instead: give it a cell array with each letter you want to search for, and it returns a cell array whose elements contain the indices at which each letter occurs. In the end you only need to count the indices in each element to get the count for each letter:
fileID = fopen('hamlet.txt','r');
A = fscanf(fileID, '%c'); % '%c' rather than '%s', so that spaces are kept
indexCellArray = regexp(A,{'A','B','C','D',... %I'm too lazy to add the other letters now^^
'a','b','c','d',...
',','\.',' '}); % '.' must be escaped, otherwise it matches any character
letterCount = cellfun(@numel,indexCellArray);
fclose(fileID);
You could put the counts in a struct with a field name for each letter; otherwise you might lose track of which count belongs to which letter.
There may be a much easier solution, because listing every letter in the regexp call is tedious, but it works.
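One way to keep the counts tied to their letters is sketched below. It uses a containers.Map instead of a struct, since characters like ',' are not valid field names; the sample string and the short letter list are made up for illustration.

```matlab
% Sample text standing in for the contents of the file (assumption)
A = 'Abba, a band.';
letters = {'A', 'a', 'b', ',', '.', ' '};

% Escape each character so that '.' is matched literally
patterns = cellfun(@(s) regexptranslate('escape', s), letters, ...
                   'UniformOutput', false);

% One vector of start indices per pattern, then count them
indexCellArray = regexp(A, patterns);
letterCount = cellfun(@numel, indexCellArray);

% Map from character to count
countOf = containers.Map(letters, letterCount);
n_a = countOf('a');
```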
In this instance I have a cell array of lat/long coordinates that I am reading from file as strings with format:
x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'}
where the ° is actually a degree symbol (U+00B0).
I want to use strsplit() or some equivalent to get out the numerical components, but I don't know how to specify the degree symbol as a delimiter.
I'm hesitant to simply split at the ',' and index out the number, since as demonstrated above I don't know how many digits to expect.
I found elsewhere on the site the following suggestion:
x = regexp(split{1}, '\D+', 'split')
however this also separates the integer and decimal components of the decimal numbers.
Is there a strsplit() option, or some other equivalent I could use?
You can copy-paste the degree symbol from your data file to your M-file script. MATLAB fully supports Unicode characters in its strings. For example:
strsplit(str, {'°','"',''''})
to split the string at the three symbols.
Alternatively, you could use sscanf (or fscanf if reading directly from file) to parse the string:
str = '27° 57'' 21.4"';
dot( sscanf(str, '%f° %f'' %f"'), [1, 1/60, 1/3600] );
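To carry the hemisphere letter along as well, one option is to read it with %s and flip the sign for S and W. This is a sketch under the assumption that every string ends in a single letter N, S, E or W; the sign convention is mine, not part of the question.

```matlab
str = '7° 34'' 11.1" W';

% Mixing numeric and text conversions yields a numeric column,
% so the hemisphere letter arrives as its character code
vals = sscanf(str, '%f° %f'' %f" %s');

% Degrees, minutes, seconds to decimal degrees
decimalDeg = dot(vals(1:3), [1, 1/60, 1/3600]);

% South and west are taken as negative (assumed convention)
if any(char(vals(4)) == 'SW')
    decimalDeg = -decimalDeg;
end
```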
The easiest solution is to copy-paste any Unicode character into your MATLAB editor, as suggested by Cris.
You can get these readily from the internet, or from the Windows Character Map.
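You can also use unicode2native and native2unicode if you want to use byte values for your native Unicode settings. These functions work on the bytes of your default character encoding, so the values below assume an encoding in which '°' is the single byte 176.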
% Get the Unicode value for '°'
>> unicode2native('°')
ans = uint8(176)
% Check the symbol for a given Unicode value
>> native2unicode(176)
ans = '°'
So
>> strsplit( 'Water freezes at 0°C', native2unicode(176) )
ans =
1×2 cell array
{'Water freezes at 0'} {'C'}
You can get the Unicode value by using hex2dec on the Hex value which you already knew, if you want to avoid unicode2native:
hex2dec('00B0') % = 176
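If you would rather not depend on the native encoding at all, char and double convert between a code point and its character directly. A short sketch (the example sentence is the same one used above):

```matlab
% Build the degree sign from its code point, U+00B0
degSign = char(hex2dec('00B0'));

% And back again: the code point of the character
code = double(degSign);

% Use it as a delimiter without typing the symbol in the source file
parts = strsplit(['Water freezes at 0', degSign, 'C'], degSign);
```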
You can also improve your regular expression in order to catch the decimal part:
x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'}
x = regexp(x, '\d+\.?\d*', 'match')
x{:}
Result:
ans =
{
[1,1] = 27
[1,2] = 57
[1,3] = 21.4
}
ans =
{
[1,1] = 7
[1,2] = 34
[1,3] = 11.1
}
Where \d+\.?\d* means:
\d+ : one or more digits
%followed by
\.? : zero or one point
%followed by
\d* : zero or more digits, so any number of decimal places is captured
Consider using split and double with string:
>> x = {'27° 57'' 21.4" N'; '7° 34'' 11.1" W'};
>> x = string(x)
x =
2×1 string array
"27° 57' 21.4" N"
"7° 34' 11.1" W"
>> x = split(x,["° " "' " '" '])
x =
2×4 string array
"27" "57" "21.4" "N"
"7" "34" "11.1" "W"
>> double(x(:,1:3))
ans =
27.0000 57.0000 21.4000
7.0000 34.0000 11.1000
I'm having trouble getting what I want from sscanf,
e.g. getting varname, year, month and day from a filename:
filename = 'stn2014021412598cjgafe.cnv'
format = '%3s%4d%2d%2d%5d%*10s';
test = sscanf(filename,format);
and I get the result:
test =
115
116
110
2014
2
14
12598
but what I want is the
varname = 'stn'
year = 2014
month = 2
day = 14
and then record or not the 5 digits
num = 12598
and skip everything else.
However, I don't understand why I get those 3 numbers 115, 116, 110.
Those first three values are the character codes for 's', 't' and 'n'. The sscanf documentation explains why it comes out this way for your format specifier.
Mixing character and numeric conversion specifications causes the
resulting matrix to be numeric and any characters read to show up
as their numeric values, one character per MATLAB matrix element.
In other words:
>> char(test(1:3))'
ans =
stn
An easier solution is probably textscan since it stores the components in a cell array, allowing different types:
>> C = textscan(filename,format)
C =
{1x1 cell} [2014] [2] [14] [12598]
>> C{1}
ans =
'stn'
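If you want to stay with sscanf, another option is to take the three-letter prefix by indexing and scan only the numeric part, so that no character conversion is mixed in. This sketch assumes the prefix is always exactly three characters long.

```matlab
filename = 'stn2014021412598cjgafe.cnv';

% The variable name is the fixed-width prefix (assumed to be 3 characters)
varname = filename(1:3);

% Scan the fixed-width numeric fields; scanning stops at the first non-digit
nums  = sscanf(filename(4:end), '%4d%2d%2d%5d');
year  = nums(1);
month = nums(2);
day   = nums(3);
num   = nums(4);
```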
I have a vector, for example, V = [ 1, 2, 3, 4 ]. Is there a way to change this to the letters, [ a,b,c,d ]?
Using 'a' directly instead of ASCII codes might be slightly more readable:
charString = char(V-1+'a');
Uppercase is then obtained with
charString = char(V-1+'A');
There are two simple ways to do this. One way is a simple index.
C = 'abcdefghijklmnopqrstuvwxyz';
V = [8 5 12 12 15 23 15 18 12 4];
C(V)
ans =
helloworld
Of course, char will do it too. The char answer is better because it does not require you to store a list of letters to index into.
char('a' + V - 1)
ans =
helloworld
This is best since, when you add 'a' to something, MATLAB converts 'a' to its ASCII code on the fly. +'a' will yield 97, the ASCII code of 'a'.
A nice thing is it also works for 'A', so if you wanted caps, just add 'A' instead.
char('A' + V - 1)
ans =
HELLOWORLD
You can find more information about working with strings in MATLAB from these commands:
help strings
doc strings
Something like
C = char(V+ones(size(V)).*(97-1))
should work (97 is the ASCII code for 'a', and you want 1 to map to 'a' it looks like).
Using the CHAR function, which turns a number (i.e. ASCII code) into a character:
charString = char(V+96);
EDIT: To go backwards (mapping 'a' to 1, 'b' to 2, etc.), use the DOUBLE function to recast the character back to its ASCII code number:
V = double(charString)-96;
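A quick round trip shows the two directions side by side. It is written with 'a' rather than 96, and the num2cell step is an extra in case separate letters are wanted in a cell array:

```matlab
V = [1 2 3 4];

% Numbers to letters
charString = char(V + 'a' - 1);          % 'abcd'

% One letter per cell, if a cell array is more convenient
letterCells = num2cell(charString);      % {'a','b','c','d'}

% Letters back to numbers
Vback = double(charString) - 'a' + 1;
```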