Strange behaviour in size(strfind(n,',')) for n = 44

For some reason in
size(strfind(n,','))
the number 44 is special and produces a comma found result:
value={55}
numCommas = size(strfind(value{1},','),2)
ans= 0 ...(GOOD)
value={44}
numCommas = size(strfind(value{1},','),2)
ans= 1 ...(BAD) - Why is it doing this?
value={'44,44,44'}
numCommas = size(strfind(value{1},','),2)
ans= 2 ...(GREAT)
I need to find the number of commas in a cell element, where the element can either be an integer or a string.

To elaborate on my comment: the ASCII code for a comma (,) is 44. Effectively, what your code does is
size(strfind(44,','),2)
or
size(strfind(char(44),','),2)
Here 44 is not a string but a numeric value. It is interpreted as a character code and converted to a character, and character 44 is a comma (,), as we can see when we use char:
>> char(44)
ans =
,
You can fix your code by changing
value={44}
to
value={'44'}
so then you will be performing strfind on a string instead of a numeric value.
>> size(strfind('44', ','), 2)
ans =
0
which provides the correct answer.
Alternatively you could use num2str
>> size(strfind(num2str(value{1}), ','), 2)
ans =
0

You can avoid this by simply doing value{1} = '44'. Or if that's not an alternative, use num2str like this:
value={44};
numCommas = size(strfind(num2str(value{1}),','),2)
numCommas =
0
This will also work for string inputs:
value={'44,44,44'};
numCommas = size(strfind(num2str(value{1}),','),2)
numCommas =
2
Why do you get "wrong" results?
It's because 44 is the ASCII code for comma ,.
You can check this quite simply by casting the value to char.
char(44)
ans =
,
You are checking for commas in a string. Because the input to strfind is an integer, it is automatically cast to char. In the last example you are passing in a "real" string, so it finds the two commas in it.
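Putting the answers above together, here is a minimal sketch for the original goal of counting commas in cell elements that may be either numbers or strings. The variable names values and numCommas are only illustrative, and the sketch assumes each element is either a numeric scalar or a char row vector.

```matlab
% Count commas in each cell element, whether it holds a number or text.
values = {55, 44, '44,44,44'};
numCommas = zeros(size(values));
for k = 1:numel(values)
    v = values{k};
    if isnumeric(v)
        v = num2str(v);          % 44 becomes the text '44', not char(44)
    end
    numCommas(k) = sum(v == ',');  % compare every character with a comma
end
disp(numCommas)                  % 0 0 2
```

Checking isnumeric first makes the conversion explicit, so the numeric 44 is never treated as a character code.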

Try this one:
value={'44'}
numCommas = size(strfind(value{1},','),2)
instead of:
value={44}
numCommas = size(strfind(value{1},','),2)
It should work, since it's a char now.

Related

How to calculate the number of appearance of each letter(A-Z ,a-z as well as '.' , ',' and ' ' ) in a text file in matlab?

How can I go about doing this? So far I've opened the file like this
fileID = fopen('hamlet.txt','r');
[A,count] = fscanf(fileID, '%s');
fclose(fileID);

Getting spaces from the file
First, if you want to capture spaces, you'll need to change your format specifier. %s reads only non-whitespace characters.
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%s');
>> fclose(fileID);
>> A
A = Thistexthasspacesinit.
Instead, we can use %c:
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%c');
>> fclose(fileID);
>> A
A = This text has spaces in it.
Mapping between characters and values (array indices)
We could create a character array that contains all of the target characters to look for:
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];
That would work, but to map the character to a position in the array you'd have to do something like:
>> char_pos = find(search_chars == 'q')
char_pos = 43
You could also use containers.Map, but that seems like overkill.
Instead, let's use the ASCII value of each character. For convenience, we'll use only values 1:126 (0 is NUL, and 127 is DEL. We should never encounter either of those.) Converting from characters to their ASCII code is easy:
>> c = 's'
c = s
>> a = uint8(c) % MATLAB actually does this using double(). Seems wasteful to me.
a = 115
>> c2 = char(a)
c2 = s
Note that by doing this, you're counting characters that are not in your desired list like ! and *. If that's a problem, then use search_chars and figure out how you want to map from characters to indices.
Looping solution
The most intuitive way to count each character is a loop. For each character in A, find its ASCII code and increment the counter array at that index.
char_count = zeros(1, 126);
for current_char = A
c = uint8(current_char);
char_count(c) = char_count(c) + 1;
end
Now you've got an array of counts for each character with ASCII codes from 1 to 126. To find out how many instances of 's' there are, we can just use its ASCII code as an index:
>> char_count(115)
ans = 4
We can even use the character itself as an index:
>> char_count('s')
ans = 4
Vectorized solution
As you can see with that last example, MATLAB's weak typing makes characters and their ASCII codes pretty much equivalent. In fact:
>> 's' == 115
ans = 1
That means that we can use implicit expansion (MATLAB's form of broadcasting, available since R2016b) and == to create a logical 2D array where L(c,a) == 1 if the c-th character of our string A has an ASCII code of a. Then we can get the count for each ASCII code by summing down each column (along dimension 1).
L = (A.' == [1:126]);
char_count = sum(L, 1);
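As a quick sanity check, the loop and the vectorized version can be compared on a short sample string. This is only a sketch: the sample text is assumed for illustration, and the vectorized line needs implicit expansion (R2016b or later).

```matlab
A = 'This text has spaces in it.';   % sample text, assumed for illustration

% Loop version: one counter per ASCII code 1..126
loop_count = zeros(1, 126);
for current_char = A
    c = double(current_char);
    loop_count(c) = loop_count(c) + 1;
end

% Vectorized version: N-by-126 logical matrix, summed down the columns
vec_count = sum(A.' == 1:126, 1);

assert(isequal(loop_count, vec_count));
disp(vec_count('s'))                 % 4
```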
A one-liner
Just for fun, I'll show one more way to do this: histcounts. This is meant to put values into bins, but as we said before, characters can be treated like values.
char_count = histcounts(uint8(A), 1:127); % edges 1:127 give 126 bins, one per ASCII code 1..126
There are dozens of other possibilities, for instance you could use the search_chars array and ismember(), but this should be a good starting point.
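The search_chars and ismember() variant mentioned above might look like the following sketch. It counts only the characters in the desired list; the sample text and the use of accumarray to tally the positions are my own choices, not part of the original answer.

```matlab
A = 'This text has spaces in it.';                    % sample text, assumed
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];

% idx(k) is the position of A(k) in search_chars, or 0 if it is not listed
[tf, idx] = ismember(A, search_chars);

% Tally how often each position occurs; unlisted characters are skipped
char_count = accumarray(idx(tf).', 1, [numel(search_chars), 1]).';

disp(char_count(search_chars == 's'))                 % 4
disp(char_count(search_chars == ' '))                 % 5
```

Here char_count has one entry per element of search_chars, so the count for a given character is looked up with a logical mask rather than with its ASCII code.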

With [A,count] = fscanf(fileID, '%s'); count only gives you a single overall total, not a count per letter. You can use regexp here instead: it searches for each character you specify and returns a cell array whose elements contain the indices at which that character occurs. In the end you only count the indices in each element and you have the count for each letter:
fileID = fopen('hamlet.txt','r');
A = fscanf(fileID, '%c'); % %c keeps the spaces; %s would drop them
indexCellArray = regexp(A,{'A','B','C','D',... %I'm too lazy to add the other letters now^^
'a','b','c','d',...
',','\.',' '}); % the dot is escaped, otherwise it would match any character
letterCount = cellfun(@(x) numel(x),indexCellArray);
fclose(fileID);
You might want to put the counts in a struct with one field name per letter, otherwise you might lose track of which count belongs to which letter.
There may be a much easier solution, because listing every letter in the regexp call is tedious, but it works.

Matlab strsplit at non-keyboard characters

In this instance I have a cell array of lat/long coordinates that I am reading from file as strings with format:
x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'}
where the ° is actually a degree symbol (U+00B0).
I want to use strsplit() or some equivalent to get out the numerical components, but I don't know how to specify the degree symbol as a delimiter.
I'm hesitant to simply split at the ',' and index out the number, since as demonstrated above I don't know how many digits to expect.
I found elsewhere on the site the following suggestion:
x = regexp(split{1}, '\D+', 'split')
however this also separates the integer and decimal components of the decimal numbers.
Is there a strsplit() option, or some other equivalent I could use?

You can copy-paste the degree symbol from your data file to your M-file script. MATLAB fully supports Unicode characters in its strings. For example:
strsplit(str, {'°','"',''''})
to split the string at the three symbols.
Alternatively, you could use sscanf (or fscanf if reading directly from file) to parse the string:
str = '27° 57'' 21.4"';
dot( sscanf(str, '%f° %f'' %f"'), [1, 1/60, 1/3600] );

The easiest solution is to copy-paste any Unicode character into your MATLAB editor, as suggested by Cris.
You can get these readily from the internet, or from the Windows Character Map.
You can also use unicode2native and native2unicode if you want to work with byte values in your native encoding.
% Get the Unicode value for '°'
>> unicode2native('°')
ans = uint8(176)
% Check the symbol for a given Unicode value
>> native2unicode(176)
ans = '°'
So
>> strsplit( 'Water freezes at 0°C', native2unicode(176) )
ans =
1×2 cell array
{'Water freezes at 0'} {'C'}
If you want to avoid unicode2native, you can get the Unicode value by applying hex2dec to the hex value, which you already know:
hex2dec('00B0') % = 176
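Note that unicode2native returns bytes in a particular encoding, so under a UTF-8 default it returns two bytes (194 176) for the degree symbol rather than 176. A sketch that avoids the encoding question is to work with the code point directly through char and double. The sample sentence is built with char(176) so that the snippet does not depend on how the source file is saved.

```matlab
deg  = char(176);                 % the degree symbol, U+00B0
code = double(deg);               % back to the code point: 176

str   = ['Water freezes at 0', deg, 'C'];
parts = strsplit(str, deg);       % {'Water freezes at 0', 'C'}
disp(parts)
```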

You can also improve your regular expression in order to catch the decimal part:
x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'}
x = regexp(x, '\d+\.?\d?', 'match')
x{:}
Result:
ans =
{
[1,1] = 27
[1,2] = 57
[1,3] = 21.4
}
ans =
{
[1,1] = 7
[1,2] = 34
[1,3] = 11.1
}
Where \d+\.?\d? means:
\d+ : one or more digits
followed by
\.? : zero or one decimal point
followed by
\d? : zero or one digit
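Because \d? matches at most one digit after the decimal point, a value such as 21.45 would be cut off at 21.4. A slightly more general sketch, assuming the numbers are non-negative and written with a dot as the decimal separator, makes the whole fractional part optional and converts the matches to numbers. The sample coordinates are illustrative.

```matlab
deg = char(176);                                   % degree symbol, U+00B0
x = {['27' deg ' 57'' 21.45" N'], ['7' deg ' 34'' 11.1" W']};

% One or more digits, optionally followed by a dot and more digits
tokens = regexp(x, '\d+(?:\.\d+)?', 'match');

% Convert each group of matches to a numeric row vector
nums = cellfun(@str2double, tokens, 'UniformOutput', false);

disp(nums{1})                                      % 27 57 21.45
disp(nums{2})                                      % 7 34 11.1
```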

Consider using split and double with string:
>> x = {'27° 57'' 21.4" N'; '7° 34'' 11.1" W'};
>> x = string(x)
x =
2×1 string array
"27° 57' 21.4" N"
"7° 34' 11.1" W"
>> x = split(x,["° " "' " '" '])
x =
2×4 string array
"27" "57" "21.4" "N"
"7" "34" "11.1" "W"
>> double(x(:,1:3))
ans =
27.0000 57.0000 21.4000
7.0000 34.0000 11.1000

Extracting certain part of a string using strtok

I'm trying to extract a part of the string by using strtok(), but I am unable to get complete output.
For input:
string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
Output:
>> strtok(string)
ans =
'3_5_2_spd_20kmin_corrected_1_20190326.txt'
>> strtok(string,'.txt')
ans =
'3_5_2_spd_20kmin_correc'
>> strtok(string,'0326')
ans =
'_5_'
>> strtok(string,'2019')
ans =
'3_5_'
I expect the output 3_5_2_spd_20kmin_corrected_1_20190326, but the actual output was 3_5_2_spd_20kmin_correc. Why is that and how can I get the correct output?

strtok treats every character inside the second input argument as a separate delimiter.
For example, when calling:
strtok("3_5_2_spd_20kmin_corrected_1_20190326.txt",'.txt')
MATLAB treats '.', 't' and 'x' as three separate delimiters, so it splits your input at the first t it encounters (the one in "corrected") and gives back the result 3_5_2_spd_20kmin_correc.
In your other example, '2019' is likewise not a single delimiter but a set of delimiters: '2', '0', '1' and '9'. The first of these encountered in the string (left to right) is '2', right after '3_5_'. That's why it returns '3_5_'.
To achieve your expected output, I think you would be better off using strsplit instead:
result = strsplit(string,".txt");
result{1}
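The delimiter-set behaviour described above can be checked directly: the order of the characters in the second argument makes no difference. The sketch below also shows one possible alternative, regexprep with an anchored pattern, which removes '.txt' only when it is at the end of the name. That alternative is my own suggestion, not part of the original answer.

```matlab
str = '3_5_2_spd_20kmin_corrected_1_20190326.txt';

a = strtok(str, '.txt');      % delimiters are '.', 't' and 'x'
b = strtok(str, 'xt.');       % same set of delimiters, different order
assert(strcmp(a, b));         % both are '3_5_2_spd_20kmin_correc'

% Remove the literal suffix '.txt' only at the end of the string
name = regexprep(str, '\.txt$', '');
disp(name)                    % 3_5_2_spd_20kmin_corrected_1_20190326
```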

extractBefore does what you're looking to do:
>> string = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> extractBefore(string,'.txt')
ans =
'3_5_2_spd_20kmin_corrected_1_20190326'

If your strings are file names/paths, and your goal is to extract the file name without extension, the best option would be to use fileparts, like so:
>> str = '3_5_2_spd_20kmin_corrected_1_20190326.txt';
>> [~, name] = fileparts(str)
name =
'3_5_2_spd_20kmin_corrected_1_20190326'

Read specific character from cell-array of string

I have a cell array of dimensions 1x6 like this:
A = {'25_2.mat','25_3.mat','25_4.mat','25_5.mat','25_6.mat','25_7.mat'};
I want to read the number after the '_' from each element, for example 2 from A{1}.

Using cellfun, strfind and str2double
out = cellfun(@(x) str2double(x(strfind(x,'_')+1:strfind(x,'.')-1)),A)
How does it work?
This code finds the index of the character immediately after the '_'. Let's call it start_index. Then it finds the index of the character immediately before the '.'. Let's call it end_index. It then retrieves all the characters from start_index to end_index, and finally converts those characters to a number using str2double.
Sample Input:
A = {'2545_23.mat','2_3.mat','250_4.mat','25_51.mat','25_6.mat','25_7.mat'};
Output:
>> out
out =
23 3 4 51 6 7
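The index arithmetic described in this answer, spelled out for a single element. This is just the one-liner unrolled. It assumes the string contains exactly one '_' and one '.'.

```matlab
x = '2545_23.mat';

start_index = strfind(x, '_') + 1;   % first character after the underscore: 6
end_index   = strfind(x, '.') - 1;   % last character before the dot: 7

value = str2double(x(start_index:end_index));
disp(value)                          % 23
```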

You can access the contents of a cell by using curly braces {...}. Once you have access to the contents, you can use indices to access the characters of the string as you would with a normal array. For example:
test = {'25_2.mat', '25_3.mat', '25_4.mat', '25_5.mat', '25_6.mat', '25_7.mat'}
character = test{1}(4);
If your string length is variable, you can use strfind to find the index of the character you want.

Assuming the numbers are non-negative integers after the _ sign: use a regular expression with lookbehind, and then convert from string to number:
numbers = cellfun(@(x) str2num(x{1}), regexp(A, '(?<=\_)\d+', 'match'));

xor between two numbers (after hex to binary conversion)

I do not know why there is an error in this code:
hex_str1 = '5'
bin_str1 = dec2bin(hex2dec(hex_str1))
hex_str2 = '4'
bin_str2 = dec2bin(hex2dec(hex_str2))
c=xor(bin_str1,bin_str2)
The value of c is not correct when I convert the hex values to binary and then use the xor function. But when I use numeric arrays, the value of c is correct. The code is
e=[1 1 1 0];
f=[1 0 1 0];
g=xor(e,f)
What is the mistake in my first code when XORing the hex-to-binary values? Can anyone help me find the solution?

Your mistake is applying xor on two strings instead of actual numerical arrays.
For the xor command, logical "0"s are represented by actual zero elements. Any non-zero elements are interpreted as logical "1"s.
When you apply xor on two strings, the numerical value of each character (element) is its ASCII value. From xor's point of view, the zeroes in your string are not really zeroes, but simply non-zero values (being equal to the ASCII value of the character '0'), which are interpreted as logical "1"s. The bottom line is that in your example you're xor-ing 111b and 111b, and so the result is 0.
The solution is to convert your strings to logical arrays:
num1 = (bin_str1 == '1');
num2 = (bin_str2 == '1');
c = xor(num1, num2);
To convert the result back into a string (of a binary number), use this:
bin_str3 = sprintf('%d', c);
... and to a hexadecimal string, add this:
hex_str3 = dec2hex(bin2dec(bin_str3));
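One thing to watch for with this approach: dec2bin drops leading zeros, so two inputs can produce binary strings of different lengths, and xor then fails on the size mismatch. A minimal sketch, assuming one hex digit per input, pads both strings to a fixed width n using the second argument of dec2bin. The inputs 'F' and '1' are chosen only to show the padding.

```matlab
hex_str1 = 'F';
hex_str2 = '1';
n = 4;                                        % 4 bits per hex digit

bin_str1 = dec2bin(hex2dec(hex_str1), n);     % '1111'
bin_str2 = dec2bin(hex2dec(hex_str2), n);     % '0001' (padded with zeros)

c = xor(bin_str1 == '1', bin_str2 == '1');    % logical [1 1 1 0]
bin_str3 = sprintf('%d', c);                  % '1110'
hex_str3 = dec2hex(bin2dec(bin_str3));        % 'E'
disp(hex_str3)
```

If the binary strings are not needed at all, dec2hex(bitxor(hex2dec(hex_str1), hex2dec(hex_str2))) gives the same result in one step.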

This is really helpful, and it gave me the correct conversion while forming an HMAC value in MATLAB.
But in MATLAB you cannot convert a string longer than 52 characters using the bin2dec() function, and similarly hex2dec() cannot take a hexadecimal string longer than 13 characters.
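One possible way around these length limits is to XOR the two hex strings digit by digit, so that no single conversion exceeds one hex digit. This is a sketch under the assumption that both strings have the same length. The sample values a and b are made up for illustration.

```matlab
a = '0123456789ABCDEF0123456789ABCDEF';   % 32 hex digits, too long for hex2dec as one number
b = 'FFFFFFFFFFFFFFFF0000000000000000';

% a(:) is an N-by-1 char array, so hex2dec converts each digit separately
da = hex2dec(a(:));
db = hex2dec(b(:));

% XOR the digit values, convert each back to one hex digit, restore row shape
c = dec2hex(bitxor(da, db)).';
disp(c)                                   % FEDCBA98765432100123456789ABCDEF
```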