Regexp equivalent for floating number in matlab - matlab

I'd like to know how regexp is used for floating numbers or is there any other function to do this.
For example, the following returns {'2', '5'} rather than {'2.5'}.
nums= regexp('2.5','\d+','match')

Regular expressions are a tool for low-level text parsing and they have no concept of numeric datatypes. If you will want to parse decimals, you need to consider what characters compose a decimal number and design a regexp to explicitly match all of those characters.
Your example only returns the '2' and '5' because your pattern only matches characters that are digits (\d). To handle the decimal numbers, you need to explicitly include the . in your pattern. The following will match any number of digits followed by 0 or 1 radix points and 0 or more numbers after the radix point.
regexp('2.5', '\d+\.?\d*', 'match')
This assumes that you'll always have a leading digit (i.e. not '.5')
Alternately, you may consider using something like textscan or sscanf instead to parse your string which is going to be more robust than a custom regex since they are aware of different numeric datatypes.
C = textscan('2.5', '%f');
C = sscanf('2.5', '%f');
If your string only contains this floating point number, you can just use str2double
val = str2double('2.5');

#Suever answer has already been accepted, anyway here is some more complete one that should accept all sorts of floating points syntaxes (including NaN and +/-Inf by default):
% Regular expression for capturing a double value
function [s] = cdouble(supportPositiveInfinity, supportNegativeInfinity,
supportNotANumber)
%[
if (nargin < 3), supportNotANumber = true; end
if (nargin < 2), supportNegativeInfinity = true; end
if (nargin < 1), supportPositiveInfinity = true; end
% A double
s = '[+\-]?(?:(?:\d+\.\d*)|(?:\.\d+)|(?:\d+))(?:[eE][+\-]?\d+)?'; %% This means a numeric double
% Extra for nan or [+/-]inf
extra = '';
if (supportNotANumber), extra = ['nan|' extra]; end
if (supportNegativeInfinity), extra = ['-inf|' extra]; end
if (supportPositiveInfinity), extra = ['inf|\+inf|' extra]; end
% Adding capture
if (~isempty(extra))
s = ['((?i)(?:', extra, s, '))']; % (?i) => Locally case-insensitive for captured group
else
s = ['(', s, ')'];
end
%]
end
Basically above pattern says:
Eventually start with '+' or '-' sign
Then followed by either
One or more digits followed by a dot and eventually zero to many digits
A dot followed by one or more digits
One or more digits only
Then followed by exponant pattern, that is:
'e' or 'E' (eventually followed with '+' or '-') and one or more digits
Pattern is later completed with support of Inf, +Inf, -Inf and NaN in case insensitive way. Finally everthing is enclosed between ( and ) for capturing purpose.
Here is some online test example: https://regex101.com/r/K87z6e/1
Or you can test in matlab:
>> regexp('Hello 1.235 how .2e-7 are INF you +.7 doing ?', cdouble(), 'match')
ans =
'1.235' '.2e-7' 'INF' '+.7'

Related

How to calculate the number of appearance of each letter(A-Z ,a-z as well as '.' , ',' and ' ' ) in a text file in matlab?

How can I go about doing this? So far I've opened the file like this
fileID = fopen('hamlet.txt'.'r');
[A,count] = fscanf(fileID, '%s');
fclose(fileID);
Getting spaces from the file
First, if you want to capture spaces, you'll need to change your format specifier. %s reads only non-whitespace characters.
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%s');
>> fclose(fileID);
>> A
A = Thistexthasspacesinit.
Instead, we can use %c:
>> fileID = fopen('space.txt','r');
>> A = fscanf(fileID, '%c');
>> fclose(fileID);
>> A
A = This text has spaces in it.
Mapping between characters and values (array indices)
We could create a character array that contains all of the target characters to look for:
search_chars = ['A':'Z', 'a':'z', ',', '.', ' '];
That would work, but to map the character to a position in the array you'd have to do something like:
>> char_pos = find(search_chars == 'q')
char_pos = 43
You could also use containters.Map, but that seems like overkill.
Instead, let's use the ASCII value of each character. For convenience, we'll use only values 1:126 (0 is NUL, and 127 is DEL. We should never encounter either of those.) Converting from characters to their ASCII code is easy:
>> c = 'q'
c = s
>> a = uint8(c) % MATLAB actually does this using double(). Seems wasteful to me.
a = 115
>> c2 = char(a)
c2 = s
Note that by doing this, you're counting characters that are not in your desired list like ! and *. If that's a problem, then use search_chars and figure out how you want to map from characters to indices.
Looping solution
The most intuitive way to count each character is a loop. For each character in A, find its ASCII code and increment the counter array at that index.
char_count = zeros(1, 126);
for current_char = A
c = uint8(current_char);
char_count(c) = char_count(c) + 1;
end
Now you've got an array of counts for each character with ASCII codes from 1 to 126. To find out how many instances of 's' there are, we can just use its ASCII code as an index:
>> char_count(115)
ans = 4
We can even use the character itself as an index:
>> char_count('s')
ans = 4
Vectorized solution
As you can see with that last example, MATLAB's weak typing makes characters and their ASCII codes pretty much equivalent. In fact:
>> 's' == 115
ans = 1
That means that we can use implicit broadcasting and == to create a logical 2D array where L(c,a) == 1 if character c in our string A has an ASCII code of a. Then we can get the count for each ASCII code by summing along the columns.
L = (A.' == [1:126]);
char_count = sum(L, 1);
A one-liner
Just for fun, I'll show one more way to do this: histcounts. This is meant to put values into bins, but as we said before, characters can be treated like values.
char_count = histcounts(uint8(A), 1:126);
There are dozens of other possibilities, for instance you could use the search_chars array and ismember(), but this should be a good starting point.
With [A,count] = fscanf(fileID, '%s'); you'll only count all string letters, doesn't matter which one. You can use regexp here which search for each letter you specify and will put it in a cell array. It consists of fields which contains the indices of your occuring letters. In the end you only sum the number of indices and you have the count for each letter:
fileID = fopen('hamlet.txt'.'r');
A = fscanf(fileID, '%s');
indexCellArray = regexp(A,{'A','B','C','D',... %I'm too lazy to add the other letters now^^
'a','b','c','d',...
','.' '};
letterCount = cellfun(#(x) numel(x),indexCellArray);
fclose(fileID);
Maybe you put the cell array in a struct where you can give fieldnames for the letters, otherwise you might loose track which count belongs to which number.
Maybe there's much easier solution, cause this one is kind of exhausting to put all the letters in the regexp but it works.

Matlab: Function that returns a string with the first n characters of the alphabet

I'd like to have a function generate(n) that generates the first n lowercase characters of the alphabet appended in a string (therefore: 1<=n<=26)
For example:
generate(3) --> 'abc'
generate(5) --> 'abcde'
generate(9) --> 'abcdefghi'
I'm new to Matlab and I'd be happy if someone could show me an approach of how to write the function. For sure this will involve doing arithmetic with the ASCII-codes of the characters - but I've no idea how to do this and which types that Matlab provides to do this.
I would rely on ASCII codes for this. You can convert an integer to a character using char.
So for example if we want an "e", we could look up the ASCII code for "e" (101) and write:
char(101)
'e'
This also works for arrays:
char([101, 102])
'ef'
The nice thing in your case is that in ASCII, the lowercase letters are all the numbers between 97 ("a") and 122 ("z"). Thus the following code works by taking ASCII "a" (97) and creating an array of length n starting at 97. These numbers are then converted using char to strings. As an added bonus, the version below ensures that the array can only go to 122 (ASCII for "z").
function output = generate(n)
output = char(97:min(96 + n, 122));
end
Note: For the upper limit we use 96 + n because if n were 1, then we want 97:97 rather than 97:98 as the second would return "ab". This could be written as 97:(97 + n - 1) but the way I've written it, I've simply pulled the "-1" into the constant.
You could also make this a simple anonymous function.
generate = #(n)char(97:min(96 + n, 122));
generate(3)
'abc'
To write the most portable and robust code, I would probably not want those hard-coded ASCII codes, so I would use something like the following:
output = 'a':char(min('a' + n - 1, 'z'));
...or, you can just generate the entire alphabet and take the part you want:
function str = generate(n)
alphabet = 'a':'z';
str = alphabet(1:n);
end
Note that this will fail with an index out of bounds error for n > 26, so you might want to check for that.
You can use the char built-in function which converts an interger value (or array) into a character array.
EDIT
Bug fixed (ref. Suever's comment)
function [str]=generate(n)
a=97;
% str=char(a:a+n)
str=char(a:a+n-1)
Hope this helps.
Qapla'

Function to split string in matlab and return second number

I have a string and I need two characters to be returned.
I tried with strsplit but the delimiter must be a string and I don't have any delimiters in my string. Instead, I always want to get the second number in my string. The number is always 2 digits.
Example: 001a02.jpg I use the fileparts function to delete the extension of the image (jpg), so I get this string: 001a02
The expected return value is 02
Another example: 001A43a . Return values: 43
Another one: 002A12. Return values: 12
All the filenames are in a matrix 1002x1. Maybe I can use textscan but in the second example, it gives "43a" as a result.
(Just so this question doesn't remain unanswered, here's a possible approach: )
One way to go about this uses splitting with regular expressions (MATLAB's strsplit which you mentioned):
str = '001a02.jpg';
C = strsplit(str,'[a-zA-Z.]','DelimiterType','RegularExpression');
Results in:
C =
'001' '02' ''
In older versions of MATLAB, before strsplit was introduced, similar functionality was achieved using regexp(...,'split').
If you want to learn more about regular expressions (abbreviated as "regex" or "regexp"), there are many online resources (JGI..)
In your case, if you only need to take the 5th and 6th characters from the string you could use:
D = str(5:6);
... and if you want to convert those into numbers you could use:
E = str2double(str(5:6));
If your number is always at a certain position in the string, you can simply index this position.
In the examples you gave, the number is always the 5th and 6th characters in the string.
filename = '002A12';
num = str2num(filename(5:6));
Otherwise, if the formating is more complex, you may want to use a regular expression. There is a similar question matlab - extracting numbers from (odd) string. Modifying the code found there you can do the following
all_num = regexp(filename, '\d+', 'match'); %Find all numbers in the filename
num = str2num(all_num{2}) %Convert second number from str

Match a numeric string from a plain text fileusing regexpi

I have. Txt file and I am looking for specific string
Example
Row1
Row2
Pressure 95.8
There could be more white-space characters between string and numbers and in every cycle there is different value: 1. Cycle 95.8>>next cycle 1009.6543>>..., but its always in this format
I tried this
Fid = fileread ('search.txt')
Value = regexpi (fid, '(? <=Pressure\s*)\d*', 'match')
It saved just '95' instead of '95.8'
To be as complete as possible, some quite generic pattern to match a double (excluding for NaN and +/-Inf cases) is the following:
[+\-]?(?:(?:\d+\.\d*)|(?:\.\d+)|(?:\d+))(?:[eE][+\-]?\d+)?
Quickly reviewing the pattern it says:
Optionally the sign + or -
Followed by (either and in order):
Some digits (at least one), a dot, some digits (eventually zero)
A dot, some digits (at least one)
Some digits (at least one)
Optionally followed by:
The letter e or E
Optionally the sign + or -
Some digits (at least one)
The pattern to match many whitespaces (including none) is the following:
\s*
So to match the word Pressure followed by many spaces and then a double in the string str, you can use the following code:
function [pressures] = ParsePressures(str)
%[
% Some default string
if (nargin < 1)
str = cell(0,1);
str{end+1} = 'Row1';
str{end+1} = 'Row2';
str{end+1} = 'Pressure 95.8';
str{end+1} = 'Row1';
str{end+1} = 'Row2';
str{end+1} = 'Pressure 1009.6543';
str = sprintf('%s\n', str{1:end});
end
% Obtain 'tokens' for all macthes in 'str'
% NB1: 'tokens' is a cell vector whose length is the number of match
% NB2: For each match 'tokens{i}' is a cell vector containing captured groups
tokens = regexpi(str, sprintf('%s', 'Pressure', mspaces, cdouble), 'tokens');
% Transform maches in a array of double
pressures = arrayfun(#(x)str2double(x{1}), tokens);
%]
end
%% --- Regular expression helpers
function [s] = mspaces()
%[
s = '\s*'; %% This means 'as many spaces as you want'
%]
end
function [s] = cdouble()
%[
s = '([+\-]?(?:(?:\d+\.\d*)|(?:\.\d+)|(?:\d*))(?:[eE][+\-]?\d+)?)'; %% This means 'capture a double'
%]
end
It will return all pressure values in str as an array of double:
>> ParsePressures
ans =
95.8 1009.6543
NB: In the code i'm using mspaces and cdouble helper functions to build the complete pattern (i.e. sprintf('%s', 'Pressure', mspaces, cdouble), that is the word Pressure, followed by many spaces and followed by a double that I want to capture).

xor between two numbers (after hex to binary conversion)

i donot know why there is error in this coding:
hex_str1 = '5'
bin_str1 = dec2bin(hex2dec(hex_str1))
hex_str2 = '4'
bin_str2 = dec2bin(hex2dec(hex_str2))
c=xor(bin_str1,bin_str2)
the value of c is not correct when i transform the hex to binary by using the xor function.but when i used the array the value of c is correct.the coding is
e=[1 1 1 0];
f=[1 0 1 0];
g=xor(e,f)
what are the mistake in my first coding to xor of hec to binary value??anyone can help me find the solution...
Your mistake is applying xor on two strings instead of actual numerical arrays.
For the xor command, logical "0"s are represented by actual zero elements. Any non-zero elements are interpreted as logical "1"s.
When you apply xor on two strings, the numerical value of each character (element) is its ASCII value. From xor's point of view, the zeroes in your string are not really zeroes, but simply non-zero values (being equal to the ASCII value of the character '0'), which are interpreted as logical "1"s. The bottom line is that in your example you're xor-ing 111b and 111b, and so the result is 0.
The solution is to convert your strings to logical arrays:
num1 = (bin_str1 == '1');
num2 = (bin_str2 == '1');
c = xor(num1, num2);
To convert the result back into a string (of a binary number), use this:
bin_str3 = sprintf('%d', c);
... and to a hexadecimal string, add this:
hex_str3 = dec2hex(bin2dec(bin_str3));
it is really helpful, and give me the correct conversion while forming HMAC value in matlab...
but in matlab you can not convert string of length more than 52 character using bin2dec() function and similarly hex2dec() can not take hexadecimal character string more than 13 length.