How can I write a regular expression that recognizes any expression of the form "\xdd", where dd represents a hexadecimal number in the range 00-7F?
Regular expressions do not express numerical ranges, but sequences of characters in a character set. You have to express those ranges one character at a time.
So the hex digits are [0-9A-F] which describes the set of characters for one digit using the two ranges [0-9] and [A-F] (you'd also have to decide if lower case letters are permitted). For two digits you'd have to notice that the first digit is of a shorter range using only [0-7]. The combined result would be:
[0-7][0-9A-Fa-f]
Putting the other symbols in place we could get:
\\x[0-7][0-9A-Fa-f]
(Assuming \ is a meta-character that needs escaping).
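As a quick sanity check, the pattern above can be tried in Python's re module. This is only a sketch: the helper name is_low_hex_escape is made up for illustration, and fullmatch is used on the assumption that the whole input should be exactly one escape.

```python
import re

# Pattern from the answer: a literal backslash, "x", then two hex digits in 00-7F.
HEX_ESCAPE = re.compile(r"\\x[0-7][0-9A-Fa-f]")

def is_low_hex_escape(text):
    """Return True if text is exactly one backslash-x escape with value 00-7F."""
    return HEX_ESCAPE.fullmatch(text) is not None

print(is_low_hex_escape(r"\x7F"))  # True
print(is_low_hex_escape(r"\x0a"))  # True
print(is_low_hex_escape(r"\x80"))  # False: first digit is outside [0-7]
```

To find such escapes inside a longer string rather than validate a whole string, HEX_ESCAPE.findall(text) could be used instead.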
Related
I used dlmwrite to output some data in the following form:
-1.7693255974E+00,-9.7742420654E-04, 2.1528647648E-04,-1.4866241234E+00
What I really want is the following format:
-.1769325597E+00, -.9774242065E-04, .2152864764E-04, -.1486624123E+00
A space is required before each number, followed by a sign, if the number is negative, and the number format is comma delimited, in exponential form to 10 significant digits.
Just in case Matlab is not able to write to this format (-.1769325597E+00), what is it called specifically so that I can research other means of solving my problem?
Although this feels morally wrong, one can use regular expressions to move the decimal point. This is what the function
myFormat = @(x) regexprep(sprintf('%.9e', 10*x), '(\d)\.', '\.$1');
does. The input value is multiplied by 10 prior to formatting, to account for the point being moved. Example: myFormat(-pi^7) returns -.3020293228e+04.
The above works for individual numbers. The following version is also able to format arrays, providing comma separators. The second regexprep removes the trailing comma.
myArrayFormat = @(x) regexprep(regexprep(sprintf('%.9e, ', 10*x), '(\d)\.', '\.$1'), ', $', '');
Example: myArrayFormat(1000*rand(1,5)-500) returned
-.2239749230e+03, .1797026769e+03, .1550980040e+03, -.3373882648e+03, -.3810023184e+03
For individual numbers, myArrayFormat works identically to myFormat.
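For readers who want to check the decimal-point trick outside MATLAB, here is a rough Python port of the same idea. The function names format_leading_dot and format_array are hypothetical, and the output keeps the lowercase e produced by the format string, as in the MATLAB version.

```python
import math
import re

def format_leading_dot(x):
    """Format x with 10 significant digits and the decimal point moved in front."""
    # Multiply by 10 so that moving the point one place left leaves the value unchanged.
    text = "%.9e" % (10 * x)
    # Swap the single leading digit and the dot: "3.02..." becomes ".302...".
    return re.sub(r"(\d)\.", r".\1", text)

def format_array(values):
    """Format several numbers, comma separated."""
    return ", ".join(format_leading_dot(v) for v in values)

print(format_leading_dot(-math.pi ** 7))  # -.3020293228e+04
print(format_array([1.0, -0.5]))          # .1000000000e+01, -.5000000000e+00
```

Joining the formatted pieces avoids the trailing comma, so no second substitution is needed here.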
I want to read a matrix from a file using the following syntax in MATLAB. The matrix contains double-precision numbers.
readmtx(fname,nrows,ncols,precision)
All of the inputs here are familiar to me except precision. The precision for an int is 'int16'. What is the precision for a double?
In this case, the documentation states:
Both binary and formatted data files can be read. If the file is binary, the precision argument is a format string recognized by fread. Repetition modifiers such as '40*char' are not supported. If the file is formatted, precision is a fscanf and sscanf-style format string of the form '%nX', where n is the number of characters within which the formatted data is found, and X is the conversion character such as 'g' or 'd'. Fortran-style double-precision output such as '0.0D00' can be read using a precision string such as '%nD', where n is the number of characters per element. This is an extension to the C-style format strings accepted by sscanf. Users unfamiliar with C should note that '%d' is preferred over '%i' for formatted integers. MATLAB syntax follows C in interpreting '%i' integers with leading zeros as octal. Formatted files with line endings need to provide the number of trailing bytes per row, which can be 1 for platforms with carriage returns or linefeed (Macintosh, UNIX®), or 2 for platforms with carriage returns and linefeeds (DOS).
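In addition, it is helpful to look at the table summary in the fread documentation, which lists 'double' (equivalently 'float64') as the precision string for 64-bit floating-point values in a binary file.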
What are the valid formats for numbers in MATLAB? The following seem to be valid:
x=0;
x=0.;
x=0.0;
x=0e0;
x=0E0;
x=0000.00; % Trailing and leading zeros seem to be irrelevant
Are there other valid general number specifications? I can't find this in the documentation.
I believe this is the regex of floating-point number formats, valid in MATLAB:
^[-+]*([0-9]+|[0-9]*\.[0-9]+|[0-9]+\.[0-9]*)([eEdD][+-]?[0-9]+)?$
Compiled from here, and slightly modified for MATLAB:
added 'd' exponent character (as is common in FORTRAN, MATLAB's ancestor)
added uppercase exponent characters
added extra case in the required order before and after the decimal symbol
I'm pretty sure the locale can mess this up, e.g., the decimal separator . might be set to , as is common here in Europe. Oh well.
The regex in words:
string start, followed by
zero or more consecutive sign symbols, followed by
non-zero length string of consecutive digits, OR
possibly zero-length string of consecutive digits, followed by a dot, followed by non-zero length string of consecutive digits, OR
non-zero length string of consecutive digits, followed by a dot, followed by a possibly zero-length string of consecutive digits
optionally followed by the exponent part:
one of e, E, d or D.
zero or one sign symbols, followed by
non-zero length string of consecutive digits
followed by string terminator
Note that this is for non-complex floating point values. For complex values, you'd have to
use the regex once for the real, once for the imaginary part
append [ij]{1} to the imaginary part (only lower case)
take care of spacing (\s*) and a [+-]{1} in between the two parts
take care of the fact that the imaginary part may appear alone, but the real part may not appear with a trailing [+-]{1}, but no imaginary part.
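The regex itself can be exercised against the literals from the question with a short Python script. Note the limits of this sketch: it only checks what the pattern accepts, not what MATLAB's parser accepts, and the helper name looks_like_number is invented for the example.

```python
import re

# The pattern from the answer, copied verbatim.
NUMBER = re.compile(
    r"^[-+]*([0-9]+|[0-9]*\.[0-9]+|[0-9]+\.[0-9]*)([eEdD][+-]?[0-9]+)?$"
)

def looks_like_number(text):
    """Return True if text matches the floating-point pattern above."""
    return NUMBER.match(text) is not None

# Literals from the question plus a few extra cases the pattern allows.
accepted = ["0", "0.", "0.0", "0e0", "0E0", "0000.00", ".5", "1d3", "--1", "2.5D-3"]
# Strings the pattern should reject.
rejected = [".", "e5", "1e", "1.2.3", "", "1e+"]

for s in accepted:
    print(repr(s), looks_like_number(s))  # each prints True
for s in rejected:
    print(repr(s), looks_like_number(s))  # each prints False
```

The "--1" case shows the effect of [-+]*: any run of sign symbols is allowed before the digits.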
Is there a nice and clean way to find runs of 2-4 capital letters within a larger string in MATLAB? For example, let's say I have a string...
stringy = 'I imagine I could FLY';
Is there a nice way to just extract the FLY portion of the string? Currently I'm using the upper() function to identify all the characters in the string that are upper case like this...
for count = 1:length(stringy)
    if upper(stringy(count))==stringy(count)
        isupper(count)=1;
    else
        isupper(count)=0;
    end
end
And then I'm just going through the binary vector and identifying where there are 2-4 1's in a row. This method is working, but I'm wondering if there is a cleaner way to do this. Thanks!
You can use regular expressions for this. The regular expression [A-Z]{2,4} will search for 2-4 capital letters in a string.
The corresponding matlab function is called regexp.
regexp(string,pattern) returns the starting indices in string of every place where pattern matches.
For your pattern I have two suggestions:
\<[A-Z]{2,4}\>. This searches for whole words that consist of 2-4 capital letters (so it doesn't grab TOUCH below):
stringy = 'I imagine I could FLY and TOUCH THE SKY';
regexp(stringy,'\<[A-Z]{2,4}\>') % returns 19, 33, 37 ('FLY','THE','SKY')
(Edit: Matlab uses \< and \> for word boundaries not the standard \b).
If you have strings where case can be mixed within a word and you want to extract those, try (?<![A-Z])[A-Z]{2,4}(?![A-Z]) (which means "2-4 capital letters that aren't surrounded by capital letters"):
stringy = 'I image I could FLYandTouchTHEsky';
% returns 17 and 28 ('FLY', 'THE')
regexp(stringy,'(?<![A-Z])[A-Z]{2,4}(?![A-Z])')
% note '\<[A-Z]{2,4}\>' wouldn't match anything here since it looks for
% *whole words* that consist of 2-4 capital letters only.
% 'FLYandTouchTHEsky' doesn't satisfy this.
Pick the regex based on what behaviour you want to occur.
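The difference between the two patterns can also be seen in a small Python sketch. Two assumptions to keep in mind: Python's re module does not support \< and \>, so \b stands in for the word boundaries, and the hypothetical helper find_caps adds 1 to each start position so the indices line up with MATLAB's 1-based results.

```python
import re

def find_caps(text, pattern):
    """Return (1-based start index, matched text) for each match of pattern."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, text)]

# Whole words of 2-4 capitals (Python spelling of the \< ... \> pattern).
WHOLE_WORD = r"\b[A-Z]{2,4}\b"
# 2-4 capitals not preceded or followed by another capital.
NOT_SURROUNDED = r"(?<![A-Z])[A-Z]{2,4}(?![A-Z])"

print(find_caps("I imagine I could FLY and TOUCH THE SKY", WHOLE_WORD))
# [(19, 'FLY'), (33, 'THE'), (37, 'SKY')]
print(find_caps("I image I could FLYandTouchTHEsky", NOT_SURROUNDED))
# [(17, 'FLY'), (28, 'THE')]
print(find_caps("I image I could FLYandTouchTHEsky", WHOLE_WORD))
# []
```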
I have a requirement in Ab Initio to format a number in left alignment. I shouldn't be using String conversion (as Strings are left aligned by default), as it might cause compatibility problems in the other end.
For example, if my Field has 7 bytes length, and I'm getting only two digits as my input, then these two digits should go into the first two bytes of my field (left aligned), instead of the last two bytes.
So, is there any in-built function in Ab Initio, that can format a number as left aligned?
You can convert it to string and let it ride. Ab Initio will automatically convert between string and decimal. Also, the physical representation will be the same for these two types.
If you are trying to use a non-ASCII format (int, float, etc.), I don't think there is a built-in function for this. You will probably have to do something rough: cast the value to a void type, convert that to a string type using hex_to_string() to preserve the exact bits, and then right-pad with spaces.