What are the valid formats for numbers in MATLAB? The following seem to be valid:
x=0;
x=0.;
x=0.0;
x=0e0;
x=0E0;
x=0000.00; % Trailing and leading zeros seem to be irrelevant
Are there other valid general number specifications? I can't find this in the documentation.
I believe the following regex describes the floating-point number formats that are valid in MATLAB:
^[-+]*([0-9]+|[0-9]*\.[0-9]+|[0-9]+\.[0-9]*)([eEdD][+-]?[0-9]+)?$
Compiled from here, and slightly modified for MATLAB:
added 'd' exponent character (as is common in FORTRAN, MATLAB's ancestor)
added uppercase exponent characters
added an extra alternative so that the digits may be missing on either side of the decimal symbol, but not on both (.5 and 5. are accepted, a lone . is not)
I'm pretty sure the locale can mess this up, e.g., the decimal separator . might be set to , as is common here in Europe. Oh well.
The regex in words:
string start, followed by
zero or more consecutive sign symbols, followed by
a non-empty string of consecutive digits, OR
a possibly empty string of consecutive digits, followed by a dot, followed by a non-empty string of consecutive digits, OR
a non-empty string of consecutive digits, followed by a dot, followed by a possibly empty string of consecutive digits,
optionally followed by the exponent part:
one of e, E, d or D, followed by
zero or one sign symbols, followed by
a non-empty string of consecutive digits,
followed by string terminator
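To sanity-check the pattern itself (not MATLAB's parser), here is a minimal sketch in Python, whose re module accepts the regex unchanged. The helper name looks_like_number and the sample strings are my own, chosen only for illustration.

```python
import re

# Pattern copied verbatim from the answer above.
NUMBER = re.compile(
    r'^[-+]*([0-9]+|[0-9]*\.[0-9]+|[0-9]+\.[0-9]*)([eEdD][+-]?[0-9]+)?$'
)

def looks_like_number(text):
    """Return True if `text` matches the pattern (hypothetical helper)."""
    return NUMBER.match(text) is not None

# Strings the pattern should accept, including the forms from the question.
accepted = ['0', '0.', '0.0', '0e0', '0E0', '0000.00', '.5', '1d3', '--1', '+2.5E-3']
# Strings the pattern should reject: no digits, dangling exponent, two dots.
rejected = ['.', 'e5', '1e', '1e+', '1.2.3', '']

for s in accepted + rejected:
    print(repr(s), looks_like_number(s))
```

Whether MATLAB itself accepts every string in the first list is a separate question; the sketch only shows what the regex does.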
Note that this is for non-complex floating point values. For complex values, you'd have to
use the regex once for the real, once for the imaginary part
append [ij]{1} to the imaginary part (only lower case)
take care of spacing (\s*) and a [+-]{1} in between the two parts
take care of the fact that the imaginary part may appear alone, whereas the real part may not be followed by a dangling [+-]{1} without an imaginary part.
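The steps above can be sketched as follows, again in Python and again only as a check of the regex logic. The names U, S and COMPLEX are mine, and the pattern follows the recipe in this answer rather than any documented MATLAB grammar.

```python
import re

# Unsigned real number: same digit and exponent alternatives as the regex above.
U = r'(?:[0-9]+\.[0-9]*|[0-9]*\.[0-9]+|[0-9]+)(?:[eEdD][+-]?[0-9]+)?'
# Signed real number: zero or more leading sign symbols.
S = r'[-+]*' + U

COMPLEX = re.compile(
    r'^(?:'
    + S + r'(?:\s*[+-]\s*' + U + r'[ij])?'  # real part, optionally followed by +/- imaginary part
    + r'|'
    + S + r'[ij]'                           # imaginary part on its own
    + r')$'
)

def looks_like_complex(text):
    """Return True if `text` matches the assembled pattern (hypothetical helper)."""
    return COMPLEX.match(text) is not None

for s in ['1+2i', '3.5e2 - 1.0j', '2i', '-4.', '1+', '1+2I', 'i']:
    print(repr(s), looks_like_complex(s))
```

Because the sign and the imaginary part sit inside one optional group, a real part with a trailing sign and nothing after it is rejected, which is the last point in the list.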
How can I write a regular expression that recognizes any expression of the form "\xdd", where dd represents a hexadecimal number in the range 00-7F?
Regular expressions do not express numerical ranges, but sequences of characters in a character set. You have to express those ranges one character at a time.
So the hex digits are [0-9A-F] which describes the set of characters for one digit using the two ranges [0-9] and [A-F] (you'd also have to decide if lower case letters are permitted). For two digits you'd have to notice that the first digit is of a shorter range using only [0-7]. The combined result would be:
[0-7][0-9A-Fa-f]
Putting the other symbols in place we could get:
\\x[0-7][0-9A-Fa-f]
(Assuming \ is a meta-character that needs escaping).
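A quick way to try this out is a small Python sketch. I am assuming Python's re module here (the question does not name a regex engine), and is_ascii_hex_escape is a made-up helper name.

```python
import re

# Backslash, literal x, first hex digit limited to 0-7, second hex digit unrestricted.
HEX_ESCAPE = re.compile(r'\\x[0-7][0-9A-Fa-f]')

def is_ascii_hex_escape(text):
    """Return True if `text` is exactly one \\xdd escape with dd in 00-7F."""
    return HEX_ESCAPE.fullmatch(text) is not None

# Raw strings keep the backslash as a literal character in the test input.
for s in [r'\x00', r'\x7F', r'\x4a', r'\x80', r'\xFF', r'\x7', r'\x7G']:
    print(s, is_ascii_hex_escape(s))
```

Using fullmatch means the whole string must be the escape; use search instead to find such escapes inside longer text.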
I used dlmwrite to output some data in the following form:
-1.7693255974E+00,-9.7742420654E-04, 2.1528647648E-04,-1.4866241234E+00
What I really want is the following format:
-.1769325597E+00, -.9774242065E-04, .2152864764E-04, -.1486624123E+00
A space is required before each number, followed by a sign, if the number is negative, and the number format is comma delimited, in exponential form to 10 significant digits.
Just in case Matlab is not able to write to this format (-.1769325597E+00), what is it called specifically so that I can research other means of solving my problem?
Although this feels morally wrong, one can use regular expressions to move the decimal point. This is what the function
myFormat = @(x) regexprep(sprintf('%.9e', 10*x), '(\d)\.', '\.$1');
does. The input value is multiplied by 10 prior to formatting, to account for the point being moved. Example: myFormat(-pi^7) returns -.3020293228e+04.
The above works for individual numbers. The following version is also able to format arrays, providing comma separators. The second regexprep removes the trailing comma.
myArrayFormat = @(x) regexprep(regexprep(sprintf('%.9e, ', 10*x), '(\d)\.', '\.$1'), ', $', '');
Example: myArrayFormat(1000*rand(1,5)-500) returned
-.2239749230e+03, .1797026769e+03, .1550980040e+03, -.3373882648e+03, -.3810023184e+03
For individual numbers, myArrayFormat works identically to myFormat.
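If you end up doing this outside MATLAB, the same trick carries over. Below is a minimal Python sketch of the idea, not a port of the MATLAB code: the function names leading_dot_format and leading_dot_join and the digits parameter are my own.

```python
import math
import re

def leading_dot_format(x, digits=10):
    """Format x in exponential form with the decimal point before the first digit."""
    # Scale by 10 so that moving the point one place left keeps the value unchanged.
    text = '%.*e' % (digits - 1, 10 * x)
    # Swap the leading digit and the decimal point: '3.02...' becomes '.302...'.
    return re.sub(r'(\d)\.', r'.\1', text, count=1)

def leading_dot_join(values, digits=10):
    """Format several values and join them with ', ' (no trailing comma to strip)."""
    return ', '.join(leading_dot_format(v, digits) for v in values)

print(leading_dot_format(-math.pi ** 7))  # -.3020293228e+04
print(leading_dot_join([0.5, -0.25]))     # .5000000000e+00, -.2500000000e+00
```

Switching the conversion character from e to E in the format string gives the uppercase exponent marker shown in the question.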
These two long numbers are the same except for the last digit.
test = [];
test(1) = 33777100285870080;
test(2) = 33777100285870082;
but the last digit is lost when the numbers are put in the array:
unique(test)
ans = 3.3777e+16
How can I prevent this? The numbers are ID codes and losing the last digit is screwing everything up.
Matlab uses 64-bit floating point representation by default for numbers. Those have a base-10 16-digit precision (more or less) and your numbers seem to exceed that.
Use something like uint64 to store your numbers:
> test = [uint64(33777100285870080); uint64(33777100285870082)];
> disp(test(1));
33777100285870080
> disp(test(2));
33777100285870082
This is really a rounding error, not a display error. To get the correct strings for output purposes, use int2str, because, again, num2str uses a 64-bit floating point representation, and that has rounding errors in this case.
To add more explanation to @rubenvb's solution, your values are greater than flintmax for IEEE 754 double-precision floating point, i.e., greater than 2^53. Beyond this point, not all integers can be represented exactly as doubles. See also this related question.
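The effect is not specific to MATLAB. Here is a short Python sketch of the same rounding, using Python's float (an IEEE 754 double) in place of MATLAB's default type and Python's arbitrary-precision int as a stand-in for uint64.

```python
a = 33777100285870080
b = 33777100285870082

# As exact integers the two IDs differ.
print(a == b)                 # False

# Between 2^54 and 2^55 doubles are spaced 4 apart, so b rounds onto a.
print(float(a) == float(b))   # True
print(int(float(b)))          # 33777100285870080

# 2^53 is the first integer whose successor is not representable as a double.
limit = 2 ** 53
print(limit, float(limit) == float(limit + 1))   # 9007199254740992 True
```

So the last digit is already gone the moment the ID is stored as a double, which is why an integer type is needed from the start.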
I need to perform casefolding on a set of strings, and must ensure beforehand that they will not exceed a given length after this is done (to hard-code the needed buffer size). The problem is that a string length (in code points) may change after casefolding is applied. See, e.g., in Python3:
>>> "süß".casefold()
'süss'
Now, the maximum number of code points a string may contain after performing casefolding can be computed easily:
>>> max(len(chr(s).casefold()) for s in range(0x10FFFF + 1))
3
But is it valid in all cases? I mean, is it possible that the sequence of code points (the order in which they appear) might affect the final length of the string, due to some arcane property of Unicode? Or can I assume that the final string will always be at most 3 times longer than the original?
The Unicode standard defines casefolding as follows:
toCasefold(X): Map each character C in X to Case_Folding(C).
So every character in a string is casefolded regardless of context and the results are concatenated. This means that your assumption is correct: A casefolded string is guaranteed to have at most three times the number of code points of the original.
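As a spot check of that reasoning, the following sketch compares Python's str.casefold() against folding each character separately. The helper casefold_per_char and the sample strings are my own choices, and a handful of samples illustrates the definition rather than proving it.

```python
def casefold_per_char(text):
    """Fold each code point on its own and concatenate the results."""
    return ''.join(ch.casefold() for ch in text)

# '\u0390' and '\ufb03' each fold to three code points; 'ΣΑΣ' has a final sigma.
samples = ['süß', '\u0390', '\ufb03', 'İstanbul', 'ΣΑΣ', 'Maße und Gewichte']

for s in samples:
    folded = s.casefold()
    same = folded == casefold_per_char(s)
    print(repr(s), '->', repr(folded), len(s), len(folded), same)
```

For every sample the whole-string result equals the per-character result, so a buffer of three times the original length in code points is enough for these inputs.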
I want to read a matrix from a file using the following syntax in MATLAB. The matrix contains double-precision numbers.
readmtx(fname,nrows,ncols,precision)
All of the inputs are familiar to me except precision. For integers the precision is 'int16'. What is the precision for double numbers?
In this case, the documentation states:
Both binary and formatted data files can be read. If the file is binary, the precision argument is a format string recognized by fread. Repetition modifiers such as '40*char' are not supported. If the file is formatted, precision is a fscanf and sscanf-style format string of the form '%nX', where n is the number of characters within which the formatted data is found, and X is the conversion character such as 'g' or 'd'. Fortran-style double-precision output such as '0.0D00' can be read using a precision string such as '%nD', where n is the number of characters per element. This is an extension to the C-style format strings accepted by sscanf. Users unfamiliar with C should note that '%d' is preferred over '%i' for formatted integers. MATLAB syntax follows C in interpreting '%i' integers with leading zeros as octal. Formatted files with line endings need to provide the number of trailing bytes per row, which can be 1 for platforms with carriage returns or linefeed (Macintosh, UNIX®), or 2 for platforms with carriage returns and linefeeds (DOS).
In addition it is helpful to look at the table summary in the fread documentation: