Java double to BigDecimal conversion adding extra trailing zero - double

double d = 0.0000001;
BigDecimal d1 = BigDecimal.valueOf(d);
System.out.println(d1.toPlainString());
This prints 0.00000010, but I am expecting 0.0000001.

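First, to get a precise BigDecimal value, avoid using a float or double to initialize it. The conversion from text to a floating-point type is by definition not guaranteed to be precise. The conversion from such a type to BigDecimal is precise, but may not yield exactly the result you expected. In your case, BigDecimal.valueOf(d) goes through Double.toString(d), which yields "1.0E-7", so the resulting BigDecimal keeps that trailing zero as a digit.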
If you can avoid using a double or float for initialization, then do something like this instead:
BigDecimal d = new BigDecimal("0.0000001");
This will already give you the desired output, because the conversion from (decimal) text to BigDecimal is precise.
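As a minimal sketch of the difference (the class name BigDecimalSources and its helper methods are made up for illustration), the following compares three ways of obtaining a BigDecimal for this value: from decimal text, via BigDecimal.valueOf(double), and via the BigDecimal(double) constructor, which keeps the exact binary value of the double.

```java
import java.math.BigDecimal;

class BigDecimalSources {
    // Hypothetical helper: builds the value from decimal text, keeping exactly the digits written.
    static String fromText() {
        return new BigDecimal("0.0000001").toPlainString();
    }

    // Hypothetical helper: valueOf(double) goes through Double.toString(double).
    static String fromValueOf() {
        return BigDecimal.valueOf(0.0000001).toPlainString();
    }

    // Hypothetical helper: the double constructor keeps the exact binary expansion.
    static BigDecimal fromDoubleConstructor() {
        return new BigDecimal(0.0000001);
    }

    public static void main(String[] args) {
        System.out.println(fromText());     // 0.0000001
        System.out.println(fromValueOf());  // 0.00000010
        // Far more than 7 decimal places, because 0.0000001 has no exact binary representation.
        System.out.println(fromDoubleConstructor().scale());
    }
}
```

Only the text-based version carries exactly the seven decimal places that were typed.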
But if you really must convert from a double, then try:
double d = 0.0000001;
BigDecimal d1 = BigDecimal.valueOf(d);
System.out.println(d1.stripTrailingZeros().toPlainString());
This will strip (remove) any trailing zeroes.
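One detail worth checking, sketched here with made-up names (StripZerosDemo, plain, scientific): after stripTrailingZeros() the scale can shrink, even below zero for whole numbers, so toString() may switch to scientific notation while toPlainString() does not.

```java
import java.math.BigDecimal;

class StripZerosDemo {
    // Hypothetical helper: strip trailing zeros, then print without an exponent.
    static String plain(BigDecimal value) {
        return value.stripTrailingZeros().toPlainString();
    }

    // Hypothetical helper: same stripping, but via toString(), which may use an exponent.
    static String scientific(BigDecimal value) {
        return value.stripTrailingZeros().toString();
    }

    public static void main(String[] args) {
        BigDecimal small = BigDecimal.valueOf(0.0000001);
        System.out.println(plain(small));                       // 0.0000001
        System.out.println(scientific(small));                  // 1E-7
        System.out.println(plain(new BigDecimal("100")));       // 100
        System.out.println(scientific(new BigDecimal("100")));  // 1E+2
    }
}
```

This is why the answer pairs stripTrailingZeros() with toPlainString() rather than relying on the default string conversion.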

You can use a DecimalFormat to format your BigDecimal.
double d = 0.0000001;
BigDecimal d1 = BigDecimal.valueOf(d);
DecimalFormat df = new DecimalFormat("#0.0000000");
System.out.println(df.format(d1));
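A hedged sketch of how the pattern affects the result (the class FormatDemo and the two pattern constants are made up for illustration): a 0 in the pattern forces a digit, while a # shows a digit only when needed, so a fixed pattern pads shorter values with zeros. The symbols are pinned to Locale.ROOT here on the assumption that you want a dot as the decimal separator regardless of the default locale.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

class FormatDemo {
    // Pattern from the answer: always exactly seven fraction digits.
    static final String FIXED = "#0.0000000";
    // Assumed alternative: at most seven fraction digits, none forced.
    static final String OPTIONAL = "0.#######";

    // Hypothetical helper: formats a BigDecimal with locale-independent symbols.
    static String format(String pattern, BigDecimal value) {
        DecimalFormat df = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(value);
    }

    public static void main(String[] args) {
        BigDecimal small = BigDecimal.valueOf(0.0000001);
        BigDecimal half = new BigDecimal("0.5");
        System.out.println(format(FIXED, small));     // 0.0000001
        System.out.println(format(FIXED, half));      // 0.5000000
        System.out.println(format(OPTIONAL, small));  // 0.0000001
        System.out.println(format(OPTIONAL, half));   // 0.5
    }
}
```

Note that either pattern rounds away anything beyond the seventh decimal place, so this approach fits display purposes rather than exact arithmetic.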

Related

Why does this convert Int to Double in Swift playground not work?

let f = 802
var df = Double(f / 4)
print(df)
result is 200.0
I expected 200.5
Your expression creates a Double from the result of dividing the integer 802 by 4, which is 200.
If you want floating-point division, you have to do the conversion before the division. Or, more simply, declare f as a Double so that no conversion is needed:
let f = 802.0
let df = f / 4 // 200.5
It's a good practice anyway to declare numeric literals as the actual type. I would even write
let df = f / 4.0
The benefit is that the compiler complains if the types don't match.
This is a common bug and easy mistake to make. When you want a floating point result, you need to ensure that the operands of your arithmetic expressions are floating point numbers instead of integers. This will produce the expected result of 200.5:
var df = Double(f) / 4.0
(Edit: if your variable f really is going to be a hard-coded constant 802, I actually recommend vadian's solution of declaring f itself as Double rather than Int.)
A more detailed explanation:
Looking at the order of operations of var df = Double(f / 4):
The innermost expression is f / 4. This is evaluated first. f and 4 are both integers, so this is calculated using integer division, which discards the fractional part (truncating toward zero), so 802/4 => 200.
Then the result 200 is used in the Double() conversion, thus the result of 200.0. Finally, the result is assigned to the newly-declared variable df, which Swift infers to have the type Double based on the expression to the right of the equals sign.
Compare this to var df = Double(f) / 4.0: the Double(f) is evaluated first, converting the integer 802 to a double value 802.0. Now the division is performed, and since both operands of the division sign are floating point, floating-point division is performed and you get the result 802.0 / 4.0 => 200.5. This result is a Double value, so the variable df is declared to be a Double and assigned the value 200.5.
Some other approaches that don't work:
var df = f / 4: f and 4 are both integers, integer division is performed automatically, and df is now a variable of type Int with value 200
var df: Double = f / 4: trying to explicitly declare df as Double will produce a compiler error. The right side of the equals sign is still an integer division, and Swift won't automatically convert from Int to Double; it wants you to decide explicitly how to convert
var df = f / 4.0: in some languages, this type of expression would automatically convert f to a Double and thus perform floating-point division like you want. But again Swift will not automatically convert and wants you to be explicit…this leads to my recommended solution of Double(f)/4.0
In your example you are dividing integers and then casting the result to Double.
Fix:
let f = 802
var df = Double(f) / 4
print(df)

Error in converting String column to binary column [duplicate]

I have ported Java code to C#.
Could you please explain why I get a compile-time error on the following line (I use VS 2008):
private long l = 0xffffffffffffffffL; // 16 'f' got here
Cannot convert source type ulong to target type long
I need the same value here as in the original Java code.
Java doesn't mind if a constant overflows in this particular situation - the value you've given is actually -1.
The simplest way of achieving the same effect in C# is:
private long l = -1;
If you want to retain the 16 fs you could use:
private long l = unchecked((long) 0xffffffffffffffffUL);
If you actually want the maximum value for a signed long, you should use:
// Java
private long l = Long.MAX_VALUE;
// C#
private long l = long.MaxValue;
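To check the claim above that the Java literal evaluates to -1, here is a small sketch (the class name HexLiteralDemo and the constant name ALL_ONES are made up for illustration). It assumes Java 8 or later for Long.toUnsignedString.

```java
class HexLiteralDemo {
    // Same literal as in the question: all 64 bits set.
    static final long ALL_ONES = 0xffffffffffffffffL;

    public static void main(String[] args) {
        System.out.println(ALL_ONES);                               // -1
        System.out.println(ALL_ONES == -1L);                        // true
        System.out.println(Long.toHexString(ALL_ONES));             // ffffffffffffffff
        // The same bit pattern read as an unsigned 64-bit number.
        System.out.println(Long.toUnsignedString(ALL_ONES));        // 18446744073709551615
        // The largest signed value has the top bit cleared.
        System.out.println(Long.MAX_VALUE == 0x7fffffffffffffffL);  // true
    }
}
```

So whether the port should use -1, an unchecked cast, or an unsigned type depends on whether the Java code treats the field as a number or as a bit mask.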
Assuming you aren't worried about negative values, you could try using an unsigned long:
private ulong l = 0xffffffffffffffffL;
In Java the actual value of l would be -1, because it would overflow the 2^63 - 1 maximum value, so you could just replace your constant with -1.
0xffffffffffffffff is larger than a signed long can represent.
You can insert a cast:
private long l = unchecked( (long)0xffffffffffffffffL);
Since C# uses two's complement, 0xffffffffffffffff represents -1:
private long l = -1;
Or declare the variable as unsigned, which is probably the cleanest choice if you want to represent bit patterns:
private ulong l = 0xffffffffffffffffL;
private ulong l = ulong.MaxValue;
The maximal value of a signed long is:
private long l = 0x7fffffffffffffffL;
But that's better written as long.MaxValue.
You could do this:
private long l = long.MaxValue;
... but as mdm pointed out, you probably actually want a ulong.
private ulong l = ulong.MaxValue;

convert number string into float with specific precision (without getting rounding errors)

I have a vector of cells (say, of size 50x1, called tokens), each of which is a struct with properties x, f1, f2, which are strings representing numbers. For example, tokens{15} gives:
x: "-1.4343429"
f1: "15.7947111"
f2: "-5.8196158"
and I am trying to put those numbers into 3 vectors (each is also 50x1) whose type is float. So I create 3 vectors:
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
and that works fine (why wouldn't it?). But then when I try to populate those vectors: (L is a for loop index)
x(L)=tokens{L}.x;
.. also for the other 2
I get :
The following error occurred converting from string to single:
Conversion to single from string is not possible.
Which I can understand; implicit conversion doesn't work for single. It does work if x, f1 and f2 are of type 50x1 double.
The reason I am doing it with floats is that the data I get is from a C program which writes some floats into a file to be read by MATLAB. If I try to convert the values into doubles in the C program I get rounding errors...
So, (after what I hope is a good question,) how might I be able to get the numbers in those strings, at the right precision? (all the strings have the same number of decimal places: 7).
The MCVE:
filedata = fopen('fname1.txt','rt');
%fname1.txt is created by a C program. I am quite sure that the problem isn't there.
scanned = textscan(filedata,'%s','Delimiter','\n');
raw = scanned{1};
stringValues = strings(50,1);
for K=1:length(raw)
stringValues(K)=raw{K};
end
clear K %purely for convenience
regex = 'x=(?<x>[\-\.0-9]*),f1=(?<f1>[\-\.0-9]*),f2=(?<f2>[\-\.0-9]*)';
tokens = regexp(stringValues,regex,'names');
x = zeros(50,1,'single');
f1 = zeros(50,1,'single');
f2 = zeros(50,1,'single');
for L=1:length(tokens)
x(L)=tokens{L}.x;
f1(L)=tokens{L}.f1;
f2(L)=tokens{L}.f2;
end
Use the str2double function before assigning into your arrays (and then cast the result to single if you want). Strings (and char arrays) must be explicitly converted to numbers before they can be used as numbers.

How to specify output type in MATLAB when using calllib

Can I specify the output type when using calllib? My problem is that MATLAB automatically converts my output to a double even though I need an int64, so I am losing needed precision.
Example
I have the following function defined in my_header.h
__int64 my_function(int arg1);
I can call the function like this:
loadlibrary('my_library', 'my_header.h')
output = calllib('my_library', 'my_function', arg1)
But then output is a double and I am losing needed precision.
What I tried
output = int64(calllib('my_library', 'my_function', arg1))
as well as
output = zeros(1, 'int64')
output(1) = calllib('my_library', 'my_function', arg1)
but these just convert my double to int64 after it has already lost the needed precision.

Vector/Array to integer

-(void)userShow{
    xVal = new vector<double>();
    yVal = new vector<double>();
    xyVal = new vector<double>();
    xxVal = new vector<double>();
    value = new vector<double>();
    for(it = xp->begin(); it != xp->end(); ++it){
        xVal->push_back(it->y);
        xxVal->push_back(it->x);
    }
    for(it = yp->begin(); it != yp->end(); ++it){
        xyVal->push_back(it->x);
        yVal->push_back(it->y);
    }
    for (int i = 0; i < xVal->size(); i++){
        int c = (*xVal)[i];
        for(int i = 0; xyVal[i] < xxVal[i]; i++){
            double value = yVal[c-1] + (yVal[c] - yVal[c-1])*(xxVal[i] - xyVal[c-1])/(xyVal[c] - xyVal[c-1]);
            yVal->push_back(value);
        }
    }
}
I am having an issue with the double value = ... part of my code. I get three errors saying invalid operands to binary expression ('vector<double>' and 'vector<double>') pointing to the c.
Should int c = (*xVal)[i]; be double c = (*xVal)[i]; instead? When I try to use double, I get 6 errors saying Array subscript is not an integer, which suggests I need to convert the array into an integer. How am I getting an array if I am using vectors? Just a lot of confusion at the moment.
I'm not sure whether I need to explain what the code is supposed to do, but in case it helps: I am trying to take two vectors and split each one's x and y values into separate x and y vectors, then take the y of xp and the y of yp and put them together. Because the xp and yp vectors do not match, I need the for loop and the double value formula to get a decent set of numbers.
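The c is fine. The problem really is in double value = ..., as your compiler says. xVal, yVal and the others are pointers to vectors, so yVal[c-1] indexes the pointer and yields a whole vector<double> rather than an element, which is why the compiler complains about operands of type vector<double>. You can't access the elements like this: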
double value = yVal[c-1] + ...
It must be
double value = (*yVal)[c-1] + ...
The same for xyVal, xxVal, etc. You need to fix the whole inner for loop.
But why do you allocate the vectors like this? Is there any reason to use new? It is very error-prone. I would simply use
vector<double> xVal;
instead of
xVal = new vector<double>();
And then use . instead of -> combined with *. It's so much easier.
Ah, I forgot about the question regarding c: no, it should not be double. You can't use floating-point numbers as indices. Also, if xVal is supposed to contain integers (so that they can be used as indices), why not declare the vector as vector<int> instead of vector<double>? I don't know what the logic of your program is, but it looks like it should be improved, IMO.