T-SQL Decimal Division Accuracy

Does anyone know why, using SQL Server 2005,
SELECT CONVERT(DECIMAL(30,15),146804871.212533)/CONVERT(DECIMAL(38,9),12499999.9999)
gives me 11.74438969709659,
but when I increase the decimal places on the denominator to 15, I get a less accurate answer:
SELECT CONVERT(DECIMAL(30,15),146804871.212533)/CONVERT(DECIMAL(38,15),12499999.9999)
gives me 11.74438969?

For multiplication we simply add the number of decimal places in each argument together (using pen and paper) to work out the output decimal places.
But division just blows your head apart. I'm off to lie down now.
In SQL terms though, it's exactly as expected.
--Precision = p1 - s1 + s2 + max(6, s1 + p2 + 1)
--Scale = max(6, s1 + p2 + 1)
--Scale = max(6, 15 + 38 + 1) = 54
--Precision = 30 - 15 + 9 + 54 = 78
--Max P = 38; P & S are linked, so (78,54) -> (38,14)
--So we have a (38,14) output = 11.74438969709659
SELECT CONVERT(DECIMAL(30,15),146804871.212533)/CONVERT(DECIMAL(38,9),12499999.9999)
--Scale = max(6, 15 + 38 + 1) = 54
--Precision = 30 - 15 + 15 + 54 = 84
--Max P = 38; P & S are linked, so (84,54) -> (38,8)
--So we have a (38,8) output = 11.74438969
SELECT CONVERT(DECIMAL(30,15),146804871.212533)/CONVERT(DECIMAL(38,15),12499999.9999)
You can do the same math by hand if you follow this rule, treating each pair of operands as written out to their full declared scale:
146804871.212533000000000 and 12499999.999900000
146804871.212533000000000 and 12499999.999900000000000
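If it helps to see those rules mechanically, here is a minimal sketch in Swift (my own addition; divisionResultType is just a made-up name) that applies the documented formulas plus the cap at precision 38. It ignores the minimum-scale floor that later SQL Server versions apply, which doesn't matter for these inputs:
// Sketch of SQL Server's DECIMAL division result-type rules, as documented:
//   precision = p1 - s1 + s2 + max(6, s1 + p2 + 1)
//   scale     = max(6, s1 + p2 + 1)
// If precision exceeds 38, both are reduced by the same amount so precision becomes 38.
func divisionResultType(p1: Int, s1: Int, p2: Int, s2: Int) -> (precision: Int, scale: Int) {
    var scale = max(6, s1 + p2 + 1)
    var precision = p1 - s1 + s2 + scale
    if precision > 38 {
        let overflow = precision - 38   // digits that have to be thrown away
        precision = 38
        scale -= overflow               // the scale absorbs the loss
    }
    return (precision, scale)
}

print(divisionResultType(p1: 30, s1: 15, p2: 38, s2: 9))   // (precision: 38, scale: 14)
print(divisionResultType(p1: 30, s1: 15, p2: 38, s2: 15))  // (precision: 38, scale: 8)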

To put it shortly: use DECIMAL(25,13) and you'll be fine with all calculations - you'll get the precision you declared: 12 digits before the decimal point and 13 decimal digits after it.
The rule is: p + s must equal 38 and you will be on the safe side!
Why is this?
Because of the very bad implementation of arithmetic in SQL Server!
Until they fix it, follow that rule.
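Working the division formulas quoted above through for DECIMAL(25,13) / DECIMAL(25,13) (my own check, not part of the original answer) shows why: the scale is max(6, 13 + 25 + 1) = 39 and the precision is 25 - 13 + 13 + 39 = 64; capping 64 to 38 throws away 26 digits, and 39 - 26 = 13, so the full declared scale of 13 survives. More generally, when both operands are the same DECIMAL(p, s) with p + s = 38, the cap removes exactly p + 1 digits and the division result comes out as DECIMAL(38, s), which is why the rule preserves the scale you declared.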

I've noticed that if you cast the divisor to float, you get the expected answer, i.e.:
select 49/30 (result = 1)
would become:
select 49/cast(30 as float) (result = 1.63333333333333)

We were puzzling over the magic transition in the accepted answer:
P & S are linked, so:
(78,54) -> (38,14)
(84,54) -> (38,8)
The following is the math:
i. 78 - 38 = 40,
ii. 54 - 40 = 14
i. 84 - 38 = 46,
ii. 54 - 46 = 8
And this is the reasoning:
i. The output precision less the maximum precision is the number of digits we're going to throw away.
ii. Then the output scale less what we're going to throw away gives us the remaining digits in the output scale.
Hope this helps anyone else trying to make sense of this.

Convert the expression, not the arguments:
select CONVERT(DECIMAL(38,36),146804871.212533 / 12499999.9999)

Using the following may help:
SELECT COL1 * 1.0 / COL2

Find the number at the nth position in the infinite sequence

Given the infinite sequence s = 1234567891011...
let's find the digit at position n (n <= 10^18).
Example: n = 12 => 1; n = 15 => 2
import Foundation

func findNumber(n: Int) -> Character {
    var i = 1
    var z = ""
    // Build the sequence by appending 1, 2, 3, ..., n
    while i < n + 1 {
        z.append(String(i))
        i += 1
    }
    print(z)
    return z[z.index(z.startIndex, offsetBy: n - 1)]
}

print(findNumber(n: 12))
That's my code, but when I look for the digit at the 100,000th position it returns an error; I think I appended too many numbers to the string z.
Can anyone help me, in Swift?
The problem here looks fairly straightforward: take the list of all the numbers from 1 to infinity, concatenate them into a string, and then find the nth digit. A straightforward problem to understand. The issue you are seeing, though, is that we have neither an infinite amount of memory nor an infinite amount of time to do this in a computer program. So we must find an alternative approach that does not just append the numbers onto a string and then find the nth digit.
The first thing we can say is that we know what the entire list is. It will always be the same. So can we use any properties of this list to help us?
Let's call the input number n. This is the position of the digit that we want to find. Let's call the output digit d.
Well, first off, let's look at some examples.
We know all the single digit numbers are just in the same position as the number itself.
So, for n<10 ... d = n
What about for two digit numbers?
Well, we know that 10 starts at position 10. (Because there are 9 single digit numbers before it). 9 + 1 = 10
11 starts at position 12. Again, 9 single digits + one 2 digit number before it. 9 + 2 + 1 = 12
So how about, say... 25? Well that has 9 single digit numbers and 15 two digit numbers before it. So 25 starts at 9*1 + 15*2 + 1 = 40 (+ 1 as the sum gets us to the end of 24 not the start of 25).
So... 99 starts at? 9*1 + 89*2 + 1 = 188.
Then we do the same for the three-digit numbers...
100... 9*1 + 90*2 + 1 = 190
300... 9*1 + 90*2 + 200*3 + 1 = 790
1000...? 9*1 + 90*2 + 900*3 + 1 = 2890
OK... so now I'm seeing a pattern here that seems to need to know the number of digits in each number. Well... I can get the number of digits in a number by rounding up the log (base 10) of that number (careful: for exact powers of ten, like 10 or 100, you need floor(log10) + 1 instead).
rounding up log base 10 of 5 = 1
rounding up log base 10 of 23 = 2
rounding up log base 10 of 99 = 2
rounding up log base 10 of 627 = 3
OK... so I think I need something like...
// in pseudo code
let lengthOfNumber = getLengthOfNumber(n)
var result = 0
for each i from 0 to lengthOfNumber - 2 {
    result += 9 * 10^i * (i + 1) // this gives 9*1 + 90*2 + 900*3 + ... for the complete digit groups
}
let remainder = n - 10^(lengthOfNumber - 1) // the same-length numbers before n, not covered by the loop above
result += remainder * lengthOfNumber
result += 1 // n starts one position after the previous number ends
So, in the above pseudo code you can give it any number and it will return the position in the list that that number starts on.
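To make that pseudocode concrete, here is a minimal Swift sketch of just that helper (startPosition is a name I made up, and this is deliberately not the full solution to the question):
// Position (1-based) in the sequence 123456789101112... at which `number` begins.
func startPosition(of number: Int) -> Int {
    let length = String(number).count   // digits in `number`
    var result = 0
    var groupCount = 9                  // 9 one-digit numbers, 90 two-digit, 900 three-digit, ...
    var groupStart = 1                  // first number of each group: 1, 10, 100, ...
    for digits in 1..<length {
        result += groupCount * digits   // digits contributed by every number shorter than `number`
        groupCount *= 10
        groupStart *= 10
    }
    result += (number - groupStart) * length   // same-length numbers that come before `number`
    return result + 1                   // `number` starts one position after the previous one ends
}

print(startPosition(of: 25))    // 40
print(startPosition(of: 99))    // 188
print(startPosition(of: 300))   // 790
print(startPosition(of: 1000))  // 2890
From here you still have to work out which number the nth digit falls inside and which of its digits it is; that part is left for you, as the answer intends.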
This isn't the exact same as the problem you are trying to solve. And I don't want to solve it for you.
This is just a leg up on how I would go about solving it. Hopefully, this will give you some guidance on how you can take this further and solve the problem that you are trying to solve.

How to efficiently perform nested-loop in Spark/Scala?

So I have this main dataframe, called main_DF which contain all measurement values:
main_DF
group   index    width   height
--------------------------------
1       1        21.3    15.2
1       2        11.3    45.1
2       3        23.2    25.2
2       4        26.1    85.3
...
23      986453   26.1    85.3
And another table called selected_DF, derived from main_DF, which contains the start & end indexes of important rows in main_DF, along with the length (end_index - start_index). The fields start_index and end_index correspond to the field index in main_DF.
selected_DF
group   start_index   end_index   length
--------------------------------
1       1             154         153
2       236           312         76
3       487           624         137
...
238     17487         18624       1137
Now, for each row in selected_DF, I need to filter all measurement values between start_index and end_index. For example, let's say row 1 covers index = 1 through 154. After some filtering, the dataframe derived from this row is:
peak_DF
peak_start   peak_end
--------------------------------
1            12
15           21
27           54
86           91
...
143          150
peak_start and peak_end indicate the area where width exceeds the threshold. They were obtained by selecting all rows with width > threshold and then checking the positions of their indexes (sorry, it's kind of hard to explain, even with the code).
Then I need to take the measurement values (width) based on peak_DF and calculate their average, giving something like:
peak_DF_summary
peak_start   peak_end   avg_width
--------------------------------
1            12         25.6
15           21         35.7
27           54         24.2
86           91         76.6
...
143          150        13.1
And, lastly, calculate the average of avg_width, and save the result.
After that, the curtain moves to the next row in selected_DF, and so on.
So far I somehow managed to obtain what I want with this code:
import org.apache.spark.sql.functions.{lead, lag, mean}
import spark.implicits._

val main_DF = spark.read.parquet("hdfs_path_here")
main_DF.createOrReplaceTempView("main_DF")
val selected_DF = spark.read.parquet("hdfs_path_here").collect.par //parallelized array
val final_result_array = scala.collection.mutable.ArrayBuffer.empty[Double] //for final result

selected_DF.foreach { x =>
  val temp = x.mkString(",").split(',')
  val start_index = temp(1)
  val end_index = temp(2)

  //obtain peak_start and peak_end (START)
  //window_spec is defined elsewhere in the original code
  val temp_df_1 = spark.sql("SELECT index, width, height FROM main_DF WHERE width > 25 AND index BETWEEN " + start_index + " AND " + end_index)
  val temp_df_2 = temp_df_1.withColumn("next_index", lead(temp_df_1("index"), 1).over(window_spec)).withColumn("previous_index", lag(temp_df_1("index"), 1).over(window_spec))
  val temp_df_3 = temp_df_2.withColumn("rear_gap", temp_df_2.col("index") - temp_df_2.col("previous_index")).withColumn("front_gap", temp_df_2.col("next_index") - temp_df_2.col("index"))
  val temp_df_4 = temp_df_3.filter("front_gap > 9 or rear_gap > 9")
  val temp_df_5 = temp_df_4.withColumn("next_front_gap", lead(temp_df_4("front_gap"), 1).over(window_spec)).withColumn("next_front_gap_index", lead(temp_df_4("index"), 1).over(window_spec))
  val temp_df_6 = temp_df_5.filter("rear_gap > 9 and next_front_gap > 9").sort("index")
  //obtain peak_start and peak_end (END)

  val peak_DF = temp_df_6.select("index", "next_front_gap_index").toDF("peak_start", "peak_end").collect
  val peak_DF_temp = peak_DF.map { y =>
    spark.sql("SELECT avg(width) as avg_width FROM main_DF WHERE index BETWEEN " + y(0) + " AND " + y(1))
  }
  val peak_DF_summary = peak_DF_temp.reduceLeft((dfa, dfb) => dfa.unionAll(dfb))
  val avg_width = peak_DF_summary.agg(mean("avg_width")).as[Double].first
  final_result_array += avg_width
}
spark.catalog.dropTempView("main_DF")
(reference)
The problem is, the code can only run until around halfway (after 20-30 iterations) before it crashes with java.lang.OutOfMemoryError: Java heap space. It runs okay when I run the iterations one by one, though.
So my questions are:
1. How can there be insufficient memory? I thought the cause might be accumulated memory usage, so I added .unpersist() for every dataframe inside the foreach loop (even though I do no .persist()), to no avail. But shouldn't all memory consumption be reset along with the re-initialization of variables when we enter a new iteration of the foreach loop?
2. Is there any efficient way to do this kind of calculation? I am doing a nested loop in Spark and I feel this is a very inefficient way to do it, but so far it's the only way I can get the result.
I'm using CDH 5.7 with Spark 2.1.0. My cluster has 6 nodes with 32GB memory (each) & 40 cores (total). main_DF is based on a 30GB parquet file.

How can I sum up functions that are made of elements of the imported dataset?

See the code and error. I have already tried Do, For,...and it is not working.
CODE + Error from Mathematica:
Import of survival probabilities _{k}p_x and _{k}p_y (calculated in Excel)
px = Import["C:\Users\Eva\Desktop\kpx.xlsx"];
px = Flatten[Take[px, All], 1];
NOTE: The probability _{k}p_x can be found at position px[[k+2, x-16]].
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] = Sum[v^k*px[[k + 2, x - 16]]*py[[k + 2, y - 16]], {k , 0, n - 1}]
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
Part::pkspec1: The expression 2+k cannot be used as a part specification.
General::stop: Further output of Part::pkspec1 will be suppressed during this calculation.
Part of dataset (left corner of the dataset):
k\x   18                  19                  20
0     1                   1                   1
1     0.999478086278185   0.999363078716059   0.99927911905056
2     0.998841497412202   0.998642656911039   0.99858030519133
3     0.998121451605207   0.99794428814123    0.99788275311401
4     0.997423447323642   0.997247180349674   0.997174407432264
5     0.996726703362208   0.996539285828369   0.996437857252448
6     0.996019178300768   0.995803204773039   0.99563600297737
7     0.995283481416241   0.995001861216016   0.994823584922968
8     0.994482556091416   0.994189960607964   0.99405569519175
9     0.993671079225432   0.99342255996206    0.993339856748282
10    0.992904079096455   0.992707177451333   0.992611817294026
11    0.992189069953677   0.9919796017009     0.991832027835091
Without having the exact same data files to work with it is often easy for each of us to make mistakes that the other cannot reproduce or understand.
From your snapshot of your data set I used Export in Mathematica to try to reproduce your .xlsx file. Then I tried the following
px = Import["kpx.xlsx"];
px = Flatten[Take[px, All], 1];
py = px; (* fake some py data *)
i = 0.04;
v = 1/(1 + i);
JointLifeIndep[x_, y_, n_] := Sum[v^k*px[[k+2,x-16]]*py[[k+2,y-16]], {k,0,n-1}];
JointLifeIndep[17, 17, 12]
and it displays 362.402
Notice I used := instead of = in my definition of JointLifeIndep. := and = do different things in Mathematica. = will immediately evaluate the right hand side of that definition. This is possibly the reason that you are getting the error that you do.
You should also be careful with your subscript values and make sure that every subscript is between 1 and the number of rows (or columns) in your matrix.
So see if you can try this example with an Excel sheet containing only the snapshot of data that you showed and see if you get the same result that I do.
Hopefully that will be enough for you to make progress.

Finding the values of x and y using the octal base system

In finding the values of x and y, given (x567) + (2yx5) = (71yx) (all in base 8), I proceeded as follows.
I assumed x = abc and y = def (each octal digit written as three binary bits) and continued:
(abc+010 def+101 110+abc 111+101) = (111 001 def abc) // adding the two numbers digit by digit and equating LHS = RHS
abc = 111 - 010 = 101, which is 5 in base 8, and then def = 001 - 101, which is -4,
so x = 5 and y = -4.
Now, the answer given in my book is x = 4 and y = 3.
Is the above method correct? If so, then what's the issue here?
You can't compare the digits beginning with the most significant digit, because you don't know the carry from the digit below. Also, a digit cannot have a negative value.
You can start with the least significant digit, because there is no carry:
7 + 5 = 14 (octal)
so x = 4, with a carry of 1 into the next digit.
Now you can rewrite your equation as:
(4567) + (2y45) = (71y4)
Now look at the second least significant digit (keeping the carry in mind):
6 + 4 + 1 (carry) = 13 (octal)
so y = 3, also with a carry of 1.
The whole equation is:
(4567) + (2345) = (7134)
which is true in the octal system.
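If you want to double-check that result programmatically, here is a small Swift snippet I've added (not part of the original answer) that parses the octal strings and verifies the sum:
// Verify (4567) + (2345) = (7134), all in base 8, by converting to integers and back.
let a = Int("4567", radix: 8)!
let b = Int("2345", radix: 8)!
let expected = Int("7134", radix: 8)!

print(a + b == expected)        // true
print(String(a + b, radix: 8))  // "7134"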

Calculations with Real Numbers, Verilog HDL

I noticed that Verilog rounds my real-number results into integer results. For example, when I look at the simulator, it shows the result of 17/2 as 9. What should I do? Is there any way to define something like: output real reg [11:0] output_value? Or is it something that has to be done through simulator settings?
This is simulation only (no synthesis). Example:
x is defined as a signed input and output_value as an output reg.
output_value = ((x >>> 1) + x) + 5;
If x = +1, then output_value should be 13/2 = 6.5.
However, when I simulate, I see output_value = 6.
Code would help, but I suspect you're not dividing reals at all. 17 and 2 are integers, and so a simple statement like that will do integer division.
17 / 2 = 8 (not 9, always rounds towards 0)
17.0 / 2.0 = 8.5
In your second case
output_value = ((x >>> 1) + x) + 5
If x is 1, x >>> 1 is 0, not 0.5, because you've just shifted off the bottom of the word.
output_value = ((1 >>> 1) + 1) + 5 = 0 + 1 + 5 = 6
There's nothing special about Verilog here. This is true for the majority of languages.
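To illustrate that last point with something other than Verilog, here's a quick Swift sketch I've added showing the same two effects: integer operands give truncating division, and a right shift simply drops the fractional bit.
// Integer division truncates toward zero, just like Verilog's integer divide.
print(17 / 2)        // 8
print(17.0 / 2.0)    // 8.5

// Shifting right by one drops the low bit; there is no 0.5 to keep.
let x = 1
print((x >> 1) + x + 5)               // 0 + 1 + 5 = 6
print(Double(x) / 2 + Double(x) + 5)  // 6.5, what was expected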