Use ApplySimple in MicroStrategy

I have a problem and can't understand how to create metrics.
For example:
1/ Attribute [Net Weight] = 0,5 (the data type is decimal)
2/ Use the function ApplySimple("Replace(#0,',','.')"; Max([Net Weight])) {Product}
Could you please help me understand how to use the "ApplySimple" function?

Let's see:
(Sum(VOL) {~} / Max(ApplySimple("replace(#0, ',', '.')"; [Net Weight])) {Product} )
This metric divides the sum of the VOL fact at report level by [Net Weight], after applying a character replacement to it at database level with Oracle's REPLACE function (it seems the database is Oracle), and breaks it down by product.
Splitting the fraction:
Numerator: Sum(VOL) {~}
Denominator: Max( ApplySimple("replace(#0, ',', '.')"; [Net Weight]) )
The #0 is the placeholder for the function's first argument, [Net Weight] in this case.
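For intuition, ApplySimple is a pass-through function: the string you give it is sent to the database as-is, with #0, #1, ... replaced by the arguments. Very roughly, the engine would generate SQL along these lines (a simplified sketch; the table and column names are hypothetical, and the real engine SQL may use several passes):
SELECT   p.PRODUCT_ID,
         SUM(f.VOL) / MAX(REPLACE(f.NET_WEIGHT, ',', '.')) AS metric
FROM     FACT_TABLE f
JOIN     PRODUCT p ON p.PRODUCT_ID = f.PRODUCT_ID
GROUP BY p.PRODUCT_ID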
Regards!

Related

PL/PGSQL function set upper and lower bound to an integer value

I am working on a PL/pgSQL function that calculates a score based on some logic. One of the requirements is that a parameter, after calculation, should be in the range [100000, 9900000].
I can't figure out how to do this with existing functions; it is obviously possible with IF conditions, but is there a better way?
v_running_sum += (30 - v_calculated_value) * 100000;
I want v_running_sum to be in the range mentioned above. Is there any way to clamp the variable to 100,000 if it falls below the lower bound (100,000), and likewise for the upper bound?
This is how you can easily do this check, using a range:
SELECT 1 <@ int4range(100000, 9900000, '[]');
There are many options for how to implement this in your logic.
----edit----
When the outcome of a calculation should always be something between 100000 and 9900000, you can use this:
SELECT LEAST(GREATEST(var, 100000), 9900000);
Whatever you stick into "var", the result will always be between these boundaries.
If you want a more verbose solution, use a CASE expression:
CASE WHEN val <= 100000 THEN 100000
     WHEN val >= 9900000 THEN 9900000
     ELSE val
END AS val
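Applied to the snippet from the question, the clamp becomes a one-liner inside the function body (a sketch reusing the question's variable names):
v_running_sum := LEAST(GREATEST(v_running_sum + (30 - v_calculated_value) * 100000, 100000), 9900000);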

Results to two decimals using ST_Area

I have run ST_Area on a shapefile, but the resulting numbers are very long and I need to reduce them to two decimals. This is the code so far:
SELECT mtn_name, ST_Area(geom) / 1000000 AS km2 FROM mountain ORDER BY 2 DESC;
This is what I get:
   mtn_name                                    km2
   character varying                           double precision
 1 Monte del Pueblo de Jerez del Marquesado    6.9435657067528e-9
 2 Monte de La Peza                            6.113288075418532e-9
I tried ROUND(), but it just turns km2 into 0.00.
Since it is not possible to round a binary floating-point value exactly (the usual precision problem), you will not get a double value that is exactly 6.94e-9; it would be something like 6.9400000001e-9 after rounding.
You can do the following (the original answer links to a db<>fiddle demo):
If the exponent is always the same (in your example it is always e-9), you can round using a fixed factor. With plain double values this still runs into the precision problem described above:
SELECT
round(area * 10e8 * 100) / 100 / 10e8
FROM area_result
To avoid these precision problems, you can use the numeric type:
SELECT
round(area * 10e8 * 100)::numeric / 100 / 10e8
FROM area_result
If you have different exponents, you have to calculate the multiplier first. Building on the same idea, you can do:
For double output
SELECT
round(area / mul * 100) * mul / 100
FROM (
SELECT
area,
pow(10, floor(log10(area))) as mul
FROM area_result
) s
For numeric output
SELECT
round((area / mul) * 100)::numeric * mul / 100
FROM (
SELECT
area,
pow(10, floor(log10(area)))::numeric as mul
FROM area_result
) s
However, the exponential form is just a display of the values, and it can vary from database tool to database tool; internally the values are not stored that way. So, if you fetch these values, you will in fact get something like 0.00000000694 rather than 6.94e-9, which is only a textual representation.
If you want to guarantee exactly this textual representation, you can use the formatting function to_char(), which of course returns text, not a number:
SELECT
to_char(area, '9.99EEEE')
FROM area_result
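For example (a quick sketch using the first value from the question; the exact padding of the text output may differ):
SELECT to_char(6.9435657067528e-9::double precision, '9.99EEEE');
-- ' 6.94e-09'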

About a loss of precision when calculating an aggregate sum with data frames

I have a DataFrame with this kind of data:
unit,sensitivity currency,trading desk ,portfolio ,issuer ,bucket ,underlying ,delta ,converted sensitivity
ES ,USD ,EQ DERIVATIVES,ESEQRED_LH_MIDX ,5GOY ,5 ,repo ,0.00002 ,0.00002
ES ,USD ,EQ DERIVATIVES,IND_GLOBAL1 ,no_localizado ,8 ,repo ,-0.16962 ,-0.15198
ES ,EUR ,EQ DERIVATIVES,ESEQ_UKFLOWN ,IGN2 ,8 ,repo ,-0.00253 ,-0.00253
ES ,USD ,EQ DERIVATIVES,BASKETS1 ,9YFV ,5 ,spot ,-1003.64501 ,-899.24586
and I have to do an aggregation operation over this data, doing something like this:
val filteredDF = myDF.filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio ='ESEQRED_LH_MIDX'")
.groupBy("unit","trading desk","portfolio","issuer","bucket","underlying")
.agg(sum("converted_sensitivity"))
But I am seeing that I am losing precision in the aggregated sum, so how can I be sure that every value of "converted_sensitivity" is converted to a BigDecimal(25,5) before the sum is performed on the new aggregated column?
Thank you very much.
To be sure of the conversion you can use DecimalType in your DataFrame.
According to Spark documentation the DecimalType is:
The data type representing java.math.BigDecimal values. A Decimal that must have fixed precision (the maximum number of digits) and scale (the number of digits on right side of dot).
The precision can be up to 38, scale can also be up to 38 (less or equal to precision).
The default precision and scale is (10, 0).
(See the Spark SQL data types documentation for details.)
To convert the data you can use the cast function of the Column object, like this:
import org.apache.spark.sql.functions.{col, sum}
import org.apache.spark.sql.types.DecimalType

val filteredDF = myDF.filter("unit = 'ES' AND `trading desk` = 'EQ DERIVATIVES' AND issuer = '5GOY' AND bucket = 5 AND underlying = 'repo' AND portfolio ='ESEQRED_LH_MIDX'")
  .withColumn("new_column_big_decimal", col("converted_sensitivity").cast(DecimalType(25, 5))) // cast before aggregating
  .groupBy("unit", "trading desk", "portfolio", "issuer", "bucket", "underlying")
  .agg(sum("new_column_big_decimal"))
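If you prefer to apply the type before any transformation, another option is to declare it in an explicit schema at read time. A minimal sketch, assuming the data is loaded from a CSV file (the file name and the use of spark.read are assumptions, not taken from the question):

import org.apache.spark.sql.types._

// Only the columns used in the aggregation are listed here.
val schema = StructType(Seq(
  StructField("unit", StringType),
  StructField("trading desk", StringType),
  StructField("portfolio", StringType),
  StructField("issuer", StringType),
  StructField("bucket", IntegerType),
  StructField("underlying", StringType),
  StructField("converted_sensitivity", DecimalType(25, 5))
))

val myDF = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("sensitivities.csv") // hypothetical path

This way sum("converted_sensitivity") already operates on Decimal(25,5) values.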

Why does suppressing weights improve TensorFlow neural net performance?

I have a 2-layer non-convolutional network in TensorFlow, using tanh as the activation function. I understand that the weights should be initialized with a truncated normal distribution divided by sqrt(nInputs), e.g.:
weightsLayer1 = tf.Variable(tf.div(tf.truncated_normal([nInputUnits, nUnitsHiddenLayer1]), math.sqrt(nInputUnits)))
Being a bit of a bumbling newbie in NN and Tensorflow, I mistakenly implemented this as 2 lines only to make it more readable:
weightsLayer1 = tf.Variable(tf.truncated_normal([nInputUnits, nUnitsHiddenLayer1]))
weightsLayer1 = tf.div(weightsLayer1, math.sqrt(nInputUnits))
I now know that this is wrong and that the 2nd line causes the weights to be recomputed at each learning step. However, to my surprise, the "incorrect" implementation consistently yields better performance, on both the training and test/evaluation datasets. I thought the incorrect 2-line implementation would be a train wreck, since it recomputes (suppresses) the weights to values other than those chosen by the optimizer, which I would expect to wreak havoc on the optimization process, but it actually improves it. Does anyone have an explanation for this? I am using the TensorFlow Adam optimizer.
Update 2016.6.22 - updated the 2nd code block above.
You are right that weightsLayer1 = tf.div(weightsLayer1, math.sqrt(nInputUnits)) is executed at each step. But that does NOT mean that the values in the weight variable are scaled down by sqrt(nInputUnits) in each step. This line is not an in-place operation that affects the values stored in the variable. It computes a new tensor holding the values of the variable divided by sqrt(nInputUnits), and that tensor, I assume, then goes into the rest of your computation graph. This does not interfere with the optimizer. You are still defining a valid computation graph, just with a somewhat arbitrary scaling of the weights. The optimizer can still compute the gradients with respect to this variable (it will back-propagate through your division operation) and create the corresponding update operations.
In terms of the model that you are defining, the two versions are totally equivalent. For any set of values of weightsLayer1 in the original model (where you don't do the division), you can simply scale them up by sqrt(nInputUnits) and you will get the identical results with your second model. The two represent exactly the same model class, if you will.
Why does one work better than the other? Your guess is as good as mine. If you have done the same division for all your variables, you have effectively divided your learning rate by sqrt(nInputUnits). This smaller learning rate might have been beneficial to the problem at hand.
Edit: I think the fact that you give the same name to the variable and the newly created tensor causes confusion. When you do
A = tf.Variable(1.0)
A = tf.mul(A, 2.0)
# Do something with A
then the second line creates a new tensor (as discussed above) and you re-bind the name (and it is only a name) A to that new tensor. For the graph being defined, the naming is absolutely irrelevant. The following code defines the same graph:
A = tf.Variable(1.0)
B = tf.mul(A, 2.0)
# Do something with B
Maybe this becomes clear if you execute the following code:
A = tf.Variable(1.0)
print A
B = A
A = tf.mul(A, 2.0)
print A
print B
The output is
<tensorflow.python.ops.variables.Variable object at 0x7ff025c02bd0>
Tensor("Mul:0", shape=(), dtype=float32)
<tensorflow.python.ops.variables.Variable object at 0x7ff025c02bd0>
The first time you print A it tells you that A is a variable object. After executing A = tf.mul(A, 2.0) and printing A again, you can see that the name A is now bound to a tf.Tensor object. However, the variable still exists, as can be seen by looking at the object behind the name B.
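A minimal runnable sketch of the same point (assuming the TensorFlow 1.x graph API used throughout this thread; the layer sizes are made up), showing that evaluating the divided tensor never changes the values stored in the variable:

import math
import tensorflow as tf

nInputUnits, nUnitsHiddenLayer1 = 4, 3

w = tf.Variable(tf.truncated_normal([nInputUnits, nUnitsHiddenLayer1]))
w_scaled = tf.div(w, math.sqrt(nInputUnits))  # a new tensor, not an in-place update of w

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    before = sess.run(w)
    sess.run(w_scaled)              # evaluate the scaled tensor
    after = sess.run(w)
    print((before == after).all())  # True: the variable itself is untouched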
This is what the single line of code does:
t = tf.truncated_normal( [ nInputUnits, nUnitsHiddenLayer1 ] )
Creates a tensor of shape [ nInputUnits, nUnitsHiddenLayer1 ], drawn from a truncated normal distribution with a standard deviation of 1.0 (the default stddev value).
t1 = tf.div( t, math.sqrt( nInputUnits ) )
Divides all values in t by math.sqrt( nInputUnits ).
Your two snippets do exactly the same thing: in both the single-line version and the two-line version, all values end up divided by math.sqrt( nInputUnits ).
As for your statement:
I now know that this is wrong and that the 2nd line causes the weights to be recomputed at each learning step.
EDIT: my mistake.
Indeed you are right: they are divided by math.sqrt( nInputUnits ) at every execution, but they are not reinitialized! What matters is where you put tf.Variable().
Here the division happens only once, when the variable is initialized:
weightsLayer1 = tf.truncated_normal( [ nInputUnits, nUnitsHiddenLayer1 ] )
weightsLayer1 = tf.Variable( tf.div( weightsLayer1, math.sqrt( nInputUnits ) ) )
and here the second line is performed at every step:
weightsLayer1 = tf.Variable( tf.truncated_normal( [ nInputUnits, nUnitsHiddenLayer1 ] ) )
weightsLayer1 = tf.div( weightsLayer1, math.sqrt( nInputUnits ) )
Why does the second yield better results? It looks like some kind of normalization to me, but somebody more knowledgeable should verify that.
PS: you can write it more readably like this:
weightsLayer1 = tf.Variable( tf.truncated_normal( [ nInputUnits, nUnitsHiddenLayer1 ], stddev = 1. / math.sqrt( nInputUnits ) ) )

TSQL Round() inconsistency?

The problem we have is reduced to the following two statements:
select convert(float, (convert(float,5741.61)/convert(float, 196.00)) * convert(float,14.00)) as unrounded, round(convert(float, (convert(float,5741.61)/convert(float, 196.00)) * convert(float,14.00)), 2) as roundedTo2dp
select convert(float, 410.115) as unrounded, ROUND( convert(float, 410.115), 2) as roundedTo2dp
The first statement uses floats to calculate a value of 410.115, and also rounds that result to 2 decimal places with ROUND(). The rounded value comes out at 410.11.
The second statement uses the float value 410.115 and also rounds it to 2 decimal places. The rounded result comes out as 410.12.
Why is one rounding down and the other rounding up when the value being rounded is the same?
How can I get the first statement to round to 410.12?
EDIT: apologies for formatting -- stackoverflow isn't showing any formatting on this machine (very odd).
Decimals handle precision better than floats. If you change the float to something like DECIMAL(18,2), you will get what you are expecting and you no longer need to call the ROUND function.
select convert(decimal(18,2), (convert(decimal(18,2),5741.61)/convert(decimal(18,2), 196.00)) * convert(decimal(18,2),14.00)) as unrounded, round(convert(decimal(18,2), (convert(decimal(18,2),5741.61)/convert(decimal(18,2), 196.00)) * convert(decimal(18,2),14.00)), 2) as roundedTo2dp
results in
unrounded    roundedTo2dp
410.12       410.12
Link to the MSDN documentation about decimal and numeric: http://msdn.microsoft.com/en-us/library/ms187746.aspx
Hope that helps...
The numbers are not equal:
SELECT CAST(convert(float, (convert(float,5741.61)/convert(float, 196.00)) * convert(float,14.00)) AS BINARY(8))
UNION ALL
SELECT CAST(convert(float, 410.115) AS BINARY(8)) as bin
----
0x4079A1D70A3D70A3
0x4079A1D70A3D70A4
'float' is an approximate number data type and hence not all values in the data type range can be represented exactly.
This is based on http://msdn.microsoft.com/en-us/library/ms173773.aspx.
I believe this is why there are rounding issues when using float values. You can never be 100% sure!
Ex.
Select round(convert(float, 1.5555), 2) --Gives 1.56
Select round(convert(float, 1.555), 2) --Gives 1.55!
Even with such a simple number, the result differs from what you would expect when using float.
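For completeness, a small sketch showing that the literal from the question rounds as expected once it stays a decimal instead of a float:
SELECT ROUND(CONVERT(decimal(10,3), 410.115), 2)  -- gives 410.120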