Druid doesn't support numbers?

I'm working with Druid 0.8.2 and I tried to import a number (as a dimension) with Tranquility.
I got this error:
java.lang.IllegalStateException: Dimensions of type[FLOAT] are not supported
I read that Druid 0.8.2 doesn't support numbers. Is that true? If it is, must I use type string in my dimensions?

Dimension columns in Druid must be strings. If you have a float field, it is most likely a metric. I use this rule of thumb: if I can think of a query that uses the column in a GROUP BY statement, it's a dimension; otherwise, it's most likely a metric (a value being measured, where the dimensions are attributes of the measurement).
An explanation of the difference between dimensions and metrics: https://altitudemarketing.com/blog/metrics-vs-dimensions-difference/
If you are sure that your float column is a dimension, then you need to convert it to string type; otherwise, don't mark it as a dimension and leave it as a float (which is supported for metrics).
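If the float field really is a dimension, the conversion to a string can happen before the event is handed to Tranquility. A minimal sketch in Scala (the event shape and field names are hypothetical, and the Tranquility setup itself is omitted):

// Hypothetical raw event; "score" is the float field in question
case class RawEvent(timestamp: Long, country: String, score: Float)

// Build the event map sent to Druid: "country" stays a string dimension, and
// "score" is stringified only if it truly is a dimension. If it is a metric,
// leave it numeric and declare an aggregator (e.g. doubleSum) for it instead.
def toDruidEvent(e: RawEvent): Map[String, Any] = Map(
  "timestamp" -> e.timestamp,
  "country"   -> e.country,
  "score"     -> e.score.toString
)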

Related

Spark Imputer for filling up missing values

Requirement -
In the picture attached, consider the first 3 columns as my raw data. Some rows have a NULL in the quantity column, which is exactly what I want to fill in.
In an ideal case, I would fill in any NULL value with the previous KNOWN value.
Spark Imputer seemed like an easily usable library that could help me fill in missing values.
But the issue is that Spark Imputer is limited to a mean or median calculation over all non-null values present in the DataFrame, so I don't get the desired result (4th column in the picture).
Logic -
import org.apache.spark.ml.feature.Imputer

// Replace nulls in "quantity" with the mean of all non-null values
val imputer = new Imputer()
  .setInputCols(Array("quantity"))
  .setOutputCols(Array("quantity_imputed"))
  .setStrategy("mean")

val model = imputer.fit(new_combinedDf)
model.transform(new_combinedDf).show()
Result -
Now, is it possible to limit the mean calculation for EACH null value to the mean of the last n values?
i.e., for 2020-09-26, where we get the first null value, is it possible to tweak Spark Imputer to calculate the mean over the last n values only, instead of over all non-null values in the DataFrame?
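One way to get close to this without Imputer is a window function: take the mean of the previous n rows only, and fall back to the original value when it is not null. A sketch, assuming the date column is named "date" and reusing new_combinedDf from above (a forward-fill variant is included, since the ideal case described was "previous known value"):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val n = 3 // look-back size; whatever window makes sense for the data

// Mean over the previous n rows only (avg ignores nulls inside the window)
val lastN = Window.orderBy("date").rowsBetween(-n, -1)
val rollingMeanFilled = new_combinedDf.withColumn(
  "quantity_imputed",
  coalesce(col("quantity"), avg(col("quantity")).over(lastN))
)

// Forward-fill variant: carry the last known quantity instead of a rolling mean
val allPrevious = Window.orderBy("date").rowsBetween(Window.unboundedPreceding, -1)
val forwardFilled = new_combinedDf.withColumn(
  "quantity_imputed",
  coalesce(col("quantity"), last(col("quantity"), ignoreNulls = true).over(allPrevious))
)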

Filtering in PySpark using integer vs decimal values

I am filtering a DataFrame, and when I pass an integer value, it considers only those rows that satisfy the condition when the DataFrame column value is rounded to an integer. Why is this happening? See the screenshot below: the two filters give different results. I am using Spark 2.2. I tested it with Python 2.6 and Python 3.5; the results are the same.
Update
I tried it with Spark SQL. If I do not convert the field to double, it gives the same answer as the first one above. However, if I cast the column to double before filtering, it gives the correct answer.
For lat > 60
Given a double and an integer, Spark implicitly converts both of them to integers. The result is appropriate for an integer comparison, showing latitudes >= 61.
For lat > cast(60 as double) or lat > 60.0
Given two doubles, Spark returns everything in (60.0, Infinity), as expected.
This might be slightly unintuitive, but you must remember that Spark is performing implicit conversions between IntegerType() and DoubleType().
Although you use PySpark, under the hood it is Scala and ultimately Java, so Java's conversion rules apply here.
To be specific:
https://docs.oracle.com/javase/specs/jls/se10/html/jls-5.html#jls-5.1.3
...Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3).
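The practical fix is the one from the update: make both sides of the comparison doubles so no narrowing happens. A sketch of the same idea in Scala (the DataFrame API is identical from PySpark; df, the lat column, and the points view are stand-ins for the data in the screenshot):

import org.apache.spark.sql.functions._

// Use a double literal, or cast explicitly, so the comparison stays in DoubleType
val byLiteral = df.filter(col("lat") > lit(60.0))
val byCast    = df.filter(col("lat").cast("double") > lit(60).cast("double"))

// Spark SQL form, matching the update in the question ("points" is a hypothetical temp view)
val bySql = spark.sql("SELECT * FROM points WHERE lat > CAST(60 AS DOUBLE)")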

PostgreSQL adds trailing zeros to numeric

Recently I migrated a DB to PostgreSQL that has some columns defined as numeric(9,3) and numeric(9,4). In testing the app I have found that when data is saved to these columns, trailing zeros are added to the inserted value. I am using Hibernate, and my logs show the correct values being built for the prepared statements.
An example of the data I am inserting is 0.75 in the numeric(9,3) column, and the value stored is 0.750. Another example, for the numeric(9,4) column: I insert the value 12 and the DB holds 12.0000.
I found this related question: postgresql numeric type without trailing zeros. But it did not offer a solution other than quoting the 9.x documentation, which says trailing zeros are not added. From that question, the answer quoted the docs (which I have also read), which say:
Numeric values are physically stored without any extra leading or
trailing zeroes. Thus, the declared precision and scale of a column
are maximums, not fixed allocations.
However, like that question's poster, I see trailing zeros being added. The raw insert generated by Hibernate in the logs does not show this extra baggage, so I am assuming it is a PostgreSQL thing I have not set correctly; I just can't find how I got it wrong.
I think this is it, if I am understanding "coerce" correctly in this context. This is from the PostgreSQL docs:
Both the maximum precision and the maximum scale of a numeric column
can be configured. To declare a column of type numeric use the syntax:
NUMERIC(precision, scale)
The precision must be positive, the scale zero or positive.
Alternatively:
NUMERIC(precision)
selects a scale of 0. Specifying:
NUMERIC
without any precision or scale creates a column in
which numeric values of any precision and scale can be stored, up to
the implementation limit on precision. A column of this kind will not
coerce input values to any particular scale, whereas numeric
columns with a declared scale will coerce input values to that scale.
Bold emphasis mine.
So it is misleading later in the same section:
Numeric values are physically stored without any extra leading or
trailing zeroes. Thus, the declared precision and scale of a column
are maximums, not fixed allocations.
Bold emphasis mine again.
This may be true of the precision part, but since the scale is coerced when it is defined, trailing zeros are added to the input values to meet the scale definition (and I would assume they are truncated if too long).
I am using precision,scale definitions for constraint enforcement. It is during the DB insert that the trailing zeros are added to pad out the numeric scale, which seems to support the coercion reading and conflicts with the statement that no trailing zeros are added.
Correct or not, I had to handle the problem in code after the select is made. Luckily for me, the impacted attributes are BigDecimal, so stripping trailing zeros was easy (albeit not graceful). If someone out there has a better suggestion for keeping PostgreSQL from adding trailing zeros to the numeric scale on insert, I am open to it.
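For reference, the stripping mentioned above is a one-liner on the JVM side; a minimal sketch using java.math.BigDecimal (shown in Scala, but the same calls work from Java/Hibernate entities):

import java.math.BigDecimal

// numeric(9,4) hands back 12.0000 for an inserted value of 12
val stored  = new BigDecimal("12.0000")
val trimmed = stored.stripTrailingZeros() // 12
println(trimmed.toPlainString)            // "12"

// toPlainString matters when stripping leaves a negative scale,
// e.g. 600.00 strips to 6E+2; toPlainString renders it as "600"
println(new BigDecimal("600.00").stripTrailingZeros().toPlainString)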
If you specify a precision and scale, Pg pads to that precision and scale.
regress=> SELECT '0'::NUMERIC(8,4);
numeric
---------
0.0000
(1 row)
There's no way to turn that off. It's still the same number, and the precision is defined by the type, not the value.
If you want to have the precision defined by the value you have to use unconstrained numeric:
regress=> SELECT '0'::NUMERIC, '0.0'::NUMERIC;
numeric | numeric
---------+---------
0 | 0.0
(1 row)
You can strip trailing zeros with the trim_scale function from PostgreSQL v13 on. That will reduce the storage size of the number.

How to convert a hex value to a float in the Modbus protocol using C#

I am using the Modbus protocol to retrieve an analog value from a module.
On the web panel I can see that the value is 09FD in hex and 0.780 as a float.
The function returns only the 09FD into C#, and it must be manually converted to the float value.
For this there is a convert function in the Modbus DLL:
public static float GetSingle(
ushort highOrderValue,
ushort lowOrderValue
)
But what part of "09FD" must be assigned to the two ushorts?
I can't figure out how to pass it in to retrieve the double value.
Thanks for the help.
The values returned in the register are integers, scaled in some way to permit representation of fractional floating-point values. You will need to review the documentation of your module to determine how to scale (and possibly offset) the result to convert it into actual engineering units.
If you don't have the documentation, then you could retrieve multiple values from the web panel, plot them, and fit a line through them to estimate the parameters used to scale the output value.
The GetSingle() function you reference applies if the values are represented in true floating-point form (and passed in two adjacent registers) by the device. Your example, with only a single integer, suggests this is not the case.
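For completeness, the two cases look roughly like this (sketched in Scala rather than C#, since the bit manipulation and arithmetic are the same in either language; the scale and offset values are placeholders you would take from the module's documentation):

// True floating-point case: two adjacent 16-bit registers carry the IEEE 754
// bit pattern of a float (word order varies by device; high word first here)
def registersToFloat(high: Int, low: Int): Float = {
  val bits = ((high & 0xFFFF) << 16) | (low & 0xFFFF)
  java.lang.Float.intBitsToFloat(bits)
}

// Scaled-integer case: a single register holds a raw count that must be
// multiplied by a device-specific factor (and possibly offset) to get units
def scaledValue(raw: Int, scale: Float, offset: Float = 0.0f): Float =
  raw * scale + offset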

How do I determine if a double is not infinity, -infinity or NaN in SSRS?

I've got a dataset returned from SSAS where some records may be infinity or -infinity (calculated in SSAS, not in the report).
I want to calculate the average of this column but ignore those records that are positive or negative infinity.
My thought is to create a calculated field that would logically do this:
= IIF(IsInfinity(Fields!ASP.Value) Or IsNegativeInfinity(Fields!ASP.Value), 0, Fields!ASP.Value)
What I can't figure out is how to do the IsInfinity or IsNegativeInfinity checks.
Or, conversely, is there a way to calculate the average for a column while ignoring those records?
Just stumbled across this problem and found a simple solution for determining whether a numeric field is infinity.
=iif((Fields!Amount.Value+1).Equals(Fields!Amount.Value), false,true)
I'm assuming you are using the business intelligence studio rather than the report builder tool.
Maybe you are trying a formula because you can't change the SSAS MDX query, but if you could, the infinity is most likely caused by a divide by zero, and the NaN by trying to do maths with NULL values.
Ideally, change the cube itself so that the measure is safe from divide by zero (e.g. IIF [measure] = 0, don't do the division and just return "", otherwise do it). The second option would be to create a calculated measure in the MDX query that does something similar.
As for a formula, there are no IsInfinity functions, so you would have to look at the value of the field and see if it is 1.#IND, 1.#INF, or NaN.
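To see why the (x + 1).Equals(x) trick above works: adding 1 to an IEEE 754 infinity leaves it unchanged, and Equals (unlike =) also treats NaN as equal to itself. A quick illustration of the same arithmetic on the JVM (the caveat being that very large finite doubles, beyond 2^53, trip the test as well):

val inf = Double.PositiveInfinity
println(inf + 1 == inf)                                         // true
println(Double.NegativeInfinity + 1 == Double.NegativeInfinity) // true
println((Double.NaN + 1).equals(Double.NaN))                    // true: equals pairs NaN with NaN
println(42.0 + 1 == 42.0)                                       // false for ordinary values
println(1e16 + 1 == 1e16)                                       // true: the false-positive caveat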