Scala specialization - using object instead of class causes slowdown?

I've done some benchmarks and have results that I don't know how to explain.
The situation in a nutshell:
I have two classes doing the same (computation-heavy) thing with generic arrays; both of them use specialization (@specialized, abbreviated @spec below). One class is defined like:
class A[@spec T] {
  def a(d: Array[T], c: Whatever[T], ...) = ...
  ...
}
The second is a singleton:
object B {
  def a[@spec T](d: Array[T], c: Whatever[T], ...) = ...
  ...
}
In the second case, I get a huge performance hit. Why can this happen? (Note: at the moment I don't understand Java bytecode or Scala compiler internals very well.)
More details:
Full code is here: https://github.com/magicgoose/trashbox/tree/master/sorting_tests/src/magicgoose/sorting
This is a sorting algorithm ported from Java, (almost) automagically converted to Scala, with comparison operations changed to generic ones to allow custom comparisons on primitive types without boxing, plus a simple benchmark (tests on different lengths, with JVM warmup and averaging).
The results look like this (the left column is the original Java Arrays.sort(int[])):
JavaSort | JavaSortGen$mcI$sp | JavaSortGenSingleton$mcI$sp
length 2 | time 0.00003ms | length 2 | time 0.00004ms | length 2 | time 0.00006ms
length 3 | time 0.00003ms | length 3 | time 0.00005ms | length 3 | time 0.00011ms
length 4 | time 0.00005ms | length 4 | time 0.00006ms | length 4 | time 0.00017ms
length 6 | time 0.00008ms | length 6 | time 0.00010ms | length 6 | time 0.00036ms
length 9 | time 0.00013ms | length 9 | time 0.00015ms | length 9 | time 0.00069ms
length 13 | time 0.00022ms | length 13 | time 0.00028ms | length 13 | time 0.00135ms
length 19 | time 0.00037ms | length 19 | time 0.00040ms | length 19 | time 0.00245ms
length 28 | time 0.00072ms | length 28 | time 0.00060ms | length 28 | time 0.00490ms
length 42 | time 0.00127ms | length 42 | time 0.00096ms | length 42 | time 0.01018ms
length 63 | time 0.00173ms | length 63 | time 0.00179ms | length 63 | time 0.01052ms
length 94 | time 0.00280ms | length 94 | time 0.00280ms | length 94 | time 0.01522ms
length 141 | time 0.00458ms | length 141 | time 0.00479ms | length 141 | time 0.02376ms
length 211 | time 0.00731ms | length 211 | time 0.00763ms | length 211 | time 0.03648ms
length 316 | time 0.01310ms | length 316 | time 0.01436ms | length 316 | time 0.06333ms
length 474 | time 0.02116ms | length 474 | time 0.02158ms | length 474 | time 0.09121ms
length 711 | time 0.03250ms | length 711 | time 0.03387ms | length 711 | time 0.14341ms
length 1066 | time 0.05099ms | length 1066 | time 0.05305ms | length 1066 | time 0.21971ms
length 1599 | time 0.08040ms | length 1599 | time 0.08349ms | length 1599 | time 0.33692ms
length 2398 | time 0.12971ms | length 2398 | time 0.13084ms | length 2398 | time 0.51396ms
length 3597 | time 0.20300ms | length 3597 | time 0.20893ms | length 3597 | time 0.79176ms
length 5395 | time 0.32087ms | length 5395 | time 0.32491ms | length 5395 | time 1.30021ms
The last column is the version defined inside an object, and it's awful (about 4 times slower).
Update 1
I've run the benchmark with and without scalac's -optimise option, and there is no noticeable difference (only slower compilation with -optimise).

It's just one of many bugs in specialization; I'm not sure whether this one has been reported on the bug tracker or not. If you throw an exception from your sort, you'll see that it calls the generic version, not the specialized version, of the second sort:
java.lang.Exception: Boom!
at magicgoose.sorting.DualPivotQuicksortGenSingleton$.magicgoose$sorting$DualPivotQuicksortGenSingleton$$sort(DualPivotQuicksortGenSingleton.scala:33)
at magicgoose.sorting.DualPivotQuicksortGenSingleton$.sort$mFc$sp(DualPivotQuicksortGenSingleton.scala:13)
Note how the top thing on the stack is DualPivotQuicksortGenSingleton$$sort(...) instead of ...sort$mFc$sp(...)? Bad compiler, bad!
As a workaround, you can wrap your private methods inside a final helper object, e.g.
def sort[@spec T](a: Array[T]) { Helper.sort(a, 0, a.length) }
private final object Helper {
  def sort[@spec T](a: Array[T], i0: Int, i1: Int) { ... }
}
For whatever reason, the compiler then realizes that it ought to call the specialized variant. I haven't tested whether every specialized method that is called by another needs to be inside its own object; I'll leave that to you via the exception-throwing trick.
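For concreteness, here is a minimal, self-contained sketch of that exception-throwing trick (WhichVariant and sortImpl are made-up names for illustration):
object WhichVariant {
  // Public entry point delegates to a private specialized method, as in the question.
  def sort[@specialized(Int) T](a: Array[T]): Unit = sortImpl(a, 0, a.length)

  // Temporarily replace the body with a throw to see which variant ends up on the stack.
  private def sortImpl[@specialized(Int) T](a: Array[T], i0: Int, i1: Int): Unit =
    throw new Exception("Boom!")

  def main(args: Array[String]): Unit =
    try sort(Array(3, 1, 2))
    catch {
      // A sortImpl frame ending in a $mc...$sp suffix means the specialized variant
      // was called; a plain sortImpl frame means the call went to the generic version.
      case e: Exception => e.printStackTrace()
    }
}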

Related

Simulate Lag Function - Spark structured streaming

I'm using Spark Structured Streaming to analyze sensor data and need to perform calculations based on a sensor's previous timestamp. My incoming data stream has three columns: sensor_id, timestamp, and temp. I need to add a fourth column that holds that sensor's previous timestamp, so that I can then calculate the time between data points for each sensor.
This is easy in traditional batch processing using a lag function and grouping by sensor_id. What is the best way to approach this in a streaming situation?
So, for example, if my streaming dataframe looked like this:
+----------+-----------+------+
| SensorId | Timestamp | Temp |
+----------+-----------+------+
| 1800 | 34 | 23 |
| 500 | 36 | 54 |
| 1800 | 45 | 23 |
| 500 | 60 | 54 |
| 1800 | 78 | 23 |
+----------+-----------+------+
I would like something like this:
+----------+-----------+------+---------+
| SensorId | Timestamp | Temp | Prev_ts |
+----------+-----------+------+---------+
| 1800 | 34 | 23 | 21 |
| 500 | 36 | 54 | 27 |
| 1800 | 45 | 23 | 34 |
| 500 | 60 | 54 | 36 |
| 1800 | 78 | 23 | 45 |
+----------+-----------+------+---------+
If I try
test = filteredData.withColumn("prev_ts", lag("ts").over(Window.partitionBy("sensor_id").orderBy("ts")))
I get an AnalysisException: 'Non-time-based windows are not supported on streaming DataFrames/Datasets'.
Could I save the previous timestamp of each sensor in a data structure that I could reference and then update with each new timestamp?
There is no need to "simulate" anything. Standard window functions can be used with Structured Streaming.
from pyspark.sql.functions import lag
from pyspark.sql.window import Window

s = spark.readStream.
    ...
    load()

s.withColumn("prev_ts", lag("Timestamp").over(
    Window.partitionBy("SensorId").orderBy("Timestamp")
))

SQL Database Design - Combinations of factors affecting a final value

The price of a service may depend on various combinations of factors. Example Tables:
ServiceType_Duration (combination of 2 factors)
+-------------+----------+--------+
| ServiceType | Duration | Amount |
+-------------+----------+--------+
| Massage | 30 | 30 |
| Massage | 60 | 50 |
| Reflexology | 30 | 50 |
| Reflexology | 60 | 70 |
+-------------+----------+--------+
DiscountCode (1 factor)
+--------------+--------+---------+
| DiscountCode | Amount | Percent |
+--------------+--------+---------+
| D1 | -10 | NULL |
| D2 | NULL | -10% |
+--------------+--------+---------+
In this very simple example, a 30-minute massage with discount code D1 would have a total price of 30 - 10 = 20.
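Just to illustrate the arithmetic, here is a tiny Scala sketch of that base-amount-plus-discount calculation (Discount and finalPrice are hypothetical names, not a schema proposal):
// Hypothetical model of the example: a base amount from the ServiceType/Duration
// table, plus a discount that is either a flat amount or a percentage.
case class Discount(amount: Option[BigDecimal], percent: Option[BigDecimal])

def finalPrice(base: BigDecimal, discount: Discount): BigDecimal =
  base +
    discount.amount.getOrElse(BigDecimal(0)) +
    discount.percent.map(p => base * p / 100).getOrElse(BigDecimal(0))

finalPrice(30, Discount(amount = Some(-10), percent = None))  // 30 - 10 = 20, as above
finalPrice(30, Discount(amount = None, percent = Some(-10)))  // 30 - 10% = 27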
However, there may be many more such tables of the general format:
Factor1
Factor2
...
FactorN
Amount (possibly)
Percent (possibly)
I'm not sure about having lots of 'pricing tables' when the general format is always going to be the same. It could mean having to add/remove the same column from lots of tables.
Are the tables above OK? If not, what's the best practice for storing this sort of data?

In PostgreSQL, how do you find an aggregate based on a time range?

For example, say I have a database table of transactions done over the counter, and I would like to find whether there was any period that could be considered extremely busy (more than 10 transactions processed in the span of 10 minutes). How would I go about querying that? Could I aggregate based on time ranges and count the number of transaction ids within those ranges?
Adding an example to clarify my input and desired output:
+----+--------------------+
| Id | register_timestamp |
+----+--------------------+
| 25 | 08:10:50 |
| 26 | 09:07:36 |
| 27 | 09:08:06 |
| 28 | 09:08:35 |
| 29 | 09:12:08 |
| 30 | 09:12:18 |
| 31 | 09:12:44 |
| 32 | 09:15:29 |
| 33 | 09:15:47 |
| 34 | 09:18:13 |
| 35 | 09:18:42 |
| 36 | 09:20:33 |
| 37 | 09:20:36 |
| 38 | 09:21:04 |
| 39 | 09:21:53 |
| 40 | 09:22:23 |
| 41 | 09:22:42 |
| 42 | 09:22:51 |
| 43 | 09:28:14 |
+----+--------------------+
Desired output would be something like:
+-------+----------+
| Count | Min |
+-------+----------+
| 1 | 08:10:50 |
| 3 | 09:07:36 |
| 7 | 09:12:08 |
| 8 | 09:20:33 |
+-------+----------+
How about this:
SELECT c, time
FROM (
    SELECT count(*) AS c, min(time) AS time
    FROM transactions
    GROUP BY floor(extract(epoch from time)/600)
) AS buckets
WHERE c > 10;
This will find all ten-minute intervals in which more than ten transactions occurred. It assumes that the table is called transactions and that it has a column called time where the timestamp is stored.
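To make the bucketing explicit: floor(extract(epoch from time)/600) assigns every row to a fixed ten-minute slot of the clock, rather than to a sliding window. A quick illustration with two of the sample timestamps (a Scala sketch; the bucket helper is made up, and it uses the time of day instead of the full epoch):
import java.time.LocalTime

// Integer division by 600 seconds: each 10-minute slot gets its own bucket number.
def bucket(t: LocalTime): Int = t.toSecondOfDay / 600

bucket(LocalTime.parse("09:07:36"))  // 54 -> slot 09:00:00-09:09:59
bucket(LocalTime.parse("09:12:08"))  // 55 -> slot 09:10:00-09:19:59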
Thanks to redneb, I ended up with the following query:
SELECT count(*) AS c, min(register_timestamp) AS register_timestamp
FROM trak_participants_data
GROUP BY floor(extract(epoch from register_timestamp)/600)
ORDER BY register_timestamp
It works well enough for me to tell which time chunks are the busiest at the counter.

Error in org-table-sum org-mode?

I am just getting started with Emacs org-mode and I am already getting really confused about a simple column sum (org-table-sum). I start with
| date | sum |
|------+-------|
| | 16.2 |
| | 6.16 |
| | 6.16 |
| | |
When I hit C-c + (org-table-sum) below the second column I get the correct sum 28.52. If I add another line to make it
| date | sum |
|------+-------|
| | 16.2 |
| | 6.16 |
| | 6.16 |
| | 13.11 |
| | |
C-c + gives me 41.629999999999995. ???
If I change the last line from 13.11 to 13.12, C-c + will give me (the correct) 41.64.
WTF?
Any explanation appreciated! Thanks!
Most decimal numbers cannot be represented exactly in a binary floating point encoding (either single or double precision).
Convert 13.11 with an IEEE-754 converter to see that, already in single precision, the nearest representable number is 13.109999656677246; the double-precision value (about 13.10999999999999943) is much closer, but still not exact.
This problem is not Emacs-related; it is a fundamental issue when working with a floating point representation in a different base (binary rather than decimal).
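To see the same thing outside Emacs, here is a small Scala sketch (BigDecimal stands in for calc's exact arithmetic):
// 13.11 has no exact binary representation.
println(13.11f.toDouble)                  // 13.109999656677246 (the single-precision value)
println(new java.math.BigDecimal(13.11))  // 13.10999999999999943... (the double is not exact either)

// Summing several such slightly-off doubles is what produces results like
// 41.629999999999995. Exact decimal arithmetic avoids it:
val exact = List("16.2", "6.16", "6.16", "13.11").map(BigDecimal(_)).sum
println(exact)                            // 41.63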
Using calc's vsum, the result is OK:
| date | sum |
|------+-------|
| | 16.2 |
| | 6.16 |
| | 6.16 |
| | 13.11 |
|------+-------|
| | 41.63 |
#+TBLFM: @6$2=vsum(@I..@II)
This works because calc works with arbitrary precision and will not encode the numbers in a binary floating point format.

Scala find missing values in a range

For a given range, for instance
val range = (1 to 5).toArray
val ready = Array(2,4)
the missing values (not ready) are
val missing = range.toSet diff ready.toSet
Set(5, 1, 3)
The real use case includes thousands of range instances with (possibly) thousands of missing or not ready values. Is there a more time-efficient approach in Scala?
The diff operation is implemented in Scala as a foldLeft over the left operand, in which each element of the right operand is removed from the left collection. Let's assume that the left and right operands have m and n elements, respectively.
Calling toSet on an Array or Range object will return a HashTrieSet, which is a HashSet implementation and thus offers a remove operation with a complexity of almost O(1). Thus, the overall complexity of the diff operation is O(m).
Considering now a different approach, we'll see that this is actually quite good. One could also solve the problem by sorting both ranges and then traversing them once in a merge-sort fashion to eliminate all elements which occur in both ranges. This will give you a complexity of O(max(m, n) * log(max(m, n))), because you have to sort both ranges.
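A minimal sketch of that single-pass, merge-style traversal, assuming both inputs are already sorted Int arrays (missingSorted is a made-up name):
import scala.collection.mutable.ArrayBuffer

// Walk two sorted arrays once, keeping every element of `range`
// that does not also occur in `ready`.
def missingSorted(range: Array[Int], ready: Array[Int]): Array[Int] = {
  val out = new ArrayBuffer[Int]
  var i = 0
  var j = 0
  while (i < range.length) {
    if (j < ready.length && ready(j) < range(i)) j += 1
    else if (j < ready.length && ready(j) == range(i)) { i += 1; j += 1 }
    else { out += range(i); i += 1 }
  }
  out.toArray
}

missingSorted((1 to 5).toArray, Array(2, 4))  // Array(1, 3, 5)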
Update
I ran some experiments to investigate whether you can speed up the computation by using mutable hash sets instead of immutable ones. The result, as shown in the tables below, is that it depends on the size ratio of range and ready.
It seems that using immutable hash sets is more efficient if ready.size / range.size < 0.2; above this ratio, the mutable hash sets outperform the immutable ones.
For my experiments, I set range = (1 to n), with n being the number of elements in range. For ready I selected a random subset of range with the respective number of elements. I repeated each run 20 times and summed up the times measured with System.currentTimeMillis().
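A sketch of what the mutable-hash-set variant might look like (the benchmark code itself is not shown in this answer, so treat this as an assumption; the timing harness is omitted):
import scala.collection.mutable

// Build a mutable HashSet from `range`, then remove the ready elements in place.
def missingMutable(range: Array[Int], ready: Array[Int]): mutable.HashSet[Int] = {
  val s = mutable.HashSet.empty[Int]
  s ++= range
  ready.foreach(s -= _)
  s
}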
range.size == 100000
+-----------+-----------+---------+
| Fraction | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01 | 28 | 111 |
| 0.02 | 23 | 124 |
| 0.05 | 39 | 115 |
| 0.1 | 113 | 129 |
| 0.2 | 174 | 140 |
| 0.5 | 472 | 200 |
| 0.75 | 722 | 203 |
| 0.9 | 786 | 202 |
| 1.0 | 743 | 212 |
+-----------+-----------+---------+
range.size == 500000
+-----------+-----------+---------+
| Fraction | Immutable | Mutable |
+-----------+-----------+---------+
| 0.01 | 73 | 717 |
| 0.02 | 140 | 771 |
| 0.05 | 328 | 722 |
| 0.1 | 538 | 706 |
| 0.2 | 1053 | 836 |
| 0.5 | 2543 | 1149 |
| 0.75 | 3539 | 1260 |
| 0.9 | 4171 | 1305 |
| 1.0 | 4403 | 1522 |
+-----------+-----------+---------+