I have some class C:
class C (...) { ... }
I want to use it to index an efficient map, and the most efficient map is an Array.
So I add a "global" ("static") counter in the companion object to give each object a unique id:
object C {
  var id_counter = 0
}
In the primary constructor of C, with each creation of a C instance, I want to remember the global counter value and increment it.
Question 1: How to do it?
Now I can use the id of a C object as a perfect hash to index the array.
But an array does not preserve the type information a map would, namely that a given array is indexed by C's ids.
Question 2: Is it possible to have it with type safety?
Update:
The type safety in question 2 concerns the type of the map's index, to avoid mixing two unrelated Ints.
The value is of course (type) safe.
Question 1 asks how to increment a variable in the primary constructor.
Ie: Where to put?
id_counter += 1
Answer to your question 2:
import scala.collection.mutable.ArrayBuffer

case class C_Id(asInt: Int)

object C {
  // ArrayBuffer is a resizable array in scala.collection.mutable;
  // you could also use java.util.ArrayList
  private val list = ArrayBuffer.empty[C]
  def apply(id: C_Id): C = list(id.asInt) // only accepts an id of C
  ...
}
class C(...) {
  // in the constructor (i.e. in the class body):
  C.list += this
}
To the edited question 1: the primary constructor is just the body of the type, excluding the definitions of methods and other constructors.
I don't see the problem. I would probably make the counter private so that code outside class and object C cannot alter it. Incrementing a var of type Int is trivial:
idCounter += 1
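A minimal runnable sketch of the whole pattern (the name nextId is illustrative, not from the question):

```scala
class C(val name: String) {
  // the primary constructor is the class body, so this line runs
  // once per `new C(...)`
  val id: Int = C.nextId()
}

object C {
  private var idCounter = 0 // private: only C and its companion can touch it

  private def nextId(): Int = {
    val id = idCounter
    idCounter += 1
    id
  }
}
```

The companion class can call the companion object's private members, so the counter stays encapsulated.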
Arrays are type-safe in Scala, since they are implemented directly by JVM arrays (starting in 2.8).
I suspect I've not really understood your questions...
Update:
Increment the counter in the constructor, presumably.
As for creating an actual perfect hash function, I don't think you're really on the right track. (You've just pushed the mapping from whatever your actual keys are into your own code.) You should read up on techniques for creating minimal and / or perfect hash functions.
Could you make the default constructor of C private, and provide a factory method in the companion object (which could easily handle updating the counter)?
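A sketch of that factory approach (the names CId, payload and byId are illustrative), combining the private constructor with the id-typed array index:

```scala
import scala.collection.mutable.ArrayBuffer

// A distinct index type, so a CId cannot be confused with an unrelated Int.
final case class CId(asInt: Int)

// Private constructor: instances can only be created through the factory.
final class C private (val id: CId, val payload: String)

object C {
  private val registry = ArrayBuffer.empty[C]

  def apply(payload: String): C = {
    val c = new C(CId(registry.length), payload) // id doubles as array index
    registry += c
    c
  }

  def byId(id: CId): C = registry(id.asInt) // only accepts a CId, not an Int
}
```

Because the factory is the only way to construct a C, every instance is guaranteed to be registered and its id is guaranteed to be a valid index.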
I am trying to get the names of all the traits a class extends, using getInterfaces, which returns an array of the traits' classes. When I manually access each member of the array, the method getName returns simple names like this:
trait A
trait B
class C() extends A, B
val c = C()
val arr = c.getClass.getInterfaces
arr(0).getName // : String = A
arr(1).getName // : String = B
However, when I use the map function on arr, the resulting array contains a cryptic version of the traits' names:
arr.map(t => t.getName) // : Array[String] = Array(repl$.rs$line$1$A, repl$.rs$line$2$B)
The goal of this question is not how to get a resulting array that contains simple names (for that purpose, I can just use arr.map(t => t.getSimpleName)). What I'm curious about is why accessing the array manually and using map do not yield a compatible result. Am I wrong to think that both ways are equivalent?
I believe you are running things in the Scala REPL or Ammonite.
When you define:
trait A
trait B
class C() extends A, B
classes A, B and C aren't defined at the top level of the root package. The REPL creates an isolated environment, compiles the code and loads the results into some inner "anonymous" namespace.
Except that this is not quite true: where this bytecode was created is reflected in the class name. So apparently there was something similar (not necessarily identical) to
// repl$ suggests an object
object repl {
  // .rs sounds like a nested object(?)
  object rs {
    // $line sounds like a nested class
    class line { /* ... */ }
    // $line$1 sounds like the first anonymous instance of line
    new line { trait A }
    // import from above
    // $line$2 sounds like the second anonymous instance of line
    new line { trait B }
    // import from above
    // ...
  }
}
which was made because of how scoping works in the REPL: each new line creates a new scope in which previous definitions are visible and new ones are added (possibly shadowing some old definition). This can be achieved by emitting each new line as the code of a new anonymous class, compiling it, loading it into the classpath, instantiating it and importing its contents. By putting each new line into a separate class, the REPL is able to compile and run things in steps, without waiting for you to tell it that the script is complete and closed.
When you access class names with runtime reflection, you are seeing the artifacts of how things are being evaluated. One path might go through the REPL's prettifiers, which hide such things, while the other bypasses them, so you see the raw value as the JVM sees it.
The problem is not with map but rather with Array, especially its toString method (which is one among the many reasons for not using Array).
Actually, in this case it is even worse, since the REPL does some weird things to try to pretty-print Arrays, which in this case didn't work well (and, IMHO, just adds to the confusion).
You can fix this problem by calling mkString directly, like:
val arr = c.getClass.getInterfaces
val result = arr.map(t => t.getName)
val text = result.mkString("[", ", ", "]")
println(text)
However, I would rather suggest not using Array at all; instead, convert it to a proper collection (e.g. List) as soon as possible, like:
val interfaces = c.getClass.getInterfaces.toList
interfaces.map(t => t.getName)
Note: about the other reasons for not using Arrays:
They are mutable.
They are invariant.
They are not part of the collections hierarchy, so you can't use them in generic methods (well, you actually can, but that requires more tricks).
Their equals is by reference instead of by value.
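The equality point in particular trips people up; a quick demonstration:

```scala
val a1 = Array(1, 2, 3)
val a2 = Array(1, 2, 3)

val refEqual  = a1 == a2               // false: Array == is reference equality
val sameElems = a1.sameElements(a2)    // true: compares the contents
val listEqual = a1.toList == a2.toList // true: Lists compare by value
```

This is why two Arrays with identical contents don't compare equal, while the same data converted to List does.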
As I understand it, if you create an array of a value class, you're actually creating an array of boxed objects rather than of the wrapped primitive. What's the reason behind this?
source:
class Wrapper(val underlying: Int) extends AnyVal

class Main {
  val i: Int = 1
  val w: Wrapper = new Wrapper(1)
  val wrappers = Array[Wrapper](new Wrapper(1), new Wrapper(2))
  val ints = Array[Int](1, 2)
}
javap output:
public class Main {
  public int i();
  public int w();
  public Wrapper[] wrappers(); // <---- why can't this be int[] as well?
  public int[] ints();
  public Main();
}
One of the constraints of value classes is that x.isInstanceOf[ValueClass] should still work correctly. Where correctly means: transparently, without the programmer having to be aware when values may or may not be boxed.
If an Array[Meter] were represented as an Array[Int] at runtime, the following code would not work as expected, because the information that the Ints in the array are actually Meters would be lost:
class Meter(val value: Int) extends AnyVal
def centimeters[A](as: Array[A]) = as.collect{ case m: Meter => m.value * 100 }
Note that if you have val m = new Meter(42); m.isInstanceOf[Meter], then the compiler knows that m is a Meter even though it's an Int at runtime, and it can inline the isInstanceOf call to true.
Also note that this wouldn't work for arrays. If you boxed the values in the array on demand, you'd have to create a new array, which wouldn't be transparent to the programmer, because arrays are mutable and use reference equality. It would also be a disaster for performance with large arrays.
According to https://docs.scala-lang.org/overviews/core/value-classes.html:
Allocation Summary
A value class is actually instantiated when:
a value class is treated as another type.
a value class is assigned to an array.
doing runtime type tests, such as pattern matching.
[...]
Another situation where an allocation is necessary is when assigning
to an array, even if it is an array of that value class. For example,
val m = Meter(5.0)
val array = Array[Meter](m)
The array here contains actual Meter instances and not just the underlying double primitives.
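This boxing is observable through the array's runtime component type; a small check (using a Meter value class as above):

```scala
class Meter(val value: Double) extends AnyVal

val m   = new Meter(5.0)    // typically no allocation here
val arr = Array[Meter](m)   // storing in the array forces boxing

// The JVM array holds boxed Meter instances, not raw doubles:
val componentType = arr.getClass.getComponentType
```

The component type reported at runtime is the Meter class rather than the primitive double.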
This probably has something to do with type erasure, or simply with the fact that you're creating an array of a specific type, which doesn't allow one type to be treated as another. In any case, it's a technical limitation.
I am currently developing a static analysis of Java code using the OPAL framework.
I want to analyze the following Java method:
private void indirectCaller2b(double d, Object o1, Object o2) {
indirectCaller1(d, o1, o2);
}
I know, that indirectCaller2b is only called with the parameters (double, ArrayList, LinkedList).
Having this in mind, I constructed an IndexedSeq of DomainValues, which I pass to the perform method of BaseAI.
It looks like this:
Vector({ai.native_methods_parameter_type_approximation.PublicClass, null}[#0;t=101], ADoubleValue, {_ <: java.util.ArrayList, null}[#-4;t=102], {_ <: java.util.LinkedList, null}[#-5;t=103])
The this-parameter ({ai.native_methods_parameter_type_approximation.PublicClass, null}[#0;t=101]) was created with the following code:
domain.TypedValue(0, project.classFile(caller).thisType)
The other domain values were created using the parameterToValueIndex method:
domain.TypedValue(org.opalj.ai.parameterToValueIndex(caller.isStatic, caller.descriptor, index), t)
Here, caller stands for the method indirectCaller2b and t is the known runtime type of the parameter (ArrayList for parameter index 1 and LinkedList for parameter index 2).
When I now perform the abstract interpretation of the method with
BaseAI.perform(classFile, caller, domain)(Some(parameters))
and print the operand stack at the program counter where the call of indirectCaller1 happens, using the following code,
for (i <- 0 until analysisResult.operandsArray(pc).size) {
  println(s"stack index $i: ${analysisResult.operandsArray(pc)(i)}")
}
I get the following output:
stack index 0: null
stack index 1: {_ <: java.util.LinkedList, null}[#-5;t=103]
stack index 2: ADoubleValue
stack index 3: {ai.native_methods_parameter_type_approximation.PublicClass, null}[#0;t=101]
This is a bit confusing, since I just pass the arguments of indirectCaller2b on to indirectCaller1. Therefore, the output should match the IndexedSeq passed to the perform method.
But in the output, the parameter after the double parameter is LinkedList instead of ArrayList. The ArrayList parameter has somehow disappeared, and the last parameter on the operand stack is null.
Can anyone explain to me how this can happen?
Representation of "this"
To get the correct representation for the "this" reference you should use the method
InitializedObjectValue(
  origin: ValueOrigin,
  objectType: ObjectType
): DomainReferenceValue
to create a representation of the this value. The difference is that in this case the AI will try to use the information that (a) the value is guaranteed to be non-null and (b) it is guaranteed to be initialized. In particular, the former property is often interesting and generally leads to more precise results.
Initializing Locals
The function org.opalj.ai.parameterToValueIndex only calculates the logical origin information (the "pc" that is associated with the value, to make it possible to identify the respective values as parameters later on).
To correctly map operands to locals you can either use the method mapOperandsToParameters or just add all values to an IndexedSeq yourself, adding an extra null value after each value of computational type category 2 (long and double values occupy two local variable slots on the JVM).
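That two-slot rule is exactly what explains the shifted output above: the double parameter needs a padding entry in the locals. An OPAL-independent sketch of the padding (mapOperandsToParameters does the equivalent for domain values):

```scala
// Illustrative only (not the OPAL API): lay parameters out as JVM locals,
// inserting a placeholder slot after each category-2 value (long / double).
def parametersToLocals(params: IndexedSeq[Any]): IndexedSeq[Any] =
  params.flatMap {
    case d: Double => Seq[Any](d, null) // category 2: occupies two slots
    case l: Long   => Seq[Any](l, null)
    case other     => Seq(other)
  }
```

Without the padding, every parameter after the double lands one slot too early, which is why the ArrayList seemed to disappear.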
Let's say you have the following:
case class Foo(x: Int, y: Int) extends Ordered[Foo] {
  override def compare(that: Foo): Int = x compareTo that.x
}

val mutableSet: scala.collection.mutable.SortedSet[Foo] =
  scala.collection.mutable.SortedSet(Foo(1, 2), Foo(1, 3))
I expect the result of mutableSet.size to be 2. Why? Because Foo(1,2) and Foo(1,3) are not equal; they merely have the same ordering. So the sorted set should (IMO) be Foo(1,2), Foo(1,3), as this is the order they were created in (even the other way around would be fine; counter-intuitive, but fine).
However, the result of mutableSet.size is 1, and it keeps the last value, i.e. Foo(1,3).
What am I missing?
The behavior is the same as in Java's SortedSet collections: SortedSet uses compareTo to define equality, so it eliminates the second Foo from your example.
In Scala 2.11 it uses scala.collection.mutable.TreeSet as the SortedSet implementation. The best way to see this is to put a breakpoint in your compareTo method.
TreeSet is implemented using the AVL tree data structure; you can inspect the behavior by looking at the insert method of the Node class in AVLTree.scala. It compares the compareTo result with 0 to figure out whether the element is already present in the collection.
You have overridden compare so that only the first field is used in comparison. The Set uses this compare function not just to sort the items but also to determine if the items are equal for the purpose of storing them in the Set.
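If you want both elements kept, the ordering must be consistent with equals, i.e. compare must only return 0 for values that are actually equal. A sketch ordering by both fields (using an implicit Ordering instead of Ordered):

```scala
import scala.collection.mutable

case class Foo(x: Int, y: Int)

// Order by x first, then y: compare returns 0 only for equal Foos.
implicit val fooOrdering: Ordering[Foo] = Ordering.by((f: Foo) => (f.x, f.y))

val set = mutable.SortedSet(Foo(1, 2), Foo(1, 3))
// Both elements survive now, sorted as Foo(1,2), Foo(1,3).
```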
I want to accomplish something a little different from standard mixins. I want to construct a trait whose new fields are computed based on the fields of the class (or trait) it extends.
For instance, if I had a class like this:
class Point {
  var x: Int = 0
  var y: Int = 0
}
and I wanted to produce a class like this:
class Point' {
  var x: Int = 0
  var y: Int = 0
  var myx: Int = 0
  var myy: Int = 0
}
I'd like to be able to write a function that computes the field names myx and myy and then mixes them into the class using a trait. Here's some made-up pseudo-Scala for what I want to do:
def addMy(cls: Class) {
  newFields = cls.fields.map(f => createField("my" + f.name, f.type))
  myTrait = createTrait(newFields)
  extendClass(cls, myTrait)
}
The easiest way I can think of to achieve such a behavior would be to use Scala 2.10's Dynamic feature. Thus, Point must extend Dynamic and you can "mix in" arbitrary fields by adding the calculation logic to the member functions selectDynamic and updateDynamic. This is not quite what "mix in" actually refers to, but it nevertheless allows you to add functionality dynamically.
To ensure that it is possible to check whether a specific functionality has been mixed in, you can for instance use the following convention. In order to check whether a field is mixed in, a user can call hasMyx or has... in general. By default your implementation returns false. Each time you mix in a new field the corresponding has... function would return true.
Please note that the return type of all your mixed in fields will be Any (unless all your fields are of the same type, see for instance this question). I currently see no way to avoid a lot of casting with this design.
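A sketch of that convention (Scala 2.10+; the class and field names here are illustrative):

```scala
import scala.language.dynamics
import scala.collection.mutable

class DynPoint extends Dynamic {
  var x: Int = 0
  var y: Int = 0
  private val extra = mutable.Map.empty[String, Any]

  // p.myx = 42 compiles to updateDynamic("myx")(42)
  def updateDynamic(name: String)(value: Any): Unit = extra(name) = value

  // p.myx compiles to selectDynamic("myx"); p.hasMyx checks for presence
  def selectDynamic(name: String): Any =
    if (name.startsWith("has") && name.length > 3) {
      val field = name(3).toLower + name.drop(4)
      extra.contains(field)
    } else
      extra.getOrElse(name, throw new NoSuchElementException(name))
}
```

As noted above, every dynamic field comes back as Any, so callers must cast; this is the main cost of the approach.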