Does it make sense to define a class for Complex numbers, where real/imaginary parts use Numeric[T] instead of a concrete type? - scala

Would something like
class Complex[T: Numeric](real: T, imag: T)
make sense, instead of writing one Complex class using Doubles, one using Longs, one using BigInts, so that everyone can choose the number type they need?
How would performance compare to the non-generic approach?

For the moment, Numeric is not @specialized, so a generic version using it will suffer from boxing and unboxing and performance will be greatly reduced. Here is a nice blog post with performance measurements:
http://www.azavea.com/blogs/labs/2011/06/scalas-numeric-type-class-pt-2/
However, you could directly write a @specialized version of your Complex number class, without using Numeric, and get all the benefits.
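For illustration, here is a minimal sketch (not from the original answer) of what such a specialized class could look like. Since Numeric itself isn't specialized, the arithmetic evidence below is a small hand-rolled type class; the name Arith and the operations shown are assumptions for this sketch:

trait Arith[@specialized(Int, Long, Double) T] {
  def plus(a: T, b: T): T
  def minus(a: T, b: T): T
  def times(a: T, b: T): T
}

class Complex[@specialized(Int, Long, Double) T](val real: T, val imag: T)(implicit a: Arith[T]) {
  def +(that: Complex[T]) = new Complex(a.plus(real, that.real), a.plus(imag, that.imag))
  // (ac - bd) + (ad + bc)i, the usual complex multiplication
  def *(that: Complex[T]) = new Complex(
    a.minus(a.times(real, that.real), a.times(imag, that.imag)),
    a.plus(a.times(real, that.imag), a.times(imag, that.real)))
}

With Arith instances provided for the primitive types you care about, the specialized variants of Complex store unboxed fields and avoid boxing in the arithmetic.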
From a strictly pragmatic point of view, though, I am not sure what the use of a complex number with integer parts would be...


Why do I need a new primitive?

I am learning functional programming in Scala and reading the book Functional Programming in Scala (FPiS).
On page 131, about API design (algebra), the author says the following:
As we discussed in chapter 7, we’re interested in understanding what
operations are primitive and what operations are derived, and in
finding a small yet expressive set of primitives. A good way to
explore what is expressible with a given set of primitives is to pick
some concrete examples you’d like to express, and see if you can
assemble the functionality you want. As you do so, look for patterns,
try factoring out these patterns into combinators, and refine your set
of primitives. We encourage you to stop reading here and simply play
with the primitives and combinators we’ve written so far. If you want
some concrete examples to inspire you, here are some ideas:
What does he mean by expressive primitives? What is a primitive?
(Chapter 7 does not explain primitives at all.)
My guess is that a primitive is the smallest building block, something that can be combined using combinators to produce higher-level things.
The question from the book:
If we can generate a single Int in some range, do we need a new primitive to
generate an (Int,Int) pair in some range?
What is the answer to that question?
Is Int a primitive?
What is a primitive function?
What is a primitive?
My guess is that a primitive is the smallest building block, something that can be combined using combinators to produce higher-level things.
Right. So in this case: you are developing an API which has type Gen[A] denoting "a generator of values of type A". Combinators in this API will be methods which operate on Gens and produce Gens. However, you also need something to apply those combinators to: methods which produce Gens without starting with one.
One of these may be range(min: Int, max: Int): Gen[Int]. If you now need a pairInRange(min: Int, max: Int): Gen[(Int, Int)], do you need to implement it in the same way range is implemented, or can you build it from range using combinators? And if you can, what combinators do you need? In this case, you should conclude that
def pairInRange(min: Int, max: Int) = pair(range(min, max), range(min, max))
is a reasonable implementation, assuming there is a pair combinator (think what its type should be), so pairInRange doesn't need to be a primitive: it's a derived operation. Primitive operations are those which aren't derived.
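To make the types concrete, here is a minimal sketch under assumed signatures (this Gen, range and pair are deliberately simplified stand-ins, not the book's implementations):

import scala.util.Random

// A generator of values of type A.
case class Gen[A](sample: Random => A) {
  def map[B](f: A => B): Gen[B] = Gen(rng => f(sample(rng)))
}

// Primitive: generate an Int in [min, max].
def range(min: Int, max: Int): Gen[Int] =
  Gen(rng => min + rng.nextInt(max - min + 1))

// Combinator: combine two generators into a generator of pairs.
def pair[A, B](ga: Gen[A], gb: Gen[B]): Gen[(A, B)] =
  Gen(rng => (ga.sample(rng), gb.sample(rng)))

// Derived: built entirely from range and pair, so no new primitive is needed.
def pairInRange(min: Int, max: Int): Gen[(Int, Int)] =
  pair(range(min, max), range(min, max))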
You could equally have pairInRange as a primitive and def range(min: Int, max: Int) = pairInRange(min, max).map(_._1) as derived; this is just an unnatural way to design it!
(The chapter 7 does not explain about primitive at all.)
Now that I've got home and have access to the book: yes, it does. If you search for "primitive" in the book (assuming you have the electronic version), you'll find this quote in 7.1.3 Explicit forking:
The function lazyUnit is a simple example of a derived combinator, as opposed to a
primitive combinator like unit. We were able to define lazyUnit just in terms of other
operations. Later, when we pick a representation for Par, lazyUnit won’t need to
know anything about this representation—its only knowledge of Par is through the
operations fork and unit that are defined on Par.
Note in particular the last sentence, which makes a point I didn't emphasize above. You should also read what follows: there are more explanations in 7.3.
What does he mean by expressive primitives?
A primitive by itself is not expressive; it's a set of primitives (including combinators) that is expressive. That is, it should allow you to express not just range and pairInRange but all the other generators you want.
Is Int a primitive?
No. This meaning of "primitive" is unrelated to "primitive types" (Int, Double, etc.).

Why didn't scala design around Integer Overflow?

I am a former Java developer and I have recently watched the insightful and entertaining introduction to Scala for Java developers by professor Venkat Subramaniam (https://www.youtube.com/watch?v=LH75sJAR0hc).
A major point introduced is the elimination of declared types in favour of "type inference". Presumably, this means the compiler recognizes the type I intend to use from the context.
Being an application security expert by trade, the first thing I tried to do is break this type inference... Example:
// declare a function that returns the square of an input Int. The return type is to be inferred.
scala> val square = (x:Int) => x*x
square: Int => Int = <function1>
// I can see the compiler inferred an Int for the output value, which I do not agree with.
scala> square(2147483647)
res1: Int = 1
// integer overflow
My question is why did the compiler not see that "*" is an operator with a threat of overflow, and wrap the inputs in something a little more protective like a BigInteger?
According to the professor, I am supposed to forget about the internal implementation and just get on with my business logic. But after my quick demonstration I'm not so sure that Scala is safe for a programmer who doesn't understand what the compiler is doing with my methods.
I think @rightfold somewhat overstates how often overflows do or don't happen (particularly when considering an attacker who is actively trying to overflow you). But I agree with his basic point. Converting all math to BigInteger would almost certainly have created a massive performance impact over Java. For developers to choose such a language, they'd have to get something visible for that cost.
String objects have a much smaller performance overhead over cstrings for many operations. They also provide very visible benefits to the developer, which is why people use them, not security per se. There are many common things that string objects make easy to do over cstrings. BigInteger provides none of that. It requires exactly the same code at a fraction of the speed, but just won't overflow (a bug few developers see day to day, even if security guys see it more often).
The equivalent would have been a cstring (with strcmp, strcpy, strcat, etc.) that ran at a fraction of the speed, but just didn't require a null terminator. I don't think many people would have jumped to use that, either, no matter how much that would help security over null-terminated strings. And if the language required it, I don't see a lot of people anxious to use the language.
And as @rightfold suggests in the comments, interoperability with Java would be trashed, since most if not all numbers would wind up being BigInteger. You'd constantly be converting, which raises the same dangers of overflows while adding a lot of code complexity (and more performance impacts).
A from-scratch language might get away with ubiquitous BigInteger (like python) if the language had a lot of other compelling features, but it's a very hard thing to retrofit into a language that wants to be a natural transition from (and with) Java.
In addition to the above answers, I think this question misunderstands the purpose of type inference in a statically typed language. Type inference does not make the choices you are referring to, such as promoting an Int to a BigInt. It is restricted to simply "inferring" the type of an expression based on the known types of its subexpressions at compile time.
The * method on Int returns an Int when supplied with an Int parameter:
def *(x: Int): Int
In this case, since x is declared to be an Int, then x*x must be an Int based on the signature of *.
If we really wanted this behavior, we could define a function that promotes Int to BigInt when multiplying.
implicit class SafeInt(x: Int) {
  def safeMult(a: Int): scala.math.BigInt = scala.math.BigInt(x) * a
}
Then we can define a square with the desired property:
scala> val square = (x: Int) => x safeMult x
square: Int => scala.math.BigInt = <function1>
The compiler infers based on the methods available. Int has a method *(Int): Int that is, as far as the compiler knows, perfectly well defined; 2147483647*2147483647 is a perfectly good method call with the result 1, it doesn't throw ClassCastException or anything like that.
Why is the Int type written this way? Largely for Java/JVM compatibility; many parts of Scala have design compromises for the sake of Java compatibility. If you don't need that functionality, you might prefer to use Haskell or a similar language. (I suspect that even without the requirement for JVM compatibility, Scala would have wanted to expose the machine-native integer types so that users could make that performance/correctness tradeoff where desired. They might not have been the default though)
If you're doing numeric computation in Scala you probably want to use the Spire library, which makes it easy to abstract over numeric types, and provides several high-performance numeric types with particular properties. In particular it has a SafeLong type that handles arbitrary-precision integers but with much better performance than BigInt for values which fall within the Long range, similar to Python's integer type.
Because overflow almost never occurs in practice, and BigInteger is slow as a dog compared to Int. It would also be most inconvenient to have all * operations on Ints return BigIntegers.
"Recognizes the type I intend to use" is not an accurate description of what scala tries to do. It infers the most generic type possible given the constraints imposed by the context. Hence if you write List(Nil, "1"), you'll get List[Serializable], because Serializable is an interface that List and String share - disregarding that Serializable was probably not on your mind at all.
The question you're asking could be asked more precisely as "why is Int the type of numeric literals instead of BigInteger?" - inference doesn't have much to do with it.
And we can opine all we want on that topic, but there's one most accurate answer describing why Scala is what it is: "because Java".
If you want the kind of safety you seem to be after, one approach is to define the operation as a function that guards against numeric overflow and returns an Option[Int], or perhaps an Either[Int, BigInteger].
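A minimal sketch of that idea (safeMult is an illustrative name, not from the answer):

// Multiply two Ints, returning None when the true result does not fit in an Int
// (checked by doing the arithmetic in Long).
def safeMult(x: Int, y: Int): Option[Int] = {
  val r = x.toLong * y.toLong
  if (r >= Int.MinValue && r <= Int.MaxValue) Some(r.toInt) else None
}

val square: Int => Option[Int] = x => safeMult(x, x)
square(3)           // Some(9)
square(2147483647)  // None, instead of a silently wrong 1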
The type inference for your square function is correct: given that it's inferred from the input types you've specified and the type of the * function, it's not really broken, in my opinion.

Memory overhead of Case classes in scala

What is the memory overhead of a case class in Scala?
I've implemented some code to hold a lexicon with multiple types of interned tokens for NLP processing. I've got a case class for each token type.
For example, the canonical lemma/stem token is as follows:
sealed trait InternedLexAtom extends LexAtom {
  def id: Int
}

case class Lemma(id: Int) extends InternedLexAtom
I'm going to be returning document vectors of these interned tokens; the reason I wrap them in case classes is to be able to add methods to the tokens via implicit classes. I add behaviour to the lexemes this way because I want the lexemes to have different methods in different contexts.
So I'm hoping the answer will be zero memory overhead, due to type erasure. Is this the case?
I have a suspicion that a single pointer might be packed with the parameters for some of the magic Scala can do :(
Justification
To put things in perspective: the JVM uses 1.5-2 GB of memory with my lexicon loaded (the lexicon does not use case classes in its in-memory representation), while C++ does the same in 500-700 MB. If my codebase keeps scaling its memory requirements the way it does now, I'm not going to be able to do this stuff on my laptop (in memory).
I'll sidestep the problem by structuring my code differently. For example, I can just strip away the case classes in the vector representations if I need to. It would be nice if I didn't have to.
Question Extension.
Robin and Pedro have addressed the use case, thank you. In this case I was missing value classes. With those there are no more downsides. Additionally: I tried my best not to mention C++'s POD concept, but now I must ask :D A C++ POD is just a struct with primitive values. If I wanted to pack more than one value into a value class, how would I achieve this? I am assuming this would be what I want to do:
class SuperTriple(val underlying: (Int, Int)) extends AnyVal {
  def sup    = underlying._1  // `super` is a reserved word, so renamed here
  def triple = underlying._2
}
I do actually need the above construct, since a SuperTriple is what I am using as my vector model symbol :D
The original question still remains "what is the overhead of a case class".
In Scala 2.10 you can use value classes. (In older versions of Scala, for something with zero overhead for just one member, you need to use unboxed tagged types.)
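For example, here is a hedged sketch of what that could look like for the Lemma token above. Note that a value class may only extend universal traits, so the InternedLexAtom trait from the question is dropped here, and the wrapper only stays unboxed in direct uses (it is re-boxed in generic contexts such as collections):

// Scala 2.10+ value class: in most direct uses this compiles down to a plain Int,
// so the wrapper itself adds no per-instance allocation.
case class Lemma(id: Int) extends AnyVal

// Extension methods can still be added with an implicit class.
implicit class LemmaOps(l: Lemma) {
  def next: Lemma = Lemma(l.id + 1)  // illustrative method, not from the question
}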

Everything's an object in Scala

I am new to Scala and heard a lot that everything is an object in Scala. What I don't get is what's the advantage of "everything's an object"? What are things that I cannot do if everything is not an object? Examples are welcome. Thanks
The advantage of having "everything" be an object is that you have far fewer cases where abstraction breaks.
For example, methods are not objects in Java. So if I have two strings, I can
String s1 = "one";
String s2 = "two";
static String caps(String s) { return s.toUpperCase(); }
caps(s1); // Works
caps(s2); // Also works
So we have abstracted away string identity in our operation of making something upper case. But what if we want to abstract away the identity of the operation--that is, we do something to a String that gives back another String but we want to abstract away what the details are? Now we're stuck, because methods aren't objects in Java.
In Scala, methods can be converted to functions, which are objects. For instance:
def stringop(s: String, f: String => String) = if (s.length > 0) f(s) else s
stringop(s1, _.toUpperCase)
stringop(s2, _.toLowerCase)
Now we have abstracted the idea of performing some string transformation on nonempty strings.
And we can make lists of the operations and such and pass them around, if that's what we need to do.
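For example, a small sketch building on the stringop definition above:

// A list of operations is just a list of objects; we can pass it around
// and apply each one like any other data.
val ops: List[String => String] = List(_.toUpperCase, _.toLowerCase, _.reverse)
val results = ops.map(f => stringop("one", f))  // List("ONE", "one", "eno")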
There are other less essential cases (object vs. class, primitive vs. not, value classes, etc.), but the big one is collapsing the distinction between method and object so that passing around and abstracting over functionality is just as easy as passing around and abstracting over data.
The advantage is that you don't have different kinds of operations following different rules within your language. For example, in Java, to perform operations involving objects you use the dot-name technique of calling code (static methods still use the dot-name technique, though sometimes the this object or the enclosing class is inferred), while built-in items (not objects) use a different mechanism: built-in operator manipulation.
Number one = Integer.valueOf(1);
Number two = Integer.valueOf(2);
Number three = one.plus(two); // if only such methods existed.
int one = 1;
int two = 2;
int three = one + two;
The main difference is that the dot-name technique is subject to polymorphism, operator overloading, method hiding, and all the good stuff that you can do with Java objects. The + technique is predefined and completely inflexible.
Scala circumvents the inflexibility of the + operator by basically handling it as a dot-name operator and defining a strong one-to-one mapping of such operators to object methods. Hence, in Scala "everything is an object" means exactly that, so the operation
5 + 7
results in two objects being created (a 5 object and a 7 object), the + method of the 5 object being called with the parameter 7 (if my Scala memory serves me correctly), and a 12 object being returned as the value of the 5 + 7 operation.
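In other words, the infix form is just sugar for an ordinary method call:

val a = 5 + 7     // 12
val b = (5).+(7)  // 12 as well: the + method of the Int object 5, called with 7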
This "everything is an object" has a lot of benefits in a functional programming environment. For example, blocks of code are now objects too, making it possible to pass blocks of code (without names) back and forth as parameters, yet still be bound by strict type checking (the block of code only returns Long, or a subclass of String, or whatever).
When it boils down to it, it makes some kinds of solutions very easy to implement, and often the inefficiencies are mitigated by the lack of need to handle "move into primitives, manipulate, move out of primitives" marshalling code.
One specific advantage that comes to my mind (since you asked for examples) is that what in Java are primitive types (int, boolean, ...) are in Scala objects that you can add functionality to with implicit conversions. For example, if you want to add a toRoman method to Ints, you could write an implicit class like:
implicit class RomanInt(i: Int) {
  def toRoman: String = ??? // some algorithm to convert i to a Roman representation
}
Then, you could call this method on any Int literal like:
val romanFive = 5.toRoman // V
This way you can 'pimp' basic types to adapt them to your needs
In addition to the points made by others, I always emphasize that the uniform treatment of all values in Scala is in part an illusion. For the most part it is a very welcome illusion. And Scala is very smart to use real JVM primitives as much as possible and to perform automatic transformations (usually referred to as boxing and unboxing) only as much as necessary.
However, if boxing and unboxing happen very frequently at runtime, there can be undesirable costs (both memory and CPU) associated with them. This can be partially mitigated with the use of specialization, which creates special versions of generic classes when particular type parameters are of (programmer-specified) primitive types. This avoids boxing and unboxing, but comes at the cost of more .class files in your running application.
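A minimal sketch of what specialization looks like in source code (the class name is illustrative):

// The compiler emits extra variants of this class for Int and Double, so
// instances with those type parameters store unboxed primitives.
class Cell[@specialized(Int, Double) T](val value: T)

val c = new Cell(42)  // picks the Int-specialized variant; 42 is not boxed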
Not everything is an object in Scala, though more things are objects in Scala than their analogues in Java.
The advantage of objects is that they're bags of state which also have some behavior coupled with them. With the addition of polymorphism, objects give you ways of changing the implicit behavior and state. Enough with the poetry, let's go into some examples.
The if statement is not an object, in either Scala or Java. If it were, you would be able to subclass it, inject another dependency in its place, and use it to do things like logging to a file any time your code uses an if statement. Wouldn't that be magical? It would in some cases help you debug things, and in other cases it would make your hair turn white before you found a bug caused by someone overriding the behaviour of if.
Visiting an objectless, statementful world: imagine your favorite OOP language. Think of the standard library it provides. There are plenty of classes there, right? They offer ways for customization, right? They take parameters that are other objects, they create other objects. You can customize all of these. You have polymorphism. Now imagine that the whole standard library was simply keywords. You wouldn't be able to customize nearly as much, because you can't override keywords. You'd be stuck with whatever cases the language designers decided to implement, and you'd be helpless to customize anything. Such languages exist, and you know them well: the SQL-like languages. You can barely create functions there, and in order to customize the behavior of the SELECT statement, new versions of the language had to appear that included the most-requested features. This is an extreme world, where you can only program by asking the language designers for new features (which you might not get, because someone more important might require a feature incompatible with what you want).
In conclusion, NOT everything is an object in Scala: classes, expressions, keywords and packages surely aren't. More things are, however, such as functions.
What's IMHO a nice rule of thumb is that more objects equals more flexibility.
P.S. In Python, for example, even more things are objects (such as the classes themselves, and the analogue of packages, i.e. Python modules and packages). You'd see that black magic is easier to do there, and that brings both good and bad consequences.

Is it possible to implement F#'s infrastructure for Units of Measurement in Scala?

F# ships with special support for a unit of measurement system, which provides static type safety while compiling down to the numeric types instead of burdening the runtime with wrapping/unwrapping operations.
Is it possible to use some of Scala's type system magic to implement something comparable to that?
The answer is no.
Now, someone is bound to point me to Scalar, but that gives runtime checking. Perhaps, then, point to the efforts of Jesper Nordenberg's type-safe units or Jim McBeath's take on it, but these are cumbersome and awkward.
I'll point, instead, to the Units compiler plugin. It gave Scala, back in 2008/2009, a pretty good system of units, as can be seen in this post. It did so, however, by extending the compiler, which would not be necessary if the type system were enough. Alas, it has not been maintained and no longer works.
I don't know anything about it, but I just stumbled across this talk at Scala Days: https://wiki.scala-lang.org/display/SW/ScalaDays+2011+Resources#ScalaDays2011Resources-ScalaUImplementingaScalalibraryforUnitsofMeasure
Kind of. You can encode the SI units quite easily using a type representation of integers in a tuple of exponents. See http://svn.assembla.com/svn/metascala/src/metascala/Units.scala for an example implementation.
It should also be possible to support an extensible units system if the units are encoded as a TList of pairs of a unit type and an integer (for example, ((M, _1), (S, _2)) where M <: Unit and S <: Unit). Calculating the types for quantity operations becomes a bit more complicated in this encoding.
Regarding performance, there will always be a memory overhead for wrapping the value in a type containing the unit information. However, there is probably no runtime overhead in the actual operations, as all unit checking is done at compile time.
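As a much simpler illustration of the general idea (not the exponent-tracking encodings mentioned above), even plain value classes give distinct static types for units while usually avoiding a wrapper allocation; the Meters/Seconds/MetersPerSecond names and operations are assumptions for this sketch:

case class Meters(value: Double) extends AnyVal {
  def +(that: Meters): Meters = Meters(value + that.value)
  def /(that: Seconds): MetersPerSecond = MetersPerSecond(value / that.value)
}
case class Seconds(value: Double) extends AnyVal
case class MetersPerSecond(value: Double) extends AnyVal

val speed = Meters(100.0) / Seconds(9.58)   // compiles, and carries its unit in the type
// val bad = Meters(100.0) + Seconds(9.58)  // does not compile: unit mismatch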
Have a look at Units of Measure - A Scala Macro System. It seems to satisfy your requirements.