Scala constraint based types and literals - scala

I was thinking whether it would be possible in Scala to define a type like NegativeNumber. This type would represent a negative number and it would be checked by the compiler similarly to Ints, Strings etc.
val x: NegativeNumber = -34
val y: NegativeNumber = 34 // should not compile
Likewise:
val s: ContainsHello = "hello world"
val s: ContainsHello = "foo bar" // this should not compile either
I could use these types just like other types, eg:
def myFunc(x: ContainsHello): Unit = println(s"$x contains hello")
These constrained types could be backed by casual types (Int, String).
Is it possible to implement these types (maybe with macros)?
How about custom literals?
val neg = -34n //neg is of type NegativeNumber because of the suffix
val pos = 34n // compile error

Unfortunately, no this isn't something you could easily check at compile time. Well - at least not if you aren't restricting the operations on your type. If your goal is simply to check that a number literal is non-zero, you could easily write a macro that checks this property. However, I do not see any benefit in proving that a negative literal is indeed negative.
The problem isn't a limitation of Scala - which has a very strong type system - but the fact that (in a reasonably complex program) you can't statically know every possible state. You can however try to overapproximate the set of all possible states.
Let us consider the example of introducing a type NegativeNumber that only ever represents a negative number. For simplicity, we define only one operation: plus.
Say you would only allow addition of multiple NegativeNumber, then, the type system could be used to guarantee that each NegativeNumber is indeed a negative number. But this seems really restrictive, so a useful example would certainly allow us to add at least a NegativeNumber and a general Int.
What if you had an expression val z: NegativeNumber = plus(x, y) where you don't know the value of x and y statically (maybe they are returned by a function). How do you know (statically) that z is indead a negative number?
An approach to solve the problem would be to introduce Abstract Interpretation which must be run on a representation of your program (Source Code, Abstract Syntax Tree, ...).
For example, you could define a Lattice on the numbers with the following elements:
Top: all numbers
+: all positive numbers
0: the number 0
-: all negative numbers
Bottom: not a number - only introduced that each pair of elements has a greatest lower bound
with the ordering Top > (+, 0, -) > Bottom.
Then you'd need to define semantics for your operations. Taking the commutative method plus from our example:
plus(Bottom, something) is always Bottom, as you cannot calculate something using invalid numbers
plus(Top, x), x != Bottom is always Top, because adding an arbitrary number to any number is always an arbitrary number
plus(+, +) is +, because adding two positive numbers will always yield a positive number
plus(-, -) is -, because adding two negative numbers will always yield a negative number
plus(0, x), x != Bottom is x, because 0 is the identity of the addition.
The problem is that
plus - + will be Top, because you don't know if it's a positive or negative number.
So to be statically safe, you'd have to take the conservative approach and disallow such an operation.
There are more sophisticated numerical domains but ultimately, they all suffer from the same problem: They represent an overapproximation to the actual program state.
I'd say the problem is similar to integer overflow/underflow: Generally, you don't know statically whether an operation exhibits an overflow - you only know this at runtime.

It could be possible if SIP-23 was implemented, using implicit parameters as a form of refinement types. However, it would be of questionable value though as the Scala compiler and type system is not really not well equipped for proving interesting things about for example integers. For that it would be much nicer to use a language with dependent types (Idris etc.) or refinement types checked by an SMT solver (LiquidHaskell etc.).

Related

Conversion und comparison between Long and Double in Scala

I'm playing around with Scala AnyVal Types and having trouble to unterstand the following: I convert Long.MaxValue to Double and back to Long. As Long (64bit) can hold more digits than Double's mantissa (52 bits), I expected that this conversion will make some data be lost, but somehow this is not the case:
Long.MaxValue.equals(Long.MaxValue.toDouble.toLong) // is true
I thought there is maybe some magic/optimisation in Long.MaxValue.toDouble.toLong such that the conversion is not really happening. So I also tried:
Long.MaxValue.equals("9.223372036854776E18".toDouble.toLong) // is true
If I evaluate the expression "9.223372036854776E18".toDouble.toLong, this gives:
9223372036854775807
This really freaks me out, the last 4 digits seem just to pop up from nowhere!
First of all: as usual with questions like this, there is nothing special about Scala, all modern languages (that I know of) use IEEE 754 floating point (at least in practice, if the language specification doesn't require it) and will behave the same, just with different type and operation names.
Yes, data is lost. If you try e.g. (Long.MaxValue - 1).toDouble.toLong, you'll still get Long.MaxValue back. You can find the next smallest Long you can get from someDouble.toLong as follows:
scala> Math.nextDown(Long.MaxValue.toDouble).toLong
res0: Long = 9223372036854774784
If I evaluate the expression "9.223372036854776E18".toDouble.toLong, this gives:
9223372036854775808
This really freaks me out, the last 4 digits seem just to pop up from nowhere!
You presumably mean 9223372036854775807. 9.223372036854776E18 is of course actually larger than that: it represents 9223372036854776000. But you'll get the same result if you use any other Double larger than Long.MaxValue as well, e.g. 1E30.toLong.
Just remark, Double and Long values in Scala are equivalent of primitive types double and long.
The result of Long.MaxValue.toDouble is in reality bigger then Long.MaxValue, the reason is, that value is rounded. Next conversion i.e. Long.MaxValue.toDouble.toLong is "rounded" back to the value of Long.MaxValue.

Working with Sets as Functions

From a FP course:
type Set = Int => Boolean // Predicate
/**
* Indicates whether a set contains a given element.
*/
def contains(s: Set, elem: Int): Boolean = s(elem)
Why would that make sense?
assert(contains(x => true, 100))
Basically what it does is provide the value 100 to the function x => true. I.e., we provide 100, it returns true.
But how is this related to sets?
Whatever we put, it returns true. Where is the sense of it?
I understand that we can provide our own set implementation/function as a parameter that would represent the fact that provided value is inside a set (or not) - then (only) this implementation makes the contains function be filled by some sense/meaning/logic/functionality.
But so far it looks like a nonsense function. It is named contains but the name does not represent the logic. We could call it apply() because what it does is to apply a function (the 1st argument) to a value (the 2nd argument). Having only the name contains may tell to a reader what an author might want to say. Isn't it too abstract, maybe?
In the code snippet you show above, any set S is represented by what is called its characteristic function, i.e., a function that given some integer i checks whether i is contained in the set S or not. Thus you can think of such a characteristic function f like it was a set, namely
{i | all integers i for which f i is true}
If you think of any function with type Int => Boolean as set (which is indicated by the type synonym Set = Int => Boolean) then you could describe contains by
Given a set f and an integer i, contains(f, i) checks whether i is an element of f or not.
Some example sets might make the idea more obvious:
Set Characeristic Function
empty set x => false
universal set x => true
set of odd numbers x => x % 2 == 1
set of even numbers x => x % 2 == 0
set of numbeers smaller than 10 x => x < 10
Example: The set {1, 2, 3} can be represented by
val S: Set = (x => 0 <= x && x <= 3)
If you want to know whether some number n is in this set you could do
contains(S, n)
but of course (as you already observed yourself) you would get the same result by directly doing
S(n)
While this is shorter, the former is maybe easier to read (since the intention is somewhat obvious).
Sets (both mathematically and in the context of computer representation) can be represented in various different ways. Using characteristic functions is one possibility. The idea is that a subset S of a given universal set U is completely determined by a function f:U-->{true,false} (called the characteristic function of the subset). simply since you can treat f(u) as answering the question "is u an element in S?".
Any particular choice of representing sets has advantages and disadvantages when compared to other methods. In particular, some representations are better suited to be modeled in a functional language than in imperative languages. If we compare managing sets as characteristic functions vs. as (either sorted or unsorted) lists (or arrays), then, for instance, creating unions, intersections, and set difference, is very efficient with characteristic functions but not so efficient with lists. Checking for the existence of an element is as easy as computing f(-) with characteristic functions, as opposed to searching a list. However, printing out the elements in the set is immediate with a list, but may require lots of computations with a characteristic function.
Having said that, a fundamental difference is that with characteristic functions one can model infinite sets, while this is impossible with array. Of course, no set will actually be infinite, but a set like (x: BigInt) x => (x % 2) == 0 truly represents that set of all even integers and one can actually compute with it (as long as you don't try to print all the elements).
So, every representation has pros and cons (duh).

Why asterisk is overloaded for types?

I still don't understand the motivation.
Why did they made two different operators (* and *.) for multiplication of integers and floats respectively as if they afraid of overloading, but at the same time they used * to denote Cartesian product of types?
type a = int * int ;;
Why suddenly they became so brave? Why not to write
type a = int *.. int ;;
or something?
Is there some relation, which makes Cartesian product closer to integer product and more far from float product?
It's not overloaded, on the right hand-side of type t = you are defining another kind of concept, you are defining a type, not a value.
In ML-like languages you can see two distinct language:
The language for types that allows you to define types (a specification of the structure of your values).
The language for values that allows you to define values (actual values corresponding to a type, functions are also values). That's what gets evaluated.
Since the domain of the two language are really separate there's no theoretical problem/ambiguity in reusing similar syntactic construct in each language and hence has absolutely nothing to do with overloading.
In mathematics, you note cartesian product using the multiplication character. So it is logic to note it the same way in OCaml...

scala new Range with step equals zero

Is(and why) this really should be prohibited with exception?
scala> val r2 = 15 until (10, 0)
java.lang.IllegalArgumentException: requirement failed
scala> new Range(10,15,0)
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:133)
Is(and why) this really should be prohibited with exception?
Quoting from scaladoc:
The Range class represents integer values in range [start;end) with non-zero step value step. Sort of acts like a sequence also (supports length and contains).
This restriction makes sense. A range with step-size zero would always be infine and just consist of the lower bound value. Whereas one could argue that infinite ranges are possible (lazy evaluation), the concept of an upper bound in the range would be taken ad absurdum. A range with step 0 is simply not a range, even if it's infinitely long, because the upper bound has no importance.
So if one really wants an infinite stream of a single value, Scala rightfully forces us to be more explicit.

Where can I find a cheat sheet for hungarian notation?

I'm working on a legacy COM C++ project that makes use of system hungarian notation. Because it's maintenance of legacy code, the convention is to code in the original style it was written in - our newer code isn't coded this way. So I'm not interested in changing that standard or having a a discussion of our past sins =)
Is there an online cheat-sheet available out there for systems hungarian notation?
The best I can find thus far is a pre stack-overflow discussion post, but it doesn't quite have everything I've needed in the past. Does anyone have any other links?
(making this community wiki in the hope this becomes a self populating list)
If this is for a legacy COM project, you'll probably want to follow Microsoft's Hungarian Notation specifications, which are documented on MSDN.
Note that this is Apps Hungarian, i.e. the "good" kind of Hungarian Notation. Systems Hungarian is the "bad" kind, where names are prefixed with their compiler types, e.g. i for int.
Tables from the MSDN article
Table 1. Some examples for procedure names
Name Description
InitSy Takes an sy as its argument and initializes it.
OpenFn fn is the argument. The procedure will "open" the fn. No value is returned.
FcFromBnRn Returns the fc corresponding to the bn,rn pair given. (The names cannot tell us what the types sy, fn, fc, and so on, are.)
The following is a list of standard type constructions. (X and Y stand for arbitrary tags. According to standard punctuation, the actual tags are lowercase.)
Table 2. Standard type constructions
pX Pointer to X.
dX Difference between two instances of type X. X + dX is of type X.
cX Count of instances of type X.
mpXY An array of Ys indexed by X. Read as "map from X to Y."
rgX An array of Xs. Read as "range X." The indices of the array are called:
iX index of the array rgX.
dnX (rare) An array indexed by type X. The elements of the array are called:
eX (rare) Element of the array dnX.
grpX A group of Xs stored one after another in storage. Used when the X elements are of variable size and standard array indexing would not apply. Elements of the group must be referenced by means other then direct indexing. A storage allocation zone, for example, is a grp of blocks.
bX Relative offset to a type X. This is used for field displacements in a data structure with variable size fields. The offset may be given in terms of bytes or words, depending on the base pointer from which the offset is measured.
cbX Size of instances of X in bytes.
cwX Size of instances of X in words.
The following are standard qualifiers. (The letter X stands for any type tag. Actual type tags are in lowercase.)
Table 3. Standard qualifiers
XFirst The first element in an ordered set (interval) of X values.
XLast The last element in an ordered set of X values. XLast is the upper limit of a closed interval, hence the loop continuation condition should be: X <= XLast.
XLim The strict upper limit of an ordered set of X values. Loop continuation should be: X < XLim.
XMax Strict upper limit for all X values (excepting Max, Mac, and Nil) for all other X: X < XMax. If X values start with X=0, XMax is equal to the number of different X values. The allocated length of a dnx vector, for example, will be typically XMax.
XMac The current (as opposed to constant or allocated) upper limit for all X values. If X values start with 0, XMac is the current number of X values. To iterate through a dnx array, for example:
for x=0 step 1 to xMac-1 do ... dnx[x] ...
or
for ix=0 step 1 to ixMac-1 do ... rgx[ix] ...
XNil A distinguished Nil value of type X. The value may or may not be 0 or -1.
XT Temporary X. An easy way to qualify the second quantity of a given type in a scope.
Table 4. Some common primitive types
f Flag (Boolean, logical). If qualifier is used, it should describe the true state of the flag. Exception: the constants fTrue and fFalse.
w Word with arbitrary contents.
ch Character, usually in ASCII text.
b Byte, not necessarily holding a coded character, more akin to w. Distinguished from the b constructor by the capital letter of the qualifier in immediately following.
sz Pointer to first character of a zero terminated string.
st Pointer to a string. First byte is the count of characters cch.
h pp (in heap).
Here's one for 'Systems Hungarian', which in my experience was the more commonly used (and less useful):
http://web.mst.edu/~cpp/common/hungarian.html
But how universally followed this is, I have no idea.
The other form of Hungarian Notation is "Apps Hungarian", which apparently is Simonyi's original intent (the use of the variable was encoded rather than the type). See http://en.wikipedia.org/wiki/Hungarian_notation for some details.
Because this is a legacy project, your software department manager should have a copy of the style guide for whatever version of Hungarian Notation the original programmers used. (I'm assuming that the original programmers have long since fled to more enlightened workplaces.)
You really should reconsider your use of Hungarian notation. It was originally a patch for the lack of strong typing (and compiler type-checking) in C. Modern compilers enforce type-correctness, making Hungarian notation redundant at best, and erroneous otherwise.
There doesn't seem to be any one exhaustive resource for looking up Hungarian Notation prefixes, probably because a lot of it varied from code base to code base. There, of course, were a lot of very commonly used ones.
The best list I could find was here
The rest cover the commonly used conventions such as this entry
MSDN's enty on Hungarian Notation is here
and a couple of short papers on the subject (overlapping each other perhaps) here and here
Your best bet would be to see how the variables are used and that (may) help you figure out the definition of the prefixes (though in practice the naming rarey reflected the use of the variable, sadly).
You might be able to piece together some semblance of notation from those various links.
Just to be complete(!) how about Hungarian Object Notation for Visual Basic from Microsoft Support no less.