Losing path-dependent type when extracting value from Try in Scala

I'm working with scalax to generate a graph of my Spark operations. I have a custom library that generates the graph; let me show a sample:
val DAGWithoutGet = createGraphFromOps(ops)
val DAGWithGet = createGraphFromOps(ops).get
The return type of DAGWithoutGet is
scala.util.Try[scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge]],
and, for DAGWithGet is
scalax.collection.Graph[typeA, scalax.collection.GraphEdge.DiEdge].
Here, typeA is a project-related class representing a single Spark operation, not relevant for the context of this question. (For context only: what my custom library does is, essentially, generate a map of dependencies between those operations, creating a big Map object, and then call Graph(myBigMap: _*) to generate the graph.)
As far as I know, calling .get at this point of my code or later should not make any difference, but that is not what I'm seeing.
Calling DAGWithoutGet.get.nodes has a return type of scalax.collection.Graph[typeA,DiEdge]#NodeSetT,
while calling DAGWithGet.nodes returns DAGWithGet.NodeSetT.
When I extract one of those nodes (using the .find method), I receive scalax.collection.Graph[typeA,DiEdge]#NodeT and DAGWithGet.NodeT types, respectively. Much to my dismay, even the methods available in each case are different - I cannot use pathTo (which happens to be what I want) or withSubgraph on the former, only on the latter.
My question, then, after this relatively complex example: what is going on here? Why does extracting the value from the Try at different moments lead to different types, one path-dependent and the other not? Or, if that reading isn't correct, what am I missing here?
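To illustrate the phenomenon outside scalax, here is a minimal sketch (Container and Item are hypothetical stand-ins, not part of my code): only a stable identifier (a val) gives the compiler a prefix for a path-dependent type, while a method call like tried.get forces widening to a type projection.

import scala.util.{Success, Try}

class Container {
  class Item
  def find: Option[Item] = Some(new Item)
  def use(i: Item): Unit = println(i)
}

val tried: Try[Container] = Success(new Container)

// Stable path: `stable` is a val, so `stable.Item` is a path-dependent type.
val stable = tried.get
val item: stable.Item = stable.find.get
stable.use(item) // compiles: `item` is known to belong to this instance

// Unstable path: `tried.get` is a method call, so the result can only be
// typed with the projection Container#Item, which belongs to no instance.
val projected: Container#Item = tried.get.find.get
// stable.use(projected) // does not compile: Container#Item is not stable.Item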

How to use Scala Cats Validated the correct way?

Following is my use case
I am using Cats to validate my config. My config file is in JSON.
I deserialize the config file into my case class Config using lift-json and then validate it using Cats. I am using this as a guide.
My motive for using Cats is to accumulate all the errors, if any are present, at validation time.
My problem is that the examples given in the guide are of this shape:
case class Person(name: String, age: Int)

def validatePerson(name: String, age: Int): ValidationResult[Person] = {
  (validateName(name), validateAge(age)).mapN(Person)
}
But in my case I have already deserialized my config into my case class (below is a sample) and I am then passing it for validation:
case class Config(source: List[String], dest: List[String], extra: List[String])

def validateConfig(config: Config): ValidationResult[Config] = {
  (validateSource(config.source), validateDestination(config.dest))
    .mapN { case _ => config }
}
The difference here is mapN { case _ => config }. Since I already have a config, I don't want to recreate it from its members if everything is valid. This arises because I am passing config to the validate function, not its members.
A person at my workplace told me this is not the correct way: Cats Validated provides a way to construct an object only if its members are valid, and the object should not exist, or should not be constructible, if its members are invalid. That makes complete sense to me.
So, should I make any changes? Is what I'm doing above acceptable?
PS: The above Config is just an example; my real config can have other case classes as its members, which themselves can depend on other case classes.
One of the central goals of the kind of programming promoted by libraries like Cats is to make invalid states unrepresentable. In a perfect world, according to this philosophy, it would be impossible to create an instance of Config with invalid member data (through the use of a library like Refined, where complex constraints can be expressed in and tracked by the type system, or simply by hiding unsafe constructors). In a slightly less perfect world, it might still be possible to construct invalid instances of Config, but discouraged, e.g. through the use of safe constructors (like your validatePerson method for Person).
It sounds like you're in an even less perfect world where you have instances of Config that may or may not contain invalid data, and you want to validate them to get "new" instances of Config that you know are valid. This is totally possible, and in some cases reasonable, and your validateConfig method is a perfectly legitimate way to solve this problem, if you're stuck in that imperfect world.
The downside, though, is that the compiler can't track the difference between the already-validated Config instances and the not-yet-validated ones. You'll have Config instances floating around in your program, and if you want to know whether they've already been validated or not, you'll have to trace through all the places they could have come from. In some contexts this might be just fine, but for large or complex programs it's not ideal.
To sum up: ideally you'd validate Config instances whenever they are created (possibly even making it impossible to create invalid ones), so that you don't have to remember whether any given Config is good or not—the type system can remember for you. If that's not possible, because of e.g. APIs or definitions you don't control, or if it just seems too burdensome for a simple use case, what you're doing with validateConfig is totally reasonable.
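To make this concrete, here is a self-contained sketch of that approach (validateSource and validateDestination are hypothetical stand-ins, and String is assumed as the error type):

import cats.data.ValidatedNel
import cats.implicits._ // for mapN, validNel, invalidNel

type ValidationResult[A] = ValidatedNel[String, A]

case class Config(source: List[String], dest: List[String], extra: List[String])

def validateSource(source: List[String]): ValidationResult[List[String]] =
  if (source.nonEmpty) source.validNel else "source must not be empty".invalidNel

def validateDestination(dest: List[String]): ValidationResult[List[String]] =
  if (dest.nonEmpty) dest.validNel else "dest must not be empty".invalidNel

// Both checks run, and their errors accumulate in a NonEmptyList on failure.
def validateConfig(config: Config): ValidationResult[Config] =
  (validateSource(config.source), validateDestination(config.dest))
    .mapN((_, _) => config)

validateConfig(Config(Nil, Nil, Nil))
// Invalid(NonEmptyList("source must not be empty", "dest must not be empty"))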
As a footnote, since you say above that you're interested in looking in more detail at Refined, what it provides for you in a situation like this is a way to avoid even more functions of the shape A => ValidationResult[A]. Right now your validateName method, for example, probably takes a String and returns a ValidationResult[String]. You can make exactly the same argument against this signature as I have against Config => ValidationResult[Config] above—once you're working with the result (by mapping a function over the Validated or whatever), you just have a string, and the type doesn't tell you that it's already been validated.
What Refined allows you to do is write a method like this:
def validateName(in: String): ValidationResult[Refined[String, SomeProperty]] = ...
…where SomeProperty might specify a minimum length, or the fact that the string matches a particular regular expression, etc. The important point is that you're not validating a String and returning a String that only you know something about—you're validating a String and returning a String that the compiler knows something about (via the Refined[A, Prop] wrapper).
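For instance, a minimal sketch using the eu.timepit.refined library, with NonEmpty standing in for SomeProperty:

import cats.data.ValidatedNel
import cats.implicits._
import eu.timepit.refined.api.Refined
import eu.timepit.refined.collection.NonEmpty
import eu.timepit.refined.refineV

type ValidationResult[A] = ValidatedNel[String, A]

// The result type says "a String the compiler knows is non-empty".
def validateName(in: String): ValidationResult[String Refined NonEmpty] =
  refineV[NonEmpty](in) match {
    case Right(refined) => refined.validNel
    case Left(err)      => err.invalidNel
  }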
Again, this may be (okay, probably is) overkill for your use case—you just might find it nice to know that you can push this principle (tracking validation in types) even further down through your program.

Why are SessionVars in Lift implemented using singletons?

One typical way of managing state in Lift is to create a singleton object extending SessionVar, like in this example taken from the documentation:
object MySnippetCompanion {
  object mySessionVar extends SessionVar[String]("hello")
}
The case for using SessionVars is clear and I've been using them in practice as needed. I also roughly understand how they work inside.
Still, I can't help but wonder why the mechanism for "session variables", which are clearly associated with the current session (usually just one out of many sessions in the system), was designed to be used via a singleton? This goes so against my intuition that at first glance I was tempted to believe that Lift was somehow able to override Scala's language features and make object mean something different than in regular Scala.
Even though I now understand how it works, I can't grasp the rationale for such a design, which, at least for me, breaks the rule of least astonishment. Can someone point out any advantages or perhaps explain why such a design decision could have been made?
Session variables in Lift use Scala's DynamicVariable. Basically they allow you to statically reference a variable in a code-block and then later on call the code and substitute a value:
import scala.util.DynamicVariable

val x = new DynamicVariable(1)

def printIt(): Unit = {
  println(x.value)
}

printIt()
//> 1
x.withValue(2)(printIt())
//> 2
So each time a request is handled, the scope of these dynamic variables is changed to the current session, completely hiding the session's state change from you as a programmer.
The other option would be to pass around a "sessionID" object which you would have to use whenever you want to access session-specific data. Not really handy.
The reason you have to use the object keyword is that object is unique in that it defines both a value and a class. This allows Lift to call getClass to get a name that uniquely identifies this SessionVar vs. any other one, which Lift needs in order to serialize and deserialize every piece of session state in the right place(s). Furthermore if the SessionVar is in a class that has two instances (for instance a snippet rendered in two tabs), they will both refer to the same piece of session state. (The flip side of the coin is that the same SessionVar instance can be referenced by two different sessions and mean the right thing to each.)
Actually, at times this is insufficient: for instance, if you define a SessionVar in a trait and have two different classes that inherit the trait, but you need them to have two different values. The solution in that case is to override the def for the "name salt", which is combined with getClass to identify the SessionVar.
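As a sketch of that last point (note: the exact hook name __nameSalt is an assumption from memory of Lift's AnyVar; check your Lift version):

trait Counting {
  // Without salting, every class inheriting this trait would share one
  // session slot, because getClass on the object yields the same name.
  object count extends net.liftweb.http.SessionVar[Int](0) {
    // Assumed hook: mixes the enclosing class's name into the identifier.
    override protected def __nameSalt: String = Counting.this.getClass.getName
  }
}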

Everything's an object in Scala

I am new to Scala and have heard a lot that everything is an object in Scala. What I don't get is: what's the advantage of "everything's an object"? What are things that I could not do if everything were not an object? Examples are welcome. Thanks
The advantage of having "everything" be an object is that you have far fewer cases where abstraction breaks.
For example, methods are not objects in Java. So if I have two strings, I can
String s1 = "one";
String s2 = "two";
static String caps(String s) { return s.toUpperCase(); }
caps(s1); // Works
caps(s2); // Also works
So we have abstracted away string identity in our operation of making something upper case. But what if we want to abstract away the identity of the operation itself, that is, we do something to a String that gives back another String, but we want to abstract away the details? Now we're stuck, because methods aren't objects in Java.
In Scala, methods can be converted to functions, which are objects. For instance:
val s1 = "one"
val s2 = "two"

def stringop(s: String, f: String => String) = if (s.length > 0) f(s) else s

stringop(s1, _.toUpperCase)
stringop(s2, _.toLowerCase)
Now we have abstracted the idea of performing some string transformation on nonempty strings.
And we can make lists of the operations and such and pass them around, if that's what we need to do.
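For instance, building on the sketch above:

// Operations are ordinary values, so we can collect them and apply each in turn.
val ops: List[String => String] = List(_.toUpperCase, _.toLowerCase, _.reverse)
val results = ops.map(f => stringop("one", f))
// List("ONE", "one", "eno")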
There are other less essential cases (object vs. class, primitive vs. not, value classes, etc.), but the big one is collapsing the distinction between method and object so that passing around and abstracting over functionality is just as easy as passing around and abstracting over data.
The advantage is that you don't have different operators following different rules within your language. For example, in Java, operations involving objects use the dot-name technique of calling code (static methods also use the dot-name technique, though sometimes the this object or the enclosing class is inferred), while built-in items (not objects) use a different mechanism: built-in operators.
Number one = Integer.valueOf(1);
Number two = Integer.valueOf(2);
Number three = one.plus(two); // if only such methods existed

int one = 1;
int two = 2;
int three = one + two;
The main difference is that the dot-name technique is subject to polymorphism, method overloading, method hiding, and all the good stuff that you can do with Java objects. The + technique is predefined and completely inflexible.
Scala circumvents the inflexibility of + by treating it as a dot-name operator and defining a strong one-to-one mapping between such operators and object methods. Hence, in Scala, "everything is an object" really means everything is an object, so the operation
5 + 7
results in two objects being created (a 5 object and a 7 object), the + method of the 5 object being called with the parameter 7, and a 12 object being returned as the value of the 5 + 7 operation.
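You can check this desugaring yourself:

val a = 5 + 7     // infix operator syntax
val b = (5).+(7)  // the same call, written out: + is just a method on Int
assert(a == b)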
This "everything is an object" has a lot of benefits in a functional programming environment; for example, blocks of code are now objects too, making it possible to pass blocks of code back and forth (without names) as parameters, yet still be bound to strict type checking (the block of code only returns Long, or a subclass of String, or whatever).
When it boils down to it, this makes some kinds of solutions very easy to implement, and often the inefficiencies are mitigated by not needing the "move into primitives, manipulate, move out of primitives" marshalling code.
One specific advantage that comes to mind (since you asked for examples) is that what in Java are primitive types (int, boolean, ...) are in Scala objects to which you can add functionality via implicit conversions. For example, if you want to add a toRoman method to ints, you could write an implicit class like:
implicit class RomanInt(i: Int) {
  def toRoman: String = ??? // some algorithm to convert i to a Roman representation
}
Then, you could call this method on any Int literal:
val romanFive = 5.toRoman // V
This way you can "pimp" basic types to adapt them to your needs.
In addition to the points made by others, I always emphasize that the uniform treatment of all values in Scala is in part an illusion. For the most part it is a very welcome illusion. And Scala is very smart to use real JVM primitives as much as possible and to perform automatic transformations (usually referred to as boxing and unboxing) only as much as necessary.
However, if the dynamic pattern of application of automatic boxing and unboxing is very high, there can be undesirable costs (both memory and CPU) associated with it. This can be partially mitigated with the use of specialization, which creates special versions of generic classes when particular type parameters are of (programmer-specified) primitive types. This avoids boxing and unboxing but comes at the cost of more .class files in your running application.
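A minimal sketch of what specialization looks like:

// Asks the compiler to also emit Int- and Double-specialized versions of Box,
// so that new Box(42) stores a raw int rather than a boxed java.lang.Integer.
class Box[@specialized(Int, Double) T](val value: T)

val b = new Box(42) // picks the Int-specialized variant; no boxing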
Not everything is an object in Scala, though more things are objects in Scala than their analogues in Java.
The advantage of objects is that they're bags of state which also have some behavior coupled with them. With the addition of polymorphism, objects give you ways of changing the implicit behavior and state. Enough with the poetry, let's go into some examples.
The if statement is not an object, in either Scala or Java. If it were, you would be able to subclass it, inject another dependency in its place, and use it to do things like logging to a file any time your code used an if statement. Wouldn't that be magical? In some cases it would help you debug; in others it would make your hair turn white before you found a bug caused by someone overriding the behavior of if.
Visiting an objectless, statementful world: imagine your favorite OOP language and the standard library it provides. There are plenty of classes there, right? They offer ways for customization, right? They take parameters that are other objects, they create other objects. You can customize all of these; you have polymorphism. Now imagine that the entire standard library were simply keywords. You wouldn't be able to customize nearly as much, because you can't override keywords. You'd be stuck with whatever cases the language designers decided to implement, and you'd be helpless to customize anything. Such languages exist, and you know them well: the SQL-like languages. You can barely create functions there, and in order to customize the behavior of the SELECT statement, new versions of the language had to appear that included the most-desired features. This is an extreme world, where you can only program by asking the language designers for new features (which you might not get, because someone more important requires a feature incompatible with what you want).
In conclusion, NOT everything is an object in Scala: classes, expressions, keywords, and packages surely aren't. More things are, however, such as functions.
What's IMHO a nice rule of thumb: more objects equals more flexibility.
P.S. In Python, for example, even more things are objects (such as classes themselves, and the analogous concepts to packages, namely Python modules and packages). There, black magic is easier to do, which brings both good and bad consequences.

Scala immutable map, when to go mutable?

My present use case is pretty trivial; either a mutable or an immutable Map will do the trick.
I have a method that takes an immutable Map, which then calls a 3rd-party API method that also takes an immutable Map:
def doFoo(foo: String = "default", params: Map[String, Any] = Map()): Unit = {
  val newMap =
    if (someCondition) params + ("foo" -> foo) else params
  api.doSomething(newMap)
}
The Map in question will generally be quite small; at most there might be an embedded List of case class instances, a few thousand entries max. So, again, assume little impact from going immutable in this case (i.e. having essentially two instances of the Map via the newMap val copy).
Still, it nags me a bit, copying the map just to get a new map with a few k -> v entries tacked on.
I could go mutable and params.put("bar", bar), etc. for the entries I want to tack on, and then params.toMap to convert to immutable for the API call; that is an option. But then I have to import and pass around mutable maps, which is a bit of a hassle compared to going with Scala's default immutable Map.
So, what are the general guidelines for when it is justified/good practice to use mutable Map over immutable Maps?
Thanks
EDIT
So, it appears that an add operation on an immutable map takes near-constant time, confirming @dhg's and @Nicolas's assertions that a full copy is not made, which solves the problem for the concrete case presented.
Depending on the immutable Map implementation, adding a few entries may not actually copy the entire original Map. This is one of the advantages to the immutable data structure approach: Scala will try to get away with copying as little as possible.
This kind of behavior is easiest to see with a List. If I have val a = List(1,2,3), then that list is stored in memory. However, if I prepend an additional element like val b = 0 :: a, I do get a new 4-element List back, but Scala did not copy the original list a. Instead, we just created one new link, called it b, and gave it a pointer to the existing List a.
You can envision strategies like this for other kinds of collections as well. For example, if I add one element to a Map, the collection could simply wrap the existing map, falling back to it when needed, all while providing an API as if it were a single Map.
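You can verify the List case directly:

val a = List(1, 2, 3)
val b = 0 :: a
// b is List(0, 1, 2, 3), yet no copy of a was made:
assert(b.tail eq a) // reference equality: b shares a as its tail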
Using a mutable object is not bad in itself. It becomes bad in a functional programming environment, where you try to avoid side effects by keeping functions pure and objects immutable.
However, if you create a mutable object inside a function and modify this object, the function is still pure if you don't release a reference to this object outside the function. It is acceptable to have code like:
def buildVector(x: Double, y: Double, z: Double): Vector[Double] = {
  val ary = Array.ofDim[Double](3)
  ary(0) = x
  ary(1) = y
  ary(2) = z
  ary.toVector
}
Now, I think this approach is useful/recommended in two cases: (1) Performance, if creating and modifying an immutable object is a bottleneck of your whole application; (2) Code readability, because sometimes it's easier to modify a complex object in place (rather than resorting to lenses, zippers, etc.)
In addition to dhg's answer, you can take a look at the performance characteristics of the Scala collections. If an add/remove operation doesn't take linear time, the implementation must be doing something other than simply copying the entire structure. (Note that the converse is not true: an operation taking linear time doesn't mean you're copying the whole structure.)
I like to use collection.Map as the declared parameter type (for input or return values) rather than the mutable or immutable variants. collection.Map is a read-only interface that works for both kinds of implementation. A consumer method using a map really doesn't need to know about the map's implementation or how it was constructed. (It's really none of its business anyway.)
If you hide a map's particular construction (be it mutable or immutable) from the consumers who use it, then you're still getting an essentially read-only map downstream. And by using collection.Map as the interface you completely remove the ".toMap" inefficiency that you would have with consumers written against immutable.Map. Converting a completely constructed map into another one simply to comply with an interface the first one doesn't support is, when you think about it, absolutely unnecessary overhead.
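Concretely, a small sketch:

// Accepts either kind of map; no .toMap conversion needed at the call site.
def describe(m: scala.collection.Map[String, Int]): String =
  m.map { case (k, v) => s"$k=$v" }.mkString(", ")

describe(scala.collection.immutable.Map("a" -> 1))
describe(scala.collection.mutable.Map("b" -> 2))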
I suspect that a few years from now we'll look back at the three separate sets of interfaces (mutable maps, immutable maps, and collection maps) and realize that 99% of the time only two are really needed (mutable and collection), and that using the (unfortunately) default immutable Map interface really adds a lot of unnecessary overhead to the "Scalable Language".

Scala Case Class Map Expansion

In Groovy one can do:

class Foo {
  Integer a, b
}

Map map = [a: 1, b: 2]
def foo = new Foo(map) // map expanded, object created
I understand that Scala is not, in any sense of the word, Groovy, but I am wondering whether map expansion in this context is supported.
Simplistically, I tried and failed with:
case class Foo(a: Int, b: Int)
val map = Map("a" -> 1, "b" -> 2)
Foo(map: _*) // no dice, always applied to first property
A related thread that shows possible solutions to the problem.
Now, from what I've been able to dig up, as of Scala 2.9.1 at least, reflection with regard to case classes is basically a no-op. The net effect, then, appears to be that one is forced into some form of manual object creation, which, given the power of Scala, is somewhat ironic.
I should mention that the use case involves the servlet request parameter map. Specifically, using Lift, Play, Spray, Scalatra, etc., I would like to take the sanitized params map (filtered via the routing layer) and bind it to a target case class instance without manually creating the object or specifying its types. This would require "reliable" reflection and implicits like str2Date to handle type-conversion errors.
Perhaps in 2.10, with the new reflection library, implementing the above will be cake. I'm only two months into Scala, so I'm just scratching the surface; I do not see any straightforward way to pull this off right now (for seasoned Scala developers, it may be doable).
Well, the good news is that Scala's Product interface, implemented by all case classes, actually doesn't make this very hard to do. I'm the author of a Scala serialization library called Salat that supplies some utilities for using pickled Scala signatures to get typed field information
https://github.com/novus/salat - check out some of the utilities in the salat-util package.
Actually, I think this is something that Salat should do - what a good idea.
Re: D.C. Sobral's point about the impossibility of verifying params at compile time: point taken, but in practice this should work at runtime, just like deserializing anything else with no guarantees about structure, such as JSON or a Mongo DBObject. Also, Salat has utilities to leverage default args where supplied.
This is not possible, because it is impossible to verify at compile time that all parameters were passed in that map.
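For illustration, here is a minimal hand-rolled binder (a sketch assuming Scala 2.12+ for Either's right bias; this is not Salat's API) that makes the runtime nature of the check explicit:

import scala.util.Try

case class Foo(a: Int, b: Int)

// Every failure mode here (missing key, unparseable value) can only be
// discovered at runtime, which is exactly the objection above.
def bindFoo(params: Map[String, String]): Either[String, Foo] =
  for {
    a  <- params.get("a").toRight("missing parameter: a")
    b  <- params.get("b").toRight("missing parameter: b")
    ai <- Try(a.toInt).toEither.left.map(_ => s"not an Int: $a")
    bi <- Try(b.toInt).toEither.left.map(_ => s"not an Int: $b")
  } yield Foo(ai, bi)

bindFoo(Map("a" -> "1", "b" -> "2")) // Right(Foo(1,2))
bindFoo(Map("a" -> "1"))             // Left(missing parameter: b)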