How do I make these lines of code more Scala-ish (shorter)? I still get a Java feeling from them, which I want to get away from. Thanks in advance!
import scala.collection.mutable
val outstandingUserIds: mutable.LinkedHashSet[String] = mutable.LinkedHashSet[String]()
val tweetJson = JacksMapper.readValue[Map[String, AnyRef]](body)
val userObj = tweetJson.get("user")
tweetJson.get("user").foreach(userObj => {
userObj.asInstanceOf[Map[String, AnyRef]].get("id_str").foreach(idStrObj => {
if (outstandingUserIds.exists(outstandingIdStr => outstandingIdStr.equals(idStrObj))) {
outstandingUserIds.remove(idStrObj.asInstanceOf[String])
}
})
})
One thing you want to do in Scala is take advantage of type inference. That way, you don't need to repeat yourself on the LHS:
val outstandingUserIds = mutable.LinkedHashSet[String]()
You also don't need to wrap the closure body in braces inside the parentheses after userObj =>. Instead, pass a brace-delimited block to foreach, which can hold multiple statements:
tweetJson.get("user").foreach { userObj =>
}
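In fact, you could use the anonymous variable '_' (keeping the cast to a Map from your original code, since the value is typed as AnyRef) and say:
tweetJson.get("user").foreach {
_.asInstanceOf[Map[String, AnyRef]].get("id_str").foreach ...
}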
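Scala encourages the use of immutable collections. One way to simplify the above even further would be to use a transformation such as collect or filterNot (instead of exists + remove), which returns a new collection with only the elements you want. Note also that the exists check is redundant: remove on a mutable set simply does nothing when the element is absent.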
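As a rough sketch of where these suggestions could lead, the nested foreach calls can be written as a for comprehension over the two Option values. The object name, the removeAuthor helper and the plain Map standing in for the parsed JSON are assumptions made for illustration; JacksMapper is not used here.

```scala
import scala.collection.mutable

object OutstandingIdsSketch {
  // Hypothetical helper: removes the tweet author's id from the set, if present.
  def removeAuthor(tweet: Map[String, Any], ids: mutable.LinkedHashSet[String]): Unit =
    for {
      // collect keeps the value only when it has the expected shape
      user <- tweet.get("user").collect { case m: Map[_, _] => m.asInstanceOf[Map[String, Any]] }
      id   <- user.get("id_str").collect { case s: String => s }
    } ids -= id // no exists check needed: removing an absent element is a no-op

  def main(args: Array[String]): Unit = {
    val ids = mutable.LinkedHashSet("1", "2", "3")
    // Stand-in for the result of parsing the JSON body
    val tweet = Map[String, Any]("user" -> Map[String, Any]("id_str" -> "2"))
    removeAuthor(tweet, ids)
    println(ids.toList)
  }
}
```

The pattern matches replace the unchecked casts on the outer values, so a tweet without a "user" object or with a non-string "id_str" is silently skipped rather than throwing.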
I am reading the source code of Spark, and I am not sure if I understand this line readFunction: (PartitionedFile) => InputPartitionReader[T].
Questions:
So we can pass a method readFunction as a parameter to a case class?
Is there a term for this?
Are there any special motivations for this syntax?
case class FileInputPartition[T](
file: FilePartition,
readFunction: (PartitionedFile) => InputPartitionReader[T], // <-- This line
ignoreCorruptFiles: Boolean = false,
ignoreMissingFiles: Boolean = false)
extends InputPartition[T] {
override def createPartitionReader(): InputPartitionReader[T] = {
val taskContext = TaskContext.get()
val iter = file.files.iterator.map(f => PartitionedFileReader(f,
readFunction(f)))
FileInputPartitionReader(taskContext, iter, ignoreCorruptFiles,
ignoreMissingFiles)
}
override def preferredLocations(): Array[String] = {
FilePartitionUtil.getPreferredLocations(file)
}
}
In Scala, functions are first-class objects. This means that:
the language supports passing functions as arguments to other functions, returning them as the values from other functions, and assigning them to variables or storing them in data structures
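In this case the constructor takes a Function1[PartitionedFile, InputPartitionReader[T]]; the arrow syntax (PartitionedFile) => InputPartitionReader[T] is shorthand for that type. A function that accepts or returns another function is usually called a higher-order function. There is nothing particularly unusual here: functions as arguments are ubiquitous in Scala, the most prominent example being the collections API.
And in fact this is how the function is used here, to map over a collection: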
file.files.iterator.map(f => PartitionedFileReader(f, readFunction(f)))
This usage pretty much explains the motivation.
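To make the idea concrete, here is a minimal sketch of a case class that stores a function as a field and later applies it with map. Source, Reader and Partition are hypothetical stand-ins invented for this example; they are not Spark's classes.

```scala
object FunctionParamSketch {
  // Hypothetical stand-ins; these are not Spark's classes.
  final case class Source(path: String)
  final case class Reader(description: String)

  // `read` has the function type Source => Reader, i.e. Function1[Source, Reader].
  final case class Partition(files: List[Source], read: Source => Reader) {
    // The stored function is applied to every file, mirroring the map call above.
    def readers: List[Reader] = files.map(f => read(f))
  }

  def main(args: Array[String]): Unit = {
    val partition = Partition(
      List(Source("a.csv"), Source("b.csv")),
      s => Reader("reader for " + s.path) // the function passed as an argument
    )
    partition.readers.foreach(println)
  }
}
```

The caller decides how a Source becomes a Reader, while Partition only decides when that happens, which is the usual motivation for passing a function instead of a value.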
I have a Future of a tuple like this: Future[(WriteResult, MyObject)] mytuplefuture. I'd like to map it and do something with MyObject, so I am doing this:
mytuplefuture.map((wr,obj)=>{ //do sth});
but my Eclipse Scala IDE does not allow this and recommends that I write:
mytuplefuture.map{ case(wr,obj) => { //do sth }}
What is the difference between those two?
I am used to doing the first one; I did not know about the second one until I tried returning a tuple wrapped in a Future.
myfuture.map((obj) => { // do sth with obj })
That was clear: I am mapping the content of the Future and doing something with it, which returns another Future because the original myfuture only contains something (obj) in the future.
Would anyone explain, please?
The difference is this:
map is a higher-order function (HOF) that takes a function as its argument. This function - let's call it the mapping function for convenience - itself takes a single argument, which is the value of the completed Future. In this particular case, this value happens to be a tuple. Your first attempt assumed that the tuple could be broken open into two arguments, which would then be accepted by the mapping function - but that's not going to happen, hence the error.
It might seem that you could define the mapping function like this (note the extra parentheses around the arguments):
mytuplefuture.map(((wr,obj)) => /* do sth */)
however this is not supported by the Scala 2 compiler. (Scala 3 added parameter untupling, which lets the original form, map((wr, obj) => ...), compile when the argument is a tuple.)
So, the alternative is to write the mapping function as a pattern-matching anonymous function using a case clause. The following:
mytuplefuture.map {
case (wr,obj) => // do sth
}
is actually a kind of shorthand for:
mytuplefuture.map {
tuple: (WriteResult, MyObject) => tuple match {
case (wr,obj) => // do sth
}
}
In fact, this shorthand is generally useful for situations other than just breaking open tuples. For instance:
myList.filter {
case A => true
case _ => false
}
is short for:
myList.filter {
x => x match {
case A => true
case _ => false
}
}
So, let's say you wish to look at just the MyObject member of the tuple. You would define this as follows:
val myfuture = mytuplefuture.map {
case (_, obj) => obj
}
or, alternatively, being explicit with the tuple argument:
val myfuture = mytuplefuture.map(tuple => tuple._2)
which can in turn be simplified to just:
val myfuture = mytuplefuture.map(_._2)
where the first underscore is shorthand for the first argument to the mapping function. (The second underscore, as in _2, is part of the name for the second value in the tuple, and is not shorthand - this is where Scala can get a little confusing.)
All of the previous three examples return a Future[MyObject].
If you then apply map to this value, the single mapping function argument in this case will be your MyObject instance. Hence you can now write:
myfuture.map(obj => /* Do something with obj */)
As to the remainder of your question, the mapping function as applied to a Future's value does indeed apply to the result of the original future, since it can't be executed until the first future has completed. Therefore, map returns a future that completes (successfully or otherwise) when the first future completes.
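The variants discussed above can be tried out with a small self-contained sketch. WriteResult and MyObject are defined here as hypothetical stand-ins for the asker's types, and demo blocks with Await only so that the result can be inspected.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TupleFutureSketch {
  // Hypothetical stand-ins for the asker's WriteResult and MyObject.
  final case class WriteResult(ok: Boolean)
  final case class MyObject(name: String)

  // Pattern-matching anonymous function: destructures the single tuple argument.
  def viaCase(f: Future[(WriteResult, MyObject)]): Future[MyObject] =
    f.map { case (_, obj) => obj }

  // Equivalent form using the tuple accessor.
  def viaAccessor(f: Future[(WriteResult, MyObject)]): Future[MyObject] =
    f.map(_._2)

  // Builds a tuple future, extracts MyObject, then maps over the new future.
  def demo(): String = {
    val mytuplefuture = Future.successful((WriteResult(ok = true), MyObject("demo")))
    val myfuture = viaCase(mytuplefuture)
    val upper = myfuture.map(obj => obj.name.toUpperCase)
    Await.result(upper, 5.seconds) // blocking only for the sake of the example
  }

  def main(args: Array[String]): Unit = println(demo())
}
```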
UPDATED: Clarified what the argument to map actually is. Thanks to @AlexeyRomanov for putting me right, and to @RhysBradbury for pointing out my initial error. ;-)
The difference is that case indicates decomposition (or extraction) of the object (invoking unapply, which you can implement yourself).
mytuplefuture.map(obj => obj._2): in this case obj is your tuple, so you can access its elements with ._1 and ._2.
mytuplefuture.map { case (wr, obj) => { /* do sth */ } }: this decomposes the tuple into its elements.
You can get a better feel for the difference by using this approach on a case class, which comes with a default unapply implementation:
case class MyClass(int: Int)
List(MyClass(1)) map { myclass => myclass.int } // accessing the elements
List(MyClass(1)) map { case MyClass(i) => i + 1 } // decomposition
In your case I'd write
mytuplefuture.map(_._2).map(/* do something */)
P.S.
You can do the same kind of extraction with many other classes (Option, for example).
It also allows you to write something like:
val (a, b) = tuple
val MyClass(x) = myclass
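Since unapply can be implemented by hand, a minimal sketch of a custom extractor may help. The Email object and the describe helper are hypothetical names chosen for this example.

```scala
object ExtractorSketch {
  // Hypothetical extractor: splits "user@host" into its two parts.
  object Email {
    def unapply(s: String): Option[(String, String)] =
      s.split("@") match {
        case Array(user, host) => Some((user, host)) // exactly one '@'
        case _                 => None               // anything else does not match
      }
  }

  def describe(s: String): String = s match {
    case Email(user, host) => s"user=$user host=$host" // calls Email.unapply(s)
    case _                 => "not an email"
  }

  def main(args: Array[String]): Unit = {
    println(describe("ann@example.org"))
    println(describe("nope"))
  }
}
```

A case clause such as case Email(user, host) compiles to a call to Email.unapply, which is the same mechanism that case (wr, obj) relies on for tuples.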
I encountered the following weird issue when having option in nested collections:
val works: Array[Option[Int]] = Array(1)
.map { t => Some(t)}
val fails: Array[Array[Option[Int]]] = Array(Array(1))
.map { ts => ts.map { Some(_)} }
// error: type mismatch; found : Array[Array[Some[Int]]] required: Array[Array[Option[Int]]]
val worksButUgly: Array[Array[Option[Int]]] = Array(Array(1))
.map { ts => ts.map { case t => (Some(t).asInstanceOf[Option[Int]])}}
I imagine it may be a problem with some type erasure along the way, but is it the expected behaviour in Scala? Does anyone know what happens exactly?
Arrays in Scala are invariant. This prevents some problems that arrays have in e.g. Java, where you can create an array of something, proclaim it to be an array of superclass of something, and then put another subclass in. For example, saying that an array of apples is an array of fruit and then putting bananas in. The worst part about this is that it fails at runtime, not at compile time.
For this reason Scala decided that arrays should be invariant. This means that Array[Apple] is not a subtype of Array[Fruit]. (Note that unlike arrays, immutable collections are most often covariant, e.g. List, because immutability prevents us from putting any bananas inside later on.)
So yes: Some is a subclass of Option, but Array[Some[Int]] is not a subtype of Array[Option[Int]]. These will work:
val foo1: Array[Array[Option[Int]]] = Array(Array(1))
.map { ts => ts.map { Option(_)} }
val foo2: Array[List[Option[Int]]] = Array(List(1))
.map { ts => ts.map { Some(_)} }
Use Some(t): Option[Int] instead of Some(t).asInstanceOf[Option[Int]]. It's both shorter and safer: it'll fail to compile if the types don't match.
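A short sketch of the type ascription approach, next to the Option(_) variant, assuming nothing beyond the standard library (the object and value names are made up for the example):

```scala
object AscriptionSketch {
  // Ascribing the element type makes the inner map build an Array[Option[Int]].
  val ascribed: Array[Array[Option[Int]]] =
    Array(Array(1, 2)).map(ts => ts.map(t => Some(t): Option[Int]))

  // Option.apply already has the static result type Option[Int].
  val viaOption: Array[Array[Option[Int]]] =
    Array(Array(1, 2)).map(ts => ts.map(Option(_)))

  def main(args: Array[String]): Unit = {
    println(ascribed.map(_.toList).toList)
    println(viaOption.map(_.toList).toList)
  }
}
```

Unlike asInstanceOf, the ascription is checked by the compiler, so writing something like Some("x"): Option[Int] is rejected at compile time instead of surfacing later.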
In the code below, does it matter where I put the query object (within the loop, or outside)? For readability, I would prefer this 1st version:
class MyClass {
db withSession {
names.foreach { name =>
val query = MyEntity.createFinderBy(_.name) // <----------
query.list(text).foreach { res =>
doSomething
}
}
}
}
But isn't this 2nd version better?
class MyClass {
db withSession {
val query = MyEntity.createFinderBy(_.name) // <----------
names.foreach { name =>
query.list(text).foreach { res =>
doSomething
}
}
}
}
Or even?
class MyClass {
val query = MyEntity.createFinderBy(_.name) // <----------
db withSession {
names.foreach { name =>
query.list(text).foreach { res =>
doSomething
}
}
}
}
In Java, I would put it in a static final field at the top of the class...
No, the compiler does not optimize this for you, so the placement does matter: in the first example you specifically told it to call createFinderBy (and also list) for each element. There's no guarantee that those methods will return the same value each time (i.e. are referentially transparent), so the values can't be memoized, nor can the code be rewritten into the second or third example.
It would be a nice feature if you could annotate methods as referentially transparent, or if the compiler could work it out, but at the moment it doesn't.
As an aside, you could change the createFinderBy method so that it stores values in a mutable.Map cache and returns values using the getOrElseUpdate method, which returns a cached value if your query is already in the map, or else calculates the query and caches it.
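A minimal sketch of that caching idea follows. The cache key, the buildFinder helper and the String standing in for a query object are assumptions for illustration; the real createFinderBy works with a different type.

```scala
import scala.collection.mutable

object FinderCacheSketch {
  // Counts how often the (hypothetical) expensive construction really runs.
  var builds = 0

  private val cache = mutable.Map.empty[String, String]

  // Hypothetical stand-in for building a query object.
  private def buildFinder(column: String): String = {
    builds += 1
    s"SELECT * FROM my_entity WHERE $column = ?"
  }

  // The second argument is by-name, so buildFinder runs only on a cache miss.
  def finderBy(column: String): String =
    cache.getOrElseUpdate(column, buildFinder(column))

  def main(args: Array[String]): Unit = {
    finderBy("name")
    finderBy("name") // served from the cache
    println(builds)
  }
}
```

Note that a plain mutable.Map is not thread-safe, so this sketch only suits single-threaded use.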
scalac does not (and cannot) optimize your code because it has no effect system. Without one, it is almost impossible for scalac to know whether an optimization would break code. For example, if MyEntity.createFinderBy(_.name) is not pure (maybe it increments a counter that tracks the number of accesses), it makes a difference whether it is executed once or on each iteration.
In such cases I suggest changing the position of the function literal:
scala> 1 to 3 foreach {x => println("x"); println(x)}
x
1
x
2
x
3
scala> 1 to 3 foreach {println("x"); x => println(x)}
x
1
2
3
In the second example the block is evaluated once: it runs println("x") and then returns the function x => println(x), which is what gets passed to foreach. In the first example the whole block is a single function, so both statements run for every element.
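The same trick can be used to hoist an expensive setup step out of the per-element function. In this sketch makeLabeller is a hypothetical stand-in for something like createFinderBy, and the setups counter exists only to show how often it runs.

```scala
object HoistSketch {
  // Counts how often the hypothetical "expensive" setup step runs.
  var setups = 0

  // Hypothetical stand-in for an expensive factory: builds a function.
  def makeLabeller(): Int => String = {
    setups += 1
    n => s"item $n"
  }

  // Setup inside the function literal: runs once per element.
  def perElement(): Int = {
    setups = 0
    (1 to 3).foreach { x => makeLabeller()(x) }
    setups
  }

  // Setup in the block before the function literal: runs once in total.
  def hoisted(): Int = {
    setups = 0
    (1 to 3).foreach { val label = makeLabeller(); x => label(x) }
    setups
  }

  def main(args: Array[String]): Unit = {
    println(perElement())
    println(hoisted())
  }
}
```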
I have to translate the following code from Java to Scala:
EDIT: added if-statements in the source (forgot them in first version)
for (Iterator<ExceptionQueuedEvent> i = getUnhandledExceptionQueuedEvents().iterator(); i.hasNext();)
{
if (someCondition) {
ExceptionQueuedEvent event = i.next();
try {
//do something
} finally {
i.remove();
}
}
}
I'm using the JavaConversions library to wrap the Iterable. But as I'm not using the original Iterator, I don't know how to remove the current element correctly from the collection the same way as I did in Java:
import scala.collection.JavaConversions._
(...)
for (event <- events) {
if (someCondition) {
try {
// do something
} finally {
// how can i remove the current event from events?
// the underlying type of events is java.lang.Iterable[javax.faces.event.ExceptionQueuedEvent]
}
}
}
Can someone help me?
I guess it's easy, but I'm still kind of new to Scala and don't understand what's going on when Scala wraps something from Java.
When you use JavaConversions to wrap Java collections, you just get an object that adapts the Java collection to the appropriate Scala trait (interface). In Java, you might see the same thing: for example, you could imagine an adapter class that implements the Iterator interface and wraps an Enumeration. The only difference is that in Scala you can add the 'implicit' modifier to a declaration to tell the compiler to automatically insert calls to that method if it will make the code compile.
As for your specific use case, Iterators in Scala intentionally omit the remove() method for a number of reasons. The conversion from scala.collection.Iterator to java.util.Iterator promises to unwrap a j.u.Iterator if possible, so I suppose you could rely on that to access the remove() method. However, if you are iterating over the entire collection and removing everything, why not just do your work in a foreach loop and then clear the collection or replace it with an empty one after you finish?
Does this suggest how to accomplish what you want?
scala> val l1 = List("How", "do", "I", "love", "you")
l1: List[java.lang.String] = List(How, do, I, love, you)
scala> val evens = for ( w <- l1; if w.length % 2 == 0 ) yield { printf("even: %s%n", w); w }
even: do
even: love
evens: List[java.lang.String] = List(do, love)
Basically, you get your Scala Iterable or Iterator using the appropriate implicit conversion from JavaConversions, use a for comprehension that includes the condition on which elements you want to process and collect the results. Use exception handling as necessary.
Thanks for all the help. So I had to do without JavaConversions. But it still looks nice and scalafied ;)
This is my final code, which seems to work:
val eventsIterator = events.iterator()
while (eventsIterator.hasNext) {
val event = eventsIterator.next() // advance first; remove() needs a current element
if (someCondition) {
try {
// do something
} finally {
eventsIterator.remove()
}
}
}
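For reference, here is a self-contained sketch of the same pattern that can be run as is. It works directly on the java.util.Iterator, uses Integer elements in place of ExceptionQueuedEvent, and treats "even number" as a hypothetical stand-in for someCondition.

```scala
object IteratorRemoveSketch {
  // Handles and removes every element matching the (hypothetical) condition.
  def drainEven(events: java.lang.Iterable[Integer]): List[Int] = {
    val handled = List.newBuilder[Int]
    val it = events.iterator()
    while (it.hasNext) {
      val event: Int = it.next() // always advance, so the loop terminates
      if (event % 2 == 0) {
        try {
          handled += event // "do something" with the event
        } finally {
          it.remove() // removes the element last returned by next()
        }
      }
    }
    handled.result()
  }

  def main(args: Array[String]): Unit = {
    val events = new java.util.ArrayList[Integer]()
    (1 to 4).foreach(i => events.add(i))
    println(drainEven(events))
    println(events)
  }
}
```

Calling next() on every pass, rather than only when the condition holds, keeps the loop from spinning forever on an element that does not match.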