Scala: initialize collection from Java iterable

In scala, how can I initialize a scala collection from a Java iterable, in a clean idiomatic way?
Here's a somewhat lame, less functional way of doing it:
var collection = Seq[MyClass]()
while (iterator.hasNext) {
  val asArray: Array[String] = iterator.next.toArray
  val val2 = asArray(2)
  val val3 = asArray(3)
  collection = collection :+ new MyClass(val2, val3)
}
How can initialization of a collection from a Java iterable take place more idiomatically?

import scala.collection.JavaConverters._

val collection = iterator.asScala.map { x =>
  val asArray = x.toArray
  new MyClass(asArray(2), asArray(3))
}.toIndexedSeq
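For reference, here is a self-contained version of that sketch. The element type of the iterator is not shown in the question, so the assumption here is a java.util.Iterator over Java lists of strings, and MyClass is a stand-in:
import java.util
import scala.collection.JavaConverters._

case class MyClass(val2: String, val3: String)

// hypothetical input standing in for whatever the Java API returns
val javaList: util.List[util.List[String]] = util.Arrays.asList(
  util.Arrays.asList("a", "b", "c", "d"),
  util.Arrays.asList("e", "f", "g", "h")
)
val iterator: util.Iterator[util.List[String]] = javaList.iterator()

// asScala wraps the Java iterator; map extracts columns 2 and 3
val collection = iterator.asScala.map { x =>
  val asArray = x.asScala.toArray
  MyClass(asArray(2), asArray(3))
}.toIndexedSeq
// collection == Vector(MyClass("c", "d"), MyClass("g", "h"))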

Scala can convert to and from Java collections seamlessly, provided you have imported the conversion helpers as below. (Note that the implicit scala.collection.JavaConversions shown here has since been deprecated in favour of the explicit JavaConverters.)
import scala.collection.JavaConversions._

val jl = new java.util.ArrayList[String]()
jl.add("Hello")
jl.add("There")
val collection = jl.map { x => new MyClass(x(2).toString, x(3).toString) }.toList
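On Scala 2.13 and later, JavaConversions is gone and the same explicit conversion lives in scala.jdk.CollectionConverters; a minimal sketch:
import scala.jdk.CollectionConverters._

val jl = new java.util.ArrayList[String]()
jl.add("Hello")
jl.add("There")

// .asScala wraps the Java list in a mutable Buffer view without copying
val scalaList: List[String] = jl.asScala.toList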

Related

Scala function does not return a value

I think I understand the rules of implicit returns but I can't figure out why splithead is not being set. This code is run via
val m = new TaxiModel(sc, file)
and then I expect
m.splithead
to give me an array of strings. Note that head is an array of strings.
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

class TaxiModel(sc: SparkContext, dat: String) {
  val rawData = sc.textFile(dat)
  val head = rawData.take(10)
  val splithead = head.slice(1, 11).foreach(splitData)

  def splitData(dat: String): Array[String] = {
    val splits = dat.split("\",\"")
    val split0 = splits(0).substring(1, splits(0).length)
    val split8 = splits(8).substring(0, splits(8).length - 1)
    Array(split0).union(splits.slice(1, 8)).union(Array(split8))
  }
}
foreach just evaluates an expression for its side effects and does not collect any results while iterating; it returns Unit. You probably need map or flatMap (see docs here)
head.slice(1,11).map(splitData) // gives you Array[Array[String]]
head.slice(1,11).flatMap(splitData) // gives you Array[String]
Consider also a for comprehension (which desugars in this case into map),
for (s <- head.slice(1,11)) yield splitData(s)
Note also that Scala strings are equipped with ordered collection methods, thus
splits(0).substring(1, splits(0).length)
proves equivalent to any of the following
splits(0).drop(1)
splits(0).tail
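A quick REPL check of those equivalences (sample value chosen for illustration):
scala> val s = "\"hello"
s: String = "hello

scala> s.substring(1, s.length)
res0: String = hello

scala> s.drop(1)
res1: String = hello

scala> s.tail
res2: String = hello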

Flattening nested java lists in Scala

I am working in Scala with Java libraries. One of these libraries returns a list of lists. I want to flatten the list.
Example:
import java.util
import scala.collection.JavaConverters._

var parentList: util.List[util.List[Int]] = null
parentList = new util.ArrayList[util.List[Int]]
parentList.asScala.flatten // error
I have used the asScala converter but I'm still getting an error.
You need to call .asScala on every inner list:
scala> parentList.asScala.map(_.asScala)
res0: scala.collection.mutable.Buffer[scala.collection.mutable.Buffer[Int]] = ArrayBuffer()
scala> parentList.asScala.map(_.asScala).flatten
res1: scala.collection.mutable.Buffer[Int] = ArrayBuffer()
Note that calling .map and then .flatten can be done in one step using .flatMap:
scala> parentList.asScala.flatMap(_.asScala)
res2: scala.collection.mutable.Buffer[Int] = ArrayBuffer()
You also need to convert the inner List[Int]:
parentList.asScala.flatMap(_.asScala)
Try it like this:
import scala.jdk.CollectionConverters._

parentList.asScala.flatMap(_.asScala)
This will do the trick.
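Putting it together, a self-contained sketch on Scala 2.13 (the sample values are made up):
import java.util
import scala.jdk.CollectionConverters._

val parentList: util.List[util.List[Int]] = util.Arrays.asList(
  util.Arrays.asList(1, 2),
  util.Arrays.asList(3, 4)
)

// convert the outer list, then each inner list, flattening as we go
val flat = parentList.asScala.flatMap(_.asScala).toSeq
// flat == Seq(1, 2, 3, 4)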

sortByKey in Spark

New to Spark and Scala. Trying to sort a word counting example. My code is based on this simple example.
I want to sort the results alphabetically by key. If I add the key sort to an RDD:
val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()
then I get a compile error:
error: No implicit view available from java.io.Serializable => Ordered[java.io.Serializable].
[INFO] val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()
I don't know what the lack of an implicit view means. Can someone tell me how to fix it? I am running the Cloudera 5 Quickstart VM. I think it bundles Spark version 0.9.
Source of the Scala job
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SparkWordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
    val files = sc.textFile(args(0)).map(_.split(","))
    def f(x: Array[String]) = {
      if (x.length > 3)
        x(3)
      else
        Array("NO NAME")
    }
    val names = files.map(f)
    val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()
    System.out.println(wordCounts.collect().mkString("\n"))
  }
}
Some (unsorted) output
("INTERNATIONAL EYELETS INC",879)
("SHAQUITA SALLEY",865)
("PAZ DURIGA",791)
("TERESSA ALCARAZ",824)
("MING CHAIX",878)
("JACKSON SHIELDS YEISER",837)
("AUDRY HULLINGER",875)
("GABRIELLE MOLANDS",802)
("TAM TACKER",775)
("HYACINTH VITELA",837)
No implicit view means there is no Scala function like this in scope:
implicit def SerializableToOrdered(x: java.io.Serializable) = new Ordered[java.io.Serializable](x) // note: this function doesn't work (and can't sensibly exist)
The error comes up because your function f returns two different types whose least upper bound is java.io.Serializable (one branch is a String, the other an Array[String]), and sortByKey requires the key type to have an Ordering (or be viewable as Ordered). Fix it like this:
object SparkWordCount {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
    val files = sc.textFile(args(0)).map(_.split(","))
    def f(x: Array[String]) = {
      if (x.length > 3)
        x(3)
      else
        "NO NAME"
    }
    val names = files.map(f)
    val wordCounts = names.map((_, 1)).reduceByKey(_ + _).sortByKey()
    System.out.println(wordCounts.collect().mkString("\n"))
  }
}
Now the function returns a String in both branches instead of two different types, so the key type is String and sortByKey can find its Ordering.
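The inference behaviour is easy to reproduce without Spark; a minimal sketch of the same mistake and its fix:
// the compiler infers the least upper bound of the two branch types,
// which for String and Array[String] is java.io.Serializable
def f(flag: Boolean) = if (flag) "name" else Array("NO NAME")

// annotating the intended return type catches the mismatch at the
// definition site instead of at the distant sortByKey call
def g(flag: Boolean): String = if (flag) "name" else "NO NAME"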

How to convert, group and sort java.util.List[java.util.Map[String, Object]]?

I convert this:
import scala.collection.JavaConverters._
val list: java.util.List[java.util.Map[String, Object]] = new java.util.ArrayList[java.util.Map[String, Object]]()
val map1: java.util.Map[String, AnyRef] = new java.util.HashMap[String, AnyRef]()
map1.put("payout", 3.asInstanceOf[AnyRef])
list.add(map1)
val map2: java.util.Map[String, AnyRef] = new java.util.HashMap[String, AnyRef]()
map2.put("payout", 2.asInstanceOf[AnyRef])
list.add(map2)
val map3: java.util.Map[String, AnyRef] = new java.util.HashMap[String, AnyRef]()
map3.put("payout", 2.asInstanceOf[AnyRef])
list.add(map3)
val map4: java.util.Map[String, AnyRef] = new java.util.HashMap[String, AnyRef]()
map4.put("payout", 1.asInstanceOf[AnyRef])
list.add(map4)
println(list)
val result = list.asScala
//result Buffer({payout=3}, {payout=2}, {payout=2}, {payout=1})
And I want
list.asScala.groupBy(_("payout")).toList
to keep its ordering (sorted by payout), but .toList.sortBy(_._1) throws an error:
error: No implicit Ordering defined for java.lang.Object.
val result = list.groupBy(_("payout")).toList.sortBy(_._1)
This gives a result, but I don't know if it's what you wanted:
val result = list.asScala.map(_.asScala).groupBy(_("payout")).toList.sortWith(_._1.asInstanceOf[Int] > _._1.asInstanceOf[Int])
I added map(_.asScala) in order to convert your Java maps to Scala maps. The group-by value is a java.lang.Object, which does not have an ordering; using sortWith(_._1.asInstanceOf[Int] > _._1.asInstanceOf[Int]) I cast it to Int in order to sort it. This will of course crash if some other object is used, but there is no way to order an object that you don't know anything about.
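A slightly safer variant (a sketch, assuming the values really are boxed Ints as in the question's data) is to extract the Int before grouping, so that sortBy gets a real Ordering:
import scala.collection.JavaConverters._

// pull the payout out as an Int first, then group and sort with Ordering[Int]
val result = list.asScala
  .map(_.asScala.toMap)
  .groupBy(_("payout").asInstanceOf[Int])
  .toList
  .sortBy(-_._1) // descending by payout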

Getting a Scala Map from a Java Properties

I was trying to pull environment variables into a Scala script using Java Iterators and/or Enumerations, and realised that Dr Frankenstein might claim parentage, so I hacked the following from the ugly tree instead:
import java.util.Map.Entry
import System._
val propSet = getProperties().entrySet().toArray()
val props = (0 until propSet.size).foldLeft(Map[String, String]()) { (m, i) =>
  val e = propSet(i).asInstanceOf[Entry[String, String]]
  m + (e.getKey() -> e.getValue())
}
For example, to print that same environment:
props.keySet.toList.sortWith(_ < _).foreach { k =>
  println(k + (" " * (30 - k.length)) + " = " + props(k))
}
Please, please don't set about polishing this t$#d, just show me the Scala gem that I'm convinced exists for this situation (i.e. java Properties --> scala.Map), thanks in advance ;#)
Scala 2.10.3:
import java.io.{File, FileInputStream}
import java.util.Properties
import scala.collection.JavaConverters._

// Create a variable to store the properties in
val props = new Properties
// Open a file stream to read the file
val fileStream = new FileInputStream(new File(fileName))
props.load(fileStream)
fileStream.close()
// Print the contents of the properties file as a map
println(props.asScala.toMap)
Scala 2.7:
val props = Map() ++ scala.collection.jcl.Conversions.convertMap(System.getProperties).elements
Though that needs some typecasting. Let me work on it a bit more.
val props = Map() ++ scala.collection.jcl.Conversions.convertMap(System.getProperties).elements.asInstanceOf[Iterator[(String, String)]]
Ok, that was easy. Let me work on 2.8 now...
import scala.collection.JavaConversions.asMap
val props = System.getProperties() : scala.collection.mutable.Map[AnyRef, AnyRef] // or
val props = System.getProperties().asInstanceOf[java.util.Map[String, String]] : scala.collection.mutable.Map[String, String] // way too many repetitions of types
val props = asMap(System.getProperties().asInstanceOf[java.util.Map[String, String]])
The verbosity, of course, can be decreased with a couple of imports. First of all, note that Map will be a mutable map on 2.8. On the bright side, if you convert back the map, you'll get the original object.
Now, I have no clue why Properties implements Map<Object, Object>, given that the javadocs clearly state that key and value are String, but there you go. Having to typecast this makes the implicit option much less attractive. This being the case, the alternative is the most concise of them.
EDIT
Scala 2.8 just acquired an implicit conversion from Properties to mutable.Map[String,String], which makes most of that code moot.
In Scala 2.9.1 this is solved by the implicit conversions inside collection.JavaConversions._. The other answers use deprecated functions. The details are documented here. This is a relevant snippet out of that page:
scala> import collection.JavaConversions._
import collection.JavaConversions._
scala> import collection.mutable._
import collection.mutable._
scala> val jul: java.util.List[Int] = ArrayBuffer(1, 2, 3)
jul: java.util.List[Int] = [1, 2, 3]
scala> val buf: Seq[Int] = jul
buf: scala.collection.mutable.Seq[Int] = ArrayBuffer(1, 2, 3)
scala> val m: java.util.Map[String, Int] = HashMap("abc" -> 1, "hello" -> 2)
m: java.util.Map[String,Int] = {hello=2, abc=1}
Getting from a mutable map to an immutable map is a matter of calling toMap on it.
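For example, continuing with those conversions in scope (a sketch; actual output depends on your system properties):
import scala.collection.JavaConversions._

// the implicit conversion turns Properties into a mutable Map[String, String]
val props: scala.collection.mutable.Map[String, String] = System.getProperties

// toMap copies it into an immutable Map
val immutableProps: Map[String, String] = props.toMap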
In Scala 2.8.1 you can do it with asScalaMap(m: java.util.Map[A, B]) in a more concise way:
var props = asScalaMap(System.getProperties())
props.keySet.toList.sortWith(_ < _).foreach { k =>
  println(k + (" " * (30 - k.length)) + " = " + props(k))
}
In Scala 2.13.2:
import scala.jdk.javaapi.CollectionConverters._
val props = asScala(System.getProperties)
Looks like in the most recent version of Scala (2.10.2 as of the time of this answer), the preferred way to do this is using the explicit .asScala from scala.collection.JavaConverters:
import scala.collection.JavaConverters._

val props = System.getProperties().asScala
// props is a scala.collection.mutable.Map[String, String], not an immutable Map,
// so an isInstanceOf[Map[String, String]] assertion would actually fail here
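On Scala 2.13+, where JavaConverters is itself deprecated, the equivalent sketch uses scala.jdk.CollectionConverters plus toMap for an immutable result:
import scala.jdk.CollectionConverters._

val props: Map[String, String] = System.getProperties.asScala.toMap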