I use a DataFrame to handle data in Spark, and I have an array column in this DataFrame. At the end of all the transformations I want to do, I have a DataFrame with one array column and one row. In order to apply groupBy, map and reduce I want to have this array as a list, but I can't manage it.
.drop("ScoresArray")
.filter($"min_score" < 0.2)
.select("WordsArray")
.agg(collect_list("WordsArray"))
.withColumn("FlattenWords", flatten($"collect_list(WordsArray)"))
.drop("collect_list(WordsArray)")
.collect()
val test1 = words(0).getAs[immutable.List[String]](0)
Here is the error message:
[error] (run-main-0) java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to scala.collection.immutable.List
[error] java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to scala.collection.immutable.List
[error] at analysis.Analysis$.main(Analysis.scala:37)
[error] at analysis.Analysis.main(Analysis.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] stack trace is suppressed; run last Compile / bgRun for the full output
Thoughts?
You can't cast an array to a list, but you can convert one to the other.
val test1 = words(0).getSeq[String](0).toList
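For reference, here is a minimal, self-contained sketch of the same pattern (assuming a local SparkSession, Spark 2.4+ for flatten, and made-up sample data):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, flatten}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// One array column, a few made-up rows.
val df = Seq(Seq("a", "b"), Seq("c")).toDF("WordsArray")

val words = df
  .agg(collect_list("WordsArray"))
  .withColumn("FlattenWords", flatten($"collect_list(WordsArray)"))
  .drop("collect_list(WordsArray)")
  .collect()

// Spark hands the array column back as a Seq (backed by a WrappedArray);
// toList copies it into an immutable List you can group, map and reduce locally.
val test1: List[String] = words(0).getSeq[String](0).toList
println(test1) // e.g. List(a, b, c)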
I am trying to process a 300+ line long CSV file in Scala into an array.
The CSV file looks like this:
20200522T0600,26.852346
20200522T0700,26.862345
20200522T0800,27.262346
20200522T0900,28.562346
20200522T1000,29.472345
20200522T1100,29.432346
These are the date-time and the temperature. I have to put (datetime) and (temperature) into separate parallel arrays. I'm supposed to calculate the average temperatures later on, but I can do that part. I just don't know how to read them into those two arrays. I know I have to use fromFile() and .getLines to obtain the lines, but I'm stuck on the array part.
I tried this
object Weather {
  def main(args: Array[String]) {
    val source = Source.fromFile("Weather.csv")
    var matrix: Array[String] = Array.empty
    for (line <- source.getLines.drop(10)) {
      val cols = line.split(",").map(_.trim).toArray
      matrix = :+ cols
      println(matrix)
    }
  }
}
but I get this error:
[error] matrix = :+ cols
[error] ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 0 s, completed Jul 9, 2020 6:26:56 AM
Thanks in advance
First, split() returns an array, so .toArray isn't needed. And = :+ isn't valid Scala, so that's not going to get you very far.
Try this.
val lines = io.Source
.fromFile("Weather.csv") //open file
.getLines //Iterator[String]
.drop(10) // ?
.map(_.split(",")) //Iterator[Array[String]]
Add a .toList at the end if you need to load everything into memory.
To see the result.
lines.foreach(println)
Or you might have to do something like this.
lines.foreach(arr => println(arr.mkString("Array(", ",", ")")))
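If you still need the two parallel arrays the assignment asks for, here is a minimal sketch of one way to build them (assuming the two-column layout shown above and that the drop(10) header skip is intentional):
import scala.io.Source

// Parse each line into a (datetime, temperature) pair, then unzip the pairs
// into two parallel arrays.
val pairs = Source
  .fromFile("Weather.csv")
  .getLines()
  .drop(10)
  .map(_.split(",").map(_.trim))
  .collect { case Array(dt, temp) => (dt, temp.toDouble) }
  .toArray

val (dateTimes, temperatures) = pairs.unzip

// Average temperature, for later:
val avgTemp = temperatures.sum / temperatures.length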
I have an RDD of a 1-dimensional matrix. I am trying to do a very basic reduce operation to sum up the values at the same position of the matrix from various partitions.
I am using:
var z=x.reduce((a,b)=>a+b)
or
var z=x.reduce(_ + _)
But I am getting an error saying:
type mismatch; found: Array[Int], expected: String
I looked it up and found this link:
Is there a better way for reduce operation on RDD[Array[Double]]
So I tried using
import spire.implicits._
So now I don't have any compilation error, but after running the code I am getting a java.lang.NoSuchMethodError. I have provided the entire error below. Any help would be appreciated.
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at spire.math.NumberTag$Integral$.<init>(NumberTag.scala:9)
at spire.math.NumberTag$Integral$.<clinit>(NumberTag.scala)
at spire.std.BigIntInstances.$init$(bigInt.scala:80)
at spire.implicits$.<init>(implicits.scala:6)
at spire.implicits$.<clinit>(implicits.scala)
at main.scala.com.ucr.edu.SparkScala.HistogramRDD$$anonfun$9.apply(HistogramRDD.scala:118)
at main.scala.com.ucr.edu.SparkScala.HistogramRDD$$anonfun$9.apply(HistogramRDD.scala:118)
at scala.collection.TraversableOnce$$anonfun$reduceLeft$1.apply(TraversableOnce.scala:190)
at scala.collection.TraversableOnce$$anonfun$reduceLeft$1.apply(TraversableOnce.scala:185)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1012)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1010)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2125)
at org.apache.spark.SparkContext$$anonfun$33.apply(SparkContext.scala:2125)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
From my understanding, you're trying to reduce the items by position in the arrays. You should consider zipping the arrays while reducing the RDD:
val a: RDD[Array[Int]] = ss.createDataset[Array[Int]](Seq(Array(1, 2, 3), Array(4, 5, 6))).rdd

a.reduce { case (left: Array[Int], right: Array[Int]) =>
  val zipped = left.zip(right)
  zipped.map { case (i1, i2) => i1 + i2 }
}.foreach(println)
outputs:
5
7
9
I tried to add an element to a Scala HashMap
val c2 = new collection.mutable.HashMap[String,Int]()
c2 += ("hh",1)
but the above gives me a compile error.
[error] found : String("hh")
[error] required: (String, Int)
[error] c2 += ("hh", 1)
[error] ^
[error] /scalathing/Main.scala:5: type mismatch;
[error] found : Int(1)
[error] required: (String, Int)
[error] c2 += ("hh", 1)
[error] ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 3 s, completed Sep 1, 2016 1:22:52 AM
The pair I'm adding seems to be of the correct type as demanded by the HashMap. Why do I get a compile error?
The += operator is overloaded to work with variadic arguments. Therefore, when the compiler sees c2 += ("hh", 1) it interprets that as two arguments being passed in, one of which is "hh" and the other of which is 1. You can fix that either by using the -> operator, i.e. c2 += ("hh" -> 1), or by enclosing the tuple in another set of parentheses, i.e. c2 += (("hh", 1)).
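A quick sanity check of both forms (a minimal sketch; the second key is made up for illustration):
import scala.collection.mutable

val c2 = new mutable.HashMap[String, Int]()
c2 += ("hh" -> 1)   // -> builds the tuple, so += sees a single argument
c2 += (("hh2", 2))  // the inner parentheses make the tuple explicit
println(c2)         // contains both hh -> 1 and hh2 -> 2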
Slightly gory details below as requested in the comments.
As for how all this works: in the case of mutable collections such as HashMap, += is simply an ordinary method called with operator syntax (i.e. spaces instead of a .), or "infix notation" as the Scala community calls it, as any method in Scala can be. It is provided by the Growable trait, which mutable collections mix in. The documentation for Growable shows both the single-argument += method and the variadic one. In other words, the following code would also have worked.
c2.+=(("hh", 1))
Not all +=s are created equal, however. += commonly shows up with vars as well. Although it can be called with method syntax (.), it is magic syntax sugar implemented directly by the Scala compiler: any non-alphanumeric name followed by an = gets desugared, so x $NAME= y becomes x = x.$NAME(y). In this case $NAME= is variadic if and only if $NAME is variadic.
var i = 0
i += 1
i.+=(1) // Also compiles
case class SuperInt(raw: Int) {
def &*&(x: SuperInt) = SuperInt(raw + x.raw)
}
var x = SuperInt(1)
x &*&= SuperInt(1) // This works
x.&*&=(SuperInt(1)) // As does this
x &*&= (SuperInt(1), SuperInt(1)) // Does not compile because &*& is not variadic
I am trying to convert a column which contains Array[String] to String, but I consistently get this error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 78.0 failed 4 times, most recent failure: Lost task 0.3 in stage 78.0 (TID 1691, ip-******): java.lang.ClassCastException: scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Ljava.lang.String;
Here's the piece of code
val mkString = udf((arrayCol: Array[String]) => arrayCol.mkString(","))
val dfWithString = df.select($"arrayCol").withColumn("arrayString", mkString($"arrayCol"))
WrappedArray is not an Array (which is a plain old Java array, not a native Scala collection). You can either change the signature to:
import scala.collection.mutable.WrappedArray
(arrayCol: WrappedArray[String]) => arrayCol.mkString(",")
or use one of the supertypes like Seq:
(arrayCol: Seq[String]) => arrayCol.mkString(",")
In recent Spark versions you can use concat_ws instead:
import org.apache.spark.sql.functions.concat_ws
df.select(concat_ws(",", $"arrayCol"))
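A small end-to-end sketch of both fixes, assuming a local SparkSession and a column named arrayCol (the sample rows are made up):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat_ws, udf}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(Seq("a", "b", "c"), Seq("d")).toDF("arrayCol")

// UDF variant: accept Seq[String] rather than Array[String].
val mkString = udf((arrayCol: Seq[String]) => arrayCol.mkString(","))
df.withColumn("arrayString", mkString($"arrayCol")).show()

// Built-in variant: no UDF needed.
df.select(concat_ws(",", $"arrayCol")).show()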
This code works for me:
df.select("wifi_ids").rdd
  .map(row => row.get(0).asInstanceOf[WrappedArray[WrappedArray[String]]].toSeq.map(x => x.toSeq.apply(0)))
In your case, I guess it is:
val mkString = udf((arrayCol: Seq[String]) => arrayCol.asInstanceOf[WrappedArray[String]].toArray.mkString(","))
val dfWithString = df.select($"arrayCol").withColumn("arrayString", mkString($"arrayCol"))
I am having a problem in my Chisel code. I tried the following approach:
deqReg := Cat((0 until ports).map(ownReg === Cat(io.configVal(portBits*(_) + 2),io.configVal(portBits*(_)+ 1), io.configVal(portBits*(_)))))
but I am getting the following error when running the above code
[error] /home/jayant/Dropbox/FIFO/fifo.scala:24: missing parameter type for expanded function ((x$1) => portBits.$times(x$1).$plus(2))
[error] deqReg := Cat((0 until ports).map(ownReg === Cat(io.configVal(portBits*(_) + 2),io.configVal(portBits*(_)+ 1), io.configVal(portBits*(_)))))
[error] ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 2 s, completed 4 Sep, 2015 12:31:40 PM
Can anyone tell me what this error is and how to correct it?
You have multiple nested functions in your map, which makes it impossible for the Scala compiler to infer the type of the argument. In other words, you cannot use the "_" placeholder here; the placeholder only replaces the argument of the innermost function within the expression. Try a fully specified anonymous function (or a partial function) like this:
deqReg := Cat((0 until ports).map{ case i:Int => ownReg === Cat(io.configVal(portBits*i + 2), io.configVal(portBits*i + 1), io.configVal(portBits*i))})
Scala is quite a powerful language, and you'd most probably be able to find a more elegant way to write that code.
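As a plain-Scala illustration of the placeholder rule (nothing Chisel-specific; the numbers are made up):
val portBits = 3

// A `_` expands only at the nearest enclosing expression, so something like
// seq.map(f(portBits * (_) + 2)) turns into f(x => portBits * x + 2) instead of
// the intended x => f(portBits * x + 2), hence the "missing parameter type" error.

// Naming the parameter keeps the whole expression inside one function body:
val offsets = (0 until 4).map(i => portBits * i + 2)
println(offsets) // Vector(2, 5, 8, 11)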