My code is crashing with a java.util.NoSuchElementException: next on empty iterator exception.
def myfunction(arr : Array[(Int,(String,Int))]) = {
val values = (arr.sortBy(x => (-x._2._2, x._2._1.head)).toList)
...........................
The code is crashing in the first line where I am trying to sort an array.
var arr = Array((1,("kk",1)),(1,("hh",1)),(1,("jj",3)),(1,("pp",3)))
I am trying to sort the array by the 2nd element of the inner tuple, in descending order. If there is a tie, the sort should fall back to the first element of the inner tuple.
output - ((1,("pp",3)),(1,("jj",3)),(1,("hh",1)),(1,("kk",1)))
This crashes under some scenarios (normally it works fine), which I guess is due to an empty array.
How can I get rid of this crash? Or is there another, more elegant way of achieving the same result?
It happens because one of your array items (Int, (String, Int)) contains an empty string.
"".head
leads to
java.util.NoSuchElementException: next on empty iterator
Use x._2._1.headOption instead.
val values = (arr.sortBy(x => (-x._2._2, x._2._1)).toList)
Removing head from the statement works. The original crashes because of the empty string in arr:
var arr = Array((1,("kk",1)),(1,("hh",1)),(1,("jj",3)),(1,("pp",3)),(1,("",1)))
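A quick sketch of both fixes in plain Scala, using the example array including the empty-string entry:

```scala
val arr = Array((1, ("kk", 1)), (1, ("hh", 1)), (1, ("jj", 3)), (1, ("pp", 3)), (1, ("", 1)))

// Fix 1: sort on the whole string, with no head call at all.
val byWholeString = arr.sortBy(x => (-x._2._2, x._2._1)).toList

// Fix 2: headOption yields Option[Char]; None sorts before any Some,
// so the empty string safely sorts first within its group.
val byHeadOption = arr.sortBy(x => (-x._2._2, x._2._1.headOption)).toList
```

Both versions sort descending on the count and ascending on the string (or its first character), and neither throws on the empty string.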
I use MLlib in Spark and got this error. It turned out that I was predicting for a non-existing userID or itemID. ALS generates a prediction matrix (userIDs × itemIDs), and you must make sure that your request is covered by this matrix.
I am new to Scala. While running one Spark program, I am getting a NullPointerException. Can anyone point out how to solve this?
val data = spark.read.csv("C:\\File\\Path.csv").rdd
val result = data.map { line => {
    val population = line.getString(10).replaceAll(",", "")
    var popNum = 0L
    if (population.length() > 0)
      popNum = Long.parseLong(population)
    (popNum, line.getString(0))
  }}
  .sortByKey(false)
  .first()
//spark.sparkContext.parallelize(Seq(result)).saveAsTextFile(args(1))
println("The result is: "+ result)
spark.stop
Error message :
Caused by: java.lang.NullPointerException
at com.nfs.WBI.KPI01.HighestUrbanPopulation$$anonfun$1.apply(HighestUrbanPopulation.scala:23)
at com.nfs.WBI.KPI01.HighestUrbanPopulation$$anonfun$1.apply(HighestUrbanPopulation.scala:22)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
I guess that your input data contains at least one row without a value in column 10, so that line.getString(10) returns null. Calling replaceAll(",", "") on that result then throws the NullPointerException.
A quick fix would be to wrap the call to getString in an Option:
val population = Option(line.getString(10)).getOrElse("")
This returns the value of column 10 or an empty string if the column is null.
Some care must be taken when parsing the long. Unless you are absolutely sure that the column always contains a number, a NumberFormatException could be thrown.
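Both concerns can be handled together. A minimal sketch (the helper name safeLong is hypothetical, not from the original code) that guards against null columns and non-numeric text:

```scala
import scala.util.Try

// Parse the cleaned column defensively, falling back to 0L
// when the cell is null, empty, or not a number.
def safeLong(raw: String): Long =
  Option(raw)                            // guard against null
    .map(_.replaceAll(",", ""))          // strip thousands separators
    .flatMap(s => Try(s.toLong).toOption) // guard against non-numeric text
    .getOrElse(0L)
```

Inside the map, `val popNum = safeLong(line.getString(10))` would then replace both the null check and the manual parse.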
In general, you should check the inferSchema option of the CSV reader of Spark and try to avoid parsing the data yourself.
In addition to the parsing issues mentioned elsewhere in this post, it seems that you have numbers containing commas in your data. This complicates CSV parsing and can cause undesirable behavior. You may have to sanitize the data even before reading it in Spark.
Also, if you're using Spark 2.0, it's best to use DataFrames/Datasets along with groupBy constructs. See this post: How to deal with null values in spark reduceByKey function?. I suspect you have null values in your sort key as well.
I have built an RDD from a file where each element in the RDD is a section from the file separated by a delimiter.
val inputRDD1: RDD[(String, Long)] = myUtilities.paragraphFile(spark, path1)
  .coalesce(100 * spark.defaultParallelism)
  .zipWithIndex() // RDD[(String, Long)]
  .filter(f => f._2 != 0)
The reason I do the last operation above (filter) is to remove the element at index 0.
Is there a better way to remove the first element rather than to check each element for the index value as done above?
Thanks!
One possibility is to use RDD.mapPartitionsWithIndex and to remove the first element from the iterator at index 0:
val inputRDD = myUtilities
  .paragraphFile(spark, path1)
  .coalesce(100 * spark.defaultParallelism)
  .mapPartitionsWithIndex(
    (index, it) => if (index == 0) it.drop(1) else it,
    preservesPartitioning = true
  )
This way, you only ever advance a single item on the first iterator, while all others remain untouched. Is this more efficient? Probably. Anyway, I'd test both versions to see which one performs better.
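The per-partition logic can be sketched without Spark, modelling hypothetical partitions as plain iterators (the partition contents here are made up for illustration):

```scala
// Only the iterator of partition 0 is advanced by one element;
// all other partitions pass through untouched.
val partitions = Vector(
  Iterator("unwanted", "a", "b"), // partition 0 starts with the element to drop
  Iterator("c", "d")
)

val cleaned = partitions.zipWithIndex.flatMap {
  case (it, 0) => it.drop(1)
  case (it, _) => it
}
```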
I am trying to create an array where each element is an empty array.
I have tried this:
var result = Array.fill[Array[Int]](Array.empty[Int])
After looking here How to create and use a multi-dimensional array in Scala?, I also tried this:
var result = Array.ofDim[Array[Int]](Array.empty[Int])
However, neither of these works.
How can I create an array of empty arrays?
You are misunderstanding Array.ofDim here. It creates a multidimensional array given the dimensions and the type of value to hold.
To create an array of 100 arrays, each of which is empty (0 elements) and would hold Ints, you need only to specify those dimensions as parameters to the ofDim function.
val result = Array.ofDim[Int](100, 0)
Array.fill takes two parameter lists: the first is the length, the second the value to fill the array with. More precisely, the second parameter is an element computation that is invoked once per element to obtain the array's values (thanks to @alexey-romanov for pointing this out). In your case, however, it always results in the same value: the empty array.
Array.fill[Array[Int]](length)(Array.empty)
Consider also Array.tabulate, as follows:
val result = Array.tabulate(100)(_ => Array[Int]())
where the lambda function is applied 100 times, and each application delivers an empty array.
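For comparison, all three approaches from the answers above produce the same result: 100 empty Int arrays.

```scala
// Three equivalent ways to build an array of 100 empty Int arrays.
val viaOfDim    = Array.ofDim[Int](100, 0)
val viaFill     = Array.fill[Array[Int]](100)(Array.empty[Int])
val viaTabulate = Array.tabulate(100)(_ => Array[Int]())
```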
I am trying to read a multidimensional array line by line, as shown beneath:
var a = Array(MAX_N)(MAX_M)
for(i <- 1 to m) {
a(i) = readLine.split(" ").map(_.toInt)
}
However, I am getting the error:
error: value update is not a member of Int
So, how can I read the array line by line?
The main problem here is actually in your first line of code.
Array(MAX_N)(MAX_M) doesn't mean what you think it means.
The first part, Array(MAX_N), means "make an array of size 1 containing MAX_N", and then (MAX_M) means "return the MAX_M'th element of that array". So for example:
scala> Array(9)(0)
res1: Int = 9
To make a two-dimensional array, use Array.ofDim. See How to create and use a multi-dimensional array in Scala?
(There are more problems in your code after the first line. Perhaps someone else will point them out.)
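A minimal sketch of the corrected approach, with made-up dimensions and a Seq of strings standing in for readLine input (note the 0-based indexing, unlike the 1-based loop in the question):

```scala
val MAX_N = 2
val MAX_M = 3
val lines = Seq("1 2 3", "4 5 6") // stand-in for what readLine would return

val a = Array.ofDim[Int](MAX_N, MAX_M) // a 2-D array: Array[Array[Int]]
for (i <- 0 until MAX_N)
  a(i) = lines(i).split(" ").map(_.toInt) // replace row i with the parsed line
```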
I have worked with Python. In Python there is a function .pop() which deletes the last value in a list and returns that deleted value.
For example, with x = [1, 2, 3, 4], x.pop() will return 4.
I was wondering: is there a Scala equivalent for this function?
If you just wish to retrieve the last value, you can call x.last. This won't remove the last element from the list, however, which is immutable. Instead, you can call x.init to obtain a list consisting of all elements in x except the last one - again, without actually changing x. So:
val lastEl = x.last
val rest = x.init
will give you the last element (lastEl), the list of all bar the last element (rest), and you still also have the original list (x).
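Put together, a short sketch of this immutable "pop":

```scala
val x = List(1, 2, 3, 4)

val lastEl = x.last // the "popped" value
val rest   = x.init // everything except the last element
// x itself is unchanged throughout
```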
There are a lot of different collection types in Scala, each with its own set of supported and/or well performing operations.
In Scala, a List is an immutable cons-cell sequence, as in Lisp. Getting the last element is not well optimised (it takes linear time), whereas accessing the head element is fast. Similarly, Queue and Stack are optimised for retrieving an element and the rest of the structure from one end in particular. You could use either of them if your order is reversed.
Otherwise, Vector is a good performing general structure which is fast both for head and last calls:
val v = Vector(1, 2, 3, 4)
val init :+ last = v // uses pattern matching extractor `:+` to get both init and last
Here last would be the equivalent of your pop operation, and init is the sequence with the last element removed (you can also use dropRight(1), as suggested in the other answers). To just retrieve the last element, use v.last.
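A quick check of what the extractor pattern yields:

```scala
val v = Vector(1, 2, 3, 4)
val init :+ last = v // init = Vector(1, 2, 3), last = 4
```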
I tend to use
val popped :: newList = list
which assigns the first element of the list to popped and the remaining list to newList
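For example:

```scala
val list = List(1, 2, 3)
val popped :: newList = list // popped = 1, newList = List(2, 3)
// Note: this pattern throws a MatchError on an empty list, so guard
// with a match (or use headOption/tail) when emptiness is possible.
```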
The first answer is correct, but you can achieve the same by doing:
val last = x.last
val rest = x.dropRight(1)
If you're willing to relax your need for immutable structures, there's always Stack and Queue:
val poppable = scala.collection.mutable.Stack[String]("hi", "ho")
val popped = poppable.pop
Similar to Python's ability to pop multiple elements, Queue handles that:
val multiPoppable = scala.collection.mutable.Queue[String]("hi", "ho")
val allPopped = multiPoppable.dequeueAll(_ => true)
If it is a mutable.Queue, use the dequeue function:
/** Returns the first element in the queue, and removes this element
  * from the queue.
  *
  * @throws java.util.NoSuchElementException
  * @return the first element of the queue.
  */
def dequeue(): A =
  if (isEmpty)
    throw new NoSuchElementException("queue empty")
  else {
    val res = first0.elem
    first0 = first0.next
    decrementLength()
    res
  }
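In use, that looks like the following (note that dequeue removes from the front, not the back as Python's pop() does):

```scala
import scala.collection.mutable

val q = mutable.Queue("hi", "ho")
val first = q.dequeue() // removes and returns "hi"; q now holds only "ho"
// dequeue() on an empty queue throws NoSuchElementException,
// so check q.isEmpty (or use q.dequeueFirst) when emptiness is possible.
```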