Terminating a Scala program?

I have used try/catch as part of my MapReduce code. I am reducing my values based on the count in the code below. How do I terminate the job from the code below?
import org.apache.hadoop.io.{IntWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Reducer
import scala.collection.JavaConverters._

class RepReducer extends Reducer[NullWritable, Text, Text, IntWritable] {
  // Hadoop passes a java.lang.Iterable; with scala.Iterable the `override` would not compile
  override def reduce(key: NullWritable, values: java.lang.Iterable[Text],
                      context: Reducer[NullWritable, Text, Text, IntWritable]#Context): Unit = {
    val count = values.asScala.size
    if (count == 0) {
      try {
        context.write(new Text("Number of tables with less than 40% coverage"), new IntWritable(count))
      } catch {
        case e: Exception =>
          Console.err.println(" ")
          e.printStackTrace()
      }
    } else {
      System.out.println("terminate job") // here I want to terminate if count is not equal to 0
    }
  }
}

I think you still need to call context.write to return the control back to Hadoop even if you decide to skip certain data in the 'else'.
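One common way to stop the job from inside the reducer is to fail fast: an exception thrown from reduce() fails that task attempt, Hadoop retries the attempt up to the configured limit (mapreduce.reduce.maxattempts, 4 by default), and the job is then marked as failed. Below is a minimal sketch of that decision logic, kept free of Hadoop types so it runs on its own. CoverageCheckFailed and CoverageCheck.onCount are hypothetical names, and emit stands in for context.write:

```scala
// Hypothetical exception type that signals "stop the job".
final class CoverageCheckFailed(val count: Int)
    extends RuntimeException(s"Expected no low-coverage tables, found $count")

object CoverageCheck {
  // Same branch as RepReducer.reduce: emit when count == 0, otherwise abort by throwing.
  // `emit` stands in for context.write so this runs without Hadoop on the classpath.
  def onCount(count: Int)(emit: Int => Unit): Unit =
    if (count == 0) emit(count)
    else throw new CoverageCheckFailed(count)
}
```

Inside RepReducer, the else branch would then throw instead of printing. If failing the whole job is too heavy-handed, another option is to increment a Hadoop counter in the reducer and let the driver inspect it after the job completes.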

Related

Performance disadvantage of using Datasets vs RDD with spark

I've partially rewritten my code to use Datasets instead of RDDs, but I'm seeing a significant performance decrease for some operations.
For example:
val filtered = trips.filter(t => exportFilter.check(t)).cache()
seems to be much slower, and the CPU is mostly idle.
What is the reason for this? Is it a bad idea to use Datasets when working with plain objects?
UPDATE:
Here is the filter check method:
override def check(trip: Trip): Boolean = {
  if (trip == null || !trip.isCompleted) {
    return false
  }
  // Return if no extended filter configured or we already
  if (exportConfiguration.isBasicFilter) {
    return trip.isCompleted
  }
  // Here trip is completed, check other conditions
  // Filter out trips from future
  val isTripTimeOk = checkTripTime(trip)
  return isTripTimeOk
}

/**
 * Trip time should have end time today or inside yesterday midnight interval
 */
def checkTripTime(trip: Trip): Boolean = {
  // Check inclusive trip low bound. Should have end time today or inside yesterday midnight interval
  val isLowBoundOk = tripTimingProcessor.isLaterThanYesterdayMidnightIntervalStarts(trip.getEndTimeMillis)
  if (!isLowBoundOk) {
    updateLowBoundMetrics(trip)
    return false
  }
  // Check trip high bound
  val isHighBoundOk = tripTimingProcessor.isBeforeMidnightIntervalStarts(trip.getEndTimeMillis)
  if (!isHighBoundOk) {
    metricService.inc(trip.getStartTimeMillis, trip.getProviderId,
      ExportMetricName.TRIPS_EXPORTED_S3_SKIPPED_END_INSIDE_MIDNIGHT_INTERVAL)
  }
  return isHighBoundOk
}

private def updateLowBoundMetrics(trip: Trip) = {
  metricService.inc(trip.getStartTimeMillis, trip.getProviderId,
    ExportMetricName.TRIPS_EXPORTED_S3_SKIPPED_END_BEFORE_YESTERDAY_MIDNIGHT_INTERVAL)
  val pointIter = trip.getPoints.iterator()
  while (pointIter.hasNext()) {
    val point = pointIter.next()
    metricService.inc(point.getCaptureTimeMillis, point.getProviderId,
      ExportMetricName.POINT_EXPORTED_S3_SKIPPED_END_BEFORE_YESTERDAY_MIDNIGHT_INTERVAL)
  }
}
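For comparison, the decision made by check and checkTripTime, with the metricService side effects left out, reduces to a single boolean expression. A minimal sketch with hypothetical stand-ins: this Trip carries only the two fields the predicate reads, and the tripTimingProcessor bound checks are passed in as plain functions because their exact semantics are not shown in the question:

```scala
// Hypothetical stand-in for the question's Trip: only the fields check() reads.
final case class Trip(isCompleted: Boolean, endTimeMillis: Long)

final class TripFilter(
    isBasicFilter: Boolean,
    isLowBoundOk: Long => Boolean,  // stands in for isLaterThanYesterdayMidnightIntervalStarts
    isHighBoundOk: Long => Boolean  // stands in for isBeforeMidnightIntervalStarts
) {
  // Same outcome as check/checkTripTime, minus the metric updates:
  // incomplete or null trips fail; the basic filter accepts any completed trip;
  // otherwise the end time must pass both the low and the high bound.
  def check(trip: Trip): Boolean =
    trip != null && trip.isCompleted &&
      (isBasicFilter || (isLowBoundOk(trip.endTimeMillis) && isHighBoundOk(trip.endTimeMillis)))
}
```

Keeping the predicate free of side effects also helps on Spark, since a task can be re-executed and counters updated inside filter could then be incremented more than once. Either way, a typed lambda like this is opaque to Spark's query optimizer.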

Can anyone explain interesting spin-lock behavior?

Given the following code
import java.util.ConcurrentModificationException
import scala.annotation.tailrec

case class Score(value: BigInt, random: Long = randomLong) extends Comparable[Score] {
  override def compareTo(that: Score): Int = {
    if (this.value < that.value) -1
    else if (this.value > that.value) 1
    else if (this.random < that.random) -1
    else if (this.random > that.random) 1
    else 0
  }
  // Pattern match instead of asInstanceOf, so comparing with a non-Score returns false instead of throwing
  override def equals(obj: Any): Boolean = obj match {
    case that: Score => this.value == that.value && this.random == that.random
    case _ => false
  }
}
@tailrec
private def update(mode: UpdateMode, member: String, newScore: Score, spinCount: Int, spinStart: Long): Unit = {
  // Caution: there is some subtle logic below, so don't modify it unless you grok it
  try {
    Metrics.checkSpinCount(member, spinCount)
  } catch {
    case cause: ConcurrentModificationException =>
      throw new ConcurrentModificationException(Leaderboard.maximumSpinCountExceeded.format("update", member), cause)
  }
  // Set the spin-lock
  put(member, None) match {
    case None =>
      // BEGIN CRITICAL SECTION
      // Member's first time on the board
      if (scoreToMember.put(newScore, member) != null) {
        val message = s"$member: added new member in memberToScore, but found old member in scoreToMember"
        logger.error(message)
        throw new ConcurrentModificationException(message)
      }
      memberToScore.put(member, Some(newScore)) // remove the spin-lock
      // END CRITICAL SECTION
    case Some(option) => option match {
      case None => // Update in progress, so spin until complete
        //logger.debug(s"update: $member locked, spinCount = $spinCount")
        for (i <- -1 to spinCount * 2) {Thread.`yield`()} // dampen contention
        update(mode, member, newScore, spinCount + 1, spinStart)
      case Some(oldScore) =>
        // BEGIN CRITICAL SECTION
        // Member already on the leaderboard
        if (scoreToMember.remove(oldScore) == null) {
          val message = s"$member: oldScore not found in scoreToMember, concurrency defect"
          logger.error(message)
          throw new ConcurrentModificationException(message)
        } else {
          val score =
            mode match {
              case Replace =>
                //logger.debug(s"$member: newScore = $newScore")
                newScore
              case Increment =>
                //logger.debug(s"$member: newScore = $newScore, oldScore = $oldScore")
                Score(newScore.value + oldScore.value)
            }
          //logger.debug(s"$member: updated score = $score")
          scoreToMember.put(score, member)
          memberToScore.put(member, Some(score)) // remove the spin-lock
          //logger.debug(s"update: $member unlocked")
        }
        // END CRITICAL SECTION
        // Do this outside the critical section to reduce time under lock
        if (spinCount > 0) Metrics.checkSpinTime(System.nanoTime() - spinStart)
    }
  }
}
There are two important data structures: memberToScore and scoreToMember. I have experimented using both TrieMap[String,Option[Score]] and ConcurrentHashMap[String,Option[Score]] for memberToScore and both have the same behavior.
So far my testing indicates the code is correct and thread-safe, but the mystery is the performance of the spin-lock. On a system with 12 hardware threads, running 1000 iterations on 12 Futures: hitting the same member all the time results in spin cycles of 50 or more, but hitting a random distribution of members can result in spin cycles of 100 or more. The behavior gets worse if I don't dampen the spin by iterating over yield() calls.
This seems counterintuitive: I was expecting a random distribution of keys to cause less spinning than the same key, but testing shows otherwise.
Can anyone offer some insight into this counter-intuitive behavior?
Granted there may be better solutions to my design, and I am open to them, but for now I cannot seem to find a satisfactory explanation for what my tests are showing, and my curiosity leaves me hungry.
As an aside, while the single member test has a lower ceiling for the spin count, the random member test has a lower ceiling for time spinning, which is what I would expect. I just cannot explain why the random member test generally produces a higher ceiling for spin count.
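Since the question is open to other designs, here is one alternative (a swapped-in technique, not the asker's approach): ConcurrentHashMap.compute runs its remapping function atomically for a key, so updates to the same member are serialized without a hand-written spin-lock, and the function may update scoreToMember because that is a different map. A minimal sketch, assuming scoreToMember is a ConcurrentSkipListMap[Score, String] (its type is not shown) and storing Score rather than Option[Score] in memberToScore; ComputeLeaderboard, replace, increment, scoreOf and indexSize are hypothetical names:

```scala
import java.util.concurrent.{ConcurrentHashMap, ConcurrentSkipListMap, ThreadLocalRandom}
import java.util.function.BiFunction

// Simplified stand-in for the question's Score: ordered by value, ties broken by `random`.
final case class Score(value: BigInt, random: Long) extends Comparable[Score] {
  override def compareTo(that: Score): Int = {
    val byValue = this.value.compare(that.value)
    if (byValue != 0) byValue else java.lang.Long.compare(this.random, that.random)
  }
}

final class ComputeLeaderboard {
  private val memberToScore = new ConcurrentHashMap[String, Score]()
  private val scoreToMember = new ConcurrentSkipListMap[Score, String]()

  // compute() holds the key's lock while this function runs, so no other thread can
  // update the same member concurrently; the index swap happens inside that window.
  private def update(member: String)(next: Option[Score] => BigInt): Score =
    memberToScore.compute(member, new BiFunction[String, Score, Score] {
      override def apply(m: String, old: Score): Score = {
        val previous = Option(old)
        previous.foreach(s => scoreToMember.remove(s)) // drop the stale index entry
        val updated = Score(next(previous), ThreadLocalRandom.current().nextLong())
        scoreToMember.put(updated, m)
        updated
      }
    })

  def replace(member: String, value: BigInt): Score = update(member)(_ => value)

  def increment(member: String, delta: BigInt): Score =
    update(member)(prev => prev.fold(delta)(_.value + delta))

  def scoreOf(member: String): Option[Score] = Option(memberToScore.get(member))

  def indexSize: Int = scoreToMember.size
}
```

One trade-off: a reader iterating scoreToMember can briefly miss a member between the remove and the put, so the two maps are not a single consistent snapshot. Threads updating different keys that share a hash bin may also wait on each other briefly.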

Scala/Akka How do you reference the message being received?

I have a Java program that I must implement in Scala, but I am extremely new to Scala. After reading a number of SO questions and answers, as well as a number of Google-retrieved resources on case classes, I am still having trouble grasping how to get a reference to the message I received. Example code is below:
case class SpecialMessage(key: Int) {
val id: Int = Main.idNum.getAndIncrement().intValue()
def getId(): Int = {
return id
}
}
Then in another class's receive I am trying to reference that number with:
def receive: Receive = {
  case SpecialMessage(key) =>
    val empID = ?? getId() // Get the id stored in the Special Message
    // Do stuff with empID
}
I cannot figure out what to put on the right side of empID = in order to get that id. Is this really simple, or something that isn't normally done?
These are two ways to do what you want; pick the one that suits you best:
case msg: SpecialMessage => {
  val empID = msg.getId() // Get the id stored in the Special Message
  // Do stuff with empID
}
case msg @ SpecialMessage(key) => {
  val empID = msg.getId() // Get the id stored in the Special Message
  // Do stuff with empID
}
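To see both patterns outside an actor, here is a minimal sketch in which a plain PartialFunction stands in for Akka's Receive (which is a PartialFunction[Any, Unit]); it returns a String only so the result can be checked. Ids replaces Main.idNum, and Ids and Handler are hypothetical names:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for Main.idNum.
object Ids { val next = new AtomicInteger(0) }

case class SpecialMessage(key: Int) {
  val id: Int = Ids.next.getAndIncrement()
}

object Handler {
  // Typed pattern: msg is the whole message, statically typed as SpecialMessage.
  val handleTyped: PartialFunction[Any, String] = {
    case msg: SpecialMessage => s"id=${msg.id}"
  }

  // Binder pattern: `msg @` names the whole message while the extractor also binds key.
  val handle: PartialFunction[Any, String] = {
    case msg @ SpecialMessage(key) => s"key=$key id=${msg.id}"
    case other                     => s"unhandled: $other"
  }
}
```

Because id is declared in the class body rather than in the constructor, it is not part of the generated equals, so two messages with the same key compare equal even though their ids differ.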
Pim's answer is good.
But maybe you can modify the structure of SpecialMessage like
case class SpecialMessage(key: Int,val id: Int = Main.idNum.getAndIncrement().intValue())
so you can get id directly from pattern matching.
def receive: Receive = {
  case SpecialMessage(key, empID) =>
    // Do stuff with empID
}
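A minimal sketch of that variant, with IdSource as a hypothetical stand-in for Main.idNum and Receiver.describe as a hypothetical handler; the default argument is evaluated each time a message is built without an explicit id:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for Main.idNum.
object IdSource { val next = new AtomicInteger(0) }

// The default argument runs on every construction that omits `id`.
final case class SpecialMessage(key: Int, id: Int = IdSource.next.getAndIncrement())

object Receiver {
  // Both fields are constructor parameters, so the extractor binds them directly.
  def describe(msg: Any): String = msg match {
    case SpecialMessage(key, empID) => s"key=$key empID=$empID"
    case other                      => s"unhandled: $other"
  }
}
```

Note the trade-off: id is now a constructor parameter, so it takes part in equals and hashCode, and two messages with the same key are no longer equal.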

validation of fields in a form in scala with lift frame work

I am working with the Lift framework and Scala. I have a form to sign up to my application, and I want to validate all the fields in it. I have a snippet where I access my form values, and one validation class where I wrote my validation functions. The following code is what I've tried so far. In my Snippet:
if (validationClassObject.validateName(first_name)) {
  if (validationClassObject.validateName(last_name)) {
    if (validationClassObject.validateEmail(email)) {
      if (validationClassObject.validateUserName(name)) {
        // Adding values to the DB
        S.redirectTo("/")
      }
      else {
        S.notice("Invalid User Name")
      }
    }
    else {
      S.notice("Invalid Mail Id")
    }
  }
  else {
    S.notice("Invalid Last name")
  }
}
else {
  S.notice("Invalid First Name")
}
In the validation class, the validation code looks like this:
//function for validating mail address
def validateEmail(email: String): Boolean =
  """(\w+)@([\w\.]+)""".r.unapplySeq(email).isDefined
//code for validating remaining fields like above
This is working, but I know this is not the best way of coding this operation in Scala. How could I modify my code in a more scalable way? How can I use case classes here?
You could do:
def av[T, V](validationFunction: => Boolean, error: => T)(f: => V) = {
  if (!validationFunction) error
  else f
}
def v[V](validationFunction: => Boolean, error: => String)(f: => V) =
  av(validationFunction, S.notice(error))(f)
import validationClassObject._
v(validateName(last_name), "Invalid Last name") { v(validateName(name), "Invalid User Name") { ... } }
av is a generic method with T and V as the result types of the error function and the continuation f. v is the more specific function that expects a String for the error and encapsulates the notice() call. We pass f as the part in the curly braces: v(validation, errormsg){/*todo when there is no problem*/}.
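A runnable sketch of the av/v helpers. The hypothetical notice function records messages in place of Lift's S.notice, so the control flow can be checked without a running Lift application:

```scala
import scala.collection.mutable.ListBuffer

object NestedValidation {
  // Hypothetical stand-in for Lift's S.notice: records messages instead of displaying them.
  val notices: ListBuffer[String] = ListBuffer.empty[String]
  def notice(msg: String): Unit = notices += msg

  // Generic form: evaluate `error` when the check fails, otherwise continue with `f`.
  def av[T, V](validationFunction: => Boolean, error: => T)(f: => V): Any =
    if (!validationFunction) error else f

  // Specialized form: the error is a message routed through notice().
  def v[V](validationFunction: => Boolean, error: => String)(f: => V): Any =
    av(validationFunction, notice(error))(f)
}
```

Because every argument is by-name, only the first failing check's notice runs and the nested block is never evaluated.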
I can't do formatting in comments so I'll post a new answer.
// name and email are the form values from the snippet
def badName() = if (name.isEmpty) Some("bad name") else None
def badEmail() = if (email.isEmpty) Some("bad email") else None
val verifications = List[() => Option[String]](() => badName(), () => badEmail())
val failed = verifications.flatMap(_())
if (failed.nonEmpty) {
  // handle failed
} else {
  // your custom logic here
}
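if (!validateName(first_name)) S.notice("Invalid First Name")
else if (!validateName(last_name)) S.notice("Invalid Last name")
else if (!validateEmail(email)) S.notice("Invalid Mail Id")
else if (!validateUserName(name)) S.notice("Invalid User Name")
else { // everything OK...
  // return a JsCmd or whatever else you want here
}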
An alternative solution can be written using Option and flatMap, without hardcoding all these ifs. If you're interested in that, ask.
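Since the question asks about case classes, here is a minimal sketch that combines the ideas above: a case class for the form, a list of rules, and Either for the result. SignUpForm, Rule and SignUpValidation are hypothetical names, and the name and user-name checks are placeholders, not the asker's actual validation functions:

```scala
// Hypothetical form model; field names mirror the snippet's variables.
final case class SignUpForm(firstName: String, lastName: String, email: String, userName: String)

// One rule: a check on the form plus the message to show when it fails.
final case class Rule(check: SignUpForm => Boolean, message: String)

object SignUpValidation {
  private val EmailPattern = """(\w+)@([\w\.]+)""".r

  // Placeholder checks (assumptions); swap in the real validation functions.
  def validateName(s: String): Boolean = s.nonEmpty && s.forall(_.isLetter)
  def validateEmail(s: String): Boolean = EmailPattern.unapplySeq(s).isDefined
  def validateUserName(s: String): Boolean = s.length >= 3 && s.forall(_.isLetterOrDigit)

  // Ordered like the original nested ifs, so the first failing rule wins.
  val rules: List[Rule] = List(
    Rule(f => validateName(f.firstName), "Invalid First Name"),
    Rule(f => validateName(f.lastName), "Invalid Last name"),
    Rule(f => validateEmail(f.email), "Invalid Mail Id"),
    Rule(f => validateUserName(f.userName), "Invalid User Name")
  )

  // Left(first failing message), or Right(form) when every rule passes.
  def validate(form: SignUpForm): Either[String, SignUpForm] =
    rules.collectFirst { case r if !r.check(form) => r.message }.toLeft(form)

  // Every failing message, if the form should report all problems at once.
  def allErrors(form: SignUpForm): List[String] =
    rules.filterNot(_.check(form)).map(_.message)
}
```

In the snippet, the result can then be handled in one place: a Left(msg) goes to S.notice(msg), and a Right(form) is saved to the DB before S.redirectTo("/").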

Writing own key publisher for MainFrame

I want my MainFrame to catch key events. I didn't find a key publisher in it already, so I'm writing my own. I have something like this:
import scala.swing._
import scala.swing.event._
import java.awt.event.KeyListener
import javax.swing.ImageIcon

class ImageView(image: ImageIcon, parent: UIElement = null) extends MainFrame {
  object keys extends Publisher {
    peer.addKeyListener(new KeyListener {
      def keyPressed(e: java.awt.event.KeyEvent): Unit = {
        publish(new KeyPressed(e))
      }
      def keyReleased(e: java.awt.event.KeyEvent): Unit = {
        publish(new KeyReleased(e))
      }
      def keyTyped(e: java.awt.event.KeyEvent): Unit = {
        publish(new KeyTyped(e))
      }
    })
  }
  listenTo(keys)
  reactions += {
    case KeyPressed(_, key, _, _) =>
      if (key == Key.Escape) dispose()
  }
}
However, when I press any key, I get this exception:
Exception in thread "AWT-EventQueue-0" java.lang.ClassCastException: scala.swing.Frame$$anon$1 cannot be cast to javax.swing.JComponent
at scala.swing.event.KeyPressed.<init>(KeyEvent.scala:33)
at pip.gui.ImageView$keys$$anon$2.keyPressed(ImageView.scala:35)
at java.awt.Component.processKeyEvent(Component.java:6225)
at java.awt.Component.processEvent(Component.java:6044)
at java.awt.Container.processEvent(Container.java:2041)
at java.awt.Window.processEvent(Window.java:1836)
at java.awt.Component.dispatchEventImpl(Component.java:4630)
at java.awt.Container.dispatchEventImpl(Container.java:2099)
at java.awt.Window.dispatchEventImpl(Window.java:2478)
at java.awt.Component.dispatchEvent(Component.java:4460)
at java.awt.KeyboardFocusManager.redispatchEvent(KeyboardFocusManager.java:1850)
at java.awt.DefaultKeyboardFocusManager.dispatchKeyEvent(DefaultKeyboardFocusManager.java:712)
at java.awt.DefaultKeyboardFocusManager.preDispatchKeyEvent(DefaultKeyboardFocusManager.java:990)
at java.awt.DefaultKeyboardFocusManager.typeAheadAssertions(DefaultKeyboardFocusManager.java:855)
... (the trace continues much further)
I copied this publisher code from Component.keys, so what is actually wrong here?
Thanks in advance,
Tony
This looks like bad design in the library. KeyEvent.scala does a lot of casting to JComponent, and JFrame is a subclass of java.awt.Component but not of JComponent, so it should not even be possible to call listenTo(keys) on a frame.
What you want is to listen to the top-most component in the frame's contents. For instance:
import scala.swing._
import scala.swing.event._
import javax.swing.ImageIcon

class ImageView(image: ImageIcon, parent: UIElement = null) extends MainFrame {
  val b = new BorderPanel {
    listenTo(keys)
    reactions += {
      case KeyPressed(_, key, _, _) =>
        println("PRESSED : " + key)
        if (key == Key.Escape) dispose()
    }
  }
  contents = b
}

val w = new ImageView(null)
w.peer.setSize(200, 200)
w.visible = true
w.b.requestFocus()
The requestFocus call is essential: the panel doesn't request focus by itself, even if you click on it, so otherwise it wouldn't receive key events.
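The class-hierarchy point behind the ClassCastException (JFrame is a java.awt.Component but not a JComponent) can be checked with reflection, without opening a window. FrameHierarchy is a hypothetical name:

```scala
// Checks the hierarchy behind the ClassCastException; no window is created.
object FrameHierarchy {
  // JFrame descends from java.awt.Window, which is a java.awt.Component.
  val frameIsAwtComponent: Boolean =
    classOf[java.awt.Component].isAssignableFrom(classOf[javax.swing.JFrame])

  // JFrame is not in the javax.swing.JComponent branch, so casting its peer to JComponent fails.
  val frameIsJComponent: Boolean =
    classOf[javax.swing.JComponent].isAssignableFrom(classOf[javax.swing.JFrame])
}
```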