LinkedHashMap variable is not accessable out side the foreach loop - scala

Here is my code.
var link = scala.collection.mutable.LinkedHashMap[String, String]()
var fieldTypeMapRDD = fixedRDD.mapPartitionsWithIndex((idx, itr) => itr.map(s => (s(8), s(9))))
fieldTypeMapRDD.foreach { i =>
println(i)
link.put(i._1, i._2)
}
println(link.size)// here size is zero
I want to access link out side loop .Please help.

Why your code is not supposed to work:
Before your foreach task is started, whole your function's closure inside foreach block is serialized and sent first to master, then to each of workers. This means each of them will have its own instance of mutable.LinkedHashMap as copy of link.
During foreach block each worker will put each of its items inside its own link copy
After your task is done you have still empty local link and several non-empty former copies on each of worker nodes.
Moral is clear: don't use local mutable collections with RDD. It's just not going to work.
One way to get whole collection to local machine is collect method.
You can use it as:
val link = fieldTypeMapRDD.collect.toMap
or in case of need to preserve the order:
import scala.collection.immutable.ListMap
val link = ListMap(fieldTypeMapRDD.collect:_*)
But if you are really into mutable collections, you can modify your code a bit. Just change
fieldTypeMapRDD.foreach {
to
fieldTypeMapRDD.toLocalIterator.foreach {
See also this question.

Related

Scala.js - Convert Uint8Array to Array[Byte]

How do I implement the following method in Scala.js?
import scala.scalajs.js
def toScalaArray(input: js.typedarray.Uint8Array): Array[Byte] =
// code in question
edited per request: tl;dr
input.view.map(_.toByte).toArray
Original answer
I'm not intimately familiar with Scala-js, but I can elaborate on some of the questions that came up in the comments, and improve upon your self-answer.
Also I don't quite get why I need toByte calls
class Uint8Array extends Object with TypedArray[Short, Uint8Array]
Scala treats a Uint8Array as a collection of Short, whereas you are expecting it to be a collection of Byte
Uint8Array's toArray method notes:
This member is added by an implicit conversion from Uint8Array to
IterableOps[Short] performed by method iterableOps in scala.scalajs.js.LowestPrioAnyImplicits.
So the method is returning an Array[Short] which you then .map to convert the Shorts to Bytes.
In your answer you posted
input.toArray.map(_.toByte)
which is technically correct, but it has the downside of allocating an intermediate array of the Shorts. To avoid this allocation, you can perform the .map operation on a .view of the array, then call .toArray on the view.
Views in Scala (and by extension Scala.js) are lightweight objects that reference an original collection plus some kind of transformation/filtering function, which can be iterated like any other collection. You can compose many transformation/filters on a view without having to allocate intermediate collections to represent the results. See the docs page (linked) for more.
input.view.map(_.toByte).toArray
Depending on how you intend to pass the resulting value around, you may not even need to call .toArray. For example if all you need to do is iterate the elements later on, you could just pass the view around as an Iterable[Byte] without ever having to allocate a separate array.
All the current answers require iterating over the array in user space.
Scala.js has optimizer supported conversions for typed arrays (in fact, Array[Byte] are typed arrays in modern configs). You'll likely get better performance by doing this:
import scala.scalajs.js.typedarray._
def toScalaArray(input: Uint8Array): Array[Byte] = {
// Create a view as Int8 on the same underlying data.
new Int8Array(input.buffer, input.byteOffset, input.length).toArray
}
The additional new Int8Array is necessary to re-interpret the underlying buffer as signed (the Byte type is signed). Only then, Scala.js will provide the built in conversion to Array[Byte].
When looking at the generated code, you'll see no user space loop is necessary: The built-in slice method is used to copy the TypedArray. This will almost certainly not be beatable in terms of performance by any user-space loop.
$c_Lhelloworld_HelloWorld$.prototype.toScalaArray__sjs_js_typedarray_Uint8Array__AB = (function(input) {
var array = new Int8Array(input.buffer, $uI(input.byteOffset), $uI(input.length));
return new $ac_B(array.slice())
});
If we compare this with the currently accepted answer (input.view.map(_.toByte).toArray) we see quite a difference (comments mine):
$c_Lhelloworld_HelloWorld$.prototype.toScalaArray__sjs_js_typedarray_Uint8Array__AB = (function(input) {
var this$2 = new $c_sjs_js_IterableOps(input);
var this$5 = new $c_sc_IterableLike$$anon$1(this$2);
// We need a function
var f = new $c_sjsr_AnonFunction1(((x$1$2) => {
var x$1 = $uS(x$1$2);
return ((x$1 << 24) >> 24)
}));
new $c_sc_IterableView$$anon$1();
// Here's the view: So indeed no intermediate allocations.
var this$8 = new $c_sc_IterableViewLike$$anon$6(this$5, f);
var len = $f_sc_TraversableOnce__size__I(this$8);
var result = new $ac_B(len);
// This function actually will traverse.
$f_sc_TraversableOnce__copyToArray__O__I__V(this$8, result, 0);
return result
});
import scala.scalajs.js
def toScalaArray(input: js.typedarray.Uint8Array): Array[Byte] =
input.toArray.map(_.toByte)

Binding.scala: Strategy to avoid too many dom-tree updates

In my project scala-adapters I display log entries that are sent over a websocket.
As I have no control on how many entries are sent, I am looking for a strategy to avoid that the screen freezes.
I created a ScalaFiddle to simulate that: https://scalafiddle.io/sf/kzr28tq
This function with these parameters works perfectly:
setInterval(1000) { // note the absence of () =>
entries.value += (0 to 100).map(_.toString).mkString("")
}
If the interval gets smaller and the String longer - the screen freezes, e.g. with:
setInterval(100) { // note the absence of () =>
entries.value += (0 to 10000).map(_.toString).mkString("")
}
Is there a solution to solve that on the client side - or do I have to solve that on the server side?
You can try:
#dom
def render = {
<div>
{
for (entry <- entries) yield {
entryDiv(entry).bind
}
}
</div>
}
The problem is that you bind entries too early. Binding.scala does its magic by CPS transform, every .bind triggers re-evaluation of all code after, so you should bind a variable as late as possible.
And for Vars, use for comprehension instead of bind directly, to avoid updating the whole list. When you use += to modify the content of Vars, Binding.scala "patches" the list internally, but if you do .bind on the Vars instance directly to get the whole list, the framework cannot do any optimization for you.
Here is the updated ScalaFiddle: https://scalafiddle.io/sf/kzr28tq/3

Accessing Constants which rely on a list buffer to be populated Scala

Encountering a problem whereby I am specifying Private Constants at the start of a scala step definiton file which relies on a List Buffer element to be populated, however when compiling I get a 'IndexOutOfBoundsException' because the list is empty initially and only gets populated later in a for loop.
For Example I have the following 2 constants:
private val ConstantVal1= globalExampleList(2)
private val ConstantVal2= globalExampleList(3)
globalExampleList is populated further down in the file using a for loop:
for (i <- 1 to numberOfW) {
globalExampleList += x.xy }
This List Buffer adds as many values as required to a global mutable ListBuffer.
Is there a better way to declare these constants? I've tried to declare them after the for loop but then other methods are not able to access these. I have around 4 different methods within the same file which use these values and instead of accessing it via index each time i thought it would be better to declare them as a constant to keep it neat and efficient for whenever they require changing.
Thanks
You can create list buffer of necessary size with default value and populate it later:
val globalExampleList: ListBuffer[Int] = ListBuffer.fill(numberOfW)(0)
for (i <- 0 until numberOfW) {
globalExampleList(i) = x.xy
}
But ConstantVal1, ConstantVal2 will still have original default value. So you can make them vars and re-assign them after you populate the buffer.
Your code seems to have a lot of mutations and side effects.
You have 2 ways to go.
First you can use lazy modifier
private lazy val ConstantVal1= globalExampleList(2)
private lazy val ConstantVal2= globalExampleList(3)
Or you can write the two lines after the for loop.
val globalExampleList = XXXX
for (i <- 1 to numberOfW) { globalExampleList += x.xy }
private val ConstantVal1= globalExampleList(2)
private val ConstantVal2= globalExampleList(3)

How to add nested message into already created message ? (in Scala)

After adding nested message i recieve nested messages from main message and got nothing.
You can see it in logs: 1 and 2. Size of List 0 !
Any ideas?
message PacketPlayers
{
repeated PacketPlayer players = 1;
}
ScalaPB case classes are immutable. In your example, addPlayers would not modify the instance it's called on, but return a new instance of PacketPlayer that has the additional players.
It is possible to avoid mutable arrays and vars in constructing the new object. For example:
val players = onlinePlayers.keySet.map(makePacketPlayer)
val packetPlayers = PacketPlayers().withPlayers(players)

Querying a continously running operation for its current state/value in Scala

I have a procedure that continuously updates a value. I want to be able to periodically query the operation for the current value. In my particular example, every update can be considered an improvement and the procedure will eventually converge on a final, best answer, but I want/need access to the intermediate results. The speed with which the loop executes and the time it takes to converge matters.
As an example, consider this loop:
var current = 0
while(current < 100){
current = current + 1
}
I want to be able to get value of current on any loop iteration.
A solution with an Actor would be:
class UpdatingActor extends Actor{
var current : Int = 0
def receive = {
case Update => {
current = current + 1
if (current < 100) self ! Update
}
case Query => sender ! current
}
}
You could get rid of the var using become or FSM, but this example is more clear IMO.
Alternatively, one actor could run the operation and send updated results on every loop iteration to another actor, whose sole responsibility is updating the value and responding to queries about it. I don't know much about "agents" in Akka, but this seems like a potential use case for one.
What are better/alternative ways of doing this using Scala? I don't need to use actors; that was just one solution that came to mind.
Your actor-based solution is ok.
Sending the intermediate result after each change to a "result provider" actor would be a good idea as well if the calculation blocks the actor for a long time and you want to make sure that you can always get the intermediate result. Another alternative would be to make the actual calculator actor a child of the actor that collects the best result. That way the thing acts as a single actor from the outside, and you have the actor that has state (the current best result) separated from the actor that does the computation, which might fail.
An agent would be a solution somewhat between the very low level #volatile/AtomicInteger approach and an Actor. An agent is something that can only be modified by running a transform on it (and there is a queue for transforms), but which has a current state that can always be accessed. It is not location transparent though. so stay with the actor approach if you need that.
Here is how you would solve this with an agent. You have one thread which does a long-running calculation (simulated by Thread.sleep) and another thread that just prints out the best current result in regular intervals (also simulated by Thread.sleep).
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent._
import akka.agent.Agent
object Main extends App {
val agent = Agent(0)
def computation() : Unit = {
for(i<-0 until 100) {
agent.send { current =>
Thread.sleep(1000) // to simulate a long-running computation
current + 1
}
}
}
def watch() : Unit = {
while(true) {
println("Current value is " + agent.get)
Thread.sleep(1000)
}
}
global.execute(new Runnable {
def run() = computation
})
watch()
}
But all in all I think an actor-based solution would be superior. For example you could do the calculation on a different machine than the result tracking.
The scope of the question is a little wide, but I'll try :)
First, your example is perfectly fine, I don't see the point of getting rid of the var. This is what actors are for: protect mutable state.
Second, based on what you describe you don't need an actor at all.
class UpdatingActor {
private var current = 0
def startCrazyJob() {
while(current < 100){
current = current + 1
}
}
def soWhatsGoingOn: Int = current
}
You just need one thread to call startCrazyJob and a second one that will periodically call soWhatsGoingOn.
IMHO, the actor approach is better, but it's up to you to decide if it's worth importing the akka library just for this use case.