I have implemented a routine that extracts the item at the head, updates its priority and puts it back into the queue, using a non-blocking approach (with AtomicReference), like this:
def head(): Entry = {
def takeAndUpdate(e: Entry, success: Boolean): Entry = {
if (success) {
return e
}
val oldQueue = queueReference.get()
val newQueue = oldQueue.clone()
val item = newQueue.dequeue().increase()
newQueue += item
takeAndUpdate(item.e, queueReference.compareAndSet(oldQueue, newQueue))
}
takeAndUpdate(null, false)
}
Now I need to find an arbitrary Entry in the queue, change its priority and put it back into the queue. It seems that PriorityQueue doesn't support this, so which class should I use to get the desired behavior?
It's related to the question "Change priority of items in a priority queue".
Use an immutable tree map (immutable.TreeMap) to accomplish this. You have to find the entry you want somehow - best way to do this is to associate the part of the information (Key) you use to find the entry with a key, and the actual entry that you wish to return with the value in the map (call this Entry).
Rename queueReference to treeReference. In the part of the code where you create the collection, use immutable.TreeMap[Key, Entry](<list of elements>).
Then modify the code like this:
def updateKey(k: Key) {
@annotation.tailrec def update(): (Key, Entry) = {
val oldTree = treeReference.get()
val entry = oldTree(k)
val newPair = modifyHoweverYouWish(k, entry)
val newTree = (oldTree - k) + newPair
if (treeReference.compareAndSet(oldTree, newTree)) newPair
else update()
}
update()
}
If the compareAndSet fails, you have to retry, as you already do in your source. It is best to use @tailrec as shown above, to ensure that the function is tail recursive and to prevent a potential stack overflow.
Alternatively, you could use an immutable.TreeSet - if you know the exact reference to your Entry object, you can use it to remove the element via - as above and then add it back after calling increase(). The code is almost the same.
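The TreeSet variant could look roughly like the sketch below. The Entry case class, its id and priority fields, the ordering and the names TreeSetQueue, setReference, add and bump are all assumptions made for illustration; only increase() and the compare-and-set retry loop come from the posts above.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec
import scala.collection.immutable.TreeSet

// Hypothetical entry type: `id` identifies the entry, `priority` orders it.
final case class Entry(id: String, priority: Int) {
  def increase(): Entry = copy(priority = priority + 1)
}

object TreeSetQueue {
  // Order by priority, then by id, so that two entries with the same
  // priority are not treated as duplicates by the set.
  implicit val byPriority: Ordering[Entry] =
    Ordering.by((e: Entry) => (e.priority, e.id))

  val setReference = new AtomicReference(TreeSet.empty[Entry])

  // Adds an entry, retrying if another thread changed the set meanwhile.
  @tailrec def add(e: Entry): Unit = {
    val oldSet = setReference.get()
    if (!setReference.compareAndSet(oldSet, oldSet + e)) add(e)
  }

  // Removes `e` and re-inserts it with an increased priority.
  // If `e` is not in the set, the increased entry is simply added.
  @tailrec def bump(e: Entry): Entry = {
    val oldSet = setReference.get()
    val updated = e.increase()
    val newSet = (oldSet - e) + updated
    if (setReference.compareAndSet(oldSet, newSet)) updated
    else bump(e)
  }
}
```

Because Entry is an immutable case class here, removal works by value equality, so the caller needs the entry's current priority (or the exact instance) to remove it.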
I have a pipeline with a set of PTransforms and my method is getting very long.
I'd like to write my DoFns and my composite transforms in a separate package and use them back in my main method. With python it's pretty straightforward, how can I achieve that with Scio? I don't see any example of doing that. :(
withFixedWindows(
FIXED_WINDOW_DURATION,
options = WindowOptions(
trigger = groupedWithinTrigger,
timestampCombiner = TimestampCombiner.END_OF_WINDOW,
accumulationMode = AccumulationMode.ACCUMULATING_FIRED_PANES,
allowedLateness = Duration.ZERO
)
)
.sumByKey
// How to write this in an another file and use it here?
.transform("Format Output") {
_
.withWindow[IntervalWindow]
.withTimestamp
}
If I understand your question correctly, you want to bundle your map, groupBy, ... transformations in a separate package, and use them in your main pipeline.
One way would be to use applyTransform, but then you would end up using PTransforms, which are not scala-friendly.
You can simply write a function that receives an SCollection and returns the transformed one, like:
def myTransform(input: SCollection[InputType]): SCollection[OutputType] = ???
But if you intend to write your own Source/Sink, take a look at the ScioIO class
You can use the map function to transform your elements.
Instead of passing a lambda, you can pass a method reference from another class.
Example: .map(MyClass.MyFunction)
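As a rough illustration of passing method references, here is a sketch that uses a plain List as a stand-in for an SCollection, so that it runs without Scio on the classpath. The objects Transforms and Pipeline and their methods are hypothetical names; the assumption is that SCollection's map and flatMap accept functions in the same way.

```scala
// Hypothetical helper object; in a real project it could live in its own
// file and package and be imported where the pipeline is built.
object Transforms {
  def normalize(s: String): String = s.trim.toLowerCase

  def tokenize(s: String): List[String] =
    s.split("\\s+").toList.filter(_.nonEmpty)
}

object Pipeline {
  // The methods are passed by name instead of writing the lambdas inline.
  def wordCounts(lines: List[String]): Map[String, Int] =
    lines
      .map(Transforms.normalize)
      .flatMap(Transforms.tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
}
```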
I think one way to solve this could be to define an object in another package and then create a method in that object that would have the logic required for your transformation. For example:
def main(cmdlineArgs: Array[String]): Unit = {
val (sc, args) = ContextAndArgs(cmdlineArgs)
val defaultTopic = "tweets"
val input = args.getOrElse("inputTopic", defaultTopic)
val output = args("outputTopic")
val inputStream: SCollection[Tweet] = sc.withName("read from pub sub").pubsubTopic(input)
.withName("map to tweet class").map(x => {parse(x).extract[Tweet]})
inputStream
.flatMap(sentiment.predict) // object sentiment with method predict
}
object sentiment {
  def predict(tweet: Tweet): Option[List[TweetSentiment]] = {
    val data = tweet.text
    val emptyCase = Some("")
    Some(data) match {
      case `emptyCase` => None
      case Some(v) => Some(entitySentimentFile(v)) // entitySentimentFile is another method of mine, not shown here
    }
  }
}
Please also see this link for an example given in the Scio examples.
I have code like below
val g = new Graph(vertices)
//First part
(1 to vertices).par.foreach( i => g + new Vertex(i))
//Second part
for (i <- 1 to edges) {
val data = scala.io.StdIn.readLine()
val d = data.split(" ")
val v1 = d(0).toInt
val v2 = d(1).toInt
val length = d(2).toInt
g+(v1, v2, length)
}
I want to execute the first and second parts of the code sequentially.
At present the for loop runs before all the vertices have been added to g.
In the code, + (plus) is defined to add a new Vertex instance to a MutableList.
I am new to Scala, please help.
You could wrap each call in the following
new Thread(new Runnable {
override def run(): Unit = {
//Code part here:
}
}).start()
You also need to ensure that your Graph implementation is thread safe, as you will have two threads modifying it concurrently.
It doesn't look like you're returning anything from either part but if you were you could use a Future instead. See here for details.
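A minimal sketch of the Future-based variant, assuming the first part can return its vertices instead of mutating the graph. Vertex is a hypothetical stand-in for the class in the question, and SequentialParts, buildVertices and the 10-second timeout are arbitrary choices.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialParts {
  // Hypothetical stand-in for the Vertex class from the question.
  final case class Vertex(id: Int)

  def buildVertices(count: Int): Vector[Vertex] = {
    // First part: create each vertex in its own Future. Nothing is shared
    // between the tasks, so no synchronization is needed.
    val futures = (1 to count).map(i => Future(Vertex(i)))
    // Block until every vertex exists; the result keeps the input order.
    Await.result(Future.sequence(futures), 10.seconds).toVector
  }
}
```

The second part (reading the edges) would then run after buildVertices has returned, on the calling thread, so the two parts cannot overlap.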
A parallel collection only parallelises the computation over that collection. The second part, where you add the edges, should happen after the vertices have been added.
My guess is that the parsed input may refer to vertices that do not exist at all, perhaps because of stray whitespace or something similar.
I am not sure what happens when a vertex is added to the graph, but a foreach on a parallel collection needs care: the operation should be free of side effects. Please see this for more information. Maybe this is not even relevant.
I found the solution. I read more about adding elements to a collection in parallel, and it isn't thread-safe. I replaced the MutableList with a fixed-size array, and I add each new element by index.
Some code below:
class Graph(val end: Int) {
private val vertices : Array[Vertex] = new Array[Vertex](end)
def +(index: Int, v: Vertex): Unit = {
vertices(index) = v
}
(...)
}
//First part
(1 to vertices).par.foreach( i => g + (i-1,new Vertex(i))) //add new vertex to array by index
I am new to Scala and I have a function as follows:
def selectSame(messages: BufferedIterator[Int]) = {
val head = messages.head
messages.takeWhile(_ == head)
}
Which is selecting from a buffered iterator only the elems matching the head. I am subsequently using this code:
val messageStream = List(1,1,1,2,2,3,3).iterator.buffered
while (!messageStream.isEmpty) {
  val messages = selectSame(messageStream).toList
  println(messages)
}
Upon first execution I am getting (1,1,1) as expected, but then I only get List(2), as if I lost one element down the line... Probably I am doing something wrong with the iterators/lists, but I am a bit lost here.
Scaladoc of Iterator says about takeWhile:
Reuse: After calling this method, one should discard the iterator it
was called on, and use only the iterator that was returned. Using the
old iterator is undefined, subject to change, and may result in
changes to the new iterator as well.
So that's why. This basically means you cannot directly do what you want with Iterators and takeWhile. IMHO, easiest would be to quickly write your own recursive function to do that.
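One way such a recursive function might look, as a sketch that works on a List rather than on an Iterator, so no iterator is reused after takeWhile. The names Runs and groupRuns are made up for this example.

```scala
import scala.annotation.tailrec

object Runs {
  // Splits a list into runs of consecutive equal elements.
  def groupRuns[A](xs: List[A]): List[List[A]] = {
    @tailrec
    def loop(rest: List[A], acc: List[List[A]]): List[List[A]] = rest match {
      case Nil => acc.reverse
      case head :: _ =>
        // A List is immutable, so takeWhile does not consume anything.
        val run = rest.takeWhile(_ == head)
        loop(rest.drop(run.length), run :: acc)
    }
    loop(xs, Nil)
  }
}
```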
If you want to stick with Iterators, you could use the duplicate method on the Iterator to generate a copy, on which you'd call dropWhile.
Even better: Use span repeatedly:
def selectSame(messages: BufferedIterator[Int]) = {
  val head = messages.head
  // span returns (matching prefix, remainder); discard `messages` afterwards
  messages.span(_ == head)
}
def iter(msgStream: BufferedIterator[Int]): Unit = if (!msgStream.isEmpty) {
  val (msgs, rest) = selectSame(msgStream)
  println(msgs.toList)
  // span returns plain Iterators, so re-buffer the remainder
  iter(rest.buffered)
}
val messageStream = List(1,1,1,2,2,3,3).iterator.buffered
iter(messageStream)
I am a newbie to scala and I am writing scala code to implement pastry protocol. The protocol itself does not matter. There are nodes and each node has a routing table which I want to populate.
Here is the part of the code:
def act () {
def getMatchingNode (initialMatch :String) : Int = {
val len = initialMatch.length
for (i <- 0 to noOfNodes-1) {
var flag : Int = 1
for (j <- 0 to len-1) {
if (list(i).key.charAt(j) == initialMatch(j)) {
continue
}
else {
flag = 0
}
}
if (flag == 1) {
return i
}
}
return -1
}
// iterate over rows
for (ii <- 0 to rows - 1) {
for (jj <- 0 to 15) {
var initialMatch = ""
for (k <- 0 to ii-1) {
initialMatch = initialMatch + key.charAt(k)
}
initialMatch += jj
println("initialMatch",initialMatch)
if (getMatchingNode(initialMatch) != -1) {
Routing(0)(jj) = list(getMatchingNode(initialMatch)).key
}
else {
Routing(0)(jj) = "NULL"
}
}
}
}// act
The problem is when the function call to getMatchingNode takes place then the actor dies suddenly by itself. 'list' is the list of all nodes. (list of node objects)
Also this behaviour is not consistent. The call to getMatchingNode should take place 15 times for each actor (for 10 nodes).
But while debugging the actor kills itself in the getMatchingNode function call after one call or sometimes after 3-4 calls.
The scala library code which gets executed is this :
def run() {
try {
beginExecution()
try {
if (fun eq null)
handler(msg)
else
fun()
} catch {
case _: KillActorControl =>
// do nothing
case e: Exception if reactor.exceptionHandler.isDefinedAt(e) =>
reactor.exceptionHandler(e)
}
reactor.kill()
}
Eclipse shows that this code has been called from the for loop in the getMatchingNode function
def getMatchingNode (initialMatch :String) : Int = {
val len = initialMatch.length
for (i <- 0 to noOfNodes-1)
The strange thing is that sometimes the loop behaves normally and sometimes it goes to the scala code which kills the actor.
Any idea what is wrong with the code?
Any help would be appreciated.
Got the error..
The 'continue' clause in the for loop caused the trouble.
I thought we could use continue in Scala as we do in C++/Java but it does not seem so.
Removing the continue solved the issue.
From the book: "Programming in Scala 2ed" by M.Odersky
You may have noticed that there has been no mention of break or continue.
Scala leaves out these commands because they do not mesh well with function
literals, a feature described in the next chapter. It is clear what continue
means inside a while loop, but what would it mean inside a function literal?
While Scala supports both imperative and functional styles of programming,
in this case it leans slightly towards functional programming in exchange
for simplifying the language. Do not worry, though. There are many ways to
program without break and continue, and if you take advantage of function
literals, those alternatives can often be shorter than the original code.
I really suggest reading the book if you want to learn Scala.
Your code is based on tons of nested for loops, which can more often than not be rewritten using the higher-order functions available on the most appropriate collection.
You can rewrite your function like the following [I'm trying to make it approachable for newcomers]:
//works if "list" contains "nodes" with an attribute "node.key: String"
def getMatchingNode (initialMatch :String) : Int = {
//a new list with the corresponding keys
val nodeKeys = list.map(node => node.key)
//zips each key (creates a pair) with the corresponding index in the list and then find a possible match
val matchOption: Option[(String, Int)] = (nodeKeys.zipWithIndex) find {case (key, index) => key == initialMatch}
//we convert an eventual result contained in the Option, with the right projection of the pair (which contains the index)
val idxOption = matchOption map {case (key, index) => index} //now we have an Option[Int] with a possible index
//returns the content of option if it's full (Some) or a default value of "-1" if there was no match (None). See Option[T] for more details
idxOption.getOrElse(-1)
}
The potential to easily transform or operate on the Collection's elements is what makes continues, and for loops in general, less used in Scala
You can convert the row iteration in a similar way, but I would suggest that if you need to work a lot with the collection's indexes, you want to use an IndexedSeq or one of its implementations, like ArrayBuffer.
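A more compact sketch of the same lookup on an IndexedSeq, using indexWhere. It assumes the intent of the original loop was a prefix match (the characters of initialMatch compared against the start of each key), hence startsWith; Node and RoutingLookup are hypothetical names for illustration.

```scala
object RoutingLookup {
  // Hypothetical stand-in for the node class from the question.
  final case class Node(key: String)

  // Index of the first node whose key starts with `prefix`, or -1 if none.
  // indexWhere already returns -1 when nothing matches.
  def getMatchingNode(nodes: IndexedSeq[Node], prefix: String): Int =
    nodes.indexWhere(_.key.startsWith(prefix))
}
```

Unlike charAt in a manual loop, startsWith does not throw when a key is shorter than the prefix; it simply returns false.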
I was wondering if there is any 'easy' way to update immutable Scala collections safely. Consider the following code:
class a {
private var x = Map[Int,Int]()
def update(p:(Int,Int)) { x = x + (p) }
}
This code is not thread safe, correct? By that I mean: suppose two threads invoke the update method and x is a map containing { 1=>2 }. Thread A invokes update((3,4)) and only manages to execute the x + (p) part of the code. Then rescheduling occurs and thread B invokes update((13,37)) and successfully updates the variable x. Thread A then continues and finishes.
After all this, x would equal a map containing { 1=>2, 3=>4 }, correct? Instead of the desired { 1=>2, 3=>4, 13=>37 }. Is there a simple way to fix that? I hope it's understandable what I'm asking :)
Btw, I know there are solutions like Akka STM, but I would prefer not to use those unless necessary.
Thanks a lot for any answer!
edit: Also, I would prefer a solution without locking. Eeeew :)
In your case, as Maurício wrote, your collection is already thread safe because it is immutable. The only problem is reassigning the var, which may not be an atomic operation. For this particular problem, the easiest option is to use one of the nice classes in java.util.concurrent.atomic, namely AtomicReference.
import java.util.concurrent.atomic.AtomicReference
class a {
private val x = new AtomicReference(Map[Int,Int]())
def update(p:(Int,Int)) {
while (true) {
val oldMap = x.get // get old value
val newMap = oldMap + p // update
if (x.compareAndSet(oldMap, newMap))
return // exit if update was successful, else repeat
}
}
}
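On Java 8 or later, the same retry loop is available as AtomicReference.updateAndGet. A sketch, assuming Scala 2.12+ so that the lambda converts to a java.util.function.UnaryOperator; the class name SafeMap and the snapshot method are made up for this example.

```scala
import java.util.concurrent.atomic.AtomicReference

class SafeMap {
  private val x = new AtomicReference(Map.empty[Int, Int])

  // updateAndGet re-applies the function until its compare-and-set
  // succeeds, so the function should be free of side effects.
  def update(p: (Int, Int)): Unit = {
    x.updateAndGet(m => m + p)
    ()
  }

  // Returns the current immutable map.
  def snapshot: Map[Int, Int] = x.get
}
```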
The collection itself is thread safe as it has no shared mutable state, but your code is not and there is no way to fix this without locking, as you do have shared mutable state. Your best option is to lock the method itself marking it as synchronized.
The other solution would be use a mutable concurrent map, possibly java.util.concurrent.ConcurrentMap.
out of the box
atomicity
lock-free
O(1)
Check this out: http://www.scala-lang.org/api/2.11.4/index.html#scala.collection.concurrent.TrieMap
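A small sketch of how the class from the question might look with a TrieMap instead of a var holding an immutable Map. The class name Registry and the methods putIfAbsent and snapshot are additions for illustration.

```scala
import scala.collection.concurrent.TrieMap

class Registry {
  private val x = TrieMap.empty[Int, Int]

  // Each call is atomic on its own; no external lock is taken.
  def update(p: (Int, Int)): Unit = x.update(p._1, p._2)

  // Inserts only if the key is absent; returns the previous value, if any.
  def putIfAbsent(k: Int, v: Int): Option[Int] = x.putIfAbsent(k, v)

  // Copies a consistent read-only view into an immutable Map.
  def snapshot: Map[Int, Int] = x.readOnlySnapshot().toMap
}
```

Note that atomicity is per operation: a read followed by a separate write is still a race, which is what putIfAbsent and similar combined operations are for.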
Re. Jean-Philippe Pellet's answer: you can make this a little bit more re-usable:
def compareAndSetSync[T](ref: AtomicReference[T])(logic: (T => T)) {
while(true) {
val snapshot = ref.get
val update = logic(snapshot)
if (ref.compareAndSet(snapshot, update)) return
}
}
def compareSync[T,V](ref: AtomicReference[T])(logic: (T => V)): V = {
var continue = true
var snapshot = ref.get
var result = logic(snapshot)
while (snapshot != ref.get) {
snapshot = ref.get
result = logic(snapshot)
}
result
}