If I have a stack using the mutable scala stack collection, is there a way I can copy the stack so that I can analyze its elements by popping it without altering the original stack? For instance, suppose I have a stack and code as follows:
import scala.collection.mutable.Stack
var stack1 = new Stack[Int]
/** Code that pushes integers on stack1*/
var stackCopy = stack1
while (!stackCopy.isEmpty) {
println(stackCopy.pop)
}
I want to use a while loop to print all elements in stack1. But when I make a copy and pop off that copy, the original stack (i.e. stack1) is also altered. I want to preserve the original stack, so how can I just get the contents, not the address?
You could use for comprehension:
for( i <- stack1 ) { println(i) }
or simply call the foreach method:
stack1 foreach println
If you insist using a while loop, you can convert it to a list first:
var is = stack1.toList
while( is.nonEmpty ) {
println( is.head )
is = is.tail
}
With all these methods, the original stack will be preserved.
Related
I've found a ton of questions on "retained size" and the accepted answer seems to be:
The retained size for an object is the quantity of memory this objects preserves from garbage collection.
Now, I've been working on programmatic computation of the retained size in a hprof file (as defined here), using the Netbeans profiler library (the retained size calculation is done in HprofHeap.java). Works just fine (sorry, used kotlin for brevity):
val heap: Heap = HeapFactory.createHeap(myHeap.toFile())
val threadClass: JavaClass = heap.getJavaClassByName("java.lang.Thread")
val instanceFilter = { it: Instance -> threadClass == it.getJavaClass() }
val sizeMap = heap.allInstances
.filter { instanceFilter(it) }
.toMap({ findThreadName(it) /* not shown */ }, { it.retainedSize })
What I noticed when the sizeMap had only marginal numbers of retained sizes is that Netbeans computes retained sizes only for objects that are not on the stack. So local variables (allocated on the stack) assigned to the Thread would not be included in the retained size.
My question is: is there a way to make the netbeans library consider the stack elements as dependent objects the way for example the Yourkit Profiler does it's calculation? How would I go about adding such a feature if the answer to the previous question is "no"?
A bit of digging found that the JVM heap dumper creates an entry of type ROOT JAVA FRAME for a stack local variable (compare VM_HeapDumper::do_thread). Since I can grep for that in the heap, here's what I did:
val threadClass: JavaClass = heap.getJavaClassByName("java.lang.Thread")
val keyTransformer = { it: Instance -> findThreadName(it) }
val instanceFilter = { it: Instance -> it.getJavaClass() == threadClass }
val stackLocals = heap.gcRoots
.filter { it.kind == GCRoot.JAVA_FRAME }
.groupBy { (it as JavaFrameGCRoot).threadGCRoot }
val sizeMap = heap.allInstances
.filter { instanceFilter(it) }
.toMap(
{ keyTransformer(it) },
{
val locals = stackLocals[heap.getGCRoot(it)]
val localSize = locals!!.sumBy { it.instance.retainedSize.toInt() }
it.retainedSize + localSize
})
return Report(
sizeMap.values.sum(),
sizeMap.keys.size.toLong(),
sizeMap.maxBy { it.value }?.let { it.toPair() } ?: ("n/a" to 0L))
This solution is based on finding the GC root for each thread (should be the Thread itself), then sort to the stored gc root of the JAVA FRAME (the thread [= GC root] id is part of the stored entry data).
There's still a slight difference compared to the values from Yourkit, probably due to me missing ROOT JNI LOCAL entities, but it's close enough for me.
I look for a way to retrieve the first elements of a DStream created as:
val dstream = ssc.textFileStream(args(1)).map(x => x.split(",").map(_.toDouble))
Unfortunately, there is no take function (as on RDD) on a dstream //dstream.take(2) !!!
Could someone has any idea on how to do it ?! thanks
You can use transform method in the DStream object then take n elements of the input RDD and save it to a list, then filter the original RDD to be contained in this list. This will return a new DStream contains n elements.
val n = 10
val partOfResult = dstream.transform(rdd => {
val list = rdd.take(n)
rdd.filter(list.contains)
})
partOfResult.print
The previous suggested solution did not compile for me as the take() method returns an Array, which is not serializable thus Spark streaming will fail with a java.io.NotSerializableException.
A simple variation on the previous code that worked for me:
val n = 10
val partOfResult = dstream.transform(rdd => {
rdd.filter(rdd.take(n).toList.contains)
})
partOfResult.print
Sharing a java based solution that is working for me. The idea is to use a custom function, which can send the top row from a sorted RDD.
someData.transform(
rdd ->
{
JavaRDD<CryptoDto> result =
rdd.keyBy(Recommendations.volumeAsKey)
.sortByKey(new CryptoComparator()).values().zipWithIndex()
.map(row ->{
CryptoDto purchaseCrypto = new CryptoDto();
purchaseCrypto.setBuyIndicator(row._2 + 1L);
purchaseCrypto.setName(row._1.getName());
purchaseCrypto.setVolume(row._1.getVolume());
purchaseCrypto.setProfit(row._1.getProfit());
purchaseCrypto.setClose(row._1.getClose());
return purchaseCrypto;
}
).filter(Recommendations.selectTopinSortedRdd);
return result;
}).print();
The custom function selectTopinSortedRdd looks like below:
public static Function<CryptoDto, Boolean> selectTopInSortedRdd = new Function<CryptoDto, Boolean>() {
private static final long serialVersionUID = 1L;
#Override
public Boolean call(CryptoDto value) throws Exception {
if (value.getBuyIndicator() == 1L) {
System.out.println("Value of buyIndicator :" + value.getBuyIndicator());
return true;
}
else {
return false;
}
}
};
It basically compares all incoming elements, and returns true only for the first record from the sorted RDD.
This seems to be always an issue with DStreams as well as regular RDDs.
If you don't want (or can't) to use .take() (especially in DStreams) you can think outside the box here and just use reduce instead. That is a valid function for both DStreams as well as RDD's.
Think about it. If you use reduce like this (Python example):
.reduce( lambda x, y : x)
Then what happens is: For every 2 elements you pass in, always return only the first. So if you have a million elements in your RDD or DStream it will shrink it to one element in the end which is the very first one in your RDD or DStream.
Simple and clean.
However keep in mind that .reduce() does not take order into consideration. However you can easily overcome this with a custom function instead.
Example: Let's assume your data looks like this x = (1, [1,2,3]) and y = (2, [1,2]). A tuple x where the 2nd element is a list. If you are sorting by the longest list for example then your code could look like below maybe (adapt as needed):
def your_reduce(x,y):
if len(x[1]) > len(y[1]):
return x
else:
return y
yourNewRDD = yourOldRDD.reduce(your_reduce)
Accordingly you will get '(1, [1,2,3])' as that has the longer list. There you go!
This has caused me some headaches in the past until I finally tried this. Hopefully this helps.
I have an instance of ByteString. To read data from it I should use it's iterator() method.
I read some data and then I decide than I need to create a view (separate iterator of some chunk of data).
I can't use slice() of original iterator, because that would make it unusable, because docs says that:
After calling this method, one should discard the iterator it was called on, and use only the iterator that was returned. Using the old
iterator is undefined, subject to change, and may result in changes to
the new iterator as well.
So, it seems that I need to call slice() on ByteString. But slice() has from and until parameters and I don't know from. I need something like this:
ByteString originalByteString = ...; // <-- This is my input data
ByteIterator originalIterator = originalByteString .iterator();
...
read some data from originalIterator
...
int length = 100; // < -- Size of the view
int from = originalIterator.currentPosition(); // <-- I need this
int until = from + length;
ByteString viewOfOriginalByteString = originalByteString.slice(from, until);
ByteIterator iteratorForView = viewOfOriginalByteString.iterator(); // <-- This is my goal
Update:
Tried to do this with duplicate():
ByteIterator iteratorForView = originalIterator.duplicate()._2.take(length);
ByteIterator's from field is private, and none of the methods seems to simply return it. All I can suggest is to use originalIterator.duplicate to get a safe copy, or else to "cheat" by using reflection to read the from field, assuming reflection is available in your deployment environment.
I have standard list of objects which is used for the some analysis. The analysis generates a list of Strings and i need to look through the standard list of objects and retrieve objects with same name.
case class TestObj(name:String,positions:List[Int],present:Boolean)
val stdLis:List[TestObj]
//analysis generates a list of strings
var generatedLis:List[String]
//list to save objects found in standard list
val lisBuf = new ListBuffer[TestObj]()
//my current way
generatedLis.foreach{i=>
val temp = stdLis.filter(p=>p.name.equalsIgnoreCase(i))
if(temp.size==1){
lisBuf.append(temp(0))
}
}
Is there any other way to achieve this. Like having an custom indexof method that over rides and looks for the name instead of the whole object or something. I have not tried that approach as i am not sure about it.
stdLis.filter(testObj => generatedLis.exists(_.equalsIgnoreCase(testObj.name)))
use filter to filter elements from 'stdLis' per predicate
use exists to check if 'generatedLis' has a value of ....
Don't use mutable containers to filter sequences.
Naive solution:
val lisBuf =
for {
str <- generatedLis
temp = stdLis.filter(_.name.equalsIgnoreCase(str))
if temp.size == 1
} yield temp(0)
if we discard condition temp.size == 1 (i'm not sure it is legal or not):
val lisBuf = stdLis.filter(s => generatedLis.exists(_.equalsIgnoreCase(s.name)))
I have a mutable variable in my code that I want to avoid by using some of aggregation function. Unfortunatelly I couldn't find solution for the following pseudocode.
def someMethods(someArgs) = {
var someMutableVariable = factory
val resources = getResourcesForVariable(someMutableVariable)
resources foreach (resource => {
val localTempVariable = getSomeOtherVariable(resource)
someMutableVariable = chooseBetteVariable(someMutableVariable, localTempVariable)
})
someMutableVariable
}
I have two places in my code where I need to build some variable, then in loop compare it with other possibilities and if it worse then replace it with this newly possibility.
If you resources variable supports it:
//This is the "currently best" and "next" in list being folded over
resources.foldLeft(factory)((cur, next) =>
val local = getSomeOther(next)
//Since this function returns the "best" between the two, you're solid
chooseBetter(local, cur)
}
and then you don't have mutable state.