Retained size calculation including stack frame variables? - netbeans

I've found a ton of questions on "retained size", and the accepted answer seems to be:
The retained size for an object is the quantity of memory this object preserves from garbage collection. For example, if object A holds the only reference to object B, then B's size counts towards A's retained size.
Now, I've been working on programmatic computation of the retained size in an hprof file (as defined here), using the NetBeans profiler library (the retained size calculation is done in HprofHeap.java). Works just fine (sorry, used Kotlin for brevity):
val heap: Heap = HeapFactory.createHeap(myHeap.toFile())
val threadClass: JavaClass = heap.getJavaClassByName("java.lang.Thread")
val instanceFilter = { it: Instance -> threadClass == it.getJavaClass() }
val sizeMap = heap.allInstances
    .filter { instanceFilter(it) }
    .associateBy({ findThreadName(it) /* not shown */ }, { it.retainedSize })
What I noticed when the sizeMap contained only marginal retained sizes is that NetBeans computes retained sizes only along references between heap objects. So objects held only by local variables (which live on the stack) of the Thread would not be included in the retained size.
My question is: is there a way to make the NetBeans library consider the stack elements as dependent objects, the way for example the YourKit Profiler does in its calculation? And how would I go about adding such a feature if the answer to the previous question is "no"?

A bit of digging revealed that the JVM heap dumper creates an entry of type ROOT JAVA FRAME for each stack-local variable (compare VM_HeapDumper::do_thread). Since I can grep for those in the heap dump, here's what I did:
val threadClass: JavaClass = heap.getJavaClassByName("java.lang.Thread")
val keyTransformer = { it: Instance -> findThreadName(it) }
val instanceFilter = { it: Instance -> it.getJavaClass() == threadClass }
// group the JAVA FRAME roots by the thread GC root they belong to
val stackLocals = heap.gcRoots
    .filter { it.kind == GCRoot.JAVA_FRAME }
    .groupBy { (it as JavaFrameGCRoot).threadGCRoot }
val sizeMap = heap.allInstances
    .filter { instanceFilter(it) }
    .associateBy(
        { keyTransformer(it) },
        {
            // add the retained sizes of the thread's stack locals
            // to the retained size of the Thread instance itself
            val locals = stackLocals[heap.getGCRoot(it)]
            val localSize = locals!!.sumBy { local -> local.instance.retainedSize.toInt() }
            it.retainedSize + localSize
        })
return Report(
    sizeMap.values.sum(),
    sizeMap.keys.size.toLong(),
    sizeMap.maxBy { it.value }?.toPair() ?: ("n/a" to 0L))
This solution is based on finding the GC root for each thread (which should be the Thread instance itself) and then matching it against the thread GC root stored in each JAVA FRAME entry (the thread [= GC root] id is part of the stored entry data).
There's still a slight difference compared to the values from YourKit, probably because I'm missing the ROOT JNI LOCAL entries, but it's close enough for me.
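For reference, the missing ROOT JNI LOCAL entries could probably be summed the same way. A minimal sketch (here in Scala against the same org.netbeans.lib.profiler.heap Java API; the heap path is a placeholder, and this ignores per-thread attribution, so it is only a rough upper bound for the correction):
import org.netbeans.lib.profiler.heap.{GCRoot, Heap, HeapFactory}
import scala.collection.JavaConverters._

val heap: Heap = HeapFactory.createHeap(new java.io.File("my.hprof")) // placeholder path

// sum the retained sizes of all objects held by ROOT JNI LOCAL entries
val jniLocalSize: Long = heap.getGCRoots.asScala
  .filter(_.getKind == GCRoot.JNI_LOCAL)
  .map(_.getInstance.getRetainedSize)
  .sum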

Related

Adding a MMIO peripheral to a small rocket core

I have successfully added and simulated my MMIO peripheral coupled to a normal-sized rocket core before.
But now I want to try to add it to a small core (the so-called TinyCore), and this is the part where I am having problems. Also, just in case it is relevant, the connections with my peripheral all go through FIFOs.
First, the error I am getting when trying to generate the design:
[error] java.lang.IllegalArgumentException: requirement failed: Ports cannot overlap: AddressSet(0x80000000, 0x3fff) AddressSet(0x80000000, 0xfffffff)
I imagine this comes from the fact that the small rocket config has a different memory map, which I don't know, and that I am trying to add the peripheral at an address that doesn't exist in this configuration.
Here is the configuration I am using:
class myTinyRocketConfig2 extends Config(
  new freechips.rocketchip.subsystem.WithInclusiveCache(nBanks=1, nWays=4, capacityKB=128) ++
  new freechips.rocketchip.subsystem.With1TinyCore ++ // single tiny rocket-core
  new chipyard.config.AbstractConfig)
And this is how I added the peripheral; it shows the address and some other parameters:
class TLTxWriteQueue
(
  depth: Int = 4,
  csrAddress: AddressSet = AddressSet(0x2000, 0xff),
  beatBytes: Int = 4,
)(implicit p: Parameters) extends TxWriteQueue(depth) with TLHasCSR {
  val devname = "tlQueueIn"
  val devcompat = Seq("ucb-art", "dsptools")
  val device = new SimpleDevice(devname, devcompat) {
    override def describe(resources: ResourceBindings): Description = {
      val Description(name, mapping) = super.describe(resources)
      Description(name, mapping)
    }
  }
  // make diplomatic TL node for regmap
  override val mem = Some(TLRegisterNode(address = Seq(csrAddress), device = device, beatBytes = beatBytes))
}
I apologize in advance for any stupid mistakes, as I am a beginner trying to get through my first project. Thanks
The Rocket TinyCore uses a default scratchpad instead of a backing memory. This scratchpad (0x80000000 to 0x80003fff) overlaps with the memory port's address range.
You'll have to remove the memory port. This is what chipyard's TinyRocketConfig does. This config should generate a design (just without an L2 cache or backing memory).
class TinyRocketConfig extends Config(
  new chipyard.config.WithTLSerialLocation(
    freechips.rocketchip.subsystem.FBUS,
    freechips.rocketchip.subsystem.PBUS) ++           // attach TL serial adapter to f/p busses
  new chipyard.WithMulticlockIncoherentBusTopology ++ // use incoherent bus topology
  new freechips.rocketchip.subsystem.WithNBanks(0) ++ // remove L2$
  new freechips.rocketchip.subsystem.WithNoMemPort ++ // remove backing memory
  new freechips.rocketchip.subsystem.With1TinyCore ++ // single tiny rocket-core
  new chipyard.config.AbstractConfig)
If you wanted to include an InclusiveCache in your design, you can try using a modified version of chipyard's TinyRocketConfig. Though currently, it doesn't seem like you're addressing the entire L2 Cache, and I think it's microarchitecturally unused with TinyCore. If you simply need a larger scratchpad, you can modify the scratchpad to contain more sets:
class WithModifiedScratchPad extends Config((site, here, up) => {
  case RocketTilesKey => up(RocketTilesKey, site) map { r =>
    // each set is currently 64 bytes
    r.copy(dcache = r.dcache.map(_.copy(nSets = 2048 /*128KiB scratchpad*/)))
  }
})
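Mixing the fragment into a config could then look like the following hypothetical sketch (the name myTinyRocketConfig3 is just for illustration; the fragment is listed first so its RocketTilesKey override takes precedence over the settings from With1TinyCore):
class myTinyRocketConfig3 extends Config(
  new WithModifiedScratchPad ++                       // enlarge the dcache scratchpad
  new freechips.rocketchip.subsystem.WithNoMemPort ++ // remove backing memory, as above
  new freechips.rocketchip.subsystem.With1TinyCore ++ // single tiny rocket-core
  new chipyard.config.AbstractConfig)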

Chisel persist value in module until new write

I have created a basic module that is meant to represent a unit of memory in Chisel3:
class MemCellBundle() extends Bundle {
  val writeBus = Input(UInt(1.W))
  val dataBus = Input(UInt(8.W))
  val cellBus = Output(UInt(8.W))
}

class MemCell() extends Module {
  val io = IO(new MemCellBundle())

  val write = Wire(UInt())
  write := io.writeBus

  val internalValue = Reg(UInt())

  // More than 50% of total voltage in (255).
  when(write === 1.U) {
    internalValue := io.dataBus
    io.cellBus := io.dataBus
  } .otherwise {
    io.cellBus := internalValue
  }
}
What I want is for it to output the internalValue when the write bus is logic LOW, and change this value with logic HIGH. My understanding of Chisel is that the register can persist this internalValue between clock cycles, so that this basically acts as a single unit of memory.
I'm doing it this way as part of a larger project. However, when writing a unit test, I am finding that the 'read-after-write' scenario fails.
class MemCellTest extends FlatSpec with ChiselScalatestTester with Matchers {
  behavior of "MemCell"

  it should "read and write" in {
    test(new MemCell()) { c =>
      c.io.dataBus.poke(5.U)
      c.io.writeBus.poke(0.U)
      c.io.cellBus.expect(0.U)
      // Write
      c.io.dataBus.poke(5.U)
      c.io.writeBus.poke(1.U)
      c.io.cellBus.expect(5.U)
      // Verify read-after-write
      c.io.dataBus.poke(12.U)
      c.io.writeBus.poke(0.U)
      c.io.cellBus.expect(5.U)
    }
  }
}
The first two expectations work just as I would expect. However, when I try to read after writing, the cellBus returns to 0 instead of persisting the 5 that I had written previously.
test MemCell Success: 0 tests passed in 1 cycles in 0.035654 seconds 28.05 Hz
[info] MemCellTest:
[info] MemCell
[info] - should read and write *** FAILED ***
[info] io_cellBus=0 (0x0) did not equal expected=5 (0x5) (lines in MyTest.scala: 10) (MyTest.scala:21)
Clearly the register is not keeping this value, and so internalValue reverts to 0. But why does this happen, and how would I be able to create a value that can persist?
Drakinite's comment is correct. You need to make sure to step the clock in order to see the register latch the value. I tweaked your test to include a couple of steps and it works as expected:
c.io.dataBus.poke(5.U)
c.io.writeBus.poke(0.U)
c.io.cellBus.expect(0.U)
c.clock.step() // Added step
// Write passthrough (same cycle)
c.io.dataBus.poke(5.U)
c.io.writeBus.poke(1.U)
c.io.cellBus.expect(5.U)
c.clock.step() // Added step
// Verify read-after-write
c.io.dataBus.poke(12.U)
c.io.writeBus.poke(0.U)
c.io.cellBus.expect(5.U)
Here's an executable example showing that this works (using chisel3 v3.4.4 and chiseltest v0.3.4): https://scastie.scala-lang.org/5E1rOEsYSzSUrLXZCvoyNA
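One more detail: internalValue is declared as Reg(UInt()) without a reset value, so the very first expect(0.U) only passes because the simulator happens to initialize registers to zero. If you want a defined power-on value, give the register an explicit reset, for example (assuming the 8-bit data width from your bundle):
// 8-bit register that resets to 0 instead of relying on simulator defaults
val internalValue = RegInit(0.U(8.W))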

Iterator[Something] to Iterator[Seq[Something]]

I need to process a "big" file (something that does not fit in memory).
I want to batch-process the data. Let's say for the example that I want to insert them into a database. The file is too big to fit in memory, and processing elements one by one is too slow as well.
So I'd like to go from an Iterator[Something] to an Iterator[Iterable[Something]] to batch elements.
Starting with this:
CSVReader.open(new File("big_file"))
  .iteratorWithHeaders
  .map(Something.parse)
  .foreach(Jdbi.insertSomething)
I could do something dirty in the foreach statement with mutable sequences and flushes every x elements, but I'm sure there is a smarter way to do this...
// Yuk... :-(
val buffer = ArrayBuffer[Something]()
CSVReader.open(new File("big_file"))
  .iteratorWithHeaders
  .map(Something.parse)
  .foreach { something =>
    buffer.append(something)
    if (buffer.size == 1000) {
      Jdbi.insertSomethings(buffer.toList)
      buffer.clear()
    }
  }
Jdbi.insertSomethings(buffer.toList)
If your batches can have a fixed size (as in your example), the grouped method on Scala's Iterator does exactly what you want:
val iterator = Iterator.continually(1)
iterator.grouped(10000).foreach(xs => println(xs.size))
This will run in a constant amount of memory (not counting whatever text is stored by your terminal in memory, of course).
I'm not sure what your iteratorWithHeaders returns, but if it's a Java iterator, you can convert it to a Scala one like this:
import scala.collection.JavaConverters._
val myScalaIterator: Iterator[Int] = myJavaIterator.asScala
This will remain appropriately lazy.
If I understood your problem correctly, you can just use Iterator.grouped. So, adapting your example a little bit:
val si: Iterator[Something] = CSVReader.open(new File("big_file"))
  .iteratorWithHeaders
  .map(Something.parse)

val gsi: GroupedIterator[Something] = si.grouped(1000)

gsi.foreach { slst: Seq[Something] =>
  Jdbi.insertSomethings(slst.toList)
}
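One thing to be aware of: by default grouped also emits a final, smaller batch for any leftover elements, so no separate flush is needed at the end (GroupedIterator.withPartial(false) drops the incomplete trailing batch instead). A quick illustration:
val batches = Iterator(1, 2, 3, 4, 5).grouped(2)
batches.foreach(b => println(b)) // prints batches of (1, 2), (3, 4) and then (5)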

scala read large files

Hello, I am looking for the fastest, but rather high-level, way to work with a large data collection.
My task consists of two parts: read a lot of large files into memory, and then make some statistical calculations (the easiest way to work with the data in this task is a random access array).
My first approach was to use java.io.ByteArrayOutputStream, because it can resize its internal storage.
def packTo(buf: java.io.ByteArrayOutputStream, f: File) = {
  try {
    val fs = new java.io.FileInputStream(f)
    IOUtils.copy(fs, buf)
  } catch {
    case e: java.io.FileNotFoundException =>
  }
}

val buf = new java.io.ByteArrayOutputStream()
files foreach { f: File => packTo(buf, f) }
println(buf.size())

for (i <- 0 to buf.size()) {
  for (j <- 0 to buf.size()) {
    for (k <- 0 to buf.size()) {
      // println("i " + i + " " + buf(i));
      // Calculate something amazing using buf(i) buf(j) buf(k)
    }
  }
}
println("amazing = " + ???)
But ByteArrayOutputStream can't give me the data as a byte[], only a copy of it, and I cannot afford to have two copies of the data.
Have you tried scala-io? Should be as simple as Resource.fromFile(f).byteArray with it.
Scala's built-in library already provides a nice API to do this
io.Source.fromFile("/file/path").mkString.getBytes
However, it's often not a good idea to load a whole file as a byte array into memory. Do make sure the largest possible file can still fit into your JVM memory properly.
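If the concern is specifically the extra copy made by ByteArrayOutputStream.toByteArray(), the plain JDK can also read a file directly into a single byte[] (one array per file, rather than one buffer for all files as in the original snippet; this is standard java.nio, not Scala-specific):
import java.nio.file.{Files, Paths}

// reads the whole file into one byte[] in a single call;
// files larger than Int.MaxValue bytes (~2 GiB) cannot be read this way
val bytes: Array[Byte] = Files.readAllBytes(Paths.get("big_file"))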

Scala - getting the contents (not address) of another stack

If I have a stack using the mutable Scala stack collection, is there a way I can copy the stack so that I can analyze its elements by popping it, without altering the original stack? For instance, suppose I have a stack and code as follows:
import scala.collection.mutable.Stack

var stack1 = new Stack[Int]
/** Code that pushes integers on stack1 */

var stackCopy = stack1
while (!stackCopy.isEmpty) {
  println(stackCopy.pop)
}
I want to use a while loop to print all elements in stack1. But when I make a copy and pop off that copy, the original stack (i.e. stack1) is also altered. I want to preserve the original stack, so how can I just get the contents, not the address?
You could use a for comprehension:
for( i <- stack1 ) { println(i) }
or simply call the foreach method:
stack1 foreach println
If you insist on using a while loop, you can convert it to a list first:
var is = stack1.toList
while (is.nonEmpty) {
  println(is.head)
  is = is.tail
}
With all these methods, the original stack will be preserved.
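And if you really do want an independent stack you can destructively pop, the mutable collections also support clone, which copies the contents rather than the reference. A minimal sketch:
import scala.collection.mutable.Stack

val stack1 = Stack(1, 2, 3)
val stackCopy = stack1.clone() // copies the elements, not the reference

while (!stackCopy.isEmpty) {
  println(stackCopy.pop)
}
// stack1 still contains all three elements here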