Adding an MMIO peripheral to a small Rocket core - rocket-chip

I have successfully added and simulated my MMIO peripheral coupled to a normal-sized Rocket core before.
But now I want to add it to a small core (the so-called TinyCore), and this is where I am having problems. Also, just in case it is relevant, the connections with my peripheral all go through FIFOs.
First, the error I am getting when trying to generate the design:
[error] java.lang.IllegalArgumentException: requirement failed: Ports cannot overlap: AddressSet(0x80000000, 0x3fff) AddressSet(0x80000000, 0xfffffff)
I imagine this comes from the fact that the small Rocket config has a different memory map, which I don't know, and I am trying to add the peripheral at an address that doesn't exist in this configuration.
Here is the configuration I am using:
class myTinyRocketConfig2 extends Config(
  new freechips.rocketchip.subsystem.WithInclusiveCache(nBanks=1, nWays=4, capacityKB=128) ++
  new freechips.rocketchip.subsystem.With1TinyCore ++ // single tiny rocket-core
  new chipyard.config.AbstractConfig)
And this is how I added the peripheral; it shows the address and some other parameters:
class TLTxWriteQueue
(
  depth: Int = 4,
  csrAddress: AddressSet = AddressSet(0x2000, 0xff),
  beatBytes: Int = 4,
)(implicit p: Parameters) extends TxWriteQueue(depth) with TLHasCSR {
  val devname = "tlQueueIn"
  val devcompat = Seq("ucb-art", "dsptools")
  val device = new SimpleDevice(devname, devcompat) {
    override def describe(resources: ResourceBindings): Description = {
      val Description(name, mapping) = super.describe(resources)
      Description(name, mapping)
    }
  }
  // make diplomatic TL node for regmap
  override val mem = Some(TLRegisterNode(address = Seq(csrAddress), device = device, beatBytes = beatBytes))
}
I apologize in advance for any stupid mistake, as I am a beginner trying to get through his first project. Thanks.

The Rocket TinyCore uses a default scratchpad instead of a backing memory. This scratchpad (0x80000000 to 0x80003fff) overlaps with the memport's address range.
You'll have to remove the memport. This is what chipyard's TinyRocketConfig does. This config should generate a design (just without an L2 Cache or backing memory).
class TinyRocketConfig extends Config(
  new chipyard.config.WithTLSerialLocation(
    freechips.rocketchip.subsystem.FBUS,
    freechips.rocketchip.subsystem.PBUS) ++            // attach TL serial adapter to f/p busses
  new chipyard.WithMulticlockIncoherentBusTopology ++   // use incoherent bus topology
  new freechips.rocketchip.subsystem.WithNBanks(0) ++   // remove L2$
  new freechips.rocketchip.subsystem.WithNoMemPort ++   // remove backing memory
  new freechips.rocketchip.subsystem.With1TinyCore ++   // single tiny rocket-core
  new chipyard.config.AbstractConfig)
If you wanted to include an InclusiveCache in your design, you can try using a modified version of chipyard's TinyRocketConfig. Currently, though, it doesn't seem like you're addressing the entire L2 cache, and I think it is microarchitecturally unused with TinyCore anyway. If you simply need a larger scratchpad, you can modify it to contain more sets:
class WithModifiedScratchPad extends Config((site, here, up) => {
  case RocketTilesKey => up(RocketTilesKey, site) map { r =>
    // each set is currently 64 bytes
    r.copy(dcache = r.dcache.map(_.copy(nSets = 2048 /*128KiB scratchpad*/)))
  }
})
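Putting the pieces together, a config along these lines (the class name here is just a placeholder; the fragments are the ones shown above) would give you the TinyCore with the larger scratchpad and no conflicting memport:
class ModifiedTinyRocketConfig extends Config(
  new WithModifiedScratchPad ++                          // 128KiB dcache scratchpad
  new chipyard.config.WithTLSerialLocation(
    freechips.rocketchip.subsystem.FBUS,
    freechips.rocketchip.subsystem.PBUS) ++             // attach TL serial adapter to f/p busses
  new chipyard.WithMulticlockIncoherentBusTopology ++    // use incoherent bus topology
  new freechips.rocketchip.subsystem.WithNBanks(0) ++    // remove L2$
  new freechips.rocketchip.subsystem.WithNoMemPort ++    // remove backing memory
  new freechips.rocketchip.subsystem.With1TinyCore ++    // single tiny rocket-core
  new chipyard.config.AbstractConfig)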

Related

Chisel persist value in module until new write

I have created a basic module that is meant to represent a unit of memory in Chisel3:
class MemristorCellBundle() extends Bundle {
  val writeBus = Input(UInt(1.W))
  val dataBus = Input(UInt(8.W))
  val cellBus = Output(UInt(8.W))
}
class MemCell() extends Module {
  val io = IO(new MemristorCellBundle())
  val write = Wire(UInt())
  write := io.writeBus
  val internalValue = Reg(UInt())
  // More than 50% of total voltage in (255).
  when(write === 1.U) {
    internalValue := io.dataBus
    io.cellBus := io.dataBus
  } .otherwise {
    io.cellBus := internalValue
  }
}
What I want is for it to output the internalValue when the write bus is logic LOW, and to change this value when it is logic HIGH. My understanding of Chisel is that the register can persist this internalValue between clock cycles, so that this basically acts as a single unit of memory.
I'm doing it in this way as part of a larger project. However, when writing a unit test I am finding that the 'read-after-write' scenario fails.
class MemCellTest extends FlatSpec with ChiselScalatestTester with Matchers {
  behavior of "MemCell"
  it should "read and write" in {
    test(new MemCell()) { c =>
      c.io.dataBus.poke(5.U)
      c.io.writeBus.poke(0.U)
      c.io.cellBus.expect(0.U)
      // Write
      c.io.dataBus.poke(5.U)
      c.io.writeBus.poke(1.U)
      c.io.cellBus.expect(5.U)
      // Verify read-after-write
      c.io.dataBus.poke(12.U)
      c.io.writeBus.poke(0.U)
      c.io.cellBus.expect(5.U)
    }
  }
}
The first two expectations work just as I would expect. However, when I try to read after writing, the cellBus returns to 0 instead of persisting the 5 that I had written previously.
test MemCell Success: 0 tests passed in 1 cycles in 0.035654 seconds 28.05 Hz
[info] MemCellTest:
[info] MemCell
[info] - should read and write *** FAILED ***
[info] io_cellBus=0 (0x0) did not equal expected=5 (0x5) (lines in MyTest.scala: 10) (MyTest.scala:21)
Clearly the register is not keeping this value, and so internalValue reverts to 0. But why does this happen, and how would I be able to create a value that can persist?
Drakinite's comment is correct. You need to make sure to step the clock in order to see the register latch the value. I tweaked your test to include a couple of steps and it works as expected:
c.io.dataBus.poke(5.U)
c.io.writeBus.poke(0.U)
c.io.cellBus.expect(0.U)
c.clock.step() // Added step
// Write passthrough (same cycle)
c.io.dataBus.poke(5.U)
c.io.writeBus.poke(1.U)
c.io.cellBus.expect(5.U)
c.clock.step() // Added step
// Verify read-after-write
c.io.dataBus.poke(12.U)
c.io.writeBus.poke(0.U)
c.io.cellBus.expect(5.U)
Here's an executable example showing that this works (using chisel3 v3.4.4 and chiseltest v0.3.4): https://scastie.scala-lang.org/5E1rOEsYSzSUrLXZCvoyNA
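As a side note (an observation about the module above, not something the clock-stepping fix depends on): internalValue is declared as Reg(UInt()) with no reset value, so the very first expect(0.U) only passes because the simulator happens to initialise registers to zero. Giving the register an explicit width and reset value makes that first expectation deterministic:
// Optional: an explicit reset value makes the pre-write read deterministic across simulators.
val internalValue = RegInit(0.U(8.W))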

Next steps for debugging lmdbjni access violation

I am using this fork of LMDBjni: https://github.com/deephacks/lmdbjni to form the backend of a medium-sized database project in Scala.
I've been hitting an EXCEPTION_ACCESS_VIOLATION (0xc0000005) in the JNI code for this LMDB library, and would like to know whether there is anything obvious I'm doing wrong or what the next steps for debugging should be. I'm not exactly sure what I'm looking for, so I'm going to list as much information about what's happening as I can and hope the symptoms make sense to somebody.
The access violation occurs in a Database with a single key mapping to a single 8-byte value, on approximately the 4000th access to that database (the exact number appears to be the same with each run), suggesting that this is a deterministic problem.
I believe only one thread accesses the database at a time, and regardless, since the operation is wrapped in a transaction, my understanding is that concurrent accesses should not matter anyway.
By looking through stack traces and printing values, I have traced the issue to this generic construction I wrote for building transactions.
My code that causes the issue is below; the crash occurs in the marked db.get() call:
def transactionalGetAndSet[A](
  key: Key,
  db: Database
)(
  compute: A => LMDBEither[A]
)(
  implicit sa: Storeable[A],
  env: Env
): LMDBEither[A] = {
  import org.fusesource.lmdbjni.Transaction
  // get a new transaction
  val tx: Transaction = instance.env.createWriteTransaction()
  println("tx = " + tx + " id = " + tx.getId)
  // get the key as an Array[Byte]. This is done by converting the key to a base64 string
  // then converting that to bytes (so arbitrary objects can be made into keys)
  val k = key.render
  println("Key = " + key + " Rendered = " + new String(k))
  // instantiate a result value, so there is something if it fails
  var res: LMDBEither[A] = NoResult.left // initialise the result as a failure to begin with
  try {
    res = for { // this for-comprehension chains together operations that return LMDBEithers into one LMDBEither
      bytes <- LMDBEither(db.get(tx, k)) // error occurs in this Database.get() call
      _ = println("bytes = " + bytes)
      a <- sa.fromBytes(safeRetrieve(bytes)) // sa is effectively a marshaller/unmarshaller object which converts Vector[Byte] => LMDBEither[A]
      _ = println("a = " + a)
      res <- compute(a) // get the next value for the value at the key
      _ = println("res = " + res)
      _ <- LMDBEither(db.put(tx, k, sa.toBytes(res).toArray))
    } yield a // effectively, if all these steps worked, res == Right(a)
    res // return the result
  } finally {
    // Make sure you either commit or rollback to avoid resource leaks.
    if (res.isRight) tx.commit() // if the result is not an error (i.e. Either.isRight is true)
    else tx.abort()
    tx.close()
  }
}
Where LMDBEither[A] is an alias for Either[E, A] for a specific error type E, and LMDBEither(x) is a function that lifts an expression that might throw exceptions during execution into an LMDBEither, catching any exceptions.
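For illustration, a simplified sketch of that lifting helper (these definitions are stand-ins written for this question, not the project's real error type or instances):
// Illustrative only: the real error ADT, NoResult, and the .left syntax come from the project.
object LMDBEitherSketch {
  sealed trait LMDBError
  case object NoResult extends LMDBError
  final case class LMDBException(t: Throwable) extends LMDBError

  type LMDBEither[A] = Either[LMDBError, A]

  object LMDBEither {
    // Lift a possibly-throwing expression into an LMDBEither, catching any exception.
    def apply[A](a: => A): LMDBEither[A] =
      try Right(a)
      catch { case t: Throwable => Left(LMDBException(t)) }
  }

  implicit class ErrorOps(private val e: LMDBError) extends AnyVal {
    // Mirrors the `NoResult.left` usage above: wrap an error as a failed LMDBEither.
    def left[A]: LMDBEither[A] = Left(e)
  }
}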
The function safeRetrieve converts a possibly-null Array[Byte] into a definitely non-null Vector[Byte], as follows:
private def safeRetrieve(bytes: Array[Byte]): Vector[Byte] =
  Option(bytes).fold(Vector[Byte]()) { // if the array is null, convert to an empty vector, otherwise call the array's wrapper's vector
    arr =>
      println("Vector = " + arr.toVector)
      arr.toVector
  }
To the best of my knowledge, this does not modify the memory where the array is stored (LMDB's protected memory)
The values printed up to and including the crash are as follows:
tx = org.fusesource.lmdbjni.Transaction#391cec1f id = 15104
Key = Vector(Objects) Rendered = 84507411390877848991196161
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000018002453f, pid=10220, tid=7268
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C [lmdbjni-64-0-7710432736670562378.4+0x2453f]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\dev\PartIIProject\hs_err_pid10220.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
This is more evidence that the mdb_get() call is what fails.
The full contents of the referenced error report file are here: https://pastebin.com/v6AmFBjq
Again, I would be extremely grateful if anyone could point me in the right direction. What next steps should I be taking?

Flink: ProcessWindowFunction

I have recently been studying ProcessWindowFunction in Flink's new release. It says that ProcessWindowFunction supports global state and window state. I used the Scala API to give it a try. So far I can get the global state working, but I have had no luck making the window state work. What I'm doing is processing system logs and counting the number of logs keyed by hostname and severity level. I would like to calculate the difference in log count between two adjacent windows. Here is my code implementing ProcessWindowFunction:
class LogProcWindowFunction extends ProcessWindowFunction[LogEvent, LogEvent, Tuple, TimeWindow] {
  // Create a descriptor for ValueState
  private final val valueStateWindowDesc = new ValueStateDescriptor[Long](
    "windowCounters",
    createTypeInformation[Long])
  private final val reducingStateGlobalDesc = new ReducingStateDescriptor[Long](
    "globalCounters",
    new SumReduceFunction(),
    createTypeInformation[Long])

  override def process(key: Tuple, context: Context, elements: Iterable[LogEvent], out: Collector[LogEvent]): Unit = {
    // Initialize the per-key and per-window ValueState
    val valueWindowState = context.windowState.getState(valueStateWindowDesc)
    val reducingGlobalState = context.globalState.getReducingState(reducingStateGlobalDesc)
    val latestWindowCount = valueWindowState.value()
    println(s"lastWindowCount: $latestWindowCount ......")
    val latestGlobalCount = if (reducingGlobalState.get() == null) 0L else reducingGlobalState.get()
    // Compute the necessary statistics and determine if we should launch an alarm
    val eventCount = elements.size
    // Update the related state
    valueWindowState.update(eventCount.toLong)
    reducingGlobalState.add(eventCount.toLong)
    for (elem <- elements) {
      out.collect(elem)
    }
  }
}
I always get a value of 0 from the window state instead of the previously updated count it should hold. I've been struggling with this problem for several days. Can someone please help me figure it out? Thanks.
The scope of the per-window state is a single window instance. In the case of your process method above, every time it is called a new window is in scope, and so the latestWindowCount is always zero.
For a normal, vanilla window that is only going to fire once, per-window state is useless. Only if a window somehow has multiple firings (e.g., late firings) can you make good use of the per-window state. If you are trying to remember something from one window to the next, then you can do this with the global window state.
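For instance, a rough sketch along these lines (the state descriptor and class names are illustrative, not taken from your job) keeps the previous window's count per key in globalState and derives the difference on each firing:
import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.api.java.tuple.Tuple
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

class LogDiffWindowFunction extends ProcessWindowFunction[LogEvent, LogEvent, Tuple, TimeWindow] {
  // Per-key state that survives across windows: the count observed in the previous window.
  private final val prevCountDesc =
    new ValueStateDescriptor[Long]("previousWindowCount", createTypeInformation[Long])

  override def process(key: Tuple, context: Context,
                       elements: Iterable[LogEvent], out: Collector[LogEvent]): Unit = {
    val prevCountState = context.globalState.getState(prevCountDesc)

    val currentCount = elements.size.toLong
    val previousCount = prevCountState.value() // 0 for the first window of this key
    val diff = currentCount - previousCount    // change relative to the previous (adjacent) window
    println(s"count: $currentCount, diff vs previous window: $diff")

    prevCountState.update(currentCount)
    elements.foreach(out.collect)
  }
}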
For an example of using per-window state to remember data to use in late firings, see slides 13-19 in Flink's advanced window training.

Retained size calculation including stack frame variables?

I've found a ton of questions on "retained size" and the accepted answer seems to be:
The retained size for an object is the quantity of memory this object preserves from garbage collection.
Now, I've been working on programmatically computing the retained size in an hprof file (as defined here), using the NetBeans profiler library (the retained size calculation is done in HprofHeap.java). It works just fine (sorry, I used Kotlin for brevity):
val heap: Heap = HeapFactory.createHeap(myHeap.toFile())
val threadClass: JavaClass = heap.getJavaClassByName("java.lang.Thread")
val instanceFilter = { it: Instance -> threadClass == it.getJavaClass() }
val sizeMap = heap.allInstances
    .filter { instanceFilter(it) }
    .toMap({ findThreadName(it) /* not shown */ }, { it.retainedSize })
What I noticed when the sizeMap contained only marginal retained sizes is that NetBeans computes retained sizes only for objects that are not on the stack. So local variables (allocated on the stack) assigned to the Thread would not be included in the retained size.
My question is: is there a way to make the NetBeans library consider the stack elements as dependent objects, the way for example the YourKit profiler does in its calculation? How would I go about adding such a feature if the answer to the previous question is "no"?
A bit of digging found that the JVM heap dumper creates an entry of type ROOT JAVA FRAME for a stack local variable (compare VM_HeapDumper::do_thread). Since I can grep for that in the heap, here's what I did:
val threadClass: JavaClass = heap.getJavaClassByName("java.lang.Thread")
val keyTransformer = { it: Instance -> findThreadName(it) }
val instanceFilter = { it: Instance -> it.getJavaClass() == threadClass }
val stackLocals = heap.gcRoots
    .filter { it.kind == GCRoot.JAVA_FRAME }
    .groupBy { (it as JavaFrameGCRoot).threadGCRoot }
val sizeMap = heap.allInstances
    .filter { instanceFilter(it) }
    .toMap(
        { keyTransformer(it) },
        {
            val locals = stackLocals[heap.getGCRoot(it)]
            val localSize = locals!!.sumBy { it.instance.retainedSize.toInt() }
            it.retainedSize + localSize
        })
return Report(
    sizeMap.values.sum(),
    sizeMap.keys.size.toLong(),
    sizeMap.maxBy { it.value }?.let { it.toPair() } ?: ("n/a" to 0L))
This solution is based on finding the GC root for each thread (which should be the Thread itself), then matching it against the stored GC root of the JAVA FRAME entries (the thread [= GC root] id is part of the stored entry data).
There's still a slight difference compared to the values from YourKit, probably due to me missing ROOT JNI LOCAL entries, but it's close enough for me.

How to generate a big data stream on the fly

I have to generate a big file on the fly: read from the database and send it to the client. I read some documentation and I did this:
val streamContent: Enumerator[Array[Byte]] = Enumerator.outputStream {
  os =>
    // new PrintWriter() read from database and for each record
    // do some logic and write
    // to outputstream
}
Ok.stream(streamContent.andThen(Enumerator.eof)).withHeaders(
  CONTENT_DISPOSITION -> s"attachment; filename=someName.csv"
)
I'm rather new to Scala in general (only a week), so please bear with me.
My questions are:
1) Is this the best way? I found that if I have a big file, this will load it into memory, and I also don't know what the chunk size is in this case; if it sends a chunk for each write(), that is not too convenient.
2) I found the method Enumerator.fromStream(data: InputStream, chunkedSize: Int) a little better because it has a chunk size, but I don't have an InputStream because I'm creating the file on the fly.
There's a note in the docs for Enumerator.outputStream:
Not [sic!] that calls to write will not block, so if the iteratee that is being fed to is slow to consume the input, the OutputStream will not push back. This means it should not be used with large streams since there is a risk of running out of memory.
Whether this can happen depends on your situation. If you can and will generate gigabytes in seconds, you should probably try something different. I'm not exactly sure what, but I'd start at Enumerator.generateM(). For many cases though, your method is perfectly fine. Have a look at this example by Gaëtan Renaudeau for serving a Zip file that's generated on the fly in the same way you're using it:
val r = new scala.util.Random() // random source used by the writes below
val enumerator = Enumerator.outputStream { os =>
  val zip = new ZipOutputStream(os)
  Range(0, 100).map { i =>
    zip.putNextEntry(new ZipEntry("test-zip/README-" + i + ".txt"))
    zip.write("Here are 100000 random numbers:\n".map(_.toByte).toArray)
    // Let's do 100 writes of 1'000 numbers
    Range(0, 100).map { j =>
      zip.write((Range(0, 1000).map(_ => r.nextLong).map(_.toString).mkString("\n")).map(_.toByte).toArray)
    }
    zip.closeEntry()
  }
  zip.close()
}
Ok.stream(enumerator >>> Enumerator.eof).withHeaders(
  "Content-Type" -> "application/zip",
  "Content-Disposition" -> "attachment; filename=test.zip"
)
Please keep in mind that Ok.stream has been replaced by Ok.chunked in newer versions of Play, in case you want to upgrade.
As for the chunk size, you can always use Enumeratee.grouped to gather a bunch of values and send them as one chunk.
val grouper = Enumeratee.grouped(
Traversable.take[Array[Double]](100) &>> Iteratee.consume()
)
Then you'd do something like
Ok.stream(enumerator &> grouper >>> Enumerator.eof)
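And if the non-blocking OutputStream buffering mentioned above ever does become a problem, a rough sketch of the Enumerator.generateM route could look like this (fetchNextChunk is a made-up placeholder for however you page through your database query); generateM only asks for the next chunk when the iteratee is ready for it:
import play.api.libs.iteratee.Enumerator
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future

// Hypothetical accessor: returns the next CSV chunk, or None once the query is exhausted.
def fetchNextChunk(): Future[Option[Array[Byte]]] = ???

// generateM pulls one chunk per step, so a slow consumer naturally limits memory use.
val streamContent: Enumerator[Array[Byte]] = Enumerator.generateM(fetchNextChunk())

Ok.stream(streamContent.andThen(Enumerator.eof)).withHeaders(
  CONTENT_DISPOSITION -> s"attachment; filename=someName.csv"
)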