Does cache block get overwritten when tags mismatch?

I am trying to understand how CPU caches work. Consider an example from Udacity's High Performance Architecture course. The processor has a 4-KByte, direct-mapped, write-back, write-allocate, physically indexed, physically tagged data L1 cache with 256-byte blocks. The cache starts empty, and the following instructions are executed in sequence:
LD 0x0000ABFD
ST 0x0000ABFC
ST 0x0000AB7C
LD 0x0000BCEC
LD 0x0000ACEC
ST 0x0000BCEC
Address breakdown for cache access is:
tag (bits 31 through 12)
index (bits 11 through 8)
offset (bits 7 through 0)
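As a sanity check on those field boundaries, here is a small Python sketch of my own that splits each accessed address into tag/index/offset (4 KB / 256-byte blocks = 16 sets, hence 8 offset bits and 4 index bits):
BLOCK_BITS = 8   # 256-byte blocks -> offset = bits 7..0
INDEX_BITS = 4   # 4 KB / 256 B = 16 direct-mapped sets -> index = bits 11..8

def split(addr):
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, offset

for a in (0x0000ABFD, 0x0000ABFC, 0x0000AB7C, 0x0000BCEC, 0x0000ACEC, 0x0000BCEC):
    tag, idx, offs = split(a)
    print(f"{a:#010x} -> tag {tag:#x} idx {idx:X} offs {offs:02X}")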
Thus, the sequence of cache accesses is the following:
tag |idx|offs| hit or miss
-----------------------------
0000A | B | FD | Cache miss since it's empty
0000A | B | FC | Cache hit since the index and the tag match
0000A | B | 7C | Hit
0000B | C | EC | Miss (tag mismatch)
0000A | C | EC | Miss (tag mismatch)
0000B | C | EC | ?? Hit (0000B|C is still in the cache) ??
It all seems clear except the last row. In my understanding, when LD 0x0000ACEC accessed the cache (and missed due to the tag mismatch), it should have updated the cache entry at index C, including the tag, so the subsequent ST 0x0000BCEC should no longer see tag 0000B but 0000A, and should therefore also miss. But it looks like my view is wrong. Could somebody explain what actually happens when accessing an existing cache index with a different tag?
Another question: when we update cache entries, do we update the whole cache block or just the value corresponding to the block offset? E.g. LD 0x0000AB11 followed by LD 0x0000AB34.
Is there any difference in terms of caching between load and store accesses?

How do I track states across runners in a DataFlow Job?

I'm currently creating a streaming Dataflow job that carries out a computation only if there is an increment in the "Ring" column of my data.
My Dataflow code:
Job = (p | "Read" >> beam.io.ReadFromPubSub(topic=topic)
         | "Parse Json" >> beam.Map(json.loads)
         | "ParDo Divisors" >> beam.ParDo(UpdateDelayTable()))
Data flowing in from pubsub:
Ring [
{...,"Ring":1},
{...,"Ring":1},
{...,"Ring":1},
{...,"Ring":2}
...]
I want my Dataflow job to track the current ring number and trigger a function only if the ring number has incremented. How should I go about doing this?
Pub/Sub
There is no guarantee that {"Ring": 2} will definitely be received/sent by Pub/Sub after {"Ring": 1}.
It seems that you first have to enable receiving messages in order for Pub/Sub, and also make sure the Pub/Sub service receives the Ring data incrementally.
Dataflow
Then to achieve it with Dataflow, you can use stateful processing.
But be mindful that the "state" of "Ring" is per key (and per window). To do what you want, all the elements need to have the same key and fall into the same window (global window in this case). It's going to be a very "hot" key.
Example code:
import apache_beam as beam
from apache_beam.transforms.userstate import ReadModifyWriteStateSpec
from apache_beam.coders import coders

class RingFn(beam.DoFn):
    RING_STATE = ReadModifyWriteStateSpec(
        name='Ring', coder=coders.VarIntCoder())

    def process(self, element, ring=beam.DoFn.StateParam(RING_STATE)):
        current_ring = ring.read() or 0
        if element['Ring'] > current_ring:
            print('Carry out your computation here!')
            ring.write(element['Ring'])

# Usage
pcoll | beam.ParDo(RingFn())

# Check your keys if you are not sure what they are.
pcoll | beam.Keys() | beam.Map(print)
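One caveat worth adding: the state is per key, so the input to the stateful ParDo has to be a keyed PCollection. A minimal sketch (my own, not from the pipeline above) that gives every element the same constant key, as described earlier:
import apache_beam as beam

# Key every element identically so they all share one state cell
# (this is the single "hot" key mentioned above).
keyed = pcoll | "Key all elements" >> beam.Map(lambda element: (None, element))

# With keyed input the DoFn receives (key, value) tuples, so inside
# RingFn.process the ring would be read as element[1]['Ring'].
keyed | "Track ring" >> beam.ParDo(RingFn())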

Concurrency in Apache JMeter load testing has strange behaviour

I'm doing some load testing of an API using a somewhat basic setup in JMeter.
The idea here is that the Thread Group spawns a bunch of clients/threads, and each of these clients has a bunch of loops which run in parallel (using the bzm - Parallel Controller).
Each loop represents some kind of action that a user can perform, and each loop has a Uniform random timer to adjust how often a given action is performed for each client.
One of the actions consists of two calls: the first one (1) fetches some IDs, which are then extracted with a JSON extractor and modified a bit with a BeanShell Post Processor. The result from the post processor is then used as a param for the next call (2).
The issue I'm facing is that in my Summary Report there are a lot more results from the first HTTP request (1) than from the second one (2). I would expect them to always be called the same number of times.
My guess is that it all comes down to me lacking some basic understanding of flow and concurrency (and maybe timers) in JMeter, but I have been unable to figure it out, so I need help.
This is the setup; imagine there being multiple loops:
Thread group
|
+-- Parallel controller
|   |
|   +-- Loop
|   |   |
|   |   +-- Transaction
|   |       |
|   |       +-- Uniform random timer
|   |       +-- (1) HTTP request
|   |       |   |
|   |       |   +-- JSON extractor
|   |       |   +-- BeanShell Post processor
|   |       |
|   |       +-- (2) HTTP request
|   |
|   +-- Loop (... further loops, one per action)
|
+-- Summary Report
OK, so I figured it out. It all comes down to understanding the structure of the tests; diving into the docs really helped, as they are very detailed.
This is the relevant part:
Note that timers are processed before each sampler in the scope in which they are found; if there are several timers in the same scope, all the timers will be processed before each sampler. Timers are only processed in conjunction with a sampler. A timer which is not in the same scope as a sampler will not be processed at all. To apply a timer to a single sampler, add the timer as a child element of the sampler. The timer will be applied before the sampler is executed. To apply a timer after a sampler, either add it to the next sampler, or add it as the child of a Flow Control Action Sampler.
https://jmeter.apache.org/usermanual/component_reference.html#timers
Another extremely important thing to understand is that some building blocks (in relation to the tree structure) are hierarchical, some are ordered, and some are both. This is described in detail here: https://jmeter.apache.org/usermanual/test_plan.html#scoping_rules
All in all, my issue could be fixed either by putting the Uniform random timer as a child of the first HTTP call (1), so that it only affects that call, or by adding a Flow Control Action as a sibling after the second call (2) and adding the Uniform random timer as a child of that. Both placements are sketched below.
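To illustrate, my reading of those two fixes in the same tree notation:

Option 1 - timer scoped to (1) only:

Transaction
|
+-- (1) HTTP request
|   +-- Uniform random timer
|   +-- JSON extractor
|   +-- BeanShell Post processor
|
+-- (2) HTTP request

Option 2 - pause applied after (2):

Transaction
|
+-- (1) HTTP request
|   +-- JSON extractor
|   +-- BeanShell Post processor
|
+-- (2) HTTP request
|
+-- Flow Control Action
    +-- Uniform random timer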

How to nicely decouple paging and heap (dynamic memory management) functionalities in OS development?

Background
I am following James Molloy's OS tutorial to implement a toy operating system, in which I found the paging and heap code very involved because of their interdependence. For example, paging uses kmalloc, provided by the heap, because it needs to dynamically allocate space for the page table data structures and must have their virtual addresses. This is done in the following function call in paging.c:
dir->tables[table_idx] = (page_table_t*)kmalloc_ap(sizeof(page_table_t), &tmp);
Meanwhile, the heap relies on paging to allocate physical frames when it needs to grow, as can be seen in kheap.c:
alloc_frame( get_page(heap->start_address+i, 1, kernel_directory), (heap->supervisor)?1:0, (heap->readonly)?0:1);
They are tightly coupled together within the memory management module, like the following:
             other modules
              ^    ^    ^
              |    |    |
              v    v    v
+-----------------------------------------+
|            Memory Management            |
|                                         |
|   +------------------+                  |
|   |      paging      |                  |
|   |        +---------+--------+         |
|   |        |         |        |         |
|   +--------+---------+        |         |
|            |       heap       |         |
|            +------------------+         |
+-----------------------------------------+
Question
I am wondering whether it is possible to completely decouple paging and the heap. I expect it to be, because conceptually, I think:
Paging can be thought of as an address mapping/translation mechanism (probably plus physical frame allocation?).
The heap is about dynamic memory management.
Each seems pretty self-contained. Can they be implemented in a decoupled fashion, with only a unidirectional dependency, for example like the stacked TCP/IP protocols? A rough picture of what I have in mind is sketched below.
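For example, the kind of unidirectional layering I imagine (assuming physical frame allocation is split out into its own bottom layer) is:

+-----------------+
|      heap       |  dynamic memory management (kmalloc/kfree)
+--------+--------+
         |
         v
+-----------------+
|     paging      |  virtual-to-physical address mapping
+--------+--------+
         |
         v
+-----------------+
| frame allocator |  physical frame allocation
+-----------------+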

Splayed table upsert leading to error: `cast

I built a data loader prototype that saves CSV into splayed tables. The workflow is as follows:
Create the schema the first time, e.g. the volatilitysurface table:
volatilitysurface::([date:`datetime$(); ccypair:`symbol$()] atm_convention:`symbol$(); premium_included:`boolean$(); smile_type:`symbol$(); vs_type:`symbol$(); delta_ratio:`float$(); delta_setting:`float$(); wing_extrapolation:`float$(); spread_type:`symbol$());
For every file in the rawdata folder import it:
myfiles:#[system;"dir /b /o:gn ",string `$getenv[`KDBRAWDATA],"*.volatilitysurface.csv 2> nul";()];
if[myfiles~();.lg.o[`load;"no volatilitysurface files found!"];:0N];
.lg.o[`load;"loading data files ..."];
/ load each file
{
mypath:"" sv (string `$getenv[`KDBRAWDATA];x);
.lg.o[`load;"loading file name '",mypath,"' ..."];
myfile:hsym`$mypath;
tmp1:select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from ("ZSSSSSFFFS";enlist ",")0:myfile;
`volatilitysurface upsert tmp1;
} #/: myfiles;
delete tmp1 from `.;
.Q.gc[];
.lg.o[`done;"loading volatilitysurface data done"];
.lg.o[`save;"saving volatilitysurface schema to ",string afolder];
volatilitysurface::0!volatilitysurface;
.Q.dpft[afolder;`;`ccypair;`volatilitysurface];
.lg.o[`cleanup;"removing volatilitysurface from memory"];
delete volatilitysurface from `.;
.Q.gc[];
.lg.o[`done;"saving volatilitysurface schema done"];
This works perfectly. I use .Q.gc[] frequently to avoid hitting wsfull. When new CSV files are available, I open the existing schema, upsert into it and save it again, effectively overwriting the existing HDB on the file system.
Open schema:
.lg.o[`open;"tables already exists, opening the schema ..."];
#[system;"l ",(string afolder) _ 0;{.lg.e[`open;"failed to load hdb directory: ", x]; 'x}];
/ Re-create table index
volatilitysurface::`date`ccypair xkey select from volatilitysurface;
Re-run step #2 to append the new CSV files into the existing volatilitysurface table; it upserts the first CSV perfectly, but the second CSV fails with:
error: `cast
I debugged to the point of the error and, to double-check, I can see that the metadata of tmp1 and volatilitysurface are exactly the same. Any ideas why this is happening? I get the same issue with any other table. I have tried clearing the keys from the table after every upsert, but it doesn't help, i.e.
volatilitysurface::0!volatilitysurface;
volatilitysurface::`date`ccypair xkey volatilitysurface;
And the metadata comparison at the point of the cast error:
meta tmp1
c | t f a
------------------| -----
date | z
ccypair | s
atm_convention | s
premium_included | b
smile_type | s
vs_type | s
delta_ratio | f
delta_setting | f
wing_extrapolation| f
spread_type | s
meta volatilitysurface
c | t f a
------------------| -----
date | z
ccypair | s p
atm_convention | s
premium_included | b
smile_type | s
vs_type | s
delta_ratio | f
delta_setting | f
wing_extrapolation| f
spread_type | s
UPDATE Using the input from the answer below, I tried using TorQ's .loader.loadallfiles function like this (it doesn't fail, but nothing happens either: the table is not created in memory and the data is not written to the database):
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`dataprocessfunc!(`x`ccypair`atm_convention`premium_included`smile_type`vs_type`delta_ratio`delta_setting`wing_extrapolation`spread_type;"ZSSSSSFFFS";enlist ",";`volatilitysurface;`:hdb; {[p;t] select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from t}); `:rawdata]
UPDATE 2 This is the output I get from TorQ:
2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|**** LOADING :rawdata/20171102_113420.disccurve.csv ****
2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|reading in data chunk
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Read 10000 rows
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|processing data
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Enumerating
2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4525 rows to :hdb/2017.09.12/volatilitysurface/
2017.11.20D08:46:12.581819000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4744 rows to :hdb/2017.09.13/volatilitysurface/
2017.11.20D08:46:12.659823000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 731 rows to :hdb/2017.09.14/volatilitysurface/
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|init|retrieving sort settings from :C:/Dev/torq//config/sort.csv
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sort|sorting the volatilitysurface table
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters
2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sortfunction|sorting :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time
2017.11.20D08:46:12.753428000|wsp18497wn|dataloader|dataloader1|ERR|sortfunction|failed to sort :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time. The error was: hdb/2017.09.
I get the message sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters. Where is this sorttab behaviour documented? Does it use the table's primary key by default?
UPDATE 3 OK, I fixed the issue from UPDATE 2 by providing a non-default sort.csv under my config folder:
tabname,att,column,sort
default,p,sym,1
default,,time,1
volatilitysurface,,date,1
volatilitysurface,,ccypair,1
But now I see that if I call the function multiple times on the same files, it simply appends duplicated data instead of upserting it.
UPDATE 4 Still not there yet ... assuming I can check to make sure that no duplicate file is used. When I load and then start the database, I get back some structure that resembles some sort of dictionary and not a table.
2017.10.31| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.01| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.02| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
2017.11.03| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
sym | `AUDNOK`AUDCNH`AUDJPY`AUDHKD`AUDCHF`AUDSGD`AUDCAD`AUDDKK`CADSGD`C..
Note that date is actually a datetime (type Z) and not just a date. My full and latest version of the function invocation is:
target:hsym `$("" sv ("./";getenv[`KDBHDB];"/volatilitysurface"));
rawdatadir:hsym `$getenv[`KDBRAWDATA];
.loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(`x`ccypair`atm_convention`premium_included`smile_type`vs_type`delta_ratio`delta_setting`wing_extrapolation`spread_type;"ZSSSSSFFFS";enlist ",";`volatilitysurface;target;`date;{[p;t] select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from t}); rawdatadir];
I'm going to add a second answer here to try and tackle the question about using TorQ's data loader.
I'd like to clarify what output you are getting after running this function. There should be some logging messages output; can you post these? For example, when I run the function:
jmcmurray@homer ~/deploy/TorQ (master) $ q torq.q -procname loader -proctype loader -debug
<torq startup messages removed>
q).loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(c;"TSSFJFFJJBS";enlist",";`quotes;`:testdb;`date;{[p;t] select date:.z.d,time:TIME,sym:INSTRUMENT,BID,ASK from t});`:csvtest]
2017.11.17D15:03:20.312336000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140421.csv ****
2017.11.17D15:03:20.319110000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
2017.11.17D15:03:20.339414000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
2017.11.17D15:03:20.339463000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
2017.11.17D15:03:20.339519000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
2017.11.17D15:03:20.340061000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.341669000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140422.csv ****
2017.11.17D15:03:20.349606000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
2017.11.17D15:03:20.370793000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
2017.11.17D15:03:20.370858000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
2017.11.17D15:03:20.370911000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
2017.11.17D15:03:20.371441000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.460118000|homer.aquaq.co.uk|loader|loader|INF|init|retrieving sort settings from :/home/jmcmurray/deploy/TorQ/config/sort.csv
2017.11.17D15:03:20.466690000|homer.aquaq.co.uk|loader|loader|INF|sort|sorting the quotes table
2017.11.17D15:03:20.466763000|homer.aquaq.co.uk|loader|loader|INF|sorttab|No sort parameters have been specified for : quotes. Using default parameters
2017.11.17D15:03:20.466820000|homer.aquaq.co.uk|loader|loader|INF|sortfunction|sorting :testdb/2017.11.17/quotes/ by these columns : sym, time
2017.11.17D15:03:20.527216000|homer.aquaq.co.uk|loader|loader|INF|applyattr|applying p attr to the sym column in :testdb/2017.11.17/quotes/
2017.11.17D15:03:20.535095000|homer.aquaq.co.uk|loader|loader|INF|sort|finished sorting the quotes table
After all this, I can run \l testdb and there is a table called "quotes" containing my loaded data.
If you can post logging messages like these, it could be helpful to see what's going on.
UPDATE
"But now I see that if I call the function multiple times on the same files, it simply appends duplicated data instead of upserting it."
If I'm understanding the problem correctly, it sounds like you likely shouldn't call the function multiple times on the same files. Another process within TorQ could be useful here: the "file alerter". This process will monitor a directory for new & updated files, and can call a function on any that appear (so you can have it call the loader function with every new file automatically). It has a number of options, such as moving files after processing (so you can "archive" loaded CSVs).
Note that the file alerter requires that a function take exactly two parameters - the directory & the file name. This effectively means you will need a "wrapper" function around the loader function, which takes a dictionary & a directory. I don't think TorQ includes a function similar to .loader.loadallfiles for a single file, so it might be necessary to copy the target file to a temporary directory, run loadallfiles on that directory and then delete the file from there before loading the next.
`cast error refers to a value not being enumerated
I can't see any enumeration going on here; splayed tables on disk need to have their symbol columns enumerated. For example, this can be done with the following line, before calling .Q.dpft:
volatilitysurface:.Q.en[afolder;volatilitysurface];
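In the save step from the question, that would look roughly like this (a sketch that just reuses the question's own lines and variable names, with the enumeration added before the splay):
volatilitysurface::0!volatilitysurface;
volatilitysurface::.Q.en[afolder;volatilitysurface];  / enumerate symbol columns against the sym file in afolder
.Q.dpft[afolder;`;`ccypair;`volatilitysurface];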
You may like to consider using an example CSV loader for loading your data. One such example is included in TorQ, the kdb+ framework developed by AquaQ Analytics (as a disclaimer, I work for AquaQ).
The framework is available (free of charge) here: https://github.com/AquaQAnalytics/TorQ
The specific component you will likely be interested in is dataloader.q and is documented here: http://aquaqanalytics.github.io/TorQ/utilities/#dataloaderq
This script will handle everything necessary: loading all the files, enumerating, sorting on disk, applying attributes, etc., as well as using .Q.fsn to prevent running out of memory.

Pass control from one gen_fsm to another

I'm creating a generic Erlang server that should be able to handle hundreds of client connections concurrently. For simplicity, let's suppose that the server performs some basic computation for every client, e.g., addition or subtraction of every two values which the client provides.
As a starting point, I'm using this tutorial for basic TCP client-server interaction. An excerpt that represents the supervision tree:
+----------------+
| tcp_server_app |
+--------+-------+
         | (one_for_one)
         +--------------------------+
         |                          |
 +-------+------+           +-------+--------+
 | tcp_listener |           | tcp_client_sup |
 +--------------+           +-------+--------+
                                    | (simple_one_for_one)
                               +----+----------+
                              +-----+---------+|
                             +------+--------+|+
                             | tcp_echo_fsm  |+
                             +---------------+
I would like to extend this code and allow tcp_echo_fsm to pass control of the socket to one of two modules: tcp_echo_addition (to compute the addition of every two client values) or tcp_echo_subtraction (to compute the subtraction of every two client values).
The tcp_echo_fsm would choose which module to handle a socket based on the first message from the client, e.g., if the client sends <<start_addition>>, then it would pass control to tcp_echo_addition.
The previous diagram becomes:
+----------------+
| tcp_server_app |
+--------+-------+
         | (one_for_one)
         +--------------------------+
         |                          |
 +-------+------+           +-------+--------+
 | tcp_listener |           | tcp_client_sup |
 +--------------+           +-------+--------+
                                    | (simple_one_for_one)
                               +----+----------+
                              +-----+---------+|
                             +------+--------+|+
                             | tcp_echo_fsm  |+
                             +---------------+
                                     |
                                     |
                      +--------------+--------------+
                      |                             |
              +-------+-----------+         +-------+--------------+
              | tcp_echo_addition |         | tcp_echo_subtraction |
              +-------------------+         +----------------------+
My questions are:
Am I on the right path? Is the tutorial which I'm using a good starting point for a scalable TCP server design?
How can I pass control from one gen_fsm (namely, tcp_echo_fsm) to another gen_fsm (either tcp_echo_addition or tcp_echo_subtraction)? Or better yet: is this a correct/clean way to design the server?
This related question suggests that passing control between a gen_fsm and another module is not trivial and there might be something wrong with this approach.
For 2, you can use gen_tcp:controlling_process/2 to pass control of the tcp connection: http://erlang.org/doc/man/gen_tcp.html#controlling_process-2.
For 1, I am not sure of the value of spawning a new module as opposed to handling the subtraction and addition logic as part of the defined states in your finite state machine. Doing so creates code which is now running outside of your supervision tree, so it's harder to handle errors and restarts. Why not define addition and subtraction as different states within your state machine and handle that logic within those two states?
You can create tcp_echo_fsm:subtraction_state/2,3 and tcp_echo_fsm:addition_state/2,3 to handle this logic and use your first message to transition to the appropriate state, rather than adding complexity to your application. A rough sketch of what I mean follows.
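For example, a rough, untested sketch of that approach (module, state and message names are made up for illustration, and the socket handling from the tutorial is left out):
-module(calc_fsm_sketch).
-behaviour(gen_fsm).

-export([start_link/0]).
-export([init/1, wait_for_mode/2, addition_state/2, subtraction_state/2,
         handle_event/3, handle_sync_event/4, handle_info/3,
         terminate/3, code_change/4]).

start_link() ->
    gen_fsm:start_link(?MODULE, [], []).

init([]) ->
    {ok, wait_for_mode, undefined}.

%% The first client message selects the mode; no extra module is needed.
wait_for_mode({data, <<"start_addition">>}, _Pending) ->
    {next_state, addition_state, undefined};
wait_for_mode({data, <<"start_subtraction">>}, _Pending) ->
    {next_state, subtraction_state, undefined}.

%% Remember the first value, combine it with the second, then start over.
addition_state({data, V}, undefined) ->
    {next_state, addition_state, V};
addition_state({data, V}, Prev) ->
    io:format("result: ~p~n", [Prev + V]),
    {next_state, addition_state, undefined}.

subtraction_state({data, V}, undefined) ->
    {next_state, subtraction_state, V};
subtraction_state({data, V}, Prev) ->
    io:format("result: ~p~n", [Prev - V]),
    {next_state, subtraction_state, undefined}.

%% Boilerplate callbacks required by the gen_fsm behaviour.
handle_event(_Event, StateName, StateData) -> {next_state, StateName, StateData}.
handle_sync_event(_Event, _From, StateName, StateData) -> {reply, ok, StateName, StateData}.
handle_info(_Info, StateName, StateData) -> {next_state, StateName, StateData}.
terminate(_Reason, _StateName, _StateData) -> ok.
code_change(_OldVsn, StateName, StateData, _Extra) -> {ok, StateName, StateData}.
Events would then be sent with gen_fsm:send_event/2, e.g. gen_fsm:send_event(Pid, {data, <<"start_addition">>}) followed by gen_fsm:send_event(Pid, {data, 5}) and so on.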