How to write a local branch predictor? - cpu-architecture

I am trying to use runspec test my local branch predictor, but only find a disappointing result.
By now I have tried use a 64 terms LHT, and when the LHT is full, I use FIFO tactics replace a terms in LHT.I don't know if I use a tiny LHT or my improper replacement tactics makes it a terrible precision, anyway it's only 60.9095.
for (int i = 0; i < 1 << HL; i++)
{
if (tag_lht[i] == (addr&(1-(1<<HL))))
{
addr = addr ^ LHT[i].getVal();
goto here;
break;
}
}
index_lht = index_lht%(1<<HL);
tag_lht[index_lht] = (addr&(1-(1<<HL)));
LHT[index_lht] = ShiftReg<2>();
addr = addr ^ LHT[index_lht].getVal();
index_lht++;
here:
for (int i = 0; i < 1 << L; i++)
{
if (tag[i] == (addr))
{
return bhist[i].isTaken();
}
}
index = index % (1 << L);
tag[index] = (addr);
bhist[index].reset();
return bhist[index++].isTaken();
Here I make some explain about the code. bhist is a table store 2-bit status about each branch instructions when the table is full, use FIFO replacement tactics. tag is where the table store address of each instruction. Besides, likely I use tag_lht to store address of each instruction that stored in LHT. Function isTaken() can easily get the predict result.

Thank you all guys, I find that stupid mistake I make, and the code above is correct, but may not seem work prefect. The mistake bellow:
for (int i = 0; i < (1 << L); i++)
{
if (tag[i] == (addr))
{
if (takenActually)
{
LHT[j].shiftIn(1);
bhist[i].increase();
}
else
{
LHT[j].shiftIn(0);
bhist[i].decrease();
}
}
break;
}
But it should be like this:
for (int i = 0; i < (1 << L); i++)
{
if (tag[i] == (addr))
{
if (takenActually)
{
LHT[j].shiftIn(1);
bhist[i].increase();
}
else
{
LHT[j].shiftIn(0);
bhist[i].decrease();
}
break;
}
}
I am so stupid that I waste you helpful people' s time, I spent so much time to figure out why it don't work, at first I thought that wrong variable or argument are used, now I just think I am a careless man.
Again I thank all you ardent fellows. Then I will answer the question with my full code.
PS. wish that my terrible English have not confuse anyone.:)

Related

Bad address error when comparing Strings within BPF

I have an example program I am running here to see if the substring matches the string and then print them out. So far, I am having trouble running the program due to a bad address. I am wondering if there is a way to fix this problem? I have attached the entire code but my problem is mostly related to isSubstring.
#include <uapi/linux/bpf.h>
#define ARRAYSIZE 64
struct data_t {
char buf[ARRAYSIZE];
};
BPF_ARRAY(lookupTable, struct data_t, ARRAYSIZE);
//char name[20];
//find substring in a string
static bool isSubstring(struct data_t stringVal)
{
char substring[] = "New York";
int M = sizeof(substring);
int N = sizeof(stringVal.buf) - 1;
/* A loop to slide pat[] one by one */
for (int i = 0; i <= N - M; i++) {
int j;
/* For current index i, check for
pattern match */
for (j = 0; j < M; j++)
if (stringVal.buf[i + j] != substring[j])
break;
if (j == M)
return true;
}
return false;
}
int Test(void *ctx)
{
#pragma clang loop unroll(full)
for (int i = 0; i < ARRAYSIZE; i++) {
int k = i;
struct data_t *line = lookupTable.lookup(&k);
if (line) {
// bpf_trace_printk("%s\n", key->buf);
if (isSubstring(*line)) {
bpf_trace_printk("%s\n", line->buf);
}
}
}
return 0;
}
My python code here:
import ctypes
from bcc import BPF
b = BPF(src_file="hello.c")
lookupTable = b["lookupTable"]
#add hello.csv to the lookupTable array
f = open("hello.csv","r")
contents = f.readlines()
for i in range(0,len(contents)):
string = contents[i].encode('utf-8')
print(len(string))
lookupTable[ctypes.c_int(i)] = ctypes.create_string_buffer(string, len(string))
f.close()
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="Test")
b.trace_print()
Edit: Forgot to add the error: It's really long and can be found here: https://pastebin.com/a7E9L230
I think the most interesting part of the error is near the bottom where it mentions:
The sequence of 8193 jumps is too complex.
And a little bit farther down mentions: Bad Address.
The verifier checks all branches in your program. Each time it sees a jump instruction, it pushes the new branch to its “stack of branches to check”. This stack has a limit (BPF_COMPLEXITY_LIMIT_JMP_SEQ, currently 8192) that you are hitting, as the verifier tells you. “Bad Address” is just the translation of kernel's errno value which is set to -EFAULT in that case.
Not sure how to fix it though, you could try:
With smaller strings, or
On a 5.3+ kernel (which supports bounded loops): without unrolling the loop with clang (I don't know if it would help).

Dataset or Statistics Mode

I expected this question to have a really straightforward answer but can't seem to find anything in the documentation.
I need to be able to get the mode of a dataset or a statistics element. All I can find is min, max, mean and median.
Is there no function that gives the mode?
Thank you
I ended up using the below function, but I still find it weird that the mode function is not built-in...
int maxCount=0;
for(int i=0; i<QueueLengthDailyMaq.size(); ++i)
{
int count=0;
for(int j=0; j<QueueLengthDailyMaq.size(); ++j)
{
if(QueueLengthDailyMaq.getY(j) == QueueLengthDailyMaq.getY(i))
{
count++;
}
if(count > maxCount)
{
maxCount = count;
mode = QueueLengthDailyMaq.getY(i);
}
}
}

CS50 pset 3: Tideman sort_pairs function

I need some assistance in understanding the logic behind this function. This is my current sort_pairs function in Tideman:
// Sort pairs in decreasing order by the strength of victory
void sort_pairs(void)
{
qsort(pairs, pair_count, sizeof(pair), compare);
return;
}
// Function for sort_pairs
int compare(const void *a, const void *b)
{
const pair *p1 = (const pair *) a;
const pair *p2 = (const pair *) b;
if (p1->winner < p2->winner)
{
return -1;
}
else if (p1->winner > p2->winner)
{
return 1;
}
else
{
return 0;
}
}
This does not clear check50 and I looked online to find how to approach this problem. It seems that most functions compare the values from the preferences array instead (eg preferences[pairs[i].winner][pairs[i].loser]) . My previous functions vote, record_preferences, and add_pairs all clear check50. I have not advanced beyond sort_pairs yet.
Why can't I compare the strength of victory directly from the pairs array instead since I already have the data stored there?
You don't need to make this so complex, you can use your own sorting here. Let's try a simple insertion sort-
void sort_pairs()
{
pair temp;
for (int i = 1, j; i < pair_count; i++)
{
temp = pairs[i];
j = i - 1;
for (; j >= 0 && preferences[pairs[j].winner][pairs[j].loser] < preferences[temp.winner][temp.loser]; j--)
{
pairs[j + 1] = pairs[j];
}
pairs[j + 1] = temp;
}
}
The pair struct looks like-
typedef struct
{
int winner;
int loser;
}
pair;
Explanation:-
We go through each pair of elements inside the pairs array - starting at 1 since I'm going to compare with the previous element (j = i - 1)
Now we check all the previous elements from the current element and compare them with the key - preferences[pairs[INDEX].winner][pairs[INDEX].loser]
This is the key you should be sorting by. preferences[WINNER_ID][LOSER_ID] means the amount of people that prefer WINNER_ID over LOSER_ID.
And that's pretty much it!, it's simply a insertion sort but the key is the important part.

order of execution of forked processes

#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/sem.h>
#include<sys/ipc.h>
int sem_id;
void update_file(int number)
{
struct sembuf sem_op;
FILE* file;
printf("Inside Update Process\n");
/* wait on the semaphore, unless it's value is non-negative. */
sem_op.sem_num = 0;
sem_op.sem_op = -1; /* <-- Amount by which the value of the semaphore is to be decreased */
sem_op.sem_flg = 0;
semop(sem_id, &sem_op, 1);
/* we "locked" the semaphore, and are assured exclusive access to file. */
/* manipulate the file in some way. for example, write a number into it. */
file = fopen("file.txt", "a+");
if (file) {
fprintf(file, " \n%d\n", number);
fclose(file);
}
/* finally, signal the semaphore - increase its value by one. */
sem_op.sem_num = 0;
sem_op.sem_op = 1;
sem_op.sem_flg = 0;
semop( sem_id, &sem_op, 1);
}
void write_file(char* contents)
{
printf("Inside Write Process\n");
struct sembuf sem_op;
sem_op.sem_num = 0;
sem_op.sem_op = -1;
sem_op.sem_flg = 0;
semop( sem_id, &sem_op, 1);
FILE *file = fopen("file.txt","w");
if(file)
{
fprintf(file,contents);
fclose(file);
}
sem_op.sem_num = 0;
sem_op.sem_op = 1;
sem_op.sem_flg = 0;
semop( sem_id, &sem_op, 1);
}
int main()
{
//key_t key = ftok("file.txt",'E');
sem_id = semget( IPC_PRIVATE, 1, 0600 | IPC_CREAT);
/*here 100 is any arbit number to be assigned as the key of the
semaphore,1 is the number of semaphores in the semaphore set, */
if(sem_id == -1)
{
perror("main : semget");
exit(1);
}
int rc = semctl( sem_id, 0, SETVAL, 1);
pid_t u = fork();
if(u == 0)
{
update_file(100);
exit(0);
}
else
{
wait();
}
pid_t w = fork();
if(w == 0)
{
write_file("Hello!!");
exit(0);
}
else
{
wait();
}
}
If I run the above code as a c code, the write_file() function is called after the update_file () function
Whereas if I run the same code as a c++ code, the order of execution is reverse... why is it so??
Just some suggestions, but it looks to me like it could be caused by a combination of things:
The wait() call is supposed to take a pointer argument (that can
be NULL). Compiler should have caught this, but you must be picking
up another definition somewhere that permits your syntax. You are
also missing an include for sys/wait.h. This might be why the
compiler isn't complaining as I'd expect it to.
Depending on your machine/OS configuration the fork'd process may
not get to run until after the parent yields. Assuming the "wait()"
you are calling isn't working the way we would be expecting, it is
possible for the parent to execute completely before the children
get to run.
Unfortunately, I wasn't able to duplicate the same temporal behavior. However, when I generated assembly files for each of the two cases (C & C++), I noticed that the C++ version is missing the "wait" system call, but the C version is as I would expect. To me, this suggests that somewhere in the C++ headers this special version without an argument is being #defined out of the code. This difference could be the reason behind the behavior you are seeing.
In a nutshell... add the #include, and change your wait calls to "wait(0)"

Atomically setting a variable without comparing first

I've been reading up on and experimenting with atomic memory access for synchronization, mainly for educational purposes. Specifically, I'm looking at Mac OS X's OSAtomic* family of functions. Here's what I don't understand: Why is there no way to atomically set a variable instead of modifying it (adding, incrementing, etc.)? OSAtomicCompareAndSwap* is as close as it gets -- but only the swap is atomic, not the whole function itself. This leads to code such as the following not working:
const int N = 100000;
void* threadFunc(void *data) {
int *num = (int *)data;
// Wait for main thread to start us so all spawned threads start
// at the same time.
while (0 == num) { }
for (int i = 0; i < N; ++i) {
OSAtomicCompareAndSwapInt(*num, *num+1, num);
}
}
// called from main thread
void test() {
int num = 0;
pthread_t threads[5];
for (int i = 0; i < 5; ++i) {
pthread_create(&threads[i], NULL, threadFunc, &num);
}
num = 1;
for (int i = 0; i < 5; ++i) {
pthread_join(threads[i], NULL);
}
printf("final value: %d\n", num);
}
When run, this example would ideally produce 500,001 as the final value. However, it doesn't; even when the comparison in OSAtomicCompareAndSwapInt in thread X succeeds, another thread Y can come in set the variable first before X has a chance to change it.
I am aware that in this trivial example I could (and should!) simply use OSAtomicAdd32, in which case the code works. But, what if, for example, I wanted to set a pointer atomically so it points to a new object that another thread can then work with?
I've looked at other APIs, and they seem to be missing this feature as well, which leads me to believe that there is a good reason for it and my confusion is just based on lack of knowledge. If somebody could enlighten me, I'd appreciate it.
I think that you have to check the OSAtomicCompareAndSwapInt result to guarantee that the int was actually set.