Monitoring system integrity by analyzing proof of work performance - t-test

So I have some code that spawns a number of processes which generate pseudo-random sequences and hash them, checking whether each hash meets some criterion; for each passing hash it saves the hash, the random seed used, and the time it took to generate a passing sequence from that seed. My criterion is that the first 8 hex characters of the resulting SHA-256 hash are the same. I saw some strange output where the durations were roughly the same for a number of results, so I checked those durations by re-running the random seeds. Upon re-running, the times were much shorter (<1000 seconds, when before they were >5000 seconds). This seems like a red flag for system integrity, but what to do about that is a separate question.
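For reference, here is a worked sketch of what honest durations should look like. It assumes (this is not stated in the post) that "the first 8 hex characters are the same" means all eight equal one another, that each hash is an independent trial with success probability p, and that a process computes hashes at a constant rate R per second, so the number of attempts K and the duration T are:

```latex
\begin{aligned}
p &= 16 \cdot 16^{-8} = 16^{-7} = 2^{-28} \approx 3.73 \times 10^{-9} \\
K &\sim \mathrm{Geometric}(p), \qquad \mathbb{E}[K] = 1/p = 2^{28} \approx 2.68 \times 10^{8} \\
T &= K/R \approx \mathrm{Exponential}\ \text{with mean}\ \mu_0 = \frac{1}{pR}, \qquad \mathrm{sd}(T) \approx \mu_0
\end{aligned}
```

Under these assumptions each duration is a waiting time whose standard deviation is roughly equal to its mean, so a run of nearly identical durations is itself unusual. Re-running a seed repeats exactly the same K attempts, so it should take about the same time as the original run if the hash rate has not changed.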
I want to perform a Student's t-test on the distribution of the n most recent durations so that I can trigger a validation process that re-runs the seeds to check whether their time to completion changes. What distribution should I test against, and what is a good n (how many samples should I examine)?
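A minimal sketch of such a check in base MATLAB (no toolboxes), assuming p = 2^-28 for the criterion and a hypothetical, previously measured hash rate hashRate: a one-sample t-test of the n most recent durations against the expected mean mu0 = 1/(p*hashRate). The durations below are simulated placeholders, and alpha is a hypothetical threshold. The two-sided p-value uses the standard identity between Student's t distribution and the regularized incomplete beta function (betainc). Because waiting times are skewed, the t-test is only approximate and relies on n being large enough for the sample mean to be roughly normal.

```matlab
% One-sample t-test of recent proof-of-work durations against the expected mean.
% Assumptions (not from the original post): all first 8 hex characters equal
% (p = 2^-28) and a hypothetical, previously measured hash rate per process.
p = 2^-28;                  % success probability per hash (assumed criterion)
hashRate = 1e5;             % hypothetical hashes per second per process
mu0 = 1 / (p * hashRate);   % expected seconds per passing hash (~2684 s)

% Placeholder data: n simulated exponential durations with mean mu0.
% Replace with the n most recent logged durations.
rng(42);
n = 30;
durations = -mu0 * log(rand(n, 1));   % inverse-CDF sampling of an exponential

% One-sample t statistic with n-1 degrees of freedom
xbar = mean(durations);
sd = std(durations);
t = (xbar - mu0) / (sd / sqrt(n));
nu = n - 1;

% Two-sided p-value: P(|T| > |t|) = I_{nu/(nu+t^2)}(nu/2, 1/2),
% where I is the regularized incomplete beta function (betainc).
pValue = betainc(nu / (nu + t^2), nu / 2, 0.5);

alpha = 0.01;                          % hypothetical significance level
triggerValidation = pValue < alpha;    % true -> re-run these seeds
fprintf('mean = %.1f s (expected %.1f s), t = %.3f, p = %.4f\n', ...
        xbar, mu0, t, pValue);
```

With the Statistics and Machine Learning Toolbox, ttest(durations, mu0) performs the same two-sided one-sample test.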

Related

How to Make a Code Run Multiple Times with Random Numbers in MATLAB

I have code that generates a random number, and the rest of the calculations are based on that number. I want to run this 100 times with 100 different random numbers, but obviously I don't want to run it manually and write down the final answer 100 times. Would I use a loop to get it to do this on its own? I am only familiar with writing loops where separate numbers are set for each run, like running a calculation with the numbers 1-100 or pulling numbers out of a vector. I am not sure how to write one that runs a set number of times and generates a new random number each time.
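A loop does this directly: the loop counter only counts the runs, and rand is called again on every iteration, so each run gets a new number. A minimal sketch, where the squared-plus-one line is a placeholder for your own calculation:

```matlab
% Repeat a calculation nRuns times with a new random number on each run.
nRuns = 100;
randomInputs = zeros(nRuns, 1);   % preallocate: one input per run
results = zeros(nRuns, 1);        % preallocate: one result per run

for k = 1:nRuns
    x = rand();                   % fresh uniform random number in (0,1)
    randomInputs(k) = x;          % keep the input so a run can be checked later
    results(k) = x^2 + 1;         % placeholder for "the rest of the calculations"
end

disp(mean(results));
```

Storing the inputs alongside the results lets you inspect any individual run afterwards, and preallocating with zeros avoids growing the arrays inside the loop.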

matlab: different instances start with the same random seed

Using MATLAB, I am trying to use a computer cluster to perform 100 repetitions of a calculation that is inherently stochastic. Each repetition should run the same code, but with a different random seed.
It seems that
rng('shuffle')
recommended by the documentation may not achieve this if all jobs start running at the same time (on different machines), as the seed used is an integer that seems to be initialized from the time (it is monotonically increasing, with what seems like a precision of 1/100th of a second).
The precision seems reasonable, but "collisions" are still very likely when running 100-1000 instances at the same time, which would corrupt the statistical interpretation of the results as independent.
Any way to avoid such collisions without manually giving each instance an "instance id" used as seed?
Whatever seed you choose, it can only be a 32-bit value, even if it initializes a generator with a much larger state, such as the Mersenne Twister ('twister', 19937 bits). There are certain issues with 32-bit seeds, as discussed in "C++ Seeding Surprises" by M. O'Neill. Presumably, the time-based seeds are likewise 32 bits long. A short seed means that only a limited number of pseudorandom sequences can be generated (at most 2^32 for a 32-bit seed).
It appears that rng doesn't support seeds longer than 32 bits. On the other hand, recent versions of MATLAB support random number streams, which are designed, among other things, for when you "want separate sources of randomness in a simulation". For your purposes, choose a generator that supports multiple streams, such as mrg32k3a, and create random number streams as follows (see also "Multiple Streams"):
[stream1, stream2]=RandStream.create('mrg32k3a','NumStreams',2)
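On a cluster, each instance can create its stream with the same seed but a different stream index, so no two instances share a sequence. A sketch, assuming the scheduler provides a 1-based task index through an environment variable; SLURM_ARRAY_TASK_ID (for a Slurm job array such as --array=1-100) is only an example, and the code falls back to index 1 when it is not set:

```matlab
% Each cluster task draws from its own mrg32k3a stream.
numTasks = 100;                           % total number of repetitions
taskStr = getenv('SLURM_ARRAY_TASK_ID');  % example variable; scheduler-specific
if isempty(taskStr)
    taskIdx = 1;                          % fallback when run outside the scheduler
else
    taskIdx = str2double(taskStr);        % assumed to be 1-based (1..numTasks)
end

% Same seed for every task; only the stream index differs.
s = RandStream.create('mrg32k3a', 'NumStreams', numTasks, ...
                      'StreamIndices', taskIdx, 'Seed', 0);
RandStream.setGlobalStream(s);            % rand/randn/randi now use stream taskIdx

x = rand(1, 5);                           % numbers specific to this task's stream
```

Because mrg32k3a streams with different indices are designed not to overlap, this avoids relying on time-based seeds that may collide.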
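I usually try to get some serial number from the machine or hard drive, e.g.
dos('wmic bios get serialnumber')
or
dos('wmic cpu')
The ProcessorId (e.g. "BFEBFBFF000506E3") is another value that could be used, although it encodes CPU feature and model information rather than identifying the individual chip, so machines with the same CPU model may report the same value. Since the nodes are likely multicore, you may also need to use NumberOfCores to split the work and give each core a different seed.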

Matlab random number rng: choosing a seed

I would like to know more precisely what happens when you choose a custom seed in MATLAB, e.g.:
rng(101)
From my (limited, but nevertheless existing) understanding of how pseudo-random number generators work, one can see the seed conceptually as choosing a position in a "very long list of pseudo-random numbers".
Question: let's say that (in my MATLAB script) I choose rng(100) for my first computation (a sequence of instructions) and then rng(1e6) for my second. Please note that each computation involves generating up to about 300k random numbers.
Does that ensure there is no overlap between the sequence in the "list" starting at 100 and ending around 300k and the one starting at 1e6 and ending at 1,300,000? (The idea of "no overlap" comes from the fact that rng(100) and rng(1e6) are separated by much more than 300k.)
That is, are these 2 "independent" sequences? (As far as I remember, this "long list" would be generated by a special PRNG algorithm, most likely involving modular arithmetic.)
No, that is not the case. The mapping between the seed and the "position" in the list of generated numbers is not linear; you could interpret it as a hash or one-way function. It could happen that we get the same sequence of numbers shifted by one position (but this is very unlikely).
By default, MATLAB uses the Mersenne Twister (source).
Not quite. The seed you give to rng is used to initialize the Mersenne Twister algorithm (the default generator) that produces the pseudorandom numbers. If you choose two different seeds (regardless of their relative non-negative integer values, except perhaps for a special case or two), you will have effectively independent pseudorandom number streams.
For "99%" of people, the main uses of seeding the rng are the 'shuffle' argument (to use a non-default, time-based seed that helps ensure independence of the numbers generated across multiple sessions) or one particular fixed seed (to be able to reproduce the same pseudorandom stream later). If you try to finesse the seeds further without being extremely careful, you are more likely to cause problems than to help.
RandStream can be used to break off separate streams of pseudorandom numbers if that really matters for your application (it likely doesn't).
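A short sketch of both points: the same seed always reproduces the same numbers, different seeds give different sequences (without a formal non-overlap guarantee), and substreams of a single mrg32k3a RandStream give separate blocks for the two computations that can each be replayed:

```matlab
% Same seed -> same numbers; different seeds -> different numbers.
rng(100); a1 = rand(1, 3);
rng(100); a2 = rand(1, 3);   % identical to a1
rng(1e6); b  = rand(1, 3);   % a different sequence (no formal non-overlap guarantee)

% Separate, replayable blocks via substreams of one mrg32k3a stream.
s = RandStream('mrg32k3a', 'Seed', 0);
s.Substream = 1;
first = rand(s, 1, 3);       % numbers for computation 1
s.Substream = 2;
second = rand(s, 1, 3);      % numbers for computation 2
s.Substream = 1;             % jumps back to the start of substream 1
again = rand(s, 1, 3);       % identical to first
```

Drawing from a local stream s like this also leaves the global generator used by plain rand calls untouched.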

One-time randomization

I have a matrix, ECGsig, in which each row contains a 1-second-long ECG signal. I will classify them later, but first I want to randomly shuffle the rows, like:
idx = randperm(size(ECGsig,1));
ECGsig = ECGsig(idx,:);
However, I want this to happen just once, not every time I run the program; in other words, I want the random numbers to be generated only once, because if they change every time I would get different classification results.
Is there any way to do this besides doing it in a separate m-file and saving the result in a MAT-file?
You can set the random number generator's seed so that every time you run the program, it produces the same random result. You can do this through rng. This way, even though you run the program multiple times, it will still generate the same random sequence. As such, try something like:
rng(1234);
The input to rng is the seed. However, as per Luis Mendo's comment, rng is only available in newer versions of MATLAB. If rng is not available in your version of MATLAB, do this instead:
rand('seed', 1234);
You can also take a look at RandStream, but that's a bit too advanced, so let's not look at it right now. To go back to the seed MATLAB uses at startup, choose a seed of 0 (rng('default') additionally restores the default generator type):
rng(0); %// or
rand('seed', 0);
By calling this, any random results you generate from this point on will follow a predetermined order. The seed can be any non-negative integer below 2^32, but use one that you'll remember. Place this at the very beginning of your code, before you do anything else. The main reason we have control over how random numbers are generated is that it enables reproducible results and research: other people can regenerate the results you have created if your work involves random numbers or randomization.
Even though you said you only want to run this randomization once, this will save you the headache of saving your results to a different file before you run the program multiple times. By setting the seed, even though you're running the program multiple times, you're guaranteed to generate the same random sequence each time.
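If the rest of the program should still get fresh random numbers, one option is to fix the seed only around the shuffle and then restore the previous generator settings. A sketch using only rng and randperm, where ECGsig is a small placeholder matrix standing in for the real signals:

```matlab
% Reproducible row shuffle that does not disturb later random numbers.
ECGsig = reshape(1:20, 5, 4);      % placeholder: 5 "signals", 4 samples each

prevSettings = rng;                % remember the current generator settings
rng(1234);                         % fixed seed -> same permutation every run
idx = randperm(size(ECGsig, 1));
rng(prevSettings);                 % restore the generator for the rest of the code

ECGsig = ECGsig(idx, :);           % rows in the same shuffled order on every run
```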

Uniform Random Number blocks in my simulation model

I've used 2 Uniform Random Number blocks in my simulation model, but every time I run the program they generate the same numbers as in the last run. I need to test the model with newly generated numbers. What should I do?
The fact that random number generators generate the same random numbers "from the start" is a feature, not a bug. It allows for reproducible testing. You need to initialize your random number generator with a "random seed" in order to give a different result each time - you could use the current time, for example. When you do, it is recommended that you store the seed used - it means you can go back and run exactly the same code again.
For initializing a random seed, you can use the methods given in this earlier answer
In that answer, they are setting the seed to 0 - this is the opposite of what you are trying to do. You will want to use a seed that changes between runs (like one based on the current date and time). A very useful article can be found here. To quote:
If you look at the output from rand, randi, or randn in a new MATLAB session, you'll notice that they return the same sequences of numbers each time you restart MATLAB. It's often useful to be able to reset the random number generator to that startup state, without actually restarting MATLAB. For example, you might want to repeat a calculation that involves random numbers, and get the same result.
They recommend the command
rng shuffle
to generate a new random seed. rng called with no arguments returns the current settings as a structure, so you can access the seed that was used with
s = rng;
s.Seed
and store it for future use. So if you call:
rng shuffle
s = rng;
seedStore = s.Seed;
Then next time you want to reproduce results, you set
rng(seedStore);
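Storing only the seed assumes the generator type is the same when you reproduce the run. A sketch that instead saves the whole settings structure returned by rng (type, seed, and state) to a MAT-file and restores it later; the tempname file location and the temporary switch to 'combRecursive' are only for illustration:

```matlab
% Save the complete generator settings and restore them later.
rng('shuffle');                          % time-based seed
savedSettings = rng;                     % struct with Type, Seed, and State
settingsFile = [tempname '.mat'];        % illustrative file location
save(settingsFile, 'savedSettings');
firstRun = rand(1, 3);

% ... later, even if the generator type has been changed in the meantime:
rng(5, 'combRecursive');                 % some unrelated generator setting
loaded = load(settingsFile);
rng(loaded.savedSettings);               % restores type, seed, and state
secondRun = rand(1, 3);                  % same values as firstRun
delete(settingsFile);
```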