How do BigNum implementations work?

I wanted to know how BigInteger and other such classes are implemented. I tried to check out the Java source code, but it was all Greek to me.
Can you please explain the algorithm in words, with no code, so that I understand what I am actually using when I use something from the Java API?

Conceptually, the same way you do arbitrary-size arithmetic by hand. You have something like an array of values, and algorithms for the various operations that work on the array.
Say you want to add 100 to 901. You start with the two numbers as arrays:
[0, 1, 0, 0]
[0, 9, 0, 1]
When you add, your addition algorithm starts from the right: 0+1 gives 1, 0+0 gives 0, and -- now the tricky part -- 9+1 gives 10, so we need to carry: we add 1 to the next column over and put (9+1) % 10 = 0 into the third column.
When your numbers grow big enough -- greater than 9999 in this example -- then you have to allocate more space somehow.
This is, of course, somewhat simplified if you store the numbers in reverse order.
Real implementations use full words, so the modulus is really some large power of two, but the concept is the same.
There's a very good section on this in Knuth.
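As a rough sketch of that addition algorithm (in Python rather than Java, purely for illustration), with digits stored least-significant first so the carry loop stays simple:

def add_digits(a, b, base=10):
    # a and b are lists of digits, least-significant digit first.
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        da = a[i] if i < len(a) else 0
        db = b[i] if i < len(b) else 0
        total = da + db + carry
        result.append(total % base)   # digit that stays in this column
        carry = total // base         # digit carried into the next column
    if carry:
        result.append(carry)          # the number grew by one digit
    return result

# 100 + 901, digits in reverse order: [0, 0, 1] + [1, 0, 9] -> [1, 0, 0, 1], i.e. 1001
print(add_digits([0, 0, 1], [1, 0, 9]))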

How can I address the SHA3 state vector in programming terms?

I've been working on an implementation of SHA3, and I'm getting a bit muddled on this particular aspect of the algorithm. The addressing scheme of the state vector is given by a diagram in the specification (not reproduced here), which places the (0, 0) lane at the centre of the 5×5 grid rather than in a corner.
My issue with the above is: How does one go about addressing this in terms of actual code? I am using a 3 dimensional array to express the state vector, but this leads to obvious issues since the conventional mapping of an array (0 index is first) differs from the above convention used in SHA3.
For example, if I wanted to address the (0,0,0) bit in the SHA3 state array, the following expression would achieve this:
state_vector[2][2][0]
I find this highly cumbersome, however, because when implementing the actual round algorithms, the intended x and y values do not directly map to the array indices. Addressing state_vector[0][0][0] would return the very first index in the array instead of the (0,0,0) bit in the SHA3 state array.
Is there a way I can get around this in code?
Sorry, I know this is probably a stupid question.
The way this is customarily implemented is as a 5×5 array of 64-bit words, as a flat array of 25 64-bit words, or, if you believe your architecture (say, AArch64) will have a lot of registers, as 25 individual 64-bit words. (I prefer the second option because it's simpler to work with.) Typically the words are indeed stored in the usual array order, and one simply rewrites things accordingly.
Usually this isn't a problem, because the operations are specified in terms of words in relation to each other, such as in the theta and chi steps. It's common to code rho and pi together so that each step reads a word, rotates it, and stores it in the destination word, and in that case you can simply reorder the rotation constants as you need to.
If you want to get very fancy, you can write this as a SIMD implementation, but I think it's easier to see how it works in a practical implementation if you write it as a one- or two-dimensional array of words first.
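For illustration, here is a minimal sketch of the flat-array option in Python, assuming the common convention that lane (x, y) lives at index x + 5*y and that bit z is bit z of that 64-bit word (this mapping is an assumption; adjust it to whichever diagram convention you follow):

def lane_index(x, y):
    # Map SHA3 lane coordinates (x, y) to an index into a flat array of 25 words.
    return x + 5 * y

def get_bit(state, x, y, z):
    # state is a list of 25 integers, each treated as a 64-bit lane.
    return (state[lane_index(x, y)] >> z) & 1

def set_bit(state, x, y, z, bit):
    word = state[lane_index(x, y)]
    word &= ~(1 << z)            # clear bit z
    word |= (bit & 1) << z       # write the requested value
    state[lane_index(x, y)] = word

state = [0] * 25                 # all-zero state
set_bit(state, 0, 0, 0, 1)
print(get_bit(state, 0, 0, 0))   # 1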

How does encoding as a bit string in Genetic algorithm helpful?

Suppose we have to solve a global optimization problem in which we have to find the values of 5 variables, all of which are integers. Assume we get the following two parent chromosomes:
Parent 1: 6, 10, 3, 5, 12
Parent 2: 12, 10, 3, 8, 11
If we do crossover after the first 2 elements, we get the following:
Child 1: 6, 10, 3, 8, 11
Child 2: 12, 10, 3, 5, 12
Here we can clearly see the children are related to parents.
But when we encode as bit strings, then each chromosome is encoded as a single string of bits and we can, at random, choose any point for crossover. I do not see how this is any more beneficial than completely randomly picking any trial solutions.
I have a similar question with mutation. We randomly flip a bit. If the bit flipped has a small place value, then the change will be small. But if it has a big place value, the change will be big. How is it better than completely randomly changing a chromosome?
Binary encoding is still common mainly because the first works on genetic algorithms used that encoding.
Furthermore it's often space efficient: [6, 10, 3, 5, 12] represented as a sequence of integers would probably require 5 * 32 bits; for a bit string representation 5 * 4 bits are enough (assuming numbers in the [0;15] range).
In this respect the knapsack problem is the best case for the bit-string representation (each bit says whether the corresponding object is in the knapsack).
"we can, at random, choose any point for crossover. I do not see how this is any more beneficial than completely randomly picking any trial solutions"
In general choosing a crossover point in the middle of a digit introduces an essentially arbitrary mutation of that digit with a destructive (negative) effect.
There is a nice example of this effect in the Local Search Algorithms and Optimization Problems - Genetic Algorithms section of Artificial Intelligence: A Modern Approach (Russell, Norvig).
Also take a look at a similar question on Software Engineering.
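To make this concrete, here is a small sketch (assuming 4 bits per gene, i.e. values in [0, 15]) showing that crossover on a gene boundary reproduces the children from the question, while crossover inside a gene effectively mutates that gene:

def encode(genes, bits=4):
    # Concatenate each gene's fixed-width binary representation into one bit string.
    return "".join(format(g, "0{}b".format(bits)) for g in genes)

def decode(bitstring, bits=4):
    return [int(bitstring[i:i + bits], 2) for i in range(0, len(bitstring), bits)]

def crossover(p1, p2, point):
    # Single-point crossover at an arbitrary bit position.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

parent1 = encode([6, 10, 3, 5, 12])
parent2 = encode([12, 10, 3, 8, 11])

# Crossover on a gene boundary (after 2 genes = 8 bits): children match the example.
c1, c2 = crossover(parent1, parent2, 8)
print(decode(c1), decode(c2))   # [6, 10, 3, 8, 11] [12, 10, 3, 5, 12]

# Crossover inside the fourth gene (bit 14): that gene takes a value present in
# neither parent -- the essentially arbitrary mutation mentioned above.
c3, c4 = crossover(parent1, parent2, 14)
print(decode(c3), decode(c4))   # [6, 10, 3, 4, 11] [12, 10, 3, 9, 12]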

Implementing one hot encoding

I already understand the uses and concept behind one hot encoding with neural networks. My question is just how to implement the concept.
Let's say, for example, I have a neural network that takes in up to 10 letters (not case sensitive) and uses one hot encoding. Each input will be a 26 dimensional vector of some kind for each spot. In order to code this, do I act as if I have 260 inputs with each one displaying only a 1 or 0, or is there some other standard way to implement these 26 dimensional vectors?
In your case, it depends on the framework. I can speak for PyTorch, which is my go-to framework when programming a neural network.
There, one-hot encodings for sequences are generally handled by having your network expect a sequence of indices. Taking your 10 letters as an example, this could be the sequence ["a", "b", "c", ...].
The embedding layer will be initialized with a "dictionary length", i.e. the number of distinct elements (num_embeddings) your network can receive - in your case 26. Additionally, you can specify embedding_dim, i.e. the output dimension of a single character. This is already past the step of one-hot encodings, since you generally only need them to know which value to associate with that item.
Then, you would feed a coded version of the above string to the layer, which could look like this: [0, 1, 2, 3, ...]. Assuming the sequence is of length 10, this will produce an output of shape [10, embedding_dim], i.e. a 2-dimensional tensor.
To summarize, PyTorch essentially allows you to skip this rather tedious step of encoding it as a one-hot encoding. This is mainly due to the fact that your vocabulary can in some instances be quite large: Consider for example Machine Translation Systems, in which you could have 10,000+ words in your vocabulary. Instead of storing every single word as a 10,000-dimensional vector, using a single index is more convenient.
If that does not completely answer your question (since I am essentially telling you what is generally preferred): instead of making a 260-dimensional vector, you would again use a [10, 26] tensor, in which each row represents a different letter.
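As a minimal sketch of both options in PyTorch (the embedding_dim of 8 here is an arbitrary value chosen only for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Index-based approach: the layer looks up a dense vector per letter index.
embedding = nn.Embedding(num_embeddings=26, embedding_dim=8)

indices = torch.arange(10)       # e.g. "abcdefghij" encoded as indices 0..9
dense = embedding(indices)       # shape [10, 8]

# Explicit one-hot alternative: a [10, 26] tensor, one row per letter.
one_hot = F.one_hot(indices, num_classes=26).float()

print(dense.shape, one_hot.shape)   # torch.Size([10, 8]) torch.Size([10, 26])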
If you have 10 distinct elements (e.g. a, b, ..., j, or 1, 2, ..., 10) to be represented as one-hot vectors of dimension 26, then your inputs are just 10 vectors, each of which is a 26-dimensional vector. Do this:
import torch

y = torch.eye(26)        # identity matrix: one length-26 one-hot vector per letter
y[torch.arange(0, 10)]   # 10 one-hot vectors, each of dimension 26
Hope this helps a bit.

in genetic algorithm, How to deal with binary representation of constraints of functions?

For example, for 0 <= x <= 31 the binary form has length 5, since 31 = 11111 in base 2.
However, how do I deal with, say, 0 <= x <= 25? If I keep length 5, numbers like 11110 (30) may be generated, which exceeds 11001 (25).
I wonder if there is a mapping which could solve this.
Thanks a lot!
If I understand you correctly, you are asking how to deal with automatically generated solutions that fall outside your constraint. In this case you have several options. Firstly, you could simply kill these invalid solutions and generate more until one fits within your constraint. The better option is to normalise all of your values into a specified range, e.g. 0 to 31 or 0 to 64, etc.
I have an example of this type of normalisation in the Evaluate Fitness function of this example.
http://johnnewcombe.net/blog/gaf-part-2/
The code is based around the Genetic Algorithm Framework for .Net but the technique can be applied to any library or home grown algorithm.
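As a small sketch of one possible normalisation (an illustration, not code from the linked article): scale the raw 5-bit value onto [0, 25], so that every bit pattern decodes to a valid value.

def decode_gene(bits, lower=0, upper=25):
    # Map a raw bit string onto [lower, upper] by scaling, so no
    # generated chromosome ever falls outside the constraint.
    raw = int(bits, 2)                  # 0 .. 2**len(bits) - 1
    max_raw = 2 ** len(bits) - 1
    return lower + round(raw * (upper - lower) / max_raw)

print(decode_gene("11110"))   # 24 (instead of the out-of-range 30)
print(decode_gene("11111"))   # 25
print(decode_gene("00000"))   # 0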

Efficient Function to Map (or Hash) Integers and Integer Ranges into Index

We are looking for the computationally simplest function that will enable an indexed look-up of a function to be determined by a high frequency input stream of widely distributed integers and ranges of integers.
It is OK if the hash/map function selection itself varies based on the specific integer and range requirements, and the performance associated with the part of the code that selects this algorithm is not critical. The number of integers/ranges of interest in most cases will be small (zero to a few thousand). The performance critical portion is in processing the incoming stream and selecting the appropriate function.
As a simple example, please consider the following pseudo-code:
switch (highFrequencyIntegerStream)
case(2) : func1();
case(3) : func2();
case(8) : func3();
case(33-122) : func4();
...
case(10,000) : func40();
In a typical example, there would be only a few thousand of the "cases" shown above, which could include a full range of 32-bit integer values and ranges. (In the pseudo code above 33-122 represents all integers from 33 to 122.) There will be a large number of objects containing these "switch statements."
(Note that the actual implementation will not include switch statements. It will instead be a jump table (which is an array of function pointers) or maybe a combination of the Command and Observer patterns, etc. The implementation details are tangential to the request, but provided to help with visualization.)
Many of the objects will contain "switch statements" with only a few entries. The values of interest are subject to real time change, but performance associated with managing these changes is not critical. Hash/map algorithms can be re-generated slowly with each update based on the specific integers and ranges of interest (for a given object at a given time).
We have searched around the internet, looking at Bloom filters, various hash functions listed on Wikipedia's "hash function" page and elsewhere, quite a few Stack Overflow questions, abstract algebra (mostly Galois theory which is attractive for its computationally simple operands), various ciphers, etc., but have not found a solution that appears to be targeted to this problem. (We could not even find a hash or map function that considered these types of ranges as inputs, much less a highly efficient one. Perhaps we are not looking in the right places or using the correct vernacular.)
The current plan is to create a custom algorithm that preprocesses the list of interesting integers and ranges (for a given object at a given time) looking for shifts and masks that can be applied to the input stream to help delineate the ranges. Note that most of the incoming integers will be uninteresting, and it is of critical importance to make a very quick decision for as large a percentage of that portion of the stream as possible (which is why Bloom filters looked interesting at first, before we started thinking that their implementation required more computational complexity than other solutions).
Because the first decision is so important, we are also considering having multiple tables, the first of which would be inverse masks (masks to select uninteresting numbers) for the easy to find large ranges of data not included in a given "switch statement", to be followed by subsequent tables that would expand the smaller ranges. We are thinking this will, for most cases of input streams, yield something quite a bit faster than a binary search on the bounds of the ranges.
Note that the input stream can be considered to be randomly distributed.
There is a pretty extensive theory of minimal perfect hash functions that I think will meet your requirement. The idea of a minimal perfect hash is that a set of distinct inputs is mapped to a dense set of integers in a 1-1 fashion. In your case, a set of N 32-bit integers and ranges would each be mapped to a unique integer in a range whose size is a small multiple of N. GNU has a perfect hash function generator called gperf that is meant for strings but might possibly work on your data. I'd definitely give it a try. Just add a length byte so that integers are 5-byte strings and ranges are 9 bytes. There are some formal references on the Wikipedia page. A literature search in the ACM and IEEE literature will certainly turn up more.
I just ran across this library I had not seen before.
Addition
I see now that you are trying to map all integers in the ranges to the same function value. As I said in the comment, this is not very compatible with hashing because hash functions deliberately try to "erase" the magnitude information in a bit's position so that values with similar magnitude are unlikely to map to the same hash value.
Consequently, I think that you will not do better than an optimal binary search tree, or equivalently a code generator that produces an optimal "tree" of "if else" statements.
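As a minimal sketch of that search-based approach (func1 ... func4 are placeholders taken from the pseudo-code in the question): keep the lower bounds sorted and binary-search them with bisect.

import bisect

def func1(): return "func1"
def func2(): return "func2"
def func3(): return "func3"
def func4(): return "func4"

# Each entry is (low, high, handler); single values use low == high.
cases = [(2, 2, func1), (3, 3, func2), (8, 8, func3), (33, 122, func4)]
cases.sort(key=lambda c: c[0])          # sort by lower bound
lows = [low for low, _, _ in cases]

def dispatch(value):
    # Right-most case whose lower bound is <= value ...
    i = bisect.bisect_right(lows, value) - 1
    # ... and whose upper bound also covers it.
    if i >= 0 and value <= cases[i][1]:
        return cases[i][2]
    return None          # uninteresting value: no case matches

print(dispatch(8)())     # func3
print(dispatch(50)())    # func4
print(dispatch(7))       # None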
If we wanted to construct a function of the type you are asking for, we could try using real numbers where individual domain values map to consecutive integers in the co-domain and ranges map to unit intervals in the co-domain. So a simple floor operation will give you the jump table indices you're looking for.
In the example you provided you'd have the following mapping:
2 -> 0.0
3 -> 1.0
8 -> 2.0
33 -> 3.0
122 -> 3.99999
...
10000 -> 42.0 (for example)
The trick is to find a monotonically increasing polynomial that interpolates these points. This is certainly possible, but with thousands of points I'm certain you'd end up with something much slower to evaluate than the optimal search would be.
Perhaps our thoughts on hashing integers can help a little bit. You will also find there a hashing library (hashlib.zip) based on Bob Jenkins' work which deals with integer numbers in a smart way.
I would propose to deal with larger ranges after the single cases have been rejected by the hashing mechanism.