Correctly Implementing Zobrist Hashing

I'm currently adding transposition tables to my chess engine, and I'm having issues with incrementally updating Zobrist keys. I did some research and implemented the basic idea, but it's not behaving as I expect. The problem is that equivalent board positions do not always have the same key. For example, from the starting position, if both players move a knight out and then move it back, the key differs from that of the starting position. However, doing this again (moving the knights out and back) results in the original key. So the period for such a sequence seems to be 4 moves per player, when it should be just 2.
Has anyone encountered such a problem, or can anyone think of a solution? I've included the relevant portions of my make/unmake methods. I haven't included side-to-move, castling rights, etc.; they shouldn't affect the particular case I brought up. HashValue stores the random values, with the first index being the piece type and the second being the square.
void Make(Move m) {
ZobristKey ^= HashValue[Piece[m.From].Type][m.From];
ZobristKey ^= HashValue[Piece[m.From].Type][m.To];
ZobristKey ^= HashValue[Piece[m.To].Type][m.To];
//rest of make move
}
void Unmake(Move m) {
ZobristKey ^= HashValue[m.Captured.Type][m.To];
ZobristKey ^= HashValue[Element[m.To].Type][m.To];
ZobristKey ^= HashValue[Element[m.To].Type][m.From];
//rest of unmake
}

Make_a_move() {
hashval ^= Zobrist_array[oldpos][piece];
hashval ^= Zobrist_array[newpos][piece];
/* if there is a capture */
hashval ^= Zobrist_array[otherpos][otherpiece];
}
Undo_a_move() {
hashval ^= Zobrist_array[oldpos][piece];
hashval ^= Zobrist_array[newpos][piece];
/* if there was a capture */
hashval ^= Zobrist_array[otherpos][otherpiece];
}
Castling can be seen as the sum of two moves (without capture, obviously): one for the king and one for the rook.
Promotion can be treated as removing a pawn from the board (from rank 7 or rank 2) and adding the new piece (on rank 8 or rank 1).
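The symmetry described above can be sketched in C++ as follows. This is a minimal illustration, not the asker's engine: the names Position, Move, put and kNoPiece, the 12-value piece encoding, and the fixed RNG seed are all assumptions made for the example. What it shows is that make and unmake XOR exactly the same table entries, and that an empty target square contributes nothing to the key.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>

constexpr int kPieceTypes = 12;  // assumed encoding: 6 piece kinds x 2 colors
constexpr int kSquares = 64;
constexpr int kNoPiece = -1;     // hypothetical marker for an empty square

struct Move {
    int from;
    int to;
    int captured;  // filled in by make(), read back by unmake()
};

struct Position {
    std::array<int, kSquares> board;
    std::array<std::array<std::uint64_t, kSquares>, kPieceTypes> table;
    std::uint64_t key = 0;

    Position() {
        board.fill(kNoPiece);
        std::mt19937_64 rng(2024);  // fixed seed, chosen only for reproducibility
        for (auto& row : table)
            for (auto& value : row) value = rng();
    }

    // Places a piece on an empty square and adds it to the key.
    void put(int piece, int sq) {
        board[sq] = piece;
        key ^= table[piece][sq];
    }

    void make(Move& m) {
        const int piece = board[m.from];
        assert(piece != kNoPiece);  // the move must start from an occupied square
        m.captured = board[m.to];
        if (m.captured != kNoPiece) key ^= table[m.captured][m.to];  // remove victim
        key ^= table[piece][m.from];  // remove mover from its old square
        key ^= table[piece][m.to];    // add mover on its new square
        board[m.to] = piece;
        board[m.from] = kNoPiece;
    }

    void unmake(const Move& m) {
        const int piece = board[m.to];  // the mover still sits on the target square
        key ^= table[piece][m.to];
        key ^= table[piece][m.from];
        if (m.captured != kNoPiece) key ^= table[m.captured][m.to];  // restore victim
        board[m.from] = piece;
        board[m.to] = m.captured;
    }
};
```

With this arrangement, moving a knight out and back for both sides restores the original key after two moves per side (side-to-move and castling rights are left out, as in the question).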

Related

Why does the Streaming-Operator in SystemVerilog reverse the byte order?

I simulated the following example:
shortint j;
byte unsigned data_bytes[];
j = 16'b1111_0000_1001_0000;
data_bytes = { >>{j}};
`uvm_info(get_type_name(), $sformatf("j data_bytes: %b_%b", data_bytes[1], data_bytes[0]), UVM_LOW)
Result:
UVM_INFO Initiator.sv(84) # 0: uvm_test_top.sv_initiator [Initiator] j data_bytes: 10010000_11110000
However, this seems strange to me, since the byte order is reversed: I expect the LSB to be at index 0 of data_bytes[0] and the MSB at index 7 of data_bytes[1]. Why does this happen? According to the documentation (Cadence Help), this should not be the case.
As defined in section 6.24.3 Bit-stream casting of the IEEE 1800-2017 LRM, the [0] element of an unpacked dynamic array is considered the left index, and streaming >> goes from left to right indexes. To get the results you want, write
data_bytes = { << byte {j}};
This reverses the stream, but keeps the individual bytes in right to left order.

Hash function for 8 / 16 bit "graphics" on 8 bit processor

For an implementation of coherent noise (similar to Perlin noise), I'm looking for a hash function suitable for graphics.
I don't need it to be in any way cryptographic, and really, I don't even need it to be a super brilliant hash.
I just want to combine two 16-bit numbers and output an 8-bit hash. As random as possible is good, but being fast on an AVR processor (8-bit, as used by Arduino) is also good.
Currently I'm using an implementation here:
const uint32_t hash(uint32_t a)
{
a -= (a<<6);
a ^= (a>>17);
a -= (a<<9);
a ^= (a<<4);
a -= (a<<3);
a ^= (a<<10);
a ^= (a>>15);
return a;
}
But given that I'm truncating all but 8 bits, and I don't need anything spectacular, can I get away with something using fewer instructions?
… I'm inspired in this search by the lib8tion library that's packaged with FastLED. It has specific functions to, for example, multiply two uint8_t numbers to give a uint16_t number in the fewest possible clock cycles.
Check out Pearson hashing:
unsigned char hash(unsigned short a, unsigned short b) {
static const unsigned char t[256] = {...};
return t[t[t[t[a & 0xFF] ^ (b & 0xFF)] ^ (a >> 8)] ^ (b >> 8)];
}
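The answer leaves the table contents elided, so here is a rough sketch of how the same four-lookup chain could be tried out. The table must be a permutation of 0..255; the helper make_pearson_table below builds one with a small LCG-driven shuffle, which is purely an assumption for illustration and not the table the answer had in mind. The function names and the seed are hypothetical as well.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Hypothetical helper: builds some permutation of 0..255 via a Fisher-Yates
// shuffle driven by a simple linear congruential generator.
std::array<std::uint8_t, 256> make_pearson_table(std::uint32_t seed) {
    std::array<std::uint8_t, 256> t{};
    for (int i = 0; i < 256; ++i) t[i] = static_cast<std::uint8_t>(i);
    std::uint32_t state = seed;
    for (int i = 255; i > 0; --i) {
        state = state * 1664525u + 1013904223u;  // LCG step
        const int j = static_cast<int>((state >> 16) % static_cast<std::uint32_t>(i + 1));
        std::swap(t[i], t[j]);
    }
    return t;
}

// Same lookup chain as the one-liner above, unrolled for readability:
// low byte of a, low byte of b, high byte of a, high byte of b.
std::uint8_t pearson_hash(const std::array<std::uint8_t, 256>& t,
                          std::uint16_t a, std::uint16_t b) {
    std::uint8_t h = t[a & 0xFF];
    h = t[h ^ (b & 0xFF)];
    h = t[h ^ (a >> 8)];
    h = t[h ^ (b >> 8)];
    return h;
}
```

Each step costs one XOR and one table lookup, so the whole hash is four lookups in a 256-byte table.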

Implement vector stack & output a stored & counted item

Here's my code:
I currently have everything the user enters dumped into the stack and sorted, but I don't know how/where to go from here. I tried solving it with a count variable, but my solution isn't correct (it should output "2 dog" only once if the user enters dog twice). If anybody can help or knows a way to solve this, please give an example.
There are multiple ways to do this. The easiest is a simple use of std::map:
#include <iostream>
#include <string>
#include <map>
int main()
{
std::map<std::string, unsigned int> mymap;
std::string s;
while (std::getline(std::cin, s) && !s.empty() && s != "END")
++mymap[s];
for (auto const& pr : mymap)
std::cout << pr.second << ':' << pr.first << '\n';
}
How it works
Each line is read, and if successful (not eof, not empty, and not equivalent to "END") is used for updating an entry in the map.
Per the documentation for std::map::operator [], if the requisite key is not already present in the map, it is added, and mapped-to value is value-initialized. For unsigned int that means the initial value is 0.
From there, the increment is applied to the returned unsigned int reference. For a new element this results in the value 1; for an existing element it simply increments the prior value.
This continues until the loop terminates.
Upon termination of the loop the results are reported in lexicographical order, preceded by their count.
Input
one
two
three
four
three
one
one
one
two
END
Output
1:four
4:one
2:three
2:two
If you wanted to sort the output based on count, more work would need to be done, but it isn't difficult. A set of pairs from the map, inverted so the count is first, the string second, makes short work of that:
#include <iostream>
#include <string>
#include <map>
#include <set>
int main()
{
std::map<std::string, unsigned int> mymap;
std::string s;
while (std::getline(std::cin, s) && !s.empty() && s != "END")
++mymap[s];
std::set<std::pair<unsigned int, std::string>> ms;
for (auto const& pr : mymap)
ms.insert(std::make_pair(pr.second, pr.first));
for (auto const& pr : ms)
std::cout << pr.first << ':' << pr.second << '\n';
}
An example run appears below:
Input
one
two
three
four
three
one
one
one
two
END
Output
1:four
2:three
2:two
4:one
Use std::map as mentioned in comment:
std::map<std::string, unsigned int> countMap;
while (getline(cin, enter) && enter != endString) { // stop before counting endString
countMap[enter]++; // operator[] inserts the key if it is not present and
// value-initializes its count to 0; otherwise it
// fetches the existing count, which is then incremented.
}
// countMap[str] gives the number of times `str` was entered.
You should use a map. But if you are looking for another approach, you can search over all elements.
After you have read all elements from the input, start looping over the vector. Take the first element, store its value and remove it, then check the remaining size-1 elements to see if they are equal to it. For each one that is, increment the counter and remove that item from the vector.
Notice that the size has decreased. Now repeat the same steps until the size becomes 0.
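The remove-and-count loop described above might look roughly like this. The function name count_by_removal is made up for the example, and std::remove is used here as one possible way to do the "check the others and remove them" step in a single pass.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Returns (count, word) pairs in order of first appearance.
// Takes the vector by value because the algorithm consumes it.
std::vector<std::pair<unsigned, std::string>>
count_by_removal(std::vector<std::string> items) {
    std::vector<std::pair<unsigned, std::string>> counts;
    while (!items.empty()) {
        const std::string current = items.front();  // copy, so removal cannot invalidate it
        // std::remove moves every element different from `current` to the front
        // and returns the new logical end of the range.
        const auto new_end = std::remove(items.begin(), items.end(), current);
        const unsigned n = static_cast<unsigned>(items.end() - new_end);
        items.erase(new_end, items.end());  // the vector shrinks by n elements
        counts.emplace_back(n, current);
    }
    return counts;
}
```

This does a full pass over the remaining elements for every distinct word, so it is slower than the map version for large inputs, but it needs no extra container type.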

How to generate a collision with a hash-string-to-int method?

I'm in the process of integrating a hash method (farmhash) into our software base. The hashing services seem to work appropriately. Basically, it turns a string of characters into a unique-ish integer value.
I've added an infrastructure to detect collisions (cases where two input strings result in the same output integer). Basically, for each string that is hashed, I keep the [hash result] -> [string] in a map, and every time a new string is hashed, I compare it to what's in the map; if the hash is already there, I make sure that it is the same string that generated it. I am aware that it's potentially slow and potentially memory-consuming, but I'm performing these checks only on a "per request" basis: they are not enabled in release mode.
Now I'd like to test that infrastructure (as in get a collision, from a unit test point of view).
I could generate a bunch of strings (random or sequential), spam my hash infrastructure and hope to see a positive collision but I feel I'll waste my time, CPU cycles and fill the memory with a load of data without success.
How would one go about generating collisions?
Not-so-relevant-facts:
I'm using c++;
I can generate data using python;
The target int is uint32_t.
Update:
I have created a small naive program to brute force the detection of collision:
void
addToQueue(std::string&& aString)
{
//std::cout << aString << std::endl;
hashAndCheck( aString ); // Performs the hash and check if there is a collision
if ( mCount % 1000000 == 0 ) // report progress once every million checks
std::cout << "Did " << mCount << " checks so far" << std::endl;
mQueue.emplace( aString );
}
void
generateNextRound( const std::string& aBase )
{
//48 a 122 incl
for ( int i = 48; i <= 122; i++ )
{
addToQueue( std::move( std::string( aBase ).append( 1, static_cast<char>( i ) ) ) );
}
}
int main( void )
{
// These two generate a collision
//StringId id2 = HASH_SID( "#EF" ); // Hashes only, does not check
//StringId id1 = HASH_SID( "7\\:" ); // Hashes only, does not check
std::string base = "";
addToQueue( std::move( base ) );
while ( true )
{
const std::string val = mQueue.front();
mQueue.pop();
generateNextRound( val );
}
return 0;
}
I could eventually have added threading and stuff in there but I didn't need it because I found a collision in about 1 second (in debug mode).
If you brute-force search for collisions offline, you can hard-code the strings that cause collisions into your test, so that your test is as close to production code as possible but doesn't suffer the performance penalty of doing the brute-force work each time. (Or, as other people have said, you can make an intentionally junky hash algorithm that causes excessive collisions.)
You could limit the range of the integer that the hash function outputs; in general you should be able to pass some number n into it so that results lie between 0 and n-1. If you limit it to, say, 10, then you'll definitely end up with collisions.
For key k and hash function h, return a constant c:
h(k) = c
This always collides, regardless of what key you use.
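The last two suggestions can be sketched as a unit-test fixture. Everything here is hypothetical: CollisionChecker stands in for the asker's map-based infrastructure, and it assumes the hash function can be injected, which the question does not state. constant_hash is the h(k) = c idea, and tiny_range_hash limits the output to 0..9.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the [hash result] -> [string] bookkeeping.
class CollisionChecker {
public:
    using HashFn = std::function<std::uint32_t(const std::string&)>;

    explicit CollisionChecker(HashFn fn) : hash_(std::move(fn)) {}

    // Returns true when `s` hashes to a value already claimed by a different string.
    bool hash_and_check(const std::string& s) {
        const std::uint32_t h = hash_(s);
        const auto result = seen_.emplace(h, s);  // no-op if h is already present
        return !result.second && result.first->second != s;
    }

private:
    HashFn hash_;
    std::unordered_map<std::uint32_t, std::string> seen_;
};

// h(k) = c: every pair of distinct keys collides.
inline std::uint32_t constant_hash(const std::string&) { return 42u; }

// Range limited to 0..9: any 11 distinct keys must contain a collision.
inline std::uint32_t tiny_range_hash(const std::string& s) {
    return static_cast<std::uint32_t>(std::hash<std::string>{}(s) % 10u);
}
```

With constant_hash, the second distinct string fed to hash_and_check is reported as a collision, which exercises the detection path without any brute-force search.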

Scanf missed line

I wrote a test program which should take in a 3x3 matrix of characters and output the entered matrix. However, I have to enter 4 lines in order for the program to produce the corresponding matrix. I have looked up problems on the scanf function, but none of the solutions I tried seemed to work...Could you help me out with this?
My code:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
char a[3][3];
int i,j;
for(i=0;i<3;++i)
{
for(j=0;j<3;++j)
{
scanf("%c",&a[i][j]);
}
scanf("\n");
}
for(i=0;i<3;++i)
{
for(j=0;j<3;++j)
{
printf("%c",a[i][j]);
}
printf("\n");
}
system("PAUSE");
return(0); }
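scanf("%c",...) reads whitespace characters too, including the \n. In addition, a whitespace directive such as "\n" in a scanf format consumes all consecutive whitespace and only returns once it sees a non-whitespace character, which is why the program waits for a fourth line. You can solve it in many ways: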
If you read like a b c
for(i=0;i<3;++i)
{
for(j=0;j<3;++j)
{
scanf("%c",&a[i][j]);
cin.get(); //Get the spaces after each character and the \n at the end of each line
}
}
Or you can simply use cin (reading char/string input with scanf is always a problem):
for(i=0;i<3;++i)
{
for(j=0;j<3;++j)
{
cin >> a[i][j];
}
}
If you are reading like abc, you only have to replace your scanf("\n") with a cin.get().
@João Menighin's answer surely works. If you want to avoid C++, this would work:
for(i=0;i<3;++i)
{
for(j=0;j<3;++j)
{
scanf(" %c",&a[i][j]);
}
}
Although it would ignore ALL whitespace: both abc and a b c would be interpreted to be equivalent.
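To see the difference between "%c" and " %c" without typing at a console, the same format strings can be run against an in-memory string with sscanf. This is only a sketch: read_chars is a made-up helper, and it uses %n to track how far each call got, which the original program does not need because scanf reads from stdin.

```cpp
#include <cstdio>
#include <string>

// Reads up to `count` characters from `input`, either with "%c" (keeps
// whitespace) or with " %c" (skips any run of whitespace first).
std::string read_chars(const char* input, int count, bool skip_whitespace) {
    std::string out;
    int offset = 0;
    for (int k = 0; k < count; ++k) {
        char c = '\0';
        int consumed = 0;  // set by %n to the number of characters consumed
        const int matched = skip_whitespace
            ? std::sscanf(input + offset, " %c%n", &c, &consumed)
            : std::sscanf(input + offset, "%c%n", &c, &consumed);
        if (matched != 1) break;  // end of input
        out.push_back(c);
        offset += consumed;
    }
    return out;
}
```

With skipping enabled, "abc\ndef\nghi\n" and "a b c\nd e f\ng h i\n" both yield the nine letters; with plain "%c", the newlines take up slots in the result.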
Try adding a whitespace in your scanf format string right after the opening quote:
scanf(" %c",&a[i][j]);
I had the same problem in a two-dimension matrix and it worked for me.
I have no idea why though!!! I just spent 1 hour in front of my laptop trying different things...
I tried your code and IT WORKED, although I did make a few changes, noted in the comments:
#include <stdio.h> // added, but that shouldn't matter
int main(void)
{
char a[3][3];
int i,j;
for(i=0;i<3;++i)
{
for(j=0;j<3;++j)
{
scanf("%c",&a[i][j]);
}
//scanf("\n"); // not necessary, see below
}
for(i=0;i<3;++i)
{
for(j=0;j<3;++j)
{
printf(" %c",a[i][j]);
}
printf("\n");
}
return(0);
}
I compiled and ran this code on Eclipse/Microsoft C Compiler and entered a series of characters followed by Enter.
abcdefghi
a b c
d e f
g h i
The point of confusion might be that scanf pulls the data from a console buffer. Typically (although you can work around this), that buffer is handed to your program only when you press Enter. The %c format specifier also accepts blanks. Thus, I tried a second run with the following input and output.
a b c d e
a b
c
d e
You can tell the spaces were read and stored as well as the letters.
Hope this helps.