Collision Hash Function - hash

Hi all, I have a big problem with my hash function. Let me try to explain it:
I have a set of chars and I want to write a hash function for it, because I want to replace the set with a hash set. Each char has an index, and what I do now is concatenate the indices:
pair --> index p = 1, index a = 2, index i = 3, index r = 4 ---> so my hash returns 1234
But if, for example, I have:
so --> index s = 12, index o = 34 ---> hash 1234
COLLISION!!!!
P.S.: I cannot order my chars alphabetically...
So, is there anyone who can help me?? THANKS A LOT :)

You could try the string hashing function of Java. This is my C# port, which should be simple to port to C++:
int javaHash(String txt) {
    uint h = 0;
    if (txt.Length > 0) {
        for (int i = 0; i < txt.Length; i++) {
            h = 31 * h + txt[i];   // same scheme as java.lang.String.hashCode()
        }
    }
    return (int)h;
}
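
For reference, a straight C++ translation of the same idea might look like this (an untested sketch; it relies on unsigned wraparound to mimic Java's int overflow, and treats each char as an 8-bit value):

#include <cstdint>
#include <string>

// Sketch of the same 31*h + c polynomial string hash in C++.
int javaHash(const std::string& txt) {
    std::uint32_t h = 0;
    for (unsigned char c : txt) {
        h = 31 * h + c;   // unsigned arithmetic wraps, like Java's int overflow
    }
    return static_cast<int>(h);
}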

Related

Talend - split a string to n rows

I would like to split a string in a column to n rows in Talend.
For example :
column
2aabbccdd
The first number is the "n" which I use to define the row length, so the expected result should be:
row 1 = aa
row 2 = bb
row 3 = cc
row 4 = dd
The idea here is to iterate on the string and cut it every 2 characters.
Any idea please ?
I would use a tJavaFlex to split the string, with a trick to have n rows coming out of it.
tJavaFlex's main code:
int n = Integer.parseInt(row1.str.substring(0, 1)); // get n from the first character
String str2 = row1.str.substring(1);                // the rest of the string, after n
int nbParts = (str2.length() + n - 1) / n;          // ceiling division: number of rows to emit
System.out.println("number of parts = " + nbParts);
for (int i = 0; i < nbParts; i++)
{
    String part = str2.substring(i * n);
    if (part.length() > n)
    {
        part = part.substring(0, n);
    }
    row2.str = part;
And tJavaFlex's end code is just a closing brace:
}
The trick is to use a for loop in the main code, but only close it in the end code.
tFixedFlowInput contains just one column holding the input string.
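
Outside of Talend, the chunking logic itself is easy to check in isolation. Here is a small C++ sketch of the same idea (a hypothetical splitChunks helper; it assumes n is a single leading digit, as in the example above):

#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper: split "2aabbccdd" into chunks of length n,
// where n is given by the first character of the input.
std::vector<std::string> splitChunks(const std::string& input) {
    int n = input[0] - '0';             // single leading digit, as in the example
    std::string rest = input.substr(1); // the payload after n
    std::vector<std::string> parts;
    for (std::size_t i = 0; i < rest.size(); i += n) {
        parts.push_back(rest.substr(i, n)); // substr clamps at the end of the string
    }
    return parts;
}

int main() {
    for (const auto& part : splitChunks("2aabbccdd"))
        std::cout << part << '\n';  // prints aa, bb, cc, dd
}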

merge sort performance compared to insertion sort

For any array of length greater than 10, is it safe to say that merge sort performs fewer comparisons among the array's elements than insertion sort does on the same array, because the best case for the run time of merge sort is O(N log N) while for insertion sort it's O(N)?
My take on this. First off, you are talking about comparisons, but the swaps matter as well.
In insertion sort, the worst case (an array sorted in the opposite order) requires n^2 - n comparisons and swaps in total (11^2 - 11 = 121 - 11 = 110 for 11 elements, for example). But if the array is even partially sorted in the needed order (i.e. many elements already sit at or near their correct positions), the number of comparisons and swaps can drop significantly: the right position for each element is found quickly, so far fewer actions are needed than for an array sorted in the opposite order. As you can see for arr2 below, which is almost sorted, the number of actions becomes roughly linear in the input size: 6.
var arr1 = [11,10,9,8,7,6,5,4,3,2,1];
var arr2 = [1,2,3,4,5,6,7,8,11,10,9];

function InsertionSort(arr) {
    var compNum = 0, swapNum = 0;
    for (var i = 1; i < arr.length; i++) {
        var temp = arr[i], j = i - 1;
        while (j >= 0) {
            if (temp < arr[j]) { arr[j + 1] = arr[j]; swapNum++; } else break;
            j--;
            compNum++;
        }
        arr[j + 1] = temp;
    }
    console.log(arr, "Number of comparisons: " + compNum, "Number of swaps: " + swapNum);
}

InsertionSort(arr1); // worst case, 11^2 - 11 = 110 actions
InsertionSort(arr2); // almost sorted array, few actions
In merge sort we always do approximately n*log n actions, regardless of the properties of the input array. So, as you can see, in both cases our arrays get sorted in 39 actions:
var arr1 = [11,10,9,8,7,6,5,4,3,2,1];
var arr2 = [1,2,3,4,5,6,7,8,11,10,9];
var actions = 0;

function mergesort(arr, left, right) {
    if (left >= right) return;
    var middle = Math.floor((left + right) / 2);
    mergesort(arr, left, middle);
    mergesort(arr, middle + 1, right);
    merge(arr, left, middle, right);
}

function merge(arr, left, middle, right) {
    var l = middle - left + 1, r = right - middle, temp_l = [], temp_r = [];
    for (var i = 0; i < l; i++) temp_l[i] = arr[left + i];
    for (var i = 0; i < r; i++) temp_r[i] = arr[middle + i + 1];
    var i = 0, j = 0, k = left;
    while (i < l && j < r) {
        if (temp_l[i] <= temp_r[j]) {
            arr[k] = temp_l[i]; i++;
        } else {
            arr[k] = temp_r[j]; j++;
        }
        k++; actions++;
    }
    while (i < l) { arr[k] = temp_l[i]; i++; k++; actions++; }
    while (j < r) { arr[k] = temp_r[j]; j++; k++; actions++; }
}

mergesort(arr1, 0, arr1.length - 1);
console.log(arr1, "Number of actions: " + actions); // 11*log11 = 39 (approx.)
actions = 0;
mergesort(arr2, 0, arr2.length - 1);
console.log(arr2, "Number of actions: " + actions); // 11*log11 = 39 (approx.)
So, answering your question:
For any array of length greater than 10, is it safe to say that merge sort performs fewer comparisons among the array's elements than does insertion sort on the same array
I would say no, it isn't safe to say so. Merge sort can perform more actions than insertion sort in some cases. The size of the array isn't what matters here; what matters when comparing insertion sort with merge sort is how far your array is from being sorted. I hope this helps :)
BTW, merge sort and insertion sort have been combined into a hybrid stable sorting algorithm called Timsort, which takes the best of both. Check it out if you're interested.

hash a SHA-256 down to 16 bytes

Hi,
To reduce my SHA-256 length, I made my own hash algorithm with my own hash table (based on a lot of examples). Look at this source code:
struct // structure to store the values of h
{
    int a;
} hh[8];

struct DAT // 256 random values from 0-255
{
    int a;
} T[256] = { /* 256 values */ };

ch = x[0];              // save the first byte
for (j = 0; j < 8; j++)
{
    // a basic XOR hash function
    h = 0;
    for (i = 0; i < len; i++)
    {
        h = T[h ^ x[i]].a;
        //printf("%d ", h);
    }
    hh[j].a = h;        // save the result
    //printf("%d ", hh[j].a);
    x[0] = x[0] + 1;    // increment the first element by one
}
x[0] = ch;              // restore the first byte
// concatenate the 8 stored values of h
I replaced this function:
wsprintf(hex, "%02X%02X%02X%02X%02X%02X%02X%02X",
         hh[0].a, hh[1].a,
         hh[2].a, hh[3].a,
         hh[4].a, hh[5].a,
         hh[6].a, hh[7].a);
with this loop over i:
printf("\n = ");
for (i=0; i<8; i++)
{
hex[i]=hh[i].a;
printf("%02X",hex[i]);
}
Now I want to compress my hash to half its length:
printf("\n compressed hash= ");
for (i=0; i<4; i++)
{
lhex[k] = hh[i].a ^ hh[i+4].a;
printf("%02X",lhex[k]);
fprintf(fw,"%02X",lhex[k]);
k++;
}
Questions:
Is it correct if I use my own function?
Is it correct to replace wsprintf with a loop to concatenate the 8 stored values?
For the last one, I know it's incorrect, because if I have two hashes like L+R and L'+R' (L & R being the left and right bits), and my two hashes have L=R' and R=L', I will get the same compressed hash :( Any ideas for that?
Thanks, iOuss

Creating an online palindrome

I am trying to create an online palindrome sensor (the alphabet consists of 0, 1, 2, 3, ..., 9). The code is as follows:
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    int x = 0;
    int y = 0;
    int c;
    int i = 0;
    while (1)
    {
        cin >> c;
        // I keep track of the previous number in x and of its reverse in y, and use them to build
        // the new number and its reverse at every input. Then I compare x and y; if they are equal,
        // the number is a palindrome.
        /* e.g. (when 121 is entered digit by digit):
           i=0:
           x = 10*0 + 1      y = 0 + 10^0 * 1
           i=1:
           x = 10*1 + 2      y = 1 + 10^1 * 2
           i=2:
           x = 10*12 + 1     y = 21 + 10^2 * 1
        */
        x = 10 * x + c;
        y = y + static_cast<int>(pow(10.0, static_cast<double>(i)) * c);
        cout << "y= " << y << " and " << "x= " << x << endl;
        if (y == x)
            cout << "Palindrome" << endl;
        i++;
    }
    return 0;
}
First, I entered 1 and it was indicated as a palindrome (as expected). Then I entered 2 and nothing happened (as expected; 'y= 21 and x= 12' was printed). But then I entered 1 again, and this time nothing happened either (not as expected), and this was printed:
y= 120 and x= 121
Can anyone tell me how y became 120 when it was supposed to be 121?
You are doing far too much math:
public static boolean isPalindrom(char[] word) {
    int i1 = 0;
    int i2 = word.length - 1;
    while (i2 > i1) {
        if (word[i1] != word[i2]) {
            return false;
        }
        ++i1;
        --i2;
    }
    return true;
}
All you need to do is fill an array with values as the user enters them and invoke a function similar to this. The use of exponents is a colossal waste of resources when simpler solutions exist.
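
For completeness, here is a small C++ sketch of that approach applied to the original program: read one digit at a time, append it to a vector, and run a two-index check after every input (an untested sketch; it also sidesteps the floating-point pow() truncation that most likely produced 120 instead of 121 above):

#include <iostream>
#include <vector>

// Check whether the digits collected so far read the same forwards and backwards.
bool isPalindrome(const std::vector<int>& digits) {
    int i1 = 0;
    int i2 = static_cast<int>(digits.size()) - 1;
    while (i2 > i1) {
        if (digits[i1] != digits[i2]) return false;
        ++i1;
        --i2;
    }
    return true;
}

int main() {
    std::vector<int> digits;
    int c;
    while (std::cin >> c) {        // one digit (0-9) per input
        digits.push_back(c);
        if (isPalindrome(digits))
            std::cout << "Palindrome" << std::endl;
    }
    return 0;
}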

Carefully deleting N items from a "circular" vector (or perhaps just an NSMutableArray)

Imagine a std::vector, say, with 100 things in it (0 to 99) currently. You are treating it as a loop. So the 105th item is index 4; forward 7 from index 98 is 5.
You want to delete N items after index position P.
So, delete 5 items after index 50; easy.
Or 5 items after index 99: you delete index 0 five times (that is, original items 4 down through 0), noting that position 99 will then no longer exist.
Worst, 5 items after index 97 - you have to deal with both modes of deletion.
What's the elegant and solid approach?
Here's a boring routine I wrote
-(void)knotRemovalHelper:(NSMutableArray*)original
                   after:(NSInteger)nn howManyToDelete:(NSInteger)desired
{
#define ORCO ((NSInteger)[original count])
    static NSInteger kount, howManyUntilLoop, howManyExtraAfterLoop;

    if ( ... our array is NOT a loop ... )
        // trivial, if messy...
    {
        for ( kount = 1; kount <= desired; ++kount )
        {
            if ( (nn+1) >= ORCO )
                return;
            [original removeObjectAtIndex:( nn+1 )];
        }
        return;
    }
    else // our array is a loop
         // messy, confusing and inelegant. how to improve?
         // here we go...
    {
        howManyUntilLoop = (ORCO-1) - nn;
        if ( howManyUntilLoop > desired )
        {
            for ( kount = 1; kount <= desired; ++kount )
                [original removeObjectAtIndex:( nn+1 )];
            return;
        }

        howManyExtraAfterLoop = desired - howManyUntilLoop;
        for ( kount = 1; kount <= howManyUntilLoop; ++kount )
            [original removeObjectAtIndex:( nn+1 )];
        for ( kount = 1; kount <= howManyExtraAfterLoop; ++kount )
            [original removeObjectAtIndex:0];
        return;
    }
#undef ORCO
}
Update!
Invariant's second answer leads to the following excellent solution. "Starting with" is much better than "starting after", so the routine now uses "starting with". It boils down to this very simple rule:
N times do if P < currentsize remove P else remove 0
-(void)removeLoopilyFrom:(NSMutableArray*)ra
     startingWithThisOne:(NSInteger)removeThisOneFirst
         howManyToDelete:(NSInteger)countToDelete
{
    // exception if removeThisOneFirst > ra highestIndex
    // exception if countToDelete is > ra size
    // so easy thanks to Invariant:
    for ( NSInteger k = 0; k < countToDelete; ++k )
    {
        if ( removeThisOneFirst < (NSInteger)[ra count] )
            [ra removeObjectAtIndex:removeThisOneFirst];
        else
            [ra removeObjectAtIndex:0];
    }
}
Update!
Toolbox has pointed out the excellent idea of working to a new array - super KISS.
Here's an idea off the top of my head.
First, generate an array of integers representing the indices to remove. So "remove 5 from index 97" would generate [97,98,99,0,1]. This can be done with the application of a simple modulus operator.
Then, sort this array descending giving [99,98,97,1,0] and then remove the entries in that order.
Should work in all cases.
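
A minimal C++ sketch of that idea (a hypothetical eraseWrapped helper; it assumes count does not exceed the current size):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

// Collect the wrapped indices, sort them descending, and erase in that order,
// so earlier erasures never shift the indices still to be removed.
template <typename T>
void eraseWrapped(std::vector<T>& v, std::size_t start, std::size_t count) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < count; ++i)
        idx.push_back((start + i) % v.size());      // e.g. start=97, count=5 -> 97,98,99,0,1
    std::sort(idx.rbegin(), idx.rend());            // descending: 99,98,97,1,0
    for (std::size_t i : idx)
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(i));
}

int main() {
    std::vector<int> v;
    for (int i = 0; i < 10; ++i) v.push_back(i);
    eraseWrapped(v, 8, 4);                  // removes 8, 9, 0, 1
    for (int x : v) std::cout << x << ' ';  // prints 2 3 4 5 6 7
}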
This solution seems to work, and it copies all remaining elements in the vector only once (to their final destination).
Assume kNumElements, kStartIndex, and kNumToRemove are defined as const size_t values.
vector<int> my_vec(kNumElements);
for (size_t i = 0; i < my_vec.size(); ++i) {
    my_vec[i] = i;
}

for (size_t i = 0, cur = 0; i < my_vec.size(); ++i) {
    // What is the "distance" from the current index to the start, taking
    // into account the wrapping behavior?
    size_t distance = (i + kNumElements - kStartIndex) % kNumElements;
    // If it's not one of the ones to remove, then we keep it by copying it
    // into its proper place.
    if (distance >= kNumToRemove) {
        my_vec[cur++] = my_vec[i];
    }
}
my_vec.resize(kNumElements - kNumToRemove);
There's nothing wrong with two loop solutions as long as they're readable and don't do anything redundant. I don't know Objective-C syntax, but here's the pseudocode approach I'd take:
endIdx = after + howManyToDelete
if (Len <= endIdx)                          // will need a second loop
    firstpass = Len - (after + 1);          // handle the end in the first loop, the beginning in the second
else
    firstpass = howManyToDelete;            // the first loop will get them all
for (kount = 0; kount < firstpass; kount++)
    remove at index after+1
for ( ; kount < howManyToDelete; kount++)   // if firstpass < howManyToDelete, clean up the leftovers
    remove at index 0
This solution doesn't use mod, does the limit calculation outside the loop, and touches the relevant samples once each. The second for loop won't execute if all the samples were handled in the first loop.
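Since the above is pseudocode, one possible C++ rendering on a std::vector could look like this (a hypothetical removeAfter helper; items are erased one at a time to mirror the pseudocode, even though a range erase would be cheaper):

#include <cstddef>
#include <vector>

// Delete howManyToDelete items following index `after`, wrapping to the front if needed.
template <typename T>
void removeAfter(std::vector<T>& v, std::size_t after, std::size_t howManyToDelete) {
    std::size_t len = v.size();
    std::size_t firstpass =
        (len <= after + howManyToDelete) ? len - (after + 1)  // tail first, then wrap
                                         : howManyToDelete;   // everything fits in the tail
    std::size_t kount = 0;
    for (; kount < firstpass; ++kount)
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(after) + 1); // keep removing the item just after `after`
    for (; kount < howManyToDelete; ++kount)
        v.erase(v.begin());                                          // leftovers wrap around to the front
}
// e.g. with v = 0..9, removeAfter(v, 7, 4) removes 8, 9, 0, 1 and leaves 2..7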
The common way to do this in DSP is with a circular buffer. This is just a fixed length buffer with two associated counters:
// make sure BUFSIZE is a power of 2 for the quick mod trick
#define BUFSIZE 1024

int CircBuf[BUFSIZE];
int InCtr, OutCtr;

void PutData(int *Buf, int count) {
    int srcCtr;
    int destCtr = InCtr & (BUFSIZE - 1); // if BUFSIZE is a power of 2, equivalent to (and faster than) destCtr = InCtr % BUFSIZE
    for (srcCtr = 0; (srcCtr < count) && (destCtr < BUFSIZE); srcCtr++, destCtr++)
        CircBuf[destCtr] = Buf[srcCtr];
    for (destCtr = 0; srcCtr < count; srcCtr++, destCtr++)  // wrap around to the start of the buffer
        CircBuf[destCtr] = Buf[srcCtr];
    InCtr += count;
}

void GetData(int *Buf, int count) {
    int srcCtr = OutCtr & (BUFSIZE - 1);
    int destCtr = 0;
    for (destCtr = 0; (srcCtr < BUFSIZE) && (destCtr < count); srcCtr++, destCtr++)
        Buf[destCtr] = CircBuf[srcCtr];
    for (srcCtr = 0; destCtr < count; srcCtr++, destCtr++)  // wrap around to the start of the buffer
        Buf[destCtr] = CircBuf[srcCtr];
    OutCtr += count;
}

int BufferOverflow() {
    return ((InCtr - OutCtr) > BUFSIZE);
}
This is pretty lightweight, but effective. And aside from the ctr = BigCtr & (SIZE-1) stuff, I'd argue it's highly readable. The only reason for the & trick is in old DSP environments, mod was an expensive operation so for something that ran often, like every time a buffer was ready for processing, you'd find ways to remove stuff like that. And if you were doing FFT's, your buffers were probably a power of 2 anyway.
These days, of course, you have 1 GHz processors and magically resizing arrays. You kids get off my lawn.
Another method:
N times do {remove entry at index P mod max(ArraySize, P)}
Example:
N=5, P=97, ArraySize=100
1: max(100, 97)=100 so remove at 97%100 = 97
2: max(99, 97)=99 so remove at 97%99 = 97 // array size is now 99
3: max(98, 97)=98 so remove at 97%98 = 97
4: max(97, 97)=97 so remove at 97%97 = 0
5: max(96, 97)=97 so remove at 97%97 = 0
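The same rule as a C++ sketch on a std::vector (a hypothetical removeCircular helper):

#include <algorithm>
#include <cstddef>
#include <vector>

// N times: remove the entry at index P mod max(current size, P).
template <typename T>
void removeCircular(std::vector<T>& v, std::size_t p, std::size_t n) {
    for (std::size_t k = 0; k < n && !v.empty(); ++k) {
        std::size_t idx = p % std::max(v.size(), p); // == p while p is still in range, 0 afterwards
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(idx));
    }
}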
I don't program for the iPhone, so I'll imagine std::vector; it's quite easy, simple and elegant enough:
#include <iostream>
using std::cout;
#include <vector>
using std::vector;
#include <cassert> // no need for using, assert is a macro

template<typename T>
void eraseCircularVector(vector<T> & vec, size_t position, size_t count)
{
    assert(count <= vec.size());
    if (count > 0)
    {
        position %= vec.size(); // normalize position
        size_t positionEnd = (position + count) % vec.size();
        if (positionEnd <= position) // the range wraps around (or covers the whole vector)
        {
            vec.erase(vec.begin() + position, vec.end());
            vec.erase(vec.begin(), vec.begin() + positionEnd);
        }
        else
            vec.erase(vec.begin() + position, vec.begin() + positionEnd);
    }
}

int main()
{
    vector<int> values;
    for (int i = 0; i < 10; ++i)
        values.push_back(i);

    cout << "Values: ";
    for (vector<int>::const_iterator cit = values.begin(); cit != values.end(); cit++)
        cout << *cit << ' ';
    cout << '\n';

    eraseCircularVector(values, 5, 1);  // remains 9: 0,1,2,3,4,6,7,8,9
    eraseCircularVector(values, 16, 5); // remains 4: 3,4,6,7

    cout << "Values: ";
    for (vector<int>::const_iterator cit = values.begin(); cit != values.end(); cit++)
        cout << *cit << ' ';
    cout << '\n';
    return 0;
}
However, you might consider:
- creating a new loop_vector class, if you use this kind of functionality often enough
- using a list if you perform many deletions (or even few deletions on a large array, unless you're only erasing from the end, which is a simple pop_back)
If your container (NSMutableArray or whatever) is not a list but a vector (i.e. a resizable array), you most definitely don't want to delete items one by one, but as a whole range (e.g. std::vector's erase(begin, end))!
Edit: reacting to a comment, to fully realize what a vector must do if you erase any element other than the last one: it has to copy (move) every value after that element (e.g. with 1000 items in the array, erasing the first one means copying 999 items, which is very costly).
Example:
#include <iostream>
#include <vector>
#include <ctime>
using namespace std;

int main()
{
    clock_t start, end;
    vector<int> vec;
    const int items = 64 * 1024;
    cout << "using " << items << " items in vector\n";

    for (size_t i = 0; i < items; ++i) vec.push_back(i);
    start = clock();
    while (!vec.empty()) vec.erase(vec.begin());
    end = clock();
    cout << "Inefficient method took: "
         << (end - start) * 1.0 / CLOCKS_PER_SEC << " s\n";

    for (size_t i = 0; i < items; ++i) vec.push_back(i);
    start = clock();
    vec.erase(vec.begin(), vec.end());
    end = clock();
    cout << "Efficient method took: "
         << (end - start) * 1.0 / CLOCKS_PER_SEC << " s\n";

    return 0;
}
Produces output:
using 65536 items in vector
Inefficient method took: 1.705 s
Efficient method took: 0 s
Note it's very easy to end up with the inefficient version; have a look at e.g. http://www.cplusplus.com/reference/stl/vector/erase/