Here's my code:
I currently have everything the user enters dumped into the stack and sorted, but I don't know how or where to go from here. I tried solving it with a count variable, but my solution isn't correct (it should output "2 dog" only once if the user enters dog twice). If anybody can help or knows a way to solve this, please give an example.
There are multiple ways to do this. The easiest is a simple use of std::map:
#include <iostream>
#include <string>
#include <map>
int main()
{
    std::map<std::string, unsigned int> mymap;
    std::string s;
    while (std::getline(std::cin, s) && !s.empty() && s != "END")
        ++mymap[s];
    for (auto const& pr : mymap)
        std::cout << pr.second << ':' << pr.first << '\n';
}
How it works
Each line is read, and if successful (not eof, not empty, and not equivalent to "END") is used for updating an entry in the map.
Per the documentation for std::map::operator [], if the requisite key is not already present in the map, it is added, and the mapped-to value is value-initialized. For unsigned int that means the initial value is 0.
From there, the increment is applied to the returned unsigned int reference. For a new element this results in the value 1; for an existing element it simply increments the prior value.
This continues until the loop terminates.
Upon termination of the loop the results are reported in lexicographical order, preceded by their count.
Input
one
two
three
four
three
one
one
one
two
END
Output
1:four
4:one
2:three
2:two
If you wanted to sort the output based on count, more work would need to be done, but it isn't difficult. A set of pairs from the map, inverted so the count is first, the string second, makes short work of that:
#include <iostream>
#include <string>
#include <map>
#include <set>
int main()
{
    std::map<std::string, unsigned int> mymap;
    std::string s;
    while (std::getline(std::cin, s) && !s.empty() && s != "END")
        ++mymap[s];
    std::set<std::pair<unsigned int, std::string>> ms;
    for (auto const& pr : mymap)
        ms.insert(std::make_pair(pr.second, pr.first));
    for (auto const& pr : ms)
        std::cout << pr.first << ':' << pr.second << '\n';
}
An example run appears below:
Input
one
two
three
four
three
one
one
one
two
END
Output
1:four
2:three
2:two
4:one
Use std::map, as mentioned in the comments:
std::map<std::string, unsigned int> countMap;
// Read until end of input or until the terminating string is entered,
// so the terminator itself is never counted.
while (getline(cin, enter) && enter != endString) {
    countMap[enter]++; // operator[] inserts the key if it is not present and
                       // value-initializes the count to 0; otherwise it
                       // fetches the existing count, which is then incremented
}
// countMap[str] gives the number of times `str` was entered.
You should use a map. But if you are looking for another approach, you can search over all the elements.
After you have read all the elements from the input, start looping over the vector. Take the first element, store its value, and remove it; then check the remaining size-1 elements to see whether they are equal to it. For each match, increment the counter and remove that item from the vector.
Notice that the size has decreased. Repeat the same steps until the size becomes 0.
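The steps above could be sketched roughly as follows. This is only one possible reading of the description; the function name countByRemoval and the choice to return a vector of (word, count) pairs are assumptions made for illustration, not part of the original answer.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: counts duplicates by repeatedly removing the first
// element and every later element equal to it. The vector is taken by value
// because the algorithm consumes it.
std::vector<std::pair<std::string, std::size_t>>
countByRemoval(std::vector<std::string> words)
{
    std::vector<std::pair<std::string, std::size_t>> result;
    while (!words.empty())
    {
        // Take the first element, remember its value, and remove it.
        const std::string current = words.front();
        words.erase(words.begin());
        std::size_t count = 1;

        // Scan the remaining elements; erase and count every match.
        for (auto it = words.begin(); it != words.end(); )
        {
            if (*it == current)
            {
                ++count;
                it = words.erase(it); // erase returns the next valid iterator
            }
            else
            {
                ++it;
            }
        }
        result.emplace_back(current, count);
    }
    return result;
}
```

Each pair comes out in order of first appearance, so printing pair.second followed by pair.first gives one line per distinct word. Note that this is quadratic in the number of words, which is why the map is usually the better choice.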
I've spent days trying to make the library work on my system.
The library has several algorithms which generate MPHFs.
My understanding of a minimal perfect hash function is that when I hash two distinct keys using the MPHF, they will return two different ids.
This does not seem to be the case with the 2 million keys that I've generated (integers, read as strings by the algorithm). I've tried a couple of the algorithms that the library implements, but all of them result in duplicate ids for a lot of keys.
Here is what I've written:
#include <cmph.h>
#include <iostream>
#include <fstream>
#include <bitset>
#include <string>
#include <sstream>
#include <limits.h>
using namespace std;
int main(int argc, char** argv){
FILE *fp = fopen("keys.txt", "r");
FILE *read = fopen("keys2.txt", "r");
ofstream ids("ids2.txt");
if(!fp || !read || !ids.is_open()){
cerr<<"Failed to open the file\n";
exit(1);
}
cmph_t* hash = NULL;
// source of keys
cmph_io_adapter_t *source = cmph_io_nlfile_adapter(fp);
cmph_config_t *config = cmph_config_new(source);
cmph_config_set_algo(config, CMPH_BDZ);
hash = cmph_new(config);
cmph_config_destroy(config);
char *k = (char *)malloc(sizeof(12));
while(fgets(k, INT_MAX, read) != NULL){
string key = k;
unsigned int id = cmph_search(hash, k, (cmph_uint32)key.length());
ids<<id<<"\n";
}
cmph_destroy(hash);
cmph_io_nlfile_adapter_destroy(source);
fclose(fp);
fclose(read);
ids.close();
}
Shouldn't the ids be unique for every distinct key if the algorithm claims to generate a minimal perfect hash function? There are 2048383 keys. For my project I would need the ids to map from 0 to 2048382, since I plan to use a minimal perfect hash function.
I am not sure where I am going wrong with my understanding.
Please help.
If your keys2.txt contains keys that weren't part of the set used to generate your hash, then, by the definition of an MPHF, you'll get either duplicate ids or, possibly, values outside your range. It's up to you to store all the keys that were used to generate the hash and then verify that the key passed to cmph_search is the same as the one that originally produced the id returned by cmph_search.
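A minimal sketch of that verification step, under stated assumptions: Lookup is a hypothetical stand-in for whatever wraps the real lookup (for example a small wrapper around cmph_search), and buildKeyTable and isKnownKey are illustrative names, not part of the library. It may also be worth checking whether the looked-up key is byte-identical to the key the hash was built from; fgets, for instance, keeps the trailing newline in the buffer.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the real MPHF lookup.
using Lookup = std::function<std::uint32_t(const std::string&)>;

// Stores every original key at the slot given by its id, so that a later
// lookup can be checked against the key that owns that id.
std::vector<std::string> buildKeyTable(const std::vector<std::string>& keys,
                                       const Lookup& lookup)
{
    std::vector<std::string> table(keys.size());
    for (const auto& key : keys)
    {
        const std::uint32_t id = lookup(key);
        if (id < table.size()) // ids of the original keys should lie in [0, n)
            table[id] = key;
    }
    return table;
}

// A key is "known" only if its id is in range AND the key stored at that id
// is exactly the key being queried.
bool isKnownKey(const std::vector<std::string>& table,
                const Lookup& lookup,
                const std::string& key)
{
    const std::uint32_t id = lookup(key);
    return id < table.size() && table[id] == key;
}
```

With this in place, a key that was not in the original set is rejected even when its id happens to collide with that of a real key.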
I'm looking for a simple hash function that doesn't rely on integer overflow, and doesn't rely on unsigned integers.
The problem is that I have to create the hash function in blueprint from Unreal Engine (only has signed 32 bit integer, with undefined overflow behavior) and in PHP5, with a version that uses 64 bit signed integers.
So when I use the 'common' simple hash functions, they don't give the same result on both platforms because they all rely on bit-overflowing behavior of unsigned integers.
The only thing that is really important is that it has good 'randomness'. Does anyone know something simple that would accomplish this?
It's meant for a very basic signing system for sending messages to a server. It doesn't need to be top security... it's for storing high scores of a simple game on a server. The idea is that I would generate several hash integers from the message (using different 'start numbers') and append them to make a hash signature. I just need to make sure that if people sniff the network messages sent to the server, they cannot easily send faked messages. They would need to provide the correct hash signature with their message, which they shouldn't be able to do unless they know the hash function being used. Of course, if they reverse engineer the game they can still 'hack' it, but I wouldn't know how to counter that...
I have no access to existing hash functions in the unreal engine blueprint system.
The first thing I would try would be to simulate the behavior of unsigned integers using signed integers, by explicitly applying the modulo operator whenever the accumulated hash-value gets large enough that it might risk overflowing.
Example code in C (apologies for the poor hash function, but the same technique should be applicable to any hash function, at least in principle):
#include <stdio.h>
#include <string.h>
int hashFunction(const char * buf, int numBytes)
{
    const int multiplier = 33;
    const int maxAllowedValue = 2147483648-256; // assuming 32-bit ints here
    const int maxPreMultValue = maxAllowedValue/multiplier;
    int hash = 536870912; // arbitrary starting number
    for (int i=0; i<numBytes; i++)
    {
        hash = hash % maxPreMultValue; // make sure hash cannot overflow in the next operation!
        // treat each byte as 0..255 so the result does not depend on whether char is signed
        hash = (hash*multiplier)+(unsigned char)buf[i];
    }
    return hash;
}
int main(void)
{
    char buf[1024];
    printf("Enter a string to hash:\n");
    while (fgets(buf, sizeof(buf), stdin) != NULL) // stop at end of input
    {
        printf("Hash code for that string is: %i\n", hashFunction(buf, (int)strlen(buf)));
        printf("Enter a string to hash:\n");
    }
    return 0;
}
I use Embarcadero XE7. I found that swscanf returns the wrong result.
Example:
int _tmain(int argc, _TCHAR* argv[])
{
char *t1= " ";
wchar_t *t2= L" ";
int i1, i2;
i1= -1;
i1= sscanf (t1, "%d", &i2);
if(i1!=EOF)
printf("sscanf output i1=%d i2=%d\n", i1, i2);
else
printf("sscanf EOF\n");
i1= swscanf(t2, L"%d", &i2);
if(i1!=EOF)
printf("swscanf output i1=%d i2=%d\n", i1, i2);
else
printf("swscanf EOF\n");
return 0;
}
The result:
sscanf EOF
swscanf output i1=1 i2=0
The first result is ok. But the second is wrong.
This is a bug. This behaviour of swscanf() contradicts the C11 standard:
7.29.2.4/3 The swscanf function returns the value of the macro EOF if an input failure occurs before the first conversion (if any) has
completed. Otherwise, the swscanf function returns the number of input
items assigned, which can be fewer than provided for, or even zero, in
the event of an early matching failure.
Clearly, here it fails before the first conversion has started.
It also contradicts the XE7 sscanf/swscanf documentation:
If sscanf attempts to read at end-of-string, it returns EOF.
And again, clearly, here it attempts to read at end of string.
There is no bug report on EDN for now. You should file one.
Workaround: handle the cases i1==EOF and i1==0 together, since in both cases no variable has been assigned a usable value.
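The workaround can be sketched as a small wrapper. The name parseInt is hypothetical, and this only folds the EOF and 0 return values into a single "nothing converted" result. It would not catch an implementation that wrongly reports one successful conversion, as in the output shown in the question.

```cpp
#include <cstdio>
#include <cwchar>

// Hypothetical helper: returns true only if swscanf reports that an integer
// was converted. EOF (input failure) and 0 (matching failure) are treated
// alike, because in neither case has `value` been assigned.
bool parseInt(const wchar_t* text, int& out)
{
    int value = 0;
    const int converted = std::swscanf(text, L"%d", &value);
    if (converted == EOF || converted == 0)
        return false; // leave `out` untouched
    out = value;
    return true;
}
```

On a standard-conforming library, parseInt succeeds for L"42" and fails for both L" " and L"abc", leaving the output argument unchanged in the failing cases.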
I'm in the process of integrating a hash method (farmhash) into our software base. The hashing services seem to work appropriately. Basically, it turns a string of characters into a unique-ish integer value.
I've added an infrastructure to detect collisions (cases where two input strings result in the same output integer). Basically, for each string that is hashed, I keep the [hash result] -> [string] pair in a map, and every time a new string is hashed, I compare it to what's in the map; if the hash is already there, I make sure that it is the same string that generated it. I am aware that this is potentially slow and memory consuming, but I'm performing these checks only on a "per request" basis: they are not enabled in release mode.
Now I'd like to test that infrastructure (as in get a collision, from a unit test point of view).
I could generate a bunch of strings (random or sequential), spam my hash infrastructure and hope to see a positive collision but I feel I'll waste my time, CPU cycles and fill the memory with a load of data without success.
How would one go about generating collisions?
Not-so-relevant-facts:
I'm using c++;
I can generate data using python;
The target int is uint32_t.
Update:
I have created a small naive program to brute force the detection of collision:
void
addToQueue(std::string&& aString)
{
//std::cout << aString << std::endl;
hashAndCheck( aString ); // Performs the hash and check if there is a collision
if ( mCount % 1000000 == 0 ) // report progress once every million checks
std::cout << "Did " << mCount << " checks so far" << std::endl;
mQueue.emplace( aString );
}
void
generateNextRound( const std::string& aBase )
{
//48 a 122 incl
for ( int i = 48; i <= 122; i++ )
{
addToQueue( std::move( std::string( aBase ).append( 1, static_cast<char>( i ) ) ) );
}
}
int main( void )
{
// These two generate a collision
//StringId id2 = HASH_SID( "#EF" ); // Hashes only, does not check
//StringId id1 = HASH_SID( "7\\:" ); // Hashes only, does not check
std::string base = "";
addToQueue( std::move( base ) );
while ( true )
{
const std::string val = mQueue.front();
mQueue.pop();
generateNextRound( val );
}
return 0;
}
I could eventually have added threading and stuff in there but I didn't need it because I found a collision in about 1 second (in debug mode).
If you brute-force search for collisions offline, you could hard-code the strings that cause collisions into your test, so that your test is as close to production code as possible but doesn't suffer the performance penalty of doing the brute-force work each time. (Or, as other people have said, you can make an intentionally junky hash algorithm that causes excessive collisions.)
You could limit the range of the integer that is output by the hash function; in general you should be able to pass some number n into it so that results lie between 0 and n-1. If you limit it to, say, 10, then you'll definitely end up with collisions.
For key k and hash function h, return a constant c:
h(k) = c
This always collides, regardless of what key you use.
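A rough sketch of how a constant hash could drive a collision check in a unit test. Everything here is hypothetical: CollisionChecker only mimics the [hash result] -> [string] map described in the question, and constantHash stands in for the real hash function, which the real infrastructure would need to let a test swap out.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical collision detector with a pluggable hash function.
class CollisionChecker
{
public:
    using HashFn = std::function<std::uint32_t(const std::string&)>;

    explicit CollisionChecker(HashFn fn) : mHash(std::move(fn)) {}

    // Returns true if aString hashes to a value that was already claimed
    // by a different string; re-hashing the same string is not a collision.
    bool hashAndCheck(const std::string& aString)
    {
        const std::uint32_t h = mHash(aString);
        const auto result = mSeen.emplace(h, aString);
        return !result.second && result.first->second != aString;
    }

private:
    HashFn mHash;
    std::map<std::uint32_t, std::string> mSeen;
};

// h(k) = c: every key maps to the same value, so any two distinct
// strings collide.
inline std::uint32_t constantHash(const std::string&)
{
    return 42u;
}
```

A test can then construct CollisionChecker with constantHash, hash two different strings, and assert that the second call reports a collision.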
I apologize for asking somewhat of a programming question, but I want to be sure I'm properly using this library cryptographically.
I have managed to implement ed25519-donna except for hashing the data for a signature.
As far as I can tell, this is the function that hashes data:
void ed25519_hash(uint8_t *hash, const uint8_t *in, size_t inlen);
but I can't figure out what *hash is. I'm fairly certain that *in and inlen are the data to be hashed and its length.
Is it something specific to SHA512?
How can one hash with ed25519-donna?
Program hangs
I've compiled with ed25519-donna-master/ed25519.o and the OpenSSL flags -lssl -lcrypto. The key generation, signing, and verification functions work as expected.
It's running without error, but the application hangs on these lines, and the cores are not running at 100%, so I don't think it's busy processing:
extern "C"
{
#include "ed25519-donna-master/ed25519.h"
#include "ed25519-donna-master/ed25519-hash.h"
}
#include <openssl/rand.h>
unsigned char* hash;
const unsigned char* in = convertStringToUnsignedCharStar( myString );
std::cout << in << std::endl;
std::cout << "this is the last portion output and 'in' outputs correctly" << std::endl;
ed25519_hash(hash, in, sizeof(in) );
std::cout << hash << std::endl;
std::cout << "this is never output" << std::endl;
How can this code be modified so that ed25519_hash can function? It works the same way regardless of whether hash and in are unsigned char* or uint8_t*s.
For uint8_t*, I used this code:
uint8_t* hash;
const uint8_t* in = reinterpret_cast<const uint8_t*>(myString.c_str());
“…but I can't figure out what *hash is.”
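That uint8_t *hash is the buffer (unsigned char*) that will contain the resulting hash after you have called the function. The caller has to provide that storage; in your code hash is an uninitialized pointer, so the function writes to an arbitrary address. Likewise, sizeof(in) is the size of the pointer, not the length of the data.
So, you're looking at a function that expects 3 parameters (also known as arguments):
a uint8_t * buffer to hold the resulting hash,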
the input data to be hashed,
the length of the input data to be hashed.
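To illustrate the calling convention for a function with that shape, here is a sketch using a stand-in. demoHash is hypothetical and is NOT a real hash; it only has the same three-parameter signature, so the example shows how the buffer and length are supplied, not how ed25519-donna hashes. The 64-byte size assumes the 512-bit digest mentioned in the library documentation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in with the same signature as the function in question.
// It is not cryptographic: it just XOR-folds the input into 64 output bytes.
void demoHash(std::uint8_t* hash, const std::uint8_t* in, std::size_t inlen)
{
    for (std::size_t i = 0; i < 64; ++i)
        hash[i] = 0;
    for (std::size_t i = 0; i < inlen; ++i)
        hash[i % 64] ^= in[i];
}

std::array<std::uint8_t, 64> hashString(const std::string& message)
{
    // The caller owns the output buffer: 64 bytes for a 512-bit digest.
    std::array<std::uint8_t, 64> digest{};
    demoHash(digest.data(),
             reinterpret_cast<const std::uint8_t*>(message.data()),
             message.size()); // length of the data, not sizeof a pointer
    return digest;
}
```

The two points that matter are that the output pointer refers to real storage of the right size, and that the length argument is the number of input bytes.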
“Is it something specific to SHA512?”
Nope, it's regular C source. But I think you’re a bit confused by the documentation. It states…
If you are not compiling against OpenSSL, you will need a hash function.
…
To use a custom hash function, use -DED25519_CUSTOMHASH
when compiling ed25519.c and put your custom hash implementation
in ed25519-hash-custom.h. The hash must have a 512bit digest and
implement
…
void ed25519_hash(uint8_t *hash, const uint8_t *in, size_t inlen);
So, unless you are not compiling against OpenSSL and implementing your own hash function, you won't be needing this function. Looking at your code, you are compiling against OpenSSL, which means you're playing with the wrong function.
“How can one hash with ed25519-donna?”
By using the provided functionality the library offers.
Your question makes me wonder if you scrolled down to the “Usage” part of the readme, because it completely answers your question and tells you what functions to use.
For your convenience, let me point you to the part of the documentation you need to follow and where you find the functions you need to hash, sign, verify etc. using ed25519-donna:
To use the code, link against ed25519.o -mbits and:
#include "ed25519.h"
Add -lssl -lcrypto when using OpenSSL (Some systems don't
need -lcrypto? It might be trial and error).
To generate a private key, simply generate 32 bytes from a secure cryptographic source:
ed25519_secret_key sk;
randombytes(sk, sizeof(ed25519_secret_key));
To generate a public key:
ed25519_public_key pk;
ed25519_publickey(sk, pk);
To sign a message:
ed25519_signature sig;
ed25519_sign(message, message_len, sk, pk, signature);
To verify a signature:
int valid = ed25519_sign_open(message, message_len, pk, signature) == 0;
To batch verify signatures:
const unsigned char *mp[num] = {message1, message2..}
size_t ml[num] = {message_len1, message_len2..}
const unsigned char *pkp[num] = {pk1, pk2..}
const unsigned char *sigp[num] = {signature1, signature2..}
int valid[num]
/* valid[i] will be set to 1 if the individual signature was valid, 0 otherwise */
int all_valid = ed25519_sign_open_batch(mp, ml, pkp, sigp, num, valid) == 0;
…
As you see, it's all in there… just follow the documentation.