Variable output hash function - hash

I know that there are hash functions that from a variable length input can give a fixed output. To take the simplest one, using the module of ten no matter how big is the input number I will always get an output between 0 and 9.
I need to do have from an unknown password, a variable length output. My first thought was to use the module, increasing the prim number as much as many digits I need to have as output.
My problems are:
I must handle short passwords as well as I would with long passwords;
I don't know how long should the output be before writing the program, and even though I would know after the user has set the password I may need to change it if he modifies the file.
My first thought was using a simple function and modify it based on my needs.
If I have to hash 123 but I need to have 5 characters as output, that's what I would do:
I add 2 zeros on the right, changing the input to 12300;
I take the lowest 5 digits prime number (10007);
And I then I have my hash doing 12300 % 10007 = 02293.
But since I would probably need output in the order of hundreds if not thousands I'm pretty sure module is not the solution to my problem.
I could also try to create my own hash function, but I have no idea how to verify if it works or if it's trash.
Are there some common solutions in literature for this kind of problem?

Related

Reversing rand in perl 5.10.0, anyone know where to find the source code for rand/srand?

I am doing an assignment where I have a passwd file and I am to find all the passwords in it. Most of them were easy with Jack the ripper and some tweaking but the extra credit requires I find a 8 byte Alphanumeric password generated by rand in perl 5.10.0 and encrypted with crypt.
I came up with three ways to approaching this:
Brute force: 62^8 Computations = 300 Weeks on my machine. I could
rent a server with 300 times my machine power to do in 1 week.
Somehow that feels like a waste of resources/electricity for an
extra credit.
Break Crypt: Not sure on this one, I have however generated a
char-set from the other passwords I found, reducing the Incremental
brute force to 5 days, but I think that will only work if this
password contains only characters present in the previous ones (17
plain-texts), so maybe if i get lucky! (Highly Unlikely)
Break rand: If I can find the same seed used to generate the
password. I can then generate dictionaries to feed to Jack. In order
to get the seed from the file given to me however I have to
understand how perl is creating the seed (and if it is even possible
on 5.10.0).
From what I have researched on earlier Perl versions only the System Time was used as a seed. I made a script that uses the m_time (Time From Epoch) on the passwd file given to me (+-10 to be sure although I'm sure the file got generated in one second) as seed to generate a dictionary, in this format, since I do not know at what call of rand() my password actually starts:
abcdefgh bcdefghi cdefhijk
I fed the dictionary to Jack. Of course this didn't work because after Perl 5.004 Perl uses other stuff (the point of my question) to generate a seed.
So, my question is if anyone knows where to find the source code Perl uses to generate the seed, and/or source code for rand/srand. I was looking for something that looked like this, but for version 5.10.0:
What are the weaknesses of Perl's srand() default seed, post version 5.004?
I tried using grep in the /lib/perl directory but I get lost in all the #define structure files.
Also feel free to let me know if you think I am completely offtrack with the assignment and/or any advice on the matter.
You don't want to look in /lib/perl, you want to look in the Perl source.
Here is Perl_seed() in util.c as of v5.10.0, which is the function called if srand is called without an argument, or if rand is called without srand being called first.
As you can see, on a Unix system with random device support, it uses bytes from /dev/urandom to seed the RNG. On a system without such support, it uses a combination of the time (with microsecond resolution if possible), the PID of the Perl process, and memory locations of various data structures in the Perl interpreter.
In the urandom case, guessing the seed is effectively impossible. In the second case, it's still of difficulty probably similar to brute-forcing the passwords; you have 20 bits of unpredictability from the microsecond timestamp, up to 16 bits from the PID, and an unknown amount from the memory addresses, probably between 0 and 20 bits if you know details of the system where it was run, but up to 64 or 96 bits if you have no knowledge at all.
I would say that attacking Perl's rand by guessing the seed is probably not practical, and reversing it from its output is probably not either, especially if it was run on a system with drand48. Have you considered a GPU-based brute-forcing tool?

At which lines in my MATLAB code a variable is accessed?

I am defining a variable in the beginning of my source code in MATLAB. Now I would like to know at which lines this variable effects something. In other words, I would like to see all lines in which that variable is read out. This wish does not only include all accesses in the current function, but also possible accesses in sub-functions that use this variable as an input argument. In this way, I can see in a quick way where my change of this variable takes any influence.
Is there any possibility to do so in MATLAB? A graphical marking of the corresponding lines would be nice but a command line output might be even more practical.
You may always use "Find Files" to search for a certain keyword or expression. In my R2012a/Windows version is in Edit > Find Files..., with the keyboard shortcut [CTRL] + [SHIFT] + [F].
The result will be a list of lines where the searched string is found, in all the files found in the specified folder. Please check out the options in the search dialog for more details and flexibility.
Later edit: thanks to #zinjaai, I noticed that #tc88 required that this tool should track the effect of the name of the variable inside the functions/subfunctions. I think this is:
very difficult to achieve. The problem of running trough all the possible values and branching on every possible conditional expression is... well is hard. I think is halting-problem-hard.
in 90% of the case the assumption that the output of a function is influenced by the input is true. But the input and the output are part of the same statement (assigning the result of a function) so looking for where the variable is used as argument should suffice to identify what output variables are affected..
There are perverse cases where functions will alter arguments that are handle-type (because the argument is not copied, but referenced). This side-effect will break the assumption 2, and is one of the main reasons why 1. Outlining the cases when these side effects take place is again, hard, and is better to assume that all of them are modified.
Some other cases are inherently undecidable, because they don't depend on the computer states, but on the state of the "outside world". Example: suppose one calls uigetfile. The function returns a char type when the user selects a file, and a double type for the case when the user chooses not to select a file. Obviously the two cases will be treated differently. How could you know which variables are created/modified before the user deciding?
In conclusion: I think that human intuition, plus the MATLAB Debugger (for run time), and the Find Files (for quick search where a variable is used) and depfun (for quick identification of function dependence) is way cheaper. But I would like to be wrong. :-)

How arbitrary is one?

My teacher says our homework program must handle "an arbitrary number of input lines". It seems pretty arbitrary to only accept one line, but is it arbitrary enough? My roommate said seven is more a arbitrary number than one, and maybe he's right. But I just have no idea how to measure the arbitrariness of a number and Google doesn't seem to help.
UPDATE:
It sounds like maybe the best thing to do is accepty any given number of input lines, and hope the prof can see that that makes a lot more sense than insisting that the user just give you one specific arbitrary number of input lines. Especially since we weren't instructed to notify the user about what the arbitrary number is. You can't just make the user guess, that's crazy.
"Arbitrary" doesn't mean you get to pick a random number to accept. It means that it should handle an input with any number of lines.
So if someone decides to give your program an input with 0 lines, 1 line, 2 lines... n lines, then it should still do the right thing (and not crash).
Arbitrary means it could be ANY number. 0, 1, 7, 100124453225.
I would probably test for 0 and display some sort of error in that case since it's supposed to have SOME text. Other than that so long as there are more lines your program should keep doing whatever it's designed to do.
Typically when teachers indicate that a program should accept arbitrary amounts of input they are indicating to you that you should consider corner cases which you may not have thought about, one of the most common being no input at all which can often cause errors in programs if the programmer hasn't considered this case.
The point of the word is to emphasize that your program should be able to handle different inputs instead of simply crashing unless input comes in a certain quantity or is formatted in a specific way.

Parse bit strings in Perl

When working with unpack, I had hoped that b3 would return a bitstring, 3 bits in length.
The code that I had hoped to be writing (for parsing a websocket data packet) was:
my($FIN,$RSV1, $RSV2, $RSV3, $opcode, $MASK, $payload_length) = unpack('b1b1b1b1b4b1b7',substr($read_buffer,0,2));
I noticed that this doesn't do what I had hoped.
If I used b16 instead of the template above, I get the entire 2 bytes loaded into first variable as "1000000101100001".
That's great, and I have no problem with that.
I can use what I've got so far, by doing a bunch of substrings, but is there a better way of doing this? I was hoping there would be a way to process that bit string with a template similar to the one I attempted to make. Some sort of function where I can pass the specification for the packet on the right hand side, and a list of variables on the left?
Edit: I don't want to do this with a regex, since it will be in a very tight loop that will occur a lot.
Edit2: Ideally it would be nice to be able to specify what the bit string should be evaluated as (Boolean, integer, etc).
If I have understood correctly, your goal is to split the 2-bytes input to 7 new variables.
For this purpose you can use bitwise operations. This is an example of how to get your $opcode value:
my $b4 = $read_buffer & 0x0f00; # your mask to filter 9-12 bits
$opcode = $b4 >> 8; # rshift your bits
You can do the same manipulations (maybe in a single statement, if you want) for all your variables and it should execute at a resonable good speed.

Is there a such a way to know how much memory space a file would take?

Is there a such a way to know how much memory space a file would take before hand?
For example, lets say I have a file size of 1G bytes. How would this file size translate to memory size?
I take your example from the comment and elaborate on what might happen to a text file when loaded into memory: Some time ago, "text" usually meant ASCII (as the least common denominator at least). And lots of software, written in a language like C, would represent such ASCII strings as an char* type. This resulted in a more-or-less exact match in memory requirements: Every byte in the input file would occupy one byte when loaded into RAM.
But that has changed in the last years with the rise of Unicode. The same text file, loaded by a simple Java program (and using Java's String type, which is very likely) would take up two times the amount of RAM. This is so, because the Java String type represents each character internally using UTF-16 (16 bits per character minimum), whereas ASCII used only one byte per character.
What I'm trying to say here is: There is no easy answer to your question, it always depends on who reads the data and what he's about to do with it.
One thing is true quite often: by "loading", the data does not become smaller.
If you read the whole file into memory at once, you'll need at least the size of the file free memory. Much of the time people don't actually need to do so, they just don't know another way. For an explanation of the problem and alternatives see:
http://www.effectiveperlprogramming.com/2010/01/memory-map-files-instead-of-slurping-them/
You can check yourself by writing a little test script with Memory::Usage.
From its documentation's synopsis:
use Memory::Usage;
my $mu = Memory::Usage->new();
# Record amount of memory used by current process
$mu->record('starting work');
# Do the thing you want to measure
$object->something_memory_intensive();
# Record amount in use afterwards
$mu->record('after something_memory_intensive()');
# Spit out a report
$mu->dump();
Then you'll know how much your build of Perl, given whatever character encoding you intend to use, and whatever method of dealing with the file you intend to implement, will consume in memory.
If you can avoid loading the whole file at once, and instead just iterate over it line by line or record by record, the memory concern goes away. So it would help to know what you actually are trying to accomplish. You may have an XY problem.
perldoc -f stat
stat Returns a 13-element list giving the status info for a file,
either the file opened via FILEHANDLE or DIRHANDLE, or named by
EXPR. If EXPR is omitted, it stats $_. Returns the empty list
if "stat" fails. Typically used as follows:
($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,
$atime,$mtime,$ctime,$blksize,$blocks)
= stat($filename);
Note the $size return value. It is the size of the file in bytes. If you are going to slurp the entire file into memory you will need at least $size bytes. Then again, you might need a whole lot more (or even a whole lot less), depending on what you do to the contents of the file.