Different hashes for Keccak / SHA-3 with several programs?


I am implementing the Keccak sponge function and see some strange behaviour in the hash results.
As input I use the string "abc", which is 24 bits (3 bytes) long.
The SHA-3 test vectors at http://www.di-mgt.com.au/sha_testvectors.html give the following result for SHA3-512:
SHA3-512 from Test Vector
b751850b1a57168a 5693cd924b6b096e 08f621827444f70d 884f5d0240d2712e 10e116e9192af3c9 1a7ec57647e39340 57340b4cf408d5a5 6592f8274eec53f0
I also used Crypto++ (cryptopp) version 5.6.2, which gives me this output:
CryptoPP
18587dc2ea106b9a1563e32b3312421ca164c7f1f07bc922a9c83d77cea3a1e5d0c69910739025372dc14ac9642629379540c17e2a65b19d77aa511a9d00bb96
HashTab 5.2.0.14 on Windows gives me the same output for a file containing "abc":
HashTab 5.2.0.14
18587dc2ea106b9a1563e32b3312421ca164c7f1f07bc922a9c83d77cea3a1e5d0c69910739025372dc14ac9642629379540c17e2a65b19d77aa511a9d00bb96
So there are several references, but they do not all agree. The test vector website explains that the two bits "01" are appended to the input message, as defined in the FIPS 202 draft. So CryptoPP and HashTab may be using a different implementation, but which one?
Now I have my own program, the "reference code" from the Keccak site, and another implementation in Python.
My program returns this hash value for "abc":
My Program
20FF13D217D5789FA7FC9E0E9A2EE627363EC28171D0B6C52BBD2F240554DBC94289F4D61CB57DF72DF08AAC4366022D5DF23E703B8FDFF6306021DB4D5E6760
The Keccak reference code (http://keccak.noekeon.org/KeccakReferenceAndOptimized-3.2.zip) from http://keccak.noekeon.org/files.html calculates the same value:
Keccak-Reference 3.2
Message of size 2040 bits with Keccak[r=1024, c=576]
20FF13D217D5789FA7FC9E0E9A2EE627363EC28171D0B6C52BBD2F240554DBC94289F4D61CB57DF72DF08AAC4366022D5DF23E703B8FDFF6306021DB4D5E6760 (truncated to the same length)
The Python implementation from https://github.com/mgoffin/keccak-python/blob/master/Keccak.py produces the same value:
keccak-python
Value after squeezing : 20FF13D217D5789FA7FC9E0E9A2EE627363EC28171D0B6C52BBD2F240554DBC94289F4D61CB57DF72DF08AAC4366022D5DF23E703B8FDFF6306021DB4D5E67601173D04BF5AEC3EBBCA87696355C5FB4D72D00D2CC4F843A0A3A0ED8924A16FC37769A3DB7C3A84F31E92375A7D74A0136D80A647FBC5AF8D733B43873A3709F
So my questions are:
1) Is it true that Keccak and SHA-3 produce different outputs because of the changes NIST specified in FIPS 202?
2) Why do I now have three different hash values that do not agree with each other?
3) Was the capacity changed in SHA3-512 so that it has a capacity of 512 bits and a bitrate of 1600-512 bits? This is another difference I read about in a presentation on SHA-3, but I did not find it in the FIPS 202 document.
Thank you very much!
Regards,
Burak

2)
As mentioned in https://crypto.stackexchange.com/questions/15727/what-are-the-key-differences-between-the-draft-sha-3-standard-and-the-keccak-sub
FIPS 202 was changed on April 7, 2014.
The latest release of CryptoPP is from 2/20/2013 (which was the first release to include SHA3);
see http://www.cryptopp.com/
This explains why CryptoPP produces a different hash from the current test vectors. I think the same applies to HashTab.
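To make the difference concrete, here is a minimal sketch of the Keccak sponge in Python, written in the style of the compact public reference implementations. It is an illustration, not any library's actual code: the function names are made up for this example, and it assumes the 512-bit variants use a rate of 576 bits (capacity 1024, which is what FIPS 202 specifies for SHA3-512). The only thing that differs between the two calls is the padding byte: 0x06 carries the "01" suffix from FIPS 202, while 0x01 is the plain pre-standard Keccak padding.

```python
import hashlib

MASK64 = (1 << 64) - 1

def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64

def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane array."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column parity into the neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the non-linear step, applied row by row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add the round constant, generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes

def keccak(rate_bits, message, suffix, output_len):
    """Sponge over Keccak-f[1600]; `suffix` is the first padding byte."""
    rate = rate_bits // 8
    state = bytearray(200)

    def permute():
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")

    # Absorb the message block by block.
    offset = 0
    block_size = 0
    while offset < len(message):
        block_size = min(len(message) - offset, rate)
        for i in range(block_size):
            state[i] ^= message[offset + i]
        offset += block_size
        if block_size == rate:
            permute()
            block_size = 0
    # Pad: suffix byte first, final 1 bit at the end of the rate.
    state[block_size] ^= suffix
    if (suffix & 0x80) and block_size == rate - 1:
        permute()
    state[rate - 1] ^= 0x80
    permute()
    # Squeeze the requested number of output bytes.
    output = bytearray()
    while output_len > 0:
        block_size = min(output_len, rate)
        output += state[:block_size]
        output_len -= block_size
        if output_len > 0:
            permute()
    return bytes(output)

message = b"abc"
sha3_512 = keccak(576, message, 0x06, 64).hex()    # FIPS 202 padding
keccak_512 = keccak(576, message, 0x01, 64).hex()  # pre-standard Keccak padding
print("SHA3-512  :", sha3_512)
print("Keccak-512:", keccak_512)
print("matches hashlib.sha3_512:", sha3_512 == hashlib.sha3_512(message).hexdigest())
```

With the 0x06 suffix the sketch should give the b751850b... test vector, and with 0x01 the 18587dc2... value reported for CryptoPP and HashTab. The reference-code output quoted in the question reports r=1024, c=576, which is not the rate/capacity pair used here; that may be where the third value comes from, though this was not checked.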

Related

Reversing rand in perl 5.10.0, anyone know where to find the source code for rand/srand?

I am doing an assignment where I have a passwd file and have to find all the passwords in it. Most of them were easy with John the Ripper and some tweaking, but the extra credit requires that I find an 8-byte alphanumeric password generated by rand in Perl 5.10.0 and encrypted with crypt.
I came up with three ways of approaching this:
1) Brute force: 62^8 computations = 300 weeks on my machine. I could rent a server with 300 times my machine's power to do it in 1 week. Somehow that feels like a waste of resources/electricity for an extra credit.
2) Break crypt: Not sure about this one. I have, however, generated a char-set from the other passwords I found, reducing the incremental brute force to 5 days, but I think that will only work if this password contains only characters present in the previous ones (17 plain-texts), so maybe if I get lucky! (Highly unlikely)
3) Break rand: If I can find the same seed that was used to generate the password, I can then generate dictionaries to feed to John. In order to get the seed from the file given to me, however, I have to understand how Perl creates the seed (and whether that is even possible on 5.10.0).
From what I have researched, earlier Perl versions used only the system time as a seed. I wrote a script that uses the mtime (seconds since the epoch) of the passwd file given to me (+-10 seconds to be safe, although I'm sure the file was generated within one second) as the seed to generate a dictionary in the following format, since I do not know at which call of rand() my password actually starts:
abcdefgh bcdefghi cdefghij
I fed the dictionary to John. Of course this didn't work, because since Perl 5.004 Perl uses other inputs (the point of my question) to generate the seed.
So my question is whether anyone knows where to find the source code Perl uses to generate the seed, and/or the source code for rand/srand. I was looking for something like this, but for version 5.10.0:
What are the weaknesses of Perl's srand() default seed, post version 5.004?
I tried using grep in the /lib/perl directory, but I got lost in all the #define structure files.
Also, feel free to let me know if you think I am completely off track with the assignment, and/or to give any advice on the matter.

You don't want to look in /lib/perl, you want to look in the Perl source.
Here is Perl_seed() in util.c as of v5.10.0, which is the function called if srand is called without an argument, or if rand is called without srand being called first.
As you can see, on a Unix system with random device support, it uses bytes from /dev/urandom to seed the RNG. On a system without such support, it uses a combination of the time (with microsecond resolution if possible), the PID of the Perl process, and memory locations of various data structures in the Perl interpreter.
In the urandom case, guessing the seed is effectively impossible. In the second case, the difficulty is probably still similar to brute-forcing the passwords: you have 20 bits of unpredictability from the microsecond timestamp, up to 16 bits from the PID, and an unknown amount from the memory addresses, probably between 0 and 20 bits if you know details of the system where it was run, but up to 64 or 96 bits if you have no knowledge at all.
I would say that attacking Perl's rand by guessing the seed is probably not practical, and reversing it from its output is probably not either, especially if it was run on a system with drand48. Have you considered a GPU-based brute-forcing tool?
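The comparison between the two search spaces can be checked with a few lines of arithmetic. This is only a sketch of the estimates quoted above (62 symbols, 8 characters, and the 20 + 16 + 0..20 bit figures for the non-urandom seed); the constant names are made up for the example.

```python
import math

ALPHABET_SIZE = 62        # a-z, A-Z, 0-9
PASSWORD_LENGTH = 8

# Size of the direct brute-force search over all passwords.
password_space = ALPHABET_SIZE ** PASSWORD_LENGTH
password_bits = math.log2(password_space)

# Seed entropy estimates from the answer (system without urandom).
TIMESTAMP_BITS = 20
PID_BITS = 16
ADDRESS_BITS_RANGE = (0, 20)   # if details of the system are known

seed_bits_low = TIMESTAMP_BITS + PID_BITS + ADDRESS_BITS_RANGE[0]
seed_bits_high = TIMESTAMP_BITS + PID_BITS + ADDRESS_BITS_RANGE[1]

print(f"password space: {password_space} candidates (about 2^{password_bits:.1f})")
print(f"seed space:     2^{seed_bits_low} to 2^{seed_bits_high}")
```

The password space lands between the low and high seed estimates, which is consistent with the claim that guessing the seed is of roughly similar difficulty to brute-forcing the password itself.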

How to run a disassembled code 6502?

I have to program the 6502 in assembly.
I was required to use the emulator VICE 128.
I was told that the Commodore 128 is compatible with the 6502 instruction set.
I am a novice; I was given a practical demonstration, but I did not understand anything.
There was an 80-column interface, which was entered with a command (which one?).
The instructions in machine language or assembly (the program) were entered directly into this 80-column matrix.
The data were also entered into this matrix.
So is this matrix the memory? What does each line represent?
I was told that this is disassembled 6502 code, but I do not know what that means.
I'm very confused.
I want to run this simple program, which computes the sum of two numbers.
The two numbers are stored in the first page, at word zero and word one. I want to store the result in the second word of the first page.
I imagined that the first line contains 80 words. Is that right?
So I put the data here in hexadecimal (3 and 2).
$03 $02
LDA $00
ADC $01
STA $02
But I get a syntax error.
I hope someone can help me, because I do not understand how things work.
Thanks in advance

First, on the 6502 we deal with bytes, not words (it's an 8-bit architecture).
You don't mention which macro assembler you are using, but I assume that it is trying to interpret $03 as an opcode, not data. I looked up two options:
In ca65 you can use
.BYTE $03, $02
In dasm you use
HEX 03 02
In addition, the 6502 has no concept of 80 anything (words, lines, whatever). The only 80 I can think of is the old terminals that had 80 columns. I don't see how this is relevant here.

How to run a disassembled code 6502?
You have to assemble the code again.
Each 6502 instruction stands for 1, 2, or 3 bytes: the first is called the opcode, and the optional second and third are the data used by the instruction (the operand).
You need a program to translate the instruction mnemonics to bytes. There were many such programs on the Commodore.
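As an illustration of what such a translation does, here is a toy sketch in Python (not a real assembler and not something that runs on the Commodore). It handles only the three mnemonics from the question, assumes zero-page addressing for all of them, and the tiny interpreter assumes the carry flag is clear before ADC. The function names are made up for the example.

```python
# Zero-page opcodes for the three mnemonics used in the question.
ZERO_PAGE_OPCODES = {"LDA": 0xA5, "ADC": 0x65, "STA": 0x85}

def assemble(source):
    """Translate lines like 'LDA $00' into opcode/operand byte pairs."""
    code = bytearray()
    for line in source.strip().splitlines():
        mnemonic, operand = line.split()
        code.append(ZERO_PAGE_OPCODES[mnemonic])
        code.append(int(operand.lstrip("$"), 16))
    return bytes(code)

def run(code, memory):
    """Interpret the assembled bytes; the carry flag is assumed clear."""
    accumulator = 0
    for pc in range(0, len(code), 2):
        opcode, address = code[pc], code[pc + 1]
        if opcode == 0xA5:      # LDA zero page: load accumulator
            accumulator = memory[address]
        elif opcode == 0x65:    # ADC zero page: add to accumulator
            accumulator = (accumulator + memory[address]) & 0xFF
        elif opcode == 0x85:    # STA zero page: store accumulator
            memory[address] = accumulator
    return memory

program = assemble("""
LDA $00
ADC $01
STA $02
""")

memory = bytearray(256)
memory[0], memory[1] = 0x03, 0x02   # the data, kept separate from the code
run(program, memory)
print(program.hex(), "->", memory[2])
```

Each of the three instructions becomes two bytes here (opcode plus a one-byte zero-page address), and the data bytes 03 and 02 are placed in memory rather than written into the instruction stream.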
The Commodore 128 had a built-in monitor that let you enter instructions to assemble directly. You can enter it by typing MONITOR at the BASIC prompt. You would first need to set the address, then use "assemble" commands. Then use the "go" command with the starting address to run the program. Use the BASIC POKE command to set the locations containing data before you enter the monitor. Address 0B00 is a good one to use, as it is the tape buffer, which is unused except during tape I/O.
Good luck.

Difference between machine language, binary code and a binary file

I'm studying programming, and in many sources I see the terms "machine language", "binary code" and "binary file". The distinction between these three is unclear to me because, as I understand it, machine language means the raw language that a computer can understand, i.e. sequences of 0s and 1s.
Now if machine language is a sequence of 0s and 1s and binary code is also a sequence of 0s and 1s, then does machine language = binary code?
What about a binary file? What really is a binary file? To me, "binary file" means a file that consists of binary code. So, for example, if my file was:
010010101010010
010010100110100
010101100111010
010101010101011
010101010100101
010101010010111
Would this be a binary file? If I google "binary file" and look at Wikipedia, I see an example picture of a binary file which confuses me (it's not in binary?...)
Where is my confusion happening? Am I mixing up file encoding here, or what? If I were to ask someone to SHOW me what machine language, binary code and a binary file are, what would they be? =) I guess the distinction is too abstract for me.
Thanks for any help! =)
UPDATE:
In Python for example, there is one phrase in a file I/O tutorial, which I don't understand: Opens a file for reading only in binary format. What does reading a file in binary format mean?

Machine code and binary are the same: a number system with base 2, where each digit is either 1 or 0. But machine code can also be expressed in hex format (hexadecimal), a number system with base 16. Binary and hex are closely related: it's easy to convert from binary to hex and back again. And because hex is much more readable and useful than binary, it's often used and shown. For instance, the picture mentioned in your question uses hex numbers!
Let's say you have the binary sequence 1001111000001010. It can easily be converted to hex by grouping it into blocks of four bits each.
1001 1110 0000 1010 => 9 14 0 10, which in hex becomes 9E0A.
One can agree that 9E0A is much more readable than the binary, and hex is what you see in the image.
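The grouping step can be written out in a few lines of Python. This is just a sketch of the conversion described above; the function name is made up for the example.

```python
def binary_to_hex(bits):
    """Convert a bit string to hex by grouping it into blocks of four bits."""
    # Pad on the left so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    values = [int(group, 2) for group in groups]
    hex_text = "".join("0123456789ABCDEF"[value] for value in values)
    return groups, values, hex_text

groups, values, hex_text = binary_to_hex("1001111000001010")
print(groups)    # the four-bit blocks
print(values)    # each block as a number from 0 to 15
print(hex_text)  # the blocks written as hex digits
```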

I'm honestly surprised not to see the information I was looking for. Looking back, though, I guess the title of this thread isn't fully appropriate to the question the OP was asking.
You guys all say "Machine Code is a bunch of numbers".
Sure, the "CODE" is a bunch of numbers, but what people are wondering (I'm guessing) is "what actually is happening physically?"
I'm quite a novice when it comes to programming, but I understand enough to feel confident in 'roughly' answering this question.
Machine code, to the actual circuitry, isn't numbers or values.
Machine code is a bunch of voltage gates that are either open or closed, and depending on what they're connected to, a certain light will flicker at a certain time etc.
I'm guessing that the "machine code" dictates the pathway and timing for specific electrical signals that will travel to reach their overall destination.
So for 010101, 3 voltage gates are closed (The 0's), 3 are open (The 1's)
I know I'm close to the right answer here, but I also know it's much more sophisticated - because I can imagine that which I don't know.
010101 would be easy instructions for a simple circuit, but what I can't begin to fathom is how a complex computer processes all of the information.
So I guess let's break it down?
x-Bit-processors tell how many bits the processor can process at once.
A bit is either 1 or 0, "On" or "Off", "Open" or "Closed"
so 32-bit processors process "10101010 10101010 10101010 10101010" - this many bits at once.
A processor is an "integrated circuit", which is like a compact circuit board, containing resistors/capacitors/transistors and some memory. I'm not sure if processors have resistors, but I know you'll usually find a ton of them located around the actual processor on the circuit board.
Anyway, a transistor is a switch, so if it receives a 1 it sends current in one direction, and if it receives a 0 it sends current in a different direction... (or something like that)
So I imagine that as machine code goes... the segment of code the processor receives changes the voltage channels in such a way that it sends a signal to another part of the computer (why do you think processors have so many pins?), probably another integrated circuit more specialized to a specific task.
That integrated circuit then receives a chunk of code, let's say 2 to 4 bits 01 or 1100 or something, which further defines where the final destination of the signal will end up, which might be straight back to the processor, or possibly to some output device.
Machine code is a very efficient way of taking a circuit and connecting it to a lightbulb, and then taking that lightbulb out of the circuit and switching the circuit over to a different lightbulb.
Memory in a computer is highly necessary because otherwise, to get your computer to do anything, you would need to type out everything (in machine code). Instead, all of the 1's and 0's are stored inside some storage device, either a spinning hard disk with a magnetic head that 'reads' 1's or 0's based on the magnetization of the disk, or a flash memory device that uses a series of transistors, where sending a voltage through elicits 1's and 0's (I'm not fully aware of how flash memory works).
Fortunately, someone took the time to think up a different base number system for programming (hex), and a way to compile those numbers (translate them) back into binary. And then all software programs have branched out from there.
Each key on the keyboard creates a specific signal in binary that translates to a bunch of switches being turned on or off using certain voltages, so that a current can be run through the specific individual pixels on your screen that create "1" or "0" or "F", or all the characters of this post.
So I wonder, how does a program 'program', or 'make' the computer 'do' something... Rather, how does a compiler compile a program of a code different from binary?
It's hard to think about now because I'm extremely tired (so I won't try) but also because EVERYTHING you do on a computer is because of some program.
There are actively running programs (processes) in task manager. These keep your computer screen looking the way you've become accustomed, and also allow for the screen to be manipulated as if to say the pictures on the screen were real-life objects. (They aren't, they're just pictures, even your mouse cursor)
(Ok I'm done. enough editing and elongating my thoughts, it's time for bed)
Also, what I don't really get is how 0's are 'read' by the computer.
It seems that a '0' must not be a 'lack of voltage'; rather, it must be some other type of signal,
where perhaps something like 1 volt = 1 and 0.5 volts = 0: some distinguishable difference between currents in a circuit that would still send a signal, but could be the difference between opening and closing a specific circuit.
If I'm close to right about any of this, serious props to the computer engineers of the world, the level of sophistication is mouthwatering. I hope to know everything about technology someday. For now I'm just trying to get through arduino.
Lastly... something I've wondered about... would it even be possible to program today's computers without the use of another computer?

Machine language is a low-level programming language that generally consists entirely of numbers. Because they are just numbers, they can be viewed in binary, octal, decimal, hexadecimal, or any other way. Dave4723 gave a more thorough explanation in his answer.
Binary code isn't a very well-defined technical term, but it could mean any information represented by a sequence of 1s and 0s, or it could mean code in a machine language, or it could mean something else depending on context.
Technically, all files are stored in binary, we just don't usually look at the binary when we view a file. However, the term binary file is usually used to refer to any non-text file; e.g. an .exe, a .png, etc.
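A short Python sketch may help with both the example file in the question and the "binary format" phrase from the tutorial. It writes the characters 0101 to a temporary file and reads them back in binary mode ("rb") and in text mode ("r"). The file name is arbitrary, and ASCII is assumed as the text encoding.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")

    # Typing 0s and 1s into a file stores the *characters* '0' and '1'.
    with open(path, "w", encoding="ascii") as handle:
        handle.write("0101")

    # Binary mode returns the raw bytes exactly as stored on disk.
    with open(path, "rb") as handle:
        raw = handle.read()

    # Text mode decodes those bytes into a string.
    with open(path, "r", encoding="ascii") as handle:
        text = handle.read()

bits = " ".join(f"{byte:08b}" for byte in raw)
print(repr(raw), list(raw))   # bytes object and its numeric byte values
print(repr(text))             # the decoded string
print(bits)                   # the bits that are actually stored
```

The four typed characters occupy four bytes (32 bits), so a file that looks like "binary" in an editor is still a text file. Reading in binary mode simply means getting the stored bytes without any decoding.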

You have to understand the basic principles of how a computer works, and this will clear things up for you. I therefore recommend reading up on topics like the von Neumann architecture.
Basically, a very simple computer has only one memory, like an array, which holds the instructions for your processor and the data, and everything in it is a binary number.
Your program starts at a certain place in your memory and reads the first number...
So here comes the twist: these numbers can be instructions or data.
Your processor reads these numbers and interprets them as instructions.
Example: the start address is 0
at address 0 is an instruction like "read value from address 120 into the ALU (math unit)"
then it steps to address 1
"read value from address 121 into ALU"
then it steps to address 2
"subtract numbers in ALU"
then it steps to address 3
"if ALU-Value is smaller than zero go to address 10"
it is not smaller than zero so it steps to address 4
"go to address 20"
You can see that this is a basic if (a < b).
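The walk-through above can be simulated in a few lines of Python. This is only a sketch of the idea: the opcode numbers and the "opcode * 1000 + address" encoding are invented for the example and do not correspond to any real processor. The point is that instructions and data sit in the same list of plain numbers.

```python
# Hypothetical opcodes; 0 is HALT so that empty memory cells stop the machine.
HALT, LOAD_A, LOAD_B, SUB, JUMP_IF_NEG, JUMP = range(6)

def encode(opcode, address=0):
    """Pack an opcode and an address into one number (made-up encoding)."""
    return opcode * 1000 + address

def build_memory(a, b):
    """One shared memory holding both the program and its data."""
    memory = [0] * 128
    memory[0] = encode(LOAD_A, 120)      # read value from address 120
    memory[1] = encode(LOAD_B, 121)      # read value from address 121
    memory[2] = encode(SUB)              # subtract the two values
    memory[3] = encode(JUMP_IF_NEG, 10)  # if result < 0 go to address 10
    memory[4] = encode(JUMP, 20)         # otherwise go to address 20
    memory[120], memory[121] = a, b      # data cells are just numbers too
    return memory

def run(memory):
    """Execute from address 0 and return the list of visited addresses."""
    pc, first, second, result = 0, 0, 0, 0
    visited = []
    while True:
        visited.append(pc)
        opcode, address = divmod(memory[pc], 1000)
        pc += 1
        if opcode == HALT:
            return visited
        if opcode == LOAD_A:
            first = memory[address]
        elif opcode == LOAD_B:
            second = memory[address]
        elif opcode == SUB:
            result = first - second
        elif opcode == JUMP_IF_NEG:
            if result < 0:
                pc = address
        elif opcode == JUMP:
            pc = address

path_not_smaller = run(build_memory(7, 5))   # a >= b: falls through to address 4
path_smaller = run(build_memory(5, 7))       # a < b: jumps to address 10
print(path_not_smaller)
print(path_smaller)
```

Whether the number stored at address 120 is treated as data or as an instruction depends only on whether the program counter ever reaches it.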
You can write these instructions as numbers and they can be run by your processor, but because nobody wants to do this work (that is what they did with punch cards in the 60s), assembler was invented.
It looks like this:
add 10, 11, 20 // load the values from addresses 10 and 11, add them, and store the result at address 20
In conclusion:
Assembler (processor instructions) can be called binary because it's stored as plain numbers.
But everything else can be a binary file, too.
In reality, a simple .exe file is both. If you have variables in there like a = 10 and b = 20, these values can be stored somewhere between the if clauses and for loops. Where they are put depends on the compiler.
But if you have a complex 3D model, it can be stored in a separate file with no executable code in it.
I hope it helps to clear things up a little.

xkcd: Externalities

So the April 1, 2013 xkcd Externalities web comic features a Skein 1024 1024 hash breaking contest. I'm assuming that this must be nothing more than a brute force effort where random strings are hashed in an effort to match Randall's posted hash? Is this correct?
Also, my knowledge of Skein hashing theory is virtually non-existent, but being a halfway decent programmer I was able to download and run both SkeinFish (C#) and Maarten Bodewes's Skein implementation (Java) locally in 1024 1024 mode with some input strings. The hashes that they gave, however, were different from the hash that xkcd returned for the same input. This may be an extremely naive question, but do different Skein implementations give different hashes? And which Skein implementation is xkcd using?
Thanks for pardoning my ignorance!

There are several different iterations of the Skein algorithm. XKCD is using version 1.3, which is also the most recent. Sources can be found here (look for "V1.3")
Interestingly enough, this brute-force method is the same one employed by Bitcoin to "mine" bitcoins. The big differences are the hash algorithm (SHA-256 in that case) and the target hash (which is dynamically determined to be any hash starting with a certain number of zeros.) It takes a lot of work to discover the hash, but once it has been found it is trivial to verify the source bits and that the resulting hash meets the criteria.
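The "hard to find, trivial to verify" property can be sketched in Python. This is a simplified illustration, not Bitcoin's actual scheme and not the xkcd contest's scoring: it uses a single SHA-256 pass from hashlib, a made-up prefix, and a target of three leading zero hex digits so that it finishes quickly.

```python
import hashlib

def search(prefix, zero_digits):
    """Try nonces until the hex digest starts with `zero_digits` zeros."""
    target = "0" * zero_digits
    nonce = 0
    while True:
        candidate = prefix + str(nonce).encode("ascii")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(target):
            return nonce, candidate, digest
        nonce += 1

# Finding a match takes many attempts (thousands, on average, for 3 digits)...
nonce, candidate, digest = search(b"xkcd-", 3)
print("attempts:", nonce + 1)
print("input:   ", candidate)
print("digest:  ", digest)

# ...but checking a claimed solution takes a single hash.
verified = hashlib.sha256(candidate).hexdigest().startswith("000")
print("verified:", verified)
```

Each additional required zero digit multiplies the expected number of attempts by 16, while verification stays at one hash.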

Here's the source code the Stanford team used. We ran this on about a hundred 8-core EC2 servers for a while, but not the whole competition.
https://github.com/jhiesey/skeincrack

If you were hashing non-alphanumeric characters (spaces, punctuation, etc.), you may have been getting different results due to HTML form encoding. The "enctype" attribute on the form XKCD was hosting was "application/octet-stream", which according to https://developer.mozilla.org/en-US/docs/HTML/Element/form is not a browser-supported standard. I assume the browser falls back on the URL-encoding type when it sees one it doesn't recognize.
I observed the string "=" being submitted URL-encoded in Chrome, and returning a different hash than what I got locally with the latest pyskein. But when I submitted it with this curl command line (no longer works), I got the expected hash:
curl -X POST --data-binary "hashable==" "http://almamater.xkcd.com/?edu=school.edu"
The Stanford code in another answer does the same thing, and they apparently had some success. I never got any random data to locally hash to a better score than even my own school, so I never got a chance to test thoroughly how to pass arbitrary data in properly. I don't know what the exact behavior was (e.g., perhaps if you omitted hashable= the server would detect that and just hash the whole POST body), but it may have intentionally been a little tricky as part of April Fool's.
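The effect of URL-encoding on a hash can be sketched in Python. Skein is not in the standard library, so SHA-256 from hashlib stands in for it here; that substitution is an assumption made purely for illustration, and the point holds for any hash function: the encoded form is a different byte string, so it hashes differently.

```python
import hashlib
from urllib.parse import quote

raw = "="
encoded = quote(raw, safe="")   # what a URL-encoding form would transmit

# SHA-256 is a stand-in for Skein, which the standard library lacks.
raw_digest = hashlib.sha256(raw.encode("ascii")).hexdigest()
encoded_digest = hashlib.sha256(encoded.encode("ascii")).hexdigest()

print(repr(raw), "->", repr(encoded))
print("raw bytes hashed:    ", raw_digest)
print("encoded bytes hashed:", encoded_digest)
print("same hash:", raw_digest == encoded_digest)
```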

Should all implementations of SHA512 give the same Hash?

I am working on writing a SHA512 function. When I check the file I am hashing against different sources (the Linux sha512sum tool, a couple of websites, and the old source code I have for SHA512), they all give different hash values. My thought going into this project was that all correctly implemented hash algorithms output the same hash value for the same input, so that it can be used as a checksum. Am I wrong in thinking this? If I am wrong, how would I really check to see if my work is correct?
Thanks in advance.

Yes, that's one of the basic building blocks of PKI: the same data block passed to a hash should always return the same hash value.
Beware of the interpretation, though: the result of a SHA-2 (512) hash is a block of 512 bits, not a string value, so it must first be encoded for human consumption. It is therefore possible to see what look like visually different results when it is simply a matter of different encodings.
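A small Python sketch with hashlib shows the encoding point: one 64-byte digest, three different-looking strings. The last lines illustrate a separate pitfall that is not mentioned above but is a common cause of mismatches in practice: a trailing newline in the input changes the digest completely.

```python
import base64
import hashlib

data = b"abc"
digest = hashlib.sha512(data).digest()   # the raw 512-bit (64-byte) result

# Three common textual encodings of the very same digest.
hex_lower = digest.hex()
hex_upper = hex_lower.upper()
b64 = base64.b64encode(digest).decode("ascii")

print(len(digest) * 8, "bits")
print("hex (lower):", hex_lower)
print("hex (upper):", hex_upper)
print("base64:     ", b64)

# All three decode back to the identical bytes.
same_bytes = bytes.fromhex(hex_upper) == base64.b64decode(b64) == digest
print("same digest:", same_bytes)

# Different input bytes (here a trailing newline) give a different digest.
with_newline = hashlib.sha512(data + b"\n").digest()
print("newline changes digest:", with_newline != digest)
```

To check a new implementation, compare against published test vectors byte for byte, after making sure both the input bytes and the output encoding are the same on both sides.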
