How can I test an unsigned number? - easy68k

I'm using I/O Trap #4 to read in a number. This gives me a number; however, the documentation does not mention whether it is read in as signed or unsigned. I would assume this is because it just reads the number in as-is and it could be either.
How can I check if my value is unsigned? I.e., how can I ensure it was between 0 and 2^32 - 1 inclusive?

There's no way to "check" this in code; the signedness of a number is not a property that is stored in the register. You have to know how to interpret the bits, i.e. which instructions to use when processing the number since different instructions treat the bits in different ways.
The documentation seems to be ... lacking in this regard. I would recommend simply testing it: what happens if you input -1? You should get 0xffffffff in the register.
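If it helps to see the idea concretely, here is a minimal Python sketch (not 68k code) of the same principle: the bit pattern carries no sign, and only your choice of interpretation, like your choice of signed versus unsigned 68k instructions, gives it one.

    # The same 32-bit pattern, two interpretations -- the bits store no sign.
    def as_unsigned32(bits):
        """Read a 32-bit pattern as an unsigned value (0 .. 2**32 - 1)."""
        return bits & 0xFFFFFFFF

    def as_signed32(bits):
        """Read the same 32-bit pattern as a two's-complement signed value."""
        bits &= 0xFFFFFFFF
        return bits - 0x100000000 if bits & 0x80000000 else bits

    pattern = 0xFFFFFFFF            # the pattern you should see after entering -1
    print(as_unsigned32(pattern))   # 4294967295
    print(as_signed32(pattern))     # -1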

Implementing MD5: Inconsistent endianness?

So I tried implementing the MD5 algorithm according to RFC 1321 in C# and it works, but there is one thing about the way the padding is performed that I don't understand. Here's an example:
If I want to hash the string "1" (without the quotation marks) this results in the following bit representation: 10001100
The next step is appending a single "1"-Bit, represented by 00000001 (big endian), which is followed by "0"-Bits, followed by a 64-bit representation of the length of the original message (low-order word first).
Since the length of the original message is 8 (Bits) I expected 00000000000000000000000000001000 00000000000000000000000000000000 to be appended (low-order word first). However this does not result in the correct hash value, but appending 00010000000000000000000000000000 00000000000000000000000000000000 does.
This looks as if suddenly the little-endian format is being used, but that does not really seem to make any sense at all, so I guess there must be something else that I am missing?
Yes, for MD5 you have to append the message length in little-endian.
So the message representation for "1" is 49 -> 00110001, followed by the single '1' bit and zero bits, and after that the message length is appended with its bytes in reversed order (the least significant byte first).
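To make the byte layout concrete, here is a small sketch of just the padding step (the question uses C#, but the layout is language-independent; this is Python, using struct.pack with the "<Q" format for a 64-bit little-endian length):

    import struct

    def md5_pad(message: bytes) -> bytes:
        # RFC 1321 padding: a 0x80 byte (the single '1' bit), zero bytes until
        # the length is 56 mod 64, then the original length in *bits* as a
        # 64-bit little-endian integer.
        bit_len = len(message) * 8
        padded = message + b"\x80"
        padded += b"\x00" * ((56 - len(padded)) % 64)
        padded += struct.pack("<Q", bit_len)   # low-order bytes first
        return padded

    block = md5_pad(b"1")
    print(len(block))        # 64
    print(block[-8:].hex())  # 0800000000000000 -> length 8 stored little-endian

The last line shows exactly the ordering you observed: the 0x08 of the length lands in the first of the eight length bytes, followed by zeros.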
You could also check permutations step by step on this site: https://cse.unl.edu/~ssamal/crypto/genhash.php.
Or here: https://github.com/MrBlackk/md5_sha256-512_debugger

Why are bools sometimes referred to as "flags"?

Why are bools sometimes referred to as "flags"? Is it just a metaphor or is there some historical reason behind it?
Flags are an ancient way to convey information. A flag, if we ignore lowering it to half-mast, has only two states - raised or not raised. E.g., consider a white flag - raising it means surrendering. Not raising it, the default state, means that you are not surrendering.
A boolean variable, like a flag, only has two states - true and false.
Flag can be used as a noun and as a verb: to flag can mean to note, mark, or signal something. (Maybe this is derived from the use of nautical flags?)
An early (but probably not the first) use of the term flag in computer history can be found in the IBM 1620 from 1959 (my emphasis):
Memory was accessed two decimal digits at the same time (even-odd digit pair for numeric data or one alphameric character for text data). Each decimal digit was 6 bits, composed of an odd parity Check bit, a Flag bit, and four BCD bits for the value of the digit in the following format:
C F 8 4 2 1
The Flag bit had several uses:
In the least significant digit it was set to indicate a negative number (signed magnitude).
It was set to mark the most significant digit of a number (wordmark).
In the least significant digit of 5-digit addresses it was set for indirect addressing (an option on the 1620 I, standard on the 1620 II). Multi-level indirection could be used (you could even put the machine in an infinite indirect addressing loop).
In the middle 3 digits of 5-digit addresses (on the 1620 II) they were set to select one of 7 index registers.
So a bit used to mark or indicate something was called a flag bit.
Of course the use of "flag" in flag fields or status registers is then quite natural.
But once the association between flag and bit was established, it is also understandable that the two terms became interchangeable. And of course this also holds for boolean variables.
PS: The same question was already asked, but unfortunately without an answer.

Collision-proof hash-like identifier

I need to generate a 6-character id (letters and digits) to identify a SaaS workspace (unique per user). Of course I could just go with numbers, but it shouldn't give the end user any clear idea of the real workspace number.
So even for id 1 it should be 6 characters long, something like fX8gz6, and fully decodable to 1 or 000001 or something that I can parse back to the real workspace id. And of course it has to be collision-proof.
What would be the best approach for that?
This is similar to what Amazon uses for its cloud assets, but Amazon uses 8 chars. 8 chars is actually a convenient length, since it is exactly the output length of Base64-encoding 6 binary bytes.
Assuming you have the flexibility to use 8 characters (in the original question you said 6 chars, but let's assume 8 for now), here is a possible scheme:
Number your assets as unsigned Int32, possibly in auto-increment fashion; call it real-id. Use this real-id for all your internal purposes.
When you need to display it, follow something like this:
Convert your integer to 4 binary bytes. Every language has a library to extract the bytes out of an integer and vice versa. Call it real-id-bytes.
Take a two-byte random number. Again, you can use a library to generate an exact 16-bit random number; a cryptographic random number generator gives a better result, but plain rand is just fine. Call it rand-bytes.
Obtain the 6-byte display-id-bytes = array-concat(rand-bytes, real-id-bytes).
Obtain display-id = Base64(display-id-bytes). This is exactly 8 chars long and contains a mix of lowercase, uppercase and digits.
Now you have a seemingly random 8 character display-id which can be mapped to the real-id. To convert back:
Take the 8 character display-id
display-id-bytes= Base64Decode(display-id)
real-id-bytes= Discard-the-2-random-bytes-from(display-id-bytes)
real-id= fromBytesToInt32(real-id-bytes)
Simple. Now, if you really cannot go for an 8-char display-id, then you will have to develop some custom Base64-like algorithm, and you might restrict yourself to only 1 random byte. Also note that this is just an encoding scheme, NOT an encryption scheme, so anyone with knowledge of your scheme can effectively break/decode the ID. You need to decide whether that is acceptable or not. If not, then I guess you have to do some form of encryption, and whatever that is, 6 chars will surely be far from sufficient.
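A rough Python sketch of the scheme described above (the names encode_display_id and decode_display_id are just illustrative; URL-safe Base64 is used here so the id contains only letters, digits, '-' and '_'):

    import base64, os, struct

    def encode_display_id(real_id: int) -> str:
        rand_bytes = os.urandom(2)                     # 2 random bytes
        real_id_bytes = struct.pack(">I", real_id)     # 4 bytes, unsigned Int32
        display_id_bytes = rand_bytes + real_id_bytes  # 6 bytes total
        # 6 bytes -> exactly 8 Base64 characters, no '=' padding
        return base64.urlsafe_b64encode(display_id_bytes).decode("ascii")

    def decode_display_id(display_id: str) -> int:
        display_id_bytes = base64.urlsafe_b64decode(display_id)
        real_id_bytes = display_id_bytes[2:]           # discard the 2 random bytes
        return struct.unpack(">I", real_id_bytes)[0]

    d = encode_display_id(1)
    print(d)                     # 8 characters, different on every call
    print(decode_display_id(d))  # 1

As noted above, this is only an encoding: anyone who knows the layout can drop the two random bytes and read the real id.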

Displaying Unicode Characters

I already searched for answers to this sort of question here, and have found plenty of them -- but I still have this nagging doubt about the apparent triviality of the matter.
I have read this very interesting and helpful article on the subject: http://www.joelonsoftware.com/articles/Unicode.html, but it left me wondering how one would go about identifying individual glyphs given a buffer of Unicode data.
My questions are:
How would I go about parsing a Unicode string, say UTF-8?
Assuming I know the byte order, what happens when I encounter the beginning of a glyph that is supposed to be represented by 6 bytes?
That is, if I interpreted the method of storage correctly.
This is all related to a text display system I am designing to work with OpenGL.
I am storing glyph data in display lists and I need to translate the contents of a string to a sequence of glyph indexes, which are then mapped to display list indices (since, obviously, storing the entire glyph set in graphics memory is not always practical).
Having to represent every string as an array of shorts would require a significant amount of storage, considering everything I need to display.
Additionally, it seems to me that 2 bytes per character simply isn't enough to represent every possible Unicode element.
How would I go about parsing a Unicode string, say UTF-8?
I'm assuming that by "parsing", you mean converting to code points.
Often, you don't have to do that. For example, you can search for a UTF-8 string within another UTF-8 string without needing to care about what characters those bytes represent.
If you do need to convert to code points (UTF-32), then:
Check the first byte to see how many bytes are in the character.
Look at the trailing bytes of the character to ensure that they're in the range 80-BF. If not, report an error.
Use bit masking and shifting to convert the bytes to the code point.
Report an error if the byte sequence you got was longer than the minimum needed to represent the character.
Increment your pointer by the sequence length and repeat for the next character.
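Put together, those steps look roughly like this (a Python sketch; it omits the checks for surrogates and for code points above U+10FFFF that a production decoder would also need):

    def utf8_to_codepoints(data: bytes):
        i = 0
        while i < len(data):
            first = data[i]
            # 1. The first byte tells you how many bytes are in the sequence.
            if first < 0x80:
                length, cp, minimum = 1, first, 0x00
            elif 0xC0 <= first <= 0xDF:
                length, cp, minimum = 2, first & 0x1F, 0x80
            elif 0xE0 <= first <= 0xEF:
                length, cp, minimum = 3, first & 0x0F, 0x800
            elif 0xF0 <= first <= 0xF4:
                length, cp, minimum = 4, first & 0x07, 0x10000
            else:
                raise ValueError("invalid lead byte at offset %d" % i)
            if i + length > len(data):
                raise ValueError("truncated sequence at offset %d" % i)
            # 2./3. Check the trailing bytes are in 80-BF and shift them in.
            for trail in data[i + 1:i + length]:
                if not 0x80 <= trail <= 0xBF:
                    raise ValueError("invalid continuation byte")
                cp = (cp << 6) | (trail & 0x3F)
            # 4. Reject overlong encodings.
            if cp < minimum:
                raise ValueError("overlong sequence at offset %d" % i)
            yield cp
            i += length  # 5. Advance by the sequence length.

    print(list(utf8_to_codepoints("héllo".encode("utf-8"))))
    # [104, 233, 108, 108, 111]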
Additionally, it seems to me that 2 bytes per character simply isn't enough to represent every possible Unicode element.
It's not. Unicode was originally intended to be a fixed-width 16-bit encoding. It was later decided that 65,536 characters weren't enough, so UTF-16 was created, and Unicode was redefined to use code points between 0 and 1,114,111.
If you want a fixed-width encoding, you need 21 bits. But there aren't many languages that have a 21-bit integer type, so in practice you need 32 bits.
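A quick way to check that arithmetic (shown in Python):

    print(0x10FFFF)                 # 1114111, the highest Unicode code point
    print((0x10FFFF).bit_length())  # 21 -- it needs 21 bits, so 16 is not enough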
Well, I think this answers it:
http://en.wikipedia.org/wiki/UTF-8
Why it didn't show up the first time I went searching, I have no idea.

Decryption type and breaking (AES 128?)

My question has 2 parts. The first one is "what possible type of encryption am I looking at?" and the other is "what is the chance of breaking it?" (once the encryption algorithm has been identified).
So, I have the original file and the encrypted one, and I was able to test how the encrypted file behaves when something changes in the original. The most important clues I've found are:
The original and encrypted file have the exact same size (note that the size is a multiple of 0x10 bytes = 128 bits).
The encryption block size seems to be 128 bits. When a byte changes in the original file, the corresponding 128-bit block changes in the encrypted file and sometimes (maybe) the previous or next block, but most times only that block. The rest of the file doesn't change at all.
There are repeated sections in the original file (e.g. 16 bytes of value 00), but none of them produce the same 128-bit block in the encrypted file. So 16 bytes of 00 in the 2nd block have a different encrypted result than 16 bytes of 00 in the next block.
With those clues in mind, can you guess what type of algorithm it could be? I was thinking it is AES-128, but clue #2 excludes CBC mode, while clue #3 excludes ECB! It seems to be something "between" those... Could it be AES-128 in some other mode? What else can you think of?
In case there are a couple of known algorithms which could produce that behaviour, what are the chances of being able to break it, knowing the original data and being able to test changes to the 2 files?
Thanks in advance
It sounds like a variation on ECB mode, where each plaintext block is XORed with a nonce derived from the block's position in the file before being encrypted.
This would result in the observed characteristics:
No increase in file size (and therefore no IV);
A single byte change in the input affects an entire block of the output.
(The nonce could be as simple as a counter).
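For illustration only, here is a rough Python sketch of that suspected construction. It uses AES from the pycryptodome package and a plain block counter as the nonce; both of those are assumptions, since the question does not identify the actual cipher or how the nonce is derived:

    from Crypto.Cipher import AES   # pycryptodome, purely for illustration

    def encrypt_positional_ecb(key: bytes, plaintext: bytes) -> bytes:
        # XOR each 16-byte block with a nonce derived from its position,
        # then encrypt it with plain AES-ECB. The output has the same size
        # as the input, and identical plaintext blocks at different offsets
        # encrypt to different ciphertext blocks.
        ecb = AES.new(key, AES.MODE_ECB)
        out = bytearray()
        for i in range(0, len(plaintext), 16):
            block = plaintext[i:i + 16]
            nonce = (i // 16).to_bytes(16, "big")     # assumed: a simple counter
            masked = bytes(a ^ b for a, b in zip(block, nonce))
            out += ecb.encrypt(masked)
        return bytes(out)

    key = bytes(16)     # dummy all-zero key, demonstration only
    data = bytes(32)    # two identical all-zero plaintext blocks
    ct = encrypt_positional_ecb(key, data)
    print(ct[:16] != ct[16:])   # True: same plaintext blocks, different ciphertext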
This scheme is weak. It would be susceptible to the same kind of frequency-analysis attacks that work against ECB mode - it would just take more ciphertext. Also any plaintext/ciphertext pairs you collect are re-usable for the same block positions in any unknown ciphertexts you find.