What is this table called and how do I read it?

I'm reading the powerpoint specification and I came across a table like this:
Do tables like these have a name? How do I read this?
I'm pretty sure it means that the first 4 bits identify the recVer and the next 12 identify the recInstance, but what about recLen? Do all 32 bits pull double duty and identify the recLen, or does that mean the next 32 bits do that?

It looks like some type of packet header. The numbers at the top are the bit positions. It is read left to right, top to bottom, so it is telling you that the header is made up of 4 bits interpreted as recVer, followed by 12 bits interpreted as recInstance, followed by 16 bits interpreted as recType, followed by 32 bits interpreted as recLen.
This is a common way to show the header structure, as can be seen on Wikipedia's TCP page.

This is just part of the binary format for the PowerPoint file. The 0, 1, 2, etc. are the bit numbers, so you can see bits 0-3 inclusive are the recVer, and so on.
The specification will tell you what recVer, recInstance and recType mean.
I think recLen should be obvious but it'll be in the spec.
To read it, you'd read in the bytes and then do bit manipulation to decode those fields. You don't say what language you'll be using but you can do bit manipulation in a number of languages.
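For example, here is a minimal sketch in Python, assuming the fields are stored little-endian as in the PowerPoint binary format (which puts recVer in the low 4 bits of the first 16-bit value); the function name is just illustrative:

import struct

def read_record_header(data, offset=0):
    # The 8-byte header is one 16-bit value (recVer + recInstance),
    # one 16-bit recType and one 32-bit recLen, all little-endian.
    ver_inst, rec_type, rec_len = struct.unpack_from('<HHI', data, offset)
    rec_ver = ver_inst & 0x000F        # low 4 bits
    rec_instance = ver_inst >> 4       # remaining 12 bits
    return rec_ver, rec_instance, rec_type, rec_len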

Not sure about an official/standard name, but this looks like a record layout map.
You read it left to right; every box is a single bit.
The record is composed of
4 bits recVer
12 bits recInstance
16 bits recType
32 bits recLen

Why is this question worded like this regarding main memory?

I have this question:
1. How many bits are required to address a 4M × 16 main memory if main memory is word-addressable?
And before you say it, yes, I have looked this question up and there have been posts on Stack Overflow asking how to answer it, but my question is different.
This may sound like a silly question but I don't understand what it means when it says "How many bits are required to address...".
To my understanding, and from what I have been taught, (if we're talking about word-addressable memory) each cell in the RAM chip would contain 16 bits, the cells would be numbered from 0 to 4M-1, and there would be 2^22 words. But I don't understand what it is asking when it says 'How many bits are required...':
The answer says 22 bits would be required but I just don't understand. 22 bits for what? All I know is each word is 16 bits and each cell would be numbered from 0 - 4M-1. Can someone clear this up for me please?
Since you have 4M cells, you need a number that is able to identify each cell. 22 bits is the size of the address needed to represent 2^22 cells (4,194,304 cells).
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor.
(https://en.m.wikipedia.org/wiki/Word)
Using this principle, imagine a memory whose word is only 2 bits, and which is capable of storing 4 words:
XX|YY|WW|ZZ
Each word in this memory is represented by a number that tells the computer its position.
XX is 0
YY is 1
WW is 2
ZZ is 3
The smallest binary length that can represent 3 is 2 bits, right? Now apply the same reasoning to a larger memory. It doesn't matter whether the word size is 16 bits or 2 bits; only the number of words matters.
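A quick sketch of that arithmetic in Python (address_bits is just an illustrative name):

def address_bits(num_words):
    # Bits needed to give every word its own address (addresses 0 .. num_words-1).
    return (num_words - 1).bit_length()

print(address_bits(4))                  # the 4-word toy memory above -> 2
print(address_bits(4 * 1024 * 1024))    # 4M words -> 22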

Size of binary file after base64 encoding? Need explanation on the solution

So I'm studying for the upcoming exam, and there's this question: given a binary file with a size of 31 bytes, what will its size be after encoding it to base64?
The solution the teacher gave us was (40 + 4) bytes, as it needs to be a multiple of 4.
I'm not able to arrive at this solution, and I have no idea how to solve this, so I was hoping somebody could help me figure this out.
Because base64 encoding divides the input data into six-bit blocks, and each block is encoded as one ASCII character.
If you have 31 bytes of input, you have 31*8/6 six-bit blocks to encode (rounded up). As a rule of thumb, every three bytes of input become 4 bytes of output.
If the input length is not a multiple of six bits, base64 fills the last block with 0 bits.
In your example you have 42 blocks of six bits, with the last one filled with the missing 0 bits.
Base64 implementations then pad the encoded data with '=' symbols so that the final result is a multiple of 4, which gives 42 + 2 = 44 bytes.
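You can check the 31-byte case directly in Python (the file contents don't affect the encoded size, so 31 zero bytes are used here):

import base64, math

data = bytes(31)
print(len(base64.b64encode(data)))   # 44

# The same number from the rule of thumb: 4 output bytes per 3 input bytes,
# rounded up to whole 3-byte groups.
print(4 * math.ceil(31 / 3))         # 44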

How do i calculate the size of a tag field?

I'm revising for an exam and I've come across a question that I have no idea how to do. I've looked through my notes and can't seem to find anything on it. Can anyone help me?
Given a 64KB cache that contains 1024 blocks with 64 bytes per block, what is the size of the tag field for a 32-bit architecture?
The question is only worth 1 mark so I can't imagine the answer is too hard, but I can't seem to find anything on it.
You need 32 bits for the address. You need 6 bits for the offset within a block. You need 10 bits to identify one of the 1,024 possible blocks in the cache. That's 16 bits in total. Therefore the tag needs to be 32 bits - 16 bits = 16 bits.
I recommend following the link that aruisdante provided and looking at how to calculate this yourself.
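For what it's worth, here is the same arithmetic as a short Python sketch, assuming a direct-mapped cache as in the answer above:

import math

address_bits = 32
block_size   = 64      # bytes per block
num_blocks   = 1024

offset_bits = int(math.log2(block_size))   # 6 bits to pick a byte within a block
index_bits  = int(math.log2(num_blocks))   # 10 bits to pick a block
print(address_bits - offset_bits - index_bits)   # tag = 16 bits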

Variable-byte encoding clarification

I am very new to the world of byte encoding so please excuse me (and by all means, correct me) if I am using/expressing simple concepts in the wrong way.
I am trying to understand variable-byte encoding. I have read the Wikipedia article (http://en.wikipedia.org/wiki/Variable-width_encoding) as well as a book chapter from an Information Retrieval textbook. I think I understand how to encode a decimal integer. For example, if I wanted to provide variable-byte encoding for the integer 60, I would have the following result:
1 0 1 1 1 1 0 0
(please let me know if the above is incorrect). Assuming I understand the scheme, I'm still not completely sure how the information is compressed. Is it because we would usually use 32 bits to represent an integer, so that representing 60 would result in 1 1 1 1 0 0 preceded by 26 zeros, thus wasting that space, as opposed to representing it with just 8 bits instead?
Thank you in advance for the clarifications.
The way you do it is by reserving one of the bits to mean "I'm not done with the value." Usually, that's the most significant bit.
When you read a byte, you process the lower 7 bits. If the most significant bit is 1, then you know there's one more byte to read, and you repeat the process, adding the next 7 bits to the current 7 bits.
The MIDI format uses that exact encoding to represent lengths of MIDI events, in the following manner:
def read_varlen(read_byte):
    # read_byte() returns the next byte of the stream as an integer 0-255.
    expected_value = 0
    while True:
        byte = read_byte()
        expected_value = expected_value + (byte & 0x7F)   # add the low 7 bits
        if byte > 127:                                    # high bit set: more bytes follow
            expected_value = expected_value << 7
        else:
            return expected_value
For example, the value 0x80 would be represented using the bytes 0x81 0x00. You can try running the algorithm on those two bytes, and you see you'll get the right value.
UTF-8 works similarly, but it uses a slightly more complex scheme to tell you how many bytes you should be expecting. This allows for some error correction, since you can easily tell if the bytes you're getting match the length claimed. Wikipedia describes their structure quite well.
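For the other direction, here is a small sketch (not part of the answer above) that encodes an integer with the same high-bit-means-continue scheme; encode_varlen is just an illustrative name:

def encode_varlen(value):
    # Split the value into 7-bit groups, most significant first,
    # and set the high bit on every byte except the last one.
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()
    return bytes(b | 0x80 for b in groups[:-1]) + bytes(groups[-1:])

print(encode_varlen(0x80).hex())   # 8100, matching the example above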
You hit the nail on the head.
There are many encoding schemes, such as gamma and delta, which are special cases of Elias coding. These are bit-level codes, as opposed to the byte-level code you used, and are useful when you have a strong skew towards small numbers (which can often be achieved by encoding deltas instead of absolute values).
Bit-level encoding schemes are much more difficult to implement than byte-level schemes and the additional CPU burden may outweigh the time saved by having less data to read, though most modern CPUs have "highest-bit" and "lowest-bit" instructions that dramatically improve the performance of bit-level codecs. As CPU speeds continue to outpace RAM speeds, bit-level schemes will become more attractive, though the simplicity of byte-level codecs is a big factor too.
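Purely as an illustration of what a bit-level code looks like (not something from the answer above), here is a sketch of Elias gamma coding in Python, which writes a positive integer n as (number of binary digits of n minus one) zeros followed by n's binary digits, returned here as a string of bits:

def elias_gamma(n):
    # n must be a positive integer.
    binary = bin(n)[2:]
    return '0' * (len(binary) - 1) + binary

print(elias_gamma(1))   # 1
print(elias_gamma(9))   # 0001001  (3 zeros, then 1001)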
Yes, you are right, you save space by encoding using one byte instead of 4.
Generally, you will save memory if the values you are encoding are much smaller than the maximum value that would have fit in your original fixed-width encoding.
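As a tiny illustration of that in Python (just a sketch, using struct for the fixed-width case):

import struct

fixed = struct.pack('<I', 60)     # 60 as a fixed-width 32-bit little-endian integer
print(len(fixed), fixed.hex())    # 4 3c000000
# A variable-byte scheme needs only a single byte for the same value.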

Does a pronounceable encoding exist?

I am using UUIDs, but they are not particularly nice to read, write and communicate. So I would like to encode them. I could use base64 or base32, but they would not be easy anyway: base64 has capital letters and symbols. Base32 is a bit better, but you can still get clumsy strings.
I was wondering if there's a nice and clean way to encode a number into palatable phonemes, so to achieve better readability and hopefully a bit of compression.
I hope you don't use this idea: The Automated Curse Generator :)
Bubble Babble is a good one to try. It generates nonsensical but readable output like:
xesef-disof-gytuf-katof-movif-baxux
This question is very old; interestingly, as old as the solution I'm about to present, but it hasn't been mentioned here yet.
It's Proquint. Similar to Bubble Babble, but the differences make the results easier to read, in my opinion.
Here's how it works, from their documentation:
In sum, we propose encoding a 16-bit string as a proquint [PRO-nouncable QUINT-uplet] of alternating consonants and vowels as follows.
Four-bits as a consonant:
0 1 2 3 4 5 6 7 8 9 A B C D E F
b d f g h j k l m n p r s t v z
Two-bits as a vowel:
0 1 2 3
a i o u
Whole 16-bit word, where "con" = consonant, "vo" = vowel:
0 1 2 3 4 5 6 7 8 9 A B C D E F
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|con    |vo |con    |vo |con    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Separate proquints using dashes, which can go un-pronounced or be pronounced "eh". The suggested
optional magic number prefix to a sequence of proquints is "0q-".
Here are some IP dotted-quads and their corresponding proquints.
127.0.0.1 lusab-babad
63.84.220.193 gutih-tugad
63.118.7.35 gutuk-bisog
140.98.193.141 mudof-sakat
64.255.6.200 haguz-biram
128.30.52.45 mabiv-gibot
147.67.119.2 natag-lisaf
212.58.253.68 tibup-zujah
216.35.68.215 tobog-higil
216.68.232.21 todah-vobij
198.81.129.136 sinid-makam
12.110.110.204 budov-kuras
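Here is a minimal sketch of that mapping in Python (proquint16 is just an illustrative name; the IP examples above treat each address as two 16-bit words, high byte first):

CONSONANTS = "bdfghjklmnprstvz"   # 4 bits each
VOWELS     = "aiou"               # 2 bits each

def proquint16(word):
    # word is a 16-bit integer: con(4) vo(2) con(4) vo(2) con(4), MSB first.
    return (CONSONANTS[(word >> 12) & 0xF] +
            VOWELS[(word >> 10) & 0x3] +
            CONSONANTS[(word >> 6) & 0xF] +
            VOWELS[(word >> 4) & 0x3] +
            CONSONANTS[word & 0xF])

# 127.0.0.1 -> two 16-bit words 0x7F00 and 0x0001
print(proquint16(0x7F00) + "-" + proquint16(0x0001))   # lusab-babad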
Why not use something similar to what PGP does to create readable keys: simply find a nice list of words that are distinctive. Let's say you're using 128-bit UUIDs; a list of 256 words (2^8) means 16 words per UUID.
Stupid question, but why are people reading/writing UUIDs etc. with respect to your application?
If all you want is a way to communicate hex values readably (ie, over the phone, or when instructing someone verbally what to type), then I suggest you use one of the various phonetic alphabets, such as the NATO Phonetic Alphabet or the US Army/Navy Phonetic Alphabet.
In the latter, the letters A-F are spoken as "able", "baker", "charlie", "dog", "easy", and "fox", respectively, so you would read the hex sequence "3fd2cc0e" as "three fox dog two charlie charlie zero easy". A uuid would be read out in exactly the same fashion.
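A trivial sketch of that in Python (speak_hex is just an illustrative name, using the Army/Navy words for A-F and plain digit names otherwise):

SPOKEN = {'a': 'able', 'b': 'baker', 'c': 'charlie',
          'd': 'dog', 'e': 'easy', 'f': 'fox',
          '0': 'zero', '1': 'one', '2': 'two', '3': 'three',
          '4': 'four', '5': 'five', '6': 'six', '7': 'seven',
          '8': 'eight', '9': 'nine'}

def speak_hex(s):
    return ' '.join(SPOKEN[c] for c in s.lower())

print(speak_hex("3fd2cc0e"))   # three fox dog two charlie charlie zero easy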
Bubble babble and base32 are inefficient, especially in your case. I suggest that you make your own algorithm. Since there are 20 consonants and 6 vowels (including 'y') you can have approx. 20*6*2+6*6=276 consonant/vowel-vowel/consonant pairs. So every byte of your number can be represented by a pair. With a bit of tweaking your algorithm could produce pronounceable words much shorter than bubble babble. You could even play dice and replace all odd digits with a consonant/vowel. For example, 0123456789ABCDEF (hex) encodes to ABECIDOFUGYHKRM. 3141592654 (dec) encodes to HHIA-ROIR. You are left with ten spare consonants which can be paired with vowels to replace some double consonants etc.
S/KEY uses a dictionary of 2048 words to map 64 bit numbers to a sequence of 6 predefined words/syllables. (People will always find swear words if they are looking for them ;) )
Urbit's phonetic naming system wasn't mentioned yet. It uses 3 characters for 8 bits, 6 for 16, so it's less efficient than Proquint or Bubble Babble, but more divisible.
and hopefully a bit of compression
Not sure exactly what you mean there; making something "readable" or "pronounceable" will inevitably expand the space required for it. Maybe you meant "hopefully a bit of redundancy"? It would be good if, even if the user makes a small mistake, the system can detect and perhaps even correct it.
Really it depends very much on how big your UUIDs are and how they are most often communicated. If they need to be communicated over phone or VoIP, you want more audible redundancy. If they need to be entered into mobile devices with numeric keypads, it tends to be difficult to enter alphabetic characters, more so if they are case-sensitive. If they are written down a lot, you need to worry about characters that look similar (O and 0 and o, for instance). If they need to be memorised, then probably strings of real words are the best (have a look at the PGP Word List).
However I think a great all-round solution is just using numeric digits. They're a lot harder to confuse with each other (both when spoken and written) than some alphabetic characters. Easy to enter on mobile devices, and people aren't too bad at memorising numbers.
And the length of the string is not too bad either. Let's compare base32 with base 10 (decimal). The length of a decimal string is log_10(32) times the length of the corresponding base32 string, or about 1.5 times as long. Ten characters of base32 correspond to 15 decimal digits.
Not much of a penalty, IMO, seeing as in base 32 it's easy to confuse C and T, or S, F and X (when spoken), and someone speaking with a foreign accent is more likely to cause trouble.
If they were easy to read they probably wouldn't be particularly unique.