Word Addressable Memory - word

For a 32-bit word-addressable memory, the word size is 4 bytes.
If I store a data structure that uses less than 4 bytes of memory, say 2 bytes, are the remaining 2 bytes wasted?
Should we consider the word size when we decide what data structure to use?
I found a similar question here, but it is not exactly what I am asking.
Please help.

On a modern CPU, memory is usually retrieved in chunks called cache lines (64 bytes on x86), but the CPU instruction set can address individual bytes.
If you had some esoteric machine with an instruction set that couldn't address individual bytes, then your compiler would hide that from you.
Whether or not memory is wasted in data structures smaller than a word depends on the language you use and its implementation, but generally, records are aligned according to the field with the coarsest alignment requirement. If you have an array of 16-bit integers, they will pack together tightly.
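To make the alignment point concrete, here is a small C sketch (the struct names are illustrative, and the exact sizes assume a typical compiler with 4-byte alignment for uint32_t):

#include <stdio.h>
#include <stdint.h>

/* Field order determines padding: the compiler aligns each field to its
   natural boundary and rounds the struct size up to the coarsest alignment. */
struct padded {
    uint8_t  a;   /* 1 byte, then 3 padding bytes so b is 4-byte aligned */
    uint32_t b;   /* 4 bytes */
    uint8_t  c;   /* 1 byte, then 3 trailing padding bytes -> 12 total */
};

struct reordered {
    uint32_t b;   /* 4 bytes */
    uint8_t  a;   /* 1 byte */
    uint8_t  c;   /* 1 byte, then 2 trailing padding bytes -> 8 total */
};

int main(void) {
    printf("padded:    %zu bytes\n", sizeof(struct padded));    /* typically 12 */
    printf("reordered: %zu bytes\n", sizeof(struct reordered)); /* typically 8 */

    uint16_t arr[4];  /* an array of 16-bit integers packs tightly */
    printf("arr:       %zu bytes\n", sizeof arr);               /* 8, no waste */
    return 0;
}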

If you have 3 or 4 integers, it scarcely matters whether you store them in 2, 4, or 8 bytes.
If you have 3 or 4 billion integers, then it's probably worth considering a more space-efficient structure.
Generally speaking, the natural integer size for a given language implementation is supposed to be optimal in some way, so my general advice is: use int unless you know it's not appropriate, and let the compiler worry about it until you have performance data that shows otherwise.

Related

Loading a 256-bit vector register on an AVX2 Haswell processor

I want to load a 256-bit YMM register with 32 values, each of length 1 byte. All the intrinsics I looked into load either double words (4-byte integers) or quad words (8-byte values).
How can I load data of a size smaller than these?
Is there any mnemonic which does this but doesn't have an equivalent intrinsic?
I don't think there is a way to gather bytes only. But it sounds to me like you need to rethink your problem. Is this pixel data, for example RGBA values? If so, maybe you can change your application so it reads and writes, for example, an RRRRGGGGBBBB layout (SSE). Then you don't have to gather bytes: you can read in 128/256 bits at once, and that is the most efficient use of SIMD.
Note that you may gain efficiency by using short int operations; I mean extend to 16 bits and use 16-bit integer SSE/AVX instructions.
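As a small illustration of that widen-to-16-bit idea (my own sketch, not code from the question; the function name is made up), AVX2's _mm256_cvtepu8_epi16 zero-extends 16 bytes into 16-bit lanes, where you can do arithmetic without overflow and then pack back down:

#include <immintrin.h>
#include <stdint.h>

// Sketch: widen 16 unsigned bytes to 16-bit lanes, add, then narrow back
// to bytes with unsigned saturation.
void add_bytes_saturating(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m256i wa = _mm256_cvtepu8_epi16(va);   // zero-extend bytes to words
    __m256i wb = _mm256_cvtepu8_epi16(vb);
    __m256i sum = _mm256_add_epi16(wa, wb);  // 16-bit adds cannot overflow here
    // Pack words back to bytes with unsigned saturation; packus works per
    // 128-bit lane, so permute the two useful qwords back into order.
    __m256i packed = _mm256_packus_epi16(sum, _mm256_setzero_si256());
    packed = _mm256_permute4x64_epi64(packed, 0x08);
    _mm_storeu_si128((__m128i *)out, _mm256_castsi256_si128(packed));
}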
Here is an example of bilinear interpolation with SSE which reads in integers of four bytes (RGBA) and extends them to 16 bits. This is faster than extending them to 32 bits. The SSE3 example converts RGBARGBARGBARGBA to RRRRGGGGBBBB.
http://fastcpp.blogspot.no/2011/06/bilinear-pixel-interpolation-using-sse.html
This is a rather old question but I think what you might want is the AVX intrinsic __m256i _mm256_set_epi8 which takes 32 chars as input parameters.
To broadcast a single byte, you can use the _mm256_set1_epi8 intrinsic to achieve this effect (on AVX2 it typically compiles to VPBROADCASTB).
You can simply use the _mm256_load_si256 intrinsic with a cast. This intrinsic corresponds to the VMOVDQA instruction.
Here is the code to read bytes from memory and to store them back in memory:
#include <immintrin.h>

char raw[32] __attribute__ ((aligned (32)));
__m256i foo = _mm256_load_si256( (__m256i*) raw ); // read 32 aligned bytes from memory into an AVX register
_mm256_store_si256( (__m256i*) raw, foo );         // store the contents of the AVX register back to memory
You can also load unaligned bytes using _mm256_loadu_si256 (VMOVDQU) if you prefer.
Where do you expect the 32 pointers to come from? Unless you're wanting to do 32 parallel lookups into a 256-byte lookup table, you don't really have space in the source operand to write the addresses required for the load.
I think you have to do four 8x32-bit gather operations and then merge the results; the gather operation supports unaligned accesses so you could load from an address adjusted to get the target byte in the right place within the YMM register, then just AND with masks and OR to merge.
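A rough sketch of that four-gather approach with AVX2 intrinsics might look like this (the names are mine, and note that each gather loads a full dword, so a real version must ensure it cannot read past the end of the buffer):

#include <immintrin.h>
#include <stdint.h>

// Sketch: gather 32 arbitrary bytes into one YMM register with four
// 8x32-bit gathers, so that result byte j equals base[idx[j]].
static __m256i gather32_bytes(const uint8_t *base, const int32_t idx[32])
{
    __m256i result = _mm256_setzero_si256();
    for (int k = 0; k < 4; ++k) {
        // Indices of the bytes destined for sub-position k of each dword lane.
        __m256i vidx = _mm256_setr_epi32(idx[k],      idx[k + 4],
                                         idx[k + 8],  idx[k + 12],
                                         idx[k + 16], idx[k + 20],
                                         idx[k + 24], idx[k + 28]);
        // Gather a dword starting at each (possibly unaligned) byte address.
        __m256i g = _mm256_i32gather_epi32((const int *)base, vidx, 1);
        // Keep only the low byte of each lane, shift it into place, merge.
        g = _mm256_and_si256(g, _mm256_set1_epi32(0xFF));
        g = _mm256_sll_epi32(g, _mm_cvtsi32_si128(8 * k));
        result = _mm256_or_si256(result, g);
    }
    return result;
}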

Representation of a Kilo/Mega/Tera Byte

I was getting a little confused with the representation of different units of bytes.
It is accepted throughout that 1 byte = 8 bits.
However, in a lot of sources I have seen that
1 kiloByte = 2^10 bytes = 1024 bytes
AND
1 kiloByte = 1000 bytes
Isn't this a contradiction, since in both cases it is stated that 1 byte is 8 bits...?
Different sources give different reasons for these different representations, so I am not sure what the most important or real reason is behind this rather confusing difference.
Can someone please explain and clarify?
It is accepted throughout that 1 byte = 8 bits
However, in a lot of sources I have seen that
1 kiloByte = 2^10 bytes = 1024 bytes
AND
1 kiloByte = 1000 bytes
To make sure we're all clear, your question is "Is a kilobyte equal to 1024 bytes or 1000 bytes?".
Isn't this a contradiction, since in both cases it is stated that 1 byte is 8 bits...?
This is irrelevant to the question.
So, let's begin. In SI (metric), the multiplier of 1000 is called kilo, abbreviated k. k always means 1000, never anything else.
When binary computers entered the world, we noticed that 2 to the power of 10 is 1024, which is conveniently close to 1000. Computer engineers decided to abuse this coincidence and say that kilo means 1024. By extension, they say that mega means 1024^2 (instead of the proper definition of 1000^2), and so on with giga, tera, etc.
While the difference between 1000 and 1024 is small for many purposes, there are times when exact answers are required, and this is where the abusive terminology hurts everyone. Only decades after kilo=1024 became established did anyone really try to fix the problem. The IEC proposed new prefixes for the binary multipliers: 1024 = kibi, 1024^2 = mebi, 1024^3 = gibi, etc.
In summary, the notion that kilo=1024 is an abusive deviation from the consistent SI definition of kilo=1000. While kilo=1024 is popular in the computer industry, it is nevertheless wrong and should be replaced by kibi=1024. Or numbers need to be recomputed to reflect the true definition of kilo/mega/etc. (For example, "512 MB" of RAM is actually about 536.9 MB.)
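(To check that arithmetic: 512 x 1024 x 1024 = 536,870,912 bytes, which is about 536.9 x 10^6 bytes, i.e. 536.9 MB in the strict SI sense.)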
Btw, don't use random capitalization; it's spelled kilobyte, not kiloByte.
References and links:
http://physics.nist.gov/cuu/Units/binary.html
http://en.wikipedia.org/wiki/Kilo-
http://en.wikipedia.org/wiki/Kilobyte
http://en.wikipedia.org/wiki/Kibibyte
http://xkcd.com/394/
When you talk about data sizes in computer science, you always calculate with powers of two. Here is what Wikipedia says:
"In computing, a binary prefix is a specifier or mnemonic that is prepended to the units of digital information, the bit and the byte, to indicate multiplication by a power of 2. In practice the powers used are multiples of 10, so the prefixes denote powers of 1024 = 2^10."
Sometimes people round it the way you have mentioned, but that is a bad use of the prefixes.
I don't see what the byte-to-bits relationship has to do with anything if you are asking whether 1 kilobyte is equal to 1024 or 1000 bytes. These measurements are not set in stone and are not really controlled at all. Computer makers can use (and have used) the 1000 conversion to make it look like they have more memory.
The problem comes down to thinking in binary (base 2) versus base 10: in base 10 you would use 1000; in base 2, 1024.

How many string characters should I read to get a good hash?

Here is a little conundrum for you: if you use a hash algorithm like CRC-64, how many bytes of a string would you need to read to calculate a good hash? Let's say all your strings are at least 2 KB long; then it seems a waste of resources to use the whole string to calculate the hash, but just how many characters do you think are enough? Would just 8 ASCII characters be enough, since that equals 64 bits? Wouldn't using more than 8 ASCII characters just be pointless? I want to know your thoughts on this.
Update:
By a 'good hash' I mean the point where the likelihood of hash collisions cannot be reduced any further by using more bytes to calculate it.
If you use CRC-64 over 8 bytes or less then there is no point in using CRC-64: just use the 8 bytes "as is". A CRC does not have any added value unless the input is longer than the intended output.
As a general rule, if your hash function has an output of n bits, then collisions begin to appear once you have accumulated about 2^(n/2) strings. In other words, if you use 64 bits, then it is very unlikely that you will encounter a collision in the first 2 billion strings. If you use a 160-bit or larger output, then collisions are virtually infeasible (you will encounter far fewer collisions than hardware failures such as the CPU catching fire).
This assumes that the hash function is "perfect". If your hash function begins by selecting a few data bytes, then, necessarily, the bytes that you do not select cannot have any influence on the hash output, so you had better select the "good" bytes, which depends entirely on the kind of strings that you are hashing. There is no general rule here.
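(To put numbers on the 2^(n/2) rule: with n = 64 it gives 2^32, which is about 4.3 x 10^9, so 2 billion strings is still below the bound.)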
My advice would be to first try using a generic hash function over the whole string; I usually recommend MD4. MD4 is a cryptographic hash function which has been utterly broken, but for a problem with no security involved it is still very good at mixing data elements (cryptographically speaking, a CRC is much more broken than MD4). MD4 has been reported to actually be faster than CRC-32 on some platforms, so you could give it a shot. On a basic PC (my 2.4 GHz Core2), an MD4 implementation runs at about 700 MB/s, so we are talking about 350,000 hashed 2 kB strings per second, which is not bad.
What are the chances that the first 8 letters of two different strings are the same? Depending on what these strings are, it could be very high, in which case you'll definitely get hash collisions.
Hash the whole thing. A few kilobytes is nothing. Unless you actually have a need to save nanoseconds in your program, not hashing the full strings would be premature optimization.

What is the exact meaning of an 'N-bit' processor? Clarification for Freescale architecture

While reading a Freescale processor manual, I got stuck at the point where it specifies that it is a 32-bit processor.
May I know the exact meaning and logic behind that?
Update:
Does it refer to the ALU width, the address width, or the register width specifically, or are all of them together N bits each?
Update:
Hope you have heard of Freescale processors. I just came across their site, which describes one of their latest StarCore-based processors, known as the SC3850, as a 16-bit processor. As far as I know, it has 32-bit program counters and ALU, 40-bit register width, and 2x64-bit address bus width. Also, the SC3850 can handle SIMD(2) instructions which are 32-bit or 64-bit.
For more details please go through this link
One of the major reasons you would care about the register width of the processor is performance. Generally, doubling the number of bits doubles the rate at which a processor can move data around and compute; this is why we're not all using 8-bit processors.
The other major reason is address space. A 16-bit program counter limits you to 64 KB of address space, and a 32-bit counter limits you to 4 gigabytes. The new 64-bit processors make it possible, if all the address lines are present, to support 17,179,869,184 gigabytes of memory.
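(Working those out: 2^16 = 65,536 bytes = 64 KB; 2^32 = 4,294,967,296 bytes = 4 GB; and 2^64 bytes = 2^34 GB = 17,179,869,184 GB, using binary gigabytes.)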
Firstly, I don't have a definitive answer, but I would guess that 8 being a power of 2 is an important factor. Being a power of 2 also means that certain optimisations may be performed by dividing the 8 bits into groups, which also means lookup tables can be used for certain operations. 8 bits was also the perfect size in the past when dealing with plain old ASCII characters. I can imagine that using 5-bit bytes and encoding a string of ASCII characters across memory would be a pain.
Please check out the Wikipedia entry on 32-bit processors, from the entry:
In computer architecture, 32-bit integers, memory addresses, or other data units are those that are at most 32 bits (4 octets) wide. Also, 32-bit CPU and ALU architectures are those that are based on registers, address buses, or data buses of that size. 32-bit is also a term given to a generation of computers in which 32-bit processors were the norm.
Read and understand the article - then the answer for N will be obvious.

Variable-byte encoding clarification

I am very new to the world of byte encoding so please excuse me (and by all means, correct me) if I am using/expressing simple concepts in the wrong way.
I am trying to understand variable-byte encoding. I have read the Wikipedia article (http://en.wikipedia.org/wiki/Variable-width_encoding) as well as a book chapter from an Information Retrieval textbook. I think I understand how to encode a decimal integer. For example, if I wanted to provide variable-byte encoding for the integer 60, I would have the following result:
1 0 1 1 1 1 0 0
(Please let me know if the above is incorrect.) If I understand the scheme, I'm still not completely sure how the information is compressed. Is it because usually we would use 32 bits to represent an integer, so that representing 60 would result in 1 1 1 1 0 0 preceded by 26 zeros, thus wasting that space, as opposed to representing it with just 8 bits instead?
Thank you in advance for the clarifications.
The way you do it is by reserving one of the bits to mean "I'm not done with the value." Usually, that's the most significant bit.
When you read a byte, you process the lower 7 bits. If the most significant bit is 1, then you know there's one more byte to read, and you repeat the process, adding the next 7 bits to the current 7 bits.
The MIDI format uses that exact encoding to represent lengths of MIDI events, in the following manner:
1. ExpectedValue = 0
2. byte = ReadFromFile
3. ExpectedValue = ExpectedValue + (byte AND 0x7F)
4. if byte > 127 then
5.     ExpectedValue = ExpectedValue SHL 7
6.     goto 2
7. Done
For example, the value 0x80 would be represented using the bytes 0x81 0x00. You can try running the algorithm on those two bytes, and you see you'll get the right value.
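As a compact sketch of that algorithm in C (reading from an in-memory buffer rather than a file; the names are illustrative):

#include <stdint.h>
#include <stdio.h>

// Decode one MIDI-style variable-length quantity from *p, advancing the pointer.
// The high bit of each byte means "more bytes follow"; the low 7 bits are data.
static uint32_t read_vlq(const uint8_t **p)
{
    uint32_t value = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        value = (value << 7) | (byte & 0x7F);
    } while (byte & 0x80);
    return value;
}

int main(void)
{
    const uint8_t data[] = { 0x81, 0x00 };  // encodes the value 0x80
    const uint8_t *p = data;
    printf("0x%X\n", read_vlq(&p));         // prints 0x80
    return 0;
}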
UTF-8 works similarly, but it uses a slightly more complex scheme to tell you how many bytes you should be expecting. This allows for some error correction, since you can easily tell if the bytes you're getting match the length claimed. Wikipedia describes their structure quite well.
You hit the nail on the head.
There are many encoding schemes, such as gamma and delta, which are special cases of Elias coding. These are bit-level codes, as opposed to the byte-level code you used, and are useful when you have a strong skew towards small numbers (which can often be achieved by encoding deltas instead of absolute values).
Bit-level encoding schemes are much more difficult to implement than byte-level schemes and the additional CPU burden may outweigh the time saved by having less data to read, though most modern CPUs have "highest-bit" and "lowest-bit" instructions that dramatically improve the performance of bit-level codecs. As CPU speeds continue to outpace RAM speeds, bit-level schemes will become more attractive, though the simplicity of byte-level codecs is a big factor too.
Yes, you are right, you save space by encoding using one byte instead of 4.
Generally, you will save memory if the values you are encoding are much smaller than the maximum value that would have fit in your original fixed-width encoding.
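For completeness, here is a sketch of the encoding side under the same high-bit-means-continue convention used in the MIDI answer above (the names are illustrative): encoding 60 emits a single byte, where a fixed-width uint32_t would take four.

#include <stdint.h>
#include <stddef.h>

// Encode value into out using 7 data bits per byte, most significant
// group first; the high bit is set on every byte except the last.
// Returns the number of bytes written (1 to 5 for a 32-bit value).
static size_t vb_encode(uint32_t value, uint8_t out[5])
{
    uint8_t tmp[5];
    size_t n = 0;
    tmp[n++] = value & 0x7F;               // last byte: high bit clear
    value >>= 7;
    while (value) {
        tmp[n++] = (value & 0x7F) | 0x80;  // continuation bytes
        value >>= 7;
    }
    for (size_t i = 0; i < n; ++i)         // reverse into output order
        out[i] = tmp[n - 1 - i];
    return n;
}
// vb_encode(60, buf)   -> 1 byte:  0x3C
// vb_encode(0x80, buf) -> 2 bytes: 0x81 0x00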