Can't reproduce CPU cache miss

Good day!
I'm reading this wonderful article: What every programmer should know about memory. Right now I'm trying to figure out how CPU caches work and to reproduce the experiment with cache misses. The aim is to reproduce the performance degradation as the amount of accessed data rises (figure 3.4). I wrote a little program that should reproduce the degradation, but it doesn't: the degradation only appears after I allocate more than 4 GB of memory, and I don't understand why. I think it should appear when 12 or maybe 100 MB are allocated. Maybe the program is wrong and I'm missing something? I use:
Intel Core i7-2630QM
L1: 256 KB
L2: 1 MB
L3: 6 MB
Here is the Go listing.
main.go
package main

import (
	"fmt"
	"math/rand"
)

const (
	n0 = 1000
	n1 = 100000
)

func readInt64Time(slice []int64, idx int) int64

func main() {
	ss := make([][]int64, n0)
	for i := range ss {
		ss[i] = make([]int64, n1)
		for j := range ss[i] {
			ss[i][j] = int64(i + j)
		}
	}
	var t int64
	for i := 0; i < n0; i++ {
		for j := 0; j < n1; j++ {
			t0 := readInt64Time(ss[i], rand.Intn(n1))
			if t0 <= 0 {
				panic(t0)
			}
			t += t0
		}
	}
	fmt.Println("Avg time:", t/int64(n0*n1))
}
main.s
// func readInt64Time(slice []int64, idx int) int64
TEXT ·readInt64Time(SB),$0-40
	MOVQ slice+0(FP), R8
	MOVQ idx+24(FP), R9
	RDTSC
	SHLQ $32, DX
	ORQ  DX, AX
	MOVQ AX, R10
	MOVQ (R8)(R9*8), R8 // Here I'm reading the memory
	RDTSC
	SHLQ $32, DX
	ORQ  DX, AX
	SUBQ R10, AX
	MOVQ AX, ret+32(FP)
	RET

For those who are interested: I reproduced the cache-miss behavior, but the performance degradation was not as dramatic as the article describes. Here is the final benchmark listing:
main.go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"runtime/debug"
)

func readInt64Time(slice []int64, idx int) int64

const count = 2 << 25

func measure(np uint) {
	n := 2 << np
	s := make([]int64, n)
	for i := range s {
		s[i] = int64(i)
	}
	t := int64(0)
	n8 := n >> 3
	for i := 0; i < count; i++ {
		// Index is 64-byte aligned, since the cache line is 64 bytes
		t0 := readInt64Time(s, rand.Intn(n8)<<3)
		t += t0
	}
	fmt.Printf("Allocated %d Kb. Avg time: %v\n",
		n/128, t/count)
}

func main() {
	debug.SetGCPercent(-1) // To eliminate GC influence
	for i := uint(10); i < 27; i++ {
		measure(i)
		runtime.GC()
	}
}
main_amd64.s
// func readInt64Time(slice []int64, idx int) int64
TEXT ·readInt64Time(SB),$0-40
	MOVQ slice+0(FP), R8
	MOVQ idx+24(FP), R9
	RDTSC
	SHLQ $32, DX
	ORQ  DX, AX
	MOVQ AX, R10
	MOVQ (R8)(R9*8), R11 // Read memory
	MOVQ $0, (R8)(R9*8)  // Write memory
	RDTSC
	SHLQ $32, DX
	ORQ  DX, AX
	SUBQ R10, AX
	MOVQ AX, ret+32(FP)
	RET
I disabled the garbage collector to eliminate its influence, and I made the index 64-byte aligned, since my processor has a 64-byte cache line.
The benchmark result is:
Allocated 16 Kb. Avg time: 22
Allocated 32 Kb. Avg time: 22
Allocated 64 Kb. Avg time: 22
Allocated 128 Kb. Avg time: 22
Allocated 256 Kb. Avg time: 22
Allocated 512 Kb. Avg time: 23
Allocated 1024 Kb. Avg time: 23
Allocated 2048 Kb. Avg time: 24
Allocated 4096 Kb. Avg time: 25
Allocated 8192 Kb. Avg time: 29
Allocated 16384 Kb. Avg time: 31
Allocated 32768 Kb. Avg time: 33
Allocated 65536 Kb. Avg time: 34
Allocated 131072 Kb. Avg time: 34
Allocated 262144 Kb. Avg time: 35
Allocated 524288 Kb. Avg time: 35
Allocated 1048576 Kb. Avg time: 39
I ran this benchmark many times, and it gave similar results on every run. If I removed the read and write ops from the asm code, I got 22 cycles for all allocation sizes, so this time difference is memory access time. As you can see, the first time shift is at the 512 KB allocation size. It's just one CPU cycle, but it's very stable. The next change is at 2 MB. At 8 MB there is the most significant change, but it's still only 4 cycles, and by then we're out of the cache entirely.
After all these tests I see that there is no dramatic cost for a cache miss. It's still significant, since the time difference is 10-15 times, but not the 50-500 times we saw in the article. Maybe today's memory is significantly faster than it was 7 years ago? Looks promising =) Maybe in another 7 years there will be architectures with no CPU caches at all. We'll see.
EDIT:
As #Leeor mentioned, the RDTSC instruction has no serializing behavior, so out-of-order execution can skew the measurement. I tried the RDTSCP instruction instead:
main_amd64.s
// func readInt64Time(slice []int64, idx int) int64
TEXT ·readInt64Time(SB),$0-40
	MOVQ slice+0(FP), R8
	MOVQ idx+24(FP), R9
	BYTE $0x0F; BYTE $0x01; BYTE $0xF9; // RDTSCP
	SHLQ $32, DX
	ORQ  DX, AX
	MOVQ AX, R10
	MOVQ (R8)(R9*8), R11 // Read memory
	MOVQ $0, (R8)(R9*8)  // Write memory
	BYTE $0x0F; BYTE $0x01; BYTE $0xF9; // RDTSCP
	SHLQ $32, DX
	ORQ  DX, AX
	SUBQ R10, AX
	MOVQ AX, ret+32(FP)
	RET
Here is what I got with these changes:
Allocated 16 Kb. Avg time: 27
Allocated 32 Kb. Avg time: 27
Allocated 64 Kb. Avg time: 28
Allocated 128 Kb. Avg time: 29
Allocated 256 Kb. Avg time: 30
Allocated 512 Kb. Avg time: 34
Allocated 1024 Kb. Avg time: 42
Allocated 2048 Kb. Avg time: 55
Allocated 4096 Kb. Avg time: 120
Allocated 8192 Kb. Avg time: 167
Allocated 16384 Kb. Avg time: 173
Allocated 32768 Kb. Avg time: 189
Allocated 65536 Kb. Avg time: 201
Allocated 131072 Kb. Avg time: 215
Allocated 262144 Kb. Avg time: 224
Allocated 524288 Kb. Avg time: 242
Allocated 1048576 Kb. Avg time: 281
Now I see the big difference between cache and RAM access. The time is actually 2 times lower than in the article, but that's predictable, since the memory frequency is twice as high.

This indeed doesn't reproduce the intended observation, and it's not clear exactly what your benchmark does: are you accessing randomly within the range? Are you measuring access latency per single access?
Your benchmark incurs a constant overhead per time measurement that is not amortized, so you're basically measuring the function call time (which is constant).
Only when the memory latency becomes large enough to exceed that overhead (when you access DRAM at 4 GB) do you actually start getting meaningful measurements.
You should switch to measuring the time over the entire loop (over count iterations) and dividing.
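A minimal sketch of that approach (the slice sizes, the avgAccessNs name, and the use of time.Now instead of RDTSC are illustrative assumptions, not the original harness):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// avgAccessNs times `count` random reads as one block and divides by
// count, so the fixed per-measurement overhead is amortized away.
func avgAccessNs(n, count int) float64 {
	s := make([]int64, n)
	for i := range s {
		s[i] = int64(i)
	}
	// Pre-generate indices so rand.Intn isn't part of the timed region.
	idx := make([]int, count)
	for i := range idx {
		idx[i] = rand.Intn(n)
	}
	var sink int64
	start := time.Now()
	for _, j := range idx {
		sink += s[j]
	}
	elapsed := time.Since(start)
	_ = sink // keep the loads from being optimized away
	return float64(elapsed.Nanoseconds()) / float64(count)
}

func main() {
	for _, n := range []int{1 << 12, 1 << 20, 1 << 24} {
		fmt.Printf("%8d KB: %.2f ns/access\n", n*8/1024, avgAccessNs(n, 1<<22))
	}
}
```

With enough iterations per slice size, the per-access figure reflects the memory hierarchy rather than the call overhead.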

Related

MongoDB slow - memory usage very high

I have a 5-node replica set MongoDB: 1 primary, 3 secondaries and 1 arbiter.
I am using mongo version 4.2.3
Sizes:
"dataSize" : 688.4161271536723,
"indexes" : 177,
"indexSize" : 108.41889953613281
My Primary is very slow - each command from the shell takes a long time to return.
Memory usage seems very high, and looks like mongodb is consuming more than 50% of the RAM:
# free -lh
total used free shared buff/cache available
Mem: 188Gi 187Gi 473Mi 56Mi 740Mi 868Mi
Low: 188Gi 188Gi 473Mi
High: 0B 0B 0B
Swap: 191Gi 117Gi 74Gi
------------------------------------------------------------------
Top Memory Consuming Process Using ps command
------------------------------------------------------------------
PID PPID %MEM %CPU CMD
311 49145 97.8 498 mongod --config /etc/mongod.conf
23818 23801 0.0 3.8 /bin/prometheus --config.file=/etc/prometheus/prometheus.yml
23162 23145 0.0 8.4 /usr/bin/cadvisor -logtostderr
25796 25793 0.0 0.4 postgres: checkpointer
23501 23484 0.0 1.0 /postgres_exporter
24490 24473 0.0 0.1 grafana-server --homepath=/usr/share/grafana --config=/etc/grafana/grafana.ini --packaging=docker cfg:default.log.mode=console
top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
311 systemd+ 20 0 313.9g 184.6g 2432 S 151.7 97.9 26229:09 mongod
23818 nobody 20 0 11.3g 150084 17988 S 20.7 0.1 8523:47 prometheus
23162 root 20 0 12.7g 93948 5964 S 65.5 0.0 18702:22 cadvisor
serverStatus memory shows this:
octopusrs0:PRIMARY> db.serverStatus().mem
{
"bits" : 64,
"resident" : 189097,
"virtual" : 321404,
"supported" : true
}
octopusrs0:PRIMARY> db.serverStatus().tcmalloc.tcmalloc.formattedString
------------------------------------------------
MALLOC: 218206510816 (208097.9 MiB) Bytes in use by application
MALLOC: + 96926863360 (92436.7 MiB) Bytes in page heap freelist
MALLOC: + 3944588576 ( 3761.9 MiB) Bytes in central cache freelist
MALLOC: + 134144 ( 0.1 MiB) Bytes in transfer cache freelist
MALLOC: + 713330688 ( 680.3 MiB) Bytes in thread cache freelists
MALLOC: + 1200750592 ( 1145.1 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 320992178176 (306122.0 MiB) Actual memory used (physical + swap)
MALLOC: + 13979086848 (13331.5 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 334971265024 (319453.5 MiB) Virtual address space used
MALLOC:
MALLOC: 9420092 Spans in use
MALLOC: 234 Thread heaps in use
MALLOC: 4096 Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
How can I determine what is causing this high memory consumption, and what can I do to return to normal memory consumption?
Thanks,
Tamar

What is using the large mem_reserve in a .NET application?

We have an ASP.NET web service application that is having an OutOfMemory issue with IIS w3wp.exe. We got a mini memory dump of w3wp when the OOM happened, and we see it has a large mem_reserve but a small managed heap size. The question is how we can find out what is using mem_reserve (virtual address space). I expect to see some mem_reserve, but not 2.9 GB. Since mem_reserve doesn't add up to eeheap, would it be from an unmanaged heap? If so, is there any way to confirm that?
0:000> !address -summary
Mapping file section regions...
Mapping module regions...
Mapping PEB regions...
Mapping TEB and stack regions...
Mapping heap regions...
Mapping page heap regions...
Mapping other regions...
Mapping stack trace database regions...
Mapping activation context regions...
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unknown> 1664 d4c1d000 ( 3.324 GB) 92.34% 83.11%
Free 348 19951000 ( 409.316 MB) 9.99%
Image 1184 ad58000 ( 173.344 MB) 4.70% 4.23%
Heap 75 5ab4000 ( 90.703 MB) 2.46% 2.21%
Stack 193 1200000 ( 18.000 MB) 0.49% 0.44%
TEB 64 40000 ( 256.000 kB) 0.01% 0.01%
Other 10 35000 ( 212.000 kB) 0.01% 0.01%
PEB 1 1000 ( 4.000 kB) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 1323 d6f04000 ( 3.358 GB) 93.28% 83.96%
MEM_IMAGE 1828 dd75000 ( 221.457 MB) 6.01% 5.41%
MEM_MAPPED 40 1a26000 ( 26.148 MB) 0.71% 0.64%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_RESERVE 746 b8b4e000 ( 2.886 GB) 80.16% 72.15%
MEM_COMMIT 2445 2db51000 ( 731.316 MB) 19.84% 17.85%
MEM_FREE 348 19951000 ( 409.316 MB) 9.99%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 1024 1f311000 ( 499.066 MB) 13.54% 12.18%
PAGE_EXECUTE_READ 248 b2e6000 ( 178.898 MB) 4.85% 4.37%
PAGE_READONLY 613 1c1d000 ( 28.113 MB) 0.76% 0.69%
PAGE_WRITECOPY 219 103e000 ( 16.242 MB) 0.44% 0.40%
PAGE_EXECUTE_READWRITE 144 59c000 ( 5.609 MB) 0.15% 0.14%
PAGE_EXECUTE_WRITECOPY 69 21f000 ( 2.121 MB) 0.06% 0.05%
PAGE_READWRITE|PAGE_GUARD 128 144000 ( 1.266 MB) 0.03% 0.03%
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown> c2010000 4032000 ( 64.195 MB)
Free 86010000 4000000 ( 64.000 MB)
Image 72f48000 f5d000 ( 15.363 MB)
Heap 64801000 fcf000 ( 15.809 MB)
Stack 1ed0000 7a000 ( 488.000 kB)
TEB ffe8a000 1000 ( 4.000 kB)
Other fffb0000 23000 ( 140.000 kB)
PEB fffde000 1000 ( 4.000 kB)
0:000> !ao
---------Heap 1 ---------
Managed OOM occured after GC #2924 (Requested to allocate 0 bytes)
Reason: Didn't have enough memory to allocate an LOH segment
Detail: LOH: Failed to reserve memory (117440512 bytes)
---------Heap 7 ---------
Managed OOM occured after GC #2308 (Requested to allocate 0 bytes)
Reason: Could not do a full GC
0:000> !vmstat
TYPE MINIMUM MAXIMUM AVERAGE BLK COUNT TOTAL
~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~~~~ ~~~~~
Free:
Small 4K 64K 34K 219 7,475K
Medium 72K 832K 273K 80 21,903K
Large 1,080K 65,536K 7,954K 49 389,759K
Summary 4K 65,536K 1,204K 348 419,139K
Reserve:
Small 4K 64K 14K 436 6,519K
Medium 88K 1,020K 303K 169 51,303K
Large 1,088K 65,528K 21,052K 141 2,968,407K
Summary 4K 65,528K 4,056K 746 3,026,231K
Commit:
Small 4K 64K 16K 2,019 33,386K
Medium 68K 1,024K 287K 275 79,043K
Large 1,028K 65,736K 7,315K 87 636,435K
Summary 4K 65,736K 314K 2,381 748,866K
Private:
Small 4K 64K 28K 851 23,911K
Medium 88K 1,024K 313K 222 69,687K
Large 1,028K 65,736K 18,429K 186 3,427,951K
Summary 4K 65,736K 2,797K 1,259 3,521,550K
Mapped:
Small 4K 64K 18K 23 431K
Medium 68K 1,004K 385K 11 4,239K
Large 1,520K 6,640K 3,684K 6 22,104K
Summary 4K 6,640K 669K 40 26,775K
Image:
Small 4K 64K 9K 1,581 15,562K
Medium 68K 1,000K 267K 211 56,419K
Large 1,028K 15,732K 4,299K 36 154,787K
Summary 4K 15,732K 124K 1,828 226,771K
0:000> !eeheap -gc
Number of GC Heaps: 8
------------------------------
Heap 0 (01cd1720)
generation 0 starts at 0x44470de4
generation 1 starts at 0x43f51000
generation 2 starts at 0x02161000
ephemeral segment allocation context: none
segment begin allocated size
02160000 02161000 02dee4a8 0xc8d4a8(13161640)
43f50000 43f51000 444b9364 0x568364(5669732)
Large object heap starts at 0x12161000
segment begin allocated size
12160000 12161000 121bd590 0x5c590(378256)
c2010000 c2011000 c6041020 0x4030020(67305504)
Heap Size: Size: 0x5281dbc (86515132) bytes.
------------------------------
Heap 1 (01cd65a8)
generation 0 starts at 0x8c4b38e0
generation 1 starts at 0x8c011000
generation 2 starts at 0x04161000
ephemeral segment allocation context: none
segment begin allocated size
04160000 04161000 054114f0 0x12b04f0(19596528)
25c40000 25c41000 25fc7328 0x386328(3695400)
8c010000 8c011000 8c4c4778 0x4b3778(4929400)
Large object heap starts at 0x13161000
segment begin allocated size
13160000 13161000 13161010 0x10(16)
Heap Size: Size: 0x1ae9fa0 (28221344) bytes.
------------------------------
Heap 2 (01cdb5c0)
generation 0 starts at 0x4a71a420
generation 1 starts at 0x49f51000
generation 2 starts at 0x06161000
ephemeral segment allocation context: none
segment begin allocated size
06160000 06161000 06e89b18 0xd28b18(13798168)
27c40000 27c41000 27f8f6dc 0x34e6dc(3466972)
9a010000 9a011000 9a5900b8 0x57f0b8(5763256)
49f50000 49f51000 4a72cc2c 0x7dbc2c(8240172)
Large object heap starts at 0x14161000
segment begin allocated size
14160000 14161000 14161010 0x10(16)
Heap Size: Size: 0x1dd1ee8 (31268584) bytes.
------------------------------
Heap 3 (01ce05d8)
generation 0 starts at 0x34fbe540
generation 1 starts at 0x34d11000
generation 2 starts at 0x08161000
ephemeral segment allocation context: none
segment begin allocated size
08160000 08161000 08c26248 0xac5248(11293256)
2bc40000 2bc41000 2bfc9294 0x388294(3703444)
34d10000 34d11000 34fcdbb0 0x2bcbb0(2870192)
Large object heap starts at 0x15161000
segment begin allocated size
15160000 15161000 151c3b10 0x62b10(404240)
Heap Size: Size: 0x116cb9c (18271132) bytes.
------------------------------
Heap 4 (01ce55f0)
generation 0 starts at 0x934a7cec
generation 1 starts at 0x93011000
generation 2 starts at 0x0a161000
ephemeral segment allocation context: none
segment begin allocated size
0a160000 0a161000 0b4bf898 0x135e898(20310168)
45f50000 45f51000 45f70df4 0x1fdf4(130548)
93010000 93011000 934b83ec 0x4a73ec(4879340)
Large object heap starts at 0x16161000
segment begin allocated size
16160000 16161000 161ab050 0x4a050(303184)
Heap Size: Size: 0x186fac8 (25623240) bytes.
------------------------------
Heap 5 (01cec608)
generation 0 starts at 0x31f1d4d0
generation 1 starts at 0x31c41000
generation 2 starts at 0x0c161000
ephemeral segment allocation context: none
segment begin allocated size
0c160000 0c161000 0d2120b0 0x10b10b0(17502384)
7aea0000 7aea1000 7af5c074 0xbb074(766068)
31c40000 31c41000 31f2caac 0x2ebaac(3062444)
Large object heap starts at 0x17161000
segment begin allocated size
17160000 17161000 17179ad0 0x18ad0(101072)
Heap Size: Size: 0x14706a0 (21431968) bytes.
------------------------------
Heap 6 (01cf5a20)
generation 0 starts at 0x5fbb5598
generation 1 starts at 0x5f821000
generation 2 starts at 0x0e161000
ephemeral segment allocation context: none
segment begin allocated size
0e160000 0e161000 0eb22284 0x9c1284(10228356)
5f820000 5f821000 5fbc9838 0x3a8838(3835960)
Large object heap starts at 0x18161000
segment begin allocated size
18160000 18161000 18161010 0x10(16)
Heap Size: Size: 0xd69acc (14064332) bytes.
------------------------------
Heap 7 (01cfaa38)
generation 0 starts at 0xad530e00
generation 1 starts at 0xad011000
generation 2 starts at 0x10161000
ephemeral segment allocation context: none
segment begin allocated size
10160000 10161000 109bd218 0x85c218(8765976)
a4010000 a4011000 a42f418c 0x2e318c(3027340)
41f50000 41f51000 42a99090 0xb48090(11829392)
ad010000 ad011000 ad5399a0 0x5289a0(5409184)
Large object heap starts at 0x19161000
segment begin allocated size
19160000 19161000 1980ddb0 0x6acdb0(6999472)
Heap Size: Size: 0x225cb84 (36031364) bytes.
------------------------------
GC Heap Size: Size: 0xf950f98 (261427096) bytes.
0:000> !heap -s
SEGMENT HEAP ERROR: failed to initialize the extention
LFH Key : 0x3a251113
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
Virtual block: 1a830000 - 1a830000 (size 00000000)
Virtual block: 1a9d0000 - 1a9d0000 (size 00000000)
007b0000 00000002 16384 15520 16384 198 418 5 2 3 LFH
00280000 00001002 3136 1608 3136 7 23 3 0 0 LFH
00a40000 00001002 1088 828 1088 79 25 2 0 0 LFH
00c30000 00001002 1088 308 1088 13 5 2 0 0 LFH
00100000 00041002 256 4 256 2 1 1 0 0
01560000 00001002 256 24 256 3 4 1 0 0
01510000 00001002 64 4 64 2 1 1 0 0
01670000 00001002 64 4 64 2 1 1 0 0
00290000 00011002 256 4 256 1 2 1 0 0
01870000 00001002 256 8 256 3 3 1 0 0
004e0000 00001002 256 4 256 1 2 1 0 0
019e0000 00001002 256 4 256 2 1 1 0 0
01af0000 00001002 256 4 256 2 1 1 0 0
02080000 00001002 1280 396 1280 1 34 2 0 0 LFH
01fc0000 00041002 256 4 256 2 1 1 0 0
1b700000 00041002 256 256 256 6 4 1 0 0 LFH
21840000 00001002 64192 1880 64192 1639 13 12 0 0 LFH
External fragmentation 87 % (13 free blocks)
-----------------------------------------------------------------------------
If we recycle w3wp, then all memory goes back to the system, so there seems to be no memory leak. I understand that mem_reserve is mapped to virtual addresses and is allocated space but not committed. The question is: is there a way to figure out what is using up all this virtual address space?
I have the memory dump, but I don't have access to the server to get performance counters or anything like that.
Thanks folks!!

Detect number of bytes required for an arbitrary number in Scala

I'm trying to figure out the simplest way to write a function that detects the number of bytes required for a number in Scala.
For instance the number
0 should be 0 bytes
1 should be 1 byte
127 should be 1 byte
128 should be 2 bytes
32767 should be 2 bytes
32768 should be 3 bytes
8388607 should be 3 bytes
8388608 should be 4 bytes
2147483647 should be 4 bytes
2147483648 should be 5 bytes
549755813887 should be 5 bytes
549755813888 should be 6 bytes
9223372036854775807 should be 8 bytes.
-1 should be 1 byte
-127 should be 1 byte
-128 should be 2 bytes
-32767 should be 2 bytes
-32768 should be 3 bytes
-8388607 should be 3 bytes
-8388608 should be 4 bytes
-2147483647 should be 4 bytes
-2147483648 should be 5 bytes
-549755813887 should be 5 bytes
-549755813888 should be 6 bytes
-9223372036854775807 should be 8 bytes
Is there any way to do this besides doing the math to figure out where the number falls relative to 2^N?
After all the clarifications in the comments, I guess the algorithm for negative numbers would be: whatever the answer for their opposite would be; and Long.MinValue is not an acceptable input value.
Therefore, I suggest:
def bytes(x: Long): Int = {
  val posx = x.abs
  if (posx == 0L) 0
  else (64 - java.lang.Long.numberOfLeadingZeros(posx)) / 8 + 1
}
Tests needed.
As I mentioned, you're basically asking for "what's the smallest power-of-2-number larger than my number", with a bit of adjustment for the extra digit for the sign (positive or negative).
Here's my solution, although the result differs for 0 and -128, because, as Bergi commented on your question, you can't really write 0 with 0 bytes, and -128 fits in 1 byte.
import Math._

def bytes(x: Double): Int = {
  val y = if (x >= 0) x + 1 else -x
  ceil((log(y)/log(2) + 1)/8).toInt
}
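For illustration, here is the same leading-zeros trick ported to Go, the language used elsewhere in this thread (bytesNeeded is a hypothetical name; math/bits plays the role of java.lang.Long.numberOfLeadingZeros):

```go
package main

import (
	"fmt"
	"math/bits"
)

// bytesNeeded returns the magnitude's bit length plus one sign bit,
// rounded up to whole bytes; 0 maps to 0 as the question requires.
// As in the Scala version, math.MinInt64 (Long.MinValue) is excluded,
// since it has no positive counterpart.
func bytesNeeded(x int64) int {
	if x == 0 {
		return 0
	}
	if x < 0 {
		x = -x
	}
	return (64-bits.LeadingZeros64(uint64(x)))/8 + 1
}

func main() {
	for _, x := range []int64{0, 127, 128, 32767, 32768, -2147483647, -2147483648} {
		fmt.Println(x, "->", bytesNeeded(x), "bytes")
	}
}
```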

Padding in MD5 Hash Algorithm

I need to understand the MD5 hash algorithm. I was reading a document and it states:
"The message is "padded" (extended) so that its length (in bits) is
congruent to 448, modulo 512. That is, the message is extended so
that it is just 64 bits shy of being a multiple of 512 bits long.
Padding is always performed, even if the length of the message is
already congruent to 448, modulo 512."
I need to understand what this means in simple terms, especially the "448, modulo 512" part. The word MODULO is the issue. I would appreciate simple examples. Funny though, this is the first step of the MD5 hash! :)
Thanks
Modulo, or mod, is an operation that gives you the remainder when one number is divided by another.
For example:
5 modulo 3:
5/3 = 1, with remainder 2. So 5 mod 3 is 2.
10 modulo 16 = 10, because 16 doesn't go into 10 at all.
15 modulo 5 = 0, because 5 goes into 15 exactly 3 times. 15 is a multiple of 5.
Back in school you would have learnt this as "Remainder" or "Left Over", modulo is just a fancy way to say that.
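The same examples, checked with Go's % (remainder) operator:

```go
package main

import "fmt"

func main() {
	// Go's % operator is the "remainder" described above.
	fmt.Println(5 % 3)   // 2
	fmt.Println(10 % 16) // 10
	fmt.Println(15 % 5)  // 0
}
```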
What this is saying is that when you use MD5, one of the first things that happens is that your message is padded so it's the right length. In MD5's case, the padded message (before the length field) must be n bits, where n = (512*z) + 448 and z is any non-negative integer.
As an example, if you had a message that was 1472 bits long, it could go straight into the rest of the MD5 algorithm, because 1472 modulo 512 = 448. If it were 1400 bits long, you would need to pad in an extra 72 bits before running the rest of the MD5 algorithm.
Modulus is the remainder of a division. For example:
512 mod 448 = 64
448 mod 512 = 448
Another way to compute 512 mod 448 is to divide: 512/448 = 1.142...
Then subtract 448 times the integer part from 512:
512 - 448*1 = 64. That's your modulus result.
What you need to know is that 448 is 64 bits short of a multiple of 512.
But what if the message length mod 512 falls between 448 and 512?
Normally the padding needed would be 448 minus x (the result of the modulus):
447 mod 512 = 447; 448 - 447 = 1 (all good, 1 bit to pad)
449 mod 512 = 449; 448 - 449 = -1 ???
The solution is to pad up to the next multiple of 512, again leaving 64 bits free:
512*2 - 64 = 960
449 mod 512 = 449; 960 - 449 = 511 bits of padding.
This works out because afterwards we append the 64-bit original-message length, and the full length has to be a multiple of 512:
511 + 449 + 64 = 1024
1024 is a multiple of 512.
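Putting both answers together, a small Go sketch of the length calculation (paddedBits is a hypothetical helper for illustration, not part of any MD5 library):

```go
package main

import "fmt"

// paddedBits returns the total bit length of an MD5 message after
// padding: at least one bit of padding is always added, the length is
// brought to 448 (mod 512), and then the 64-bit length field follows,
// so the result is always a multiple of 512.
func paddedBits(msgBits int) int {
	padded := msgBits + 1 // padding always starts with a single '1' bit
	for padded%512 != 448 {
		padded++
	}
	return padded + 64 // append the 64-bit original-length field
}

func main() {
	fmt.Println(paddedBits(1400)) // 1536: padded up to 1472, then +64
	fmt.Println(paddedBits(449))  // 1024: 511 bits of padding, then +64
	fmt.Println(paddedBits(448))  // 1024: padding happens even at 448
}
```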

What exactly does IKB stand for?

I have a question which goes as:
IKB is equal to:
(a) 1k bytes (b) 1024 bytes (c) 2^10 bytes (d) none of these
The answer says (b), but I haven't heard of this before, so my question is what exactly is IKB?
If you do a quick Google search, you will find a lot of explanations, something like Bits and Bytes: An Explanation.
This is almost certainly supposed to read "1KB" or 1 kilobyte, which is usually taken to mean 1024 bytes rather than 1000 bytes.
"1kb" == "1 kilobyte" == 1024 bytes
"IKB" doesn't mean any of these things.
So I guess "d)" is the answer.
Can you say "Trick Question"? ;)
kilobyte (kB)  10^3   kibibyte (KiB) 2^10 = 1.024 × 10^3
megabyte (MB)  10^6   mebibyte (MiB) 2^20 ≈ 1.049 × 10^6
gigabyte (GB)  10^9   gibibyte (GiB) 2^30 ≈ 1.074 × 10^9
terabyte (TB)  10^12  tebibyte (TiB) 2^40 ≈ 1.100 × 10^12
petabyte (PB)  10^15  pebibyte (PiB) 2^50 ≈ 1.126 × 10^15
exabyte (EB)   10^18  exbibyte (EiB) 2^60 ≈ 1.153 × 10^18
zettabyte (ZB) 10^21  zebibyte (ZiB) 2^70 ≈ 1.181 × 10^21
yottabyte (YB) 10^24  yobibyte (YiB) 2^80 ≈ 1.209 × 10^24
These are the SI decimal prefixes (left) and IEC binary prefixes (right).
As you can see, none of them is "IKB", so it must be a typo.
Source: http://en.wikipedia.org/wiki/Kilobyte