I have heard many times, when reading about Unicode, that UTF-32 is a fixed width encoding.
Taking fixed width encoding to mean "a code which maps source symbols to a set number of bits," and, assuming that the source symbols in question are Unicode code points, this all makes sense. However, if you think of the underlying language of source symbols being graphemes, things get a lot more complicated.
So my question is this: in the sense of graphemes, is UTF-32 truly a fixed-length encoding? And if not, is a fixed-length encoding possible in that sense?
One of the comments referenced Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) article, which was written in 2003. At the time, it served as a wake-up call (it probably still does in some places). However, it is not without its (minor, but significant) technical problems — though the overall thesis ('you need to know about Unicode, and you need to know which encoding a string is in') remains valid. The comment then continued:
And yes, UTF-16 and UTF-32 are both fixed width. UTF-8 … isn't.
UTF-16 isn't really fixed width; some Unicode code points are one 16-bit code unit, while others require two 16-bit code units. Likewise, UTF-8 isn't fixed width; some Unicode code points require one 8-bit code unit, others require two, three or even four 8-bit code units (but not five or six, despite the comment in Joel's article that mentions the possibility). UTF-32, on the other hand, is fixed width; every Unicode code point can be encoded in a single 32-bit code unit. (Indeed, the maximum possible Unicode code point is U+10FFFF, so Unicode is a 21-bit code set, though it does not use all possible combinations of 21 bits.)
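One way to see the difference is to count code units per code point. The sketch below relies only on Python's built-in codecs; the helper name `code_units` and the sample characters are illustrative choices, not part of the discussion above.

```python
def code_units(ch: str) -> dict:
    """Count the code units needed to encode one code point in each UTF."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit code units
    }


# One sample each from the 1-, 2-, 3- and 4-byte UTF-8 ranges.
for ch in ["A", "\u00e9", "\u20ac", "\U0001f600"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

Only the UTF-32 column stays at 1 for every code point; the last sample lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair for it.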
However, code points are not identical to characters, let alone graphemes. The Unicode FAQ has a section on Characters and Combining Marks that discusses graphemes, referencing the glossary definition.
The better word for what end-users think of as characters is grapheme (as defined in the Unicode glossary): a minimally distinctive unit of writing in the context of a particular writing system.
Graphemes are not necessarily combining character sequences, and combining character sequences are not necessarily graphemes.
Q: How are characters counted when measuring the length or position of a character in a string?
A: Computing the length or position of a "character" in a Unicode string can be a little complicated, as there are four different approaches to doing so, plus the potential confusion caused by combining characters. The correct choice of which counting method to use depends on what is being counted and what the count or position is used for.
To address the question here:
If you mean something to do with 'it can take multiple Unicode code points to get a complete character (grapheme) with associated diacritics (combining marks, etc.)' then yes, even UTF-32 isn't necessarily fixed width and there is no fixed width encoding for Unicode.
UTF-32 employs a fixed-width encoding for each Unicode code point, but since it can take multiple code points to create a complete grapheme, even UTF-32 does not have a 1:1 mapping between code points and graphemes.
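As a small illustration, the sketch below counts code points and UTF-32 bytes for two strings that a reader would each perceive as a single character. The helper name `describe` and the two sample strings are my own choices. Normalization to NFC is included only to show that it does not rescue the idea of fixed width: it shrinks a sequence only when a precomposed code point happens to exist.

```python
import unicodedata


def describe(text: str) -> dict:
    """Report how many code points and UTF-32 bytes a string occupies."""
    return {
        "code_points": len(text),
        "utf32_bytes": len(text.encode("utf-32-le")),
        "nfc_code_points": len(unicodedata.normalize("NFC", text)),
    }


e_acute = "e\u0301"    # 'e' + COMBINING ACUTE ACCENT; NFC composes it to U+00E9
x_ogonek = "x\u0328"   # 'x' + COMBINING OGONEK; no precomposed code point exists

print(describe(e_acute))
print(describe(x_ogonek))
```

Both strings take 8 bytes in UTF-32 as written, even though each displays as one character.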
Of course, you can also find interesting character stacks in some comments on SO. For example:
#̮̘̮̜̤͓͓̓ͪ̓͆͗̑Ṷ̫̠̤̙̻͚̗ͭs̹͓̰̫͉̲̺̈̏̽̅̑ͩ̇̓̉e͖̝̦̦̿r͔̒̿̋̂̓n̹͖̥ͥͦͤ̍͊̏ä͇͖͚͖̃̎͊m̭͇̂͆͋̋͒e̫̠͇̰̱̦̹͗͋̓̿͒ ͔͖̫̬̗̪̪̳ͧ̄ͫB̜̥̣̬̮͈͒̄ͪ͊l̮͉̣̟̪̪̿̍ͫ͋͐̑a̜̦̪͗͗̈́ͣ͊ḫ̘̯͈̠̞͒ͯ ̣͕͚̗̠͖̫̆͌͒̓͛b̖̣͇̖̦̃̑ͬͭͥl͔͍͚͕̲̪̼͎ͧ̇̏ạ̖̪͚̯̊ͤͣͦͮ̌h̘͓͔̟͔͍̏ͣͦ̓̓ ̫̼̫ͮ͌̄ͤ̿̈͆b̙͍̼̜͍̹̬̬͎ͥ̓ͯ̂ḽ̜̟̲̾̅̆ͦ̃ͨa͇̰̝̺͊ͧͫ͛h̯̻͉̉̒̉̈́́ͥ̀.̖̩̭͇̭͔̹̈́̇͐ͬͦͦͨ̾̇.͍̪̣͂ͬ.̞͍̥̪̺̤̣̜͆ͫ̈́͑ͦ͂͑͑
Why/how do "Zalgo pings" work?
How does Zalgo text work?
Ȩ̸҉̟͎͚̹͚̙̟̖x̨͙̰͕̖͉̼̜̲̦̟͈́ͅͅą̷̘͕͈̹͓̣̮̼̣̠̹́c̼͙̠̭̫̰͈͍̮͢͡ţ̢̛̠͇̬̖̟̺͈̲̻̣̲͙͈̼͍̘̱ͅl̶̷̨̲͙͖̻̲̗̦͚͙̮͘͠y̭̖̰͚̞̣̗̳̠͕̻̼͡ͅ!̛͖̮͔͍̰͉͢ ̭̙̖͔̩̗̠͕̦̬͓͞͝ͅO҉҉̣̜̺̪̳͕̖͔̠͙͎͕̙̦ͅn̩͓͖̝̟̭͙͙͓͚̼͖͖͜͞ȩ̧̬̱̦̠̙̥͇͔̪́ ҉̸̗̦͇̰̪̰̭̘̹͘͢i̴͞͏̩̤̹̗̖̰͎̖̲̲̘͓̗̯͚̞͖̥̻͝s͞҉̲͈̙̹̤̫͇ ͚̭͎͉̠̺͉̮̞̻̣̰̺̖͖̀́͢͞e̷̪̭̯̼͓͎̹̠͖̲͔̪͈̦͈̱͍̭̩͠ņ͞҉̮̳͓͙͈̼͉̬͕͈̺͈̭̩̪o͇̗̱̠̱̠̯̕͢u̸̳̦̩̳̫̖̜ͅǵ̢̲̣͎̮̮̼̫̥̠͙̱̝̘͕͎̳̜̲̖h̸̛̩͚̮̤̖̹͙.̶̨̳̖̠̗̼̩͕͇͉͓̟̦͜͞ͅ
What you see, of course, depends on the quality of the Unicode support in your browser (which, in turn, depends in part on the quality of the O/S support). I get to see different results on two different Macs running rather different versions of Firefox, even though they're running the same base O/S version (10.10.4 Yosemite).
The second of those examples can be decoded from UTF-8 into the following sequence of Unicode code points — it is only 700 bytes on disk:
0xC8 0xA8 = U+0228
0xCC 0xB8 = U+0338
0xD2 0x89 = U+0489
0xCC 0x9F = U+031F
0xCD 0x8E = U+034E
0xCD 0x9A = U+035A
0xCC 0xB9 = U+0339
0xCD 0x9A = U+035A
0xCC 0x99 = U+0319
0xCC 0x9F = U+031F
0xCC 0x96 = U+0316
0x78 = U+0078
0xCC 0xA8 = U+0328
0xCD 0x99 = U+0359
0xCC 0xB0 = U+0330
0xCD 0x95 = U+0355
0xCC 0x96 = U+0316
0xCD 0x89 = U+0349
0xCC 0xBC = U+033C
0xCC 0x9C = U+031C
0xCC 0xB2 = U+0332
0xCC 0xA6 = U+0326
0xCC 0x9F = U+031F
0xCD 0x88 = U+0348
0xCC 0x81 = U+0301
0xCD 0x85 = U+0345
0xCD 0x85 = U+0345
0xC4 0x85 = U+0105
0xCC 0xB7 = U+0337
0xCC 0x98 = U+0318
0xCD 0x95 = U+0355
0xCD 0x88 = U+0348
0xCC 0xB9 = U+0339
0xCD 0x93 = U+0353
0xCC 0xA3 = U+0323
0xCC 0xAE = U+032E
0xCC 0xBC = U+033C
0xCC 0xA3 = U+0323
0xCC 0xA0 = U+0320
0xCC 0xB9 = U+0339
0xCC 0x81 = U+0301
0x63 = U+0063
0xCC 0xBC = U+033C
0xCD 0x99 = U+0359
0xCC 0xA0 = U+0320
0xCC 0xAD = U+032D
0xCC 0xAB = U+032B
0xCC 0xB0 = U+0330
0xCD 0x88 = U+0348
0xCD 0x8D = U+034D
0xCC 0xAE = U+032E
0xCD 0xA2 = U+0362
0xCD 0xA1 = U+0361
0xC5 0xA3 = U+0163
0xCC 0xA2 = U+0322
0xCC 0x9B = U+031B
0xCC 0xA0 = U+0320
0xCD 0x87 = U+0347
0xCC 0xAC = U+032C
0xCC 0x96 = U+0316
0xCC 0x9F = U+031F
0xCC 0xBA = U+033A
0xCD 0x88 = U+0348
0xCC 0xB2 = U+0332
0xCC 0xBB = U+033B
0xCC 0xA3 = U+0323
0xCC 0xB2 = U+0332
0xCD 0x99 = U+0359
0xCD 0x88 = U+0348
0xCC 0xBC = U+033C
0xCD 0x8D = U+034D
0xCC 0x98 = U+0318
0xCC 0xB1 = U+0331
0xCD 0x85 = U+0345
0x6C = U+006C
0xCC 0xB6 = U+0336
0xCD 0x98 = U+0358
0xE2 0x80 0x8C = U+200C
0xE2 0x80 0x8B = U+200B
0xCC 0xB7 = U+0337
0xCC 0xA8 = U+0328
0xCC 0xB2 = U+0332
0xCD 0x99 = U+0359
0xCD 0x96 = U+0356
0xCC 0xBB = U+033B
0xCC 0xB2 = U+0332
0xCC 0x97 = U+0317
0xCC 0xA6 = U+0326
0xCD 0x9A = U+035A
0xCD 0x99 = U+0359
0xCC 0xAE = U+032E
0xCD 0xA0 = U+0360
0x79 = U+0079
0xCC 0xAD = U+032D
0xCC 0x96 = U+0316
0xCC 0xB0 = U+0330
0xCD 0x9A = U+035A
0xCC 0x9E = U+031E
0xCC 0xA3 = U+0323
0xCC 0x97 = U+0317
0xCC 0xB3 = U+0333
0xCC 0xA0 = U+0320
0xCD 0x95 = U+0355
0xCC 0xBB = U+033B
0xCC 0xBC = U+033C
0xCD 0xA1 = U+0361
0xCD 0x85 = U+0345
0x21 = U+0021
0xCC 0x9B = U+031B
0xCD 0x96 = U+0356
0xCC 0xAE = U+032E
0xCD 0x94 = U+0354
0xCD 0x8D = U+034D
0xCC 0xB0 = U+0330
0xCD 0x89 = U+0349
0xCD 0xA2 = U+0362
0x20 = U+0020
0xCC 0xAD = U+032D
0xCC 0x99 = U+0319
0xCC 0x96 = U+0316
0xCD 0x94 = U+0354
0xCC 0xA9 = U+0329
0xCC 0x97 = U+0317
0xCC 0xA0 = U+0320
0xCD 0x95 = U+0355
0xCC 0xA6 = U+0326
0xCC 0xAC = U+032C
0xCD 0x93 = U+0353
0xCD 0x9E = U+035E
0xCD 0x9D = U+035D
0xCD 0x85 = U+0345
0x4F = U+004F
0xD2 0x89 = U+0489
0xD2 0x89 = U+0489
0xCC 0xA3 = U+0323
0xCC 0x9C = U+031C
0xCC 0xBA = U+033A
0xCC 0xAA = U+032A
0xCC 0xB3 = U+0333
0xCD 0x95 = U+0355
0xCC 0x96 = U+0316
0xCD 0x94 = U+0354
0xCC 0xA0 = U+0320
0xCD 0x99 = U+0359
0xCD 0x8E = U+034E
0xCD 0x95 = U+0355
0xCC 0x99 = U+0319
0xCC 0xA6 = U+0326
0xCD 0x85 = U+0345
0x6E = U+006E
0xCC 0xA9 = U+0329
0xCD 0x93 = U+0353
0xCD 0x96 = U+0356
0xCC 0x9D = U+031D
0xCC 0x9F = U+031F
0xCC 0xAD = U+032D
0xCD 0x99 = U+0359
0xCD 0x99 = U+0359
0xCD 0x93 = U+0353
0xCD 0x9A = U+035A
0xCC 0xBC = U+033C
0xCD 0x96 = U+0356
0xCD 0x96 = U+0356
0xCD 0x9C = U+035C
0xCD 0x9E = U+035E
0xC8 0xA9 = U+0229
0xCC 0xA7 = U+0327
0xCC 0xAC = U+032C
0xCC 0xB1 = U+0331
0xCC 0xA6 = U+0326
0xCC 0xA0 = U+0320
0xCC 0x99 = U+0319
0xCC 0xA5 = U+0325
0xCD 0x87 = U+0347
0xCD 0x94 = U+0354
0xCC 0xAA = U+032A
0xCC 0x81 = U+0301
0x20 = U+0020
0xD2 0x89 = U+0489
0xCC 0xB8 = U+0338
0xCC 0x97 = U+0317
0xCC 0xA6 = U+0326
0xCD 0x87 = U+0347
0xCC 0xB0 = U+0330
0xCC 0xAA = U+032A
0xCC 0xB0 = U+0330
0xCC 0xAD = U+032D
0xCC 0x98 = U+0318
0xCC 0xB9 = U+0339
0xCD 0x98 = U+0358
0xCD 0xA2 = U+0362
0x69 = U+0069
0xCC 0xB4 = U+0334
0xCD 0x9E = U+035E
0xCD 0x8F = U+034F
0xCC 0xA9 = U+0329
0xCC 0xA4 = U+0324
0xCC 0xB9 = U+0339
0xCC 0x97 = U+0317
0xCC 0x96 = U+0316
0xCC 0xB0 = U+0330
0xCD 0x8E = U+034E
0xCC 0x96 = U+0316
0xCC 0xB2 = U+0332
0xCC 0xB2 = U+0332
0xCC 0x98 = U+0318
0xCD 0x93 = U+0353
0xCC 0x97 = U+0317
0xCC 0xAF = U+032F
0xCD 0x9A = U+035A
0xCC 0x9E = U+031E
0xCD 0x96 = U+0356
0xCC 0xA5 = U+0325
0xCC 0xBB = U+033B
0xCD 0x9D = U+035D
0x73 = U+0073
0xCD 0x9E = U+035E
0xD2 0x89 = U+0489
0xCC 0xB2 = U+0332
0xCD 0x88 = U+0348
0xCC 0x99 = U+0319
0xCC 0xB9 = U+0339
0xCC 0xA4 = U+0324
0xCC 0xAB = U+032B
0xCD 0x87 = U+0347
0x20 = U+0020
0xCD 0x9A = U+035A
0xCC 0xAD = U+032D
0xCD 0x8E = U+034E
0xCD 0x89 = U+0349
0xCC 0xA0 = U+0320
0xCC 0xBA = U+033A
0xCD 0x89 = U+0349
0xCC 0xAE = U+032E
0xCC 0x9E = U+031E
0xCC 0xBB = U+033B
0xCC 0xA3 = U+0323
0xCC 0xB0 = U+0330
0xCC 0xBA = U+033A
0xCC 0x96 = U+0316
0xCD 0x96 = U+0356
0xCC 0x80 = U+0300
0xCC 0x81 = U+0301
0xCD 0xA2 = U+0362
0xCD 0x9E = U+035E
0x65 = U+0065
0xCC 0xB7 = U+0337
0xCC 0xAA = U+032A
0xCC 0xAD = U+032D
0xCC 0xAF = U+032F
0xCC 0xBC = U+033C
0xCD 0x93 = U+0353
0xCD 0x8E = U+034E
0xCC 0xB9 = U+0339
0xCC 0xA0 = U+0320
0xCD 0x96 = U+0356
0xCC 0xB2 = U+0332
0xCD 0x94 = U+0354
0xCC 0xAA = U+032A
0xCD 0x88 = U+0348
0xCC 0xA6 = U+0326
0xCD 0x88 = U+0348
0xCC 0xB1 = U+0331
0xCD 0x8D = U+034D
0xCC 0xAD = U+032D
0xCC 0xA9 = U+0329
0xCD 0xA0 = U+0360
0xC5 0x86 = U+0146
0xCD 0x9E = U+035E
0xD2 0x89 = U+0489
0xCC 0xAE = U+032E
0xCC 0xB3 = U+0333
0xCD 0x93 = U+0353
0xCD 0x99 = U+0359
0xCD 0x88 = U+0348
0xCC 0xBC = U+033C
0xCD 0x89 = U+0349
0xCC 0xAC = U+032C
0xCD 0x95 = U+0355
0xCD 0x88 = U+0348
0xCC 0xBA = U+033A
0xCD 0x88 = U+0348
0xCC 0xAD = U+032D
0xCC 0xA9 = U+0329
0xCC 0xAA = U+032A
0x6F = U+006F
0xCD 0x87 = U+0347
0xCC 0x97 = U+0317
0xCC 0xB1 = U+0331
0xCC 0xA0 = U+0320
0xCC 0xB1 = U+0331
0xCC 0xA0 = U+0320
0xCC 0xAF = U+032F
0xCC 0x95 = U+0315
0xCD 0xA2 = U+0362
0x75 = U+0075
0xCC 0xB8 = U+0338
0xCC 0xB3 = U+0333
0xCC 0xA6 = U+0326
0xCC 0xA9 = U+0329
0xCC 0xB3 = U+0333
0xCC 0xAB = U+032B
0xCC 0x96 = U+0316
0xCC 0x9C = U+031C
0xCD 0x85 = U+0345
0xE2 0x80 0x8C = U+200C
0xE2 0x80 0x8B = U+200B
0xC7 0xB5 = U+01F5
0xCC 0xA2 = U+0322
0xCC 0xB2 = U+0332
0xCC 0xA3 = U+0323
0xCD 0x8E = U+034E
0xCC 0xAE = U+032E
0xCC 0xAE = U+032E
0xCC 0xBC = U+033C
0xCC 0xAB = U+032B
0xCC 0xA5 = U+0325
0xCC 0xA0 = U+0320
0xCD 0x99 = U+0359
0xCC 0xB1 = U+0331
0xCC 0x9D = U+031D
0xCC 0x98 = U+0318
0xCD 0x95 = U+0355
0xCD 0x8E = U+034E
0xCC 0xB3 = U+0333
0xCC 0x9C = U+031C
0xCC 0xB2 = U+0332
0xCC 0x96 = U+0316
0x68 = U+0068
0xCC 0xB8 = U+0338
0xCC 0x9B = U+031B
0xCC 0xA9 = U+0329
0xCD 0x9A = U+035A
0xCC 0xAE = U+032E
0xCC 0xA4 = U+0324
0xCC 0x96 = U+0316
0xCC 0xB9 = U+0339
0xCD 0x99 = U+0359
0x2E = U+002E
0xCC 0xB6 = U+0336
0xCC 0xA8 = U+0328
0xCC 0xB3 = U+0333
0xCC 0x96 = U+0316
0xCC 0xA0 = U+0320
0xCC 0x97 = U+0317
0xCC 0xBC = U+033C
0xCC 0xA9 = U+0329
0xCD 0x95 = U+0355
0xCD 0x87 = U+0347
0xCD 0x89 = U+0349
0xCD 0x93 = U+0353
0xCC 0x9F = U+031F
0xCC 0xA6 = U+0326
0xCD 0x9C = U+035C
0xCD 0x9E = U+035E
0xCD 0x85 = U+0345
0x0A = U+000A
It gets tricky to decipher which parts of that are graphemes, but clearly, with all the stacked characters, this is not a fixed amount of data per grapheme, and there is no sane way to make Unicode work with a fixed width encoding per grapheme because, as the 'Zalgo' examples show, combining marks can basically be combined in arbitrary sequences.
The first grapheme in the second 'Zalgo' example contains:
0xC8 0xA8 = U+0228 LATIN CAPITAL LETTER E WITH CEDILLA
0xCC 0xB8 = U+0338 COMBINING LONG SOLIDUS OVERLAY
0xD2 0x89 = U+0489 COMBINING CYRILLIC MILLIONS SIGN
0xCC 0x9F = U+031F COMBINING PLUS SIGN BELOW
0xCD 0x8E = U+034E COMBINING UPWARDS ARROW BELOW
0xCD 0x9A = U+035A COMBINING DOUBLE RING BELOW
0xCC 0xB9 = U+0339 COMBINING RIGHT HALF RING BELOW
0xCD 0x9A = U+035A COMBINING DOUBLE RING BELOW
0xCC 0x99 = U+0319 COMBINING RIGHT TACK BELOW
0xCC 0x9F = U+031F COMBINING PLUS SIGN BELOW
0xCC 0x96 = U+0316 COMBINING GRAVE ACCENT BELOW
The next code point is U+0078 LATIN SMALL LETTER X, the start of a new grapheme. A couple of the combining marks appear several times each in that list.
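The grouping above can be approximated in a few lines. This is a rough sketch, assuming that "a base code point followed by any number of mark code points (general category Mn, Mc or Me)" is a good enough stand-in for a grapheme; real segmentation follows the extended grapheme cluster rules of Unicode Standard Annex #29, which this does not implement. The function name `rough_clusters` is hypothetical.

```python
import unicodedata


def rough_clusters(text):
    """Split text so that each mark (category M*) joins the preceding code point."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch       # combining or enclosing mark: extend the cluster
        else:
            clusters.append(ch)      # anything else starts a new cluster
    return clusters


# The first twelve code points of the second 'Zalgo' example.
sample = "".join(chr(cp) for cp in [
    0x0228, 0x0338, 0x0489, 0x031F, 0x034E, 0x035A,
    0x0339, 0x035A, 0x0319, 0x031F, 0x0316, 0x0078,
])

for cluster in rough_clusters(sample):
    print(len(cluster), "code points,",
          len(cluster.encode("utf-32-le")), "bytes in UTF-32")
```

Under that approximation the first cluster occupies 44 bytes in UTF-32 and the second only 4, so the amount of data per grapheme is not fixed.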
UTF-32 is a fixed-width encoding and, by the way, the only Unicode encoding in which the DWORD value maps directly to a Unicode code point. There is a limit on the values, though: the highest valid value is 0x10FFFF, and the entire high- and low-surrogate range (0xD800 to 0xDFFF) is invalid in UTF-32.
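A minimal sketch of that validity rule follows, with a cross-check against Python's own UTF-32 decoder. Both function names are made up for this example.

```python
def is_unicode_scalar_value(value: int) -> bool:
    """True if a 32-bit value is allowed as a UTF-32 code unit."""
    in_range = 0 <= value <= 0x10FFFF
    is_surrogate = 0xD800 <= value <= 0xDFFF
    return in_range and not is_surrogate


def decodes_as_utf32(value: int) -> bool:
    """Ask Python's UTF-32 codec whether it accepts this 32-bit value."""
    try:
        value.to_bytes(4, "little").decode("utf-32-le")
        return True
    except UnicodeDecodeError:
        return False


for value in (0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
    print(hex(value), is_unicode_scalar_value(value), decodes_as_utf32(value))
```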
Here is the Y86 code for reference
.pos 0
irmovq stack, %rsp # initialize stack pointer
call main
halt
.align 8
input_array:
.quad 6
.quad 4
.quad 5
.quad 2
.quad 3
.quad 1
count:
.quad 6
main:
irmovq input_array, %rdi
irmovq count, %rsi
mrmovq (%rsi), %rsi
call bubble_sort
ret
# bubble_sort(long *data, long count)
bubble_sort:
irmovq $1, %rcx
subq %rcx, %rsi # last = count - 1
irmovq $8, %r14 # %r14 = sizeof(int*)
rrmovq %rdi, %r13 # %r13 = data
outer_loop:
rrmovq %r13, %rdi # data = data (param)
irmovq $0, %rbx # i = 0
inner_loop:
mrmovq (%rdi), %r9 # %r9 = data[i]
addq %r14, %rdi # data += 1
mrmovq (%rdi), %r10 # %r10 = data[i + 1]
if_statement:
rrmovq %r9, %r11
subq %r10, %r11 # %r11 = data[i] - data[i + 1]
jg then # if data[i] > data[i + 1], goto then
jmp end_if
then:
rrmovq %r10, %r11 # temp = data[i + 1]
rmmovq %r9, (%rdi) # data[i + 1] = data[i]
rmmovq %r11, -8(%rdi) # data[i] = temp
end_if:
addq %rcx, %rbx # i++
rrmovq %rbx, %rax # %rax = i
subq %rsi, %rax # %rax = i - last
jl inner_loop # goto inner_loop if i < last
subq %rcx, %rsi # last--
jg outer_loop # goto outer_loop if last > 0
ret
.pos 0x100
stack:
For reference, I'm using this Y86-64 simulator. The problem occurs after the first pass through the inner loop has completed. The value in %rsi is 4, because I have just subtracted 1 from its previous value immediately before the jg instruction is processed, but execution still doesn't jump to outer_loop.
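The expectation described above can be checked against the usual Y86-64 condition-code rules (jg is taken when ZF is clear and SF equals OF; jl is taken when SF differs from OF). The sketch below is a small Python model of those rules only. The helper names are mine, and it models nothing else: not the simulator, not memory, not the stack.

```python
MASK = (1 << 64) - 1


def to_signed(value: int) -> int:
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK
    return value - (1 << 64) if value >> 63 else value


def subq(src: int, dst: int):
    """Model `subq src, dst`: return (dst - src, ZF, SF, OF)."""
    a, b = to_signed(dst), to_signed(src)
    result = to_signed(a - b)
    zf = result == 0
    sf = result < 0
    # Signed overflow: operands have different signs and the result's sign differs from dst's.
    of = ((a < 0) != (b < 0)) and ((result < 0) != (a < 0))
    return result, zf, sf, of


def jg_taken(zf: bool, sf: bool, of: bool) -> bool:
    return (not (sf ^ of)) and (not zf)


def jl_taken(zf: bool, sf: bool, of: bool) -> bool:
    return sf ^ of


# last-- with last == 5: the flags left by subq decide the following jg.
result, zf, sf, of = subq(1, 5)
print(result, "jg taken:", jg_taken(zf, sf, of))
```

By these rules alone the jump after the decrement from 5 to 4 would be taken, which matches the expectation in the question; the sketch says nothing about why the simulator behaves differently.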
Logic that works on the Abinitio platform:
Let's take the sample value "123456789", for which we need to generate a SHA-256 hash and convert it to unsigned integer(7). Expected result: 40876285344408085
m_eval 'hash_SHA256("123456789")'
[void 0x15 0xe2 0xb0 0xd3 0xc3 0x38 0x91 0xeb 0xb0 0xf1 0xef 0x60 0x9e 0xc4 0x19 0x42 0x0c 0x20 0xe3 0x20 0xce 0x94 0xc6 0x5f 0xbc 0x8c 0x33 0x12 0x44 0x8e 0xb2 0x25]
m_eval 'string_to_hex(hash_SHA256("123456789"))'
"15E2B0D3C33891EBB0F1EF609EC419420C20E320CE94C65FBC8C3312448EB225"
m_eval '(unsigned integer(7)) reinterpret(hash_SHA256("123456789"))'
40876285344408085
Scala Approach
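import java.math.BigInteger
import java.security.MessageDigest
import org.apache.commons.lang3.StringUtils // assumed: Apache Commons Lang 3

val shaVal = "123456789" // sample input value from above
println("Input Value : "+shaVal)
val shaCode="SHA-256"
val utf="UTF-8"
val digest = MessageDigest.getInstance(shaCode)
println("digest SHA-256 : "+digest)
// Strip leading zeros before hashing
val InpStr = StringUtils.stripStart(shaVal,"0")
println("InpStr : "+InpStr)
val hashUTF = digest.digest(InpStr.getBytes(utf))
println("hashUTF(UTF-8) : "+hashUTF.mkString(" "))
// Reuse the digest bytes; signum 1 makes the BigInteger non-negative
val hashBigInt= new BigInteger(1, hashUTF)
println("hashBigInt : "+hashBigInt)
val HashKeyRes = String.format("%032x", hashBigInt)
println("HashKeyRes : "+HashKeyRes)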
Console Output
hashUTF(UTF-16) : 21 -30 -80 -45 -61 56 -111 -21 -80 -15 -17 96 -98 -60 25 66 12 32 -29 32 -50 -108 -58 95 -68 -116 51 18 68 -114 -78 37
hashBigInt : 9899097673353459346982371669967256498000649460813128595014811958380719944229
HashKeyRes : 15e2b0d3c33891ebb0f1ef609ec419420c20e320ce94c65fbc8c3312448eb225
fromBase : 16
toBase : 10
So the generated hash key matches the expected hash, but it is in hex format (base 16). The expected output should be the base-10 value of unsigned integer(7), i.e. 40876285344408085.
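One reading that reproduces the expected number from the digest bytes shown above is to take the first 7 bytes of the hash and interpret them as a little-endian unsigned integer. That is an inference from the quoted values, not a documented statement about how reinterpret works. The sketch is in Python rather than Scala, and the function name `sha256_to_unsigned` is hypothetical.

```python
import hashlib


def sha256_to_unsigned(text: str, nbytes: int = 7) -> int:
    """Hash text with SHA-256 and read the first nbytes as a little-endian unsigned int."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:nbytes], "little")


print(sha256_to_unsigned("123456789"))
```

For the sample input this gives 40876285344408085, i.e. the bytes 0x15 0xe2 0xb0 0xd3 0xc3 0x38 0x91 read from least significant to most significant. The same byte slicing and byte order would need to be mirrored on the Scala side.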
The summary of my problem is that I am trying to replicate the Matlab function:
mvnrnd(mu', sigma, 200)
into Julia using:
rand( MvNormal(mu, sigma), 200)'
and the result is a 200 x 7 matrix, essentially generating 200 random return time series data.
Matlab works, Julia doesn't.
My input matrices are:
mu = [0.15; 0.03; 0.06; 0.04; 0.1; 0.02; 0.12]
sigma = [0.0035 -0.0038 0.0020 0.0017 -0.0006 -0.0028 0.0009;
-0.0038 0.0046 -0.0011 0.0001 0.0003 0.0054 -0.0024;
0.0020 -0.0011 0.0041 0.0068 -0.0004 0.0047 -0.0036;
0.0017 0.0001 0.0068 0.0125 0.0002 0.0109 -0.0078;
-0.0006 0.0003 -0.0004 0.0002 0.0025 -0.0004 -0.0007;
-0.0028 0.0054 0.0047 0.0109 -0.0004 0.0159 -0.0093;
0.0009 -0.0024 -0.0036 -0.0078 -0.0007 -0.0093 0.0061]
Using Distributions.jl, running the line:
MvNormal(sigma)
Produces the error:
ERROR: LoadError: Base.LinAlg.PosDefException(4)
The matrix sigma is symmetrical but only positive semi-definite:
issym(sigma) #symmetrical
> true
isposdef(sigma) #positive definite
> false
using LinearOperators
check_positive_definite(sigma) #check for positive (semi-)definite
> true
Matlab produces the same results for these tests; however, Matlab is able to generate the 200x7 random return sample matrix.
Could someone advise as to what I could do to get it working in Julia? Or where the issue lies?
Thanks.
The issue is that the covariance matrix is indefinite. See
julia> eigvals(sigma)
7-element Array{Float64,1}:
-3.52259e-5
-2.42008e-5
2.35508e-7
7.08269e-5
0.00290538
0.0118957
0.0343873
so it is not a valid covariance matrix. This might have happened because of rounding, so if you have access to unrounded data, you can try that instead. I just tried and I also got an error in Matlab. However, in contrast to Julia, Matlab does allow the matrix to be positive semidefinite.
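The same conclusion can be reached without eigenvalues by attempting a Cholesky factorization by hand and watching for the first non-positive pivot. The sketch below is plain Python rather than Julia, and the function name `first_failing_pivot` is hypothetical. If the 4 in PosDefException(4) follows the LAPACK convention of naming the leading minor that failed, the two agree.

```python
import math


def first_failing_pivot(a):
    """Try a Cholesky factorization; return the 1-based index of the
    first non-positive pivot, or None if the factorization succeeds."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = a[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return j + 1
        lower[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = (a[i][j] - partial) / lower[j][j]
    return None


sigma = [
    [0.0035, -0.0038, 0.0020, 0.0017, -0.0006, -0.0028, 0.0009],
    [-0.0038, 0.0046, -0.0011, 0.0001, 0.0003, 0.0054, -0.0024],
    [0.0020, -0.0011, 0.0041, 0.0068, -0.0004, 0.0047, -0.0036],
    [0.0017, 0.0001, 0.0068, 0.0125, 0.0002, 0.0109, -0.0078],
    [-0.0006, 0.0003, -0.0004, 0.0002, 0.0025, -0.0004, -0.0007],
    [-0.0028, 0.0054, 0.0047, 0.0109, -0.0004, 0.0159, -0.0093],
    [0.0009, -0.0024, -0.0036, -0.0078, -0.0007, -0.0093, 0.0061],
]

print(first_failing_pivot(sigma))
```

The fourth pivot comes out slightly negative, so the leading 4x4 block of sigma is already not positive definite.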
A way to make this work is to add a diagonal matrix to the original matrix and then input that to MvNormal. I.e.
julia> MvNormal(randn(7), sigma - minimum(eigvals(Symmetric(sigma)))*I)
Distributions.MvNormal{PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}(
dim: 7
μ: [0.889004,-0.768551,1.78569,0.130445,0.589029,0.529418,-0.258474]
Σ: 7x7 Array{Float64,2}:
0.00353523 -0.0038 0.002 0.0017 -0.0006 -0.0028 0.0009
-0.0038 0.00463523 -0.0011 0.0001 0.0003 0.0054 -0.0024
0.002 -0.0011 0.00413523 0.0068 -0.0004 0.0047 -0.0036
0.0017 0.0001 0.0068 0.0125352 0.0002 0.0109 -0.0078
-0.0006 0.0003 -0.0004 0.0002 0.00253523 -0.0004 -0.0007
-0.0028 0.0054 0.0047 0.0109 -0.0004 0.0159352 -0.0093
0.0009 -0.0024 -0.0036 -0.0078 -0.0007 -0.0093 0.00613523
)
The "covariance" matrix is of course not the same anymore, but it is very close.
I'm implementing a Gaussian smoothing filter in C++ by pointwise multiplication in frequency space. To check that my results were correct, I implemented the same code in Matlab and compared it to Matlab's built-in filtering function.
Here's the check:
% Gauss kernel, sigma = 1.
gaussfilter = fspecial('gaussian',[11 11], 1);
% Test matrix
testmatrix = ones(11);
testmatrix(6,6) = 5;
% FFT, pointwise multiplication in freq. space, and reverse FFT
testmatrix1 = fftshift(fftn(ifftshift(testmatrix),[]));
testmatrix1 = testmatrix1 .* gaussfilter;
testmatrix1 = fftshift(ifftn(ifftshift(testmatrix1),[],'nonsymmetric'));
abs(testmatrix1) % expect equal to c++
% Check that matlab is doing the same..
testmatrix2 = imfilter(testmatrix, gaussfilter);
abs(testmatrix2) % expect equal to testmatrix1
To my surprise, I see that matlab's imfilter is returning something different. testmatrix2 is not the same as testmatrix1.
Why should this be the case? Is there something wrong with my understanding of filters, or am I calling imfilter incorrectly? (flagging imfilter with 'replicate', or 'conv' doesn't solve my problem).
Here are both matrices:
testmatrix1 =
0.1592 0.1592 0.1593 0.1595 0.1597 0.1598 0.1597 0.1595 0.1593 0.1592 0.1592
0.1592 0.1593 0.1597 0.1604 0.1612 0.1616 0.1612 0.1604 0.1597 0.1593 0.1592
0.1593 0.1597 0.1609 0.1631 0.1656 0.1668 0.1656 0.1631 0.1609 0.1597 0.1593
0.1595 0.1604 0.1631 0.1681 0.1738 0.1764 0.1738 0.1681 0.1631 0.1604 0.1595
0.1597 0.1612 0.1656 0.1738 0.1830 0.1872 0.1830 0.1738 0.1656 0.1612 0.1597
0.1598 0.1616 0.1668 0.1764 0.1872 0.1922 0.1872 0.1764 0.1668 0.1616 0.1598
0.1597 0.1612 0.1656 0.1738 0.1830 0.1872 0.1830 0.1738 0.1656 0.1612 0.1597
0.1595 0.1604 0.1631 0.1681 0.1738 0.1764 0.1738 0.1681 0.1631 0.1604 0.1595
0.1593 0.1597 0.1609 0.1631 0.1656 0.1668 0.1656 0.1631 0.1609 0.1597 0.1593
0.1592 0.1593 0.1597 0.1604 0.1612 0.1616 0.1612 0.1604 0.1597 0.1593 0.1592
0.1592 0.1592 0.1593 0.1595 0.1597 0.1598 0.1597 0.1595 0.1593 0.1592 0.1592
testmatrix2 =
0.4893 0.6585 0.6963 0.6994 0.6995 0.6995 0.6995 0.6994 0.6963 0.6585 0.4893
0.6585 0.8863 0.9371 0.9413 0.9416 0.9417 0.9416 0.9413 0.9371 0.8863 0.6585
0.6963 0.9371 0.9910 0.9963 0.9997 1.0025 0.9997 0.9963 0.9910 0.9371 0.6963
0.6994 0.9413 0.9963 1.0114 1.0521 1.0860 1.0521 1.0114 0.9963 0.9413 0.6994
0.6995 0.9416 0.9997 1.0521 1.2342 1.3861 1.2342 1.0521 0.9997 0.9416 0.6995
0.6995 0.9417 1.0025 1.0860 1.3861 1.6366 1.3861 1.0860 1.0025 0.9417 0.6995
0.6995 0.9416 0.9997 1.0521 1.2342 1.3861 1.2342 1.0521 0.9997 0.9416 0.6995
0.6994 0.9413 0.9963 1.0114 1.0521 1.0860 1.0521 1.0114 0.9963 0.9413 0.6994
0.6963 0.9371 0.9910 0.9963 0.9997 1.0025 0.9997 0.9963 0.9910 0.9371 0.6963
0.6585 0.8863 0.9371 0.9413 0.9416 0.9417 0.9416 0.9413 0.9371 0.8863 0.6585
0.4893 0.6585 0.6963 0.6994 0.6995 0.6995 0.6995 0.6994 0.6963 0.6585 0.4893
Ok, I found the issue.
I just had to take the FFT of gaussfilter before multiplying it with testmatrix1, and pass the flag 'replicate' to imfilter.
Changes:
testmatrix1 = testmatrix1 .* fftshift(fftn(ifftshift(gaussfilter),[]));
and
testmatrix2 = imfilter(testmatrix, gaussfilter,'replicate');
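The reason the filter itself has to be transformed is the convolution theorem: multiplying two spectra corresponds to (circular) convolution in the spatial domain, whereas multiplying a spectrum by the untransformed kernel does not. The sketch below shows this in one dimension in plain Python with a naive DFT. The signal, the kernel and all function names are made up for illustration, and it makes no claim about how imfilter is implemented.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [v / n for v in out] if inverse else out


def circular_convolve(signal, kernel):
    """Direct circular convolution in the spatial domain."""
    n = len(signal)
    return [
        sum(signal[k] * kernel[(i - k) % n] for k in range(n))
        for i in range(n)
    ]


def convolve_via_dft(signal, kernel):
    """Transform both inputs, multiply pointwise, transform back."""
    product = [s * k for s, k in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(product, inverse=True)]


signal = [1, 1, 1, 5, 1, 1, 1, 1]
kernel = [0.5, 0.25, 0, 0, 0, 0, 0, 0.25]  # smoothing kernel centred on index 0

print([round(v, 6) for v in circular_convolve(signal, kernel)])
print([round(v, 6) for v in convolve_via_dft(signal, kernel)])
```

Both lines print the same values. Since the DFT route wraps around at the borders, a spatial filter should only be expected to match it where the chosen boundary handling gives the same neighbours, as with a constant border.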
Suppose I have the value 01010101 and its canonical sequence of octets:
0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x31
Now I need to concatenate it with the namespace identifier value, which is given in hexadecimal.
Then I need to compute the SHA-1 hash, like this:
sha1 (0x03 0xfb 0xac 0xfc 0x73 0x8a 0xef 0x46 0x91 0xb1 0xe5 0xeb 0xee 0xab 0xa4 0xfe 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x31) =
0xA8 0x82 0x16 0x4B 0x68 0xF9 0x01 0xE7 0x03 0xFC 0x7C 0x67 0x41 0xDC 0x66 0x97 0xB8 0xA1 0xA9 0x3E
After that, how do I get to this?
4b1682a8-f968-5701-83fc-7c6741dc6697
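The step from the 20-byte SHA-1 value to the UUID can be sketched as follows, in Python. The expected value above is only reproduced if the first three fields are read as little-endian, which is an inference from the numbers quoted here; strict RFC 4122 uses network byte order, which is what Python's own `uuid.uuid5` follows. The function names are hypothetical.

```python
import hashlib
import uuid


def name_digest(namespace: bytes, name: bytes) -> bytes:
    """SHA-1 over the namespace octets followed by the name octets."""
    return hashlib.sha1(namespace + name).digest()


def uuid_from_sha1_digest(digest: bytes) -> uuid.UUID:
    """Build a version-5 UUID from the first 16 digest bytes,
    treating time_low, time_mid and time_hi as little-endian fields."""
    b = bytearray(digest[:16])       # the last 4 of the 20 digest bytes are dropped
    b[7] = (b[7] & 0x0F) | 0x50      # version 5 in the high nibble of time_hi_and_version
    b[8] = (b[8] & 0x3F) | 0x80      # variant bits 10xx in clock_seq_hi_and_reserved
    return uuid.UUID(bytes_le=bytes(b))


# The SHA-1 value quoted above, used directly as input.
digest = bytes.fromhex("A882164B68F901E703FC7C6741DC6697B8A1A93E")
print(uuid_from_sha1_digest(digest))
```

In the quoted digest, 0xE7 becomes 0x57 (version) and 0x03 becomes 0x83 (variant), and the byte swaps in the first three groups account for 4b1682a8, f968 and 5701.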
Hello everyone, my problem was solved by following RFC 4122:
http://www.ietf.org/rfc/rfc4122.txt
Some modification to the code is also needed. If anyone has the same problem, just ask me.
Thanks