In Squeak Smalltalk, how can I type a number in a base-250 positional numeral system?

One thing that makes me particularly like Smalltalk is that it has the power to do arithmetic on numbers written in different integer bases. I guess no other language can do the same.
Please see the code below.
Transcript show: 16raf * 32; cr.
Transcript show: 7r21 - 5r32; cr.
The output is
5600
-2
I understand that if the number is hexadecimal (base 16), the letters abcdef can be employed. But what if the base I want is 250, and at some position there is a digit 60? How can I type that number in Squeak?

Short answer: you cannot type arbitrary numbers for arbitrary bases above 36 without changing the parser.
Longer answer:
You can use an arbitrary base above 36, but you will run into trouble printing and writing numbers that would need digit symbols beyond those available for base 36.
You can check all the symbols for a base:
base := 36.
number := 0.
1 to: base - 1 do: [ :i |
    number := number * base + i
].
number printStringBase: base.
the above results in the following
'123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
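For comparison, the same digit-symbol check can be done in Python, whose int parser also stops at base 36 (this is only an illustrative cross-check, not Smalltalk):
base = 36
number = 0
for i in range(1, base):
    number = number * base + i
# Parsing the Smalltalk output string back in base 36 yields the same value:
assert number == int("123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 36)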
This is also hard-coded when printing in SmallInteger>>printOn:base:length:padded:
Note that for a number that is smaller than the base, printStringBase: will just use ASCII directly.
36 printStringBase: 37 '['
But even if you were to remove the hardcoded limitation and used ASCII directly, you wouldn't be helping yourself.
Sooner or later you would need ASCII symbols that have a different meaning in the syntax. For example, following Z (ASCII 90) is [ (ASCII 91), which is used to start a block.
So 37r2[ will be parsed as 37r2, and the [ will be treated as the start of a block (and thus result in a syntax error).
But otherwise you can use any base
2001rSpaceOdyssey -> 57685915098460127668088707185846682264
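If you only need to work with such numbers, rather than make the parser accept them, one workaround is to keep the digits as a list of integers and fold them into a value yourself. Here is a minimal sketch in Python, purely to illustrate the positional arithmetic (the digit list, including the 60, is made up; in Smalltalk you could do the same fold with inject:into:):
def from_digits(digits, base=250):
    # digits are most significant first, each in the range 0..base-1
    value = 0
    for d in digits:
        assert 0 <= d < base
        value = value * base + d
    return value

print(from_digits([1, 60, 7]))  # 1*250**2 + 60*250 + 7 = 77507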

perl6: Cannot unbox 65536 bit wide bigint into native integer

I tried some examples from Rosetta Code and ran into an issue with the provided Ackermann example. When running it "unmodified" (I replaced the UTF-8 variable names with Latin-1 ones), I get (similar, but now copyable):
$ perl6 t/ackermann.p6
65533
19729 digits starting with 20035299304068464649790723515602557504478254755697...
Cannot unbox 65536 bit wide bigint into native integer
in sub A at t/ackermann.p6 line 3
in sub A at t/ackermann.p6 line 11
in sub A at t/ackermann.p6 line 3
in block <unit> at t/ackermann.p6 line 17
Removing the proto declaration in line 3 (by commenting out):
$ perl6 t/ackermann.p6
65533
19729 digits starting with 20035299304068464649790723515602557504478254755697...
Numeric overflow
in sub A at t/ackermann.p6 line 8
in sub A at t/ackermann.p6 line 11
in block <unit> at t/ackermann.p6 line 17
What went wrong? The program doesn't allocate much memory. Is the natural integer kind-of limited?
In the code from the Ackermann function task I replaced 𝑚 with m and 𝑛 with n for better terminal interaction when copying errors, and I tried commenting out the proto declaration. I also asked Liz ;)
use v6;
proto A(Int \m, Int \n) { (state @)[m][n] //= {*} }
multi A(0, Int \n) { n + 1 }
multi A(1, Int \n) { n + 2 }
multi A(2, Int \n) { 3 + 2 * n }
multi A(3, Int \n) { 5 + 8 * (2 ** n - 1) }
multi A(Int \m, 0 ) { A(m - 1, 1) }
multi A(Int \m, Int \n) { A(m - 1, A(m, n - 1)) }
# Testing:
say A(4,1);
say .chars, " digits starting with ", .substr(0,50), "..." given A(4,2);
A(4, 3).say;
Please read JJ's answer first. It's breezy and led to this answer which is effectively an elaboration of it.
TL;DR A(4,3) is a very big number, one that cannot be computed in this universe. But raku(do) will try. As it does you will blow past reasonable limits related to memory allocation and indexing if you use the caching version and limits related to numeric calculations if you don't.
I tried some examples from Rosetta Code and ran into an issue with the provided Ackermann example
Quoting the task description with some added emphasis:
Arbitrary precision is preferred (since the function grows so quickly)
raku's standard integer type Int is arbitrary precision. The raku solution uses it to compute the most advanced answer possible. It only fails when you make it try to do the impossible.
When running it "unmodified" (I replaced the UTF-8 variable names with Latin-1 ones)
Replacing the variable names is not a significant change.
But adding the A(4,3) line shifted the code from being computable in reality to not being computable in reality.
The example you modified has just one explanatory comment:
Here's a caching version of that ... to make A(4,2) possible
Note that the A(4,2) solution is nearly 20,000 digits long.
If you look at the other solutions on that page most don't even try to reach A(4,2). There are comments like this one on the Phix version:
optimised. still no bignum library, so ack(4,2), which is power(2,65536)-3, which is apparently 19729 digits, and any above, are beyond (the CPU/FPU hardware) and this [code].
A solution for A(4,2) is the most advanced possible.
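Those digit counts are easy to double check. A quick cross-check in Python (whose int is also arbitrary precision), using the power(2,65536)-3 value quoted above:
# A(4,2) = 2**65536 - 3; count its decimal digits
print(len(str(2**65536 - 3)))  # 19729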
A(4,3) is not computable in practice
To quote Academic Kids: Ackermann function:
Even for small inputs (4,3, say) the values of the Ackermann function become so large that they cannot be feasibly computed, and in fact their decimal expansions cannot even be stored in the entire physical universe.
So computing A(4,3).say is impossible (in this universe).
It must inevitably lead to an overflow of even arbitrary precision integer arithmetic. It's just a matter of when and how.
Cannot unbox 65536 bit wide bigint into native integer
The first error message mentions this line of code:
proto A(Int \m, Int \n) { (state @)[m][n] //= {*} }
The state @ is an anonymous state array variable.
By default @ variables use the default concrete type for raku's abstract array type. This default array type provides a balance between implementation complexity and decent performance.
While computing A(4,2) the indexes (m and n) remain small enough that the computation completes without overflowing the default array's indexing limit.
This limit is a "native" integer (note: not a "natural" integer). A "native" integer is what raku calls the fixed width integers supported by the hardware it's running on, typically a long long which in turn is typically 64 bits.
A 64 bit wide index can handle indices up to 9,223,372,036,854,775,807.
But in trying to compute A(4,3) the algorithm generates a 65536-bit (8192 byte) wide integer index. Such an integer could be as big as 2**65536, a 19,729 decimal digit number. But the biggest index allowed is a 64 bit native integer. So unless you comment out the caching line that uses an array, then for A(4,3) the program ends up throwing the exception:
Cannot unbox 65536 bit wide bigint into native integer
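To get a feel for the mismatch between that index and a native integer, here is the same comparison sketched in Python rather than raku (illustration only):
native_max = 2**63 - 1    # largest signed 64-bit index: 9,223,372,036,854,775,807
big_index = 2**65536 - 3  # roughly the index value A(4,3) ends up needing (the 65536-bit value from the error)
print(big_index.bit_length())  # 65536
print(big_index > native_max)  # True -- it cannot be "unboxed" into a native int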
Limits to allocations and indexing of the default array type
As already explained, there is no array that could be big enough to help fully compute A(4,3). In addition, a 64 bit integer is already a pretty big index (9,223,372,036,854,775,807).
That said, raku can accommodate other array implementations such as Array::Sparse so I'll discuss that briefly below because such possibilities might be of interest for other problems.
But before discussing bigger arrays, running the code below on tio.run shows the practical limits for the default array type on that platform:
my @array;
@array[2**29]++; # works
@array[2**30]++; # could not allocate 8589967360 bytes
@array[2**60]++; # Unable to allocate ... 1152921504606846977 elements
@array[2**63]++; # Cannot unbox 64 bit wide bigint into native integer
(Comment out error lines to see later/greater errors.)
The "could not allocate 8589967360 bytes" error is a MoarVM panic. It's a result of tio.run refusing a memory allocation request.
I think the "Unable to allocate ... elements" error is a raku level exception that's thrown as a result of exceeding some internal Rakudo implementation limit.
The last error message shows the indexing limit for the default array type even if vast amounts of memory were made available to programs.
What if someone wanted to do larger indexing?
It's possible to create/use other @ (does Positional) data types that support things like sparse arrays etc.
And, using this mechanism, it's possible that someone could write an array implementation that supports larger integer indexing than is supported by the default array type (presumably by layering logic on top of the underlying platform's instructions; perhaps the Array::Sparse I linked above does).
If such an alternative were called BigArray then the cache line could be replaced with:
my @array is BigArray;
proto A(Int \𝑚, Int \𝑛) { @array[𝑚][𝑛] //= {*} }
Again, this still wouldn't be enough to store interim results for fully computing A(4,3) but my point was to show use of custom array types.
Numeric overflow
When you comment out the caching you get:
Numeric overflow
Raku/Rakudo do arbitrary precision arithmetic. While this is sometimes called infinite precision it obviously isn't actually infinite but is instead, well, "arbitrary", which in this context also means "sane" for some definition of "sane".
This classically means running out of memory to store a number. But in Rakudo's case I think there's an attempt to keep things sane by switching from a truly vast Int to a Num (a floating point number) before completely running out of RAM. But then computing A(4,3) eventually overflows even a double float.
So while the caching blows up sooner, the code is bound to blow up later anyway, and then you'd get a numeric overflow that would either manifest as an out of memory error or a numeric overflow error as it is in this case.
Array subscripts use native ints; that's why you get the error in line 3, when you use the big ints as array subscripts. You might have to define a new BigArray that uses Ints as array subscripts.
The second problem arises in the ** operator: the result is a Real, and when the low-level operation returns a Num, it throws an exception.
https://github.com/rakudo/rakudo/blob/master/src/core/Int.pm6#L391-L401
So creating a BigArray might not be helpful anyway. You'll have to create your own ** too, that always works with Int, but you seem to have hit the (not so infinite) limit of the infinite precision Ints.

Can I directly load text with numbers in CCC,CC format? (K4)

I have input with floats stored like 1000,50, ie. the decimal points are replaced by commas.
Is there an option in K to load these numbers directly into floats ?
When using
data:("SFF" ;";",";") 0:. filename
I get 0ns, of course, because the numbers are not recognized as floats.
I load them as strings now, and convert them using ssr like
c:.:' .q.ssr'[data;",";"."]
but that is extremely slow.
Is there an option somewhere to have K load these numbers in CCC,CC format as floats directly? Normal format and CCC,CC format are not mixed; any file has just one of them.
If there is not, I imagine that it must be quite easy to replace a "." somewhere in the Q binary where the load function sits with a ",", to get a version which loads these numbers. Has anybody tried that? Or any other tip to load big files with these numbers in reasonable time?
Cheers,
Co
If ssr' is slow for your task you may find this tiny function useful:
c2p:{c:-1_sums count each x;p:ss[r:raze x;","];r[p]:".";(0,c) _ r}
Update: an alternative version:
c2p:{p:ss[r:raze x;","];r[p]:".";(0,-1_sums count'[x])_r}
It concatenates all strings into a single long string, finds positions of commas, replaces commas with periods then splits that long string:
q)N:1000000
q)s:string[N?100000],'",",'string N?1000
q)\t r1:ssr'[s;",";"."]
4284
q)\t r2:c2p s
242
q)r1~r2
1b
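For readers who don't speak q, here is the same join/replace/cut idea as c2p, sketched in Python (names are illustrative only):
from itertools import accumulate

def commas_to_points(strings):
    # concatenate all fields, replace every comma at once, then cut back
    # at the original field boundaries (cumulative lengths)
    joined = "".join(strings).replace(",", ".")
    cuts = [0] + list(accumulate(len(s) for s in strings))
    return [joined[a:b] for a, b in zip(cuts, cuts[1:])]

print(commas_to_points(["1000,50", "27,25"]))  # ['1000.50', '27.25']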
I was thinking something like find (?) combined with indexing/applying
q)N:1000000
q)s:string[N?100000],'",",'string N?1000
q)\ts {s[x;y]:"."}./:flip(til count s;s?\:",")
967 52972144
q)s
"93912.794"
"57144.788"
"77809.659"
"7839.47"
"6363.523"
"44761.244"
"65699.712"
It's not perfect but that's the general idea. I'm sure there is an easier way...

How to truncate a 2's complement output

I have data written into a short data type. The data written is in 2's complement form.
Now when I try to print the data using %04x, data with MSB=0 is printed fine: e.g. if data=740, the print I get is 0740.
But when the MSB=1, I am unable to get a proper print. E.g. if data=842, the print I get is fffff842.
I want the data truncated to 4 bytes, so the expected output is f842.
Either declare your data as a type which is 16 bits long, or make sure the printing function uses the right format for 16 bits value. Or use your current type, but do a bitwise AND with 0xffff. What you can do depends on the language you're doing it in really.
But whichever way you go, check your assumptions again. There seem to be a few issues in your question:
2's complement applies to signed numbers only. There are no negative numbers in your question.
Assuming you mean C's short - it doesn't have to be 16 bits long.
"I get is fffff842 I want the data truncated to 4 bytes" - fffff842 is 4 bytes long. f842 is 2 bytes long.
2-bytes long value 842 does not have the MSB set.
I'm assuming C (or possibly C++) as the language here.
Because of the default argument promotions involved when calling a variable argument function (such as printf), your use of a short will result in an integer promotion, which states that "If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int".
A short is converted to an int by means of sign-extension, and 0xf842 sign-extended to 32 bits is 0xfffff842.
You can use a bitwise AND to mask off the most significant word:
printf("%04x", data & 0xffff);
You could also add the h length specifier to state that you only want to print an (unsigned) short worth of bits from an int:
printf("%04hx", data);
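The effect of the mask is independent of C; here is the same arithmetic illustrated in Python, with the question's bit pattern 0xf842 interpreted as a negative 16-bit value (illustration only):
value = -1982                             # the short whose 16-bit pattern is 0xf842
print(format(value & 0xffffffff, '08x'))  # fffff842 -- what the sign-extended int looks like
print(format(value & 0xffff, '04x'))      # f842     -- masked down to 16 bits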

Printing a number in brainfuck?

I've searched for a while, but I couldn't find anything that could help me.
Let's say the first cell (or value, etc.) equals 165. How do I print "165"?
My idea was to cut the number into separate pieces: 1, 6 and 5. It would then be no problem to print them.
Note: I don't just want to print "165". I want to print the value the first cell has. No matter if it's 165, 255, 0, 1 or anything else.
Use the famous modulo procedure (http://esolangs.org/wiki/brainfuck_algorithms will help you):
>+++++++++++[-<+++++++++++++++>] # initialize 165 at first cell
>++++++++++<<[->+>-[>+>>]>[+[-<+>]>+>>]<<<<<<]>>[-]>>>++++++++++<[->-[>+>>]>[+[-
<+>]>+>>]<<<<<]>[-]>>[>++++++[-<++++++++>]<.<<+>+>[-]]<[<[->-<]++++++[->++++++++
<]>.[-]]<<++++++[-<++++++++>]<.[-]<<[-<+>]
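The underlying idea is repeatedly dividing by 10 to peel off digits and printing them as ASCII; here is that idea written out in Python so it is easier to follow (not a literal translation of the BF cell operations):
def print_number(n):
    # collect the decimal digits with repeated divmod, least significant first
    if n == 0:
        print("0", end="")
        return
    digits = []
    while n:
        n, d = divmod(n, 10)
        digits.append(d)
    # emit them most significant first, as ASCII characters
    for d in reversed(digits):
        print(chr(ord("0") + d), end="")

print_number(165)  # prints 165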
I have created a simple language that compiles to brainfuck, which can be found here: http://code.google.com/p/brainfuck-compiler. There is a compiler for the language implemented in Java.
I use the following function in that language to output a number. Maybe you can analyze the generated code and see if anything in it can be of use to you. One word of warning though: it does generate a bit of redundant copying around of cells (I never got around to optimizing this). Anyway, here's an example program and the BF code it generates. (Note that indentation MUST be tabs in my language.)
declare n, 165
declare digits
while n
	push n % 10
	digits = digits + 1
	n = n / 10
if digits
	while digits
		out pop + 48
		digits = digits - 1
else
	outs "0"
And here is the generated code for that:
>>>>>>>>>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++[<+>-]<[>>+>+<<<-]>>>[<<<+>>>-]<[[-]<<[>>>+>+<<<<-]>>>>[<<<<+>>>>-
]++++++++++<[>>+>+<<<-]>>>[<<<+>>>-]<[>+<-]>[<<[>>>+>+<<<<-]>>>>[<<<<+>>>>-]<[>+
<<-[>>[-]>+<<<-]>>>[<<<+>>>-]<[<-[<<->>[-]]+>-]<-]<<+>]<[>>+<<-]>>[<<<[>+>+<<-]>
>[<<+>>-]>-]<<[<<->>-]<[-]<[>>>>>>>>+<<<<<<<<-]>>>>>>>>>[>>]+[<<]>[>[>>]<+<[<<]>
-]<<<<<<<<<<[>>+>+<<<-]>>>[<<<+>>>-]+[<+>-]<<<[-]>>[<<+>>-]<<<[>>>+>+<<<<-]>>>>[
<<<<+>>>>-]++++++++++<[>>+<<-]>>[<[>>+>+<<<-]>>>[<<<+>>>-]<[>+<<-[>>[-]>+<<<-]>>
>[<<<+>>>-]<[<-[<<<->>>[-]]+>-]<-]<<<+>>]<[-]<<<<[-]>>>[<<<+>>>-]<<<[>>>+>+<<<<-
]>>>>[<<<<+>>>>-]<[<+>-]<]<[>+>+<<-]>>[<<+>>-]<[>+<[-]]+>[<[-]<[>>>+>+<<<<-]>>>>
[<<<<+>>>>-]<[[-]>>>>>>>>[>>]<[<[<<]<<<<<+>>>>>>>[>>]<-]<-<<[<<]<<<<<>++++++++++
++++++++++++++++++++++++++++++++++++++[<+>-]<.[-]<<<<[>>>>+>+<<<<<-]>>>>>[<<<<<+
>>>>>-]+[<->-]<<<<<[-]>>>>[<<<<+>>>>-]<<<<[>>>>+>+<<<<<-]>>>>>[<<<<<+>>>>>-]<[<+
>-]<]<[-]]<[>>++++++[<++++++++>-]<.[-]<[-]]<[-]<[-]>>>>>>>>>>>>[>[-]>]<<[-<<]<<<
<<<<<<<<<<<<<<[-]<[-]

How can I convert the tiger hash values from the official implementations into the form used by Direct Connect?

I am trying to implement a Direct Connect Client, and I am currently stuck at a point where I need to hash the files in order to be able to upload them to other clients.
As all the other clients require TTHL (Tiger Tree Hashing Leaves) support for verification of the downloaded data, I have searched for implementations of the algorithm and found tiger-hash-python.
I have implemented a routine that uses the hash function from before, and is able to hash large files, according to the logic specified in Tree Hash EXchange format (THEX) (basically, the tree diagram is the important part on that page).
However, the value produced by it is similar to those shown on Wikipedia, a hex digest, but is different from those shown in the DC clients I'm using for reference.
I have been unable to find out how the hex digest form is converted to this other one (39 characters, A-Z, 0-9). Could someone please explain how that is done?
Well ... I tried what Paulo Ebermann said, using the following functions:
import math
import base64
import tiger  # the aforementioned tiger-hash-python module

def strdivide(list, length):
    result = []
    # Calculate how many blocks there are, using the condition: i*length = len(list).
    # The additional maths operations are to deal with the last block which might have a smaller size
    for i in range(0, int(math.ceil(float(len(list))/length))):
        result.append(list[i*length:(i+1)*length])
    return result

def dchash(data):
    result = tiger.hash(data) # From the aforementioned tiger-hash-python script, 48-char hex digest
    result = "".join([ "".join(strdivide(result[i:i+16],2)[::-1]) for i in range(0,48,16) ]) # Representation transform (byte-order swap)
    bits = "".join([chr(int(c,16)) for c in strdivide(result,2)]) # Converting every 2 hex characters into 1 byte (Python 2 str)
    result = base64.b32encode(bits) # Result will be 40 characters
    return result[:-1] # Leaving behind the trailing '='
The TTH for an empty file was found to be 8B630E030AD09E5D0E90FB246A3A75DBB6256C3EE7B8635A, which after the transformation specified here, becomes 5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6. Base-32 encoding this result yielded LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ, which was found to be what DC++ generates.
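That transformation (the byte-order swap within each 8-byte block of the hex digest, as in the dchash code above) can be checked independently with a few lines of Python 3 using the two values just quoted (illustration only):
official = "8B630E030AD09E5D0E90FB246A3A75DBB6256C3EE7B8635A"
swapped = "".join(
    bytes.fromhex(official[i:i + 16])[::-1].hex()
    for i in range(0, len(official), 16)
).upper()
print(swapped)  # 5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6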
The only mention of the format of the hash in the Direct Connect protocol I found is on the $SR page on the NMDC Protocol wiki:
For files containing TTH, the <hub_name> parameter is replaced with TTH:<base32_encoded_tth_hash> (ref: TTH_Hash).
So, it is Base32-encoding. This is defined in RFC 4648 (and some earlier ones), section 6.
Basically, you are using the capital letters A-Z and the decimal digits 2 to 7, and one base32 digit represents 5 bits, while one base16 (hexadecimal) digit represents only 4 bits.
This means each 5 hex digits map to 4 base32 digits, and for a Tiger hash (192 bits) you will need 40 base32 digits (in the official encoding, the last one would be an = for padding, which seems to be omitted if you say that there are always 39 characters).
I'm not sure of an implementation of a conversion from hex (or bytes) to base32, but it shouldn't be too complicated with a lookup table and some bit-shifting.
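In Python the whole hex-to-base32 step is available out of the box; here is a small sketch using the empty-file value from above (the helper name is just for this example):
import base64

def tth_base32(hex_digest):
    # 24 bytes -> 40 base32 characters, of which the last is '=' padding
    return base64.b32encode(bytes.fromhex(hex_digest)).decode().rstrip("=")

print(tth_base32("5D9ED00A030E638BDB753A6A24FB900E5A63B8E73E6C25B6"))
# LWPNACQDBZRYXW3VHJVCJ64QBZNGHOHHHZWCLNQ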