Using 'ash' in LISP to perform a binary search?

So, I'm reading Land of Lisp now, and Lisp is turning out to be quite different from other programming languages that I've seen.
Anyway, the book provides some code that we're meant to enter into the CLISP REPL:
(defparameter *small* 1)
(defparameter *big* 100)
(defun guess-my-number ()
  (ash (+ *small* *big*) -1))
(defun smaller ()
  (setf *big* (1- (guess-my-number)))
  (guess-my-number))
(defun bigger ()
  (setf *small* (1+ (guess-my-number)))
  (guess-my-number))
Now, the basic goal is to create a number guessing game wherein the user/player chooses a number, and then the computer tries to guess the number. It performs a "binary search" to find the player's number by having the player report whether the computer-guessed number is higher or lower than the player's number.
I'm a little bit confused about the ash function. It's my understanding that this is vital to the binary search, but I'm not really sure why. The book somewhat explains what it does, but it's a little confusing.
What does the ash function do? Why is it passed the parameters of *small* added to *big* and -1? How does it work? What purpose does it serve to the binary search?

Google gives you this page, which explains that ash is an arithmetic shift operation. So (ash x -1) shifts x by one bit to the right, which gives its integer half.

Thanks to Basile Starynkevitch for the help on this one...
Anyhow, ash performs an arithmetic shift operation.
In the case of (ash x -1) it shifts x by one bit to the right, which ultimately returns the integer half.
For example, consider the binary number 1101. 1101 in binary is equivalent to 13 in decimal, which can be calculated like so:
8 * 1 = 8
4 * 1 = 4
2 * 0 = 0
1 * 1 = 1
8 + 4 + 0 + 1 = 13
Running (ash 13 -1) would look at the binary representation of 13, and perform an arithmetic shift of -1, shifting all the bits to the right by 1. This would produce a binary output of 110 (chopping off the 1 at the end of the original number). 110 in binary is equivalent to 6 in decimal, which can be calculated like so:
4 * 1 = 4
2 * 1 = 2
1 * 0 = 0
4 + 2 + 0 = 6
Now, 13 divided by 2 is 6.5, not 6; however, since ash returns the integer half (rounded down), 6 is the expected answer.
This works because binary is base 2: shifting right by one bit divides by 2, just as shifting a decimal number right by one digit divides by 10.
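If you want to poke at this outside a Lisp REPL, here is a small Python sketch. It assumes that Python's << and >> on integers behave like ash (both are arithmetic shifts on unbounded integers, and a right shift rounds toward negative infinity, as ash does). The Python function named ash is only a hypothetical stand-in for the Lisp one.

```python
def ash(integer, count):
    """Hypothetical stand-in for Common Lisp (ash integer count)."""
    if count >= 0:
        return integer << count   # shift left: multiply by 2**count
    return integer >> -count      # shift right: floor-divide by 2**(-count)

print(bin(13), bin(ash(13, -1)))  # 0b1101 0b110
print(ash(13, -1))                # 6, the integer half of 13
print(ash(-13, -1))               # -7: rounds toward negative infinity, not toward zero
```

The last line is worth remembering: for negative numbers the "integer half" is the floor, so it is not the same as truncating division.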

Q. What does the ash function do? Why is it passed the parameters of small added to big and -1? How does it work? What purpose does it serve to the binary search?
It shifts bits; more precisely, it performs an arithmetic shift, as explained and illustrated graphically for the particular case of Lisp:
> (ash 51 1)
102
When you do (ash 51 1), it shifts the binary of 51, i.e. 110011, by 1 bit place towards the left, resulting in 1100110, which is 102 in decimal. (The process of binary to decimal conversion is explained in this answer.)
Here it fills the vacated rightmost place (the least significant bit, LSB) with 0.
> (ash 51 -1)
25
When you do (ash 51 -1), it shifts the binary of 51, i.e. 110011, by 1 bit place towards the right (a negative count means the opposite direction), resulting in 11001, which is 25 in decimal.
Here the LSB that is shifted out is discarded.
In the particular example of the "guess-my-number" game illustrated in Land of Lisp, we are interested in halving the range, i.e. averaging *small* and *big*. So (ash (+ *small* *big*) -1) halves 1 + 100 = 101 and returns 50, the integer part of 50.5. We can check it as follows:
> (defparameter *small* 1)
*SMALL*
> (defparameter *big* 100)
*BIG*
> (defun guess-my-number ()
    (ash (+ *small* *big*) -1))
GUESS-MY-NUMBER
> (guess-my-number)
50
An interesting thing to notice is that you can double an integer by shifting it left by 1 bit and (approximately) halve it by shifting it right by 1 bit.
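To see why the halving matters for the game, here is a hypothetical Python re-creation of guess-my-number, smaller and bigger that plays against a known secret number. The function name play and the automatic "player" are assumptions for illustration; (small + big) >> 1 stands in for (ash (+ *small* *big*) -1).

```python
def play(secret, small=1, big=100):
    """Hypothetical re-creation of the game: returns the list of guesses made."""
    guesses = []
    while True:
        guess = (small + big) >> 1   # (ash (+ *small* *big*) -1)
        guesses.append(guess)
        if guess == secret:
            return guesses
        if guess > secret:           # the player answers (smaller)
            big = guess - 1          # (setf *big* (1- (guess-my-number)))
        else:                        # the player answers (bigger)
            small = guess + 1        # (setf *small* (1+ (guess-my-number)))

print(play(1))                                   # [50, 25, 12, 6, 3, 1]
print(max(len(play(s)) for s in range(1, 101)))  # 7
```

Every secret between 1 and 100 is found in at most 7 guesses, because each answer roughly halves the remaining range (2^7 = 128 >= 100).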

Related

How to print formulae and literals when using CVC5?

I am playing with the CVC5 example at https://github.com/cvc5/cvc5/blob/main/examples/api/python/pythonic/linear_arith.py.
from cvc5.pythonic import *
slv = SolverFor('QF_LIRA')
x = Int('x')
y = Real('y')
slv += And(x >= 3 * y, x <= y, -2 < x)
slv.push()
print(slv.check(y-x <= 2/3))
slv.pop()
slv.push()
slv += y-x == 2/3
print(slv.check())
slv.pop()
It works as it is supposed to work.
However, whenever I try to print the content of the formula (i.e., print(slv)), it raises the following error: Cannot print: Kind.CONST_INTEGER
The same happens with the literals that make up the formula, e.g. print(x >= 3), but not with variables: print(x) prints x.
I would like to have this printing capability, since Z3 allows it and I am trying my implementation (originally written for Z3) with different SMT solvers. Any ideas?
Note that print(slv) does print something ([]) when the solver is empty. I tried using str(), but the error persists; indeed, I guess print() calls str() before printing.
PS: I am using CVC5. Should I use CVC4 instead, or is CVC5 mature enough?
I think this is a bug. You should report it to the CVC5 folks at https://github.com/cvc5/cvc5/issues. (i.e., they should be able to handle this case just fine.)
In the interim, you can use the following workaround:
print(slv.sexpr())
which prints:
(and (let ((_let_1 (to_real x))) (and (>= _let_1 (* 3.0 y)) (<= _let_1 y) (> x (- 2)))))
which takes a bit of squinting to see that this is what you asserted, but it should do the trick.
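If you need printable output everywhere in the meantime, one option is a small wrapper that tries str() first and falls back to sexpr(). This is a sketch: formula_to_text is a hypothetical helper, and it assumes the object you pass in has a sexpr() method, as slv does above; I have not confirmed that every cvc5.pythonic term exposes one.

```python
def formula_to_text(obj):
    """Hypothetical helper: printable form of a solver or term.

    Tries str() first; if that raises (as with the reported
    "Cannot print: Kind.CONST_INTEGER" error), falls back to obj.sexpr().
    """
    try:
        return str(obj)
    except Exception:
        return obj.sexpr()

# Usage, assuming the slv from the question:
# print(formula_to_text(slv))
```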

How to use fast division operation without support of division instruction, for a constant divisor? [duplicate]

I've been reading about div and mul assembly operations, and I decided to see them in action by writing a simple program in C:
File division.c
#include <stdlib.h>
#include <stdio.h>

int main()
{
    size_t i = 9;
    size_t j = i / 5;
    printf("%zu\n", j);
    return 0;
}
And then generating assembly language code with:
gcc -S division.c -O0 -masm=intel
But looking at the generated division.s file, I see it doesn't contain any div operations! Instead, it does some kind of black magic with bit shifting and magic numbers. Here's a code snippet that computes i/5:
mov rax, QWORD PTR [rbp-16] ; Move i (=9) to RAX
movabs rdx, -3689348814741910323 ; Move some magic number to RDX (?)
mul rdx ; Multiply 9 by magic number
mov rax, rdx ; Take only the upper 64 bits of the result
shr rax, 2 ; Shift these bits 2 places to the right (?)
mov QWORD PTR [rbp-8], rax ; Magically, RAX contains 9/5=1 now,
; so we can assign it to j
What's going on here? Why doesn't GCC use div at all? How does it generate this magic number and why does everything work?
Integer division is one of the slowest arithmetic operations you can perform on a modern processor, with latency of up to dozens of cycles and poor throughput. (For x86, see Agner Fog's instruction tables and microarch guide.)
If you know the divisor ahead of time, you can avoid the division by replacing it with a set of other operations (multiplications, additions, and shifts) which have the equivalent effect. Even if several operations are needed, it's often still a heck of a lot faster than the integer division itself.
Implementing the C / operator this way instead of with a multi-instruction sequence involving div is just GCC's default way of doing division by constants. It doesn't require optimizing across operations and doesn't change anything even for debugging. (Using -Os for small code size does get GCC to use div, though.) Using a multiplicative inverse instead of division is like using lea instead of mul and add.
As a result, you only tend to see div or idiv in the output if the divisor isn't known at compile-time.
For information on how the compiler generates these sequences, as well as code to let you generate them for yourself (almost certainly unnecessary unless you're working with a braindead compiler), see libdivide.
Dividing by 5 is the same as multiplying by 1/5, which is again the same as multiplying by 4/5 and shifting right 2 bits. The value concerned is CCCCCCCCCCCCCCCD in hex, which is the binary representation of 4/5 if put after a hexadecimal point (i.e. the binary for four fifths is 0.110011001100 recurring; see below for why). I think you can take it from here! You might want to check out fixed point arithmetic (though note it's rounded to an integer at the end).
As to why, multiplication is faster than division, and when the divisor is fixed, this is a faster route.
See Reciprocal Multiplication, a tutorial for a detailed writeup about how it works, explaining in terms of fixed-point. It shows how the algorithm for finding the reciprocal works, and how to handle signed division and modulo.
Let's consider for a minute why 0.CCCCCCCC... (hex) or 0.110011001100... (binary) is 4/5. Divide the binary representation by 4 (shift right 2 places), and we'll get 0.001100110011..., which by trivial inspection can be added to the original to get 0.111111111111..., which is obviously equal to 1, the same way 0.9999999... in decimal is equal to one. Therefore, we know that x + x/4 = 1, so 5x/4 = 1, x = 4/5. This is then represented as CCCCCCCCCCCCCCCD in hex for rounding (as the binary digit beyond the last one present would be a 1).
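The claim can be checked with a few lines of Python, whose integers are unbounded and so can hold the full 128-bit product. This is a sketch, not GCC's code: div5 is a hypothetical name, and it assumes 0 <= i < 2**64, matching size_t on x86-64.

```python
MAGIC = 0xCCCCCCCCCCCCCCCD

# 4/5 scaled by 2**64 and rounded up (ceiling division via negation)
assert MAGIC == -(-4 * 2**64 // 5)
# GCC printed the same bit pattern as a signed 64-bit number
assert MAGIC - 2**64 == -3689348814741910323

def div5(i):
    """Compute i // 5 for 0 <= i < 2**64 the way the generated code does."""
    high = (i * MAGIC) >> 64   # mul rdx: keep only the upper 64 bits (RDX)
    return high >> 2           # shr rax, 2: the remaining divide-by-4

print(div5(9))  # 1
```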
In general, multiplication is much faster than division. So if we can get away with multiplying by the reciprocal instead, we can significantly speed up division by a constant.
A wrinkle is that we cannot represent the reciprocal exactly (unless the division was by a power of two but in that case we can usually just convert the division to a bit shift). So to ensure correct answers we have to be careful that the error in our reciprocal does not cause errors in our final result.
-3689348814741910323 is 0xCCCCCCCCCCCCCCCD which is a value of just over 4/5 expressed in 0.64 fixed point.
When we multiply a 64-bit integer by a 0.64 fixed point number we get a 64.64 result. We truncate the value to a 64-bit integer (effectively rounding it towards zero) and then perform a further shift, which divides by four and again truncates. By looking at the bit level it is clear that we can treat both truncations as a single truncation.
This clearly gives us at least an approximation of division by 5 but does it give us an exact answer correctly rounded towards zero?
To get an exact answer the error needs to be small enough not to push the answer over a rounding boundary.
The exact answer to a division by 5 will always have a fractional part of 0, 1/5, 2/5, 3/5 or 4/5. Therefore a positive error of less than 1/5 in the multiplied and shifted result will never push the result over a rounding boundary.
The error in our constant is (1/5) * 2^-64. The value of i is less than 2^64, so the error after multiplying is less than 1/5. After the division by 4 the error is less than (1/5) * 2^-2.
(1/5) * 2^-2 < 1/5, so the answer will always be equal to doing an exact division and rounding towards zero.
Unfortunately this doesn't work for all divisors.
If we try to represent 4/7 as a 0.64 fixed point number with rounding away from zero, we end up with an error of (6/7) * 2^-64. After multiplying by an i value of just under 2^64 we end up with an error just under 6/7, and after dividing by four we end up with an error of just under 1.5/7, which is greater than 1/7.
So to implement division by 7 correctly we need to multiply by a 0.65 fixed point number. We can implement that by multiplying by the lower 64 bits of our fixed point number, then adding the original number (this may overflow into the carry bit), then doing a rotate through carry.
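The 4/7 argument can be checked numerically. This Python sketch (hypothetical function names; assumes 0 <= i < 2**64) compares the 0.64 and 0.65 fixed-point versions at i = 2**64 - 3. That i leaves remainder 6 when divided by 7, so the true quotient has fractional part 6/7 and sits right at the rounding edge.

```python
N = 64

# 4/7 as a 0.64 fixed point number, rounded up (away from zero)
M64 = -(-4 * 2**N // 7)
# 4/7 as a 0.65 fixed point number (needs 65 bits), rounded up
M65 = -(-4 * 2**(N + 1) // 7)

def div7_q064(i):
    return (i * M64) >> (N + 2)      # divide by 2**64, then by 4

def div7_q065(i):
    return (i * M65) >> (N + 1 + 2)  # divide by 2**65, then by 4

i = 2**64 - 3                   # i % 7 == 6: the worst case near 2**64
print(div7_q064(i) - i // 7)    # 1 (off by one)
print(div7_q065(i) - i // 7)    # 0
print(hex(M65))                 # 0x12492492492492493: the 65th bit is set
```

The lower 64 bits of M65 (0x2492492492492493) are what the "multiply by the lower 64 bits, then add the original number" trick multiplies by.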
Here is link to a document of an algorithm that produces the values and code I see with Visual Studio (in most cases) and that I assume is still used in GCC for division of a variable integer by a constant integer.
http://gmplib.org/~tege/divcnst-pldi94.pdf
In the article, a uword has N bits, a udword has 2N bits, n = numerator = dividend, d = denominator = divisor, ℓ is initially set to ceil(log2(d)), shpre is pre-shift (used before multiply) = e = number of trailing zero bits in d, shpost is post-shift (used after multiply), prec is precision = N - e = N - shpre. The goal is to optimize calculation of n/d using a pre-shift, multiply, and post-shift.
Scroll down to figure 6.2, which defines how a udword multiplier (max size N+1 bits) is generated but doesn't clearly explain the process. I'll explain this below.
Figure 4.2 and figure 6.2 show how the multiplier can be reduced to a N bit or less multiplier for most divisors. Equation 4.5 explains how the formula used to deal with N+1 bit multipliers in figure 4.1 and 4.2 was derived.
In the case of modern X86 and other processors, multiply time is fixed, so pre-shift doesn't help on these processors, but it still helps to reduce the multiplier from N+1 bits to N bits. I don't know if GCC or Visual Studio have eliminated pre-shift for X86 targets.
Going back to Figure 6.2. The numerator (dividend) for mlow and mhigh can be larger than a udword only when denominator (divisor) > 2^(N-1) (when ℓ == N => mlow = 2^(2N)), in this case the optimized replacement for n/d is a compare (if n>=d, q = 1, else q = 0), so no multiplier is generated. The initial values of mlow and mhigh will be N+1 bits, and two udword/uword divides can be used to produce each N+1 bit value (mlow or mhigh). Using X86 in 64 bit mode as an example:
; upper 8 bytes of dividend = 2^(ℓ) = (upper part of 2^(N+ℓ))
; lower 8 bytes of dividend for mlow = 0
; lower 8 bytes of dividend for mhigh = 2^(N+ℓ-prec) = 2^(ℓ+shpre) = 2^(ℓ+e)
dividend dq 2 dup(?) ;16 byte dividend
divisor dq 1 dup(?) ; 8 byte divisor
; ...
mov rcx,divisor
mov rdx,0
mov rax,dividend+8 ;upper 8 bytes of dividend
div rcx ;after div, rax == 1
mov rax,dividend ;lower 8 bytes of dividend
div rcx
mov rdx,1 ;rdx:rax = N+1 bit value = 65 bit value
You can test this with GCC. You've already seen how j = i/5 is handled. Take a look at how j = i/7 is handled (which should be the N+1 bit multiplier case).
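Here is a Python sketch of figure 6.2 (CHOOSE_MULTIPLIER) and of the unsigned code generation in figure 4.2, written from my reading of the paper's pseudocode with N = 64. The function names are hypothetical. Because Python integers are unbounded, the 2N-bit numerators can be computed directly instead of with the two div instructions above. It is handy for checking which divisors need the N+1 bit multiplier without reading assembly.

```python
def choose_multiplier(d, prec, N=64):
    """Sketch of CHOOSE_MULTIPLIER (figure 6.2): returns (m, sh_post, l)."""
    l = (d - 1).bit_length()       # l = ceil(log2(d)) for d >= 1
    sh_post = l
    m_low = (1 << (N + l)) // d
    m_high = ((1 << (N + l)) + (1 << (N + l - prec))) // d
    # Drop common low bits while m_low and m_high still differ after halving.
    while (m_low >> 1) < (m_high >> 1) and sh_post > 0:
        m_low >>= 1
        m_high >>= 1
        sh_post -= 1
    return m_high, sh_post, l


def udiv_by_const(n, d, N=64):
    """Compute n // d (0 <= n < 2**N, 1 <= d < 2**N) the figure 4.2 way."""
    if (d & (d - 1)) == 0:               # power of two: a plain shift
        return n >> (d.bit_length() - 1)
    m, sh_post, _ = choose_multiplier(d, N, N)
    sh_pre = 0
    if m >= (1 << N) and d % 2 == 0:
        sh_pre = (d & -d).bit_length() - 1      # e = trailing zero bits of d
        m, sh_post, _ = choose_multiplier(d >> sh_pre, N - sh_pre, N)
    if m >= (1 << N):                    # N+1 bit multiplier (e.g. d = 7)
        t1 = ((m - (1 << N)) * n) >> N   # MULUH(m - 2^N, n)
        return (t1 + ((n - t1) >> 1)) >> (sh_post - 1)
    return ((m * (n >> sh_pre)) >> N) >> sh_post   # MULUH, then post-shift

print(hex(choose_multiplier(5, 64)[0]))     # 0xcccccccccccccccd
print(choose_multiplier(7, 64)[0] > 2**64)  # True: the N+1 bit case
```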
On most current processors, multiply has a fixed timing, so a pre-shift is not needed. For X86, the end result is a two instruction sequence for most divisors, and a five instruction sequence for divisors like 7 (in order to emulate a N+1 bit multiplier as shown in equation 4.5 and figure 4.2 of the pdf file). Example X86-64 code:
; rbx = dividend, rax = 64 bit (or less) multiplier, rcx = post shift count
; two instruction sequence for most divisors:
mul rbx ;rdx = upper 64 bits of product
shr rdx,cl ;rdx = quotient
;
; five instruction sequence for divisors like 7
; to emulate 65 bit multiplier (rbx = lower 64 bits of multiplier)
mul rbx ;rdx = upper 64 bits of product
sub rbx,rdx ;rbx -= rdx
shr rbx,1 ;rbx >>= 1
add rdx,rbx ;rdx = upper 64 bits of corrected product
shr rdx,cl ;rdx = quotient
; ...
To explain the 5-instruction sequence: a simple 3-instruction sequence could overflow. Let u64() mean the upper 64 bits (all that is needed for the quotient):
mul rbx ;rdx = u64(dvnd*mplr)
add rdx,rbx ;rdx = u64(dvnd*(2^64 + mplr)), could overflow
shr rdx,cl
To handle this case, cl = post_shift-1. rax = multiplier - 2^64, rbx = dividend. u64() is upper 64 bits. Note that rax = rax<<1 - rax. Quotient is:
u64( ( rbx * (2^64 + rax) )>>(cl+1) )
u64( ( rbx * (2^64 + rax<<1 - rax) )>>(cl+1) )
u64( ( (rbx * 2^64) + (rbx * rax)<<1 - (rbx * rax) )>>(cl+1) )
u64( ( (rbx * 2^64) - (rbx * rax) + (rbx * rax)<<1 )>>(cl+1) )
u64( ( (((rbx * 2^64) - (rbx * rax))>>1) + (rbx*rax) )>>(cl) )
mul rbx ; (rbx*rax)
sub rbx,rdx ; (rbx*2^64)-(rbx*rax)
shr rbx,1 ;( (rbx*2^64)-(rbx*rax))>>1
add rdx,rbx ;( ((rbx*2^64)-(rbx*rax))>>1)+(rbx*rax)
shr rdx,cl ;((((rbx*2^64)-(rbx*rax))>>1)+(rbx*rax))>>cl
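The two sequences can be emulated with Python integers masked to 64 bits. This is a sketch under the assumptions of the derivation above: rax holds the multiplier minus 2^64 (for division by 7 that is 0x2492492492492493, the lower 64 bits of ceil(2^67 / 7)), rbx holds the dividend, and the post shift is 3. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1
LOW7 = 0x2492492492492493   # multiplier for /7 minus 2**64
POST_SHIFT = 3

def mul_high(a, b):
    """mul: the upper 64 bits of the 128-bit product (what lands in RDX)."""
    return (a * b) >> 64

def div7_three(n):
    """Naive 3-instruction version: the add can wrap around at 2**64."""
    rdx = (mul_high(LOW7, n) + n) & MASK64   # add rdx,rbx (may overflow)
    return rdx >> POST_SHIFT                 # shr rdx,cl with cl = 3

def div7_five(n):
    """5-instruction version with cl = POST_SHIFT - 1."""
    rbx = n
    rdx = mul_high(LOW7, rbx)         # mul rbx
    rbx = (rbx - rdx) & MASK64         # sub rbx,rdx (never negative: rdx <= n)
    rbx >>= 1                          # shr rbx,1
    rdx = (rdx + rbx) & MASK64         # add rdx,rbx (cannot overflow now)
    return rdx >> (POST_SHIFT - 1)     # shr rdx,cl

n = 2**64 - 1
print(div7_five(n) == n // 7)   # True
print(div7_three(n) == n // 7)  # False: the add wrapped around
```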
I will answer from a slightly different angle: Because it is allowed to do it.
C and C++ are defined against an abstract machine. The compiler transforms this program in terms of the abstract machine to concrete machine following the as-if rule.
The compiler is allowed to make ANY changes as long as it doesn't change the observable behaviour as specified by the abstract machine. There is no reasonable expectation that the compiler will transform your code in the most straightforward way possible (even though a lot of C programmers assume that). Usually, it does this because the compiler wants to optimize the performance compared to the straightforward approach (as discussed in the other answers at length).
If under any circumstances the compiler "optimizes" a correct program to something that has a different observable behaviour, that is a compiler bug.
If there is any undefined behaviour in our code (signed integer overflow is a classical example), this contract is void.

Kdb q negative numbers and mod

Basic questions about negative numbers and mod in Kdb
Below gives -1 as expected
q) neg 7 mod 2
but
q) a:neg 7
q) a mod 2
gives 1
And below
q) -7 mod 2
gives 1
Can anyone please explain this?
KDB executes statements from right to left, so the statement neg 7 mod 2 is the same as neg(7 mod 2).
First KDB executes 7 mod 2 and then applies the neg function to the result, like below.
q) 7 mod 2 // 1
q) neg 1 // -1
which is same as
q) neg 7 mod 2 // -1
Last 2 cases -7 mod 2 and neg[7] mod 2 are equivalent. And the result for that is 1.
The mod function, as shown in the kx reference page (https://code.kx.com/v2/ref/mod/), returns a non-negative result when the divisor is positive. Therefore, 1 is the expected answer for -7 mod 2, and for a mod 2 in your example.
The reason that neg 7 mod 2 returns -1 is that q evaluates arithmetic from right to left.
As 7 mod 2 returns 1, the neg function returns -1 after taking in the value from 7 mod 2.
Hope this helps!
As Rahul has covered, this is expected behaviour that occurs as a result of the right-to-left execution of KDB, in conjunction with the fact that mod always returns a non-negative result for a positive divisor in kdb. If you want to better understand how a given command is executed, you can always parse it, which will show the underlying k parse tree.
q)mod
k){x-y*x div y}
q)neg
-:
q)parse "neg 7 mod 2"
-:
(k){x-y*x div y};7;2)
Here we can see that neg (-:) is being applied to the result of applying mod, i.e. k){x-y*x div y}, to 7 and 2.
Right-to-left evaluation trips up many people who are learning kdb. It is useful to keep this aspect in mind as a possible cause of any problem you encounter while learning the basics; I can guarantee that it will trip you up at least a few more times.
I'd really recommend that you read/work through Q For Mortals 3, which has been made free by Kx.
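A small Python model of the two parses may help. It assumes q's div rounds down (toward negative infinity), which is what the printed definition {x-y*x div y} needs in order to give 1 for -7 mod 2. The function names are hypothetical.

```python
def q_div(x, y):
    # q's div rounds the quotient down, like Python's //
    return x // y

def q_mod(x, y):
    # Same definition as the k source printed by q: {x-y*x div y}
    return x - y * q_div(x, y)

def q_neg(x):
    return -x

# Right to left: `neg 7 mod 2` means neg (7 mod 2) ...
print(q_neg(q_mod(7, 2)))   # -1
# ... while `-7 mod 2` and `a mod 2` (with a:neg 7) apply mod to -7 itself
print(q_mod(-7, 2))         # 1
```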

Borrow during subtracting operation (sbc asm instruction) on 6502?

When does a borrow (i.e. the carry flag being cleared) happen during a subtraction (the sbc asm instruction) on the 6502 used by the NES? Is it every time the result is negative (-1 to -128)?
Many thanks!
On a 6502, SBC n is exactly identical to ADC (n EOR $FF), i.e. it adds the one's complement of the operand. So carry is clear when A + (operand ^ 0xff) + existing carry is less than 256.
EDIT: so, if carry is set then the subtraction occurs without borrow. If carry is clear then subtraction occurs with borrow. Therefore if carry is set after the subtraction then there was no borrow. If carry is clear then there was borrow.
If you want to test whether a result is negative, check the sign bit implicitly via a BMI or BPL.
It's a bit more complicated than that in decimal mode on a generic 6502, but the NES variant doesn't have decimal mode, so ignore anything you read about that.
To clarify re: the comments below; if you're treating numbers as signed then 127 is +127, 128 is -128, etc. Normal two's complement. Nothing special. E.g.
LDA #-63 ; i.e. 1100 0001
SEC
SBC #65 ; i.e. 0100 0001
; result in accumulator is now -128, i.e. 1000 0000,
; and carry remains set because there was no borrow
BPL somewhere ; wouldn't jump, because -128 is negative
BMI somewhereElse ; would jump, because -128 is negative
The following is exactly equivalent in terms of inner workings:
LDA #-63 ; i.e. 1100 0001
SEC ; ... everything the same up until here ...
ADC #$BE ; i.e. 1011 1110 (the complement of 65 = 0100 0001)
; result = 1100 0001 + 1011 1110 + 1 = [1] 0111 1111 + 1 = [1] 1000 0000
; ^
; |
; carry
; = -128
So, as above, defining "the result" as per the 6502 manual and ordinary programmatic meaning of "the thing sitting in the accumulator", you can test whether the result is positive or negative as stated above, e.g.
SBC $23
BMI resultWasNegative
resultWasPositive: ...
If you're interested in whether the complete result would have been negative (i.e. had it fitted into the accumulator) then you can also check the overflow flag. If overflow is set then that means that whatever is in the accumulator has the wrong sign because of the 8-bit limit. So you can do the equivalent of an exclusive OR between overflow and sign:
SBC $23
BVS signIsTheOpposite
BMI resultWasNegative
JMP resultWasPositive
signIsTheOpposite:
BPL resultWasNegative
JMP resultWasPositive
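A Python sketch of binary-mode SBC (no decimal mode, matching the NES) may make the flag rules concrete. It simply follows the ADC (n EOR $FF) equivalence described above; the function name and its return tuple are hypothetical.

```python
def sbc(a, operand, carry):
    """Hypothetical model of binary-mode 6502 SBC.

    a, operand: 0..255; carry: 0 or 1 (1 means no borrow is pending).
    Returns (result, carry_out, negative, overflow).
    """
    b = operand ^ 0xFF                   # SBC n is ADC (n EOR $FF)
    total = a + b + carry
    result = total & 0xFF
    carry_out = 1 if total > 0xFF else 0   # 0 means a borrow happened
    negative = 1 if result & 0x80 else 0   # N flag: bit 7 of the accumulator
    # V flag: the ADC inputs had the same sign but the result's sign differs
    overflow = 1 if (~(a ^ b) & (a ^ result) & 0x80) else 0
    return result, carry_out, negative, overflow

print(sbc(0xC1, 65, 1))    # (128, 1, 1, 0): -63 - 65 = -128, no borrow
print(sbc(0x10, 0xF0, 1))  # (32, 0, 0, 0): borrow, yet the result is positive
```

For example, sbc(0x80, 1, 1) returns (0x7F, 1, 0, 1): the accumulator looks positive, but V is set because the true result, -129, does not fit in 8 bits.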
Tommy's answer is correct, but I have a simpler way of looking at it.
Operations in the 6502's ALU are all 8 bit so you can think of a subtraction like this (for $65 and $64):
01100101
-01100100
========
00000001
What I do is imagine the subtraction is a 9 bit (unsigned) operation with the 9th bit of the accumulator set to 1, so $65 - $64 would look like this:
1 01100101
- 01100100
==========
1 00000001
Whereas $64 - $65 would look like this
1 01100100
- 01100101
==========
0 11111111
The new carry bit is the imaginary 9th bit of the result.
Essentially, the carry is cleared (a borrow occurred) when the operand interpreted as an unsigned number is greater than the accumulator interpreted as an unsigned number. Or, to be pedantic, the carry is cleared when
A < operand + 1 - oldcarry
Nope, a borrow can happen even when the result is positive.
Example:
lda #$10
sec
sbc #$f0
Carry will be clear after that and Accumulator will be $20.
To test for positive/negative values after subtraction, use the N(egative) flag of the status register and the branches that evaluate it (BMI/BPL).

Convert Integer to Float in Elisp

I am having trivial problems converting integer division to a floating point solution in Emacs Lisp 24.5.1.
(message "divide: %2.1f" (float (/ 1 2)))
"divide: 0.0"
I believe this expression first calculates 1/2, finds it is 0 after truncating, and then converts that 0 to 0.0. Obviously, I'm hoping for 0.5.
What am I not seeing here? Thanks
The / function performs a floating-point division if at least one of its arguments is a float, and an integer quotient operation (rounded towards 0) if all of its arguments are integers. If you want to perform a floating-point division, make sure that at least one of the arguments is a float.
(message "divide: %2.1f" (/ (float 1) 2))
(or of course if they're constants you can just write (/ 1.0 2) or (/ 1 2.0))
Many programming languages work this way.
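Python 3 is a counterexample, since its / always returns a float, so here is a hypothetical helper that mimics two-argument Elisp / for comparison. It is a sketch: it ignores division by zero and the multi-argument forms of /.

```python
def elisp_div(a, b):
    """Hypothetical model of two-argument Emacs Lisp (/ a b)."""
    if isinstance(a, int) and isinstance(b, int):
        q = abs(a) // abs(b)                        # integer quotient...
        return q if (a >= 0) == (b >= 0) else -q    # ...rounded towards 0
    return float(a) / float(b)                      # any float: float division

print(elisp_div(1, 2))         # 0, so (float (/ 1 2)) gives 0.0
print(elisp_div(float(1), 2))  # 0.5, like (/ (float 1) 2)
print(elisp_div(-7, 2))        # -3 (Python's -7 // 2 would give -4)
```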