How can I convert a real number to an integer in LISP?
Is there any primitive function?
Example:
3.0 => 3
There are multiple ways.
Below, f stands for any floating-point number.
If you want the next-highest integer, (ceiling f) gives you that. If you want the next-lowest integer, (floor f) gives you that (for values like 1.0, which are already whole numbers, the two functions return the same integer). If you prefer the closest integer, use (round f).
Those are the three simplest and most portable ways I can think of.
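Another option is TRUNCATE, which rounds toward zero. Like the other three functions, it also returns the remainder as a second value. Examples: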
> (truncate 2.2)
=> 2
0.20000005
> (truncate 2.9)
=> 2
0.9000001
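For comparison, here is a rough Python analogue of the four operators (a sketch for illustration, not Lisp code; the helper names are hypothetical). Like their Common Lisp counterparts, each helper returns two values: the integer quotient and the remainder f - q.

```python
import math

def cl_floor(f):
    """Largest integer <= f, plus the remainder (mirrors CL FLOOR)."""
    q = math.floor(f)
    return q, f - q

def cl_ceiling(f):
    """Smallest integer >= f, plus the remainder (mirrors CL CEILING)."""
    q = math.ceil(f)
    return q, f - q

def cl_truncate(f):
    """Integer part, rounding toward zero (mirrors CL TRUNCATE)."""
    q = math.trunc(f)
    return q, f - q

def cl_round(f):
    """Nearest integer; exact halves go to the even integer, as in CL ROUND."""
    q = round(f)  # Python 3's round() also rounds half to even
    return q, f - q

for f in (2.2, 2.9, -2.9, 2.5):
    print(f, cl_floor(f)[0], cl_ceiling(f)[0], cl_truncate(f)[0], cl_round(f)[0])
```

FLOOR and TRUNCATE agree for positive numbers but differ for negative ones: for -2.9, floor gives -3 while truncate gives -2.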
Related
I searched the internet for a function to find the exact square root of a BigInt in Scala. I didn't find one, but I saw a Java program and converted that function into Scala. It works, but I am not sure whether it can handle very large BigInts, and it returns only a BigInt, not a BigDecimal, as the square root. The code does some bit manipulation with hard-coded numbers like shiftRight(5), BigInt("8") and shiftRight(1). I can understand the logic clearly, but not the hard-coding of these shift amounts and the number 8. Maybe these bit-shift functions are not available in Scala, and that's why the Java code converts to BigInteger in a few places. These hard-coded numbers may affect the precision of the result. I just translated the Java code into Scala, copying the exact algorithm. Here is the code I have written in Scala:
def sqt(n: BigInt): BigInt = {
  var a = BigInt(1)
  var b = (n >> 5) + BigInt(8)
  while ((b - a) >= 0) {
    val mid: BigInt = (a + b) >> 1
    if (mid * mid - n > 0) b = mid - 1
    else a = mid + 1
  }
  a - 1
}
My questions are:
Can't we return a BigDecimal instead of a BigInt? How can we do that?
How are these hard-coded numbers, shiftRight(5), shiftRight(1) and 8, related to the precision of the result?
I tested one number in the Scala REPL. The function sqt gives the exact square root of a perfect square, but not of the original number, as shown below:
scala> sqt(BigInt("19928937494873929279191794189"))
res9: BigInt = 141169888768369
scala> res9*res9
res10: scala.math.BigInt = 19928937494873675935734920161
scala> sqt(res10)
res11: BigInt = 141169888768369
scala>
I understand that shiftRight(5) means dividing by 2^5, i.e. by 32, and so on. But why is 8 added after the shift operation? Why exactly 5 shifts as a first guess?
Your question 1 and question 3 are actually the same question.
How [do] these bitshifts impact [the] precision of the result?
They don't.
How [are] these hardcoded numbers ... related to precision of the result?
They aren't.
There are many different methods/algorithms for estimating/calculating the square root of a number (as can be seen here). The algorithm you've posted appears to be a pretty straightforward binary search.
1. Pick a number a guaranteed to be no larger than the target (the square root of n).
2. Pick a number b guaranteed to be no smaller than the target.
3. Calculate mid, the whole-number midpoint between a and b.
4. If mid is larger than the target (mid*mid > n), move b to mid-1 (-1 because we know mid is too large).
5. Otherwise mid is at most the target, so move a to mid+1 (the answer is at least mid, and a-1 recovers it at the end).
6. Repeat steps 3, 4 and 5 until a is greater than b.
7. Return a-1 as the square root of n, rounded down to a whole number.
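The steps above can be sketched in Python, where integers are arbitrary-precision like Scala's BigInt. This is a direct transcription of the Scala sqt, checked against the standard library's math.isqrt (Python 3.8+), which computes the same rounded-down root.

```python
import math

def sqt(n: int) -> int:
    """Floor of the square root of n >= 0, by binary search.

    a starts at or below the root; b = (n >> 5) + 8 starts at or above it.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    a = 1
    b = (n >> 5) + 8
    while a <= b:                 # same condition as (b - a) >= 0
        mid = (a + b) >> 1        # whole-number midpoint
        if mid * mid > n:
            b = mid - 1           # mid is too large
        else:
            a = mid + 1           # mid <= root, so the answer is >= mid
    return a - 1

# Cross-check against the standard library for a range of inputs.
for n in range(10_000):
    assert sqt(n) == math.isqrt(n)
```

For a number that is not a perfect square, such as the one in the question, the result r satisfies r*r <= n < (r+1)*(r+1), which is exactly the "rounded down" behaviour seen in the REPL.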
The bit shifts and hard-coded numbers are only used to select the initial value of b. But b only has to be at least as large as the target. We could have just written var b = n. Why all the bother?
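It's all about efficiency. The closer b is to the target, the fewer iterations are needed to find the result. Why add 8 after the shift? Because, for example, 31>>5 is zero, which is smaller than the target (5 for n = 31). The author chose (n>>5)+8, but he/she might have chosen (n>>7)+32: a larger shift gives a smaller starting b for large n, but needs a larger constant to stay above the target for small n. There are trade-offs.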
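A quick way to test a candidate starting value is to compare it with the true integer square root over a range of inputs. The sketch below (the function name first_failure is hypothetical) finds no counterexample for (n>>5)+8 or (n>>7)+32, but shows that a shift of 7 with too small a constant, such as 12, fails (the first failure is at n = 196, where 196>>7 = 1 and 1+12 = 13 is below the root 14).

```python
import math

def first_failure(shift: int, add: int, limit: int = 200_000):
    """Return the first n < limit where (n >> shift) + add < isqrt(n), else None."""
    for n in range(limit):
        if (n >> shift) + add < math.isqrt(n):
            return n
    return None

print(first_failure(5, 8))    # None: the original choice is a valid upper bound
print(first_failure(7, 32))   # None: a larger shift needs a larger constant
print(first_failure(7, 12))   # 196: this starting value can undershoot the root
```

A brute-force check only covers the tested range; for a shift of s, the constant 2^(s-2) is enough in general, because r*r/2^s + 2^(s-2) - r = (r - 2^(s-1))^2 / 2^s is never negative.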
Can't we return a BigDecimal instead of BigInt? How can we do that?
Here's one way to do that.
def sqt(n: BigInt): BigDecimal = {
  val d = BigDecimal(n)
  var a = BigDecimal(1.0)
  var b = d
  while (b - a >= 0) {
    val mid = (a + b) / 2
    if (mid * mid - d > 0) b = mid - 0.0001 // adjust down
    else a = mid + 0.0001                   // adjust up
  }
  b
}
There are better algorithms for calculating floating-point square root values. In this case you get better precision by using smaller adjustment values but the efficiency gets much worse.
Can't we return a BigDecimal instead of BigInt? How can we do that?
This makes no sense if you want exact roots: if a BigInt's square root can be represented exactly by a BigDecimal, it can be represented by a BigInt. If you don't want exact roots, you'll need to specify the precision and modify the algorithm (and in most cases, Double will be good enough and much faster than BigDecimal).
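One way to specify the precision without writing a new algorithm is to scale n by 10^(2k), take the integer square root, and move the decimal point back k places. The sketch below uses Python's math.isqrt and decimal.Decimal as stand-ins for Scala's BigInt and BigDecimal; the function name sqrt_digits and the parameter k are hypothetical. The result is the square root truncated (not rounded) to k decimal places.

```python
import math
from decimal import Decimal

def sqrt_digits(n: int, k: int) -> Decimal:
    """sqrt(n) truncated to k decimal places, using only integer arithmetic.

    isqrt(n * 10**(2k)) == floor(sqrt(n) * 10**k), so placing the decimal
    point k digits from the right gives the truncated root exactly.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    scaled = math.isqrt(n * 10 ** (2 * k))
    # Building the Decimal from a string is exact (no context rounding).
    return Decimal(f"{scaled}E-{k}")

print(sqrt_digits(2, 10))   # 1.4142135623
```

Because the work is done by the integer root, the cost grows with the number of digits requested rather than with the size of a fixed adjustment step like 0.0001.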
I understand that shiftRight(5) means dividing by 2^5, i.e. by 32, and so on. But why is 8 added after the shift operation? Why exactly 5 shifts as a first guess?
These aren't the only options. The point is that for every positive n, n/32 + 8 >= sqrt(n) (where sqrt is the mathematical square root). This is easiest to show by a bit of calculus (or just by building a graph of the difference). So at the start we know a <= sqrt(n) <= b (unless n == 0 which can be checked separately), and you can verify this remains true on each step.
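Instead of calculus, completing the square gives the inequality directly. With s = sqrt(n), so that n = s^2:

```latex
\frac{n}{32} + 8 - \sqrt{n}
  = \frac{s^2 - 32s + 256}{32}
  = \frac{(s - 16)^2}{32} \ge 0,
\qquad s = \sqrt{n}
```

The right-hand side is never negative, so n/32 + 8 >= sqrt(n) for every n >= 0, with equality only at n = 256.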
I am having trivial problems converting integer division to a floating point solution in Emacs Lisp 24.5.1.
(message "divide: %2.1f" (float (/ 1 2)))
"divide: 0.0"
I believe this expression first calculates 1/2, gets 0 after truncation, and then converts that 0 to the float 0.0. Obviously, I'm hoping for 0.5.
What am I not seeing here? Thanks
The / function performs a floating-point division if at least one of its arguments is a float, and an integer quotient operation (rounded towards 0) if all of its arguments are integers. If you want to perform a floating-point division, make sure that at least one of the arguments is a float.
(message "divide: %2.1f" (/ (float 1) 2))
(or of course if they're constants you can just write (/ 1.0 2) or (/ 1 2.0))
Many programming languages work this way.
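To make the two behaviours concrete, here is a small Python model of the two-argument case of / as described above: float division when either argument is a float, otherwise an integer quotient rounded toward zero. It is only an illustration (the name elisp_div is hypothetical); note that Python's own // rounds toward negative infinity instead.

```python
def elisp_div(a, b):
    """Model of two-argument (/ a b) as described in the answer above."""
    if isinstance(a, float) or isinstance(b, float):
        return a / b                        # floating-point division
    q = abs(a) // abs(b)                    # magnitude of the quotient
    return q if (a < 0) == (b < 0) else -q  # integer quotient, rounded toward zero

print(elisp_div(1, 2))     # 0, like (/ 1 2)
print(elisp_div(1.0, 2))   # 0.5, like (/ 1.0 2)
print(elisp_div(-7, 2))    # -3 (toward zero), whereas Python's -7 // 2 is -4
```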
Is there any benefit to structuring boolean expressions like:
if (0 < x) { ... }
instead of
if (x > 0) { ... }
I have always used the second way, always putting the variable as the first operand and using whatever boolean operator makes sense, but lately I have read code that uses the first method, and after getting over the initial weirdness I am starting to like it a lot more.
Now I have started to write all my boolean expressions to only use < or <= even if this means the variable isn't the first operand, like the above example. To me it seems to increase readability, but that might just be me :)
What do other people think about this?
Do whatever is most natural for whatever expression you are trying to compare.
If you're wondering about other operations (like ==) there are previous topics comparing the orderings of operands for those comparisons (and the reasons why).
It is mostly done to avoid the problem of accidentally using = instead of == in if conditions. For consistency, many people use the same form with the other operators as well. I don't see any problem in doing so.
Use whatever 'reads' best. One thing I'd point out is that if I'm testing to see if a value is within bounds, I try to write it so the bounds are on the 'outside' just like they might be in a mathematical expression:
So, to test that (0 < x <= 10):
if ((0 < x) && (x <= 10)) { ... }
instead of
if ((0 < x) && (10 >= x)) { ... }
or
if ((x > 0) && (10 >= x)) { ... }
I find this pattern makes the logic somewhat easier to follow.
One advantage of putting the number first is that it can prevent the bug of using = when == is wanted.
if ( 0 == x ) // ok
if ( 0 = x ) //is a compiler error
compare to the subtle bug:
if ( x = 0 ) // assignment and not comparison. most likely a typo
To be honest it's unusual to write expressions with the variable on the right-side, and as a direct consequence of that unusualness readability suffers. Coding conventions have intrinsic value merely by virtue of being conventions; people are used to code being written in particular standard ways, x >= 0 being one example. Unnecessarily deviating from simple norms like these should be avoided without good cause.
The fact that you had to "get over the initial weirdness" should perhaps be a red flag.
I would not write 0 < x just as I would not use Hungarian notation in Java. When in Rome, do as the Romans do. The Romans write x >= 0. No, it's not a huge deal, it just seems like an unnecessary little quirk.
For various reasons that aren't too germane to the question, I've got a table with a composite key made out of two integers and I want to create a single unique key out of those two numbers. My initial thought was to just concatenate them, but I ran into a problem quickly when I realized that a composite key of (51,1) would result in the same unique key as (5,11), namely, 511.
Does anyone have a clever way to generate an integer out of two integers such that the generated integer is unique to the pair of start integers?
Edit: After being confronted with an impressive amount of math, I'm realizing that one detail I should have included is the sizes of the keys in question. In the originating pair, the first key is currently 6 digits and will probably stay within 7 digits for the life of the system; the second key has yet to exceed 20. Given these constraints, it looks like the problem is much less daunting.
You can mathematically prove this is impossible if you want the resulting key to comprise the same number of bits as its two components. However, if you start with two 32 bit ints, and can use a 64 bit int for the result, you could obviously do something like this:
key1 << 32 | key2
SQL Syntax
SELECT key1 * POWER(2, 32) + key2
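A minimal sketch of that packing and of recovering the original pair, assuming both keys are non-negative and below 2^32 (Python integers are unbounded, so the 64-bit limit is enforced by the range check rather than by the type; the function names are hypothetical):

```python
def pack(key1: int, key2: int) -> int:
    """Combine two unsigned 32-bit keys into one 64-bit value."""
    if not (0 <= key1 < 2**32 and 0 <= key2 < 2**32):
        raise ValueError("keys must fit in 32 unsigned bits")
    return key1 << 32 | key2   # key1 in the high half, key2 in the low half

def unpack(combined: int):
    """Inverse of pack(): returns (key1, key2)."""
    return combined >> 32, combined & 0xFFFFFFFF

print(pack(51, 1) == pack(5, 11))   # False: no clash, unlike concatenating to "511"
```

The SQL form key1 * POWER(2, 32) + key2 computes the same value, since multiplying by 2^32 is a left shift by 32 bits.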
This has been discussed in a fair amount of detail already (as recursive said, however, the output must be comprised of more bits than the individual inputs).
Mapping two integers to one, in a unique and deterministic way
How to use two numbers as a Map key
http://en.wikipedia.org/wiki/Cantor_pairing_function#Cantor_pairing_function
Multiply one of them by a large enough value:
SELECT id1 * 1000000 + id2
Or use text concatenation:
SELECT CAST(CAST(id1 AS nvarchar(10)) + RIGHT('000000' + CAST(id2 AS nvarchar(10)), 6) AS int)
Or skip the integer thing and separate the IDs with something non-numeric:
SELECT CAST(id1 AS nvarchar) + ':' + CAST(id2 AS nvarchar)
You can only do it if you have an upper bound for one of the keys. Say you have key1 and key2, and up1 is a value that key1 will never reach, then you can combine the keys like this:
combined = key2 * up1 + key1;
Even if the keys could theoretically grow without limit, it's usually possible to estimate a safe upper bound in practice.
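Decoding is a single divmod. In this sketch (function names hypothetical), up1 is the bound from the formula above; with the figures in the question's edit, key1 stays within 7 digits, so up1 = 10**7 would work, assuming that estimate holds.

```python
def combine(key1: int, key2: int, up1: int) -> int:
    """combined = key2 * up1 + key1, valid while 0 <= key1 < up1 and key2 >= 0."""
    if not 0 <= key1 < up1:
        raise ValueError("key1 must be in [0, up1)")
    return key2 * up1 + key1

def split(combined: int, up1: int):
    """Recover (key1, key2) from combine()."""
    key2, key1 = divmod(combined, up1)
    return key1, key2

UP1 = 10**7   # assumption: key1 never reaches 8 digits
print(combine(51, 1, UP1), combine(5, 11, UP1))   # 10000051 110000005
```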
As I like the theoretical side of your question (it really is beautiful), and to contradict what many of the practical answers say, I would like to give an answer to the "math" part of your tags :)
In fact it is possible to map any two numbers (or indeed any finite sequence of numbers) to a single number. This is called Gödel numbering, first published in 1931 by Kurt Gödel.
To give a quick example with your question: say we have two variables v1 and v2. Then v3 = 2^v1 * 3^v2 gives a unique number, and by unique prime factorization this number also uniquely identifies v1 and v2.
Of course, the resulting number v3 may grow undesirably fast. Please take this answer only as a reply to the theoretical aspect of your question.
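A small sketch of that encoding and its inverse for non-negative v1 and v2 (function names hypothetical). Decoding just counts how many times 2 and 3 divide v3.

```python
def godel(v1: int, v2: int) -> int:
    """Encode a pair of non-negative integers as 2**v1 * 3**v2."""
    return 2**v1 * 3**v2

def ungodel(v3: int):
    """Recover (v1, v2) by counting factors of 2 and 3 (unique factorization)."""
    v1 = v2 = 0
    while v3 % 2 == 0:
        v3 //= 2
        v1 += 1
    while v3 % 3 == 0:
        v3 //= 3
        v2 += 1
    return v1, v2

print(godel(5, 11))   # 5668704 = 32 * 177147
```

Even this small pair already needs a 7-digit result, which illustrates the rapid growth mentioned above.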
Both of the suggested solutions require some knowledge about the range of accepted keys.
To avoid making this assumption, one can riffle the digits together.
Key1 = ABC => Digits = A, B, C
Key2 = 123 => Digits = 1, 2, 3
Riffle(Key1, Key2) = A, 1, B, 2, C, 3
Zero-padding can be used when there aren't enough digits:
Key1 = 12345, Key2 = 1 => 1020304051
This method also generalizes for any number of keys.
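A sketch of the two-key riffle for non-negative keys (function names hypothetical). Decoding works because the longer key never starts with a zero, so converting to an integer can drop at most one leading padding zero; restoring an even length is enough.

```python
def riffle(key1: int, key2: int) -> int:
    """Interleave the decimal digits of two non-negative keys.

    The shorter key is zero-padded on the left so both have the same length.
    """
    s1, s2 = str(key1), str(key2)
    width = max(len(s1), len(s2))
    s1, s2 = s1.zfill(width), s2.zfill(width)
    return int("".join(d1 + d2 for d1, d2 in zip(s1, s2)))

def unriffle(combined: int):
    """Inverse of riffle(): restore a dropped leading zero, then de-interleave."""
    s = str(combined)
    if len(s) % 2:
        s = "0" + s
    return int(s[0::2]), int(s[1::2])

print(riffle(12345, 1))   # 1020304051
```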
I wrote these for MySQL; they work fine:
CREATE FUNCTION pair (x BIGINT unsigned, y BIGINT unsigned)
RETURNS BIGINT unsigned DETERMINISTIC
RETURN ((x + y) * (x + y + 1)) / 2 + y;
CREATE FUNCTION reversePairX (z BIGINT unsigned)
RETURNS BIGINT unsigned DETERMINISTIC
RETURN (FLOOR((-1 + SQRT(1 + 8 * z))/2)) * ((FLOOR((-1 + SQRT(1 + 8 * z))/2)) + 3) / 2 - z;
CREATE FUNCTION reversePairY (z BIGINT unsigned)
RETURNS BIGINT unsigned DETERMINISTIC
RETURN z - (FLOOR((-1 + SQRT(1 + 8 * z))/2)) * ((FLOOR((-1 + SQRT(1 + 8 * z))/2)) + 1) / 2;
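The same Cantor pairing functions in Python, as a sketch for checking the SQL (names are hypothetical). The inverse uses math.isqrt, an exact integer square root, because a floating-point SQRT can lose precision for very large z.

```python
import math

def pair(x: int, y: int) -> int:
    """Cantor pairing of two non-negative integers."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z: int):
    """Inverse Cantor pairing: returns (x, y)."""
    w = (math.isqrt(8 * z + 1) - 1) // 2   # w = x + y
    t = w * (w + 1) // 2                   # triangular number T(w)
    y = z - t
    return w - y, y

print([pair(x, y) for x, y in [(0, 0), (1, 0), (0, 1), (2, 0)]])   # [0, 1, 2, 3]
```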
At the risk of sounding facetious:
NewKey = fn(OldKey1, OldKey2)
where fn() is a function that looks up a new autonumbered key value from a column added to your existing table.
Obviously, two integer fields can hold far more values than a single integer field: two 32-bit fields give 2^64 combinations, while one 32-bit field holds only 2^32 values.
Why don't you just use ROW_NUMBER() or IDENTITY(int,1,1) to set a new ID? Does it REALLY need to be related to the original keys?
The problem in general:
I have a big 2d point space, sparsely populated with dots.
Think of it as a big white canvas sprinkled with black dots.
I have to iterate over and search through these dots a lot.
The Canvas (point space) can be huge, bordering on the limits of int, and its size is unknown before setting points in there.
That brought me to the idea of hashing:
Ideal:
I need a hash function that takes a 2D point and returns a unique uint32, so that no collisions can occur. You can assume that the number of dots on the Canvas is easily countable by a uint32.
IMPORTANT: It is impossible to know the size of the canvas beforehand (it may even change), so things like
canvaswidth * y + x
are sadly out of the question.
I also tried a very naive
abs(x) + abs(y)
but that produces too many collisions.
Compromise:
A hash function that provides keys with a very low probability of collision.
Cantor's enumeration of pairs
n = ((x + y)*(x + y + 1)/2) + y
might be interesting, as it's closest to your original canvaswidth * y + x but works for any non-negative x and y. But for a real-world int32 hash, rather than a mapping of pairs of integers to integers, you're probably better off with a bit manipulation such as Bob Jenkins's mix, called with x, y and a salt.
a hash function that is GUARANTEED collision-free is not a hash function :)
Instead of using a hash function, you could consider using binary space partition trees (BSPs) or XY-trees (closely related).
If you want to hash two uint32s into one uint32, do not use things like Y & 0xFFFF, because that discards half of the bits. Do something like
(x * 0x1f1f1f1f) ^ y
(you need to transform one of the variables first to make sure the hash function is not commutative)
Like Emil, but handles 16-bit overflows in x in a way that produces fewer collisions, and takes fewer instructions to compute:
hash = ( y << 16 ) ^ x;
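Both one-liners above can be modelled in Python by masking to 32 bits, which is what unsigned 32-bit arithmetic in C does implicitly (function names are hypothetical). Neither is collision-free; they only spread the inputs across the 32-bit range.

```python
MASK32 = 0xFFFFFFFF

def mul_xor_hash(x: int, y: int) -> int:
    """(x * 0x1f1f1f1f) ^ y, reduced to 32 bits."""
    return ((x * 0x1F1F1F1F) & MASK32) ^ (y & MASK32)

def shift_xor_hash(x: int, y: int) -> int:
    """(y << 16) ^ x, reduced to 32 bits."""
    return ((y << 16) & MASK32) ^ (x & MASK32)

print(mul_xor_hash(1, 2) != mul_xor_hash(2, 1))   # True: not commutative
```

For example, shift_xor_hash(0x10000, 0) and shift_xor_hash(0, 1) both give 0x10000, because bit 16 of x lands on the same position as bit 0 of y.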
You can recursively divide your XY plane into cells, then divide these cells into sub-cells, etc.
Gustavo Niemeyer invented the Geohash geocoding system in 2008.
Amazon's open-source Geo Library computes the hash for any longitude-latitude coordinate. The resulting Geohash value is a 63-bit number. The probability of collision depends on the hash's resolution: if two objects are closer than the intrinsic resolution, the calculated hash will be identical.
Read more:
https://en.wikipedia.org/wiki/Geohash
https://aws.amazon.com/fr/blogs/mobile/geo-library-for-amazon-dynamodb-part-1-table-structure/
https://github.com/awslabs/dynamodb-geo
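Geohash builds its code by interleaving the bits of the two coordinates, so nearby points tend to share a prefix. For integer canvas coordinates the same idea is usually called a Z-order or Morton code. Below is a minimal sketch for two non-negative 16-bit coordinates producing a 32-bit key; it illustrates the bit-interleaving technique and is not the Geohash or Amazon library code (the name morton2d is hypothetical).

```python
def morton2d(x: int, y: int) -> int:
    """Interleave the bits of x and y (each 0 <= v < 2**16).

    Bit i of x goes to bit 2i of the result; bit i of y goes to bit 2i + 1.
    """
    if not (0 <= x < 1 << 16 and 0 <= y < 1 << 16):
        raise ValueError("coordinates must fit in 16 unsigned bits")
    key = 0
    for i in range(16):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

print([morton2d(x, y) for y in range(2) for x in range(2)])   # [0, 1, 2, 3]
```

Within its 16-bit range this mapping is collision-free; splitting the key's leading bits gives the recursive cell subdivision described above.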
Your "ideal" is impossible.
You want a mapping (x, y) -> i where x, y, and i are all 32-bit quantities, which is guaranteed not to generate duplicate values of i.
Here's why: suppose there is a function hash() such that hash(x, y) gives a different integer value for every pair. There are 2^32 (about 4 billion) values of x and 2^32 values of y, so there are 2^64 (about 18 quintillion) distinct pairs, and hash() would need 2^64 distinct results. But there are only 2^32 possible values in a 32-bit int, so the results of hash() can't all fit in a 32-bit int.
See also http://en.wikipedia.org/wiki/Counting_argument
Generally, you should always design your data structures to deal with collisions. (Unless your hashes are very long (at least 128 bit), very good (use cryptographic hash functions), and you're feeling lucky).
Perhaps?
hash = ((y & 0xFFFF) << 16) | (x & 0xFFFF);
This works as long as x and y can be stored as 16-bit integers. I have no idea how many collisions it causes for larger integers, though. One idea might be to still use this scheme but combine it with a compression scheme, such as taking the modulus of 2^16.
If you can do a = ((y & 0xffff) << 16) | (x & 0xffff), then you can afterwards apply a reversible 32-bit mix to a, such as Thomas Wang's:
#include <stdint.h>

uint32_t hash(uint32_t a)
{
    a = (a ^ 61) ^ (a >> 16);
    a = a + (a << 3);
    a = a ^ (a >> 4);
    a = a * 0x27d4eb2d;
    a = a ^ (a >> 15);
    return a;
}
That way you get a random-looking result rather than high bits from one dimension and low bits from the other.
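A Python port of the same mix for experimenting (names are hypothetical). Python integers are unbounded, so each step that can overflow is masked back to 32 bits to mimic uint32_t arithmetic.

```python
MASK32 = 0xFFFFFFFF

def wang_hash(a: int) -> int:
    """Thomas Wang's 32-bit mix, as in the C code above."""
    a &= MASK32
    a = (a ^ 61) ^ (a >> 16)
    a = (a + (a << 3)) & MASK32    # a * 9 modulo 2**32
    a = a ^ (a >> 4)
    a = (a * 0x27D4EB2D) & MASK32  # odd multiplier, so this step is invertible
    a = a ^ (a >> 15)
    return a

def point_hash(x: int, y: int) -> int:
    """Pack the low 16 bits of x and y, then mix."""
    return wang_hash(((y & 0xFFFF) << 16) | (x & 0xFFFF))
```

Because every step is invertible on 32-bit values, distinct packed inputs never collide; collisions can only come from the & 0xffff truncation (for example, x = -1 and x = 0xFFFF pack to the same value).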
You can do
a >= b ? a * a + a + b : a + b * b
taken from here.
That works for points in the positive quadrant. If your coordinates can also be negative, then you will have to do:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
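The same pairing in Python, with the sign-folding step and an inverse (a sketch; the function names are hypothetical). The inverse takes the integer square root s of z: if z - s*s < s, the pair came from the a < b branch, otherwise from the a >= b branch.

```python
import math

def szudzik(a: int, b: int) -> int:
    """Pairing for non-negative a, b (the formula quoted above)."""
    return a * a + a + b if a >= b else a + b * b

def szudzik_unpair(z: int):
    """Inverse of szudzik(): returns (a, b)."""
    s = math.isqrt(z)
    r = z - s * s
    return (r, s) if r < s else (s, r - s)

def fold(v: int) -> int:
    """Map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * v if v >= 0 else -2 * v - 1

def unfold(u: int) -> int:
    """Inverse of fold()."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def signed_pair(a: int, b: int) -> int:
    return szudzik(fold(a), fold(b))

def signed_unpair(z: int):
    A, B = szudzik_unpair(z)
    return unfold(A), unfold(B)
```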
But to restrict the output to uint, you will have to keep an upper bound on your inputs, and if so, it turns out that you do know the bounds. In other words, in programming it's impractical to write a function without knowing the integer types of your inputs and output, and every integer type has a lower and an upper bound.
public uint GetHashCode(int a, int b)
{
    if (a > ushort.MaxValue || b > ushort.MaxValue ||
        a < ushort.MinValue || b < ushort.MinValue)
    {
        throw new ArgumentOutOfRangeException();
    }
    // Both values fit in 16 bits, so a goes in the high half and b in the low half.
    return ((uint)a << 16) | (uint)b; //very good space/speed efficiency
    //or whatever your function is.
}
If you want the output to be strictly uint for an unknown range of inputs, there will be a reasonable number of collisions, depending on that range. I would suggest a function that is allowed to overflow, in an unchecked context. Emil's solution is great; in C#:
return unchecked((uint)((a & 0xffff) << 16 | (b & 0xffff)));
See Mapping two integers to one, in a unique and deterministic way for a plethora of options.
According to your use case, it might be possible to use a Quadtree and replace points with the string of branch names. It is actually a sparse representation for points and will need a custom Quadtree structure that extends the canvas by adding branches when you add points off the canvas but it avoids collisions and you'll have benefits like quick nearest neighbor searches.
If you're already using a language or platform in which all objects (even primitives like integers) have built-in hash functions (Java-platform languages such as Java, .NET languages such as C#, and others such as Python and Ruby), you can use the built-in hash values as a building block and add your own "hashing flavor" to the mix. For example:
// C# code snippet
public class SomeVerySimplePoint {
public int X;
public int Y;
public override int GetHashCode() {
return ( Y.GetHashCode() << 16 ) ^ X.GetHashCode();
}
}
It may also be handy to have test cases, such as a predefined set of a million points, that run each candidate hash algorithm and compare them on computation time, memory required, key collision count, and edge cases (very large or very small values).
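A minimal collision-counting harness along those lines, scaled down to 100,000 random points so it runs quickly. The seed, coordinate range and candidate functions are illustrative choices, not part of the original answer; timing and memory could be added with the standard timeit and tracemalloc modules.

```python
import random

def collisions(hash_fn, points) -> int:
    """Number of points whose hash was already produced by another point."""
    return len(points) - len({hash_fn(x, y) for x, y in points})

def naive(x, y):
    return abs(x) + abs(y)

def packed16(x, y):
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)

rng = random.Random(42)   # fixed seed so runs are repeatable
points = list({(rng.randint(-1000, 1000), rng.randint(-1000, 1000))
               for _ in range(100_000)})   # distinct points only

for name, fn in [("abs(x) + abs(y)", naive), ("16-bit packing", packed16)]:
    print(name, collisions(fn, points))
```

With coordinates in this small range, the 16-bit packing has no collisions, while abs(x) + abs(y) collides heavily, matching the experience described in the question.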
The Fibonacci hash works very well for integer pairs.
Multiplier: 0x9E3779B9 (for 32-bit words).
For other word sizes w: 2^w / phi = (sqrt(5) - 1)/2 * 2^w, rounded to an odd number.
Hash: a1 + a2*multiplier
This gives very different values for pairs that are close together.
I don't know how it behaves across all pairs.
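A sketch of that recipe (function names hypothetical). The multiplier is computed with exact integer arithmetic rather than hard-coded, so the same code covers other word sizes; for w = 32 it reproduces 0x9E3779B9.

```python
import math

def fib_multiplier(w: int) -> int:
    """2**w / phi = 2**w * (sqrt(5) - 1) / 2, rounded to the nearest odd integer."""
    m = (math.isqrt(5 << (2 * w)) - (1 << w)) >> 1   # floor(2**w / phi)
    # The true value lies strictly between m and m + 1, so the nearest odd is m or m + 1.
    return m | 1

def fib_pair_hash(a1: int, a2: int, w: int = 32) -> int:
    """a1 + a2 * multiplier, reduced to w bits."""
    return (a1 + a2 * fib_multiplier(w)) & ((1 << w) - 1)

print(hex(fib_multiplier(32)))   # 0x9e3779b9
```

With w bits there are at most 2^w distinct results, so, as with every 32-bit scheme here, some pairs must collide.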