Snowflake: "left shift" makes the result exceed Long.MAX_VALUE

((timestamp - 1288834974657) << 32)
I included some more bits of information after the timestamp. For example, if 32 bits in total are needed after the timestamp, then the timestamp has to be shifted left by 32 bits, and the result exceeds Long.MAX_VALUE. The result is a negative value, something like -7187691577906700288, which is wrong.
Hope I described my question correctly. Please help...

I don't know Snowflake well (I assume it's a language?), and I also don't know what format that timestamp is in. If 1288834974657 is a Unix timestamp in seconds, it's in the year 42811.
The issue is that the offset (timestamp - 1288834974657) needs more bits than are left. After a 32-bit shift, only 31 bits remain below the sign bit of a 64-bit long, and your offset is larger than that, so the number overflows. It looks like the long in your language is signed, which means that the maximum number is probably 2^63-1. If the long were unsigned, the maximum number would probably be 2^64-1.
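The Python sketch below makes the overflow concrete by emulating a signed 64-bit long. Python's own integers never overflow, so the hypothetical helper to_int64 performs the wrap-around by hand. The sample timestamp is the millisecond value quoted later on this page and is used only as an assumed example. For comparison, Twitter's Snowflake keeps 41 bits for the millisecond offset and shifts it left by 22, not 32.

```python
TWEPOCH_MS = 1288834974657  # custom epoch (ms) from the question

def to_int64(x: int) -> int:
    """Keep the low 64 bits of x and read them as a signed two's-complement
    value, mimicking what a 64-bit long does when a shift overflows."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

def shifted_timestamp(timestamp_ms: int, shift: int) -> int:
    """(timestamp - epoch) << shift, evaluated with 64-bit long semantics."""
    return to_int64((timestamp_ms - TWEPOCH_MS) << shift)

def max_span_days(shift: int) -> float:
    """How long after the epoch the shifted value stays correct: the offset
    must fit in the 63 - shift bits below the sign bit."""
    return (1 << (63 - shift)) / 86_400_000

ts = 1_423_079_895_486  # assumed example timestamp in ms (February 2015)
offset = ts - TWEPOCH_MS

print(offset.bit_length())                          # 37 bits needed
print(shifted_timestamp(ts, 32) == offset << 32)    # False: high bits wrapped away
print(shifted_timestamp(ts, 22) == offset << 22)    # True: still fits
print(max_span_days(32))                            # about 24.9 days
print(max_span_days(22) / 365.25)                   # about 69.7 years
```

Depending on which bit lands in position 63, the wrapped result can be positive but wrong, or negative as in the question. Either way, a shift of 32 leaves room for less than a month of milliseconds.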

Related

T-SQL Data type for fixed precision and variable scale

I have a set of data with a precision of 16 digits. However, the values can range from very large numbers with all 16 digits to the left of the decimal point to very small numbers with all digits to the right of the decimal point (e.g. 1234567890123456.0 and 0.1234567890123456). I am trying to figure out the correct ("best") data type to store this data in. I need to store the exact values, not approximations, so float and real are not viable options. Numeric or decimal seem appropriate, but I am getting hung up on the most efficient precision and scale to set. It seems I must go with (32,16) to account for both extremes, but that seems inefficient, as I am requesting twice the storage that I will ever use. Is there a better option?
Thank you for your assistance.
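The (32,16) figure comes from counting digits: a single column needs as many integer digits as the largest value and as many fractional digits as the most precise one. The Python sketch below does that count with decimal.Decimal; the helper names are hypothetical. It also encodes SQL Server's documented storage sizes for decimal/numeric, which are 5, 9, 13 or 17 bytes for precisions 1-9, 10-19, 20-28 and 29-38.

```python
from decimal import Decimal

def required_precision_scale(values: list[str]) -> tuple[int, int]:
    """Smallest (precision, scale) for one DECIMAL column holding every
    finite value exactly. Digits are counted as written, trailing zeros included."""
    int_digits = 0
    scale = 0
    for text in values:
        _, digits, exponent = Decimal(text).as_tuple()
        scale = max(scale, -exponent)                       # digits after the point
        int_digits = max(int_digits, len(digits) + exponent)  # digits before it
    return int_digits + scale, scale

def sql_server_decimal_bytes(precision: int) -> int:
    """Storage size of decimal(p, s) in SQL Server, by precision band."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 5
    if precision <= 19:
        return 9
    if precision <= 28:
        return 13
    return 17

p, s = required_precision_scale(["1234567890123456.0", "0.1234567890123456"])
print(p, s)                         # 32 16
print(sql_server_decimal_bytes(p))  # 17 bytes
print(sql_server_decimal_bytes(16)) # 9 bytes for a 16-digit column
```

By that table, DECIMAL(32,16) costs 17 bytes against 9 for a 16-digit column. That is more, but less than double.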

How precise should I encode a Unix Time?

I came across this because I am working with time across multiple platforms, and it seems they all differ slightly in how Unix time is implemented and/or handled. Thus the question.
Quoting Wikipedia page on Unix Time:
Unix has no tradition of directly representing non-integer Unix time numbers as binary fractions. Instead, times with sub-second precision are represented using composite data types that consist of two integers, the first being a time_t (the integral part of the Unix time), and the second being the fractional part of the time number in millionths (in struct timeval) or billionths (in struct timespec). These structures provide a decimal-based fixed-point data format, which is useful for some applications, and trivial to convert for others.
This seems to be the implementation in Go (UnixNano). However, in practice, many languages/platforms use milliseconds (Java?), some platforms use floats (to try to maintain some precision), and others mostly use integers.
So if I'm implementing a transport format and I only have exactly 64 bits available to store a time value and no more, my question is two-fold:
Should I encode it as an integer or a floating-point value? And
Should I use seconds, milliseconds or nanosecond precision?
The main goal being to try to be as accurate as possible across as many languages and platforms as possible (without resorting to custom code in every single platform, of course).
p.s. I know this is a little subjective but I believe it's still possible to make a good, objective answer. Feel free to close if that's not the case.
It depends on what the required precision of the time value is, and its maximal range.
When storing nanoseconds in an unsigned 64-bit integer, the range is about 584 years (2^64 ns), which is both precise enough and long enough for any practical application. A signed 64-bit integer halves that to about 292 years.
Using a floating-point format has the advantage that both very small and very large values can be stored, with higher absolute precision for smaller values. But with 64 bits this is probably not a problem anyway.
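The ranges above, and the precision trade-off of a 64-bit double, can be checked in a few lines of Python. The "now" of 1.7e9 seconds (about November 2023) is an assumed sample value. math.ulp (Python 3.9+) returns the gap to the next representable double.

```python
import math

SECONDS_PER_YEAR = 365.2425 * 86_400  # mean Gregorian year

def span_years(unit_seconds: float, signed: bool = False) -> float:
    """Years that a 64-bit integer counter of the given unit covers from its epoch."""
    max_count = 2**63 if signed else 2**64
    return max_count * unit_seconds / SECONDS_PER_YEAR

print(span_years(1e-9))               # unsigned nanoseconds: about 584.5 years
print(span_years(1e-9, signed=True))  # signed nanoseconds: about 292 years
print(span_years(1e-6, signed=True))  # signed microseconds: about 292,000 years

# Gap between adjacent doubles near an assumed "now" since the Unix epoch:
print(math.ulp(1.7e9))   # 2**-22 s (about 0.24 microseconds) when stored as seconds
print(math.ulp(1.7e18))  # 256.0 when stored as nanoseconds
```

So a double holding seconds keeps roughly sub-microsecond resolution for current dates. A double holding nanoseconds, however, moves in 256 ns steps, while a 64-bit integer keeps every nanosecond.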
If the time value is an absolute point in time instead of a duration, the transport format would also need to define what date/time the value 0 stands for (i.e. the Epoch).
Getting the current time on a UNIX-like system can be done using gettimeofday(), for example, which fills in a struct timeval with a seconds value and a microseconds value. This can then be converted into a single 64-bit integer giving a value in microseconds. The Epoch for UNIX time is 1 January 1970 00:00:00 UTC. (The clock() function does not measure real time; it measures the processor time used by the program.)
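The seconds-plus-microseconds to single-integer conversion looks like this as a Python sketch. Python does not expose gettimeofday(), so time.time_ns() stands in as the source of the (seconds, microseconds) pair. The helper name timeval_to_us is hypothetical.

```python
import time

def timeval_to_us(tv_sec: int, tv_usec: int) -> int:
    """Combine a struct timeval-style (seconds, microseconds) pair into a
    single count of microseconds since the Unix epoch."""
    if not 0 <= tv_usec < 1_000_000:
        raise ValueError("tv_usec must be in [0, 1_000_000)")
    return tv_sec * 1_000_000 + tv_usec

# Stand-in for gettimeofday(): split nanoseconds since the epoch into
# whole seconds and the leftover microseconds.
ns = time.time_ns()
now_us = timeval_to_us(ns // 1_000_000_000, (ns // 1_000) % 1_000_000)
print(now_us)  # fits easily in a signed 64-bit integer
```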
When a time value for the same transport format is generated on another platform (for example, on Windows with GetSystemTime()), it would need to be converted to the same unit and epoch.
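On Windows, GetSystemTime() fills a SYSTEMTIME structure of calendar fields. SystemTimeToFileTime(), or GetSystemTimeAsFileTime() directly, gives a FILETIME instead: a 64-bit count of 100-nanosecond intervals since 1 January 1601 UTC. The Python sketch below covers the remaining unit-and-epoch conversion to Unix microseconds. It derives the epoch offset from datetime rather than hard-coding it, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta

# FILETIME ticks are 100 ns; offset between the 1601 and 1970 epochs in ticks.
EPOCH_DIFF_TICKS = (datetime(1970, 1, 1) - datetime(1601, 1, 1)) // timedelta(microseconds=1) * 10

def filetime_to_unix_us(ticks: int) -> int:
    """Convert a FILETIME tick count to microseconds since the Unix epoch
    (floor division, so pre-1970 values round toward negative infinity)."""
    return (ticks - EPOCH_DIFF_TICKS) // 10

print(EPOCH_DIFF_TICKS)  # 116444736000000000
```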
So the following things would need to be fixed for a transport protocol:
The unit of the time value (ms, us, ...), depending on required precision and range
If the time is a time point and not a duration, the Epoch (date and time of value 0)
Whether it is stored as an integer (unsigned, or signed if it is a duration that can be negative) or as a floating-point value
The endianness of the 64-bit value
If floating point is used, the format of the floating point value (normally IEEE 754)
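As an illustration, here is one hypothetical set of answers to the checklist above, written with Python's struct module: microseconds, the Unix epoch, a signed 64-bit integer and big-endian ("network") byte order. These choices are assumptions for the sketch, not a recommendation from the answer.

```python
import struct

# Hypothetical wire format: signed 64-bit integer, big-endian,
# microseconds since 1970-01-01T00:00:00Z.
WIRE_TIME = struct.Struct(">q")

def encode_time(us_since_epoch: int) -> bytes:
    """Pack into exactly 8 bytes; raises struct.error if the value does not
    fit in a signed 64-bit integer."""
    return WIRE_TIME.pack(us_since_epoch)

def decode_time(data: bytes) -> int:
    (us_since_epoch,) = WIRE_TIME.unpack(data)
    return us_since_epoch

packet = encode_time(1_423_079_895_486_000)
print(packet.hex())         # 8 bytes, most significant byte first
print(decode_time(packet))  # 1423079895486000
```

Because every field of the checklist is pinned down in one place, a decoder on any platform only needs a big-endian 64-bit read.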
Because different platforms have different APIs to get the current time, probably it would always need some code to properly convert the time value, but this is trivial.
For maximum portability and accuracy, you should probably go with a type specified by POSIX. That way, the code will be portable across all Unixes and other operating systems conforming to POSIX.
I suggest that you use clock_t and the clock() function for time. This has a variety of uses, including measuring the time elapsed between one point in a program and another. Just make sure to cast the result to a double and divide by CLOCKS_PER_SEC afterwards to convert it into seconds.
So, to answer your question:
Use both an integer and a floating-point value
Precision is uncertain (it depends on how finely clock() counts processor time), but accurate enough for all non-critical applications and some more important ones
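clock() measures processor time rather than elapsed real time, as the first answer notes, and a sleep makes the difference visible. This Python sketch uses time.process_time() as a rough analogue of C's clock(), since it reports the process's CPU time. It uses time.monotonic() for elapsed real time. The 0.2 s sleep and the helper name are arbitrary choices for illustration.

```python
import time

def measure_sleep(seconds: float) -> tuple[float, float]:
    """Return (elapsed real time, elapsed CPU time) across a sleep."""
    wall0, cpu0 = time.monotonic(), time.process_time()
    time.sleep(seconds)
    return time.monotonic() - wall0, time.process_time() - cpu0

wall, cpu = measure_sleep(0.2)
print(f"real: {wall:.3f} s, CPU: {cpu:.3f} s")  # CPU time stays close to zero
```

A processor-time counter is therefore unsuitable for timestamps in a transport format, whatever its resolution.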

Numeric vs Real Datatypes for Storing Monetary Values

An answer to a question about a good schema for stock data recommended this schema:
Symbol - char 6
Date - date
Time - time
Open - decimal 18, 4
High - decimal 18, 4
Low - decimal 18, 4
Close - decimal 18, 4
Volume - int
In addition, Postgres documentation says:
"If you require exact storage and calculations (such as for monetary amounts), use the numeric type instead (of floating point types)."
I'm fairly new at SQL, and I hope this is not a really naive question. I'm wondering about the necessity of using the numeric datatype (especially 18,4) - it seems like overkill to me. And "exact" is not really something I'd specify, if exact means correct out to 12 decimal places.
I'm thinking of using real 10,2 for the monetary columns. Here's my rationale.
A typical calculation might compare a stock price (2 decimal places) to a moving average (that could have many decimal places), to determine which is larger. My understanding is that the displayed value of the average (and any calculated results) would be rounded to 2 decimal places, but that calculations would be performed using the higher precision of the stored internal number.
So such a calculation would be accurate to at least 2 decimal places, which is really all I need, I think.
Am I way off base here, and is it possible to get an incorrect answer to the above comparison by using the real 10,2 datatype?
I'd also welcome any other comments, pro or con, about using the numeric datatype.
Thanks in advance.
Floating-point variables are vulnerable to rounding errors, because most decimal fractions (such as 0.1) have no exact binary representation. Therefore, if accuracy is important (any time money is involved), it's always recommended to use a numeric type.
https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accuracy_problems
Floating point inaccuracy examples
Let's start with the schema above and look at how an 18,4 value would look as a floating-point number:
select '12345678901234.5678'::float4;
float4
-------------
1.23457e+13
(1 row)
select '12345678901234.5678'::double precision;
float8
------------------
12345678901234.6
(1 row)
So with 14 digits before the decimal point, your number will always be rounded, and you store rounded (and therefore wrong) values.
Also, your assumption about rounding to two decimal places: where does that assumption come from?
select '1.2345678'::float4;
float4
---------
1.23457
(1 row)
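The asker asked whether a comparison can actually go wrong, and this minimal Python sketch shows one case where it does. The price and moving-average values are hypothetical. The helper as_real mimics a PostgreSQL real by rounding through IEEE 754 binary32 with the struct module, and Decimal stands in for numeric. Near 123456, adjacent binary32 values are 2^-7 = 0.0078125 apart, so two different inputs collapse into the same stored number.

```python
import struct
from decimal import Decimal

def as_real(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 value, which is what a
    PostgreSQL real (float4) column stores."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

price = 123456.78      # hypothetical stock price
average = 123456.781   # hypothetical moving average, slightly higher

print(as_real(price), as_real(average))              # 123456.78125 123456.78125
print(as_real(average) > as_real(price))             # False: the difference is gone
print(Decimal("123456.781") > Decimal("123456.78"))  # True with exact decimals
print(sum([0.1] * 10) == 1.0)                        # False, even in double precision
```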
So far you have presented a number of assumptions and shortcuts without showing why you want to use floating-point numbers instead of numeric. What is your compelling reason? Just to save some bytes?
My next question is: if your application expands and does more than just "avg" calculations, will you need to change the data type back to numeric?

Why does this website show a current-milliseconds value greater than Integer.MAX_VALUE?

The MAX_VALUE for Integer (32-bit) is 2_147_483_647, and this is the maximum limit of time in the future (unless we switch to 64-bit integers).
But this website shows the current time in milliseconds as 1_423_079_895_486, and it shows the correct time.
How can the value be so much bigger than Integer.MAX_VALUE, or than the maximum milliseconds value in Unix time?
Am I missing something basic?
It's probably just using 64 bits to represent the time in milliseconds.
This is unremarkable. The system I'm typing this on has a 64-bit time_t type.
Are you perhaps assuming that the C types int and time_t have to be the same size? They don't. And a 32-bit number representing milliseconds can only span a duration of just under 50 days.
We don't even know how the web site is implemented; it could well be using some scripting language with support for variable-width integers.
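A quick Python check of the numbers in this question follows. The millisecond value is the one quoted above, and Integer.MAX_VALUE is Java's 2^31 - 1.

```python
from datetime import datetime, timezone

now_ms = 1_423_079_895_486   # value from the question
INT32_MAX = 2**31 - 1        # Java Integer.MAX_VALUE
MS_PER_DAY = 86_400_000

print(now_ms.bit_length())           # 41 bits: needs a 64-bit type
print(now_ms // 1000 <= INT32_MAX)   # True: as whole seconds it still fits
print(datetime.fromtimestamp(now_ms / 1000, timezone.utc).date())  # 2015-02-04
print(datetime.fromtimestamp(INT32_MAX, timezone.utc))  # 2038-01-19 03:14:07+00:00
print(2**32 / MS_PER_DAY)            # about 49.7 days of milliseconds in 32 unsigned bits
```

The 32-bit limit in the question applies to seconds (it runs out in January 2038), not to milliseconds.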

Selecting floating point numbers in decimal form

I have a small number in a PostgreSQL table:
test=# CREATE TABLE test (r real);
CREATE TABLE
test=# INSERT INTO test VALUES (0.00000000000000000000000000000000000000000009);
INSERT 0 1
When I run the following query it returns the number as 8.96831e-44:
test=# SELECT * FROM test;
r
-------------
8.96831e-44
(1 row)
How can I show the value in psql in its decimal form (0.00000000000000000000000000000000000000000009) instead of the scientific notation? I'd be happy with 0.0000000000000000000000000000000000000000000896831 too. Unfortunately I can't change the table and I don't really care about loss of precision.
(I've played with to_char for a while with no success.)
Real in Postgres is a floating point datatype, stored on 4 bytes, that is 32 bits.
Your value,
0.00000000000000000000000000000000000000000009
Cannot be represented exactly in a 32-bit IEEE 754 floating-point number. It falls in the subnormal range, where the nearest representable value is 2^-143, which psql displays as 8.96831e-44. You can check the exact values in this calculator.
You could try using double precision (64 bits) to store it; according to the calculator, that seems to be an exact representation. NOT TRUE: Patricia showed that it was just the calculator rounding the value, even though I explicitly asked it not to... Double gives more precision, but still not an exact value, as this number cannot be represented with a finite number of binary digits. (Thanks, Patricia, a lesson learnt (again): don't believe what you see on the Intertubez.)
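This representability argument can be verified in Python. The hypothetical helper as_real rounds a value through IEEE 754 binary32 with the struct module, approximating what a real column stores. Decimal(float) exposes the exact binary value. The last line shows one way to print the stored value without scientific notation on the Python side; it does not change psql's own output.

```python
import struct
from decimal import Decimal

def as_real(x: float) -> float:
    """Nearest IEEE 754 binary32 value to x (what a PostgreSQL real stores)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

exact = Decimal("0.00000000000000000000000000000000000000000009")  # 9e-44
stored = as_real(9e-44)

print(stored == 2.0**-143)       # True: a subnormal, 64 * 2**-149
print(Decimal(stored) == exact)  # False: real is not exact
print(Decimal(9e-44) == exact)   # False: double is closer, but still not exact
print(f"{stored:.50f}")          # fixed-point rendering, no exponent
```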
Under normal circumstances, you should use a NUMERIC(precision, scale) format, that would store the number precisely to get back the correct value.
This value needs a scale of 44 (44 digits after the decimal point). PostgreSQL's numeric can handle that, since a declared precision can be up to 1000 digits. If you don't want to do calculations and just want to store the values (which would not be a very common situation, I admit), you could also try storing them as strings... (but this is ugly...)
EDIT
This to_char problem seems to be a known bug...
Quote:
My immediate reaction to that is that float8 values don't have 57 digits of precision. If you are expecting that format string to do something useful you should be applying it to a numeric column not a double precision one.
It's possible that we can kluge things to make this particular case work like you are expecting, but there are always going to be similar-looking cases that can't work because the precision just isn't there.
In a quick look at the code, the reason you just get "0." is that it's rounding off after 15 digits to ensure it doesn't print garbage. Maybe it could be a bit smarter for cases where the value is very much smaller than 1, but it wouldn't be a simple change.
(from here)
However, I don't find this defensible. IMHO a double (IEEE 754 64-bit floating point, to be exact) always has about 15 significant decimal digits, as long as the value fits into the type...
Recommended reading:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Postgres numeric types
BUG #6217: to_char() gives incorrect output for very small float values