PostgreSQL max TransactionId > 4 billion

The max transaction ID of PostgreSQL should be 2^31, which is about 2 billion. However, when I query the current transaction ID from the DB via select cast(txid_current() as text), I get a number around 8 billion. Why does this happen? The autovacuum_freeze_max_age is 200 million.

As the documentation for the function family you are using says:
The internal transaction ID type (xid) is 32 bits wide and wraps around every 4 billion transactions. However, these functions export a 64-bit format that is extended with an "epoch" counter so it will not wrap around during the life of an installation.
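In other words, the value you see is epoch * 2^32 + xid. As a rough illustration (a sketch of my own, not from the documentation; the column aliases are arbitrary), you can split the two parts like this:
SELECT txid_current() AS txid64,              -- 64-bit, epoch-extended value
       txid_current() >> 32 AS epoch,         -- how many times the 32-bit counter has wrapped
       txid_current() & 4294967295 AS xid32;  -- the raw 32-bit xid
For a value around 8 billion, the epoch comes out as 1, meaning the 32-bit counter has already wrapped once, which is normal.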

Related

PostgreSQL: smaller timestamptz type?

A timestamptz is 8 bytes in PostgreSQL. Is there a way to get a 6-byte timestamptz by dropping some precision?
6 bytes is pretty much out of the question, since there is no data type with that size.
With some contortions you could use a 4-byte real value:
CREATE CAST (timestamp AS bigint) WITHOUT FUNCTION;
SELECT (localtimestamp::bigint / 1000000 - 662774400)::real;
float4
--------------
2.695969e+06
(1 row)
That would give you the time since 2021-01-01 00:00:00 with a precision of about a second (but of course for dates farther from that point, the precision will deteriorate).
But the whole exercise is pretty much pointless. Trying to save 2 or 4 bytes in such a way is not a good idea:
- the space savings will be minimal; today, when you can have terabytes of storage with little effort, that seems pointless
- if you don't carefully arrange your table columns, you will lose the bytes you think you have won to alignment padding (see the sketch below)
- using a number instead of a proper timestamp data type will make your queries more complicated and the results hard to interpret, and it will keep you from using date arithmetic
For all these reasons, I would place this idea firmly in the realm of harmful micro-optimization.
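To illustrate the alignment point, here is a toy sketch of mine (not part of the original answer) comparing two row layouts with pg_column_size:
SELECT pg_column_size(ROW(true, now(), 1)) AS bool_first,
       pg_column_size(ROW(now(), 1, true)) AS bool_last;
The first variant should come out a few bytes larger, because the 8-byte timestamptz has to start on an 8-byte boundary, so the single boolean in front of it forces padding.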

Handling oddly-formatted timestamp in Postgres?

I have about 32 million tuples of data of the format:
2012-02-22T16:46:28.9670320+00:00
I have been told that the +00:00 indicates an hour:minute timezone offset, but also that Postgres only accepts an hour offset (even with decimals), not the minutes. So would I have to process the data to remove the trailing :00 from every tuple and read the data in as timestamps? I would like to avoid pre-processing the data file, but if Postgres will not accept the values otherwise, then I will do so.
In addition, the precision given in the data is 7 decimal places in the seconds part, whereas the Postgres timestamp data type allows a maximum of 6 decimal places (microseconds). Would I have to truncate the 7 decimal places to 6 in order for Postgres to read the records, or will Postgres automatically convert 7 to 6 as it reads the tuples?
pgsql=# SELECT '2016-07-10 20:12:21.8372949999+02:30'::timestamp with time zone AS ts;
              ts
-------------------------------
2016-07-10 17:42:21.837295+00
(1 row)
It seems that at least in PostgreSQL 9.4 and up (maybe earlier), a minutes component in the timezone offset is processed correctly even though it is not clearly documented. In a similar vein, if I read in a timestamp that has 7 decimal places in the seconds, it is automatically rounded to 6 decimal places (microsecond precision).
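For the record, the exact format from the question also parses directly; assuming the session TimeZone is UTC, the output should look roughly like this:
SELECT '2012-02-22T16:46:28.9670320+00:00'::timestamptz AS ts;
              ts
-------------------------------
 2012-02-22 16:46:28.967032+00
(1 row)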

Need advice on efficiently inserting millions of time series data into a Cassandra DB

I want to use a Cassandra database to store time series data from a test site. I am using Pattern 2 from the "Getting started with Time Series Data Modeling" tutorial, but instead of storing the row-size-limiting date as a date, I store it as an int counting the days elapsed since 1970-01-01; the timestamp of each value is the number of nanoseconds since the epoch (some of our measuring devices are that precise, and the precision is needed). My table for the values looks like this:
CREATE TABLE values (channel_id INT, day INT, time BIGINT, value DOUBLE, PRIMARY KEY ((channel_id, day), time))
I created a simple benchmark that uses asynchronous execution and prepared statements for bulk loading instead of batches:
import java.lang.{Integer => JInt, Long => JLong, Double => JDouble}
import scala.collection.mutable
import com.datastax.driver.core.ResultSetFuture

// session is a com.datastax.driver.core.Session created elsewhere
def valueBenchmark(numVals: Int): Unit = {
  val vs = session.prepare(
    "insert into values (channel_id, day, time, " +
      "value) values (?, ?, ?, ?)")
  val currentFutures = mutable.MutableList[ResultSetFuture]()
  for (i <- 0 until numVals) {
    // One async insert per value; 100,000 consecutive values share a (channel_id, day) partition
    currentFutures += session.executeAsync(vs.bind(-1: JInt,
      i / 100000: JInt, i.toLong: JLong, 0.0: JDouble))
    // Block every 10,000 requests so the number of in-flight writes stays bounded
    if (currentFutures.length >= 10000) {
      currentFutures.foreach(_.getUninterruptibly)
      currentFutures.clear()
    }
  }
  // Wait for whatever is still outstanding
  if (currentFutures.nonEmpty) {
    currentFutures.foreach(_.getUninterruptibly)
  }
}
JInt, JLong and JDouble are simply java.lang.Integer, java.lang.Long and java.lang.Double, respectively.
When I run this benchmark for 10 million values, it takes about two minutes against a locally installed single-node Cassandra. My computer has 16 GiB of RAM and a quad-core i7 CPU. I find this quite slow. Is this normal insert performance for Cassandra?
I already read these:
Anti-Patterns in Cassandra
Another question on write performance
Are there any other things I could check?
Simple maths:
10 million inserts / 2 minutes ≈ 83,333 inserts/sec, which is great for a single machine. Did you expect something faster?
By the way, what are the specs of your hard drives? SSD or spinning disks?
You should know that massive insert scenarios are more CPU-bound than I/O-bound. Try to execute the same test on a machine with 8 physical cores (16 vcores with Hyper-Threading) and compare the results.

Convert PostgreSQL age function output to upper case

I am working with PostgreSQL 8.4.4. I am calculating the time difference between two Unix timestamps using PostgreSQL's age function, and I am getting the output as expected. The only thing I want is to convert the time difference to UPPERCASE.
For example,
select coalesce(nullif(age(to_timestamp(1389078075), to_timestamp(1380703432))::text,''), UPPER('Missing')) FROM transactions_transactions WHERE id = 947
This query gives the result
3 mons 4 days 22:17:23
But I want this output to be like
3 MONTHS 4 DAYS 22:17:23
Note: I am using this for dynamic report generation, so I cannot convert it to UPPERCASE after fetching it from the database. I want it to be in UPPERCASE when it comes from the database itself, i.e., in the query.
PostgreSQL's upper() function should be used:
SELECT upper(age(to_timestamp(1389078075), to_timestamp(1380703432))::text)
FROM transactions_transactions WHERE id = 947
As per the OP's comment and edit:
select upper(coalesce(nullif(age(to_timestamp(1389078075), to_timestamp(1380703432))::text,''), UPPER('Missing')))
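Note that upper() alone yields 3 MONS 4 DAYS 22:17:23, not 3 MONTHS .... If the fully spelled-out unit is required, something like the following (my own addition; it only handles the plural "mons" form) could be combined with the query above:
SELECT upper(replace(age(to_timestamp(1389078075), to_timestamp(1380703432))::text,
                     'mons', 'months'));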

Unsigned short data type in SQL Server 2008 R2

I want to store port numbers in my SQL Server database. In general, a port can have any value from 0 to 65,535.
The following page, http://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.71%29.aspx, mentions that “unsigned short” is suitable for storing a port number.
But in my case I am using SQL Server 2008 R2, so which data type can I use to represent an “unsigned short”?
See the documentation on data types.
You could use a decimal/numeric:
Precision   Storage bytes
1-9         5
10-19       9
20-28       13
29-38       17
but even the smallest precision (1-9) is 5 bytes.
Looking at the integer family (and ignoring bigint because it's overkill):
Data type   Range                                                Storage
int         -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)    4 Bytes
smallint    -2^15 (-32,768) to 2^15-1 (32,767)                  2 Bytes
tinyint     0 to 255                                            1 Byte
... a smallint is too small (it tops out at 32,767), so just use int. Compared to the smallest decimal/numeric, it still saves you a byte per row.
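If you also want the 0 to 65,535 range enforced in the database, a CHECK constraint on the int column is one option (a sketch with made-up table and constraint names):
CREATE TABLE dbo.Endpoints (
    EndpointId int IDENTITY(1,1) PRIMARY KEY,
    Port       int NOT NULL,
    CONSTRAINT CK_Endpoints_Port CHECK (Port BETWEEN 0 AND 65535)
);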